
Lightweight alternatives to Google Analytics



June 17, 2020

This article was contributed by Ben Hoyt

More and more web-site owners are concerned about the "all-seeing Google" tracking users as they browse around the web. Google Analytics (GA) is a full-featured web-analytics system that is available for free and, despite the privacy concerns, has become the de facto analytics tool for small and large web sites alike. However, in recent years, a growing number of alternatives are helping break Google's dominance. In this article we'll look at two of the lightweight open-source options, namely GoatCounter and Plausible. In a subsequent article, we'll look at a few of the larger tools.

GA is by far the biggest player here: BuiltWith shows that around 86% of the top 100,000 web sites use it. This figure goes down to 64% for the top one-million web sites. These figures have grown steadily for the past 15 years, since Google acquired Urchin and rebranded it as Google Analytics. In addition to privacy concerns, GA is more complex and feature-heavy than some web-site owners need; many of them just want to see how much traffic is going to the pages on their site, and where that traffic is coming from. So it's not surprising that a number of simpler, more open tools have taken off in the past few years.

It should be noted that LWN does use GA, though we are evaluating other choices. Those who turn off ads in their preferences will not be served with the GA code, however.

What Google tracks, and why it's concerning

If asked what information Google tracks, a cynic might say, "everything". Part of the problem is that this isn't too far from the truth: Google tracks and stores a huge amount of information about users.

A 2018 paper [PDF] by Douglas Schmidt highlights the extent of Google's tracking, with location tracking on Android devices as one example:

Both Android and Chrome send data to Google even in the absence of any user interaction. Our experiments show that a dormant, stationary Android phone (with Chrome active in the background) communicated location information to Google 340 times during a 24-hour period, or at an average of 14 data communications per hour.

The paper distinguishes between "active" and "passive" tracking. Active tracking is when the user directly uses or logs into a Google service, such as performing a search, logging into Gmail, and so on. In addition to recording all of a user's search keywords, Google passively tracks users as they visit web sites that use GA and other Google publisher tools. Schmidt found that in an example "day in the life" scenario, "Google collected or inferred over two-thirds of the information through passive means".

Schmidt's paper details how GA cookie tracking works, noting the difference between "1st-party" and "3rd-party" cookies, the latter of which track users and their ad clicks across multiple sites:

While a GA cookie is specific to the particular domain of the website that user visits (called a "1st-party cookie"), a DoubleClick cookie is typically associated with a common 3rd-party domain (such as doubleclick.net). Google uses such cookies to track user interaction across multiple 3rd-party websites.

When a user interacts with an advertisement on a website, DoubleClick's conversion tracking tools (e.g. Floodlight) places cookies on a user’s computer and generates a unique client ID. Thereafter, if the user visits the advertised website, the stored cookie information gets accessed by the DoubleClick server, thereby recording the visit as a valid conversion.

Because such a large percentage of web sites use Google advertising products as well as GA, this has the effect that the company knows a large fraction of users' browsing history across many web sites, both popular sites and smaller "mom and pop" sites. In short, Google knows a lot about what you like, where you are, and what you buy.

Google does provide ways to turn off features like targeted advertising and location tracking, as well as to delete the personalized profile associated with an account. However, these privacy controls are almost entirely opt-in: tracking is enabled by default, and most users either don't know the settings exist or just never bother to change them.

Of course, just switching away from GA won't eliminate all of these privacy issues (for example, it will do nothing to stop Android location tracking or search tracking), but it's one way to reduce the huge amount of data Google collects. In addition, for site owners that use a GA alternative, Google does not get a behind-the-scenes look at the site's traffic patterns — data which it could conceivably use in the future to build a competing tool.

LWN readers likely skew toward the privacy-conscious end of the spectrum: using Firefox instead of Google Chrome, turning on ad blockers, and so on. However, the users of the web sites they build may not be so privacy-conscious. For web-site developers, the analytics tools they choose can help respect their users' privacy and avoid Google knowing quite so much about their users' browsing patterns.

GoatCounter

GoatCounter is one of the more recent web-analytics tools, launched in August 2019. Created by Martin Tournoij, it has more of a "made by a single developer" feel than other tools; it's a little less slick-looking than some, but it is also developer-friendly and simple to set up.


The tool supports all of the basic analytics: page views and visits by URL, browser and operating system statistics, device screen sizes, locations, and referrer information. By default GoatCounter shows the last seven days with counts broken down by hour, but site owners can adjust the date span with simple controls.

GoatCounter has an unusual pricing model, with its source code licensed under the copyleft European Union Public License (EUPL). Companies can host the software themselves, or use GoatCounter's hosted version for a small fee (though the hosted version doesn't cost anything for "personal" projects). Tournoij has a lengthy article discussing why he chose the EUPL, noting:

I still don't really care what people do with my code, but I do care if my ability to make a living would be unreasonably impeded. Taking my MIT code and working full-time on enhancements that aren't sent back to me means my competitor has double the amount of people working on it: me (for free, from their perspective), and them. They will always have an advantage over me.

GoatCounter is written in Go, and uses vanilla JavaScript in its UI for some lightweight interactivity. JavaScript frameworks often get in the way of web accessibility, and GoatCounter's prioritization of accessibility (mentioned on its home page) struck a chord with "ctoth", who thanked Tournoij on Hacker News:

First time I've ever seen a comment about accessibility on the homepage of a mainstream product like this. As a blind developer this was just awesome, made me really feel like somebody out there is listening. Thank you for making this.

In addition to counting page views, GoatCounter tracks sessions using a hash of the browser's user agent and IP address to identify the client without storing any personal information. The salt used to generate these hashes is rotated every 4 hours with a sliding window. Tournoij has a detailed write-up about the technical aspects of session tracking, including a comparison with other solutions that have similar aims.
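The general idea of salted, rotating session hashes can be sketched in a few lines of Python. This is an illustration only, not GoatCounter's actual code: the class and function names, the choice of SHA-256, and the reading of "sliding window" as "keep the previous salt around for one more interval" are all assumptions made for the example.

```python
import hashlib
import secrets

ROTATE_SECONDS = 4 * 60 * 60  # rotation interval mentioned in the article


def session_hash(salt: bytes, user_agent: str, ip: str) -> str:
    """Hash salt + User-Agent + IP; only this digest is kept, not the inputs."""
    h = hashlib.sha256()
    for part in (salt, user_agent.encode(), ip.encode()):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()


class SessionTracker:
    """Hypothetical tracker: maps rotating hashes to opaque session numbers."""

    def __init__(self, now: float):
        self.current = secrets.token_bytes(16)
        self.previous = None      # salt from the preceding interval, if any
        self.rotated_at = now
        self.sessions = {}        # hash -> session number (pruning omitted)
        self.next_id = 1

    def _rotate_if_due(self, now: float) -> None:
        elapsed = now - self.rotated_at
        if elapsed >= 2 * ROTATE_SECONDS:
            # Both salts are stale: drop them so old sessions cannot match.
            self.previous = None
        elif elapsed >= ROTATE_SECONDS:
            self.previous = self.current
        else:
            return
        self.current = secrets.token_bytes(16)
        self.rotated_at = now

    def session_for(self, user_agent: str, ip: str, now: float) -> int:
        self._rotate_if_due(now)
        cur = session_hash(self.current, user_agent, ip)
        if cur in self.sessions:
            return self.sessions[cur]
        if self.previous is not None:
            prev = session_hash(self.previous, user_agent, ip)
            if prev in self.sessions:
                # Seen under the previous salt: carry the session forward.
                self.sessions[cur] = self.sessions.pop(prev)
                return self.sessions[cur]
        self.sessions[cur] = self.next_id
        self.next_id += 1
        return self.sessions[cur]
```

Once a salt has been discarded, the hashes made with it can no longer be recomputed from a user agent and IP address, which is what limits how long a visitor can be recognized under this scheme.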

For web-site owners who prefer to avoid JavaScript or who want analytics from users with JavaScript disabled, GoatCounter supports a non-JavaScript tracking scheme. It uses a 1x1 transparent GIF image in an "<img>" tag on the pages to be counted, though this approach will not record the referrer or screen size.
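For a site that generates its pages from templates, the image-based approach amounts to emitting one extra tag per page. The Python helper below is a rough sketch of that: the endpoint URL and the "p" query parameter carrying the page path are assumptions made for illustration, so check the analytics tool's own documentation for the real values.

```python
from html import escape
from urllib.parse import urlencode


def pixel_tag(endpoint: str, path: str) -> str:
    """Build an <img> tag that requests a 1x1 counting pixel for one page.

    `endpoint` is a hypothetical counting URL; `path` is the page being
    counted and is passed in an assumed "p" query parameter.
    """
    src = f"{endpoint}?{urlencode({'p': path})}"
    # Escape for use inside an HTML attribute; empty alt text keeps the
    # pixel invisible to screen readers.
    return f'<img src="{escape(src, quote=True)}" alt="" width="1" height="1">'


print(pixel_tag("https://stats.example.com/count", "/blog/hello world"))
```

Because the browser only fetches an image, the server sees the request's IP address and user agent but has no way to learn the screen size, which matches the limitation described above.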

The hosted version of GoatCounter is easy to set up: it took me about five minutes to create an account and add the one line of JavaScript to my web site. Analytics data started showing up within a few seconds. Even with the hosted version, the site owner fully owns the data, and can export the full dump or delete their account at any time.

The self-hosted version is also straightforward to set up using the Linux binaries or by building from source — it took me less than ten minutes to build from source and set it up locally with the default SQLite database configuration. In contrast to Plausible (discussed below), it was much lighter to install, didn't download anything, and started up almost instantly.

Plausible

Plausible is another relatively new analytics tool that was launched in early 2019. Soon after launching, it switched to open source, with the code licensed under the permissive MIT license. The company's business model is to charge for the hosting, with pricing aimed at small businesses. In addition to making its source code available, Plausible is one of an increasing number of companies that have a publicly-visible roadmap for better transparency. It also posts informational content for potential customers on its blog.


Plausible is unique from a technology perspective, with its server code written in Elixir, which is a functional programming language that runs on the Erlang virtual machine. Its frontend UI uses a small amount of vanilla JavaScript for the interactive parts, rather than a rendering framework like React. It also boasts one of the smallest analytics scripts, with plausible.js weighing in at 781 bytes (1.2KB uncompressed) at the time of this writing. GA's analytics.js, by comparison, is almost 18KB (46KB uncompressed), while GoatCounter's count.js is 2.3KB (6.3KB uncompressed). That size can make a meaningful difference since the scripts are loaded for each page on the site.
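Readers who want to compare script sizes for themselves can measure both figures with a few lines of Python. This is a rough sketch that assumes gzip is the compression in use; the article does not say which algorithm produced the numbers above, and a server using a different algorithm or compression level will give different results. The function name is made up for the example.

```python
import gzip
from typing import Tuple


def script_sizes(source: bytes) -> Tuple[int, int]:
    """Return (uncompressed, gzip-compressed) sizes of a script, in bytes."""
    return len(source), len(gzip.compress(source))


# Stand-in for a real script; in practice, read the downloaded file's bytes.
sample = b"var x = 1;\n" * 200
raw, packed = script_sizes(sample)
print(f"{raw} bytes uncompressed, {packed} bytes gzipped")
```

The sample here is artificially repetitive, so it compresses far better than real code would; the point is only to show how the two numbers are obtained.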

In terms of user interface, Plausible is definitely more polished than GoatCounter. It is fairly minimalist, though, perhaps even more so than GoatCounter, providing total visitor counts, page-view counts per path, referrer information, map location, and devices (broken down by screen size, browser, and operating system). The tool also provides a "bounce rate" metric, though the exact definition is unclear.

Plausible's home page states that it provides "100% data ownership", and it is possible to export the CSV data for a single chart (as well as delete a Plausible.io account). However, the data dump is significantly less useful than GoatCounter's full data dump, which includes detailed information for every event.

Self-hosting Plausible is possible (even plausible), though as founder Uku Taht points out in the announcement of switching to open source:

It's worth noting that for now, there's no explicit support for self-hosting Plausible. The project is still evolving quickly and maintaining a self-hosted solution would slow product development down considerably. I would love to offer a self-hosted solution in the future once the product and the business are more stable.

That said, just a few weeks ago, Plausible added a document that describes an experimental way to self-host the system using Docker. Following those recommendations, I tried to use docker-compose to get it running locally. It was a little disconcerting how many Docker and npm packages it downloaded during the minutes-long installation process, and even when it was done, there was a hard-to-comprehend error with a PostgreSQL migration which prevented it from starting — the "experimental" label definitely fits.

Proprietary options, briefly

There are also a couple of lightweight proprietary tools with a focus on privacy worth mentioning. Obviously, these don't have the advantages of open development or self-hosting, but still provide a low-cost way out of Google's data-collection net.

One is the minimalist Simple Analytics product, which is a cloud-based tool created by solo developer Adriaan van Rossum; it has a clean-looking interface with only the few key metrics, similar to Plausible. Another is Fathom, which was open source initially, but the current version is proprietary (although the company hopes to start maintaining the open-source code base again in the future).

Summary

The last few years have seen a number of good alternatives to Google Analytics, particularly for those who only need a few basic features. Many of the recent alternatives are both open source and privacy-conscious, which means there are fewer reasons for projects and businesses to continue using proprietary analytics systems.

For site owners who just need basic traffic numbers, GoatCounter and Plausible both seem like excellent options. Those who like more visual polish and documentation might prefer Plausible; those who value a more developer-friendly tool with easy self-hosting will probably prefer GoatCounter. We will soon be publishing a second article that looks at some heavier-weight GA alternatives, as well as tools that provide analytics from web-server logs.


