June 17, 2020
This article was contributed by Ben Hoyt
More and more web-site owners are concerned about the "all-seeing Google" tracking users as they browse around the web. Google Analytics (GA) is a full-featured web-analytics system that is available for free and, despite the privacy concerns, has become the de facto analytics tool for small and large web sites alike. However, in recent years, a growing number of alternatives are helping break Google's dominance. In this article we'll look at two of the lightweight open-source options, namely GoatCounter and Plausible. In a subsequent article, we'll look at a few of the larger tools.
GA is by far the biggest player here: BuiltWith shows that around 86% of the top 100,000 web sites use it. This figure goes down to 64% for the top one-million web sites. These figures have grown steadily for the past 15 years, since Google acquired Urchin and rebranded it as Google Analytics. In addition to privacy concerns, GA is more complex and feature-heavy than some web-site owners need; many of them just want to see how much traffic is going to the pages on their site, and where that traffic is coming from. So it's not surprising that a number of simpler, more open tools have taken off in the past few years.
It should be noted that LWN does use GA, though we are evaluating other choices. Those who turn off ads in their preferences will not be served with the GA code, however.
If asked what information Google tracks, a cynic might say, "everything". Part of the problem is that this isn't too far from the truth: Google tracks and stores a huge amount of information about users.
A 2018 paper [PDF] by Douglas Schmidt highlights the extent of Google's tracking, with location tracking on Android devices as one example:
Both Android and Chrome send data to Google even in the absence of any user interaction. Our experiments show that a dormant, stationary Android phone (with Chrome active in the background) communicated location information to Google 340 times during a 24-hour period, or at an average of 14 data communications per hour.
The paper distinguishes between "active" and "passive" tracking. Active tracking is when the user directly uses or logs into a Google service, such as performing a search, logging into Gmail, and so on. In addition to recording all of a user's search keywords, Google passively tracks users as they visit web sites that use GA and other Google publisher tools. Schmidt found that in an example "day in the life" scenario, "Google collected or inferred over two-thirds of the information through passive means".
Schmidt's paper details how GA cookie tracking works, noting the difference between "1st-party" and "3rd-party" cookies, the latter of which track users and their ad clicks across multiple sites:
While a GA cookie is specific to the particular domain of the website that user visits (called a "1st-party cookie"), a DoubleClick cookie is typically associated with a common 3rd-party domain (such as doubleclick.net). Google uses such cookies to track user interaction across multiple 3rd-party websites.
When a user interacts with an advertisement on a website, DoubleClick's conversion tracking tools (e.g. Floodlight) places cookies on a user’s computer and generates a unique client ID. Thereafter, if the user visits the advertised website, the stored cookie information gets accessed by the DoubleClick server, thereby recording the visit as a valid conversion.
Because such a large percentage of web sites use Google advertising products as well as GA, this has the effect that the company knows a large fraction of users' browsing history across many web sites, both popular sites and smaller "mom and pop" sites. In short, Google knows a lot about what you like, where you are, and what you buy.
Google does provide ways to turn off features like targeted advertising and location tracking, as well as to delete the personalized profile associated with an account. However, these controls are almost entirely opt-in: tracking is enabled by default, and most users either don't know about the controls or just never bother to use them.
Of course, just switching away from GA won't eliminate all of these privacy issues (for example, it will do nothing to stop Android location tracking or search tracking), but it's one way to reduce the huge amount of data Google collects. In addition, for site owners that use a GA alternative, Google does not get a behind-the-scenes look at the site's traffic patterns — data which it could conceivably use in the future to build a competing tool.
LWN readers likely skew toward privacy-conscious: using Firefox instead of Google Chrome, turning on ad blockers, and so on. However, the users of the web sites they build may not be so privacy-conscious. For web-site developers, the analytics tools they choose can help respect their users' privacy and avoid Google knowing quite so much about their users' browsing patterns.
GoatCounter is one of the more recent web-analytics tools, launched in August 2019. Created by Martin Tournoij, it has more of a "made by a single developer" feel than other tools; it's a little less slick-looking than some, but it is also developer-friendly and simple to set up.
The tool supports all of the basic analytics: page views and visits by URL, browser and operating system statistics, device screen sizes, locations, and referrer information. By default GoatCounter shows the last seven days with counts broken down by hour, but site owners can adjust the date span with simple controls.
GoatCounter has an unusual pricing model, with its source code licensed under the copyleft European Union Public License (EUPL). Companies can host the software themselves, or use GoatCounter's hosted version for a small fee (though the hosted version doesn't cost anything for "personal" projects). Tournoij has a lengthy article discussing why he chose the EUPL, noting:
I still don't really care what people do with my code, but I do care if my ability to make a living would be unreasonably impeded. Taking my MIT code and working full-time on enhancements that aren't sent back to me means my competitor has double the amount of people working on it: me (for free, from their perspective), and them. They will always have an advantage over me.
GoatCounter is written in Go, and uses vanilla Javascript in its UI for some lightweight interactivity. Javascript frameworks often get in the way of web accessibility, and GoatCounter's prioritization of accessibility (mentioned on its home page) struck a chord with "ctoth", who thanked Tournoij on Hacker News:
First time I've ever seen a comment about accessibility on the homepage of a mainstream product like this. As a blind developer this was just awesome, made me really feel like somebody out there is listening. Thank you for making this.
In addition to counting page views, GoatCounter tracks sessions using a hash of the browser's user agent and IP address to identify the client without storing any personal information. The salt used to generate these hashes is rotated every 4 hours with a sliding window. Tournoij has a detailed write-up about the technical aspects of session tracking, including a comparison with other solutions that have similar aims.
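The idea of salted, rotating session hashes can be sketched roughly as follows. This is a minimal illustration, not GoatCounter's actual implementation (Tournoij's write-up is the authority on that): the class and function names, the use of SHA-256, and the two-slot "current plus previous salt" window are all assumptions made for the example. Only the four-hour rotation period comes from the description above.

```python
import hashlib
import secrets
import uuid

ROTATION_SECONDS = 4 * 60 * 60  # rotation period mentioned in the article


class SaltWindow:
    """Hypothetical two-slot sliding window: the current and previous salt."""

    def __init__(self, now: float) -> None:
        self.current = secrets.token_hex(16)
        self.previous = None
        self.rotated_at = now

    def rotate_if_due(self, now: float) -> None:
        age = now - self.rotated_at
        if age < ROTATION_SECONDS:
            return
        # Keep the old salt for one more period; drop it if it is too stale.
        self.previous = self.current if age < 2 * ROTATION_SECONDS else None
        self.current = secrets.token_hex(16)
        self.rotated_at = now


def session_hash(salt: str, user_agent: str, ip: str) -> str:
    """Hash salt + user agent + IP; only this digest is ever stored."""
    data = "\x00".join((salt, user_agent, ip)).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def find_session(window, sessions, user_agent, ip, now):
    """Return a session ID, reusing one if the current or previous salt matches."""
    window.rotate_if_due(now)
    current = session_hash(window.current, user_agent, ip)
    if current in sessions:
        return sessions[current]
    if window.previous is not None:
        old = session_hash(window.previous, user_agent, ip)
        if old in sessions:
            # Carry the session over so it survives the salt rotation.
            sessions[current] = sessions.pop(old)
            return sessions[current]
    sessions[current] = uuid.uuid4().hex
    return sessions[current]
```

Because the salts are random and discarded after rotation, a stored hash cannot later be tied back to an IP address, while a visitor who keeps browsing across a rotation boundary is still counted as one session. A real system would also expire stale entries from `sessions`, which this sketch omits.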
For web-site owners who prefer to avoid Javascript or who want analytics from users with Javascript disabled, GoatCounter supports a non-Javascript tracking scheme. It uses a 1x1 transparent GIF image in an "<img>" tag on the pages to be counted, though this approach will not record the referrer or screen size.
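The server side of such a tracking pixel can be sketched in a few lines. The handler below is a framework-free illustration, not GoatCounter's code: the function name, the "/count.gif" path, and the "p" query parameter are hypothetical choices made for this example. Without a script there is nothing to read the screen size or the referring page, which would explain why those fields are not recorded.

```python
import base64
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# A widely used 1x1 transparent GIF, base64-encoded.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def handle_pixel_request(url: str, counts: Counter):
    """Count one page view and return (status, headers, body) for the pixel.

    `url` is the request target, e.g. "/count.gif?p=/about"; the `p`
    parameter name is an assumption made for this sketch.
    """
    query = parse_qs(urlsplit(url).query)
    path = query.get("p", [""])[0]
    if not path.startswith("/"):
        return 400, {"Content-Type": "text/plain"}, b"missing or invalid p\n"
    counts[path] += 1
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL_GIF)),
        # Ask browsers and proxies not to cache, so each view is counted.
        "Cache-Control": "no-store",
    }
    return 200, headers, PIXEL_GIF
```

A page would then embed something like an image tag whose source is "/count.gif?p=/about"; each time a browser fetches the image, the counter for that path goes up by one.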
The hosted version of GoatCounter is easy to set up; it took about five minutes to create an account and add the one line of Javascript to my web site. Analytics data started showing up within a few seconds. Even with the hosted version, the site owner fully owns the data, and can export the full dump or delete their account at any time.
The self-hosted version is also straightforward to set up using the Linux binaries or by building from source — it took me less than ten minutes to build from source and set it up locally with the default SQLite database configuration. In contrast to Plausible (discussed below), it was much lighter to install, didn't download anything, and started up almost instantly.
Plausible is another relatively new analytics tool that was launched in early 2019. Soon after launching, it switched to open source, with the code licensed under the permissive MIT license. The company's business model is to charge for the hosting, with pricing aimed at small businesses. In addition to making its source code available, Plausible is one of an increasing number of companies that have a publicly-visible roadmap for better transparency. It also posts informational content for potential customers on its blog.
Plausible is unique from a technology perspective, with its server code written in Elixir, which is a functional programming language that runs on the Erlang virtual machine. Its frontend UI uses a small amount of vanilla Javascript for the interactive parts, rather than a rendering framework like React. It also boasts one of the smallest analytics scripts, with plausible.js weighing in at 781 bytes (1.2KB uncompressed) at the time of this writing. GA's analytics.js, by comparison, is almost 18KB (46KB uncompressed), while GoatCounter's count.js is 2.3KB (6.3KB uncompressed). That size can make a meaningful difference since the scripts are loaded for each page on the site.
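Readers who want to reproduce this kind of comparison for their own scripts could measure both sizes along these lines. The sketch assumes gzip compression, which the article does not specify, and the function name is made up for the example; results will vary somewhat with the compressor and its settings.

```python
import gzip
from typing import Tuple


def script_sizes(source: bytes) -> Tuple[int, int]:
    """Return (compressed, uncompressed) sizes in bytes for a script.

    gzip at its highest level is an assumption; a server might use a
    different level, or another algorithm such as Brotli.
    """
    compressed = gzip.compress(source, compresslevel=9)
    return len(compressed), len(source)


# Stand-in input for the example; in practice, read the real script file.
sample = b"function count(path) { send('/count?p=' + path); }\n" * 20
compressed_size, raw_size = script_sizes(sample)
print(f"{compressed_size} bytes compressed, {raw_size} bytes uncompressed")
```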
In terms of user interface, Plausible is definitely more polished than GoatCounter. It is fairly minimalist, though, perhaps even more so than GoatCounter, providing total visitor counts, page-view counts per path, referrer information, map location, and devices (broken down by screen size, browser, and operating system). The tool also provides a "bounce rate" metric, though the exact definition is unclear.
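For reference, one common convention defines bounce rate as the share of sessions that viewed only a single page. The sketch below implements that convention; it is an assumption for illustration and not necessarily the definition Plausible uses.

```python
from typing import Iterable


def bounce_rate(page_views_per_session: Iterable[int]) -> float:
    """Fraction of sessions with exactly one page view (0.0 if no sessions)."""
    sessions = list(page_views_per_session)
    if not sessions:
        return 0.0
    bounces = sum(1 for views in sessions if views == 1)
    return bounces / len(sessions)
```

With four sessions that viewed 1, 3, 1, and 2 pages respectively, two of the four are bounces, so the function returns 0.5.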
Plausible's home page states that it provides "100% data ownership", and it is possible to export the CSV data for a single chart (as well as delete a Plausible.io account). However, the data dump is significantly less useful than GoatCounter's full data dump, which includes detailed information for every event.
Self-hosting Plausible is possible (even plausible), though as founder Uku Taht points out in the announcement of switching to open source:
It's worth noting that for now, there's no explicit support for self-hosting Plausible. The project is still evolving quickly and maintaining a self-hosted solution would slow product development down considerably. I would love to offer a self-hosted solution in the future once the product and the business are more stable.
That said, just a few weeks ago, Plausible added a document that describes an experimental way to self-host the system using Docker. Following those recommendations, I tried to use docker-compose to get it running locally. It was a little disconcerting how many Docker and npm packages it downloaded during the minutes-long installation process, and even when it was done, there was a hard-to-comprehend error with a PostgreSQL migration which prevented it from starting — the "experimental" label definitely fits.
There are also a couple of lightweight proprietary tools with a focus on privacy worth mentioning. Obviously, these don't have the advantages of open development or self-hosting, but still provide a low-cost way out of Google's data-collection net.
One is the minimalist Simple Analytics product, which is a cloud-based tool created by solo developer Adriaan van Rossum; it has a clean-looking interface with only the few key metrics, similar to Plausible. Another is Fathom, which was open source initially, but the current version is proprietary (although the company hopes to start maintaining the open-source code base again in the future).
The last few years have seen a number of good alternatives to Google Analytics, particularly for those who only need a few basic features. Many of the recent alternatives are both open source and privacy-conscious, which means there are fewer reasons for projects and businesses to continue using proprietary analytics systems.
For site owners who just need basic traffic numbers, GoatCounter and Plausible both seem like excellent options. Those who like more visual polish and documentation might prefer Plausible; those who value a more developer-friendly tool with easy self-hosting will probably prefer GoatCounter. We will soon be publishing a second article that looks at some heavier-weight GA alternatives, as well as tools that provide analytics from web-server logs.