
Lightweight alternatives to Google Analytics

Welcome to LWN.net

June 17, 2020

This article was contributed by Ben Hoyt

More and more web-site owners are concerned about the "all-seeing Google" tracking users as they browse around the web. Google Analytics (GA) is a full-featured web-analytics system that is available for free and, despite the privacy concerns, has become the de facto analytics tool for small and large web sites alike. However, in recent years, a growing number of alternatives are helping break Google's dominance. In this article we'll look at two of the lightweight open-source options, namely GoatCounter and Plausible. In a subsequent article, we'll look at a few of the larger tools.

GA is by far the biggest player here: BuiltWith shows that around 86% of the top 100,000 web sites use it. This figure goes down to 64% for the top one-million web sites. These figures have grown steadily for the past 15 years, since Google acquired Urchin and rebranded it as Google Analytics. In addition to privacy concerns, GA is more complex and feature-heavy than some web-site owners need; many of them just want to see how much traffic is going to the pages on their site, and where that traffic is coming from. So it's not surprising that a number of simpler, more open tools have taken off in the past few years.

It should be noted that LWN does use GA, though we are evaluating other choices. Those who turn off ads in their preferences will not be served with the GA code, however.

What Google tracks, and why it's concerning

If asked what information Google tracks, a cynic might say, "everything". Part of the problem is that this isn't too far from the truth: Google tracks and stores a huge amount of information about users.

A 2018 paper [PDF] by Douglas Schmidt highlights the extent of Google's tracking, with location tracking on Android devices as one example:

Both Android and Chrome send data to Google even in the absence of any user interaction. Our experiments show that a dormant, stationary Android phone (with Chrome active in the background) communicated location information to Google 340 times during a 24-hour period, or at an average of 14 data communications per hour.

The paper distinguishes between "active" and "passive" tracking. Active tracking is when the user directly uses or logs into a Google service, such as performing a search, logging into Gmail, and so on. In addition to recording all of a user's search keywords, Google passively tracks users as they visit web sites that use GA and other Google publisher tools. Schmidt found that in an example "day in the life" scenario, "Google collected or inferred over two-thirds of the information through passive means".

Schmidt's paper details how GA cookie tracking works, noting the difference between "1st-party" and "3rd-party" cookies; the latter track users and their ad clicks across multiple sites:

While a GA cookie is specific to the particular domain of the website that user visits (called a "1st-party cookie"), a DoubleClick cookie is typically associated with a common 3rd-party domain (such as doubleclick.net). Google uses such cookies to track user interaction across multiple 3rd-party websites.

When a user interacts with an advertisement on a website, DoubleClick's conversion tracking tools (e.g. Floodlight) places cookies on a user’s computer and generates a unique client ID. Thereafter, if the user visits the advertised website, the stored cookie information gets accessed by the DoubleClick server, thereby recording the visit as a valid conversion.

Because such a large percentage of web sites use Google advertising products as well as GA, this has the effect that the company knows a large fraction of users' browsing history across many web sites, both popular sites and smaller "mom and pop" sites. In short, Google knows a lot about what you like, where you are, and what you buy.
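To make the first-party versus third-party distinction concrete, here is a toy model in Python. It is only a sketch of the general mechanism Schmidt describes, not of Google's systems; the `Browser` class, the `visit` function, and the `.example` domains are all hypothetical names invented for illustration.

```python
import uuid
from collections import defaultdict


class Browser:
    """Toy cookie jar: one random client ID per cookie domain."""

    def __init__(self):
        self.cookie_jar = {}  # cookie domain -> client ID

    def client_id_for(self, cookie_domain):
        # Reuse the stored ID for this domain, or mint a new one on first contact.
        return self.cookie_jar.setdefault(cookie_domain, uuid.uuid4().hex)


def visit(browser, site, tracker_domain, first_party_log, third_party_log):
    """Record one page view on `site`, which embeds content from `tracker_domain`."""
    # First-party cookie: scoped to the visited site, so the ID differs per site.
    first_party_log[site].add(browser.client_id_for(site))
    # Third-party cookie: scoped to the tracker's domain, so the same ID
    # is presented on every site that embeds the tracker.
    third_party_log[browser.client_id_for(tracker_domain)].append(site)


browser = Browser()
first_party = defaultdict(set)
third_party = defaultdict(list)
visit(browser, "news.example", "ads.example", first_party, third_party)
visit(browser, "shop.example", "ads.example", first_party, third_party)
```

After the two visits, each site has seen a different first-party ID, while the third-party log holds a single ID with both sites listed against it; that single ID is what allows browsing history to be joined across sites.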

Google does provide ways to turn off features like targeted advertising and location tracking, as well as to delete the personalized profile associated with an account. However, these privacy controls are almost entirely opt-in (tracking stays on unless the user acts), and most users either don't know about them or never bother to use them.

Of course, just switching away from GA won't eliminate all of these privacy issues (for example, it will do nothing to stop Android location tracking or search tracking), but it's one way to reduce the huge amount of data Google collects. In addition, for site owners that use a GA alternative, Google does not get a behind-the-scenes look at the site's traffic patterns — data which it could conceivably use in the future to build a competing tool.

LWN readers likely skew toward the privacy-conscious end of the spectrum: using Firefox instead of Google Chrome, turning on ad blockers, and so on. However, the users of the web sites they build may not be so privacy-conscious. For web-site developers, the analytics tools they choose can help respect their users' privacy and avoid Google knowing quite so much about their users' browsing patterns.

GoatCounter

GoatCounter is one of the more recent web-analytics tools, launched in August 2019. Created by Martin Tournoij, it has more of a "made by a single developer" feel than other tools; it's a little less slick-looking than some, but it is also developer-friendly and simple to set up.

The tool supports all of the basic analytics: page views and visits by URL, browser and operating system statistics, device screen sizes, locations, and referrer information. By default GoatCounter shows the last seven days with counts broken down by hour, but site owners can adjust the date span with simple controls.
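As a rough illustration of what an hourly breakdown over a seven-day window involves, the following Python sketch buckets page-view timestamps by hour. It is not GoatCounter's code (which is written in Go); the function name `hourly_counts` and its parameters are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hourly_counts(timestamps, now, days=7):
    """Count page views per hour for the `days` leading up to `now`."""
    start = now - timedelta(days=days)
    counts = Counter()
    for ts in timestamps:
        if start <= ts <= now:
            # Truncate to the start of the hour to get the bucket key.
            counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


now = datetime(2020, 6, 17, 12, 0, tzinfo=timezone.utc)
hits = [
    now - timedelta(minutes=5),
    now - timedelta(minutes=20),
    now - timedelta(hours=3),
    now - timedelta(days=8),  # outside the seven-day window, so ignored
]
per_hour = hourly_counts(hits, now)
```

Adjusting the date span, as the GoatCounter controls allow, corresponds here to passing different `now` and `days` values.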

GoatCounter has an unusual pricing model, with its source code licensed under the copyleft European Union Public License (EUPL). Companies can host the software themselves, or use GoatCounter's hosted version for a small fee (though the hosted version doesn't cost anything for "personal" projects). Tournoij has a lengthy article discussing why he chose the EUPL, noting:

I still don't really care what people do with my code, but I do care if my ability to make a living would be unreasonably impeded. Taking my MIT code and working full-time on enhancements that aren't sent back to me means my competitor has double the amount of people working on it: me (for free, from their perspective), and them. They will always have an advantage over me.

GoatCounter is written in Go, and uses vanilla JavaScript in its UI for some lightweight interactivity. JavaScript frameworks often get in the way of web accessibility, and GoatCounter's prioritization of accessibility (mentioned on its home page) struck a chord with "ctoth", who thanked Tournoij on Hacker News:

First time I've ever seen a comment about accessibility on the homepage of a mainstream product like this. As a blind developer this was just awesome, made me really feel like somebody out there is listening. Thank you for making this.

In addition to counting page views, GoatCounter tracks sessions using a hash of the browser's user agent and IP address to identify the client without storing any personal information. The salt used to generate these hashes is rotated every 4 hours with a sliding window. Tournoij has a detailed write-up about the technical aspects of session tracking, including a comparison with other solutions that have similar aims.
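The general idea of salted-hash session tracking can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not GoatCounter's actual implementation: the choice of SHA-256, the `SaltRotator` class, and the "current plus previous salt" reading of the sliding window are all assumptions made for the example.

```python
import hashlib
import secrets
import time

SALT_LIFETIME_SECONDS = 4 * 60 * 60  # four-hour rotation, as described above


class SaltRotator:
    """Keeps the current and previous salt so a session survives one rotation."""

    def __init__(self, now=None):
        self._created = time.time() if now is None else now
        self.current = secrets.token_bytes(16)
        self.previous = None

    def rotate_if_due(self, now=None):
        now = time.time() if now is None else now
        if now - self._created >= SALT_LIFETIME_SECONDS:
            self.previous = self.current
            self.current = secrets.token_bytes(16)
            self._created = now


def session_hash(salt, user_agent, ip_address):
    """Hash salt, user agent, and IP together; only the digest is kept."""
    h = hashlib.sha256()
    for part in (salt, user_agent.encode(), ip_address.encode()):
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguous joins
        h.update(part)
    return h.hexdigest()


def find_session(rotator, known_sessions, user_agent, ip_address):
    """Return a known session hash (current or previous salt), else register a new one."""
    candidates = [session_hash(rotator.current, user_agent, ip_address)]
    if rotator.previous is not None:
        candidates.append(session_hash(rotator.previous, user_agent, ip_address))
    for candidate in candidates:
        if candidate in known_sessions:
            return candidate
    known_sessions.add(candidates[0])
    return candidates[0]
```

Because the salts are random and discarded after rotation, a stored hash cannot later be recomputed from a known IP address and user agent, which is what keeps the scheme from retaining personal information.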

For web-site owners who prefer to avoid JavaScript or who want analytics from users with JavaScript disabled, GoatCounter supports a non-JavaScript tracking scheme. It uses a 1x1 transparent GIF image in an "<img>" tag on the pages to be counted, though this approach will not record the referrer or screen size.
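The tracking-pixel technique itself is simple enough to sketch with Python's standard library. The following is a generic illustration rather than GoatCounter's endpoint: the `p` query parameter, the `record_hit` helper, and the port number are hypothetical, and the image bytes are a widely circulated 1x1 transparent GIF.

```python
import base64
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# A commonly used 1x1 transparent GIF, base64-encoded.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

page_views = Counter()


def record_hit(request_target, counts):
    """Count one view for the page named in the (hypothetical) 'p' query parameter."""
    query = parse_qs(urlparse(request_target).query)
    page = query.get("p", ["(unknown)"])[0]
    counts[page] += 1
    return page


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        record_hit(self.path, page_views)
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.send_header("Cache-Control", "no-store")  # ask browsers not to cache the pixel
        self.end_headers()
        self.wfile.write(PIXEL)


def serve(port=8000):
    """Run the pixel server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PixelHandler).serve_forever()
```

With `serve()` running, a page would embed an image whose source is something like `http://127.0.0.1:8000/count.gif?p=/about`. Since the request is triggered by the browser loading an image rather than by a script, there is no opportunity to read values such as the screen size, which fits the limitation noted above.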

The hosted version of GoatCounter is easy to set up: it took about five minutes to create an account and add the one line of JavaScript to my web site. Analytics data started showing up within a few seconds. Even with the hosted version, the site owner fully owns the data, and can export the full dump or delete their account at any time.

The self-hosted version is also straightforward to set up using the Linux binaries or by building from source — it took me less than ten minutes to build from source and set it up locally with the default SQLite database configuration. In contrast to Plausible (discussed below), it was much lighter to install, didn't download anything, and started up almost instantly.

Plausible

Plausible is another relatively new analytics tool that was launched in early 2019. Soon after launching, it switched to open source, with the code licensed under the permissive MIT license. The company's business model is to charge for the hosting, with pricing aimed at small businesses. In addition to making its source code available, Plausible is one of an increasing number of companies that have a publicly-visible roadmap for better transparency. It also posts informational content for potential customers on its blog.

Plausible is unique from a technology perspective, with its server code written in Elixir, which is a functional programming language that runs on the Erlang virtual machine. Its frontend UI uses a small amount of vanilla JavaScript for the interactive parts, rather than a rendering framework like React. It also boasts one of the smallest analytics scripts, with plausible.js weighing in at 781 bytes (1.2KB uncompressed) at the time of this writing. GA's analytics.js, by comparison, is almost 18KB (46KB uncompressed), while GoatCounter's count.js is 2.3KB (6.3KB uncompressed). That size can make a meaningful difference since the scripts are loaded for each page on the site.
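Readers who want to make this kind of size comparison for their own scripts can do so with a few lines of Python. The sketch below assumes gzip compression, which may not be the method behind the figures quoted above; `script_sizes` is a name made up for this example.

```python
import gzip


def script_sizes(source: bytes):
    """Return (uncompressed, gzip-compressed) sizes in bytes."""
    return len(source), len(gzip.compress(source, compresslevel=9))


# Repetitive text, like much JavaScript, compresses well.
sample = b"function count(path) { return fetch('/count?p=' + path); }\n" * 50
raw_size, compressed_size = script_sizes(sample)
```

For a real script, read the file in binary mode and pass its contents in, for example `script_sizes(open("count.js", "rb").read())`, where the path is whatever local copy is being measured.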

In terms of user interface, Plausible is definitely more polished than GoatCounter. It is fairly minimalist, though, perhaps even more so than GoatCounter, providing total visitor counts, page-view counts per path, referrer information, map location, and devices (broken down by screen size, browser, and operating system). The tool also provides a "bounce rate" metric, though the exact definition is unclear.
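For reference, one common definition of bounce rate is the share of visits that consist of a single page view. The Python sketch below implements that definition only; it is an assumption for illustration and may well differ from whatever Plausible computes.

```python
def bounce_rate(page_views_per_visit):
    """Fraction of visits with exactly one page view (one common definition)."""
    visits = list(page_views_per_visit)
    if not visits:
        return 0.0
    bounces = sum(1 for views in visits if views == 1)
    return bounces / len(visits)


# Four visits, two of which viewed only a single page.
rate = bounce_rate([1, 3, 1, 2])
```

Other definitions exist, for example ones that consider time spent on the page, which is part of why an unexplained bounce-rate figure is hard to interpret.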

Plausible's home page states that it provides "100% data ownership", and it is possible to export the CSV data for a single chart (as well as delete a Plausible.io account). However, the data dump is significantly less useful than GoatCounter's full data dump, which includes detailed information for every event.

Self-hosting Plausible is possible (even plausible), though as founder Uku Taht points out in the announcement of switching to open source:

It's worth noting that for now, there's no explicit support for self-hosting Plausible. The project is still evolving quickly and maintaining a self-hosted solution would slow product development down considerably. I would love to offer a self-hosted solution in the future once the product and the business are more stable.

That said, just a few weeks ago, Plausible added a document that describes an experimental way to self-host the system using Docker. Following those recommendations, I tried to use docker-compose to get it running locally. It was a little disconcerting how many Docker and npm packages it downloaded during the minutes-long installation process, and even when it was done, there was a hard-to-comprehend error with a PostgreSQL migration which prevented it from starting — the "experimental" label definitely fits.

Proprietary options, briefly

There are also a couple of lightweight proprietary tools with a focus on privacy worth mentioning. Obviously, these don't have the advantages of open development or self-hosting, but still provide a low-cost way out of Google's data-collection net.

One is the minimalist Simple Analytics product, which is a cloud-based tool created by solo developer Adriaan van Rossum; it has a clean-looking interface with only the few key metrics, similar to Plausible. Another is Fathom, which was open source initially, but the current version is proprietary (although the company hopes to start maintaining the open-source code base again in the future).

Summary

The last few years have seen a number of good alternatives to Google Analytics, particularly for those who only need a few basic features. Many of the recent alternatives are both open source and privacy-conscious, which means there are fewer reasons for projects and businesses to continue using proprietary analytics systems.

For site owners who just need basic traffic numbers, GoatCounter and Plausible both seem like excellent options. Those who like more visual polish and documentation might prefer Plausible; those who value a more developer-friendly tool with easy self-hosting will probably prefer GoatCounter. We will soon be publishing a second article that looks at some heavier-weight GA alternatives, as well as tools that provide analytics from web-server logs.
