热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

Howftracewasabletobricke1000enetworkcards(2008)

ByOctober21,2008atthee1000ehardwarecorruptionbug,thesourceoftheproblemwas,atbest,unclear.Problemswithinthedriveritselfseemedlikealikelyculprit,butitdidnottakelongforthosechasingthisproblemtorealizethattheyneededto
Did you know...?

LWN.net is a subscriber-supported publication; we rely on subscribers to keep the entire operation going. Please help out by buying a subscription and keeping LWN on the net.

By Jonathan Corbet

October 21, 2008

When LWN last looked

at the e1000e hardware corruption bug, the source of the problem was, at best, unclear. Problems within the driver itself seemed like a likely culprit, but it did not take long for those chasing this problem to realize that they needed to look further afield. For a while, the X server came under scrutiny, as did a number of other system components. When the real problem was found, though, it turned out to be a surprise for everybody involved.

Tracking down intermittent problems is hard. When those problems result in the destruction of hardware, finding them is even harder. Even the most dedicated testers tend to balk when faced with the prospect of shipping their systems back to the manufacturer for repairs. So the task of finding this issue fell to Intel; engineers there locked themselves into a lab with a box full of e1000e adapters and set about bisecting the kernel history to identify the patch which caused the problem. Some time (and numerous fried adapters) later, the bisection process turned up an unlikely suspect: the ftrace tracing framework.

Developers working on tracing generally put a lot of effort into minimizing the impact of their code on system performance. Every last bit of runtime overhead is scrutinized and eliminated if at all possible. As a general rule, bricking the hardware is a level of overhead which goes well beyond the acceptable parameters. So the ftrace developers, once informed of the bisection result, put in some significant work of their own to figure out what was going on.

One of the features offered by ftrace is a simple function-call tracing operation; ftrace will output a line with the called function (and its caller) every time a function call is made. This tracing is accomplished by using the venerable profiling mechanism built into gcc (and most other Unix-based compilers). When code is compiled with the -pg option, the compiler will place a call to mcount() at the beginning of every function. The version of mcount() provided by ftrace then logs the relevant information on every call.

As noted above, though, tracing developers are concerned about overhead. On most systems, it is almost certain that, at any given time, nobody will be doing function call tracing. Having all those mcount() calls happening anyway would be a measurable drag on the system. So the ftrace hackers looked for a way to eliminate that overhead when it is not needed. A naive solution to this problem might look something like the following. Rather than put in an unconditional call to mcount() , get gcc to add code like this:

if (function_tracing_active)
        mcount();

But the kernel makes a lot of function calls, so even this version will have a noticeable overhead; it will also bloat the size of the kernel with all those tests. So the favored approach tends to be different: run-time patching. When function tracing is not being used, the kernel overwrites all of the mcount() calls with no-op instructions. As it happens, doing nothing is a highly optimized operation in contemporary processors, so the overhead of a few no-ops is nearly zero. Should somebody decide to turn function tracing on, the kernel can go through and patch all of those mcount() calls back in.

Run-time patching can solve the performance problem, but it introduces a new problem of its own. Changing the code underneath a running kernel is a dangerous thing to do; extreme caution is required. Care must be taken to ensure that the kernel is not running in the affected code at the time, processor caches must be invalidated, and so on. To be safe, it is necessary to get all other processors on the system to stop and wait while the patching is taking place. The end result is that patching the code is an expensive thing to do.

The way ftrace was coded was to patch out every mcount() call point as it was discovered through an actual call to mcount() . But, as noted above, run-time patching is very expensive, especially if it is done a single function at a time. So ftrace would make a list of mcount() call sites, then fix up a bunch of them later on. In that way, the cost of patching out the calls was significantly reduced.

The problem now is that things might have changed between the time when an mcount() call is noticed and when the kernel gets around to patching out the call. It would be very unfortunate if the kernel were to patch out an mcount() call which no longer existed in the expected place. To be absolutely sure that unrelated data was not being corrupted, the ftrace code used the cmpxchg operation to patch in the no-ops. cmpxchg atomically tests the contents of the target memory against the caller's idea of what is supposed to be there; if the two do not match, the target location will be left with its old value at the end of the operation. So the no-ops will only be written to memory if the current contents of that memory are a call to mcount() .

This all seems pretty safe, except that it fell down in one obscure, but important case. One obvious place where an mcount() call could go away is in loadable modules. This can happen if the module is unloaded, of course, but there is another important case too: any code marked as initialization code will be removed once initialization is complete. So a module's initialization function (and any other code marked __init ) could leave a dangling reference in the " mcount() calls to be patched out" list maintained by ftrace.

The final piece of this puzzle comes from this little fact: on 32-bit architectures, memory returned from vmalloc() and ioremap() share the same address space. Both functions create mappings to memory from the same range of addresses. Space for loadable modules is allocated with vmalloc() , so all module code is found within this shared address space. Meanwhile, the e1000e driver uses ioremap() to map the adapter's I/O memory and NVRAM into the kernel's address space. The end result is this fatal sequence of events:

  1. A module is loaded into the system. As part of the module's initialization, a number of mcount() calls are made; these call sites are noted for later patching.
  2. Module initialization completes, and the module's __init functions are removed from memory. The address space they occupied is freed up for future use.
  3. The e1000e driver maps its I/O memory and NVRAM into the address range recently occupied by the above-mentioned initialization code.
  4. Ftrace gets around to patching out the accumulated list of mcount() calls. But some of those "calls" are now, actually, I/O memory belonging to the e1000e device.

Remember that the ftrace code was very careful in its patching, using cmpxchg to avoid overwriting anything which is not an mcount() call. But, as Steven Rostedt noted in his summary of the problem :

The cmpxchg could have saved us in most cases (via luck) - but with ioremap-ed memory that was exactly the wrong thing to do - the results of cmpxchg on device memory are undefined. (and will likely result in a write)

The end result is a write to the wrong bit of I/O memory - and a destroyed device.

In hindsight, this bug is reasonably clear and understandable, but it's not at all surprising that it took a long time to find. One should note that there were, in fact, two different bugs here. One of them is ftrace's attempt to write to a stale pointer. But the other one was just as important: the e1000e driver should never have left its hardware configured in a mode where a single stray write could turn it into a brick. One never knows where things might go wrong; hardware should never be left in such a vulnerable state if it can be helped.

The good news is that both bugs have been fixed. The e1000e hardware was locked down before 2.6.27 was released, and the 2.6.27.1 update disables the dynamic ftrace feature. The ftrace code has been significantly rewritten for 2.6.28; it no longer records mcount() call sites on the fly, no longer uses cmpxchg , and, one hopes, is generally incapable of creating such mayhem again.

(

to post comments)


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 我们


推荐阅读
  • Python瓦片图下载、合并、绘图、标记的代码示例
    本文提供了Python瓦片图下载、合并、绘图、标记的代码示例,包括下载代码、多线程下载、图像处理等功能。通过参考geoserver,使用PIL、cv2、numpy、gdal、osr等库实现了瓦片图的下载、合并、绘图和标记功能。代码示例详细介绍了各个功能的实现方法,供读者参考使用。 ... [详细]
  • 本文介绍了Python爬虫技术基础篇面向对象高级编程(中)中的多重继承概念。通过继承,子类可以扩展父类的功能。文章以动物类层次的设计为例,讨论了按照不同分类方式设计类层次的复杂性和多重继承的优势。最后给出了哺乳动物和鸟类的设计示例,以及能跑、能飞、宠物类和非宠物类的增加对类数量的影响。 ... [详细]
  • Java太阳系小游戏分析和源码详解
    本文介绍了一个基于Java的太阳系小游戏的分析和源码详解。通过对面向对象的知识的学习和实践,作者实现了太阳系各行星绕太阳转的效果。文章详细介绍了游戏的设计思路和源码结构,包括工具类、常量、图片加载、面板等。通过这个小游戏的制作,读者可以巩固和应用所学的知识,如类的继承、方法的重载与重写、多态和封装等。 ... [详细]
  • 本文介绍了设计师伊振华受邀参与沈阳市智慧城市运行管理中心项目的整体设计,并以数字赋能和创新驱动高质量发展的理念,建设了集成、智慧、高效的一体化城市综合管理平台,促进了城市的数字化转型。该中心被称为当代城市的智能心脏,为沈阳市的智慧城市建设做出了重要贡献。 ... [详细]
  • Linux重启网络命令实例及关机和重启示例教程
    本文介绍了Linux系统中重启网络命令的实例,以及使用不同方式关机和重启系统的示例教程。包括使用图形界面和控制台访问系统的方法,以及使用shutdown命令进行系统关机和重启的句法和用法。 ... [详细]
  • 本文介绍了一个Java猜拳小游戏的代码,通过使用Scanner类获取用户输入的拳的数字,并随机生成计算机的拳,然后判断胜负。该游戏可以选择剪刀、石头、布三种拳,通过比较两者的拳来决定胜负。 ... [详细]
  • Commit1ced2a7433ea8937a1b260ea65d708f32ca7c95eintroduceda+Clonetraitboundtom ... [详细]
  • Java容器中的compareto方法排序原理解析
    本文从源码解析Java容器中的compareto方法的排序原理,讲解了在使用数组存储数据时的限制以及存储效率的问题。同时提到了Redis的五大数据结构和list、set等知识点,回忆了作者大学时代的Java学习经历。文章以作者做的思维导图作为目录,展示了整个讲解过程。 ... [详细]
  • 本文主要解析了Open judge C16H问题中涉及到的Magical Balls的快速幂和逆元算法,并给出了问题的解析和解决方法。详细介绍了问题的背景和规则,并给出了相应的算法解析和实现步骤。通过本文的解析,读者可以更好地理解和解决Open judge C16H问题中的Magical Balls部分。 ... [详细]
  • 本文讨论了一个关于cuowu类的问题,作者在使用cuowu类时遇到了错误提示和使用AdjustmentListener的问题。文章提供了16个解决方案,并给出了两个可能导致错误的原因。 ... [详细]
  • 本文介绍了P1651题目的描述和要求,以及计算能搭建的塔的最大高度的方法。通过动态规划和状压技术,将问题转化为求解差值的问题,并定义了相应的状态。最终得出了计算最大高度的解法。 ... [详细]
  • 不同优化算法的比较分析及实验验证
    本文介绍了神经网络优化中常用的优化方法,包括学习率调整和梯度估计修正,并通过实验验证了不同优化算法的效果。实验结果表明,Adam算法在综合考虑学习率调整和梯度估计修正方面表现较好。该研究对于优化神经网络的训练过程具有指导意义。 ... [详细]
  • 关键词:Golang, Cookie, 跟踪位置, net/http/cookiejar, package main, golang.org/x/net/publicsuffix, io/ioutil, log, net/http, net/http/cookiejar ... [详细]
  • 本文介绍了Windows操作系统的版本及其特点,包括Windows 7系统的6个版本:Starter、Home Basic、Home Premium、Professional、Enterprise、Ultimate。Windows操作系统是微软公司研发的一套操作系统,具有人机操作性优异、支持的应用软件较多、对硬件支持良好等优点。Windows 7 Starter是功能最少的版本,缺乏Aero特效功能,没有64位支持,最初设计不能同时运行三个以上应用程序。 ... [详细]
  • 先看官方文档TheJavaTutorialshavebeenwrittenforJDK8.Examplesandpracticesdescribedinthispagedontta ... [详细]
author-avatar
浪子得意_357
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有