Go Performance Tuning (go-torch, go tool pprof)

Author: 至上励合_安儿_466 | Source: Internet | 2023-05-19 07:41

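Go ships with a solid set of built-in tools and techniques for performance tuning and monitoring, which makes profile analysis much more efficient. This article also introduces and recommends go-torch, an open-source tool from Uber whose flame graphs make performance tuning easier and more intuitive. I first came across go-torch during a real tuning job, and it worked very well.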

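Introduction to go tool pprof

Go's built-in CPU, memory, and block profilers

One of Go's strengths is that profiling support is built into the runtime and standard toolchain, so samples can be collected while the program is running. The Go profilers can collect the following kinds of samples:

cpu profiles
mem profiles
block profiles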

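Common profiling scenarios in Go

Benchmark tests: for example, run go test . -bench . -cpuprofile prof.cpu to generate a sample file, then analyze it with go tool pprof [binary] prof.cpu.
import _ "net/http/pprof": if the application is a web service, add import _ "net/http/pprof" to the file that starts the HTTP server (e.g. main.go). The import registers the profiling handlers under /debug/pprof/ on http.DefaultServeMux, so samples can be pulled straight from the running service.
Calling the standard library directly: pprof.StartCPUProfile, pprof.WriteHeapProfile, and the related functions in the runtime/pprof package collect samples from within the code.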

For more on profiling in Go, see https://blog.golang.org/profiling-go-programs.

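Using go tool pprof

go tool pprof has many options, which are not covered in detail here; run go tool pprof -h to see them. The command I mainly use is:
go tool pprof --seconds 25 http://localhost:9090/debug/pprof/profile
This sets a 25-second sampling window. When the 25 seconds are up, the profile has been collected and pprof drops into its interactive prompt. Typing web there opens the call graph for the whole call chain in a browser (the web command needs Graphviz to be installed).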

root@garnett:~/# go tool pprof -h
usage: pprof [options] [binary]  ...
Output format (only set one):
  -callgrind Outputs a graph in callgrind format
  -disasm=p Output annotated assembly for functions matching regexp or address
  -dot Outputs a graph in DOT format
  -eog Visualize graph through eog
  -evince Visualize graph through evince
  -gif Outputs a graph image in GIF format
  -gv Visualize graph through gv
  -list=p Output annotated source for functions matching regexp
  -pdf Outputs a graph in PDF format
  -peek=p Output callers/callees of functions matching regexp
  -png Outputs a graph image in PNG format
  -proto Outputs the profile in compressed protobuf format
  -ps Outputs a graph in PS format
  -raw Outputs a text representation of the raw profile
  -svg Outputs a graph in SVG format
  -tags Outputs all tags in the profile
  -text Outputs top entries in text form
  -top Outputs top entries in text form
  -tree Outputs a text rendering of call graph
  -web Visualize graph through web browser
  -weblist=p Output annotated source in HTML for functions matching regexp or address
Output file parameters (for file-based output formats):
  -output=f Generate output on file f (stdout by default)
Output granularity (only set one):
  -functions Report at function level [default]
  -files Report at source file level
  -lines Report at source line level
  -addresses Report at address level
Comparison options:
  -base  Show delta from this profile
  -drop_negative Ignore negative differences
Sorting options:
  -cum Sort by cumulative data

Dynamic profile options:
  -seconds=N Length of time for dynamic profiles
Profile trimming options:
  -nodecount=N Max number of nodes to show
  -nodefraction=f Hide nodes below *total
  -edgefraction=f Hide edges below *total
Sample value selection option (by index):
  -sample_index Index of sample value to display
  -mean Average sample value over first value
Sample value selection option (for heap profiles):
  -inuse_space Display in-use memory size
  -inuse_objects Display in-use object counts
  -alloc_space Display allocated memory size
  -alloc_objects Display allocated object counts
Sample value selection option (for contention profiles):
  -total_delay Display total delay at each region
  -contentions Display number of delays at each region
  -mean_delay Display mean delay at each region
Filtering options:
  -runtime Show runtime call frames in memory profiles
  -focus=r Restricts to paths going through a node matching regexp
  -ignore=r Skips paths going through any nodes matching regexp
  -tagfocus=r Restrict to samples tagged with key:value matching regexp
                    Restrict to samples with numeric tags in range (eg "32kb:1mb")
  -tagignore=r Discard samples tagged with key:value matching regexp
                    Avoid samples with numeric tags in range (eg "1mb:")
Miscellaneous:
  -call_tree Generate a context-sensitive call tree
  -unit=u Convert all samples to unit u for display
  -divide_by=f Scale all samples by dividing them by f
  -buildid=id Override build id for main binary in profile
  -tools=path Search path for object-level tools
  -help This message
Environment Variables:
   PPROF_TMPDIR       Location for saved profiles (default $HOME/pprof)
   PPROF_TOOLS        Search path for object-level tools
   PPROF_BINARY_PATH  Search path for local binary files
                      default: $HOME/pprof/binaries
                      finds binaries by $name and $buildid/$name

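Introduction to go-torch

go-torch is an open-source flame graph generator for Go programs from Uber. It collects stack traces and assembles them into a flame graph that presents the result to developers at a glance. It builds on the flame graph tooling created by Brendan Gregg, which makes it easy to see how much CPU time each Go function consumes.

See the help output below for full usage. Here we mainly use the -u and -t options:
go-torch -u http://localhost:9090 -t 30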

root@garnett:~/# go-torch -h
Usage:
  go-torch [options] [binary] 

pprof Options:
  -u, --url= Base URL of your Go program (default: http://localhost:8080)
  -s, --suffix= URL path of pprof profile (default: /debug/pprof/profile)
  -b, --binaryinput= File path of previously saved binary profile. (binary profile is anything accepted by https://golang.org/cmd/pprof)
      --binaryname= File path of the binary that the binaryinput is for, used for pprof inputs
  -t, --seconds= Number of seconds to profile for (default: 30)
      --pprofArgs= Extra arguments for pprof

Output Options:
  -f, --file= Output file name (must be .svg) (default: torch.svg)
  -p, --print Print the generated svg to stdout instead of writing to file
  -r, --raw Print the raw call graph output to stdout instead of creating a flame graph; use with Brendan Gregg's flame graph perl script (see
                     https://github.com/brendangregg/FlameGraph)
      --title= Graph title to display in the output file (default: Flame Graph)
      --width= Generated graph width (default: 1200)
      --hash Colors are keyed by function name hash
      --colors= set color palette. choices are: hot (default), mem, io, wakeup, chain, java, js, perl, red, green, blue, aqua, yellow, purple, orange
      --cp Use consistent palette (palette.map)
      --reverse Generate stack-reversed flame graph
      --inverted icicle graph

Help Options:
  -h, --help Show this help message

Environment setup

Installing the FlameGraph scripts

git clone https://github.com/brendangregg/FlameGraph.git

cp FlameGraph/flamegraph.pl /usr/local/bin

Run flamegraph.pl -h in a terminal to check that FlameGraph was installed successfully:

$ flamegraph.pl -h
Option h is ambiguous (hash, height, help)
USAGE: /usr/local/bin/flamegraph.pl [options] infile > outfile.svg

    --title # change title text
    --width # width of image (default 1200)
    --height # height of each frame (default 16)
    --minwidth # omit smaller functions (default 0.1 pixels)
    --fonttype # font type (default "Verdana")
    --fontsize # font size (default 12)
    --countname # count type label (default "samples")
    --nametype # name type label (default "Function:")
    --colors # set color palette. choices are: hot (default), mem, io,
                  # wakeup, chain, java, js, perl, red, green, blue, aqua,
                  # yellow, purple, orange
    --hash # colors are keyed by function name hash
    --cp # use consistent palette (palette.map)
    --reverse # generate stack-reversed flame graph
    --inverted # icicle graph
    --negate # switch differential hues (blue<->red)
    --help # this message

    eg,
 /usr/local/bin/flamegraph.pl --title="Flame Graph: malloc()" trace.txt > graph.svg

Installing go-torch

With flamegraph.pl in place, the next step is to install go-torch, which renders the profile output:

go get -v github.com/uber/go-torch

Demo

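Starting the program to be tuned

My example is a simple web demo. After starting it with go run main.go -printStats, the endpoint to be tuned is reachable in a browser at http://localhost:9090/demo. Each request to the endpoint prints access information like this: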

root@garnett:/# go run main.go -printStats
Starting Server on :9090
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 67.984µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 339.656µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 55.749µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 89.34µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 59.606µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 47.917µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 42.768µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 1.270416ms
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 34.518µs
IncCounter: handler.received.garnett.advance.no-os.no-browser = 1
RecordTimer: handler.latency.garnett.advance.no-os.no-browser = 281.014µs

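Starting the load test

Next, we load-test the endpoint to see how it performs under heavy concurrency.

We use go-wrk to generate the load. To install it, go to https://github.com/adjust/go-wrk, clone the repository, and run go build.

Run the following command to simulate a high-concurrency scenario of 10,000 requests over 35 seconds: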

go-wrk -d 35 -n 10000 http://localhost:9090/demo

Using go tool pprof

While the load test above is running, open another terminal window and run the following command to generate the profile:

go tool pprof --seconds 25 http://localhost:9090/debug/pprof/profile

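The command sets a 25-second sampling window. When the (pprof) prompt appears, type web to open the result in a browser, as shown below:

[Figure: call graph produced by the pprof web command]
Looking at this graph, you may well be lost. It is already this hard to read for a simple demo, let alone in real-world performance tuning!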

Using go-torch

While the load test is running, this time we use go-torch to generate the sampling report:

go-torch -u http://localhost:9090 -t 30

After 30 s, go-torch finishes sampling and prints the following:

Writing svg to torch.svg

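torch.svg is the flame graph file that go-torch generates automatically when sampling ends. Open it in a browser as well, as shown below:

[Figure: flame graph generated by go-torch]

This is the flame graph generated by go-torch. Much easier on the eyes, isn't it?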

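The y-axis of the flame graph shows the call stack: each box is a function, stacked on top of the function that called it. The x-axis does not show the passage of time; the width of a box is the proportion of samples in which that function was on the stack, so the wider the box, the more CPU time it accounts for.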

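With the flame graph, it is much clearer which function calls are taking the most time, so we can fix the code, sample again, and keep optimizing.

That's it. This article has a single goal: to get you more interested in performance tuning Go programs. From here, you can go tune the slow endpoints in your own Go projects.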
