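Go pprof: A Powerful Tool for Performance Profiling

pprof is Go's built-in profiling tool, exposed through the standard library packages runtime/pprof and net/http/pprof. It is the first tool to reach for when diagnosing performance problems in a Go program.

pprof Overview

What pprof Can Analyze

pprof profile types
├── CPU Profiling (cpu.pprof)
│   └── Samples on-CPU time (about 100 samples per second, i.e. one every 10ms)
│
├── Memory Profiling (heap.pprof / allocs.pprof)
│   ├── heap.pprof: sampled allocations, showing live objects by default
│   └── allocs.pprof: the same samples, showing all allocations by default (including freed ones)
│
├── Goroutine Profiling (goroutine.pprof)
│   └── Stack traces of all current goroutines
│
├── Mutex Profiling (mutex.pprof)
│   └── Samples time spent waiting on contended mutexes
│
├── Block Profiling (block.pprof)
│   └── Samples blocking on synchronization primitives (channels, sync.Mutex, etc.)
│
└── Thread Creation (threadcreate.pprof)
    └── Stack traces that led to the creation of new OS threads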
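
The profile types in the tree can also be listed from inside a running program. The following is a minimal sketch using `pprof.Profiles()` from runtime/pprof; the helper name `profileNames` is made up for illustration. The CPU profile does not appear in this list because it is controlled separately through `pprof.StartCPUProfile`.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// profileNames returns the names of all profiles currently registered
// with runtime/pprof (hypothetical helper, for illustration only).
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	return names
}

func main() {
	// Print each registered profile with its current record count.
	for _, p := range pprof.Profiles() {
		fmt.Printf("%-14s %d\n", p.Name(), p.Count())
	}
}
```

The names printed here are the same ones used in the /debug/pprof/<name> URLs.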

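Enabling pprof

go
// Method 1: import net/http/pprof (the most common approach).
// The blank import registers the pprof handlers under /debug/pprof/
// on http.DefaultServeMux as a side effect.
import (
    "log"
    "net/http"
    _ "net/http/pprof"
    "os"
    "runtime/pprof"
)

// Method 2: serve the handlers from a dedicated listener.
func startPprof() {
    go func() {
        // Listen on port 6060 (localhost only); a nil handler means http.DefaultServeMux.
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}

// Method 3: write a profile to a file for offline analysis.
func writeProfile() error {
    f, err := os.Create("cpu.pprof")
    if err != nil {
        return err
    }
    defer f.Close() // deferred first, so it runs after StopCPUProfile has flushed the data

    if err := pprof.StartCPUProfile(f); err != nil {
        return err
    }
    defer pprof.StopCPUProfile()

    // Run the code to be profiled.
    doWork()
    return nil
}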
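
Method 3 covers the CPU profile; the same offline approach works for the heap through `pprof.WriteHeapProfile`. The sketch below is an assumption about how one might wrap it: the helper name `heapProfile` and the output file name are illustrative, not part of the original article.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns a heap profile in the gzip-compressed pprof format
// (hypothetical helper, for illustration only).
func heapProfile() ([]byte, error) {
	runtime.GC() // run a collection first so the profile reflects recent allocations
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("heap.pprof", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote heap.pprof (%d bytes)\n", len(data))
}
```

The resulting file can be opened with go tool pprof heap.pprof, exactly like one downloaded over HTTP.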

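CPU Profiling

Collecting CPU Data

bash
# Option 1: fetch over HTTP (quote the URL so the shell does not interpret "?")
curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Option 2: let go tool pprof fetch the profile and open it directly
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Option 3: call pprof.StartCPUProfile in the program (see Method 3 above)

Analyzing CPU Data

bash
# Start an interactive session
go tool pprof cpu.pprof

# Common commands:
(pprof) top              # functions using the most CPU
(pprof) top25            # top 25 entries
(pprof) web              # render the call graph as SVG and open it in a browser (requires Graphviz)
(pprof) list funcName    # annotated source of functions matching the regexp
(pprof) disasm funcName  # annotated disassembly of matching functions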

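Reading the top Output

(pprof) top10
Showing nodes accounting for 8.53s, 85.3% of 10s total

# flat:  time spent in the function itself, excluding its callees
# flat%: flat as a percentage of the total
# sum%:  running total of flat% down to this row
# cum:   time spent in the function plus everything it calls
# cum%:  cum as a percentage of the total

# Illustrative output (first five rows):
      flat    flat%   sum%       cum      cum%     name
     2.12s   21.2%   21.2%     2.12s    21.2%     runtime.memmove
     1.81s   18.1%   39.3%     1.81s    18.1%     runtime.mallocgc
     1.54s   15.4%   54.7%     1.54s    15.4%     main.processData
     0.89s    8.9%   63.6%     2.31s    23.1%     main.encrypt
     0.71s    7.1%   70.7%     0.71s     7.1%     runtime.scanobject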
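
To make the sum% column concrete: it is simply the running total of flat time divided by the total sampled time. The following sketch recomputes it for the first three rows of the table; the `row` type and `sumPercents` function are invented for this illustration and are not part of pprof.

```go
package main

import "fmt"

// row holds the flat time of one function, in seconds.
type row struct {
	name string
	flat float64
}

// sumPercents returns the running sum% column for rows already sorted
// by flat time, given the total sampled time in seconds.
func sumPercents(rows []row, total float64) []float64 {
	out := make([]float64, len(rows))
	running := 0.0
	for i, r := range rows {
		running += r.flat
		out[i] = running / total * 100
	}
	return out
}

func main() {
	rows := []row{
		{"runtime.memmove", 2.12},
		{"runtime.mallocgc", 1.81},
		{"main.processData", 1.54},
	}
	const total = 10.0 // seconds, from "of 10s total"
	for i, s := range sumPercents(rows, total) {
		fmt.Printf("%-18s flat%%=%5.1f sum%%=%5.1f\n", rows[i].name, rows[i].flat/total*100, s)
	}
}
```

A row whose cum is much larger than its flat, such as main.encrypt, spends most of its time in the functions it calls.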

Inspecting a Function with list

(pprof) list processData
# Illustrative output; the left column is flat time per line, the right column is cum
Total: 10s
ROUTINE ======================== main.processData in /path/to/main.go
     1.54s      1.54s (flat, cum) 15.40% of Total
         .          .     45:func processData(data []byte) []byte {
         .          .     46:    result := make([]byte, len(data))
         .          .     47:    for i := range data {
     1.54s      1.54s     48:        result[i] = data[i] + 1
         .          .     49:    }
         .          .     50:    encrypt(result)  // call into encrypt
         .          .     51:    return result
         .          .     52:}

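Memory Profiling

Collecting Memory Data

bash
# allocs: all sampled allocations, including memory that has already been freed
# (with seconds=N, recent Go versions return the delta over that interval)
curl -o allocs.pprof "http://localhost:6060/debug/pprof/allocs?seconds=30"

# heap: the same data, but defaulting to live objects (more commonly used)
curl -o heap.pprof http://localhost:6060/debug/pprof/heap

# Analyze
go tool pprof heap.pprof

Key Heap Metrics

(pprof) top5
# A heap profile carries four sample types:
# alloc_space:   total bytes allocated so far (including freed memory)
# inuse_space:   bytes currently live
# alloc_objects: total objects allocated so far (including freed objects)
# inuse_objects: objects currently live
# Switch between them with -sample_index, e.g. go tool pprof -sample_index=alloc_space heap.pprof

Showing nodes accounting for 1024.56MB, 80% of 1280.70MB total

# flat: memory allocated directly by the function
# cum:  memory allocated by the function and its callees
# Illustrative output (first three rows; pprof attributes memory to the allocating function):
      flat    flat%    sum%        cum     cum%    name
  512.50MB   40.0%   40.0%   512.50MB   40.0%    main.makeBigSlice
  256.20MB   20.0%   60.0%   256.20MB   20.0%    main.AddUser
  128.10MB   10.0%   70.0%   128.10MB   10.0%    main.loadFromDB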

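Tracking Down Memory Leaks

go
// A typical leak: a map that only ever grows.
var cache = make(map[string]*User)

func AddUser(id string, user *User) {
    cache[id] = user // entries are added but never removed
}

// Possible fixes:
// 1. Clean up periodically.
func cleanup() {
    ticker := time.NewTicker(10 * time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        // Keep only the most recent 1000 entries.
        if len(cache) > 1000 {
            // Evict the 500 oldest entries. A plain map keeps no insertion
            // order, so this needs a timestamp per entry or a side queue.
        }
    }
}

// 2. Use sync.Map and rebuild it periodically.
// Note: sync.Map on its own still grows without bound.
var users sync.Map

func GetOrCreate(id string) *User {
    if user, ok := users.Load(id); ok {
        return user.(*User)
    }
    user := loadFromDB(id)
    users.Store(id, user)
    return user
}

// 3. Use an LRU cache (the generic API lives in the v2 module).
import lru "github.com/hashicorp/golang-lru/v2"

var lruCache, _ = lru.New[string, *User](1000) // evicts automatically beyond 1000 entries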
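
The periodic-cleanup idea needs some record of insertion order, which a plain map does not keep. As a rough sketch under that assumption, a bounded cache with first-in-first-out eviction can be built from a map plus a key queue using only the standard library. Note that this is FIFO eviction, which is simpler than a true LRU; the `boundedCache` type and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// User is a stand-in for the cached value type.
type User struct{ Name string }

// boundedCache keeps at most limit entries and evicts the oldest insertion first.
type boundedCache struct {
	mu    sync.Mutex
	limit int
	items map[string]*User
	order []string // keys in insertion order, oldest first
}

func newBoundedCache(limit int) *boundedCache {
	if limit < 1 {
		limit = 1
	}
	return &boundedCache{limit: limit, items: make(map[string]*User)}
}

// Add stores user under id, evicting the oldest entry when the cache is full.
func (c *boundedCache) Add(id string, user *User) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, exists := c.items[id]; !exists {
		if len(c.items) >= c.limit {
			oldest := c.order[0]
			c.order = c.order[1:]
			delete(c.items, oldest)
		}
		c.order = append(c.order, id)
	}
	c.items[id] = user
}

// Get returns the cached user, if present.
func (c *boundedCache) Get(id string) (*User, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	u, ok := c.items[id]
	return u, ok
}

// Len reports the number of cached entries.
func (c *boundedCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.items)
}

func main() {
	c := newBoundedCache(3)
	for _, id := range []string{"a", "b", "c", "d", "e"} {
		c.Add(id, &User{Name: id})
	}
	_, hasA := c.Get("a")
	fmt.Println("entries:", c.Len(), "oldest still cached:", hasA)
}
```

Because the size is capped, the heap profile for such a cache stops growing once the limit is reached, which is the behavior to look for after a fix.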

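Finding a Memory Leak with pprof

bash
# 1. Capture the heap at two points in time
curl -o heap_1.pprof http://localhost:6060/debug/pprof/heap
# ... let the program run for a while ...
curl -o heap_2.pprof http://localhost:6060/debug/pprof/heap

# 2. Diff the second snapshot against the first
go tool pprof -diff_base=heap_1.pprof heap_2.pprof

# 3. Functions whose in-use memory keeps growing between snapshots are the leak candidates
(pprof) top
(pprof) web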
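
A coarse in-process version of the same before/after comparison can be sketched with `runtime.ReadMemStats`. This does not replace the pprof diff, since it gives only a single number and no call stacks, but it is a quick way to confirm that live heap is growing. The helper names `liveHeap` and `heapGrowth` and the simulated leak are assumptions for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// retained keeps allocations reachable so they count as live heap.
var retained [][]byte

// liveHeap forces a garbage collection and returns the bytes of heap objects still allocated.
func liveHeap() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// heapGrowth runs work and reports how much the live heap changed, in bytes.
func heapGrowth(work func()) int64 {
	before := liveHeap()
	work()
	after := liveHeap()
	return int64(after) - int64(before)
}

func main() {
	grew := heapGrowth(func() {
		// Simulate a leak: the slice stays reachable through the package-level variable.
		retained = append(retained, make([]byte, 16<<20))
	})
	fmt.Printf("live heap grew by about %d MiB\n", grew>>20)
}
```

If the growth keeps accumulating across repeated runs of the same workload, the pprof diff then shows which call stacks are responsible.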

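Goroutine Profiling

Collecting Goroutine Data

bash
# Human-readable dump for reading directly
curl -o goroutine.txt "http://localhost:6060/debug/pprof/goroutine?debug=1"

# debug=1 prints text, with identical stacks grouped and counted
# debug=2 prints every goroutine separately, including its wait state
# debug=0 (the default) emits the binary pprof format

# Binary profile for go tool pprof
curl -o goroutine.pprof http://localhost:6060/debug/pprof/goroutine
go tool pprof goroutine.pprof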
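
The same dump is available without an HTTP server through `pprof.Lookup("goroutine")`. A minimal sketch, assuming a helper named `goroutineDump` (not from the original article); its debug argument has the same meaning as the ?debug= query parameter.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// goroutineDump returns the goroutine profile rendered with the given debug level.
func goroutineDump(debug int) (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, debug); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	block := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func() { <-block }() // parked goroutines that will show up in the dump
	}

	dump, err := goroutineDump(1)
	if err != nil {
		fmt.Fprintln(os.Stderr, "dump failed:", err)
		os.Exit(1)
	}
	fmt.Print(dump)
}
```

This is handy in command-line tools or in a signal handler, where exposing an HTTP endpoint is not an option.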

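Analyzing Goroutine Leaks

go
// Leak scenario: a goroutine blocked on a channel forever.
func worker(ch <-chan int) {
    for v := range ch {
        _ = v // process the value
    }
}

func main() {
    ch := make(chan int)
    go worker(ch) // if nobody ever sends on or closes ch, worker blocks forever
    // the worker goroutine leaks!
}

// A safer version: select with a timeout so the worker can always exit.
func workerWithTimeout(ch <-chan int) {
    for {
        select {
        case v, ok := <-ch:
            if !ok {
                return // channel closed
            }
            _ = v // process the value
        case <-time.After(5 * time.Second):
            return // no data for 5 seconds, exit
        }
    }
}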
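
A timeout makes the worker exit eventually, but the caller still cannot stop it on demand. A common alternative is to pass a `context.Context` and select on `ctx.Done()`. The sketch below is one possible shape, not the article's own fix; `worker`, `workerStopsOnCancel`, and the `done` channel are illustrative names.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// worker consumes values until ch is closed or ctx is cancelled.
// It closes done on exit so callers can confirm it did not leak.
func worker(ctx context.Context, ch <-chan int, done chan<- struct{}) {
	defer close(done)
	for {
		select {
		case v, ok := <-ch:
			if !ok {
				return // channel closed
			}
			_ = v // process the value
		case <-ctx.Done():
			return // caller asked us to stop
		}
	}
}

// workerStopsOnCancel starts a worker that never receives data, cancels it,
// and reports whether it exited within one second.
func workerStopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	ch := make(chan int) // nobody ever sends on this channel
	done := make(chan struct{})
	go worker(ctx, ch, done)

	cancel()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("worker exited after cancel:", workerStopsOnCancel())
}
```

With this pattern the owner of the goroutine decides its lifetime, so a goroutine profile taken after shutdown should no longer list the worker.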

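Example goroutine.txt Analysis

goroutine profile: total 5234
5234 @ something/something.block
#0  runtime.chanrecv(0xc000084000?, 0x0?, 0x20e5e9?)
    /usr/local/go/src/runtime/chan.go:574
#1  runtime.chanrecv1(0xc000084000?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/chan.go:736

--- All 5234 goroutines are blocked at the same place (a channel receive). This is a simplified excerpt; real output lists program-counter addresses after the @.

Investigating an Abnormal Goroutine Count

bash
# Track how the goroutine count changes
curl -o g1.txt "http://localhost:6060/debug/pprof/goroutine?debug=1"
# ... exercise the program ...
curl -o g2.txt "http://localhost:6060/debug/pprof/goroutine?debug=1"
# The first line of each file reports the total; compare the grouped stacks to see which one grows

# Visualize with the web command
(pprof) web

# Common causes of leaks:
# 1. A blocked channel send (no receiver on the other end)
# 2. An HTTP client response body that is never closed
# 3. Goroutines blocked waiting on an exhausted database connection pool
# 4. Deadlock (several goroutines waiting on each other)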

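Mutex and Block Profiling

Mutex Profiling (Lock Contention)

bash
# Mutex profiling is off by default. Enable it in the program first with
# runtime.SetMutexProfileFraction(n), then fetch the profile:
curl -o mutex.pprof http://localhost:6060/debug/pprof/mutex

# Analyze
go tool pprof mutex.pprof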
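
Both the mutex and the block profile stay empty until the program turns them on. A minimal sketch of doing so with `runtime.SetMutexProfileFraction` and `runtime.SetBlockProfileRate`; the wrapper names and the chosen rates (5 and 10000) are arbitrary examples, not recommendations from the article.

```go
package main

import (
	"fmt"
	"runtime"
)

// enableContentionProfiling turns on mutex and block profiling.
// mutexFraction: on average 1 in mutexFraction contention events is reported.
// blockRateNs: aim to sample one blocking event per blockRateNs nanoseconds spent blocked.
func enableContentionProfiling(mutexFraction, blockRateNs int) {
	runtime.SetMutexProfileFraction(mutexFraction)
	runtime.SetBlockProfileRate(blockRateNs)
}

// currentMutexFraction reads the mutex profiling rate without changing it
// (a negative argument only queries the current value).
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	enableContentionProfiling(5, 10_000)
	fmt.Println("mutex profile fraction:", currentMutexFraction())
}
```

Passing 0 to either function turns that profile off again, which is useful for enabling it only for the duration of an investigation.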

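Mutex Example

go
// A contention hot spot
var counter int
var mu sync.Mutex

func inc() {
    mu.Lock()
    counter++ // every goroutine contends here
    mu.Unlock()
}

// The profile attributes waiting time to call stacks, which shows where contention is heaviest

Block Profiling (Blocking Operations)

bash
# Block profiling is also off by default. Enable it in the program first with
# runtime.SetBlockProfileRate(rate), then fetch the profile:
curl -o block.pprof http://localhost:6060/debug/pprof/block

# Analyze
go tool pprof block.pprof

Block Example

go
// Blocking scenario: the channel buffer is full
var ch = make(chan int, 1000)

func producer() {
    for i := 0; i < 100000; i++ {
        ch <- i // blocks whenever the buffer is full
    }
}

// pprof shows the stack of this blocking point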

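Flame Graphs

Generating a Flame Graph

bash
# Simplest option: the pprof web UI includes a flame graph view
go tool pprof -http=:8081 cpu.pprof

# Alternative: Brendan Gregg's FlameGraph scripts
# 1. Install the flame graph tools
# git clone https://github.com/brendangregg/FlameGraph.git
# export PATH=$PATH:/path/to/FlameGraph

# 2. Generate the flame graph
go tool pprof -raw cpu.pprof | /path/to/FlameGraph/stackcollapse-go.pl > collapsed.txt
/path/to/FlameGraph/flamegraph.pl collapsed.txt > cpu.svg

# 3. View it locally
google-chrome cpu.svg  # or open it in any browser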

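How to Read a Flame Graph

                 ┌───────────────────┐
                 │   main.process    │
                 │    (1.2s, 12%)    │
                 └─────────┬─────────┘
                           │
              ┌────────────┴────────────┐
              │                         │
     ┌────────▼────────┐       ┌────────▼────────┐
     │   processData   │       │     encrypt     │
     │   (0.8s, 8%)    │       │   (0.4s, 4%)    │
     └────────┬────────┘       └────────┬────────┘
              │                         │
     ┌────────▼────────┐       ┌────────▼────────┐
     │     memmove     │       │ AES encryption  │
     │   (0.5s, 5%)    │       │   (0.3s, 3%)    │
     └─────────────────┘       └─────────────────┘

# Reading rules:
# 1. Each box is a function; the boxes directly beneath it are the functions it calls
# 2. The wider the box, the larger its share of the samples
# 3. Reading from top to bottom follows the call stack from caller to callee
#    (this is the layout of the pprof web UI; flamegraph.pl draws the root at the bottom by default)
# 4. Reading from bottom to top traces a leaf function back through its callers
# 5. Clicking a box zooms in on that subtree; use list in pprof for per-line costs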

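Flame Graph Color Conventions

markdown
# Colors usually carry no performance meaning and can mostly be ignored:
# - flamegraph.pl picks random warm hues by default, only to tell neighboring frames apart
# - the pprof web UI derives a box's color from its package name
# - some tools color by frame type instead (for example user code vs. kernel); check the tool's legend before reading anything into it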

Case Studies

Case 1: High CPU Usage

go
// Problem: the program sits at 100% CPU and it is unclear where the time goes
package main

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    go func() {
        http.ListenAndServe(":6060", nil)
    }()

    // Business logic
    for {
        doSomething()
    }
}

func doSomething() {
    // Assume this function is slow
    data := make([]int, 10000000)
    sum := 0
    for i := range data {
        sum += i
    }
    println(sum)
}

// Investigation steps:
// 1. curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
// 2. go tool pprof cpu.pprof
// 3. (pprof) top
//    doSomething accounts for about 95%
// 4. (pprof) list doSomething
//    make([]int, 10000000) allocates 80MB on every call
// 5. Fix: reuse the buffer with sync.Pool

go
// After the fix
var dataPool = sync.Pool{
    New: func() any {
        // Store a pointer so that Put does not allocate when boxing the slice header.
        s := make([]int, 10000000)
        return &s
    },
}

func doSomething() {
    p := dataPool.Get().(*[]int)
    defer dataPool.Put(p)

    sum := 0
    for i := range *p {
        sum += i
    }
    println(sum)
}

Case 2: Memory Keeps Growing

go
// Problem: memory usage climbs from 100MB to more than 1GB
package main

import (
    "net/http"
    _ "net/http/pprof"
    "strconv"
    "sync"
)

type Request struct {
    ID   string
    Data []byte
}

var requests = make(map[string]*Request)
var mu sync.Mutex

func main() {
    go http.ListenAndServe(":6060", nil)

    // Simulate a steady stream of new requests
    for i := 0; ; i++ {
        addRequest(strconv.Itoa(i), make([]byte, 1024*1024))
    }
}

func addRequest(id string, data []byte) {
    mu.Lock()
    requests[id] = &Request{ID: id, Data: data}
    mu.Unlock()
}

// Investigation:
// 1. curl -o heap.pprof http://localhost:6060/debug/pprof/heap
// 2. go tool pprof heap.pprof
// 3. (pprof) top
//    about 900MB is live, all of it retained through the requests map
// 4. (pprof) list addRequest
//    entries are added to the map but never removed
// 5. Fix: use an LRU cache or clean up periodically

Case 3: Goroutine Leak

go
// Problem: the goroutine count climbs from 100 to 10000
// (the program below is a reduced reproduction with 100 workers)
package main

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    go http.ListenAndServe(":6060", nil)

    for i := 0; i < 100; i++ {
        go worker() // start 100 workers
    }

    select {}
}

func worker() {
    ch := make(chan int)

    // Blocks forever because nobody ever sends on ch
    <-ch
}

// Investigation:
// 1. curl -o g.txt "http://localhost:6060/debug/pprof/goroutine?debug=1"
// 2. g.txt shows the leaked goroutines all blocked on the receive <-ch in worker
// 3. Fix: give worker a channel that is actually written to or closed

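Common Interview Questions

Q1: Does pprof affect performance?

markdown
# CPU profiling: small impact
# - Samples at about 100Hz (100 times per second)
# - That is one sample every 10ms; each sample only records the current call stack
# - Overhead is typically below 5%

# Memory profiling: some impact
# - The runtime samples allocations (by default about one per 512KB allocated) and records their call stacks
# - Raise the sampling detail only while investigating

# Recommendations:
# - Keep pprof off by default in production
# - When investigating, turn on mutex/block profiling as needed
# - Put the HTTP endpoints behind authentication or restrict them to the internal network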

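Q2: How do you locate a deadlock?

go
// Method 1: catch stuck goroutines in tests with goleak.
// goleak is a goroutine leak detector rather than a deadlock detector,
// but goroutines stuck in a deadlock show up as leaks at the end of the tests.
import "go.uber.org/goleak"

func TestMain(m *testing.M) {
    goleak.VerifyTestMain(m) // runs the tests, then fails if goroutines are still running
}

// Method 2: print the stacks of all goroutines with runtime.Stack
func printAllGoroutines() {
    buf := make([]byte, 1<<20)
    for {
        n := runtime.Stack(buf, true)
        if n < len(buf) {
            os.Stderr.Write(buf[:n])
            return
        }
        buf = make([]byte, 2*len(buf)) // buffer too small, grow and retry
    }
}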

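Q3: What is the difference between pprof and trace?

markdown
# pprof: sampling-based analysis
# - Sampled, so results are statistical rather than exact
# - Good for finding CPU and memory hot spots
# - Produces an aggregated, static report

# trace: full event tracing
# - Records every event (scheduling, system calls, GC)
# - Timestamps are precise, down to microseconds or finer
# - Good for analyzing latency and spikes
# - Viewed with go tool trace

# Choosing between them:
# - Not sure what is slow: start with pprof top
# - Know what is slow and want the details: use trace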
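
For comparison with the profile-collection snippets earlier, here is a minimal sketch of recording an execution trace with the runtime/trace package. The helper name `captureTrace` and the tiny workload are assumptions for illustration; a real service would more likely fetch /debug/pprof/trace over HTTP.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/trace"
)

// captureTrace records an execution trace while work runs and returns the raw trace bytes.
func captureTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	work()
	trace.Stop() // returns only after all trace data has been written
	return buf.Bytes(), nil
}

func main() {
	data, err := captureTrace(func() {
		done := make(chan struct{})
		go func() { close(done) }() // a goroutine start and a channel wake-up to trace
		<-done
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "trace failed:", err)
		os.Exit(1)
	}
	if err := os.WriteFile("trace.out", data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote trace.out (%d bytes)\n", len(data))
}
```

The file can then be opened with go tool trace trace.out to inspect scheduling, GC, and blocking events on a timeline.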

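Best Practices

1. Enabling pprof Safely in Production

go
import (
    "net/http"
    "net/http/pprof"
    "os"
)

func main() {
    mux := http.NewServeMux()

    // Importing net/http/pprof always registers its handlers on
    // http.DefaultServeMux, so serve a dedicated mux and opt in explicitly.
    // Only enable on localhost or the internal network.
    if os.Getenv("ENV") == "dev" {
        // Register only what is needed; pprof.Index also serves the
        // named profiles such as heap and goroutine.
        mux.HandleFunc("/debug/pprof/", pprof.Index)
        mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
        mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
    }

    http.ListenAndServe(":8080", mux)
}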

2. Automated Monitoring

go
// Sample periodically and report to the monitoring system.
// metrics stands for whatever metrics client the service uses.
func recordProfiles() {
    ticker := time.NewTicker(1 * time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        // Goroutine count
        n := runtime.NumGoroutine()
        metrics.Gauge("goroutine_count").Set(float64(n))

        // Heap memory in use
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        metrics.Gauge("heap_alloc_bytes").Set(float64(m.Alloc))
    }
}

3. Command Cheat Sheet

bash
# CPU profiling
curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
go tool pprof cpu.pprof

# Memory profiling
curl -o heap.pprof http://localhost:6060/debug/pprof/heap
go tool pprof heap.pprof

# Goroutine profiling
curl -o goroutine.pprof http://localhost:6060/debug/pprof/goroutine
go tool pprof goroutine.pprof

# Common pprof commands
(pprof) top              # show the top entries
(pprof) top25            # show the top 25
(pprof) web              # render an SVG call graph (requires Graphviz)
(pprof) list funcName    # annotated source of a function
(pprof) peek funcName    # callers and callees of a function
(pprof) disasm funcName  # disassembly
(pprof) callgrind        # output in callgrind format

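Summary

pprof is the Swiss Army knife of Go performance analysis:

Profile	Use	What is recorded
`cpu.pprof`	CPU hot spots	On-CPU time
`heap.pprof`	Memory usage	Live objects
`allocs.pprof`	Memory allocation	All allocations (including freed)
`goroutine.pprof`	Goroutines	Stack snapshot
`mutex.pprof`	Lock contention	Time spent waiting for contended locks
`block.pprof`	Blocking operations	Time spent blocked

Troubleshooting workflow:

1. top to find the hot functions
2. list to read the source
3. web to render the call graph
4. Compare several samples to confirm the problem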
