{"id":37190963,"url":"https://github.com/caibirdme/hand-to-hand-optimize-go","last_synced_at":"2026-01-14T22:04:19.934Z","repository":{"id":94564829,"uuid":"97580547","full_name":"caibirdme/hand-to-hand-optimize-go","owner":"caibirdme","description":"a simple tutorial for optimizing go program by some useful tools","archived":false,"fork":false,"pushed_at":"2017-07-20T06:33:54.000Z","size":1901,"stargazers_count":263,"open_issues_count":0,"forks_count":28,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-06-20T05:26:07.299Z","etag":null,"topics":["golang","pprof","tutorial"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/caibirdme.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-18T09:34:59.000Z","updated_at":"2024-03-14T19:58:26.000Z","dependencies_parsed_at":null,"dependency_job_id":"8d579670-6a69-4e44-8987-e2b99ba36be3","html_url":"https://github.com/caibirdme/hand-to-hand-optimize-go","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/caibirdme/hand-to-hand-optimize-go","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caibirdme%2Fhand-to-hand-optimize-go","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caibirdme%2Fhand-to-hand-optimize-go/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caibirdme%2Fhand-to-hand-optimize-go/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/caibirdme%2Fhand-to-hand-optimize-go/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/caibirdme","download_url":"https://codeload.github.com/caibirdme/hand-to-hand-optimize-go/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caibirdme%2Fhand-to-hand-optimize-go/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28436268,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T21:32:52.117Z","status":"ssl_error","status_checked_at":"2026-01-14T21:32:33.442Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","pprof","tutorial"],"created_at":"2026-01-14T22:04:19.099Z","updated_at":"2026-01-14T22:04:19.926Z","avatar_url":"https://github.com/caibirdme.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# Tutorial for optimizing golang program\n\n\u003e Inspired By cch123's article [pprof 和火焰图](http://xargin.com/pprof-he-huo-yan-tu/) (pprof and flame graphs)\n\nThere are lots of powerful tools for optimizing Go programs. You may already know some of them but have not used them yet. So let's get started with a very simple demo, `main.go`.\n\n*Follow the article and change `main.go` yourself when needed.*\n\nBut first of all, let's have a glimpse of it:\n\n``` 
go\npackage main\n\nimport (\n\t\"bytes\"\n\t\"io/ioutil\"\n\t\"log\"\n\t\"math/rand\"\n\t\"net/http\"\n\t_ \"net/http/pprof\" // registers the /debug/pprof handlers on the default mux\n)\n\nfunc main() {\n\thttp.HandleFunc(\"/test\", handler)\n\tlog.Fatal(http.ListenAndServe(\":9876\", nil))\n}\n\nfunc handler(w http.ResponseWriter, r *http.Request) {\n\terr := r.ParseForm()\n\tif nil != err {\n\t\tw.Write([]byte(err.Error()))\n\t\treturn\n\t}\n\tlog.Println(r.Form)\n\tdoSomeThingOne(10000) // simulated CPU-bound work\n\tbuff := genSomeBytes()\n\tb, err := ioutil.ReadAll(buff)\n\tif nil != err {\n\t\tw.Write([]byte(err.Error()))\n\t\treturn\n\t}\n\tw.Write(b)\n}\n\nfunc doSomeThingOne(times int) {\n\tfor i := 0; i \u003c times; i++ {\n\t\tfor j := 0; j \u003c times; j++ {\n\n\t\t}\n\t}\n}\n\nfunc genSomeBytes() *bytes.Buffer {\n\tvar buff bytes.Buffer\n\tfor i := 1; i \u003c 20000; i++ {\n\t\tbuff.Write([]byte{'0' + byte(rand.Intn(10))})\n\t}\n\treturn \u0026buff\n}\n```\n\nMake sure you have imported `net/http/pprof` at the top of the code; the blank import registers the `/debug/pprof` handlers on the default HTTP mux.\n\nThe demo above is very simple: it sets up a server and handles each request with the function `handler`. The handler does three things:\n\n* ParseForm\n* doSomeThingOne\n* genSomeBytes\n\nAll of them aim to simulate a real workload.\n\nThen you can run `go run main.go` to start the server.\n\n## Mock Requests\n\nTo optimize your system you need to know where its bottleneck is, so you need to put the server under heavy load, just as in production. That's what [wrk](https://github.com/wg/wrk) does.\n\nFor more details about `wrk`, see its [GitHub page](https://github.com/wg/wrk).\n\n#### Install wrk\n\nTo install `wrk`, you only need to run:\n\n* git clone https://github.com/wg/wrk.git\n* cd wrk\n* make\n\n**wrk relies on OpenSSL and LuaJIT; learn more from its GitHub page.**\n\n#### Generating requests\n\nOur demo is listening on port *9876*, so let's generate some requests for it.\n\n`./wrk -c400 -t8 -d5m http://localhost:9876/test` \n\n* `-c400` means we have 400 connections to 
keep open\n* `-t8` means we use 8 threads to generate requests\n* `-d5m` means the test will last for 5 minutes\n\n#### pprof\n\nOur server is very busy now, and we can see some information in the browser.\nOpen `localhost:9876/debug/pprof` and you will see:\n\n![pprof page](img/pprof1.png)\n\nThe information on this page can't help you find the bottleneck or the bug directly.\n* If you think your system is not as fast as you expect, see `localhost:9876/debug/pprof/profile`.\n* If you want to optimize memory usage, see `localhost:9876/debug/pprof/heap`.\n\nIn this article I'll only show you how to find the bottleneck in the CPU profile, but you can apply the same approach to find memory bottlenecks.\n\nNow open your terminal, run `go tool pprof http://localhost:9876/debug/pprof/profile`, and see what happens.\n\nYou have to wait about 30s to see the output, because by default the profile endpoint samples CPU usage for 30 seconds. The output may look like:\n![go tool pprof](img/pprof2.png)\n\n*Tip: Input `help` if you don't know what to do.*\n\nOne of the most important commands in pprof is `top x` (x is a number; the default is 10).\n\n![pprof3](img/pprof3.png)\n\nThe report of `top 10` is divided into six columns:\n* `flat`: How much time was spent in the function itself (the function is shown in the last column), excluding the functions it calls.\n* `cum`: How much time was spent in the function and the functions invoked by it.\n* `sum%`: The running total of `flat%`, i.e. sum%(line(i)) = flat%(line(i)) + sum%(line(i-1))\n\n*The latest output from pprof is different from what you may have learned from [Profiling Go Programs](https://blog.golang.org/profiling-go-programs), but it doesn't matter.*\n\nYou can work out the meaning of the other fields, `flat%` and `cum%`, which express `flat` and `cum` as percentages of the total sampling time.\n\n### Analyse\n\nYou can easily learn from the report that our server spent 79.25s out of 86.73s in the function `main.doSomeThingOne`. If we can optimize it to make it run faster, it'll be a huge step forward.\n\nSo let's look at the code and see what the function `doSomeThingOne` actually does.\n\n``` go\nfunc 
doSomeThingOne(times int) {\n\tfor i := 0; i \u003c times; i++ {\n\t\tfor j := 0; j \u003c times; j++ {\n\n\t\t}\n\t}\n}\n```\nAssume that `doSomeThingOne` implements an O(N²) algorithm, say, bubble sort. Bubble sort is rarely a good choice, so we could change `doSomeThingOne` to simulate merge sort, which is O(NlogN):\n``` go\nfunc doSomeThingOne(times int) {\n\t// Requires adding \"math\" to the imports of main.go.\n\t// The inner loop now runs about log2(times) times to simulate O(NlogN) work.\n\tvar inner = int(math.Log2(float64(times)))\n\tfor i := 0; i \u003c times; i++ {\n\t\tfor j := 0; j \u003c inner; j++ {\n\n\t\t}\n\t}\n}\n```\n\n### Test again\n\n* Kill your server process and run `go run main.go` again\n* cd ~/wrk \u0026\u0026 ./wrk -c400 -t8 -d5m http://localhost:9876/test\n* Open a new terminal and run `go tool pprof http://localhost:9876/debug/pprof/profile`\n* Input `top`\n\nThe output on my laptop looks like:\n\n![pprof4](img/pprof4.png)\n\nYou may ask: where is `doSomeThingOne`?\nIn fact, it took so little of the 85.84s that it was dropped from the report.\nBefore we changed `doSomeThingOne` from O(N²) to O(NlogN), it took 79.25s out of 86.73s. In our code N equals 10000 and `inner` is int(log2(10000)) = 13, so the new version does about 10000/13 ≈ 769 times less work than the N² version.\n79s/769 ≈ 0.1s, so yes, the result is reasonable.\n\n### Effort\nHow many times faster does our handler run than before?\nWe can find the answer with `wrk`:\n![comparison](img/pprof5.png)\n\nOnly five times faster, far less than we expected. There are many reasons for this, and even how we use `wrk` is one of them, but they are outside the scope of this article.\n\n### Optimize again\n\nIn my eyes, the text report of `go tool pprof` is not very intuitive. We have a more powerful tool: the graph.\n\nBut before we use it, we need to install `Graphviz`.\n\nOn OSX, just run:\n\n`brew install graphviz`\n\nFor other platforms see [here](https://github.com/ellson/graphviz).\n\nAfter `graphviz` is installed, input `web` instead of `top` in the go tool pprof interactive prompt, and it'll produce an SVG file. Open it in the browser or anything else you 
like, and you'll see a graph like this:\n\n![pprof6](img/pprof6.png)\n\nVery intuitive, right?\n\nNow follow the thickest arrow. You can learn from the graph that most of the time is spent in the function `genSomeBytes`.\n\n![genSomeBytes](img/pprof7.png)\n\nSo we should come up with an idea to make it run faster! But let's look at it first:\n\n``` go\nfunc genSomeBytes() *bytes.Buffer {\n\tvar buff bytes.Buffer\n\tfor i := 1; i \u003c 20000; i++ {\n\t\tbuff.Write([]byte{'0' + byte(rand.Intn(10))})\n\t}\n\treturn \u0026buff\n}\n```\n\nFrom the graph above we can learn that most of the time in `genSomeBytes` is spent in `rand.Intn`. To tell the truth, it's not easy to optimize packages in the standard library. But the purpose of this article is to teach you how to find the bottleneck, so I simply replace the `rand.Intn` call with the cheap deterministic expression `i%10`. Keep in mind that the two are not equivalent: the output is no longer random.\n\n`genSomeBytes` after the modification:\n\n``` go\nfunc genSomeBytes() *bytes.Buffer {\n\tvar buff bytes.Buffer\n\tfor i := 1; i \u003c 20000; i++ {\n\t\tbuff.Write([]byte{'0' + byte(i%10)})\n\t}\n\treturn \u0026buff\n}\n```\n\n### Generate pprof graph again\n\n* Kill your server process and run `go run main.go` again\n* cd ~/wrk \u0026\u0026 ./wrk -c400 -t8 -d5m http://localhost:9876/test\n* Open a new terminal and run `go tool pprof http://localhost:9876/debug/pprof/profile`\n* Input `web`\n* Open the SVG file\n\nThe result may look like:\n\n![graph2](img/pprof8.png)\n\nYou can see in the graph that most of the time spent in `genSomeBytes` no longer goes to generating bytes but to something else. That's enough for us.\n\nAnd according to `wrk`, we have a huge improvement! 
Surprise!\n\n![wrk_result](img/wrk0.png)\n\n### go-torch\n\n[go-torch](https://github.com/uber/go-torch) is a tool based on flame graphs. Sometimes it's even more powerful for finding the bottleneck, because it's more intuitive!\n\n#### Install go-torch\n\n* `go get -u github.com/uber/go-torch`\n* `git clone https://github.com/brendangregg/FlameGraph.git`\n* Make sure to add the path of FlameGraph to your $PATH\n\n#### show the flame graph\n\n* Use wrk to generate more requests\n* `go-torch http://localhost:9876/debug/pprof/profile`\n* Open the generated SVG file\n\nIt may look something like this:\n\n![flame](img/torch.png)\n\nEach colored rectangle stands for a function. The wider the rectangle is, the more time the function takes.\n\nSo it's really easy to find the bottleneck in your system.\n\nFrom the flame graph we can learn that `log.Println`, which is used to print some debug information, costs even more time than `doSomeThingOne`. And of course we can remove it.\n\n### Conclusion\n\nIn this article I've only shown you how to use these tools to find the bottleneck in your system, and I've only covered the CPU profile. Memory is also a very important aspect of optimization.\n\nIf you have problems, feel free to open an issue.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaibirdme%2Fhand-to-hand-optimize-go","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcaibirdme%2Fhand-to-hand-optimize-go","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaibirdme%2Fhand-to-hand-optimize-go/lists"}