{"id":21062520,"url":"https://github.com/mg98/ae-chunker-go","last_synced_at":"2026-02-25T08:13:57.339Z","repository":{"id":57663266,"uuid":"481493186","full_name":"mg98/ae-chunker-go","owner":"mg98","description":"Go implementation of the AE chunking algorithm.","archived":false,"fork":false,"pushed_at":"2023-01-04T13:47:41.000Z","size":85,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-12T19:40:05.121Z","etag":null,"topics":["chunking","chunking-algorithm","go","golang"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mg98.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-14T06:34:24.000Z","updated_at":"2024-07-19T04:54:42.000Z","dependencies_parsed_at":"2023-02-02T14:32:31.585Z","dependency_job_id":null,"html_url":"https://github.com/mg98/ae-chunker-go","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/mg98/ae-chunker-go","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fae-chunker-go","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fae-chunker-go/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fae-chunker-go/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mg98%2Fae-chunker-go/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mg98","download_url":"https://codeload.github.com/mg98/ae-chunker-go/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/mg98%2Fae-chunker-go/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29815020,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T05:36:42.804Z","status":"ssl_error","status_checked_at":"2026-02-25T05:36:31.934Z","response_time":61,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chunking","chunking-algorithm","go","golang"],"created_at":"2024-11-19T17:39:09.886Z","updated_at":"2026-02-25T08:13:57.325Z","avatar_url":"https://github.com/mg98.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AE Chunker (GO)\n\n[![GoDoc](http://img.shields.io/badge/godoc-reference-blue.svg)](https://pkg.go.dev/github.com/mg98/ae-chunker-go)\n[![Test](https://github.com/mg98/ae-chunker-go/actions/workflows/test.yml/badge.svg)](https://github.com/mg98/ae-chunker-go/actions/workflows/test.yml)\n[![codecov](https://codecov.io/gh/mg98/ae-chunker-go/branch/main/graph/badge.svg?token=R3OYXX1HC7)](https://codecov.io/gh/mg98/ae-chunker-go)\n[![Go Report Card](https://goreportcard.com/badge/github.com/mg98/ae-chunker-go?)](https://goreportcard.com/report/github.com/mg98/ae-chunker-go)\n![License](https://img.shields.io/github/license/mg98/ae-chunker-go)\n\n**ae-chunker-go** is a best-effort Go implementation of the chunking algorithm presented in\n_AE: An Asymmetric Extremum Content 
Defined\nChunking Algorithm for Fast and\nBandwidth-Efficient Data Deduplication_\nby Yucheng Zhang et al. ([PDF](https://ranger.uta.edu/~jiang/publication/Conferences/2015/2015-INFOCOM-AE-%20An%20Asymmetric%20Extremum%20Content%20Defined%20Chunking%20Algorithm%20for%20Fast%20and%20Bandwidth-Efficient%20Data%20Deduplication.pdf)).\n\n## Install\n\n```\ngo get -u github.com/mg98/ae-chunker-go\n```\n\n## Example\n\n```go\npackage main\n\nimport (\n    \"bytes\"\n    \"fmt\"\n    \"io\"\n    \"log\"\n    \"math/rand\"\n    \"time\"\n\n    ae \"github.com/mg98/ae-chunker-go\"\n)\n\nfunc main() {\n    // Fill a 1 MiB buffer with pseudo-random bytes.\n    data := make([]byte, 1024*1024) // 1 MiB\n    rnd := rand.New(rand.NewSource(time.Now().Unix()))\n    if _, err := rnd.Read(data); err != nil {\n        log.Fatal(err)\n    }\n\n    chunker := ae.NewChunker(bytes.NewReader(data), \u0026ae.Options{\n        AverageSize: 256 * 1024, // 256 KiB\n        MaxSize:     512 * 1024, // 512 KiB\n    })\n\n    // Read chunks until the input is exhausted.\n    var chunks [][]byte\n    for {\n        chunk, err := chunker.NextBytes()\n        if err == io.EOF {\n            break\n        } else if err != nil {\n            log.Fatal(err)\n        }\n        chunks = append(chunks, chunk)\n    }\n\n    fmt.Printf(\n        \"Data divided into %d chunks. First chunk is %d bytes.\\n\",\n        len(chunks),\n        len(chunks[0]),\n    )\n    // Example output: Data divided into 5 chunks. First chunk is 224098 bytes.\n}\n```\n\n## Benchmarks\n\n### Performance\n\nThe task was to divide 100 MiB of random bytes into chunks with an average size of 256 KiB\n(CPU: _Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz_).\n\n| Chunking Algorithm                                                      |      Speed | Processed Bytes | Allocated Bytes | Distinct Mem. Alloc. 
|\n|-------------------------------------------------------------------------|-----------:|----------------:|----------------:|---------------------:|\n| ae-chunker-go                                                           |  168 ms/op |     622.94 MB/s |    507.27 MB/op |       7769 allocs/op |\n| [fastcdc-go](https://github.com/jotfs/fastcdc-go)                       |   88 ms/op |    1194.74 MB/s |      2.10 MB/op |          3 allocs/op |\n| [go-ipfs-chunker](https://github.com/ipfs/go-ipfs-chunker) (Rabin)      |  414 ms/op |     253.54 MB/s |    108.83 MB/op |       1192 allocs/op |\n| [go-ipfs-chunker](https://github.com/ipfs/go-ipfs-chunker) (Buzhash)    |   81 ms/op |    1288.27 MB/s |    106.48 MB/op |        406 allocs/op |\n| [go-ipfs-chunker](https://github.com/ipfs/go-ipfs-chunker) (Fixed Size) |   22 ms/op |    4773.13 MB/s |    104.87 MB/op |        405 allocs/op |\n\n### Deduplication Efficiency\n\nThis metric measures how well deduplication performs across multiple versions of a file.\nMore precisely, we define the _Deduplication Elimination Ratio (DER)_ as the ratio of the size of the input data\nto the size of the data that remains after deduplication (the higher the better).\n\nFor the evaluation, the uncompressed TAR archives of 20 consecutive versions of the GCC source code were used\n(a total of 12 GB). The algorithms were configured for an average chunk size\nof 8 KB and (where applicable) a maximum chunk size of 16 KB. 
\nBecause the Buzhash library does not support configurable chunk sizes,\nthe tests were repeated with a 256 KB average and a 512 KB maximum chunk size for a fairer comparison.\nGenerally, smaller chunk sizes yield better deduplication.\n\n| Chunking Algorithm                                                      | DER (8K/16K) | DER (256K/512K) |\n|-------------------------------------------------------------------------|-------------:|----------------:|\n| ae-chunker-go                                                           |     1.056510 |        1.002392 |\n| [fastcdc-go](https://github.com/jotfs/fastcdc-go)                       |     1.000643 |        1.000000 |\n| [go-ipfs-chunker](https://github.com/ipfs/go-ipfs-chunker) (Rabin)      |     1.354034 |        1.058422 |\n| [go-ipfs-chunker](https://github.com/ipfs/go-ipfs-chunker) (Buzhash)    |          n/a |        1.083399 |\n| [go-ipfs-chunker](https://github.com/ipfs/go-ipfs-chunker) (Fixed Size) |     1.032097 |        1.000579 |\n\n\n### Chunk Size Variance\n\nThe following plots show the chunk size distribution for 1 GiB of random bytes.\nThe algorithm was run with the options\n`\u0026ae.Options{AverageSize: 256*1024}` and `\u0026ae.Options{AverageSize: 256*1024, MaxSize: 512*1024}`,\nrespectively.\n\n\u003cimg src=\"./img/csd256kib.png\" width=\"49%\"\u003e \u003cimg src=\"./img/csd256kib512kib.png\" width=\"49%\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmg98%2Fae-chunker-go","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmg98%2Fae-chunker-go","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmg98%2Fae-chunker-go/lists"}