{"id":21814346,"url":"https://github.com/snawoot/terse","last_synced_at":"2025-10-08T01:31:49.599Z","repository":{"id":152607925,"uuid":"596572273","full_name":"Snawoot/terse","owner":"Snawoot","description":"Output randomly sampled lines from input stream or file","archived":false,"fork":false,"pushed_at":"2023-02-03T15:37:41.000Z","size":22,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-27T14:48:58.771Z","etag":null,"topics":["random-sampling","reservoir-sampling"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Snawoot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-02T13:33:43.000Z","updated_at":"2024-09-14T20:19:08.000Z","dependencies_parsed_at":"2023-05-26T07:30:27.985Z","dependency_job_id":null,"html_url":"https://github.com/Snawoot/terse","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snawoot%2Fterse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snawoot%2Fterse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snawoot%2Fterse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Snawoot%2Fterse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Snawoot","download_url":"https://codeload.github.com/Snawoot/terse/tar.gz/refs/heads/master","ho
st":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235669424,"owners_count":19026822,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["random-sampling","reservoir-sampling"],"created_at":"2024-11-27T14:37:52.748Z","updated_at":"2025-10-08T01:31:44.326Z","avatar_url":"https://github.com/Snawoot.png","language":"Go","readme":"# terse\nOutputs randomly sampled lines from an input stream or file. Uses a simple [reservoir sampling](http://www.cs.umd.edu/~samir/498/vitter.pdf) algorithm to process input in linear time. Suitable for processing streams, since each line is read only once. Retains the relative order of lines.\n\n## Usage example\n\n```\n\u003e seq 1000000 | terse -n 5\n349893\n539678\n576919\n738393\n758023\n```\n\n## Performance\n\nComparison against `shuf -n` on real data: a 5.1 GB nginx log with 17451712 lines.\n\n```\nroot@logger:~# ls -lh /var/log/remote/nginx/2023_02_02_18.log\n-rw-r----- 1 root logs 5.1G Feb  2 18:59 /var/log/remote/nginx/2023_02_02_18.log\nroot@logger:~# wc -l /var/log/remote/nginx/2023_02_02_18.log\n17451712 /var/log/remote/nginx/2023_02_02_18.log\nroot@logger:~# time terse -i /var/log/remote/nginx/2023_02_02_18.log -n 25 \u003e /dev/null\n\nreal    0m2.656s\nuser    0m1.315s\nsys     0m1.372s\nroot@logger:~# time shuf -n 25 /var/log/remote/nginx/2023_02_02_18.log \u003e /dev/null\n\nreal    0m22.784s\nuser    0m21.059s\nsys     0m1.703s\n```\n\nIt processes tens of millions of lines per second on a modern computer. 
Most likely, I/O rather than application performance will be the bottleneck in such sampling.\n\n## Installation\n\n#### Binaries\n\nPre-built binaries are available [here](https://github.com/Snawoot/terse/releases/latest).\n\n#### Build from source\n\nAlternatively, you may install terse from source. Run the following within the source directory:\n\n```\nmake install\n```\n\n#### Docker\n\nA Docker image is available as well. Here is an example of running terse in a pipeline with Docker:\n\n```sh\nseq 5 | docker run -i --rm yarmak/terse\n```\n\n## Synopsis\n\n```\n\u003e terse -h\nUsage:\n\nterse [OPTION]...\n\nOptions:\n  -buffered\n    \tbuffer control (default true)\n  -i string\n    \tuse input file instead of stdin\n  -n int\n    \tnumber of lines to sample (default 25)\n  -o string\n    \tuse output file instead of stdout\n  -seed value\n    \tuse fixed random seed (default is a value from CSPRNG)\n  -version\n    \tshow program version and exit\n  -z\tline delimiter is NUL, not newline\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnawoot%2Fterse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnawoot%2Fterse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnawoot%2Fterse/lists"}