{"id":18119802,"url":"https://github.com/mostlygeek/go-csv-gz-test","last_synced_at":"2026-02-26T02:05:12.830Z","repository":{"id":66828448,"uuid":"131892125","full_name":"mostlygeek/go-csv-gz-test","owner":"mostlygeek","description":"How can we filter .csv.gz files?","archived":false,"fork":false,"pushed_at":"2018-05-03T17:59:48.000Z","size":8,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-14T17:12:26.903Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mostlygeek.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-05-02T18:41:03.000Z","updated_at":"2025-03-11T16:09:57.000Z","dependencies_parsed_at":"2023-03-18T13:40:27.550Z","dependency_job_id":null,"html_url":"https://github.com/mostlygeek/go-csv-gz-test","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mostlygeek/go-csv-gz-test","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlygeek%2Fgo-csv-gz-test","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlygeek%2Fgo-csv-gz-test/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlygeek%2Fgo-csv-gz-test/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlygeek%2Fgo-csv-gz-test/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mostlygee
k","download_url":"https://codeload.github.com/mostlygeek/go-csv-gz-test/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostlygeek%2Fgo-csv-gz-test/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29848634,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T22:37:40.667Z","status":"online","status_checked_at":"2026-02-26T02:00:06.774Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-01T05:17:05.923Z","updated_at":"2026-02-26T02:05:12.825Z","avatar_url":"https://github.com/mostlygeek.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"This was a fun little team project to see how we can filter \nS3 inventory .csv.gz files fastest!\n\n## Other implementations:\n\n* [mythmon's Rust implementation](https://github.com/mythmon/rust-gz-csv-test)\n* [peterbe's Python implementation](https://gist.github.com/peterbe/f147fd093aef43304a5c7e0a89c1ea0a) + [blog](https://www.peterbe.com/plog/fastest-python-datetime-parser)\n\n## Usage\n\n```\n# get some working data, downloads 1GB from S3 into testdata/ subdirectory\n\u003e ./download.sh\n\n\n# Processing one file at a time\n\u003e go run ./filter.go\n\n\n# Processing in parallel (workers = num cpus)\n\u003e GOPAR=1 go run ./filter.go\n```\n\n## My results (on my late 2017 13\" MBP)\n\n```\nStrategy: One 
file at a time ...\nTotal: 31521045, Matched: 710093, Ratio: 2.25%\nTime: 52.740166887s\n```\n\n```\nStrategy: Parallel, 4 Workers ...\nTotal: 31521045, Matched: 710093, Ratio: 2.25%\nTime: 27.207802611s\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmostlygeek%2Fgo-csv-gz-test","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmostlygeek%2Fgo-csv-gz-test","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmostlygeek%2Fgo-csv-gz-test/lists"}