{"id":37067102,"url":"https://github.com/jkaardal/csvnav","last_synced_at":"2026-01-14T07:52:39.456Z","repository":{"id":62565730,"uuid":"202797776","full_name":"jkaardal/csvnav","owner":"jkaardal","description":"A memory-efficient python class for navigating large CSV/text files.","archived":false,"fork":false,"pushed_at":"2020-08-09T18:54:00.000Z","size":85,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-01-07T11:46:03.437Z","etag":null,"topics":["csv","data-analysis","data-science","machine-learning","memory-management"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jkaardal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-08-16T20:53:20.000Z","updated_at":"2020-08-09T18:54:02.000Z","dependencies_parsed_at":"2022-11-03T16:00:20.153Z","dependency_job_id":null,"html_url":"https://github.com/jkaardal/csvnav","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/jkaardal/csvnav","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jkaardal%2Fcsvnav","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jkaardal%2Fcsvnav/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jkaardal%2Fcsvnav/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jkaardal%2Fcsvnav/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jkaardal","download_url":"https://codeload.github.com/jkaardal/csvnav/tar.gz/ref
s/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jkaardal%2Fcsvnav/sbom","scorecard":{"id":522328,"data":{"date":"2025-08-11","repo":{"name":"github.com/jkaardal/csvnav","commit":"071d162ef18a4b24489feea1bb5c78686dd0a5a9"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Code-Review","score":0,"reason":"Found 0/29 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is 
\"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer 
integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: MIT License: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 2 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-20T03:22:09.849Z","repository_id":62565730,"created_at":"2025-08-20T03:22:09.849Z","updated_at":"2025-08-20T03:22:09.849Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","data-analysis","data-science","machine-learning","memory-management"],"created_at":"2026-01-14T07:52:38.782Z","updated_at":"2026-01-14T07:52:39.448Z","avatar_url":"https://github.com/jkaardal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## CSVNAV: a python 3 class for memory-efficient navigation of CSV/Text files.\n\nThis package can be installed with pip:\n```\npip install csvnav\n```\nor by downloading this repo and using setup tools:\n```sh\npython setup.py install\n```\nrun from within the `csvnav` directory.\n\nThe file `csvnav.py` is a python module containing the class `Navigator`. When instantiated, `Navigator` will open a given path and then store pointers to the location of each row in the opened file. 
In the simplest case, an instance can be used much like a list. For instance, if I have a file \"inventory.csv\" containing the following CSV data:\n```\ntime,product,quantity\n5,tire,4\n8,sparkplug,20\n2,battery,120\n10,tire,2\n11,tire,3\n30,sparkplug,35\n```\nI can instantiate the class and query rows by index:\n```python\nfrom csvnav import Navigator\n\nnav = Navigator('./inventory.csv', header=True, delimiter=',')\nprint(nav[0])\nprint(nav[2])\nprint(nav.size(force=True))\n\nnav.close()\n```\nwhere the output would be:\n```\n{'product': 'tire', 'quantity': '4', 'time': '5'}\n{'product': 'battery', 'quantity': '120', 'time': '2'}\n6\n```\nNote that the number of data rows (excluding any skipped lines and the header) can be printed by calling `Navigator.size(force=True)`. In this case, `force=True` means that the number of data rows in the file will be determined even if the last row in the file has not been accessed yet. If the last row had been accessed, `force=False` would return the same result. However, if the last row had not yet been accessed, `force=False` would return `None`. Also note that the rows are returned as dictionaries. As long as `Navigator.header` contains a list of the column names (set automatically from the first row of the CSV file after any skipped lines when `header=True` at instantiation, or when column names are provided with the `Navigator.set_header()` method), the rows will be returned as dictionaries. Otherwise, the rows are returned as lists. 
For example, if \"inventory.csv\" did not have a header then the output would be:\n```\n['5', 'tire', '4']\n['2', 'battery', '120']\n6\n```\nThe `Navigator` class is also iterable and will iterate through rows in order:\n```python\nfor row in nav:\n    print(row)\n```\ngives the output (assuming we have a header):\n```\n{'time': '5', 'product': 'tire', 'quantity': '4'}\n{'time': '8', 'product': 'sparkplug', 'quantity': '20'}\n{'time': '2', 'product': 'battery', 'quantity': '120'}\n{'time': '10', 'product': 'tire', 'quantity': '2'}\n{'time': '11', 'product': 'tire', 'quantity': '3'}\n{'time': '30', 'product': 'sparkplug', 'quantity': '35'}\n```\n\nIf we only want to iterate through a subset of rows that match a condition, we can use the `Navigator.filter` method:\n```python\nfrom csvnav import Navigator\n\nnav = Navigator('./inventory.csv', header=True, delimiter=',')\n\ndef when_few_tires(row):\n    # Keep only tire rows with a quantity of at most 3.\n    return row['product'] == 'tire' and int(row['quantity']) \u003c= 3\n\nfor row in nav.filter(when_few_tires):\n    print(row)\n\nnav.close()\n```\nwill produce the output:\n```\n{'time': '10', 'product': 'tire', 'quantity': '2'}\n{'time': '11', 'product': 'tire', 'quantity': '3'}\n```\n\nAnother use of the class is to group pointers by column name (assuming `Navigator.header` is set). 
This can be done with the `Navigator.register` method.\nThe following code will then group rows by product and show how this data can be accessed:\n```python\nfrom csvnav import Navigator\n\nnav = Navigator('./inventory.csv', header=True, delimiter=',')\n\nnav.register('product') # can also provide a list of columns to register each\n\nprint(nav.fields)\nprint(nav.keys('product'))\nfor k, v in nav.items('product'):\n    print(k, list(v))\n\nnav.close()\n```\nwill print out the following groups (list of dict or list):\n```\ndict_keys(['product'])\ndict_keys(['tire', 'sparkplug', 'battery'])\ntire [{'time': '5', 'product': 'tire', 'quantity': '4'}, {'time': '10', 'product': 'tire', 'quantity': '2'}, {'time': '11', 'product': 'tire', 'quantity': '3'}]\nsparkplug [{'time': '8', 'product': 'sparkplug', 'quantity': '20'}, {'time': '30', 'product': 'sparkplug', 'quantity': '35'}]\nbattery [{'time': '2', 'product': 'battery', 'quantity': '120'}]\n```\nNote that groups are then accessed by two \"indexes\", namely the column name and the key.\n\nThe `Navigator` class should be thread safe and an instance can be shared between threads. `Navigator` has some more functionality that I have not described here but this covers the basics. Refer to the docstrings of the various methods of the `Navigator` class for more information.\n\n## About\n\nThis code is a generalization of some more application-specific code I wrote while working on analyzing data in large CSV files. I decided to release this code since I think it has some educational value and may be useful to others. This code has been released with permission from the Markov Corporation.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjkaardal%2Fcsvnav","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjkaardal%2Fcsvnav","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjkaardal%2Fcsvnav/lists"}