{"id":13468027,"url":"https://github.com/MarkyMan4/filequery","last_synced_at":"2025-03-26T03:31:26.874Z","repository":{"id":101489077,"uuid":"600906873","full_name":"MarkyMan4/filequery","owner":"MarkyMan4","description":"Query CSV, JSON and Parquet files with SQL","archived":false,"fork":false,"pushed_at":"2024-06-08T18:21:40.000Z","size":124,"stargazers_count":108,"open_issues_count":14,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-03T21:04:39.826Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MarkyMan4.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-12T23:46:00.000Z","updated_at":"2025-02-21T21:35:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"38d4c107-66cf-41ce-a14f-d4de89f3d9a9","html_url":"https://github.com/MarkyMan4/filequery","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkyMan4%2Ffilequery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkyMan4%2Ffilequery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkyMan4%2Ffilequery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkyMan4%2Ffilequery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MarkyMan4","download_url":"https://codeload.github.com/MarkyMan4/filequery/tar.gz/r
efs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245584874,"owners_count":20639638,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T15:01:04.367Z","updated_at":"2025-03-26T03:31:25.940Z","avatar_url":"https://github.com/MarkyMan4.png","language":"Python","readme":"# filequery\n[![pypi](https://img.shields.io/pypi/v/filequery.svg)](https://pypi.org/project/filequery/)\n[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/MarkyMan4/filequery)\n\nQuery CSV, JSON and Parquet files using SQL.\n- runs queries using a DuckDB in-memory database for efficient querying\n- any SQL that works with DuckDB will work here\n- use the CLI to easily query files in your terminal or automate queries/transformations as part of a script\n- use the TUI for a more interactive experience\n\n## Demo\n\n### CLI\n\n![out](https://github.com/MarkyMan4/filequery/assets/37815834/38b6f69b-297f-4913-826e-89ffbfe483b3)\n\n### TUI\n\n![filequery_tui](https://github.com/MarkyMan4/filequery/assets/37815834/202655ab-359e-4a42-a9eb-49227cf32f22)\n\n![filequery_menu](https://github.com/MarkyMan4/filequery/assets/37815834/57a58e3b-f283-43e9-8a9f-68c363d748af)\n\n## Installation\n\n```bash\npipx install filequery\n```\n\nor\n\n```bash\npip install filequery\n```\n\n## CLI usage\nRun `filequery --help` to see what options are available.\n\n```\nusage: filequery [-h] [-f FILENAME] [-d FILESDIR] [-q QUERY] [-Q QUERY_FILE] [-o OUT_FILE [OUT_FILE ...]] [-F OUT_FILE_FORMAT] [-D DELIMITER] [-c CONFIG] [-e] 
[-v]\n\noptions:\n  -h, --help            show this help message and exit\n  -f FILENAME, --filename FILENAME\n                        path to a CSV, Parquet or JSON file\n  -d FILESDIR, --filesdir FILESDIR\n                        path to a directory which can contain a combination of CSV, Parquet and JSON files\n  -q QUERY, --query QUERY\n                        SQL query to execute against file\n  -Q QUERY_FILE, --query_file QUERY_FILE\n                        path to file with query to execute\n  -o OUT_FILE [OUT_FILE ...], --out_file OUT_FILE [OUT_FILE ...]\n                        file to write results to instead of printing to standard output\n  -F OUT_FILE_FORMAT, --out_file_format OUT_FILE_FORMAT\n                        either csv or parquet, defaults to csv\n  -D DELIMITER, --delimiter DELIMITER\n                        delimiter to use when printing result or writing to CSV file\n  -c CONFIG, --config CONFIG\n                        path to JSON config file\n  -e, --editor          run SQL editor UI for exploring data\n  -v, --version         show program's version number and exit\n```\n\nFor basic usage, provide a path to a CSV or Parquet file and a query to execute against it. The table name will be the \nfile name without the extension. If the file name does not conform to DuckDB's rules for unquoted identifiers, the \ntable name will need to be wrapped in double quotes. For example, a file named `my data.csv` would be queried as \n`select * from \"my data\"`.\n\n```bash\nfilequery --filename example/test.csv --query 'select * from test'\n```\n\n## TUI usage\n\nTo use the TUI for querying your files, use the `-e` flag and provide a path to a file or directory.\n\n```bash\nfilequery -e -f path/to/file.csv\n```\n\nor\n\n```bash\nfilequery -e -d path/to/file_directory\n```\n\nYou can also omit a path to a file or directory and open a blank editor. 
This can be helpful if \nyou want to directly use DuckDB functions such as `read_csv_auto()` for querying your files.\n\n```bash\nfilequery -e\n```\n\n## Examples\n\n```bash\nfilequery --filename example/json_test.json --query 'select nested.nest_id, nested.nest_val from json_test' # query JSON\n```\n```bash\nfilequery --filesdir example/data --query 'select * from test inner join test1 on test.col1 = test1.col1' # query multiple files in a directory\n```\n```bash\nfilequery --filesdir example/data --query_file example/queries/join.sql # point to a file containing SQL\n```\n```bash\nfilequery --filesdir example/data --query_file example/queries/json_csv_join.sql # SQL file joining data from JSON and CSV files\n```\n```bash\nfilequery --filename example/test.csv --query 'select * from test; select sum(col3) from test;' --out_file result1.csv result2.csv # output multiple query results to multiple files (output file names are illustrative)\n```\n\n```bash\nfilequery --filename example/ndjson_test.ndjson --query 'select id, value, nested.subid, nested.subval from ndjson_test' # query nested JSON in an ndjson file\n```\n\nYou can also provide a config file instead of specifying the arguments when running the command.\n\n```bash\nfilequery --config \u003cpath to config file\u003e\n```\n\nThe config file should be a JSON file whose keys match the long option names. See the example config file contents below.\n\n```json\n{\n    \"filename\": \"../example/test.csv\",\n    \"query\": \"select col1, col2 from test\"\n}\n```\n\n```json\n{\n    \"filesdir\": \"../example/data\",\n    \"query_file\": \"../example/queries/join.sql\",\n    \"out_file\": \"result.parquet\",\n    \"out_file_format\": \"parquet\"\n}\n```\n\nSee the `example` directory in the repo for more examples.\n\n## Module usage\nYou can also use filequery in your own programs. 
See the example below.\n\n```python\n# FileType is assumed to be importable from the same module as FileDb\nfrom filequery.filedb import FileDb, FileType\n\nquery = 'select * from test'\n\n# read test.csv into a table called \"test\"\nfdb = FileDb('example/test.csv')\n\n# run the query and get back a QueryResult object\nres = fdb.exec_query(query)\n\n# print the result formatted as CSV\nprint(str(res))\n\n# save the query result to result.csv\nres.save_to_file('result.csv')\n\n# run the query again and save the result as a Parquet file\nfdb.export_query(query, 'result.parquet', FileType.PARQUET)\n```\n\n## Development\nPackages required for distribution should go in `requirements.txt`.\n\nTo build the wheel:\n\n```bash\npip install -r requirements-dev.txt\nmake\n```\n\n## Testing\nTo test the CLI, create a separate virtual environment and perform an editable install.\n\n```bash\npython -m venv test-env\n. test-env/bin/activate\npip install -e .\n```\n\nRun the unit tests from the root of the project. The unit tests add `src` to the path so `filequery` can be imported properly.\n\n```bash\npython tests/test_filequery.py\n```\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMarkyMan4%2Ffilequery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMarkyMan4%2Ffilequery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMarkyMan4%2Ffilequery/lists"}