{"id":17694974,"url":"https://github.com/asg017/sqlite-lines","last_synced_at":"2025-04-12T20:45:55.598Z","repository":{"id":38356122,"uuid":"485502138","full_name":"asg017/sqlite-lines","owner":"asg017","description":"A SQLite extension for reading large files line-by-line (NDJSON, logs, txt, etc.)","archived":false,"fork":false,"pushed_at":"2023-10-07T06:04:44.000Z","size":3358,"stargazers_count":397,"open_issues_count":10,"forks_count":9,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-04T00:10:53.258Z","etag":null,"topics":["sqlite","sqlite-extension"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asg017.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-25T19:11:46.000Z","updated_at":"2025-03-28T07:54:17.000Z","dependencies_parsed_at":"2024-10-23T04:26:34.120Z","dependency_job_id":"d38f17a9-a429-4957-89d3-331e6b5e64e3","html_url":"https://github.com/asg017/sqlite-lines","commit_stats":null,"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asg017%2Fsqlite-lines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asg017%2Fsqlite-lines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asg017%2Fsqlite-lines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asg017%2Fsqlite-lines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asg01
7","download_url":"https://codeload.github.com/asg017/sqlite-lines/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248614316,"owners_count":21133691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["sqlite","sqlite-extension"],"created_at":"2024-10-24T13:50:41.731Z","updated_at":"2025-04-12T20:45:55.539Z","avatar_url":"https://github.com/asg017.png","language":"C","readme":"# sqlite-lines\n\n`sqlite-lines` is a SQLite extension for reading lines from a file or blob.\n\n\u003cimg src=\"./benchmarks/calc.png\" alt=\"Benchmark between sqlite-lines and various other data processing tools\" width=\"600\"/\u003e\n\n\u003csmall\u003eSee [Benchmarks](./benchmarks) for more info.\u003c/small\u003e\n\n## Usage\n\n```sql\n.load ./lines0\nselect line from lines_read('logs.txt');\n```\n\n`sqlite-lines` is great for line-oriented datasets, like [ndjson](https://ndjson.org/) or [JSON Lines](https://jsonlines.org/), when paired with SQLite's [JSON support](https://www.sqlite.org/json1.html). 
Here, we calculate the top 5 participating countries in Google's [Quick, Draw!](https://quickdraw.withgoogle.com/data) dataset for [`calendar.ndjson`](https://storage.googleapis.com/quickdraw_dataset/full/simplified/calendar.ndjson):\n\n```sql\nselect\n  line -\u003e\u003e '$.countrycode' as countrycode,\n  count(*)\nfrom lines_read('./calendar.ndjson')\ngroup by 1\norder by 2 desc\nlimit 5;\n/*\n┌─────────────┬──────────┐\n│ countrycode │ count(*) │\n├─────────────┼──────────┤\n│ US          │ 141001   │\n│ GB          │ 22560    │\n│ CA          │ 11759    │\n│ RU          │ 9250     │\n│ DE          │ 8748     │\n└─────────────┴──────────┘\n*/\n```\n\nUse the SQLite CLI's [`fsdir()`](https://sqlite.org/cli.html#file_i_o_functions) table function with `lines_read()` to read lines from every file in a directory.\n\n```sql\nselect\n  name as file,\n  lines.rowid as line_number,\n  line\nfrom fsdir('logs')\njoin lines_read(name) as lines\nwhere name like '%.txt';\n/*\n┌───────┬─────────────┬──────┐\n│ file  │ line_number │ line │\n├───────┼─────────────┼──────┤\n│ a.txt │ 1           │ x    │\n│ a.txt │ 2           │ y    │\n│ a.txt │ 3           │ z    │\n│ b.txt │ 1           │ xx   │\n│ b.txt │ 2           │ yy   │\n│ c.txt │ 1           │ xxx  │\n└───────┴─────────────┴──────┘\n*/\n```\n\n## Documentation\n\nSee [`docs.md`](./docs.md) for a full API Reference and detailed documentation.\n\n## Installing\n\n| Language       | Install                                                    |                                                                                                                                                                                           |\n| -------------- | ---------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| Python         | `pip 
install sqlite-lines`                                   | [![PyPI](https://img.shields.io/pypi/v/sqlite-lines.svg?color=blue\u0026logo=python\u0026logoColor=white)](https://pypi.org/project/sqlite-lines/)                                                      |\n| Datasette      | `datasette install datasette-sqlite-lines`                   | [![Datasette](https://img.shields.io/pypi/v/datasette-sqlite-lines.svg?color=B6B6D9\u0026label=Datasette+plugin\u0026logoColor=white\u0026logo=python)](https://datasette.io/plugins/datasette-sqlite-lines) |\n| Node.js        | `npm install sqlite-lines`                                   | [![npm](https://img.shields.io/npm/v/sqlite-lines.svg?color=green\u0026logo=nodedotjs\u0026logoColor=white)](https://www.npmjs.com/package/sqlite-lines)                                                |\n| Deno           | [`deno.land/x/sqlite_lines`](https://deno.land/x/sqlite_lines) | [![deno.land/x release](https://img.shields.io/github/v/release/asg017/sqlite-lines?color=fef8d2\u0026include_prereleases\u0026label=deno.land%2Fx\u0026logo=deno)](https://deno.land/x/sqlite_lines)        |\n| Ruby           | `gem install sqlite-lines`                                   | ![Gem](https://img.shields.io/gem/v/sqlite-lines?color=red\u0026logo=rubygems\u0026logoColor=white)                                                                                                   |\n| Github Release |                                                            | ![GitHub tag (latest SemVer pre-release)](https://img.shields.io/github/v/tag/asg017/sqlite-lines?color=lightgrey\u0026include_prereleases\u0026label=Github+release\u0026logo=github)                     |\n\n\u003c!--\n| Elixir         | [`hex.pm/packages/sqlite_lines`](https://hex.pm/packages/sqlite_lines) | [![Hex.pm](https://img.shields.io/hexpm/v/sqlite_lines?color=purple\u0026logo=elixir)](https://hex.pm/packages/sqlite_lines)                                                                       
|\n| Go             | `go get -u github.com/asg017/sqlite-lines/bindings/go`               | [![Go Reference](https://pkg.go.dev/badge/github.com/asg017/sqlite-lines/bindings/go.svg)](https://pkg.go.dev/github.com/asg017/sqlite-lines/bindings/go)                                     |\n| Rust           | `cargo add sqlite-lines`                                             | [![Crates.io](https://img.shields.io/crates/v/sqlite-lines?logo=rust)](https://crates.io/crates/sqlite-lines)                                                                                 |\n--\u003e\n\nThe [Releases page](https://github.com/asg017/sqlite-lines/releases) contains pre-built binaries for Linux amd64 and macOS (amd64, no arm).\n\n### As a loadable extension\n\nIf you want to use `sqlite-lines` as a [run-time loadable extension](https://www.sqlite.org/loadext.html), download the `lines0.dylib` (for macOS) or `lines0.so` (for Linux) file from a release and load it into your SQLite environment.\n\n\u003e **Note:**\n\u003e The `0` in the filename (`lines0.dylib` or `lines0.so`) denotes the major version of `sqlite-lines`. 
Currently `sqlite-lines` is pre v1, so expect breaking changes in future versions.\n\nFor example, if you are using the [SQLite CLI](https://www.sqlite.org/cli.html), you can load the library like so:\n\n```sql\n.load ./lines0\nselect lines_version();\n-- v0.0.1\n```\n\nOr in Python, using the builtin [sqlite3 module](https://docs.python.org/3/library/sqlite3.html):\n\n```python\nimport sqlite3\n\ncon = sqlite3.connect(\":memory:\")\n\ncon.enable_load_extension(True)\ncon.load_extension(\"./lines0\")\n\nprint(con.execute(\"select lines_version()\").fetchone())\n# ('v0.0.1',)\n```\n\nOr in Node.js using [better-sqlite3](https://github.com/WiseLibs/better-sqlite3):\n\n```javascript\nconst Database = require(\"better-sqlite3\");\nconst db = new Database(\":memory:\");\n\ndb.loadExtension(\"./lines0\");\n\nconsole.log(db.prepare(\"select lines_version()\").get());\n// { 'lines_version()': 'v0.0.1' }\n```\n\nOr with [Datasette](https://datasette.io/) (using the \"no filesystem\" version to limit security vulnerabilities):\n\n```\ndatasette data.db --load-extension ./lines_nofs0\n```\n\nWindows is not supported - [yet](https://github.com/asg017/sqlite-lines/issues/4)!\n\n### From the browser with WASM/JavaScript\n\n`sqlite-lines` is also distributed as a standalone [SQL.js](https://github.com/sql-js/sql.js) library. It's essentially a fork of the original SQL.js library, with the addition of `sqlite-lines` functions like `lines_version()` and `lines()`.\n\nCheck out [this Observable notebook](https://observablehq.com/@asg017/introducing-sqlite-lines) for the full demonstration. The [Releases page](https://github.com/asg017/sqlite-lines/releases) contains the JavaScript and WASM files.\n\n### The sqlite-lines CLI\n\n`sqlite-lines` comes with an example CLI modeled after [ndjson-cli](https://github.com/mbostock/ndjson-cli) that demos the speed and versatility of `sqlite-lines`. 
Download a pre-compiled version from the [Releases page](https://github.com/asg017/sqlite-lines/releases), or build it yourself with:\n\n```\nmake cli\n./dist/sqlite-lines\n```\n\nThe `sqlite-lines` CLI reads data from stdin and applies transformations with SQL code through its arguments.\n\nThe first argument should be a SQL expression that is used to transform a single line from stdin. The available columns are `rowid`, which is the \"line number\" that is being processed, and `d`, an alias for `line`, which is the text content of the current line (inspired by ndjson-cli). For example, to prefix every line from a file with its line number and uppercase it with [`upper()`](https://www.sqlite.org/lang_corefunc.html#upper):\n\n```bash\n$ cat names.txt | sqlite-lines 'rowid || upper(d)'\n1ALEX\n2BRIAN\n3CRAIG\n```\n\nThis includes SQLite's new JSON `-\u003e` and `-\u003e\u003e` operators for NDJSON/JSONL files:\n\n```bash\n$ cat data.ndjson | sqlite-lines 'd -\u003e\u003e \"$.id\"'\n$ cat data.ndjson | sqlite-lines 'json_object(\"name\", d -\u003e\u003e \"$.name\", \"age\", d -\u003e\u003e \"$.stats.age\")'\n```\n\nThe second argument is another SQL expression that's used in the WHERE clause of the underlying SQL query to filter lines.\n\n```bash\n# get the names of all people older than 40\ncat data.ndjson | sqlite-lines 'd -\u003e\u003e \"$.name\"' 'd -\u003e\u003e \"$.age\" \u003e 40'\n```\n\nThe third argument is another SQL expression that's used in the GROUP BY clause of the underlying SQL query to aggregate lines.\n\n### A Note on CSV Parsing\n\n`sqlite-lines` isn't a great option for CSVs. 
Technically you can read CSV files with it, but the moment your data has a `\\n` character in a field or header, you'll get corrupted results.\n\nInstead, you should use the \"official\" [CSV Virtual Table](https://www.sqlite.org/csv.html), or use the [`.import`](https://www.sqlite.org/cli.html#csv) command in the SQLite CLI.\n","funding_links":[],"categories":["Extensions","C"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasg017%2Fsqlite-lines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasg017%2Fsqlite-lines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasg017%2Fsqlite-lines/lists"}