{"id":18652638,"url":"https://github.com/umarbutler/orjsonl","last_synced_at":"2025-10-11T18:32:09.564Z","repository":{"id":63362792,"uuid":"567302485","full_name":"umarbutler/orjsonl","owner":"umarbutler","description":"A lightweight, high-performance Python library for parsing jsonl files.","archived":false,"fork":false,"pushed_at":"2023-11-09T02:30:53.000Z","size":29,"stargazers_count":30,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T15:53:06.757Z","etag":null,"topics":["bzip2","deserialization","gzip","json","json-lines","jsonl","jsonlines","ndjson","parser","parsing","python","serialization","xz","zstandard"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/umarbutler.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-11-17T14:02:30.000Z","updated_at":"2025-03-15T12:10:31.000Z","dependencies_parsed_at":"2023-01-30T18:46:00.101Z","dependency_job_id":"5ad150bb-7afa-455f-af1f-af95b0d92e30","html_url":"https://github.com/umarbutler/orjsonl","commit_stats":{"total_commits":33,"total_committers":2,"mean_commits":16.5,"dds":"0.030303030303030276","last_synced_commit":"6fa858c68480189082252c4a54590772b5007510"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umarbutler%2Forjsonl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umarbutler%2Forjsonl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umarbutler%2Forjsonl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/umarbutler%2Forjsonl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/umarbutler","download_url":"https://codeload.github.com/umarbutler/orjsonl/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248441225,"owners_count":21103947,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bzip2","deserialization","gzip","json","json-lines","jsonl","jsonlines","ndjson","parser","parsing","python","serialization","xz","zstandard"],"created_at":"2024-11-07T07:07:51.718Z","updated_at":"2025-10-11T18:32:09.485Z","avatar_url":"https://github.com/umarbutler.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# orjsonl\n\u003ca href=\"https://pypi.org/project/orjsonl/\" alt=\"PyPI Version\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/orjsonl\"\u003e\u003c/a\u003e \u003ca href=\"https://github.com/umarbutler/orjsonl/actions/workflows/ci.yml\" alt=\"Build Status\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/umarbutler/orjsonl/ci.yml?branch=main\"\u003e\u003c/a\u003e \u003ca href=\"https://app.codecov.io/gh/umarbutler/orjsonl\" alt=\"Code Coverage\"\u003e\u003cimg src=\"https://img.shields.io/codecov/c/github/umarbutler/orjsonl\"\u003e\u003c/a\u003e \u003ca href=\"https://pypistats.org/packages/orjsonl\" alt=\"Downloads\"\u003e\u003cimg src=\"https://img.shields.io/pypi/dm/orjsonl\"\u003e\u003c/a\u003e\n\n`orjsonl` is a lightweight, high-performance Python library for parsing jsonl files. It supports a wide variety of compression formats, including gzip, bzip2, xz and Zstandard. It is powered by [`orjson`](https://github.com/ijl/orjson), the fastest and most accurate json serializer for Python.\n\n## Installation\n`orjsonl` may be installed with `pip`:\n```bash\npip install orjsonl\n```\n\nTo read or write Zstandard files, install either [`zstd`](https://github.com/facebook/zstd) or the [`zstandard`](https://pypi.org/project/zstandard/) Python package.\n\n## Usage\nThe code snippet below demonstrates how jsonl files can be saved, loaded, streamed, appended and extended with `orjsonl`:\n```python\n\u003e\u003e\u003e import orjsonl\n\u003e\u003e\u003e # Create an iterable of Python objects.\n\u003e\u003e\u003e data = [\n    'hello world',\n    ['fizz', 'buzz'],\n]\n\u003e\u003e\u003e # Save the iterable to a jsonl file.\n\u003e\u003e\u003e orjsonl.save('test.jsonl', data)\n\u003e\u003e\u003e # Append a Python object to the jsonl file.\n\u003e\u003e\u003e orjsonl.append('test.jsonl', {42 : 3.14})\n\u003e\u003e\u003e # Extend the jsonl file with an iterable of Python objects.\n\u003e\u003e\u003e orjsonl.extend('test.jsonl', [True, False])\n\u003e\u003e\u003e # Load the jsonl file.\n\u003e\u003e\u003e orjsonl.load('test.jsonl')\n['hello world', ['fizz', 'buzz'], {42 : 3.14}, True, False]\n\u003e\u003e\u003e # Stream the jsonl file.\n\u003e\u003e\u003e list(orjsonl.stream('test.jsonl'))\n['hello world', ['fizz', 'buzz'], {42 : 3.14}, True, False]\n```\n\n`orjsonl` can also be used to process jsonl files compressed with gzip, bzip2, xz and Zstandard:\n```python\n\u003e\u003e\u003e orjsonl.save('test.jsonl.gz', data)\n\u003e\u003e\u003e orjsonl.append('test.jsonl.gz', {42 : 3.14})\n\u003e\u003e\u003e orjsonl.extend('test.jsonl.gz', [True, False])\n\u003e\u003e\u003e orjsonl.load('test.jsonl.gz')\n['hello world', ['fizz', 'buzz'], {42 : 3.14}, True, False]\n\u003e\u003e\u003e [obj for obj in orjsonl.stream('test.jsonl.gz')]\n['hello world', ['fizz', 'buzz'], {42 : 3.14}, True, False]\n```\n\n### Load\n```python\ndef load(\n    path: str | bytes | int | os.PathLike,\n    decompression_threads: Optional[int] = None,\n    compression_format: Optional[str] = None\n) -\u003e list[dict | list | int | float | str | bool | None]\n```\n\n`load()` deserializes a compressed or uncompressed UTF-8-encoded jsonl file to a list of Python objects.\n\n`path` is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized.\n\n`decompression_threads` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`threads`](https://github.com/pycompression/xopen/#xopen) argument that specifies the number of threads that should be used for decompression.\n\n`compression_format` is an optional string passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`format`](https://github.com/pycompression/xopen/#v130-2022-01-10) argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.\n\nThis function returns a `list` object comprised of deserialized `dict`, `list`, `int`, `float`, `str`, `bool` or `None` objects.\n\n### Stream\n```python\ndef stream(\n    path: str | bytes | int | os.PathLike,\n    decompression_threads: Optional[int] = None,\n    compression_format: Optional[str] = None\n) -\u003e Generator[dict | list | int | float | str | bool | None, None, None]\n```\n\n`stream()` creates a `generator` that deserializes a compressed or uncompressed UTF-8-encoded jsonl file to Python objects.\n\n`path` is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be deserialized by the `generator`.\n\n`decompression_threads` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`threads`](https://github.com/pycompression/xopen/#xopen) argument that specifies the number of threads that should be used for decompression.\n\n`compression_format` is an optional string passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`format`](https://github.com/pycompression/xopen/#v130-2022-01-10) argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.\n\nThis function returns a `generator` that deserializes the file to `dict`, `list`, `int`, `float`, `str`, `bool` or `None` objects.\n\n### Save\n```python\ndef save(\n    path: str | bytes | int | os.PathLike,\n    data: Iterable,\n    default: Optional[Callable] = None,\n    option: int = 0,\n    compression_level: Optional[int] = None,\n    compression_threads: Optional[int] = None,\n    compression_format: Optional[str] = None\n) -\u003e None\n```\n\n`save()` serializes an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.\n\n`path` is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be saved.\n\n`data` is an iterable of Python objects to be serialized to the file.\n\n`default` is an optional callable passed to [`orjson.dumps()`](https://github.com/ijl/orjson#serialize) as the [`default`](https://github.com/ijl/orjson#default) argument that serializes subclasses or arbitrary types to supported types.\n\n`option` is an optional integer passed to [`orjson.dumps()`](https://github.com/ijl/orjson#serialize) as the [`option`](https://github.com/ijl/orjson#option) argument that modifies how data is serialized.\n\n`compression_level` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`compresslevel`](https://github.com/pycompression/xopen/#usage) argument that determines the compression level for writing to gzip, xz and Zstandard files.\n\n`compression_threads` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`threads`](https://github.com/pycompression/xopen/#xopen) argument that specifies the number of threads that should be used for compression.\n\n`compression_format` is an optional string passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`format`](https://github.com/pycompression/xopen/#v130-2022-01-10) argument that overrides the autodetection of the file’s compression format based on its extension. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.\n\n### Append\n```python\ndef append(\n    path: str | bytes | int | os.PathLike,\n    data: Any,\n    newline: bool = True,\n    default: Optional[Callable] = None,\n    option: int = 0,\n    compression_level: Optional[int] = None,\n    compression_threads: Optional[int] = None,\n    compression_format: Optional[str] = None\n) -\u003e None\n```\n\n`append()` serializes and appends a Python object to a compressed or uncompressed UTF-8-encoded jsonl file.\n\n`path` is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be appended.\n\n`data` is a Python object to be serialized and appended to the file.\n\n`newline` is an optional Boolean flag that, if set to `False`, indicates that the file does not end with a newline and should, therefore, have one added before data is appended.\n\n`default` is an optional callable passed to [`orjson.dumps()`](https://github.com/ijl/orjson#serialize) as the [`default`](https://github.com/ijl/orjson#default) argument that serializes subclasses or arbitrary types to supported types.\n\n`option` is an optional integer passed to [`orjson.dumps()`](https://github.com/ijl/orjson#serialize) as the [`option`](https://github.com/ijl/orjson#option) argument that modifies how data is serialized.\n\n`compression_level` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`compresslevel`](https://github.com/pycompression/xopen/#usage) argument that determines the compression level for writing to gzip, xz and Zstandard files.\n\n`compression_threads` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`threads`](https://github.com/pycompression/xopen/#xopen) argument that specifies the number of threads that should be used for compression.\n\n`compression_format` is an optional string passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`format`](https://github.com/pycompression/xopen/#v130-2022-01-10) argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.\n\n### Extend\n```python\ndef extend(\n    path: str | bytes | int | os.PathLike,\n    data: Iterable,\n    newline: bool = True,\n    default: Optional[Callable] = None,\n    option: int = 0,\n    compression_level: Optional[int] = None,\n    compression_threads: Optional[int] = None,\n    compression_format: Optional[str] = None\n) -\u003e None\n```\n\n`extend()` serializes and appends an iterable of Python objects to a compressed or uncompressed UTF-8-encoded jsonl file.\n\n`path` is a path-like object giving the pathname (absolute or relative to the current working directory) of the compressed or uncompressed UTF-8-encoded jsonl file to be extended.\n\n`data` is an iterable of Python objects to be serialized and appended to the file.\n\n`newline` is an optional Boolean flag that, if set to `False`, indicates that the file does not end with a newline and should, therefore, have one added before data is extended.\n\n`default` is an optional callable passed to [`orjson.dumps()`](https://github.com/ijl/orjson#serialize) as the [`default`](https://github.com/ijl/orjson#default) argument that serializes subclasses or arbitrary types to supported types.\n\n`option` is an optional integer passed to [`orjson.dumps()`](https://github.com/ijl/orjson#serialize) as the [`option`](https://github.com/ijl/orjson#option) argument that modifies how data is serialized.\n\n`compression_level` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`compresslevel`](https://github.com/pycompression/xopen/#usage) argument that determines the compression level for writing to gzip, xz and Zstandard files.\n\n`compression_threads` is an optional integer passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`threads`](https://github.com/pycompression/xopen/#xopen) argument that specifies the number of threads that should be used for compression.\n\n`compression_format` is an optional string passed to [`xopen.xopen()`](https://github.com/pycompression/xopen/#xopen) as the [`format`](https://github.com/pycompression/xopen/#v130-2022-01-10) argument that overrides the autodetection of the file’s compression format based on its extension or content. Possible values are ‘gz’, ‘xz’, ‘bz2’ and ‘zst’.\n\n## License\nThis library is licensed under the [MIT License](https://github.com/umarbutler/orjsonl/blob/main/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumarbutler%2Forjsonl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fumarbutler%2Forjsonl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fumarbutler%2Forjsonl/lists"}