{"id":15440671,"url":"https://github.com/vearutop/flatjsonl","last_synced_at":"2025-10-17T09:38:12.097Z","repository":{"id":45502487,"uuid":"508361095","full_name":"vearutop/flatjsonl","owner":"vearutop","description":"A tool to flatten JSONL into CSV or SQL","archived":false,"fork":false,"pushed_at":"2025-09-12T22:16:55.000Z","size":817,"stargazers_count":10,"open_issues_count":11,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-09-13T00:22:07.980Z","etag":null,"topics":["csv","hacktoberfest","json","jsonl","sqlite3"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vearutop.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-06-28T15:45:01.000Z","updated_at":"2025-09-12T22:16:37.000Z","dependencies_parsed_at":"2024-01-15T16:16:44.392Z","dependency_job_id":"ac17f271-6be0-484c-a92d-120feb9d4a52","html_url":"https://github.com/vearutop/flatjsonl","commit_stats":null,"previous_names":[],"tags_count":74,"template":false,"template_full_name":"bool64/go-template","purl":"pkg:github/vearutop/flatjsonl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vearutop%2Fflatjsonl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vearutop%2Fflatjsonl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vearutop%2Fflatjsonl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/vearutop%2Fflatjsonl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vearutop","download_url":"https://codeload.github.com/vearutop/flatjsonl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vearutop%2Fflatjsonl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279318405,"owners_count":26147235,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-17T02:00:07.504Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","hacktoberfest","json","jsonl","sqlite3"],"created_at":"2024-10-01T19:14:54.310Z","updated_at":"2025-10-17T09:38:12.092Z","avatar_url":"https://github.com/vearutop.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# flatjsonl\n\n[![Build Status](https://github.com/vearutop/flatjsonl/workflows/test-unit/badge.svg)](https://github.com/vearutop/flatjsonl/actions?query=branch%3Amaster+workflow%3Atest-unit)\n[![Coverage Status](https://codecov.io/gh/vearutop/flatjsonl/branch/master/graph/badge.svg)](https://codecov.io/gh/vearutop/flatjsonl)\n[![GoDevDoc](https://img.shields.io/badge/dev-doc-00ADD8?logo=go)](https://pkg.go.dev/github.com/vearutop/flatjsonl)\n[![Time 
Tracker](https://wakatime.com/badge/github/vearutop/flatjsonl.svg)](https://wakatime.com/badge/github/vearutop/flatjsonl)\n![Code lines](https://sloc.xyz/github/vearutop/flatjsonl/?category=code)\n![Comments](https://sloc.xyz/github/vearutop/flatjsonl/?category=comments)\n\n`flatjsonl` renders structured logs as a table.\n\n## Why?\n\nLogs structured as [`JSON Lines`](https://jsonlines.org/) (sometimes prefixed with a non-JSON message) are a very \ncommon source of information for ad-hoc analytics and investigations.\n\nThey can be processed with `jq` and grepped for a variety of data checks; however, there are much more powerful and \nconvenient tools that operate on rows and columns rather than on hierarchical structures.\n\nThis tool converts structured logs into tabular data (`CSV`, `SQLite`, `PostgreSQL dump`) with flexible mapping options.\n\n## Performance\n\nLogs of busy systems tend to be large, so performance matters if you want the job done in a reasonable time.\n\nThanks to [`github.com/valyala/fastjson`](https://github.com/valyala/fastjson),\n[`github.com/puzpuzpuz/xsync`](https://github.com/puzpuzpuz/xsync) and a concurrency-friendly design, \n`flatjsonl` can make good use of multicore machines and crunch data at high speed. In the run below, the 51G `events.log` is scanned for keys at about 900 MB/s and flattened at about 178 MB/s, with user CPU time roughly nine times the wall-clock time.\n\n```\nvearutop@bigassbox ~ $ time ~/flatjsonl -pg-dump ~/events.pg.sql.gz -input ~/events.log -sql-table events -progress-interval 1m\n```\n```\nscanning keys...\nscanning keys: 100.0% bytes read, 11396506 lines processed, 200806.2 l/s, 902.3 MB/s, elapsed 56.75s, remaining 0s, heap 44 MB\nlines: 11396506 , keys: 310\nflattening data...\nflattening data: 20.7% bytes read, 2363192 lines processed, 39385.9 l/s, 177.0 MB/s, elapsed 1m0s, remaining 3m49s, heap 569 MB\nflattening data: 41.7% bytes read, 4750006 lines processed, 39583.1 l/s, 177.9 MB/s, elapsed 2m0s, remaining 2m47s, heap 485 MB\nflattening data: 62.7% bytes read, 7140289 lines processed, 39668.1 l/s, 178.3 MB/s, elapsed 3m0s, remaining 1m47s, heap 610 
MB\nflattening data: 83.6% bytes read, 9528709 lines processed, 39702.9 l/s, 178.4 MB/s, elapsed 4m0s, remaining 47s, heap 572 MB\nflattening data: 100.0% bytes read, 11396506 lines processed, 39692.4 l/s, 178.4 MB/s, elapsed 4m47.12s, remaining 0s, heap 508 MB\nlines: 11396506 , keys: 310\n\nreal    5m44.002s\nuser    53m24.841s\nsys     1m1.772s\n```\n```\n51G  events.log\n3.6G events.pg.sql.gz\n```\n\n## How does it work?\n\nIn the simplest case, this tool reads the log file twice: the first pass collects all available keys, and the \nsecond pass fills the table using the already known keys (columns).\n\nDuring each pass, each line is decoded and traversed recursively.\nKeys for nested elements use dot-separated syntax (similar to `jq`), and array indexes are enclosed in `[x]`, \ne.g. `.deeper.subProperty.[0].foo`.\n\nWith the `-extract-strings` flag, string values are checked for JSON content and are also traversed if JSON is found.\n\nIf `includeKeys` is not empty in the [configuration file](#configuration-file), the first pass is skipped.\n\n## Install\n\n### macOS Brew\n\n```\nbrew tap vearutop/tools \u0026\u0026 brew update \u0026\u0026 brew install flatjsonl\n```\n\n### Go Install\n\n```\ngo install github.com/vearutop/flatjsonl@latest\n$(go env GOPATH)/bin/flatjsonl --help\n```\n\nOr download a binary from [releases](https://github.com/vearutop/flatjsonl/releases).\n\n### Linux AMD64\n\n```\nwget https://github.com/vearutop/flatjsonl/releases/latest/download/linux_amd64.tar.gz \u0026\u0026 tar xf linux_amd64.tar.gz \u0026\u0026 rm linux_amd64.tar.gz\n./flatjsonl -version\n```\n\n### macOS Intel\n\n```\nwget https://github.com/vearutop/flatjsonl/releases/latest/download/darwin_amd64.tar.gz \u0026\u0026 tar xf darwin_amd64.tar.gz \u0026\u0026 rm darwin_amd64.tar.gz\ncodesign -s - ./flatjsonl\n./flatjsonl -version\n```\n\n### macOS Apple Silicon (M1, etc.)\n\n```\nwget https://github.com/vearutop/flatjsonl/releases/latest/download/darwin_arm64.tar.gz 
\u0026\u0026 tar xf darwin_arm64.tar.gz \u0026\u0026 rm darwin_arm64.tar.gz\ncodesign -s - ./flatjsonl\n./flatjsonl -version\n```\n\n## Usage\n\n```\nflatjsonl -help\n```\n```\nUsage of flatjsonl:\n  -add-sequence\n        Add auto incremented sequence number.\n  -buf-size int\n        Buffer size (max length of file line) in bytes. (default 10000000)\n  -case-sensitive-keys\n        Use case-sensitive keys (can fail for SQLite).\n  -children-limit value\n        Max number of unique child keys, keep JSON is enabled for high cardinality parent, 0 for unlimited, comma-separated for \u003cobject\u003e,\u003carray\u003e, default 100,10.\n  -concurrency int\n        Number of concurrent routines in reader. (default 8)\n  -config string\n        Configuration JSON value, path to JSON5 or YAML file.\n  -csv string\n        Output to CSV file (gzip encoded if ends with .gz).\n  -dbg-cpu-prof string\n        Write CPU profile to file.\n  -dbg-loop-input-size int\n        (benchmark) Repeat input until total target size reached, bytes.\n  -dbg-mem-prof string\n        Write mem profile to file.\n  -extract-strings\n        Check string values for JSON content and extract when available.\n  -field-limit int\n        Max length of field value, exceeding tail is truncated, 0 for unlimited.\n  -get-key string\n        Add a single key to list of included keys.\n  -input string\n        Input from JSONL files, comma-separated.\n  -key-limit int\n        Max length of key, exceeding tail is truncated, 0 for unlimited.\n  -match-line-prefix string\n        Regular expression to capture parts of line prefix (preceding JSON).\n  -max-lines int\n        Max number of lines to process.\n  -max-lines-keys int\n        Max number of lines to process when scanning keys.\n  -mem-limit int\n        Heap in use soft limit, in MB. 
(default 1000)\n  -offset-lines int\n        Skip a number of first lines.\n  -output string\n        Output to a file (default \u003cinput\u003e.csv).\n  -pg-dump string\n        Output to PostgreSQL dump file.\n  -progress-interval duration\n        Progress update interval. (default 5s)\n  -raw string\n        Output to RAW file (column values are written as is without escaping, gzip encoded if ends with .gz).\n  -raw-delim string\n        RAW file column delimiter.\n  -replace-keys\n        Use unique tail segment converted to snake_case as key.\n  -show-json-schema\n        Show hierarchy as JSON schema.\n  -show-keys-flat\n        Show all available keys as flat list.\n  -show-keys-hier\n        Show all available keys as hierarchy.\n  -show-keys-info\n        Show keys, their replaces and types.\n  -skip-zero-cols\n        Skip columns with zero values.\n  -sql-max-cols int\n        Maximum columns in single SQL table (SQLite will fail with more than 2000). (default 2000)\n  -sql-table string\n        Table name. (default \"flatjsonl\")\n  -sqlite string\n        Output to SQLite file.\n  -sqlite3-cli\n        Use SQLite3 CLI to import via CSV.\n  -verbosity int\n        Show progress in STDERR, 0 disables status, 2 adds more metrics. 
(default 1)\n  -version\n        Show version and exit.\n```\n\n### Configuration file\n\n```yaml\nincludeKeys:\n  - \".key1\"\n  - \".key2\"\n  - \"const:my-value\"\n  - \".keyGroup.[0].key3\"\nincludeKeysRegex:\n  - \".keyGroup.[1].*\"\nexcludeKeys:\n  - \".keyGroup.[1].notNeeded\"\nexcludeKeysRegex:\n  - \".keyGroup.*.notNeeded\"\nreplaceKeys:\n  \".key1\": key1\n  \".key2\": created_at\nparseTime:\n  \"._prefix.[1]\": 2006/01/02 15:04:05.99999\noutputTimeFormat: '2006-01-02 15:04:05'\noutputTZ: UTC\nconcatDelimiter: \"::\"\nextractValuesRegex:\n  \".foo.link\": \"URL\"\n  \".*.nested\": \"JSON\"\n  \".req.ip_address\": \"GEOIP\"\n# Use keepJSON to list keys with arrays and objects of high-cardinality data;\n# their values remain as JSON literals instead of being flattened to columns.\nkeepJSON:\n  - \".deviceMapping\"\n# Use keepJSONRegex to list key patterns with arrays and objects of high-cardinality data.\nkeepJSONRegex:\n  - \".data.*.values\"\n```\n\n`parseTime` maps original keys to time patterns (Go reference layouts). 
See https://pkg.go.dev/time#pkg-constants for the layout rules.\n\n`outputTimeFormat` is the layout used to write parsed timestamps.\n\nThe `includeKeys` list can also declare columns with constant values in the form `\"const:\u003cvalue\u003e\"`, where `\u003cvalue\u003e` is\nused as the column value.\n\nThe configuration file can also define [regexp replaces](https://pkg.go.dev/regexp#Regexp.ReplaceAllString) in `replaceKeysRegex`, a map of \nregular expressions (keys) to replacement patterns (values).\n\nA simplified syntax with `*` is also supported: `*` matches a single key segment between two dots (the segment cannot start with a digit), and each `*` becomes a capture group.\n\n```json\n{\n  \"replaceKeysRegex\": {\n    \"^\\\\.foo\\\\.([^.]+)$\": \"f00_${1} VARCHAR(255)\",\n    \".foo.*.*\": \"f00_${2}_${1} VARCHAR(255)\"\n  }\n}\n```\nThis example produces the following transformation:\n```\n.foo.bar =\u003e f00_bar VARCHAR(255)\n.foo.baz.qux =\u003e f00_qux_baz VARCHAR(255)\n```\n\nRegular expression replaces are applied only to keys that have no match in `replaceKeys`.\n\nRegular expressions are checked in no particular order; as soon as a replacement differs from the original key, checking \nstops and the replaced key is used.\n\nIf multiple regular expressions can match and replace the same key, the result is undefined. To avoid this, use \nmutually exclusive expressions and anchor them to the full key with `^` and `$`.\n\nIf multiple keys are replaced with the same key, a coalesce function determines the resulting column value, or, if \n`concatDelimiter` is defined, those values are concatenated.\n\n### Transposing data\n\nFor dynamic arrays or objects, you may want to transpose the values into rows of separate tables instead of\ncolumns of the main table.\n\nThis is possible with the `transpose` configuration file field ([example](./flatjsonl/testdata/transpose_cfg.json)), \nwhich accepts a map of key prefixes to transposed table names. 
During processing, values found under the prefixed keys are\nmoved into multiple rows of the transposed table.\n\n### Extracting data from strings\n\nWith the `extractValuesRegex` config parameter, you can set a map of regular expressions matching key names to value formats. \nSupported formats include `URL` and `JSON` (the configuration example above also uses `GEOIP`). String values in the matching keys are decoded \nand exposed as JSON.\n\n## Examples\n\nImport data from `events.jsonl`, using the columns described in the `events.json` config file, into the \nSQLite table `events` in file `report.sqlite`.\n\n```\nflatjsonl -sqlite report.sqlite -sql-table events -config events.json events.jsonl\n```\n\nShow a flat list of keys found in the first 100 (or fewer) lines of `events.jsonl`.\n```\nflatjsonl -max-lines 100 -show-keys-flat events.jsonl\n```\n\nImport data from `part1.log`, `part2.log` and `part3.log` into `part1.log.csv`, with keys converted to unique snake_case \ntails and with columns captured from the line prefix (for lines formatted as `\u003cprefix\u003e {\u003cjson\u003e}`).\n```\nflatjsonl -match-line-prefix '([\\w\\d-]+) [\\w\\d]+ ([\\d/]+\\s[\\d:\\.]+)' -replace-keys part1.log part2.log part3.log\n```\n\nExtract a single column from a JSONL log (roughly equivalent to `cat huge.log | jq .foo.bar.baz \u003e entries.log`). Because `flatjsonl` is optimized for multi-core processors, it can be faster than single-threaded `jq`.\n```\nflatjsonl -input huge.log -raw entries.log -get-key \".foo.bar.baz\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvearutop%2Fflatjsonl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvearutop%2Fflatjsonl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvearutop%2Fflatjsonl/lists"}