{"id":42356382,"url":"https://github.com/hellige/au","last_synced_at":"2026-01-27T16:35:13.361Z","repository":{"id":81079781,"uuid":"123471189","full_name":"hellige/au","owner":"hellige","description":"Binary JSON encoder/decoder, library and tool, mostly for log files","archived":false,"fork":false,"pushed_at":"2025-02-11T22:24:05.000Z","size":2380,"stargazers_count":24,"open_issues_count":1,"forks_count":5,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-02-11T23:29:57.724Z","etag":null,"topics":["json","logging","tools"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hellige.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-01T17:47:18.000Z","updated_at":"2025-02-11T22:24:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"547b9880-8786-4dfd-9bcb-98e9d2acbf3a","html_url":"https://github.com/hellige/au","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/hellige/au","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellige%2Fau","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellige%2Fau/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellige%2Fau/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellige%2Fau/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hellige","download_url":"https://codeload.github.com/hellige/au/tar.gz/r
efs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hellige%2Fau/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28816563,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T12:25:15.069Z","status":"ssl_error","status_checked_at":"2026-01-27T12:25:05.297Z","response_time":168,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["json","logging","tools"],"created_at":"2026-01-27T16:35:12.215Z","updated_at":"2026-01-27T16:35:13.355Z","avatar_url":"https://github.com/hellige.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build status](https://github.com/hellige/au/actions/workflows/ci.yml/badge.svg)](https://github.com/hellige/au/actions?workflow=CI)\n[![Code coverage](https://codecov.io/gh/hellige/au/branch/master/graph/badge.svg)](https://codecov.io/gh/hellige/au)\n[![GitHub release](https://img.shields.io/github/v/release/hellige/au?include_prereleases\u0026sort=semver)](https://github.com/hellige/au/releases)\n[![GitHub license](https://img.shields.io/github/license/hellige/au)](https://github.com/hellige/au/blob/master/LICENSE)\n\n`au` is a file format, header-only C++ library and command-line tool for\nworking with sequential record-oriented data, primarily log files. 
The tool\nsupports grepping in `.au` files, and for convenience supports grepping normal\nJSON files and plain timestamped ASCII log files as well. The combination of\nbinary search within large files and the ability to index and offer random\naccess to gzipped files makes this a very useful tool, even for plain ASCII\nlogs!\n\n\n## Motivation and usage\n\nOK, so you're doing some logging. The records have some structure, but it's\nragged and irregular: maybe every line has a timestamp and a couple of\nother fields, but beyond that, different kinds of events have different fields.\nSo you decide to use JSON. But now your files start getting really big: the key\nnames are repeated in every record, huge ASCII timestamps are everywhere,\nand it all feels pretty wasteful. Your program is also\nspending lots of time formatting JSON, which feels wasteful as well.\n\nBut there are a few things that you don't want to lose:\n - structured but schemaless: the records are nicely structured but also\n   self-describing. You don't need to update a schema in order to add a new\n   record type or otherwise rearrange the content of your logs.\n - greppable, tailable: you can use Unix tools to inspect these files in the ways\n   normal for log files, in particular tailing the end of the file\n   without having to start at the very beginning.\n\n`au` is the tool for you!\n\nReplace your JSON-writing code with calls to `au`. 
We've observed roughly a 3:1\nreduction in file size and a considerable improvement in logging performance.\nUsing the command-line tool, you can still grep/tail files while they're being\nwritten:\n\n    # decode/follow records as they're appended to the end of the file:\n    $ au tail -f mylog.au\n\n    # find records matching a string pattern, include 5 records of context\n    # before and after:\n    $ au grep -C 5 2018-07-16T08:01:23.102 mylog.au\n\nThis is all pretty nice, but let's imagine your files are still annoyingly\nlarge, say 10 GB. Grepping is slow, and you need to find things quickly. These\nare log files, and all (or most) records have certain useful keys, such as a timestamp\nand an event ID, whose values happen to be roughly sequentially ordered in the file.\n\nYou can find a record containing a particular key/value pair:\n\n    $ au grep -k eventTime 2018-07-16T08:01:23.102 biglog.au\n\nwhich might take a long time. But if you know the values of that key are\nroughly ordered, you can tell `au` to take advantage of that fact by doing\na binary search:\n\n    $ au grep -o eventTime 2018-07-16T08:01:23.102 biglog.au\n\nThis often reduces a multi-minute grep to 100 ms! You can also request\na specific number of matches, records of context before/after your match, etc.\n(see `au grep --help` for details).\n\n`au` also provides the same ability to search within normal JSON files:\n\n    $ au grep -o eventTime 2018-07-16T08:01:23.102 biglog.json\n\n`au` will attempt to automatically detect whether the input stream is JSON\nor au-encoded.\n\n\n### Compressed files\n\nWhen your files are big enough to be annoying, you'll probably also want to\ncompress them after writing. To keep supporting binary search, `au`\ncan read and index gzipped files, after which binary search works there\ntoo:\n\n    # build an index of the file, written to biglog.au.gz.auzx:\n    $ au zindex biglog.au.gz\n\n    # note grep is now zgrep! 
this is still a binary search:\n    $ au zgrep -o eventTime 2018-07-16T08:01:23.102 biglog.au.gz\n\nAnd, once again, we can do this with normal JSON files as well:\n\n    $ au zindex biglog.json.gz\n    $ au zgrep -o eventTime 2018-07-16T08:01:23.102 biglog.json.gz\n\n\n### Patterns\n\n`au grep` takes advantage of the typed nature of JSON values, when possible, for\nsearching. The default is to treat the pattern as possibly any type and try to\nmatch it against values of any type, including string values. You can provide\nhints on the command line, for example that the pattern should be considered an\ninteger and only matched against integer values. Timestamps get special support:\nalthough they aren't a distinct type in JSON, they're\nvery common in log files. A pattern like:\n\n    2018-03-27T18:45:00.123456789\n\nwill be recognized as a timestamp. Prefixes match ranges of time, so that,\nfor instance:\n\n    $ au grep -k eventTime 2018-03-27T18:45:00.12 file.au\n\nwill match any event having an `eventTime` in the given 10 ms window, and\n`2018-03-27T18` will match events anywhere in that entire hour. Timestamps are\ndecoded as JSON strings. The encoding library has dedicated functions for\nencoding timestamps, while the JSON encoder included in the command-line tool\nrecognizes strings that happen to be representable as timestamps and encodes\nthem as such.\n\n\n## Rolling your own decoder\n\nStart with an `AuByteSource`. Use the provided `BufferByteSource` if you have an\nin-memory buffer with the `au` data. If you're starting with an on-disk file,\nuse `FileByteSourceImpl`. Or inherit from `AuByteSource` if you have more\nspecialized needs.\n\nIn `src/au/Handlers.h` you will find a `NoopRecordHandler` and a\n`NoopValueHandler` that you can inherit from, overriding only the pieces you're\ninterested in.\n\nYou'll need a value handler like the `NoopValueHandler`. 
Let's call this\n`MyValueHandler`.\n\nYou also need a different `ValueHandler` to give to the `au::AuRecordHandler`\n(let's call it `OnValueHandler`). This one just needs to implement an `onValue()`\nthat does something like:\n```cpp\nstruct OnValueHandler {\n    void onValue(au::AuByteSource \u0026src, au::Dictionary \u0026dict) {\n        MyValueHandler vHandler(dict);\n        au::ValueParser parser(src, vHandler);\n        parser.value();\n    }\n};\n```\n\nInstead of using `au::AuRecordHandler` directly, you can roll your own (perhaps\nby inheriting from `NoopRecordHandler`).\n\nPutting it all together:\n```cpp\nau::BufferByteSource auBuf(buf, len);\nau::Dictionary dict;\nOnValueHandler onValueHandler;\nau::AuRecordHandler\u003cOnValueHandler\u003e recHandler(dict, onValueHandler);\nau::RecordParser(auBuf, recHandler).parseStream();\n```\n\nThe call graph looks like:\n`au::RecordParser -\u003e au::AuRecordHandler -\u003e OnValueHandler -\u003e au::ValueParser -\u003e MyValueHandler`\n\n\n## Building from source\n\nWe use git submodules to include some dependencies. The build is via CMake. You\ncan set the usual options to control your compiler, build type, etc., but the\ncrash course is:\n\n    $ git submodule update -i\n    $ mkdir -p out/rel\n    $ cd out/rel\n    $ cmake -DCMAKE_BUILD_TYPE=Release -DSTATIC=On ../..\n    $ make\n    $ src/au --version\n\nYou can run the unit tests in your CMake build directory with:\n\n    $ make unittest\n\nAlternatively, you might use `ctest`.\n\n_Please note that tarballs downloaded from GitHub releases do not include\nsubmodules, so building from one of those won't work. Best to just clone\nthe repo and check out the relevant tag._\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhellige%2Fau","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhellige%2Fau","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhellige%2Fau/lists"}