{"id":13413970,"url":"https://github.com/jf-tech/omniparser","last_synced_at":"2025-03-14T20:30:57.713Z","repository":{"id":37071895,"uuid":"288033984","full_name":"jf-tech/omniparser","owner":"jf-tech","description":"omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.","archived":false,"fork":false,"pushed_at":"2024-07-03T21:19:50.000Z","size":7146,"stargazers_count":890,"open_issues_count":1,"forks_count":65,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-07-31T20:53:12.995Z","etag":null,"topics":["codeless","csv","delimited","edi","edifact","etl","fixed-length","fixed-width","golang","javascript","json","parser","schema","schemas","streaming","transform","txt","x12","xml"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jf-tech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["jf-tech"]}},"created_at":"2020-08-16T22:22:21.000Z","updated_at":"2024-07-31T13:38:49.000Z","dependencies_parsed_at":"2024-06-18T16:42:38.664Z","dependency_job_id":"9324cdfc-a839-4b62-85eb-5e7b30601eb7","html_url":"https://github.com/jf-tech/omniparser","commit_stats":null,"previous_names":["jf-tech/goft"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jf-tech%2Fomniparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jf-tech%2Fomniparser/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/jf-tech%2Fomniparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jf-tech%2Fomniparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jf-tech","download_url":"https://codeload.github.com/jf-tech/omniparser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243642055,"owners_count":20323953,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codeless","csv","delimited","edi","edifact","etl","fixed-length","fixed-width","golang","javascript","json","parser","schema","schemas","streaming","transform","txt","x12","xml"],"created_at":"2024-07-30T20:01:54.078Z","updated_at":"2025-03-14T20:30:56.712Z","avatar_url":"https://github.com/jf-tech.png","language":"Go","readme":"# omniparser\n![CI](https://github.com/jf-tech/omniparser/workflows/CI/badge.svg)\n[![codecov](https://codecov.io/gh/jf-tech/omniparser/branch/master/graph/badge.svg)](https://codecov.io/gh/jf-tech/omniparser)\n[![Go Report Card](https://goreportcard.com/badge/github.com/jf-tech/omniparser)](https://goreportcard.com/report/github.com/jf-tech/omniparser)\n[![PkgGoDev](https://pkg.go.dev/badge/github.com/jf-tech/omniparser)](https://pkg.go.dev/github.com/jf-tech/omniparser)\n[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)\n\nOmniparser is a native Golang ETL parser that ingests input data of various formats (**CSV, txt, fixed length/width,\nXML, EDI/X12/EDIFACT, JSON**, and custom formats) in 
streaming fashion and transforms data into desired JSON output\nbased on a schema written in JSON.\n\nMin Golang Version: 1.16\n\n## Licenses and Sponsorship\nOmniparser is publicly available under [MIT License](./LICENSE).\n[Individual and corporate sponsorships](https://github.com/sponsors/jf-tech/) are welcome and gratefully\nappreciated, and will be listed in the [SPONSORS](./sponsors/SPONSORS.md) page.\n[Company-level sponsors](https://github.com/sponsors/jf-tech/) get additional benefits and support\ngranted in the [COMPANY LICENSE](./sponsors/COMPANY_LICENSE.md).\n\n## Documentation\n\nDocs:\n- [Getting Started](./doc/gettingstarted.md): a tutorial for writing your first omniparser schema.\n- [IDR](./doc/idr.md): in-memory data representation of ingested data for omniparser.\n- [XPath Based Record Filtering and Data Extraction](./doc/xpath.md): xpath queries are essential to omniparser schema\nwriting. Learn the concepts and tricks in depth.\n- [All About Transforms](./doc/transforms.md): everything about `transform_declarations`.\n- [Use of `custom_func`, Especially `javascript`](./doc/use_of_custom_funcs.md): An in-depth look at how `custom_func`\nis used, especially the almighty `javascript` (and `javascript_with_context`).\n- [CSV Schema in Depth](./doc/csv2_in_depth.md): everything about schemas for CSV input.\n- [Fixed-Length Schema in Depth](./doc/fixedlength2_in_depth.md): everything about schemas for fixed-length (e.g. 
TXT)\ninput.\n- [JSON/XML Schema in Depth](./doc/json_xml_in_depth.md): everything about schemas for JSON or XML input.\n- [EDI Schema in Depth](./doc/edi_in_depth.md): everything about schemas for EDI input.\n- [Programmability](./doc/programmability.md): Advanced techniques for using omniparser (or some of its components) in\nyour code.\n\nReferences:\n- [Custom Functions](./doc/customfuncs.md): a complete reference of all built-in custom functions.\n\nExamples:\n- [CSV Examples](extensions/omniv21/samples/csv2)\n- [Fixed-Length Examples](extensions/omniv21/samples/fixedlength2)\n- [JSON Examples](extensions/omniv21/samples/json)\n- [XML Examples](extensions/omniv21/samples/xml)\n- [EDI Examples](extensions/omniv21/samples/edi)\n- [Custom File Format](extensions/omniv21/samples/customfileformats/jsonlog)\n- [Custom Funcs](extensions/omniv21/samples/customfuncs)\n\nIn the example folders above you will find pairs of input files and their schema files. In the\n`.snapshots` subdirectory of each folder, you'll find their corresponding output files.\n\n## Online Playground (not functioning)\n\n~~Use [The Playground](https://omniparser-prod-omniparser-qd0sj4.mo2.mogenius.io/) (may need to wait for a few seconds for instance to wake up)\nfor trying out schemas and inputs, yours or existing samples, to see how ingestion and transform work.~~\n\nAs of now (2023/03/14), all of our previous free Docker hosting solutions went away and we haven't found another one yet. 
For now, please clone the repo and use `./cli.sh` as described in the [Getting Started](./doc/gettingstarted.md) page.\n\n## Why\n- No good ETL transform/parser library exists in Golang.\n- Even looking into Java and other languages, choices aren't many and all have limitations:\n    - [Smooks](https://www.smooks.org/) is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.\n    - [BeanIO](http://beanio.org/) can't deal with EDI input.\n    - [Jolt](https://github.com/bazaarvoice/jolt) can't deal with anything other than JSON input.\n    - [JSONata](https://jsonata.org/) still only does JSON -\u003e JSON transforms.\n- Many of the parsers/transforms don't support streaming reads and load the entire input into memory, which is not acceptable in some\nsituations.\n\n## Requirements\n- Golang 1.16 or later.\n\n## Recent Major Feature Additions/Changes\n- 2024/06: v1.0.5 released: **upgraded minimum go version to 1.16**; enabled full ES6 feature support in javascript custom function.\n- 2022/09: v1.0.4 released: added `csv2` file format that supersedes the original `csv` format with support of hierarchical and nested records.\n- 2022/09: v1.0.3 released: added `fixedlength2` file format that supersedes the original `fixed-length` format with support of hierarchical and nested envelopes.\n- 1.0.0 Released!\n- Added `Transform.RawRecord()` for callers of omniparser to access the raw ingested record.\n- Deprecated `custom_parse` in favor of `custom_func` (`custom_parse` is still usable for\nbackward compatibility; it is just removed from all public docs and samples).\n- Added `NonValidatingReader` EDI segment reader.\n- Added fixed-length file format support in omniv21 handler.\n- Added EDI file format support in omniv21 handler.\n- Major restructure/refactoring\n    - Upgrade omni schema version to `omni.2.1` due to a number of incompatible schema changes:\n        - `'result_type'` -\u003e `'type'`\n        - `'ignore_error_and_return_empty_str'` -\u003e `'ignore_error'`\n     
   - `'keep_leading_trailing_space'` -\u003e `'no_trim'`\n    - Changed how we handle custom functions: previously we always use strings as in param type as well as result param\n    type. Not anymore, all types are supported for custom function in and out params.\n    - Changed the way we package custom functions for extensions: previously we collected custom functions from all\n    extensions and then passed all of them to the extension that is used; this feels weird, now only the custom\n    functions included in a particular extension are used in that extension.\n    - Deprecated/removed most of the custom functions in favor of using 'javascript'.\n    - A number of package renaming.\n- Added CSV file format support in omniv2 handler.\n- Introduced IDR node cache for allocation recycling.\n- Introduced [IDR](./doc/idr.md) for in-memory data representation.\n- Added trie based high performance `times.SmartParse`.\n- Command line interface (one-off `transform` cmd or long-running http `server` mode).\n- `javascript` engine integration as a custom_func.\n- JSON stream parser.\n- Extensibility:\n    - Ability to provide custom functions.\n    - Ability to provide custom schema handler.\n    - Ability to customize the built-in omniv2 schema handler's parsing code.\n    - Ability to provide a new file format support to built-in omniv2 schema handler.\n\n## Footnotes\n- omniparser is a collaboration effort of [jf-tech](https://github.com/jf-tech/),[Simon](https://github.com/liangxibing)\nand [Steven](http://github.com/wangjia007bond).\n","funding_links":["https://github.com/sponsors/jf-tech","https://github.com/sponsors/jf-tech/"],"categories":["Text Processing","Libraries","数据 Data","Go","Data Integration Frameworks","Bot Building","Specific Formats","文本处理"],"sub_categories":["Parsers/Encoders/Decoders","Golang","HTTP Clients","标准CLI","Standard CLI","解析 
器/Encoders/Decoders"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjf-tech%2Fomniparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjf-tech%2Fomniparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjf-tech%2Fomniparser/lists"}