{"id":18897742,"url":"https://github.com/uk-ipop/mmi-parser-rs","last_synced_at":"2026-03-01T04:30:21.913Z","repository":{"id":42057127,"uuid":"477494541","full_name":"UK-IPOP/mmi-parser-rs","owner":"UK-IPOP","description":"MMI Parser written with love in Rust.","archived":false,"fork":false,"pushed_at":"2023-06-10T07:29:33.000Z","size":143,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-04-24T16:10:53.192Z","etag":null,"topics":["command-line","metamap","parser","rust"],"latest_commit_sha":null,"homepage":"https://docs.rs/crate/mmi-parser/latest","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UK-IPOP.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-03T23:46:57.000Z","updated_at":"2022-04-29T15:49:27.000Z","dependencies_parsed_at":"2024-11-08T08:41:17.003Z","dependency_job_id":"eac18c9c-26f0-48f9-9ae9-7da0dc8de4a7","html_url":"https://github.com/UK-IPOP/mmi-parser-rs","commit_stats":{"total_commits":77,"total_committers":1,"mean_commits":77.0,"dds":0.0,"last_synced_commit":"f973beb7b8d8612095660ec590edc2f2b5cd8ceb"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fmmi-parser-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fmmi-parser-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fmmi-parser-rs/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UK-IPOP%2Fmmi-parser-rs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UK-IPOP","download_url":"https://codeload.github.com/UK-IPOP/mmi-parser-rs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239879148,"owners_count":19712174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line","metamap","parser","rust"],"created_at":"2024-11-08T08:39:27.536Z","updated_at":"2026-03-01T04:30:21.867Z","avatar_url":"https://github.com/UK-IPOP.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `mmi-parser`\n\n![MMI parser](https://user-images.githubusercontent.com/45318637/167729532-51835195-0405-4757-ad5c-20841709e1b1.svg)\n\n`mmi-parser` is a rust command line tool (crate) for parsing out Fielded MetaMap Indexing (MMI) output from the National Library of Medicine's (NLM) [MetaMap tool](https://lhncbc.nlm.nih.gov/ii/tools/MetaMap.html) into jsonlines data.\n\nThe primary reference for the Fielded MMI output can be found [here](https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/MMI_Output_2016.pdf).\n\n\u003e ! 
Requires MetaMap 2016 _(or newer)_ due to changes in MMI formatting !\n\n\n- [`mmi-parser`](#mmi-parser)\n  - [Description](#description)\n  - [Requires](#requires)\n  - [Installation](#installation)\n  - [Usage](#usage)\n    - [Brief MetaMap Intro](#brief-metamap-intro)\n    - [mmi-parser (CLI)](#mmi-parser-cli)\n      - [Output Types](#output-types)\n    - [mmi-parser (API)](#mmi-parser-api)\n  - [Example Workflow](#example-workflow)\n  - [Support](#support)\n  - [Contributing](#contributing)\n  - [MIT License](#mit-license)\n\n## Description\n\nDue to the (relatively) technical nature of running the MetaMap program (running it locally requires command line familiarity), it is assumed users will also be able to install and work with other command line tools (e.g. cargo).\n\nThis project uses [Rust](https://www.rust-lang.org) to parse the Fielded MMI output into\n[jsonlines](https://jsonlines.org) annotated data. While the two formats are not entirely different in structure, MMI was chosen as the input and jsonlines was chosen as the output for a few reasons.\n\nMMI is by far the most dense/compressed **human-readable** version of MetaMap output, so it makes logical sense to use as input to the parser.\n\nMMI output is often put into separate `.txt` files for each record being run through MetaMap. MMI output also contains one line per concept found. Jsonlines allows us to keep this 1:1 (concept:line) ratio. Each input `.txt` file will have _exactly one_ jsonlines output file with `_parsed` suffixed to the file name to clarify it is parsed output. Thus each jsonline can be traced to a line in the source (MMI output) text file, which helps with tracing results. Jsonlines, compared to traditional json, also allows buffered, line-by-line reading, which can be a benefit when scanning large files. 
Finally, while MMI already has fields _associated_ with various parts of the text, jsonlines makes these _implicit_ associations **explicit** in field names.\n\nOne drawback of outputting jsonlines is that the resulting data structure is quite nested (although that is unavoidable due to the _highly_ nested nature of MMI which is used to keep the output dense). While this isn't a problem for data modeling, it may introduce some complications, for example, when trying to analyze the data in a tabular format like dataframes.\n\nFor example:\n\n- `data/sample.txt` --\u003e `data/sample_parsed.jsonl`\n  Where the first line in `data/sample_parsed.jsonl` will represent the first (or last depending on MetaMap) construct found in the source text document but will **always** match the first line in `data/sample.txt`.\n\n\u003e It is worth noting that some MetaMap pipelines produce `.txt` files with a header line indicating when the file was written. Please remove these lines BEFORE running this tool.\n\nIf you need an alternative output, perhaps for a non-technical researcher, I recommend looking at [jq](https://stedolan.github.io/jq/).\n\n## Requires\n\n- [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) package manager (rust toolchain)\n- [just](https://github.com/casey/just) (optional dev-dependency if you clone this repo)\n\n## Installation\n\nCargo is available as a part of the Rust toolchain and is readily available via curl + sh combo (see [here](https://doc.rust-lang.org/cargo/getting-started/installation.html)).\n\nTo install the mmi-parser application, utilize cargo:\n\n```bash\ncargo install mmi-parser\n```\n\nIf you also need MetaMap installed you can find instructions on how to do so [here](https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/documentation/Installation.html).\n\nThere is also an API available on crates.io [here](https://crates.io/crates/mmi-parser). 
The scope of this API is limited to reduce maintenance burden as the primary goal of this project was an executable parser.\n\nThe API can be installed to your local Rust project by simply adding the crate to your `Cargo.toml`:\n\n```toml\nmmi-parser = \"1.1.0\"\n```\n\n## Usage\n\n### Brief MetaMap Intro\n\nMetaMap usage is documented extensively on the NLM's website or more directly in [this](https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/MM_2016_Usage.pdf) document.\n\nFor our use case, we are going to assume you use a command similar in functionality to:\n\n```bash\necho lung cancer | metamap -N \u003e metamap_results.txt\n```\n\nThe `-N` flag here is important as it signals to MetaMap to use the MMI output.\n\nThis should result in an output similar to the following:\n\n```bash\n/home/nanthony/public_mm/bin/SKRrun.20 /home/nanthony/public_mm/bin/metamap20.BINARY.Linux --lexicon db -Z 2020AA -N\nUSER|MMI|5.18|Carcinoma of lung|C0684249|[neop]|[\"LUNG CANCER\"-tx-1-\"lung cancer\"-noun-0]|TX|0/11|\nUSER|MMI|5.18|Malignant neoplasm of lung|C0242379|[neop]|[\"Lung Cancer\"-tx-1-\"lung cancer\"-noun-0]|TX|0/11|\nUSER|MMI|5.18|Primary malignant neoplasm of lung|C1306460|[neop]|[\"Lung cancer\"-tx-1-\"lung cancer\"-noun-0]|TX|0/1\n```\n\nAs you can see the output is prefaced with a log-line of my metamap installation. 
This line must be removed BEFORE running the mmi-parser.\n\nIn other words, we expect `metamap_results.txt` to contain:\n\n```bash\nUSER|MMI|5.18|Carcinoma of lung|C0684249|[neop]|[\"LUNG CANCER\"-tx-1-\"lung cancer\"-noun-0]|TX|0/11|\nUSER|MMI|5.18|Malignant neoplasm of lung|C0242379|[neop]|[\"Lung Cancer\"-tx-1-\"lung cancer\"-noun-0]|TX|0/11|\nUSER|MMI|5.18|Primary malignant neoplasm of lung|C1306460|[neop]|[\"Lung cancer\"-tx-1-\"lung cancer\"-noun-0]|TX|0/11|\n```\n\nYou could try to pipe the output of MetaMap directly into this tool, but that is beyond the scope of our use case.\n\nI would recommend `sed` to remove these headers. While it is not the _most_ performant option, its use is straightforward. Simply:\n\n```bash\nsed -i '1d' \u003ctarget folder\u003e/*.txt\n```\n\nThe `-i` flag edits the files in place, and `'1d'` simply means delete the first line.\n\n### mmi-parser (CLI)\n\nOnce you have some MetaMap output, you can parse it into jsonlines simply by specifying the folder in which your output lives. 
The `mmi-parser` will go through each line in each `.txt` file in the specified directory and parse it into jsonlines.\n\nFor example, in this repo there is a provided [`data/AA_sample.txt`](data/AA_sample.txt) which contains the sample MMI output from the explanatory document linked at the top of this file.\n\nYou can run `mmi-parser` on this file by simply running:\n\n```bash\ncargo run --example parse_aa\n```\n\nThis runs [`examples/parse_aa.rs`](examples/parse_aa.rs) which passes `data` as the target directory to the `mmi-parser` tool.\n\nYou can do the same for MmiOutput type lines by using the mmi example:\n\n```bash\ncargo run --example parse_mmi\n```\n\nwhich loads [`data/MMI_sample.txt`](data/MMI_sample.txt) and outputs [`data/MMI_sample_parsed.jsonl`](data/MMI_sample_parsed.jsonl).\n\nYou can then see the jsonlines output for the AA example in [`data/AA_sample_parsed.jsonl`](data/AA_sample_parsed.jsonl) which is also provided in this repo.\n\nWhen running the full program (i.e. `mmi-parser \u003cFOLDER\u003e`), the different result types will be auto-detected for you.\n\nThe tool will also show you any errors it detects and provide the file name and the line number of the error in addition to the line itself. While this information\nis very helpful, it can sometimes be obscured by the progress bar depending on your terminal settings. Therefore it is recommended to run the program using a log-file\nto capture the logs while keeping the progress bar visible for sanity. 
For example:\n\n```bash\nmmi-parser data \u003e errors.log\n```\n\nwould redirect all of the messages/output to the log file where you can scan/read it for more information on the results.\n\n#### Output Types\n\nIt is important to note that there are two distinct output types even though three were described in the [source](https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/MMI_Output_2016.pdf) file.\n\nThe first is the main MMI type; the remaining AA and UA types are combined into one (AA).\n\nIn the jsonlines output you will see the first key presents the type associated with\nthat MetaMap output line. This helps with building models/types to represent each of\nthe possibilities and also makes for quick visual inspection.\n\n### mmi-parser (API)\n\nIf you wish to use the mmi-parser crate in your application the easiest and most convenient method is to create an `MmiOutput` or `AaOutput` type by passing a string reference (most likely a single line of fielded MMI data from a file). The `parse_record()` function will decide which of these types the record belongs to and assemble the type for you. 😃\n\nFull API documentation can be found on [docs.rs](https://docs.rs/mmi-parser/latest/mmi_parser/).\n\n## Example Workflow\n\nFor analytical purposes, I would suggest combining all of these jsonlines files into one larger file, which you can then process with a tool like [jq](https://stedolan.github.io/jq/) or Python - [Pandas](https://pandas.pydata.org) depending on your use case. 🙂\n\n## Support\n\nIf you encounter any issues or need support please either contact [@nanthony007](https://github.com/nanthony007) or [open an issue](https://github.com/UK-IPOP/mmi-parser-rs/issues/new).\n\n## Contributing\n\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\nPlease make sure to update tests as appropriate.\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for more details. 
😃\n\n## MIT License\n\n[LICENSE](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuk-ipop%2Fmmi-parser-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuk-ipop%2Fmmi-parser-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuk-ipop%2Fmmi-parser-rs/lists"}