{"id":23438526,"url":"https://github.com/lszeremeta/sdfeater","last_synced_at":"2025-07-17T04:32:35.166Z","repository":{"id":43253512,"uuid":"109877253","full_name":"lszeremeta/SDFEater","owner":"lszeremeta","description":"Always hungry SDF chemical file format parser with many output formats","archived":false,"fork":false,"pushed_at":"2024-08-19T02:27:49.000Z","size":335,"stargazers_count":8,"open_issues_count":7,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-13T06:28:56.254Z","etag":null,"topics":["chebi","chemical-data","chemical-elements","cheminformatics","cli","cvme","cypher","data-structures","database","docker-image","drugbank","jar","java","molecularentity","neo4j","parser","parsers","periodic-table","sdf","sdf-files"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/r/lszeremeta/sdfeater","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lszeremeta.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2017-11-07T18:54:08.000Z","updated_at":"2025-04-12T15:36:17.000Z","dependencies_parsed_at":"2023-02-08T22:16:12.896Z","dependency_job_id":"82dbdfdc-8167-4464-ac35-e3cc4ba4e975","html_url":"https://github.com/lszeremeta/SDFEater","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/lszeremeta/SDFEater","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lszeremeta%2FSDFEater","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lszeremeta%2FSDFEater/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/lszeremeta%2FSDFEater/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lszeremeta%2FSDFEater/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lszeremeta","download_url":"https://codeload.github.com/lszeremeta/SDFEater/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lszeremeta%2FSDFEater/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265564872,"owners_count":23788929,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chebi","chemical-data","chemical-elements","cheminformatics","cli","cvme","cypher","data-structures","database","docker-image","drugbank","jar","java","molecularentity","neo4j","parser","parsers","periodic-table","sdf","sdf-files"],"created_at":"2024-12-23T14:49:53.203Z","updated_at":"2025-07-17T04:32:35.149Z","avatar_url":"https://github.com/lszeremeta.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003cimg src=\"https://raw.githubusercontent.com/lszeremeta/SDFEater/master/logo/SDFEater.png\" alt=\"SDFEater logo\" width=\"300\"\u003e\n\n[![Codacy Badge](https://app.codacy.com/project/badge/Grade/fc5d5e2e22ce4616a041d97cdf1f3a11)](https://www.codacy.com/gh/lszeremeta/SDFEater/dashboard?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=lszeremeta/SDFEater\u0026amp;utm_campaign=Badge_Grade) [![Docker Image Size (latest by 
date)](https://img.shields.io/docker/image-size/lszeremeta/sdfeater?label=Docker%20image%20size)](https://hub.docker.com/r/lszeremeta/sdfeater)\n\nSDFEater is an [SDF](https://pubs.acs.org/doi/abs/10.1021/ci00007a012) parser written in Java and run from the command-line interface (CLI). You don't need a recent Java version: Java 8 and above are supported. Do you love Docker? You can use a lightweight [SDFEater container](https://hub.docker.com/r/lszeremeta/sdfeater)! SDFEater not only ~~eats~~ parses your SDF files but can also add additional data to the output. It supports a wide range of output formats.\n\n## Quick start\n\nUse SDFEater in 3 easy steps. In this example, we will use the [ChEBI](https://www.ebi.ac.uk/chebi/init.do) dataset and the ready-to-use JAR file. You need Java 8+ installed.\n\n1. Download the ready-to-use JAR file `SDFEater-VERSION-jar-with-dependencies.jar` from the [project release](https://github.com/lszeremeta/SDFEater/releases) assets.\n\nSDFEater is also available as a [Docker image](#docker-image). In most scenarios, the JAR file or the Docker image should be sufficient and convenient to run SDFEater, but you may want to [build everything yourself](https://github.com/lszeremeta/SDFEater/wiki/Manual-project-build).\n\n2. [Download the ChEBI complete 3-star dataset file](https://www.ebi.ac.uk/chebi/downloadsForward.do) and unpack the downloaded gz archive. ChEBI datasets are shared via FTP, so if your browser or operating system does not support FTP, you may need an additional program such as [FileZilla](https://filezilla-project.org/).\n3. Assuming the `ChEBI_complete_3star.sdf` file is in the current directory and the output format you're interested in is RDFa, the command will be as follows:\n\n```shell\njava -jar SDFEater-VERSION-jar-with-dependencies.jar -f rdfa -i ChEBI_complete_3star.sdf \u003e ChEBI_complete_3star_rdfa.html\n```\n\nThat's all. Now you have the RDFa file ready in the current directory. 
You can try other output formats and options as described below. You can also use SDFEater to convert [DrugBank SDF files](https://go.drugbank.com/releases/latest#structures) or prepare your own SDF input file with [supported keys](https://github.com/lszeremeta/SDFEater/wiki/Supported-SDF-keys).\n\n## Docker image\n\nIf you have [Docker](https://docs.docker.com/engine/install/) installed, you can use a tiny SDFEater Docker image from [Docker Hub](https://hub.docker.com/r/lszeremeta/sdfeater).\n\nBecause the tool runs inside the container, it cannot see your local files, so you have to [mount](https://docs.docker.com/storage/bind-mounts/#start-a-container-with-a-bind-mount) a local directory with your input file. The default working directory of the image is `/app`. You need to mount your local directory inside it (e.g. `/app/input`):\n\n```shell\ndocker run --rm --name sdfeater-app --mount type=bind,source=/home/user/input,target=/app/input,readonly lszeremeta/sdfeater:latest\n```\n\nIn this case, the local directory `/home/user/input` has been mounted under `/app/input`.\n\nYou can also simply mount the current working directory using the `$(pwd)` command substitution:\n\n```shell\ndocker run --rm --name sdfeater-app --mount type=bind,source=\"$(pwd)\",target=/app/input,readonly lszeremeta/sdfeater:latest\n```\n\n## CLI options\n\nRunning SDFEater without parameters displays help.\n\n* `-i,--input \u003carg\u003e` - input SDF file path (required)\n* `-f,--format \u003carg\u003e` - output format (e.g. `cypher`, `jsonld`, `cvme`, `smiles`, `inchi`) (required; full list below)\n* `-s,--subject \u003carg\u003e` - subject type (`iri`, `uuid`, `bnode`; `iri` by default; for all formats excluding cypher, cvme, smiles, inchi)\n* `-b,--base \u003carg\u003e` - molecule subject base for the `iri` subject type (`https://example.com/molecule#entity` by default)\n\nWhen using the Docker image, remember to give the input file path as it appears inside the container. 
Suppose you mounted your local directory `/home/user/input` under `/app/input` and the path to the SDF file you want to use in SDFEater is `/home/user/input/file.sdf`. In this case, enter the path `/app/input/file.sdf` or `input/file.sdf` as the value of the `-i` argument.\n\n## Output formats\n\nYou can specify the output format using `-f,--format`. Available output formats:\n\n* `cypher` - [Cypher](https://neo4j.com/developer/cypher-query-language/) statements for molecules, atoms, bonds and relations, ready to [import to the Neo4j graph database](https://neo4j.com/developer/kb/export-sub-graph-to-cypher-and-import/)\n* `cypheru` - the same as the `cypher` option, but tries to generate full database URLs instead of IDs\n* `cypherp` - the same as the `cypher` option, but adds additional atom data from the [periodic table](https://github.com/lszeremeta/SDFEater/blob/master/src/main/resources/pl/edu/uwb/ii/sdfeater/periodic_table.json)\n* `cypherup` - the same as the `cypher` option, but adds URLs and additional atom data from the [periodic table](https://github.com/lszeremeta/SDFEater/blob/master/src/main/resources/pl/edu/uwb/ii/sdfeater/periodic_table.json)\n* `cvme` - [CVME](http://cs.aalto.fi/en/current/events/2017-09-22-002/) file format based on SKOS\n* `smiles` - plain text SMILES (if available in the molecule property)\n* `inchi` - plain text InChI (if available in the molecule property)\n* `turtle` - [Terse RDF Triple Language](https://www.w3.org/TR/turtle/) (based on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n* `ntriples` - [N-Triples](https://www.w3.org/TR/n-triples/) (based on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n* `rdfxml` - [RDF/XML](https://www.w3.org/TR/rdf-syntax-grammar/) (based on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n* `rdfthrift` - [RDF Binary encoding using Thrift](https://afs.github.io/rdf-thrift/rdf-binary-thrift.html) (based 
on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n* `jsonldhtml` - [JSON-LD](https://json-ld.org/) with HTML (based on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n* `jsonld` - [JSON-LD](https://json-ld.org/) (based on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n* `rdfa` - Simple HTML with [RDFa](http://rdfa.info/) (based on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n* `microdata` - Simple HTML with [Microdata](https://www.w3.org/TR/microdata/) (based on [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/))\n\n## What is structured data\n\n[Structured data](https://developers.google.com/search/docs/guides/intro-structured-data) is additional data embedded in web pages. It is not visible to ordinary internet users but can be easily processed by machines. Three formats can be used to publish structured data: [JSON-LD](https://json-ld.org/), [RDFa](http://rdfa.info/), and [Microdata](https://www.w3.org/TR/microdata/). SDFEater supports them all and uses the [MolecularEntity profile](https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE/).\n\n## Additional examples\n\n```shell\njava -jar SDFEater-VERSION-jar-with-dependencies.jar -i ../examples/chebi_special_char_test.sdf -f cypherup\n```\n\nReturns [Cypher](https://neo4j.com/developer/cypher-query-language/) with periodic table data added for atoms and chemical database IDs replaced with URLs. SDFEater runs from a JAR file.\n\n```shell\njava -jar SDFEater-VERSION-jar-with-dependencies.jar -i ../examples/chebi_test.sdf -f jsonld \u003e molecules.jsonld\n```\n\nReturns [JSON-LD](https://json-ld.org/) and redirects the output to the `molecules.jsonld` file. 
SDFEater runs from a JAR file.\n\n```shell\ndocker run --rm --name sdfeater-app --mount type=bind,source=/home/user/input,target=/app/input,readonly lszeremeta/sdfeater:latest -i input/chebi_test.sdf -f microdata \u003e molecules.html\n```\n\nReturns simple HTML with added [Microdata](https://www.w3.org/TR/microdata/) and redirects the output to the `molecules.html` file. SDFEater runs from the pre-built Docker image.\n\nIn the `examples` directory you can find examples of SDF files based on data from the [ChEBI](https://www.ebi.ac.uk/chebi/init.do) and [DrugBank open structures](https://www.drugbank.ca/releases/latest#open-data) databases.\n\n## Publications and resources\n\nIf you need more detailed information, take a look at these publications and resources. There you will find a detailed description of the parser, performance tests, and examples of Cypher outputs.\n\n1. Ł. Szeremeta, “SDFEater: A Parser for Chemoinformatics Formats”, Sep. 2018 \\[Online]. Available: \u003chttps://doi.org/10.26434/chemrxiv.7123193\u003e.\n2. D. Tomaszuk and Ł. Szeremeta, “Named Property Graphs”, in Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 15. IEEE, 2018, pp. 173–177 \\[Online]. Available: \u003chttp://dx.doi.org/10.15439/2018F103\u003e.\n3. Ł. Szeremeta and D. Tomaszuk, “SDFParser example Cypher outputs”. figshare, 10-May-2018 \\[Online]. Available: \u003chttps://doi.org/10.6084/m9.figshare.6249962\u003e.\n4. D. Tomaszuk, “chemskos”. figshare, 29-Aug-2018 \\[Online]. 
Available: \u003chttps://doi.org/10.6084/m9.figshare.7022144\u003e.\n\n## Used open source projects\n\n* [Apache Commons CLI](https://github.com/apache/commons-cli) - CLI controller ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)),\n* [Gson](https://github.com/google/gson) - periodic table JSON parser ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)),\n* [periodic-table](https://github.com/andrejewski/periodic-table) - base JSON periodic table file ([ISC License](https://choosealicense.com/licenses/isc/)),\n* [Apache Jena](https://jena.apache.org/) - for some output formats ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)),\n* [Apache Commons Text](https://commons.apache.org/proper/commons-text/) - HTML escaping for the RDFa and Microdata formats ([Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)).\n\nThe sample SDF files in the examples and test directories are based on data from the [ChEBI](https://www.ebi.ac.uk/chebi/init.do) ([CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)) and [DrugBank](https://www.drugbank.ca/releases/latest#open-data) open structures ([CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/)) databases.\n\n## Contribution\n\nWould you like to improve SDFEater? Great! We welcome your help and suggestions. 
If you are new to open source contributions, read [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/).\n\n## License\n\nDistributed under the [MIT License](https://github.com/lszeremeta/chebi-sdf-parser/blob/master/LICENSE).\n\n## See also\n\nThese projects can also be useful:\n\n* [Molstruct](https://github.com/lszeremeta/molstruct) - Convert chemical molecule data CSV files to structured data formats\n* [MEgen](https://github.com/lszeremeta/MEgen) - Convenient online form to generate structured data about molecules\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flszeremeta%2Fsdfeater","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flszeremeta%2Fsdfeater","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flszeremeta%2Fsdfeater/lists"}