{"id":21982127,"url":"https://github.com/lhncbc/chemapp","last_synced_at":"2026-05-19T09:02:15.597Z","repository":{"id":86312611,"uuid":"265281854","full_name":"LHNCBC/chemapp","owner":"LHNCBC","description":"A baseline chemical annotator","archived":false,"fork":false,"pushed_at":"2021-12-22T18:52:51.000Z","size":238,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-23T01:21:56.282Z","etag":null,"topics":["clojure","java","named-entity-recognition"],"latest_commit_sha":null,"homepage":null,"language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LHNCBC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-19T15:10:47.000Z","updated_at":"2021-12-22T18:52:54.000Z","dependencies_parsed_at":"2023-03-13T09:45:52.726Z","dependency_job_id":null,"html_url":"https://github.com/LHNCBC/chemapp","commit_stats":null,"previous_names":["nlm-lhc/chemapp","lhncbc/chemapp"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LHNCBC/chemapp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LHNCBC%2Fchemapp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LHNCBC%2Fchemapp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LHNCBC%2Fchemapp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LHNCBC%2Fchemapp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LHNCBC","download_url":"https:/
/codeload.github.com/LHNCBC/chemapp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LHNCBC%2Fchemapp/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267585676,"owners_count":24111576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-28T02:00:09.689Z","response_time":68,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","java","named-entity-recognition"],"created_at":"2024-11-29T17:22:17.694Z","updated_at":"2026-05-19T09:02:10.554Z","avatar_url":"https://github.com/LHNCBC.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# chemapp/chemrecog\n\nchemapp - Chemical Entity Recognition System\n\n## Installation\n\n### Standalone web app and Socket Server Setup\n\nThe socket server and standalone web app use the same directory\norganization.  The directories config and data must be in the\ntop-level directory \"chem\" as chem/config and chem/data.\n\n### Deployment to Tomcat\n\nBoth config and data must reside in chem/war-resources.  This can be\ndone by symbolically linking the directories to the ones in\nchem/config and chem/data.  
The \"lein ring uberwar\" command will automatically\ncopy the content of the directories to the deployment war file.\n\n## Socket Server Usage\n\nTo create the socket server use the command:\n\n    lein uberjar\n\nThe standalone jar file target/chem-0.1.1-SNAPSHOT-standalone.jar will\nbe generated.\n\nThe command line arguments for the socket server are: [hostname [port]].\nThe server defaults to localhost on port 32000 unless otherwise specified.\n\n    $ java -jar target/chem-0.1.1-SNAPSHOT-standalone.jar [args]\n\n## Caveats\n\nThis system was primarily an experiment started in 2013 and has quite\na few shortcomings.  I was learning Clojure at the time after\npreviously attempting to implement the system in Python and then\nJava.  Initially, Clojure was intended to be used only for\nprototyping and then the final system would be written in Java.  That\nsystem never materialized.\n\nThe current dispatch system in process.clj should be re-implemented\nusing multimethods.\n\nRecognition technologies conceived later are absent, including Deep\nLearning approaches such as Bi-LSTMs, CNN/RNN, Attention, and\nTransformers.  These could probably be added using Neanderthal\n(https://neanderthal.uncomplicate.org/) or Flare\n(https://aria42.com/blog/2017/11/Flare-Clojure-Neural-Net).\n\n## Socket Server Arguments\n\nThe socket server accepts a hostname and port, or just the hostname, in\nwhich case it uses port 32000.  If no arguments are supplied then\nport 32000 on localhost (127.0.0.1) is used.\n\n## External Data (Mostly indexes)\n\nThe following files and directories reside in the data\ndirectory.  When using the socket server, the data directory resides in\nthe directory the server is executed from.  
When using the servlet, the\ndata directory resides in the servlet engine directory (usually\nTomcat) or in some cases it must be deployed with the servlet and\nresides in the servlet deployment directory (for instance:\nwebapps/chemapp/data).\n\n+ corncob_lowercase.txt  - a file of approximately 58,000 English terms\n+ ivf       - contains the IRUtils version of the NormChem database (2017)\n+ lucenedb  - contains the Lucene version of the NormChem database (2015)\n+ models    - OpenNLP and Mallet CRF models\n\nA bzip2-compressed tar archive containing these\ndata files is at:\nhttps://ii-public1.nlm.nih.gov/Xfer/chemappdata/chem-data.tbz.  After\ndownloading, extract the archive in the chemapp directory.\n\n## Examples\n\n### Example Request\n\ncombine5|Down-regulation of the expression of alcohol dehydrogenase 4 and CYP2E1 by the combination of α-endosulfan and dioxin in HepaRG human cells.|Pesticides and other persistent organic pollutants are considered as risk factors for liver diseases. We treated the human hepatic cell line HepaRG with both 2,3,7,8 tetrachlorodibenzo-p-dioxin (TCDD) and the organochlorine pesticide, α-endosulfan, to evaluate their combined impact on the expression of hepatic genes involved in alcohol metabolism. We show that the combination of the two pollutants (25nM TCDD and 10μM α-endosulfan) led to marked decreases in the amounts of both the mRNA (up to 90%) and protein (up to 60%) of ADH4 and CYP2E1. Similar results were obtained following 24h or 8days of treatment with lower concentrations of these pollutants. Experiments with siRNA and AHR agonists and antagonist demonstrated that the genomic AHR/ARNT pathway is necessary for the dioxin effect. The PXR, CAR and estrogen receptor alpha transcription factors were not modulators of the effects of α-endosulfan, as assessed by siRNA transfection. In another human hepatic cell line, HepG2, TCDD decreased the expression of ADH4 and CYP2E1 mRNAs whereas α-endosulfan had no effect on these genes. 
Our results demonstrate that exposure to a mixture of pollutants may deregulate hepatic metabolism.\n\n### Response\n\n    alcohol|C091419|37,44;471,478|\n    CYP2E1|D001141|65,71;680,686;1174,1180|\n    α-endosulfan|D004726|94,106;1195,1207|\n    dioxin|D013749|111,117;924,930|\n    tetrachlorodibenzo-p-dioxin|D013749|307,334|\n    TCDD|D013749|336,340;548,552;1132,1136|\n    organochlorine||350,364|\n    CAR|C024888|948,951|\n    estrogen||956,964|\n    EOF\n\n## Mallet\n\nSee scripts/setup.clj, src/chem/setup.clj, src/chem/paths.clj and\nsrc/chem/mallet.clj for information on generating features.\n\n+ scripts/setup.clj       -\n+ scripts/setupscript.clj -\n+ scripts/\n\n+ src/chem/setup.clj      - loads development and training corpora into memory\n+ src/chem/paths.clj      - file paths to training and development corpora\n+ src/chem/mallet.clj     - convenience functions for Mallet\n\n\n### Running Mallet\n\nTraining\n========\n\n    java -cp /usr/local/pub/machinelearning/mallet-2.0.7/class:/usr/local/pub/machinelearning/mallet-2.0.7/lib/mallet-deps.jar \\\n      cc.mallet.fst.SimpleTagger --train true --iterations 250 \\\n      --threads 8 --model-file chemicalcrf data/features/training/*\n\nTesting\n=======\n\n    java -cp /usr/local/pub/machinelearning/mallet-2.0.7/class:/usr/local/pub/machinelearning/mallet-2.0.7/lib/mallet-deps.jar \\\n      cc.mallet.fst.SimpleTagger\n\n## Evaluation\n\nTo evaluate using the training set for testing:\n\n    bc-evaluate chemdner-resultlist.txt /nfsvol/nlsaux16/II\\_Group\\_WorkArea/Lan/projects/BioCreative/2013/CHEMDNER\\_TRAIN\\_V01/cdi\\_ann\\_training\\_13-07-31.txt\n\nTo evaluate using the development set for testing:\n\n    bc-evaluate chemdner-resultlist.txt /nfsvol/nlsaux16/II\\_Group\\_WorkArea/Lan/projects/BioCreative/2013/CHEMDNER\\_DEVELOPMENT\\_V02/cdi\\_ann\\_development_13-08-18.txt\n\nTo evaluate:\n\n    bc-evaluate subsume-chemdner-resultlist.txt 
/nfsvol/nlsaux16/II\\_Group\\_WorkArea/Lan/projects/BioCreative/2013/CHEMDNER\\_TRAIN\\_V01/cdi\\_ann\\_training_13-07-31.txt\n\n  \n### Bugs\n\nThe NormChem dictionary using Lucene indexes is currently broken.  The\nIRUtils dictionary currently provides the same function so the Lucene\nversion will probably be deprecated.\n\n## License\n\nSee [LICENSE](LICENSE.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flhncbc%2Fchemapp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flhncbc%2Fchemapp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flhncbc%2Fchemapp/lists"}