{"id":19710778,"url":"https://github.com/rootstrap/ctakes","last_synced_at":"2025-07-20T07:36:43.524Z","repository":{"id":39634233,"uuid":"288735567","full_name":"rootstrap/ctakes","owner":"rootstrap","description":"cTAKES - instructions and example","archived":false,"fork":false,"pushed_at":"2021-07-11T19:30:24.000Z","size":897,"stargazers_count":5,"open_issues_count":1,"forks_count":2,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-01-10T14:20:05.453Z","etag":null,"topics":["hacktoberfest"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rootstrap.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-19T13:11:57.000Z","updated_at":"2024-08-22T05:38:51.000Z","dependencies_parsed_at":"2022-09-20T07:12:01.607Z","dependency_job_id":null,"html_url":"https://github.com/rootstrap/ctakes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootstrap%2Fctakes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootstrap%2Fctakes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootstrap%2Fctakes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rootstrap%2Fctakes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rootstrap","download_url":"https://codeload.github.com/rootstrap/ctakes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241029305,"owners_co
unt":19896892,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hacktoberfest"],"created_at":"2024-11-11T22:08:24.760Z","updated_at":"2025-02-27T15:32:05.776Z","avatar_url":"https://github.com/rootstrap.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# cTAKES\n\nThis project provides instructions for installing and running cTAKES. \n[Apache cTAKES](https://ctakes.apache.org/) is a natural language processing system for extracting information from the clinical free text of electronic medical records. cTAKES stands for clinical Text Analysis and Knowledge Extraction System. \n\nThis image shows the different types of nodes that cTAKES can identify.\n\n![](images/ctakes_image.png)\n\n\nThese node types are defined in the [Unified Medical Language System (UMLS)](https://www.nlm.nih.gov/research/umls/index.html) dictionary.     \n\n## Extraction\ncTAKES provides the following functions:   \n- Entity recognition   \n- Boundary detection     \n- Tokenization    \n- Normalization      \n- Part-of-speech tagging     \n- Extraction of entity properties (negated/subject)     \n\n## cTAKES Terminology\n- Pipeline: A sequence of cTAKES annotators performing a comprehensive NLP task.    \n- Analysis engine: A single cTAKES annotator used in a pipeline.    \n- Piper file: A plaintext file describing a pipeline.      \n- CAS (Common Analysis Structure): The data structure through which the annotators in a pipeline communicate. 
\n\nThe [Default Clinical Pipeline](https://cwiki.apache.org/confluence/display/CTAKES/Default+Clinical+Pipeline) produces the most commonly desired output from cTAKES. You only need to run a bash script, setting the input and output directories. \n\nYou can create your own pipeline and edit the code of its components to improve the extraction for your data or type of problem. \n \n## Running the Docker container\n\n**Create a user account** at [https://uts.nlm.nih.gov/license.html](https://uts.nlm.nih.gov/license.html) and copy the API key.\n\n\n**Build the container:**\n\n```bash\n   docker build --build-arg ctakes_key={KEY} --rm -t rootstrap/ctakes:latest .\n```\n\n**Run the container:**   \n\nRun the container with the environment variables CTAKES_KEY, INPUT_DIR and OUTPUT_DIR set, and mount one volume for the input files and one for the output files. \n\n```bash\n  docker run -ti --env CTAKES_KEY={KEY} --env INPUT_DIR=/input --env OUTPUT_DIR=/output -v $(pwd)/input:/input -v $(pwd)/output:/output rootstrap/ctakes:latest\n```\nThe result files are written to the output directory.  \n\n## Example\nYou can use [XML Viewer](https://jsonformatter.org/xml-viewer) to inspect the result files. \n\n![](images/image_output.png)\n\n## Observation\n- cTAKES can be downloaded from [http://ctakes.apache.org/downloads.html](http://ctakes.apache.org/downloads.html)    \n- The dictionaries can be downloaded from [http://ctakes.apache.org/downloads.html](http://ctakes.apache.org/downloads.html)      \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frootstrap%2Fctakes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frootstrap%2Fctakes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frootstrap%2Fctakes/lists"}