{"id":26031674,"url":"https://github.com/apache/uima-uimacpp","last_synced_at":"2025-04-10T02:42:58.354Z","repository":{"id":12416446,"uuid":"15070663","full_name":"apache/uima-uimacpp","owner":"apache","description":"C++ support for Apache UIMA","archived":false,"fork":false,"pushed_at":"2024-08-21T02:21:14.000Z","size":2317,"stargazers_count":16,"open_issues_count":15,"forks_count":18,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-03T04:17:11.959Z","etag":null,"topics":["apache","java","text-analysis","uima"],"latest_commit_sha":null,"homepage":"https://uima.apache.org/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2013-12-10T08:00:24.000Z","updated_at":"2025-03-29T23:08:16.000Z","dependencies_parsed_at":"2024-05-28T06:34:36.038Z","dependency_job_id":"d9630133-94a1-42b1-aafd-c0f39ac187ea","html_url":"https://github.com/apache/uima-uimacpp","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fuima-uimacpp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fuima-uimacpp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fuima-uimacpp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fuima-uimacpp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","downlo
ad_url":"https://codeload.github.com/apache/uima-uimacpp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247715093,"owners_count":20983991,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","java","text-analysis","uima"],"created_at":"2025-03-06T20:20:53.691Z","updated_at":"2025-04-10T02:42:58.327Z","avatar_url":"https://github.com/apache.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"Apache UIMA C++ SDK\n===================\n\nThe UIMA C++ framework is currently undergoing a number of enhancements to allow for full standalone pipelines written in C++ or in supported scripting languages. As such, there is no available distribution and there are some [major enhancements](https://github.com/apache/uima-uimacpp/issues/6) being [worked on](https://cwiki.apache.org/confluence/display/COMDEV/GSoC+2024+Ideas+list#GSoC2024Ideaslist-UIMA). If interested in contributing, contact the [current maintainer](https://github.com/DrDub).\n\n\nWhat is the UIMA C++ SDK?\n-------------------------\n\nThe UIMA C++ framework is designed to facilitate the creation of UIMA compliant Analysis Engines (AE) from analytics written in C++, or written in languages that can utilize C++ libraries. The UIMACPP SDK directly supports C++, and indirectly supports Perl and Python languages via SWIG (https://www.swig.org/). Existing analytic programs in any of these languages can be wrapped with a UIMACPP annotator and integrated with other UIMA compliant analytics or UIMA-based applications. 
\n\n![Framework Core](docs/images/framework-core.png)\n\nThe UIMA C++ framework supports testing and embedding UIMA components into native processes. A UIMA C++ test driver, `runAECpp`, is available so that UIMA C++ components can be fully developed and tested in the native environment; no Java is needed.\n\nUIMA C++ includes APIs to parse component descriptors and to instantiate and call analysis engines, so that UIMA C++ compliant AEs can be used in native applications. The Apache UIMA C++ SDK is Docker-based. For interoperability, UIMA C++ components are expected to be built and distributed against a particular Docker image, thus ensuring correct compiler and dependent library settings.\n\n\nBuilding\n--------\n\n### Checking out the code\n\nCheck out the source code as follows:\n\n    git clone https://github.com/apache/uima-uimacpp.git\n\nUIMACPP runtime prerequisites are APR, ICU, Xerces-C, APR-Util and a JDK for building the JNI interface. \nThe SDK also requires doxygen for building the documentation. See the [Dockerfile](Dockerfile) for details.\n\n\n### Building the Docker image\n\nThe Docker image is built on top of the Debian stable slim image. After cloning the project, run the following in the root directory:\n\n```bash\nsudo docker build . 
-t apache:uimacpp\n```\nThis should create an image of 250+ MB in size.\n\n### Testing the Docker image\n\nThe easiest way to test it is by running the Perltator annotator:\n\n```bash\nmkdir out\nsudo docker run --interactive --tty --name uimacppdev \\\n  --mount type=bind,source=\"$(pwd)\"/examples/data,target=/data \\\n  --mount type=bind,source=\"$(pwd)\"/out,target=/out \\\n  apache:uimacpp \\\n  /usr/local/uimacpp/desc/Perltator.xml /data /out\n```\n\nThe `out` folder will be populated by XMI files with the same names as the original files in `data`.\n\nOther useful Docker commands:\n\n```bash\nsudo docker rm uimacppdev\n```\n\nTo remove an old container.\n\n```bash\nsudo docker run --interactive --tty --name uimacppdev  --entrypoint /bin/bash apache:uimacpp\n```\n\nTo run a container interactively using `bash`.\n\n\n\nExamples\n--------\n\nThe UIMACPP package includes several sample UIMA C++ annotators and a sample C++ application that instantiates and uses a C++ annotator. More details on how to build and run the examples will be available over time.\n\nA UIMA C++ annotator descriptor differs from a Java descriptor in the frameworkImplementation, specifying\n\n    \u003cframeworkImplementation\u003eorg.apache.uima.cpp\u003c/frameworkImplementation\u003e\n\nFor a C++ annotator, the annotatorImplementationName specifies the name of a dynamic link library. UIMACPP will add the OS-appropriate suffix and search the active dynamic library path (`LD_LIBRARY_PATH` for Linux). The suffix is not automatically added when the annotatorImplementationName includes a path.\nAn annotator library is derived from the UIMACPP class \"Annotator\" and must implement basic annotator methods. 
Annotators in Perl and Python each use a C++ annotator to instantiate the appropriate interpreter, load the specified annotator source and call the annotator methods.\n\n\n\nUIMACPP Example - Running a C++ analytic in a Native Process\n------------------------------------------------------------\n\nAs in UIMA, UIMACPP includes application level methods to instantiate an Analysis Engine from a UIMA annotator descriptor, create a CAS using the AE type system, and call AE methods.\n\n`examples/src/ExampleApplication.cpp` is a simple program that instantiates the specified annotator, reads a directory of txt files, and for each file sets the document text in a CAS and calls the AE process method. For annotator development, this program can be modified to create arbitrary CAS content to drive the annotator. Because the entire application is C++, standard tools such as `gdb` or `devenv` can be easily used for debugging.\n\n`runAECpp` is a UIMA C++ application driver modeled closely after the Java tool runAE. Like `ExampleApplication`, this tool can read a directory of text files and exercise the given annotator. In addition, `runAECpp` can take input from XML format CAS files, call the annotator's `process()` method, and output the resultant CAS in XML format files. XML format CAS input files can be created from upstream UIMA components, or created manually with the content needed to develop and unit test an annotator. `runAECpp` is the default [entrypoint](docker-entrypoint.sh) for the Docker image.\n\n![UIMA CPP Native Deployment](docs/images/uimacppnative.png)\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fuima-uimacpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fuima-uimacpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fuima-uimacpp/lists"}