{"id":14987989,"url":"https://github.com/apache/uima-ruta","last_synced_at":"2025-10-19T12:30:30.807Z","repository":{"id":830353,"uuid":"263178771","full_name":"apache/uima-ruta","owner":"apache","description":"Apache UIMA Ruta","archived":false,"fork":false,"pushed_at":"2025-01-08T12:26:16.000Z","size":20638,"stargazers_count":18,"open_issues_count":3,"forks_count":5,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-01-30T10:34:17.726Z","etag":null,"topics":["apache","java","ruta","text-analysis","uima"],"latest_commit_sha":null,"homepage":"https://uima.apache.org","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-11T23:07:09.000Z","updated_at":"2025-01-08T11:45:48.000Z","dependencies_parsed_at":"2023-07-06T21:01:21.345Z","dependency_job_id":"ea083723-e923-4842-8e5a-7773f99b064a","html_url":"https://github.com/apache/uima-ruta","commit_stats":{"total_commits":2349,"total_committers":11,"mean_commits":"213.54545454545453","dds":"0.16432524478501487","last_synced_commit":"e072334f612516b38f50933e0d9cf4e6242a0db3"},"previous_names":[],"tags_count":29,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fuima-ruta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fuima-ruta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fuima-ruta/releases","manifests_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/apache%2Fuima-ruta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/uima-ruta/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237125516,"owners_count":19259298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache","java","ruta","text-analysis","uima"],"created_at":"2024-09-24T14:15:55.084Z","updated_at":"2025-10-19T12:30:30.802Z","avatar_url":"https://github.com/apache.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Maven Central](https://img.shields.io/maven-central/v/org.apache.uima/ruta-core?style=for-the-badge)](https://search.maven.org/search?q=g:org.apache.uima%20a:ruta*)\n\n[![Build Status](https://ci-builds.apache.org/buildStatus/icon?job=UIMA%2Fuima-ruta%2Fmain\u0026subject=main%20build)](https://ci-builds.apache.org/job/UIMA/job/uima-ruta/job/main/)\n\nWhat is Apache UIMA Ruta?\n-------------------------\n\nApache UIMA Ruta™ is a rule-based script language supported by Eclipse-based tooling.\nThe language is designed to enable rapid development of text processing applications within Apache UIMA™.\nA special focus lies on the intuitive and flexible domain specific language for defining patterns of annotations.\nWriting rules for information extraction or other 
text processing applications is a tedious process.\nThe Eclipse-based tooling for UIMA Ruta, called the Apache UIMA Ruta Workbench, was created to support the user and to facilitate every step when writing UIMA Ruta rules.\nBoth the Ruta rule language and the UIMA Ruta Workbench integrate smoothly with Apache UIMA.\n\n\nRule Language\n-------------\n\nThe UIMA Ruta language is an imperative rule language extended with scripting elements.\nA rule defines a pattern of annotations with additional conditions. If this pattern applies, then the actions of the rule are performed on the matched annotations.\nA rule is composed of a sequence of rule elements, and a rule element usually consists of four parts: \na matching condition, an optional quantifier, a list of conditions and a list of actions.\nThe matching condition is typically an annotation type; the rule element then matches on the covered text of an annotation of that type.\nThe quantifier specifies whether the rule element must match successfully and how often it may match.\nThe list of conditions specifies additional constraints that the matched text or annotations need to fulfill.\nThe list of actions defines the consequences of the rule and often creates new annotations or modifies existing annotations.\n\nThe following example rule consists of three rule elements. The first one (`ANY...`) matches on every token whose covered text occurs in a word list named `MonthsList`.\nThe second rule element (`PERIOD?`) is optional and does not need to be fulfilled, which is indicated by the quantifier `?`. 
\nThe last rule element (`NUM...`) matches on numbers that fulfill the regular expression `REGEXP(\".{2,4}\")` and are therefore between two and four characters long.\nIf this rule successfully matches on a text passage, then its three actions are executed:\nAn annotation of the type `Month` is created for the first rule element, an annotation of the type `Year` is created for the last rule element and an annotation of the type `Date` is created for the span of all three rule elements.\nIf the word list contains the correct entries, then this rule matches on strings like `Dec. 2004`, `July 85` or `11.2008` and creates the corresponding annotations.\n  \n~~~~\n(ANY{INLIST(MonthsList) -\u003e Month} PERIOD? @NUM{REGEXP(\".{2,4}\") -\u003e Year}){-\u003e Date};\n~~~~\n\nHere is a short overview of additional features of the rule language:\n\n* Expressions and variables\n* Import and execution of external components\n* Flexible matching with filtering\n* Modularization in different files or blocks\n* Control structures, e.g., for windowing\n* Score-based extraction\n* Modification\n* HTML support \n* Dictionaries\n* Extensible language definition\n\n\nWorkbench\n---------\n\nThe UIMA Ruta Workbench was created to facilitate all steps in creating Analysis Engines based on the UIMA Ruta language.\nHere is a short overview of included features: \n\n**Editing support:** The full-featured editor for the UIMA Ruta language provides syntax and semantic highlighting, \nsyntax checking, context-sensitive auto-completion, template-based completion, open declaration and more.\n\n**Rule Explanation:** Each step in the matching process can be explained: This includes how often a rule was applied, \nwhich condition was not fulfilled, or by which rule a specific annotation was created. 
Additionally, profile information \nabout the runtime performance can be accessed.\n\n**Automatic Validation:** UIMA Ruta scripts can be automatically validated against a set of annotated documents (F1 score, test-driven development) \nand even against unlabeled documents (constraint-driven evaluation). \n\n**Rule learning:** The supervised learning algorithms of the included TextRuler framework are able to induce rules \nand, therefore, enable semi-automatic development of rule-based components.\n\n**Query:** Rules can be used as query statements in order to investigate annotated documents.\n\n\nThe UIMA Ruta Workbench can be installed via the Eclipse update site [https://downloads.apache.org/uima/eclipse-update-site-v3](https://downloads.apache.org/uima/eclipse-update-site-v3).\n\n\nBuilding from the Source Distribution\n-------------------------------------\n\nWe use Maven 3.9.9 and Java 17 or later for building; download these if needed, \nand set the environment variable `MAVEN_OPTS` to `-Xmx800m`.\n\nThen do the build by going into the UIMA Ruta directory and issuing the command\n\n    mvn clean install\n\nThis builds everything except the `...source-release.zip` file. If you want that,\nchange the command to \n\n    mvn clean install -Papache-release\n\nFor more details, please see [https://uima.apache.org/building-uima.html](https://uima.apache.org/building-uima.html).\n\n**Build options**\n* `-Ddisable-build-eclipse-plugins`: do not build the Eclipse plugins\n\nHow to Get Involved\n-------------------\n\nThe Apache UIMA project really needs and appreciates any contributions, including documentation \nhelp, source code and feedback. If you are interested in contributing, please visit \n[http://uima.apache.org/get-involved.html](http://uima.apache.org/get-involved.html).\n\n\nHow to Report Issues\n--------------------\n\nThe Apache UIMA project uses GitHub for issue tracking. 
Please report any issues you find at \n[our issue tracker](https://github.com/apache/uima-ruta/issues).\n\n\nUseful tips\n-----------\n\nThis product was originally released as Apache UIMA TextMarker. The UIMA Ruta Workbench provides\na command for updating old projects. Please right-click on a project and select **UIMA Ruta -\u003e Update Project**. \n\nThe UIMA Ruta analysis engine requires type priorities for the correct execution of rules. \nIf a CAS is created using the `CasCreationUtils`, please provide the type priorities, e.g., by:\n\n    URL tpUrl = this.getClass().getResource(\"/org/apache/uima/ruta/engine/TypePriorities.xml\");\n    TypePriorities typePriorities = UIMAFramework.getXMLParser().parseTypePriorities(\n        new XMLInputSource(tpUrl));\n    CAS cas = CasCreationUtils.createCas(descriptor, typePriorities, new FsIndexDescription[0]);\n\nUsing the `jcasgen-maven-plugin` may cause problems if it creates duplicate classes for the \ninternal UIMA Ruta types (overwriting the implementation of _RutaBasic_). 
Depending on the location \nof the type system descriptors, the plugin should be configured so that it is limited to the project, \nor the UIMA Ruta type system descriptors should explicitly be excluded:\n\n    \u003cconfiguration\u003e\n      \u003ctypeSystemExcludes\u003e\n        \u003ctypeSystemExclude\u003e/**/BasicTypeSystem.xml\u003c/typeSystemExclude\u003e\n        \u003ctypeSystemExclude\u003e/**/InternalTypeSystem.xml\u003c/typeSystemExclude\u003e\n      \u003c/typeSystemExcludes\u003e\n    \u003c/configuration\u003e\n\n\nUseful links\n------------\n\n* [Apache UIMA](https://uima.apache.org)\n* [Apache UIMA Ruta Documentation](https://uima.apache.org/d/ruta-current/tools.ruta.book.html)\n* [Averbis Ruta Training material](https://github.com/averbis/ruta-training) (external)\n\n\nReference\n---------\n\nIf you use UIMA Ruta to support academic research, then please consider citing the following paper as appropriate:\n\n~~~~\n@article{NLE:10051335,\n  author = {Kluegl, Peter and Toepfer, Martin and Beck, Philip-Daniel and Fette, Georg and Puppe, Frank},\n  title = {UIMA Ruta: Rapid development of rule-based information extraction applications},\n  journal = {Natural Language Engineering},\n  volume = {22},\n  issue = {01},\n  month = {1},\n  year = {2016},\n  issn = {1469-8110},\n  pages = {1--40},\n  numpages = {40},\n  doi = {10.1017/S1351324914000114},\n  URL = {https://journals.cambridge.org/article_S1351324914000114},\n}\n~~~~\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fuima-ruta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fuima-ruta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fuima-ruta/lists"}