{"id":15034106,"url":"https://github.com/microsoft/recognizers-text","last_synced_at":"2025-05-14T22:05:44.509Z","repository":{"id":37454259,"uuid":"88544417","full_name":"microsoft/Recognizers-Text","owner":"microsoft","description":"Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, date/time, etc. in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI, NL. Partial support for JA, KO, AR, SV). Packages available at: https://www.nuget.org/profiles/Recognizers.Text, https://www.npmjs.com/~recognizers.text","archived":false,"fork":false,"pushed_at":"2025-02-19T04:46:37.000Z","size":51730,"stargazers_count":1712,"open_issues_count":211,"forks_count":434,"subscribers_count":64,"default_branch":"master","last_synced_at":"2025-05-07T21:16:40.460Z","etag":null,"topics":["date","datetime","datetime-normalization-and-resolution","entity-extraction","hacktoberfest","ner","nlp","number-expression","numbers","numex","parser","parser-library","time","time-expression","time-expression-recognition","timex"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-17T19:45:47.000Z","updated_at":"2025-05-01T14:40:59.000Z","dependencies_parsed_at":"2024-01-13T17:47:33.506Z","dependency_job_id":"271a7113-c290-4cb7-8410-59332e80225f","html_url":"https://github.com/microsoft/Recognizers-Text","commit_stats":{"total_commits":2050,"total_committers":136,"mean_commits":"15.
073529411764707","dds":0.8541463414634146,"last_synced_commit":"90e968ea44dc8fda3f2e1c3e4b20aa233817488d"},"previous_names":[],"tags_count":64,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRecognizers-Text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRecognizers-Text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRecognizers-Text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FRecognizers-Text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/Recognizers-Text/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254235687,"owners_count":22036962,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["date","datetime","datetime-normalization-and-resolution","entity-extraction","hacktoberfest","ner","nlp","number-expression","numbers","numex","parser","parser-library","time","time-expression","time-expression-recognition","timex"],"created_at":"2024-09-24T20:23:57.561Z","updated_at":"2025-05-14T22:05:44.414Z","avatar_url":"https://github.com/microsoft.png","language":"C#","readme":"# Microsoft Recognizers Text Overview\r\n\r\n![Build Status](https://msrasia.visualstudio.com/_apis/public/build/definitions/310c848f-b260-4305-9255-b97bfb69974b/116/badge)\r\n![Build 
Status](https://ci.appveyor.com/api/projects/status/github/Microsoft/Recognizers-Text?branch=master\u0026svg=true\u0026passingText=all%20plats%20-%20OK)\r\n\r\nMicrosoft.Recognizers.Text provides robust recognition and resolution of entities like numbers, units, and date/time; expressed in multiple languages. Full support for Chinese, English, French, Spanish, Portuguese, German, Italian, Turkish, Hindi, and Dutch. Partial support for Japanese, Korean, Arabic, and Swedish. More on the way.\r\n\r\n# Utilizing the Project\r\n\r\nMicrosoft.Recognizers.Text powers pre-built entities in [**LUIS: Language Understanding Intelligent Service**](https://www.luis.ai/home), [**Power Virtual Agents**](https://powervirtualagents.microsoft.com/en-us/), and [**Microsoft Bot Framework**](https://dev.botframework.com/); base entity types in [**Text Analytics Cognitive Service**](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking); and it is also available as standalone packages (for the base classes and the different entity recognizers).\r\n\r\nThe Microsoft.Recognizers.Text packages currently target four platforms:\r\n* [C#/.NET](https://github.com/Microsoft/Recognizers-Text/tree/master/.NET) - **NuGet packages** available at: https://www.nuget.org/profiles/Recognizers.Text\r\n* [JavaScript/TypeScript](https://github.com/Microsoft/Recognizers-Text/tree/master/JavaScript/packages/recognizers-text-suite) - **NPM packages** available at: https://www.npmjs.com/~recognizers.text\r\n* [Python](https://github.com/Microsoft/Recognizers-Text/tree/master/Python) - **PyPI packages** available at: https://pypi.org/user/recognizers-text/ (alpha)\r\n* [Java](https://github.com/Microsoft/Recognizers-Text/tree/master/Java) (in progress)\r\n\r\nContributions are greatly welcome! 
This covers both fixes and extensions in the currently supported languages and expansion to new ones,\r\nespecially Japanese, Korean, Arabic, Swedish, and others. More info below.\r\n\r\n.NET is the primary package version, and contributions propagate to the other platforms over time.\r\n\r\n## Citing the Recognizers-Text project\r\n\r\nIf you use the recognizers in academic work, please cite the project as below (you can omit the version number or update it to a specific version if relevant):\r\n\r\n```tex\r\n@software{soft:recognizers-text,\r\n  author    = {Wenhao Huang and Zijia Lin and Chris McConnell and B{\\\"{o}}rje F. Karlsson},\r\n  title     = {{Recognizers-Text}: {R}ecognition and resolution of numbers, units, and date/time entities expressed across multiple languages},\r\n  month     = jul,\r\n  year      = 2017,\r\n  publisher = {Zenodo},\r\n  version   = {1.0.0},\r\n  doi       = {10.5281/zenodo.6860598},\r\n  url       = {https://doi.org/10.5281/zenodo.6860598}\r\n}\r\n```\r\n\r\nFeel free to change \"@software\" to \"@misc\" if it better fits your templates.\r\n\r\n# Help\r\n\r\nIf you have any questions, please go ahead and [open an issue](https://github.com/Microsoft/Recognizers-Text/issues/new/choose), even if it's not an actual bug. Issues are an acceptable discussion forum as well.\r\n\r\n# Contributing\r\n\r\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 
For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\r\n\r\nGood starting points for contribution are:\r\n* the list of [open issues](https://github.com/Microsoft/Recognizers-Text/issues) (especially those marked as ```help wanted```); \r\n* the JSON spec cases temporarily marked as ```NotSupported``` ([Specs](./Specs)); and\r\n* translating JSON test spec cases that work in English, but don't yet exist in a target language.\r\n\r\nThe links below describe the project structure and provide both an overview and tips on how to contribute (although some steps may have become a little out of date). Thank you!\r\n\r\n* [Overview and language resources](https://blog.botframework.com/2018/01/24/contributing-luis-microsoft-recognizers-text-part-1/)\r\n* [Implementing language specific behaviour](https://blog.botframework.com/2018/02/01/contributing-luis-microsoft-recognizers-text-part-2/)\r\n* [Test specs and testing in general](https://blog.botframework.com/2018/02/12/contributing-luis-microsoft-recognizers-text-part-3/)\r\n\r\n# Supported Entities across Cultures\r\n\r\nThe table below summarizes the currently supported entities. Support for English is usually more complete than for other languages. 
The primary platform is .NET (shown in table) and support should propagate to the others.\r\n\r\n| Entity Type       | EN      | ZH-CN   | NL    | FR     | DE    | IT      | JA     | KO     | PT     | ES      |\r\n|:-----------------:|:-------:|:-------:|:-----:|:------:|:-----:|:-------:|:------:|:------:|:------:|:-------:| \r\n| Number (cardinal)    | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | ✓     | ✓      | ✓       |\r\n| Ordinal              | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | ✓     | ✓      | ✓       |\r\n| Percentage           | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | ✓     | ✓      | ✓       |\r\n| Number Range         | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | PA/EO  | ✓      | ✓     | ✓       |\r\n| Unit - Age           | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | PA/EO  | ✓     | ✓       |\r\n| Unit - Currency      | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | PA/EO  | ✓     | ✓       |\r\n| Unit - Dimensions    | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | PA/EO  | ✓      | ✓      | \r\n| Unit - Temperature   | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | ✓     | ✓      | ✓      | \r\n| Choice - Boolean     | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | **SO** | ✓     | ✓       | \r\n| Seq. - E-mail        | G    | G*       | G    | G      | G     | G       | G*     | G*     | G      | G       |\r\n| Seq. - GUID          | G    | G        | G    | G      | G     | G       | G      | G      | G      | G       |\r\n| Seq. - Social        | G    | G        | G    | G      | G     | G       | G      | G      | G      | G       |\r\n| Seq. - IP Address    | G    | G        | G    | G      | G     | G       | G      | G      | G      | G       |\r\n| Seq. - Phone Number  | G    | G        | G    | G      | G     | G       | G      | G      | G      | G       |\r\n| Seq. 
- URL           | G    | G*       | G    | G      | G     | G       | G*     | G*     | G      | G       |\r\n| DateTime (+subtypes) | ✓    | ✓       | ✓    | ✓     | ✓     | ✓       | ✓      | **SO** | ✓     | ✓       | \r\n\r\n| Entity Type       | SV      | BG      | TR    | HI     | AR     |         |        |        |        |         |\r\n|:-----------------:|:-------:|:-------:|:-----:|:------:|:------:|:-------:|:------:|:------:|:------:|:-------:| \r\n| Number (cardinal)    | ✓    | :x:     | ✓    | ✓      | PA/EO  |         |        |        |        |         |\r\n| Ordinal              | ✓    | :x:     | ✓    | ✓      | PA/EO  |         |        |        |        |         |\r\n| Percentage           | ✓    | :x:     | ✓    | ✓      | PA/EO  |         |        |        |        |         |\r\n| Number Range         | :x:  | :x:     | ✓     | ✓     | PA/EO  |         |        |        |        |         |\r\n| Unit - Age           | ✓    | :x:     | ✓     | ✓     | :x:    |         |        |        |        |         |\r\n| Unit - Currency      | ✓    | :x:     | ✓     | ✓     | :x:    |         |        |        |        |         |\r\n| Unit - Dimensions    | ✓    | :x:     | ✓     | ✓     | :x:    |         |        |        |        |         | \r\n| Unit - Temperature   | ✓    | :x:     | ✓     | ✓     | :x:    |         |        |        |        |         | \r\n| Choice - Boolean     | ✓    | ✓      | ✓     | ✓      | ✓     |         |        |        |        |         |\r\n| Seq. - E-mail        | G    | G       | G     | G      | G      |         |        |        |        |         |\r\n| Seq. - GUID          | G    | G       | G     | G      | G      |         |        |        |        |         |\r\n| Seq. - Social        | G    | G       | G     | G      | G      |         |        |        |        |         |\r\n| Seq. - IP Address    | G    | G       | G     | G      | G      |         |        |        |        |         |\r\n| Seq. 
- Phone Number  | :x:  | :x:     | :x:   | :x:    | :x:    |         |        |        |        |         |\r\n| Seq. - URL           | G    | G       | G     | G*     | G*     |         |        |        |        |         |\r\n| DateTime (+subtypes) | **SP** | :x:     | ✓     | ✓     | **SO** |         |        |        |        |         |\r\n\r\n* G: Generic entity, not language-specific (* Unicode TLDs not supported);\r\n* EO: Extraction-only (parsing/resolution/normalization pending);\r\n* PA: Partial support (type not fully supported);\r\n* SO: Specs-only (test specs coverage OK, but support pending);\r\n* SP: Partial specs;\r\n* SI: Very initial specs (typically the start of support for a new language).\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Frecognizers-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Frecognizers-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Frecognizers-text/lists"}