https://github.com/paulrinckens/timexy
A spaCy custom component that extracts and normalizes temporal expressions
date-parser datetime natural-language-processing nlp python spacy spacy-extension timeml timex3
- Host: GitHub
- URL: https://github.com/paulrinckens/timexy
- Owner: paulrinckens
- License: mit
- Created: 2022-02-17T21:02:36.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-13T23:00:14.000Z (almost 2 years ago)
- Last Synced: 2024-10-01T05:41:27.823Z (4 months ago)
- Topics: date-parser, datetime, natural-language-processing, nlp, python, spacy, spacy-extension, timeml, timex3
- Language: Python
- Homepage:
- Size: 19.5 KB
- Stars: 53
- Watchers: 2
- Forks: 8
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# ⏳ Timexy
A [spaCy](https://spacy.io/) [custom component](https://spacy.io/usage/processing-pipelines#custom-components) that extracts and normalizes dates and other temporal expressions.
## Features
- :boom: Extract dates and durations in several languages. See [Supported Languages](#supported-languages) for the current list.
- :boom: Normalize dates to timestamps, or normalize dates and durations to the [TimeML TIMEX3 standard](https://timeml.github.io/site/publications/timeMLdocs/timeml_1.2.1.html)
## Supported Languages
- 🇩🇪 German
- :uk: English
- 🇫🇷 French
## Installation
````
pip install timexy
````
## Usage
After installation, simply integrate the timexy component in any of your spaCy pipelines to extract and normalize dates and other temporal expressions:
```py
import spacy
from timexy import Timexy

nlp = spacy.load("en_core_web_sm")

# Optionally add config if varying from default values
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3'(default), 'timestamp'
    "label": "timexy",  # default: 'timexy'
    "overwrite": False  # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")

doc = nlp("Today is the 10.10.2010. I was in Paris for six years.")
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e.kb_id_}")
```
```bash
>>> 10.10.2010    timexy    TIMEX3 type="DATE" value="2010-10-10T00:00:00"
>>> six years     timexy    TIMEX3 type="DURATION" value="P6Y"
```
### Normalization of temporal expressions
Timexy can normalize temporal expressions to one of two formats:
- the TimeML TIMEX3 standard
- a timestamp

The normalization is configured with the `kb_id_type` config parameter:
```python
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3'(default), 'timestamp'
    "label": "timexy",  # default: 'timexy'
    "overwrite": False  # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
```
> **_NOTE:_** Normalizing temporal expressions that are not concrete dates to timestamp is not viable. Therefore, all non-date temporal expressions are always normalized to timex3 regardless of the `kb_id_type` config.
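
With the default `timex3` setting, `kb_id_` holds a string such as `TIMEX3 type="DATE" value="2010-10-10T00:00:00"`, as in the usage output. Downstream code often wants the type and value as separate fields. The following is a minimal sketch using only the standard library; it assumes every string has exactly the shape shown in that output, and `parse_timex3` and `to_datetime` are hypothetical helpers, not part of timexy.

```python
import re
from datetime import datetime

# Assumed shape of the kb_id_ string, copied from the usage output:
#   TIMEX3 type="DATE" value="2010-10-10T00:00:00"
TIMEX3_PATTERN = re.compile(
    r'TIMEX3 type="(?P<type>[A-Z]+)" value="(?P<value>[^"]+)"'
)


def parse_timex3(kb_id):
    """Split a TIMEX3 string into (type, value), or return None if it does not match."""
    match = TIMEX3_PATTERN.fullmatch(kb_id.strip())
    if match is None:
        return None
    return match["type"], match["value"]


def to_datetime(kb_id):
    """Return a datetime for DATE expressions and None for anything else."""
    parsed = parse_timex3(kb_id)
    if parsed is None or parsed[0] != "DATE":
        return None
    return datetime.fromisoformat(parsed[1])


print(to_datetime('TIMEX3 type="DATE" value="2010-10-10T00:00:00"'))
print(parse_timex3('TIMEX3 type="DURATION" value="P6Y"'))
```

In a pipeline, these helpers would be applied to `e.kb_id_` for each entity in `doc.ents`. Durations such as `P6Y` are left as ISO 8601 strings here, since they do not correspond to a single point in time.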
## Contributing
Please refer to the contributing guidelines [here](https://github.com/paulrinckens/timexy/blob/main/CONTRIBUTING.md).