https://github.com/paulrinckens/timexy

A spaCy custom component that extracts and normalizes temporal expressions

date-parser datetime natural-language-processing nlp python spacy spacy-extension timeml timex3

# ⏳ Timexy

A [spaCy](https://spacy.io/) [custom component](https://spacy.io/usage/processing-pipelines#custom-components) that extracts and normalizes dates and other temporal expressions.

## Features
- :boom: Extract dates and durations in several languages. See [Supported Languages](#supported-languages) for the current list.
- :boom: Normalize dates to timestamps or normalize dates and durations to the [TimeML TIMEX3 standard](https://timeml.github.io/site/publications/timeMLdocs/timeml_1.2.1.html)

## Supported Languages
- 🇩🇪 German
- 🇬🇧 English
- 🇫🇷 French

## Installation
````
pip install timexy
````
## Usage
After installation, simply integrate the timexy component in any of your spaCy pipelines to extract and normalize dates and other temporal expressions:

```py
import spacy
from timexy import Timexy  # importing timexy makes the "timexy" component available to spaCy

nlp = spacy.load("en_core_web_sm")

# Optionally add config if varying from default values
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False,      # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")

doc = nlp("Today is the 10.10.2010. I was in Paris for six years.")
# Print each entity with its label and normalized value
for e in doc.ents:
    print(f"{e.text}\t{e.label_}\t{e.kb_id_}")
```

```bash
>>> 10.10.2010 timexy TIMEX3 type="DATE" value="2010-10-10T00:00:00"
>>> six years timexy TIMEX3 type="DURATION" value="P6Y"
```
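
If you need the type and value separately, the `kb_id_` string can be split with a small helper. The following is a minimal sketch that assumes the string always has the shape shown in the sample output above; `parse_timex3` is a hypothetical helper written for this illustration and is not part of Timexy.

```python
import re
from typing import Optional

# Assumed format, inferred from the sample output above:
#   TIMEX3 type="DATE" value="2010-10-10T00:00:00"
_TIMEX3_PATTERN = re.compile(
    r'TIMEX3 type="(?P<type>[^"]+)" value="(?P<value>[^"]+)"'
)


def parse_timex3(kb_id: str) -> Optional[dict]:
    """Split a TIMEX3-style kb_id string into type and value (hypothetical helper)."""
    match = _TIMEX3_PATTERN.fullmatch(kb_id.strip())
    if match is None:
        return None  # the string is not in the assumed TIMEX3 format
    return match.groupdict()


print(parse_timex3('TIMEX3 type="DATE" value="2010-10-10T00:00:00"'))
print(parse_timex3('TIMEX3 type="DURATION" value="P6Y"'))
```

In a pipeline, this could be applied to each `e.kb_id_` while iterating over `doc.ents`.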

### Normalization of temporal expressions
Timexy can normalize temporal expressions to one of two formats:
- the TimeML TIMEX3 standard
- a timestamp

The normalization is configured with the `kb_id_type` config parameter:
```python
config = {
    "kb_id_type": "timex3",  # possible values: 'timex3' (default), 'timestamp'
    "label": "timexy",       # default: 'timexy'
    "overwrite": False,      # default: False
}
nlp.add_pipe("timexy", config=config, before="ner")
```

> **_NOTE:_** Temporal expressions that are not concrete dates (for example, durations) cannot be meaningfully represented as a timestamp. Therefore, all non-date temporal expressions are always normalized to timex3, regardless of the `kb_id_type` config.
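
To illustrate why only concrete dates can become timestamps, here is a minimal standard-library sketch. It does not show how Timexy performs the conversion internally; `to_timestamp` is a hypothetical helper, and treating naive values as UTC is an assumption made only for this example.

```python
from datetime import datetime, timezone
from typing import Optional


def to_timestamp(value: str) -> Optional[float]:
    """Return a POSIX timestamp for an ISO 8601 date value, else None (hypothetical helper)."""
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None  # e.g. "P6Y": a duration has no single point in time
    # Assumption: values without a timezone are treated as UTC here.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


print(to_timestamp("2010-10-10T00:00:00"))  # a concrete date maps to one number
print(to_timestamp("P6Y"))                  # a duration does not, so this is None
```

A date value identifies one instant, so it converts cleanly. A duration such as `P6Y` describes a length of time with no anchor, which is why it stays in TIMEX3 form.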

## Contributing
Please refer to the contributing guidelines [here](https://github.com/paulrinckens/timexy/blob/main/CONTRIBUTING.md).