https://github.com/digamma-ai/timeextractor
Time Extractor NLP project - locate dates and times in text documents
- Host: GitHub
- URL: https://github.com/digamma-ai/timeextractor
- Owner: digamma-ai
- License: other
- Created: 2017-10-10T18:08:18.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2022-10-18T20:21:09.000Z (over 3 years ago)
- Last Synced: 2025-09-15T01:30:53.312Z (8 months ago)
- Language: Java
- Size: 9.56 MB
- Stars: 22
- Watchers: 2
- Forks: 6
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# timeextractor
Time Extractor NLP project - locate dates and times in text documents
## Introduction
This project was developed by [Digamma.ai](http://digamma.ai/). Its goal is a library that finds and extracts time and date information from text documents.
The library identifies text fragments related to a time, date or period (exact date, time of day, day of the week, month, season, time interval, etc.) and converts them into structured form. It tries to detect a variety of textual representations and to handle recurring times (e.g. "every Wednesday").
## Installation
Clone the repository and build the `.jar` file with Maven:
```
git clone https://github.com/digamma-ai/timeextractor.git timeextractor
cd timeextractor
mvn clean install
```
The jar is written to the `target/` folder, with a name like `timeextractor.jar`.
## [Python Installation and Usage](https://github.com/digamma-ai/timeextractor/blob/master/src/main/python/pytimeextractor/README.rst)
## [Project on PyPI](https://pypi.python.org/pypi/pytimeextractor)
## Dependencies
This library is built on:
* [joda-time](https://github.com/JodaOrg/joda-time) Date and time library for Java
* [opencsv](http://opencsv.sourceforge.net/) CSV parser library
* [JUnit](http://junit.org/junit5/) Testing framework
* [Log4j](https://logging.apache.org/log4j/2.x/) Logging service
* [Gson](https://github.com/google/gson) JSON serialization/deserialization library
## Quickstart
`DateTimeExtractor` is the main class for using Time Extractor. Call its `extract()` method to extract date/time fragments from input text; the example further down invokes it statically.
| **Method** | **Parameters** | **Description** |
| ----| --------- | ----- |
| `extract()` | `String` text | Extracts date/time fragments with default settings |
| `extract()` (overload) | `String` text, `Settings` settings | Extracts date/time fragments with custom settings |
| `extractFromCsv()` | `String` csvPath, `String` outputPath, `String` separator, `Settings` settings | Extracts date/time fragments from a .csv file |
| `extractJson()` | `String` text, `Settings` settings | Extracts date/time fragments and produces the output in JSON format |
| `extractFromCsvToJson()` | `String` csvPath, `String` outputPath, `String` separator, `Settings` settings | Extracts date/time fragments from a .csv file and produces the output in JSON format |

The `TemporalExtraction` class represents a single extracted date/time fragment.
Here is an example of how `DateTimeExtractor` and `TemporalExtraction` are used:
```
// requires java.util.TreeSet plus the library's DateTimeExtractor and TemporalExtraction classes
// input string
String inputText = "Reduced entrance fee after 16:30 except for Thursdays. Closed on Mondays.";
// extract date/time fragments
TreeSet<TemporalExtraction> result = DateTimeExtractor.extract(inputText);
// print extracted results
for (TemporalExtraction elem : result) {
    System.out.println(elem);
}
```
The output will be:
```
1 after 16:30, [Temporal[type=TIME_INTERVAL, group=TimeGroup, rule=timeIntervalRule, duration=null, durationInterval=null, set=null, startDate=TimeDate [time=Time [hours=16, minutes=30, seconds=0, timezoneOffset=0], date=Date [year=2017, month=10, day=24, dayOfWeek=null, weekOfMonth=null]], endDate=null]], 21, 32
2 Thursdays, [Temporal[type=DATE, group=DateGroup, rule=dayOfWeekRule, duration=null, durationInterval=null, set=null, startDate=TimeDate [time=Time [hours=18, minutes=59, seconds=43, timezoneOffset=0], date=Date [year=2017, month=10, day=24, dayOfWeek=TH, weekOfMonth=null]], endDate=TimeDate [time=Time [hours=18, minutes=59, seconds=43, timezoneOffset=0], date=Date [year=2017, month=10, day=24, dayOfWeek=TH, weekOfMonth=null]]]], 44, 54
3 Mondays, [Temporal[type=DATE, group=DateGroup, rule=dayOfWeekRule, duration=null, durationInterval=null, set=null, startDate=TimeDate [time=Time [hours=18, minutes=59, seconds=43, timezoneOffset=0], date=Date [year=2017, month=10, day=24, dayOfWeek=MO, weekOfMonth=null]], endDate=TimeDate [time=Time [hours=18, minutes=59, seconds=43, timezoneOffset=0], date=Date [year=2017, month=10, day=24, dayOfWeek=MO, weekOfMonth=null]]]], 65, 73
```
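The two trailing numbers on each output line look like character offsets into the input string. That reading is inferred from the sample output and is not stated in this README. A minimal sketch to check it, using only `String.substring` and no Time Extractor classes (`OffsetCheck` and `slice` are hypothetical names chosen for illustration):

```java
class OffsetCheck {
    // Returns the part of the text between the two offsets (end exclusive).
    static String slice(String text, int start, int end) {
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String inputText = "Reduced entrance fee after 16:30 except for Thursdays. Closed on Mondays.";
        // Offset pairs copied from the sample output above.
        int[][] spans = { {21, 32}, {44, 54}, {65, 73} };
        for (int[] span : spans) {
            System.out.println("[" + slice(inputText, span[0], span[1]) + "]");
        }
    }
}
```

Under that assumption the first pair covers exactly `after 16:30`, while the second and third pairs also take in the period that follows `Thursdays` and `Mondays`, so the end offset may include trailing punctuation.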
## Output Description
The output of the extraction process is a `TreeSet` of `TemporalExtraction` objects. This class has the following attributes:
| **Attributes** | **Description** |
| ---- | ----- |
| `String` temporalExpression | the date/time fragment found in the text |
| `Temporal` temporal | details of the date/time fragment |
`Temporal` class attributes:
| **Attributes** | **Description** |
| ---- | ----- |
| `String` type | type of the found date/time fragment (date, time, relative date, etc.) |
| `String` group | rule group used to extract the date/time fragment |
| `String` rule | rule used to extract the date/time fragment |
| `Duration` duration | duration of the extracted date/time fragment |
| `DurationInterval` durationInterval | duration interval of the extracted date/time fragment |
| `Set` set | frequency, interval and days-of-repetition properties of a recurring fragment |
| `TimeDate` startDate | start date of the extracted date/time fragment |
| `TimeDate` endDate | end date of the extracted date/time fragment |
## Advanced settings
You can modify default extraction settings for some specific scenarios, like:
* find closest day of week according to current date for relative date;
* find closest date according to current date for relative date;
* change found time expression according to specified date and timezone;
* filter extraction rules;
* find only dates that are current date or after current date.
A `Settings` object specifies additional extraction options, such as the local user date/time, the time-zone offset, extraction-rule filters, and whether to return only the latest dates.
`SettingsBuilder` constructs a `Settings` instance when you need configuration options other than the defaults. Create the builder, invoke its configuration methods, and finally call `build()`.
| **Method** | **Parameters** | **Description** |
| ----| --------- | ----- |
| `addRulesGroup()` | `String` rulesGroup | Adds the extraction rules of the `rulesGroup` group to the rules used for extraction |
| `excludeRules()` | `String` ruleToExclude | Excludes the extraction rule `ruleToExclude` from the rules used for extraction |
| `addUserDate()` | `String` userDate | Changes found time expressions according to the specified user date; expected format: `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` |
| `addTimeZoneOffset()` | `String` timeZoneOffset | Changes found time expressions according to the specified user time-zone offset in minutes |
| `includeOnlyLatestDates()` | `boolean` includeOnlyLatest | Finds only dates that are on or after the current date |
| `build()` | | Creates a `Settings` instance based on the current configuration |
The following example shows how to use `SettingsBuilder` to construct a `Settings` instance:
```
Settings settings = new SettingsBuilder()
        .addRulesGroup("DateGroup")
        .excludeRules("holidaysRule")
        .addUserDate("2017-10-23T18:40:40.931Z")
        .addTimeZoneOffset("100")
        .includeOnlyLatestDates(true)
        .build();
```
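The `userDate` and `timeZoneOffset` values are plain strings, so they can be derived from `java.time` objects rather than typed by hand. A minimal sketch, assuming the documented pattern `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` denotes UTC and that the offset is counted in minutes east of UTC (neither is confirmed by this README). `SettingsValues`, `formatUserDate` and `offsetMinutes` are hypothetical helper names, not part of the library:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class SettingsValues {
    // Same pattern as documented for addUserDate(); 'Z' is a literal here,
    // so the formatter is pinned to UTC (an assumption about the intended zone).
    private static final DateTimeFormatter USER_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    static String formatUserDate(Instant instant) {
        return USER_DATE.format(instant);
    }

    // Offset of the zone at the given instant, in minutes, as a string.
    static String offsetMinutes(ZoneId zone, Instant instant) {
        int seconds = zone.getRules().getOffset(instant).getTotalSeconds();
        return String.valueOf(seconds / 60);
    }

    public static void main(String[] args) {
        Instant userInstant = Instant.parse("2017-10-23T18:40:40.931Z");
        System.out.println(formatUserDate(userInstant));
        System.out.println(offsetMinutes(ZoneId.of("Europe/Berlin"), userInstant));
    }
}
```

The two resulting strings could then be passed to `addUserDate()` and `addTimeZoneOffset()` in place of the literals shown in the example above.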
## Extraction rules
All extraction rules are divided into rule groups. In the table below, group names are shown in bold and the rules that belong to each group follow it.

| **Group / Rule** | **Description** | **Example** |
| ---- | ----- | ----- |
| **DateGroup** | Contains rules associated with the date | |
| `dayOfWeekRule` | Extracts day-of-week fragments | Come along to celebrate on Saturday 16 |
| `relativeDateRule` | Extracts relative date fragments | It was 1 week ago.<br>Went there today. |
| `holidaysRule` | Extracts holiday date fragments | We will meet on Christmas day. |
| `monthDayRule` | Extracts month-day date fragments | The Snowy Day and the Art of Ezra Jack Keats (through January 29). |
| `monthYearDayRule` | Extracts year/month/day date fragments | January 13-19, 2014 Show Times. |
| `monthYearRule` | Extracts year/month date fragments | In March 2008, the Golden Gate Bridge District board approved a resolution to implement congestion pricing. |
| `yearRule` | Extracts year fragments | 2013 is also the 850th anniversary of Notre-Dame. |
| **DateIntervalGroup** | Contains rules associated with the period between two dates | |
| `dateIntervalRule` | Extracts intervals between dates | $3 off general admission with your uberX receipt from 10/16/13 - 10/18/13!<br>Best time to visit is from Tuesday to Thursday.<br>In main season (May - Sep ) the boat leaves daily exc. |
| **DurationGroup** | Contains rules associated with a period of time | |
| `intervalDurationRule` | Extracts duration intervals | It's acceptable to include 10 - 15 years of experience. |
| `durationRule` | Extracts periods of time | Buy a combined ticket it lasts two days<br>Was told that the last 30min before closing is free. |
| **RepeatedGroup** | Contains rules associated with repeated events | |
| `repeatedRule` | Extracts repeated events | Free organ show every Sunday at 4.<br>Try San Francisco City Guides, who offer free weekly tours |
| **SeasonGroup** | Contains rules associated with seasons of the year | |
| `seasonRule` | Extracts seasons of the year | In summer months , the park is an anti-urban oasis along the riverfront.<br>Catch the post-impressionist exhibit in the fall! |
| **TimeGroup** | Contains rules associated with the time | |
| `timeRule` | Extracts the time | Go before 4pm PST and get there in time for the Tower.<br>The 'Long Walk' on route to the races at about 1.30pm |
| `timeIntervalRule` | Extracts time intervals | Happy hour from 19 till 20 !!<br>Best between 2:00 pm and 4:00 pm to enjoy the sun |
| `timeZoneRule` | Extracts time zones | Closed by 21:00CET.<br>Last entry 04:15 UTC |
| **WeekendGroup** | Contains rules associated with weekends | |
| `weekendRule` | Extracts weekends | Weekend happy hour 11am-7pm |
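
To make the idea of an extraction rule concrete, here is a toy stand-in for something like `timeRule`: a single regular expression that finds a few clock-time spellings and reports where they sit in the text. This is not how Time Extractor implements its rules (the README does not describe the implementation); `ToyTimeRule`, `find` and the pattern itself are hypothetical and far narrower than a real rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToyTimeRule {
    // Toy pattern covering spellings such as "4pm", "1.30pm" and "21:00".
    private static final Pattern TIME = Pattern.compile(
            "\\b(?:\\d{1,2}[:.]\\d{2}(?:\\s?[ap]m)?|\\d{1,2}\\s?[ap]m)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns each match as "fragment@start-end" (end exclusive).
    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TIME.matcher(text);
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("Go before 4pm PST and get there in time for the Tower."));
        System.out.println(find("The 'Long Walk' on route to the races at about 1.30pm"));
    }
}
```

A real rule additionally has to turn the matched text into structured fields (hours, minutes, time zone and so on), which this sketch leaves out.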