https://github.com/jaredreisinger/compromise-dates
temporary repo to expose compromise-dates as a GitHub-installable repo
- Host: GitHub
- URL: https://github.com/jaredreisinger/compromise-dates
- Owner: JaredReisinger
- Created: 2021-07-20T16:51:32.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-07-20T16:55:21.000Z (almost 5 years ago)
- Last Synced: 2025-02-10T05:26:16.591Z (over 1 year ago)
- Language: JavaScript
- Size: 233 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
README

This library is an earnest attempt to get date information out of text, in a clear way, including all informal text formats and folksy shorthands.

```js
const nlp = require('compromise')
nlp.extend(require('compromise-dates'))
let doc = nlp('the second monday of february')
doc.dates().get(0)
/*
{ start: '2021-02-08T00:00:00.000Z', end: '2021-02-08T23:59:59.999Z'}
*/
```


• Tokenization and disambiguation with compromise.




### *Things it does well:*
| `explicit-dates` | *description* | `Start` | `End` |
| ------------- |:-------------:| -----:| -----:|
| *march 2nd* | |March 2, 12:00am | March 2, 11:59pm
| *2 march* | | '' | ''
| *tues march 2*| | '' | ''
| *march the second* | *natural-language number* | '' | ''
| *on the 2nd* | *implicit months* | '' | ''
| *tuesday the 2nd* | *date-reckoning* | '' | ''
| **`numeric-dates:`** | | | |
| *2020/03/02* |*iso formats* | '' | ''
| *2020-03-02* | | '' | ''
| *03-02-2020* |*british formats* | '' | ''
| *03/02* | | '' | ''
| *2020.08.13* | *alt-ISO*| '' | ''
| **`named-dates:`** | | | |
| *today* | | - | -
| *tomorrow* | | '' | ''
| *christmas eve* |*calendar-holidays* | Dec 24, 12:00am | Dec 24, 11:59pm
| *easter* |*astronomical holidays* | -depends- | -
| *q1* | | Jan 1, 12:00am | Mar 31, 11:59pm
| **`times:`** | | | |
| *2pm* | | '' | ''
| *2:12pm* | | '' | ''
| *2:12* | | '' | ''
| *02:12:00* | *weird iso-times*| '' | ''
| *two oclock* |*written formats* | '' | ''
| *before 1* | | '' | ''
| *noon* | | '' | ''
| *at night* | *informal daytimes* | '' | ''
| *in the morning* | | '' | ''
| *tomorrow evening* | | '' | ''
| **`timezones:`** | | | |
| *eastern time* | *informal zone support*| '' | ''
| *est* |*TZ shorthands* | '' | ''
| *peru time* | | '' | ''
| *..in beirut* | *by location* | '' | ''
| *GMT+9* | *by UTC/GMT offset*| '' | ''
| *-4h* | '' | '' | ''
| *Canada/Eastern* | *IANA codes*| '' | ''
| **`relative durations:`** | | | |
| *this march* | | '' | ''
| *this week* | | '' | ''
| *this sunday* | | '' | ''
| *next april* | | '' | ''
| *this past year* | | '' | ''
| *second week of march* | | '' | ''
| *last weekend of march* | | '' | ''
| *last spring* | | '' | ''
| *the saturday after next* | | '' | ''
| **`punted dates:`** | | | |
| *in seven weeks* | *now+duration*| '' | ''
| *two days after june 6th* | *date+duration*| '' | ''
| *2 weeks from now* | | '' | ''
| *2 weeks after june* | | '' | ''
| *2 years, 4 months, and 5 days ago* | *complex durations*| '' | ''
| *a week and a half before* | *written-out numbers*| '' | ''
| *a week friday* | *idiom format*| '' | ''
| **`start/end:`** | | | |
| *end of the week* | *up-against the ending* | '' | ''
| *start of next year* | *lean-toward starting*| '' | ''
| *middle of q2 last year* |*rough-center calculation* | '' | ''
| **`date-ranges:`** | | | |
| *between june and july* |*explicit ranges* | '' | ''
| *from today to next halloween* | | '' | ''
| *aug 1 - aug 31* | *dash-ranges*| '' | ''
| *22-23 February* | | '' | ''
| *today to next friday* | | '' | ''
| *during june* | | '' | ''
| *aug to june 1999* | *shared range info*| '' | ''
| *before [2019]* |*up-to a date* | '' | ''
| *by march* | | '' | ''
| *after february* | *date-to-infinity*| '' | ''
| **`repeating-intervals:`** | | | |
| *any wednesday* | *n-repeating dates* | |
| *any day in June* | *repeating-date in range* | June 1 ... | .. June 30
| *any wednesday this week* | | '' | ''
| *weekends in July* | *more-complex interval*| '' | ''
| *every weekday until February* | *interval until date*| '' | ''

### *Things it does awkwardly:*
| *`hmmm,`* | *description* | `Start` | `End` |
| ------------- |:-------------:| :-------------:| :-------------:|
| *middle of 2019/June* | tries to find the sorta-center | June 15 | '' |
| *good friday 2025* | tries to reckon astronomically-set holidays| '' | '' |
| *Oct 22 1975 2am in PST* | historical DST changes (assumes current dates) | '' | '' |

### *Things it doesn't do:*
| *😓,* | *description* | `Start` | `End` |
| ------------- |:-------------:| :-------------:| :-------------:|
| *not this Saturday, but the Saturday after* | self-reference logic | '' | '' |
| *3 years ago tomorrow* | folksy short-hand | '' | '' |
| *2100* | military time formats | '' | '' |
| *may 97* | 'bare' 2-digit years | '' | '' |

## API
- **.dates()** - find dates like `June 8th` or `03/03/18`
- **.dates().get()** - parsed dates as json
- **.dates().json()** - overloaded json output with date metadata
- **.dates().format('')** - convert the dates to specific formats
- **.durations()** - find unspecified lengths of time like `two hours and 30mins`
- **.durations().get()** - parsed durations as json
- **.durations().json()** - overloaded json output with duration metadata
- **.durations().normalize()** - turn 3mins into 3 minutes
- **.times()** - find day-times like `three oclock`
- **.times().get()** - parsed times as json
- **.times().json()** - overloaded json output with time metadata
- **.times().normalize()** - normalize how each time is written

### Configuration:
`.dates()` accepts an optional object that lets you set the context for the date parsing.
```js
const context = {
  timezone: 'Canada/Eastern', // the default timezone is 'Etc/UTC'
  today: '2020-02-20', // the implicit, or reference, day/year
  punt: { weeks: 2 }, // the implied duration to use for 'after june 2nd'
  dayStart: '8:00am', // when a day begins
  dayEnd: '5:30pm', // when a day ends
}
nlp('in two days').dates(context).get()
/*
[{ start: '2020-02-22T08:00:00.000-05:00', end: '2020-02-22T17:30:00.000-05:00' }]
*/
```
## API
- **[.dates()](https://observablehq.com/@spencermountain/compromise-dates)** - find dates like `June 8th` or `03/03/18`
- **[.dates().get()](https://observablehq.com/@spencermountain/compromise-dates)** - simple start/end json result
- **[.dates().json()](https://observablehq.com/@spencermountain/compromise-dates)** - overloaded output with date metadata
- **[.dates().format('')](https://observablehq.com/@spencermountain/compromise-dates)** - convert the dates to specific formats
- **[.dates().toShortForm()](https://observablehq.com/@spencermountain/compromise-dates)** - convert 'Wednesday' to 'Wed', etc
- **[.dates().toLongForm()](https://observablehq.com/@spencermountain/compromise-dates)** - convert 'Feb' to 'February', etc
- **[.durations()](https://observablehq.com/@spencermountain/compromise-dates)** - `2 weeks` or `5mins`
- **[.durations().get()](https://observablehq.com/@spencermountain/compromise-dates)** - return simple json for duration
- **[.durations().json()](https://observablehq.com/@spencermountain/compromise-dates)** - overloaded output with duration metadata
- **[.times()](https://observablehq.com/@spencermountain/compromise-dates)** - `4:30pm` or `half past five`
- **[.times().get()](https://observablehq.com/@spencermountain/compromise-dates)** - return simple json for times
- **[.times().json()](https://observablehq.com/@spencermountain/compromise-dates)** - overloaded output with time metadata

## *Opinions*:
### *Start of week:*
By default, weeks start on a Monday, and *'next week'* will run from Monday morning to Sunday night.
This can be configured in spacetime, but right now we are not passing this config through.
### *Implied durations:*
*'after October'* returns a range starting **Nov 1st** and ending **2 weeks** later, by default.
This can be configured by setting the `punt` param in the context object:
```js
doc.dates({punt: { month: 1 }})
```
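As a rough illustration of the arithmetic only, here is the default two-week punt for *'after October'* in 2021, written in plain JavaScript. This is not the library's internal code: `addDuration` is a hypothetical helper, and the exact end instant that compromise-dates returns may differ.

```js
// Hypothetical helper (not part of compromise-dates):
// adds a { weeks, days } duration to a date, working in UTC.
function addDuration(date, { weeks = 0, days = 0 } = {}) {
  const out = new Date(date.getTime())
  out.setUTCDate(out.getUTCDate() + weeks * 7 + days)
  return out
}

// 'after October' (in 2021): the range opens on the first day of the next month
const puntStart = new Date(Date.UTC(2021, 10, 1)) // Nov 1 (months are 0-based)
// and closes after the implied duration of two weeks
const puntEnd = addDuration(puntStart, { weeks: 2 })

console.log(puntStart.toISOString().slice(0, 10)) // 2021-11-01
console.log(puntEnd.toISOString().slice(0, 10)) // 2021-11-15
```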
### *Future bias:*
*'May 7th'* will prefer a May 7th in the future.
Within the current month, though, the parser will still return a date that has already passed:
```js
// from march 2nd
nlp('feb 30th').dates({today: '2021-02-01'}).get()
```
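The rule can be sketched in plain JavaScript, under the assumption that "future bias" means "roll forward a year unless the date is still ahead, or falls in the current month". `resolveMonthDay` is a hypothetical name, not a function the library exposes.

```js
// Hypothetical sketch of future bias (not the library's implementation).
// month is 1-12; refDay is the reference day, handled in UTC.
function resolveMonthDay(month, day, refDay) {
  const year = refDay.getUTCFullYear()
  const candidate = new Date(Date.UTC(year, month - 1, day))
  const sameMonth = month - 1 === refDay.getUTCMonth()
  // keep this year's date if it is still ahead, or if it falls in the current month
  if (candidate >= refDay || sameMonth) {
    return candidate
  }
  // otherwise prefer the same date next year
  return new Date(Date.UTC(year + 1, month - 1, day))
}

const refDay = new Date(Date.UTC(2021, 5, 10)) // June 10, 2021
console.log(resolveMonthDay(5, 7, refDay).toISOString().slice(0, 10)) // 2022-05-07
console.log(resolveMonthDay(6, 2, refDay).toISOString().slice(0, 10)) // 2021-06-02
console.log(resolveMonthDay(7, 1, refDay).toISOString().slice(0, 10)) // 2021-07-01
```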
### *This/Next/Last:*
Named weeks or months, e.g. *'this/next/last week'*, are mostly straightforward.
#### *This monday*
A bare 'monday' will always refer to today (if today is a monday), or to the upcoming monday.
* Saying *'this monday'* on a monday means that same day.
* Saying *'this monday'* on a tuesday means the monday of next week.
Likewise, *'this june'* in June is itself. *'this june'* in any other month is the nearest June in the future.
Future versions of this library could look at sentence-tense to help disambiguate these dates - *'i paid on monday'* vs *'i will pay on monday'*.
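A minimal sketch of the "itself, or the upcoming one" rule for weekdays, in plain JavaScript. `thisWeekday` is a hypothetical helper for illustration, not the library's code.

```js
// Hypothetical helper: resolve a bare weekday or 'this <weekday>'.
// weekday uses Date#getUTCDay numbering: 0 = Sunday ... 6 = Saturday.
function thisWeekday(weekday, refDay) {
  // 0 when refDay already is that weekday, otherwise days until the next one
  const diff = (weekday - refDay.getUTCDay() + 7) % 7
  const out = new Date(refDay.getTime())
  out.setUTCDate(out.getUTCDate() + diff)
  return out
}

const monday = new Date(Date.UTC(2021, 2, 1)) // Monday, March 1, 2021
const tuesday = new Date(Date.UTC(2021, 2, 2)) // Tuesday, March 2, 2021
console.log(thisWeekday(1, monday).toISOString().slice(0, 10)) // 2021-03-01
console.log(thisWeekday(1, tuesday).toISOString().slice(0, 10)) // 2021-03-08
```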
#### *Last monday*
If it's Tuesday, *'last monday'* will not mean yesterday.
* Saying *'last monday'* on a tuesday will be -1 week.
* Saying *'a week ago monday'* will also work.
* Saying *'this past monday'* will return yesterday.
For reference, **Wit.ai** & **chronic** libraries both return yesterday. **Natty** and **SugarJs** return -1 week, like we do.
*'last X'* can be less than 7 days backward, if it crosses a week starting-point:
* Saying *'last friday'* on a monday will be only a few days back.
#### *Next Friday*
If it's Tuesday, *'next wednesday'* will not be tomorrow. It will be a week after tomorrow.
* Saying *'next wednesday'* on a tuesday, will be +1 week.
* Saying *'a week wednesday'* will also be +1 week.
* Saying *'this coming wednesday'* will be tomorrow.
For reference, **Wit.ai**, **chronic**, and **Natty** libraries all return tomorrow. **SugarJs** returns +1 week, like we do.
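One way to picture both *'last X'* and *'next X'* is "the same weekday, in the previous or following Monday-start week". The sketch below assumes that reading; `weekdayInWeek` and `mondayIndex` are hypothetical helpers, not library functions.

```js
// Position of a weekday within a Monday-start week (Mon = 0 ... Sun = 6).
// Input uses Date#getUTCDay numbering (Sun = 0 ... Sat = 6).
function mondayIndex(jsDay) {
  return (jsDay + 6) % 7
}

// Hypothetical helper: the given weekday in the week `weekOffset` weeks away.
// weekOffset -1 models 'last X', +1 models 'next X'.
function weekdayInWeek(weekday, refDay, weekOffset) {
  const shift = mondayIndex(weekday) - mondayIndex(refDay.getUTCDay()) + weekOffset * 7
  const out = new Date(refDay.getTime())
  out.setUTCDate(out.getUTCDate() + shift)
  return out
}

const tue = new Date(Date.UTC(2021, 2, 2)) // Tuesday, March 2, 2021
const mon = new Date(Date.UTC(2021, 2, 1)) // Monday, March 1, 2021
// 'last monday' on a tuesday: not yesterday, but the monday of the previous week
console.log(weekdayInWeek(1, tue, -1).toISOString().slice(0, 10)) // 2021-02-22
// 'next wednesday' on a tuesday: a week after tomorrow
console.log(weekdayInWeek(3, tue, 1).toISOString().slice(0, 10)) // 2021-03-10
// 'last friday' on a monday: only three days back, since it crosses the week start
console.log(weekdayInWeek(5, mon, -1).toISOString().slice(0, 10)) // 2021-02-26
```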
### *Nth Week:*
The first week of a month or a year is the first week *with a thursday in it*. This is a weird, but widely-held standard (for years, it is the ISO 8601 week-numbering rule). It cannot be (easily) configured. This means that the start-date for *first week of January* may be a Monday in December, etc.
As expected, *first monday of January* will always be in January.
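To make the Thursday rule concrete, here is a small plain-JavaScript sketch that finds the Monday that opens the first week of a month. `firstWeekStart` is a hypothetical helper written for this example, not part of the library.

```js
// Hypothetical helper: the Monday that starts the first week of a month,
// where the "first week" is the first Monday-start week containing a Thursday.
// month is 1-12; everything is computed in UTC.
function firstWeekStart(year, month) {
  const first = new Date(Date.UTC(year, month - 1, 1))
  // days from the 1st to the first Thursday (getUTCDay: Thursday = 4)
  const toThursday = (4 - first.getUTCDay() + 7) % 7
  // the Monday of that week is three days before its Thursday;
  // a zero or negative day-of-month rolls back into the previous month
  const out = new Date(first.getTime())
  out.setUTCDate(1 + toThursday - 3)
  return out
}

// Jan 1, 2020 was a Wednesday, so the first week starts in December
console.log(firstWeekStart(2020, 1).toISOString().slice(0, 10)) // 2019-12-30
// Jan 1, 2021 was a Friday, so the first week starts on Jan 4
console.log(firstWeekStart(2021, 1).toISOString().slice(0, 10)) // 2021-01-04
```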
### *British/American ambiguity:*
By default, we use the same interpretation of dates as javascript does: we assume `01/02/2020` is Jan 2nd (US-version), but allow `13/01/2020` to be Jan 13th (UK-version). This should be possible to configure in the near future.
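A minimal sketch of that fallback, assuming the only signal used is "the first number is too big to be a month". `parseSlashDate` is a hypothetical function for illustration and does no further validation.

```js
// Hypothetical sketch: read 'a/b/yyyy' as month/day/year (US order),
// unless the first number cannot be a month, in which case swap to day/month.
function parseSlashDate(text) {
  const [a, b, year] = text.split('/').map(Number)
  const [month, day] = a > 12 ? [b, a] : [a, b]
  return { year, month, day }
}

console.log(parseSlashDate('01/02/2020')) // { year: 2020, month: 1, day: 2 }
console.log(parseSlashDate('13/01/2020')) // { year: 2020, month: 1, day: 13 }
```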
### *Seasons:*
By default, *'this summer'* will return **June 1 - Sept 1**, which is northern hemisphere ISO.
Configuring the default hemisphere should be possible in the future.
### *Day times:*
There are some hardcoded times for *'lunch time'* and others, but mainly, a day begins at `12:00am` and ends at `11:59pm` - the last millisecond of the day.
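The default bounds are easy to reproduce by hand. This plain-JavaScript sketch (with a hypothetical `dayBounds` helper, UTC only) yields the same start/end shape as the *'second monday of february'* example near the top of this README.

```js
// Hypothetical helper: first and last millisecond of a UTC calendar day.
function dayBounds(isoDay) {
  const start = new Date(`${isoDay}T00:00:00.000Z`)
  // one full day later, minus a single millisecond
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000 - 1)
  return { start: start.toISOString(), end: end.toISOString() }
}

console.log(dayBounds('2021-02-08'))
// { start: '2021-02-08T00:00:00.000Z', end: '2021-02-08T23:59:59.999Z' }
```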
### *Invalid dates:*
compromise will tag anything that looks like a date, but not validate the dates until they are parsed.
* *'january 34th 2020'* will return **Jan 31 2020**.
* *'tomorrow at 2:62pm'* will just return 'tomorrow'.
* *'6th week of february'* will return the 2nd week of march.
* Setting an hour that's skipped, or repeated by a DST change will return the closest valid time to the DST change.
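The first case above amounts to clamping the day to the length of the month. A plain-JavaScript sketch, with `clampDay` as a hypothetical helper (the library may handle this differently internally):

```js
// Hypothetical helper: clamp a day-of-month into the valid range for that month.
// month is 1-12.
function clampDay(year, month, day) {
  // day 0 of the following month is the last day of `month`
  const lastDay = new Date(Date.UTC(year, month, 0)).getUTCDate()
  return Math.min(Math.max(day, 1), lastDay)
}

console.log(clampDay(2020, 1, 34)) // 31
console.log(clampDay(2020, 2, 30)) // 29 (2020 is a leap year)
console.log(clampDay(2021, 2, 30)) // 28
```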
### *Inclusive/exclusive ranges:*
*'between january and march'* will include all of march. This is normally pretty ambiguous.
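An inclusive month range can be sketched as "first instant of the start month, to the last millisecond of the end month". The `monthRange` helper below is hypothetical and works in UTC only.

```js
// Hypothetical helper: inclusive range covering whole months (months are 1-12).
function monthRange(year, startMonth, endMonth) {
  const start = new Date(Date.UTC(year, startMonth - 1, 1))
  // first instant of the month after endMonth, minus one millisecond
  const end = new Date(Date.UTC(year, endMonth, 1) - 1)
  return { start: start.toISOString(), end: end.toISOString() }
}

console.log(monthRange(2021, 1, 3))
// { start: '2021-01-01T00:00:00.000Z', end: '2021-03-31T23:59:59.999Z' }
```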
### *Date greediness:*
This library makes no assumptions about the input text, and is careful to avoid false-positive dates.
If you know your text is a date, you can crank-up the date-tagger with a [compromise-plugin](https://observablehq.com/@spencermountain/compromise-plugins), like so:
```js
nlp.extend(function (Doc, world) {
  // ambiguous words
  world.addWords({
    weds: 'WeekDay',
    wed: 'WeekDay',
    sat: 'WeekDay',
    sun: 'WeekDay',
  })
  world.postProcess((doc) => {
    // tag '2nd quarter' as a date
    doc.match('#Ordinal quarter').tag('#Date')
    // tag '2/2' as a date (not a fraction)
    doc.match('/[0-9]{1,2}/[0-9]{1,2}/').tag('#Date')
  })
})
```
### *Misc:*
* *'thursday the 16th'* - will set to the 16th, even if it's not thursday
* *'in a few hours/years'* - in 2 hours/years
* *'jan 5th 2008 to Jan 6th the following year'* - date-range explicit references
* assume *'half past 5'* is 5pm
### *About:*

1 - Regular-expressions are too-brittle to parse dates.
2 - Neural-nets are too-wonky to parse dates.
3 - A corporation, or startup is the wrong place to build a universal date-parser.

Parsing *dates*, *times*, *durations*, and *intervals* from natural language can be a solved-problem.
A rule-based, community open-source library - *one based on simple NLP* - is the best way to build a natural language date parser - commercial, or otherwise - for the frontend, or the backend.
The *[match-syntax](https://observablehq.com/@spencermountain/compromise-match-syntax)* is effective and easy, *javascript* is prevailing, and the more people who contribute, the better.

### See also
* [Duckling](https://duckling.wit.ai/) - by wit.ai (facebook)
* [Sugarjs/dates](https://sugarjs.com/dates/) - by Andrew Plummer (js)
* [Chronic](https://github.com/mojombo/chronic) - by Tom Preston-Werner (Ruby)
* [SUTime](https://nlp.stanford.edu/software/sutime.shtml) - by Angel Chang, Christopher Manning (Java)
* [Natty](http://natty.joestelmach.com/) - by Joe Stelmach (Java)
* [rrule](https://github.com/jakubroztocil/rrule) - repeating date-interval handler (js)
* [ParseDateTime](https://pypi.org/project/parsedatetime/) - by Mike Taylor (Python)
compromise-date is sponsored by 
**MIT** licenced