---
output: github_document
---

# parsedate — Parse dates from ISO 8601, and guess the format

[![R build status](https://github.com/gaborcsardi/parsedate/workflows/R-CMD-check/badge.svg)](https://github.com/gaborcsardi/parsedate/actions)
[![](https://www.r-pkg.org/badges/version/parsedate)](https://www.r-pkg.org/pkg/parsedate)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/parsedate)](https://r-pkg.org/pkg/parsedate)
[![Codecov test coverage](https://codecov.io/gh/gaborcsardi/parsedate/graph/badge.svg)](https://app.codecov.io/gh/gaborcsardi/parsedate)

This R package has three functions for dealing with dates.

* `parse_iso_8601` recognizes and parses all valid ISO
8601 date and time formats. It can also be used as an ISO 8601
validator.
* `parse_date` can parse a date when you don't know
which format it is in. First it tries all ISO 8601 formats.
Then it tries git's versatile date parser. Lastly, it tries
`as.POSIXct`.
* `format_iso_8601` formats a date (and time) in
a specific ISO 8601 format.

## Limitations

The git parser does not work for dates before 1970 or after 2100. For
these dates the current year is used instead:

```{r}
library(parsedate)
parse_date("april 15 1971")
parse_date("april 15 1969")
parse_date("april 15 2110")
```
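
Because the year is silently replaced, it can help to flag suspicious results. The following is only a sketch: `year_mismatch` is a hypothetical helper, not part of parsedate, and it relies on a simple heuristic (a standalone four-digit number in the input is assumed to be the year).

```{r}
library(parsedate)

# Hypothetical helper, not part of parsedate: TRUE where the input contains
# a standalone four-digit number that differs from the year of the parsed date.
year_mismatch <- function(x, parsed = parse_date(x)) {
  has_year <- grepl("\\b[0-9]{4}\\b", x, perl = TRUE)
  written <- ifelse(
    has_year,
    sub(".*\\b([0-9]{4})\\b.*", "\\1", x, perl = TRUE),
    NA_character_
  )
  parsed_year <- format(parsed, "%Y", tz = "UTC")
  # Inputs without a four-digit year, or that failed to parse, are not flagged.
  !is.na(written) & !is.na(parsed_year) & written != parsed_year
}

year_mismatch(c("april 15 1971", "april 15 1969", "april 15 2110"))
```

Any other four-digit number in the input would be mistaken for the year, so treat a flag as a prompt to inspect the value, not as proof of an error.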

## Parsing ISO 8601 dates

`parse_iso_8601` recognizes all valid ISO 8601 formats, and
gives an `NA` for invalid dates. Here are some examples.

### Dates with missing fields

```{r}
library(parsedate)
parse_iso_8601("2013-02-08 09")
parse_iso_8601("2013-02-08 09:30")
```

### Separator between date and time

```{r}
parse_iso_8601("2013-02-08T09")
parse_iso_8601("2013-02-08T09:30")
parse_iso_8601("2013-02-08T09:30:26")
```

### Fractional seconds, minutes, hours

```{r}
parse_iso_8601("2013-02-08T09:30:26.123")
parse_iso_8601("2013-02-08T09:30.5")
parse_iso_8601("2013-02-08T09,25")
```

### Zulu time zone is UTC

```{r}
parse_iso_8601("2013-02-08T09:30:26Z")
```

### ISO weeks are parsed properly

```{r}
parse_iso_8601("2013-W06-5")
parse_iso_8601("2013-W01-1")
parse_iso_8601("2009-W01-1")
parse_iso_8601("2009-W53-7")
```

### Day of the year

```{r}
parse_iso_8601("2013-039")
parse_iso_8601("2013-039 09:30:26Z")
```
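
### Using the parser as a validator

Since `parse_iso_8601` gives `NA` for anything it cannot parse, a validator can be sketched as a one-line wrapper. `is_iso_8601` is a hypothetical name used here for illustration; it is not exported by parsedate.

```{r}
library(parsedate)

# Hypothetical helper, not part of parsedate: TRUE for every element
# that parse_iso_8601() manages to parse.
is_iso_8601 <- function(x) !is.na(parse_iso_8601(x))

is_iso_8601(c("2013-02-08T09:30:26Z", "2013-W06-5", "2013-039", "next friday"))
```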

## Guess the format of the date, and parse it

Sometimes one has to work with a large number of dates, in arbitrary
formats. It is impossible to reliably guess the format of some
dates, because of ambiguity. But it is often not critical to get the
date exactly right in the ambiguous cases, and this is when the
`parse_date` function is useful. It tries a large number of formats,
using the following algorithm:

1. Try parsing dates using all valid ISO 8601 formats, by
calling `parse_iso_8601`.
2. If this fails, then try parsing them using the git
date parser.
3. If this fails, then try parsing them using `as.POSIXct`.
(It is unlikely that this step will parse any dates that the
first two steps couldn't, but it is still a logical fallback,
to make sure that we can parse at least as many dates as
`as.POSIXct`.)
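
The fallback idea in the steps above can be sketched roughly as follows. This is not the package's implementation: `parse_with_fallback` and `safe_posixct` are hypothetical helpers, and the git parser step is left out because it is not one of the three functions this package provides to users.

```{r}
library(parsedate)

# Hypothetical helper: apply each parser in turn, but only to the
# elements that are still unparsed (NA).
parse_with_fallback <- function(x, parsers) {
  result <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = "UTC")
  for (parser in parsers) {
    todo <- is.na(result)
    if (!any(todo)) break
    result[todo] <- parser(x[todo])
  }
  result
}

# Hypothetical wrapper: as.POSIXct() throws an error on unparseable
# strings, so convert element by element and turn errors into NA.
safe_posixct <- function(x) {
  secs <- vapply(x, function(el) {
    tryCatch(as.numeric(as.POSIXct(el, tz = "UTC")), error = function(e) NA_real_)
  }, numeric(1), USE.NAMES = FALSE)
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

parse_with_fallback(
  c("2013-02-08T09:30:26Z", "2014/12/12 10:00", "not a date"),
  list(parse_iso_8601, safe_posixct)
)
```

Later parsers only see what earlier ones rejected, so the strictest parser should come first in the list.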

Here are some examples. The first ones are easy.

```{r}
parse_date("2014-12-12")
parse_date("04/15/99")
parse_date("15/04/99")
```

### Ambiguous formats

The following formats are ambiguous and are parsed as _month/day/year_.

```{r}
parse_date("12/11/99")
parse_date("11/12/99")
```
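
When the field order is known in advance, guessing is unnecessary. As a sketch using base R only (no parsedate functions), assuming the inputs are known to be _day/month/year_ with a two-digit year:

```{r}
# Base R only: state the field order explicitly instead of guessing.
x <- c("12/11/99", "11/12/99")
dmy <- as.POSIXct(x, format = "%d/%m/%y", tz = "UTC")
dmy
```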

### Fill in the current date and time for missing fields

```{r}
parse_date("03/20")
parse_date("12")
```

But not for this input, because a bare year is a valid ISO 8601 date.

```{r}
parse_date("2014")
```

## Formatting dates as ISO 8601

The `format_iso_8601` function formats a date (and time) in a fixed format
that is valid ISO 8601, and can be used to compare dates as character
strings. It converts the date(s) to UTC.

```{r}
format_iso_8601(parse_iso_8601("2013-02-08"))
format_iso_8601(parse_iso_8601("2013-02-08 09:34:00"))
format_iso_8601(parse_iso_8601("2013-02-08 09:34:00+01:00"))
format_iso_8601(parse_iso_8601("2013-W06-5"))
format_iso_8601(parse_iso_8601("2013-039"))
```
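
Because every value ends up in the same fixed-width UTC format, sorting the strings should give the same order as sorting the underlying times. A small sketch of that property, with input values chosen only for illustration:

```{r}
library(parsedate)

stamps <- c("2013-02-08 09:34:00+01:00", "2013-02-08 09:00:00", "2013-W06-4")
times <- parse_iso_8601(stamps)
iso <- format_iso_8601(times)

# Character ordering and chronological ordering agree.
sort(iso)
identical(order(iso), order(as.numeric(times)))
```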