{"id":14065969,"url":"https://github.com/ropensci/datefixR","last_synced_at":"2025-07-29T21:34:28.388Z","repository":{"id":40365867,"uuid":"424185421","full_name":"ropensci/datefixR","owner":"ropensci","description":"🗓 Standardize Dates in Different Formats or with Missing Data","archived":false,"fork":false,"pushed_at":"2024-11-25T12:24:03.000Z","size":1723,"stargazers_count":34,"open_issues_count":4,"forks_count":5,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-25T13:26:52.185Z","etag":null,"topics":["dates","r","r-package","rstats","wrangling"],"latest_commit_sha":null,"homepage":"https://docs.ropensci.org/datefixR/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ropensci.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json"}},"created_at":"2021-11-03T10:48:58.000Z","updated_at":"2024-11-25T12:23:03.000Z","dependencies_parsed_at":"2023-02-14T01:01:34.749Z","dependency_job_id":"dad4f4fc-01b3-465b-833a-ba0d2f471d84","html_url":"https://github.com/ropensci/datefixR","commit_stats":{"total_commits":377,"total_committers":12,"mean_commits":"31.416666666666668","dds":0.2785145888594165,"last_synced_commit":"73a822d65d8417186ebfdb3ed3235bd14855f3a6"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2FdatefixR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2FdatefixR/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/ropensci%2FdatefixR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ropensci%2FdatefixR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ropensci","download_url":"https://codeload.github.com/ropensci/datefixR/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228052652,"owners_count":17862105,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dates","r","r-package","rstats","wrangling"],"created_at":"2024-08-13T07:04:52.689Z","updated_at":"2025-07-29T21:34:28.369Z","avatar_url":"https://github.com/ropensci.png","language":"R","readme":"---\noutput:\n  github_document:\n    html_preview: false\neditor_options: \n  markdown: \n    wrap: 80\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\"\n)\n```\n\n# datefixR \u003cimg src=\"man/figures/logo.png\" align=\"right\" width=\"150\"/\u003e\n\n\u003c!-- badges: start --\u003e\n| Usage | Release | Development | Translation Status |\n|-------|---------|-------------|--------------------|\n| ![R](https://img.shields.io/badge/r-%23276DC3.svg?style=for-the-badge\u0026logo=r\u0026logoColor=white) | [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/datefixR)](https://cran.r-project.org/package=datefixR) | [![R build status](https://github.com/ropensci/datefixR/workflows/CI/badge.svg)](https://github.com/ropensci/datefixR/actions) | [![German localization ](https://gitlocalize.com/repo/8364/de/badge.svg)](https://gitlocalize.com/repo/8364/de?utm_source=badge) |\n| [![License: GPL-3](https://img.shields.io/badge/License-GPL3-green.svg)](https://opensource.org/license/gpl-3-0) | [![datefixR status badge](https://ropensci.r-universe.dev/badges/datefixR)](https://ropensci.r-universe.dev/datefixR) | 
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)  | [![Spanish localization ](https://gitlocalize.com/repo/8364/es/badge.svg)](https://gitlocalize.com/repo/8364/es?utm_source=badge)                                                    |\n| [![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/datefixR?color=blue)](https://r-pkg.org/pkg/datefixR) | [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5655311.svg)](https://doi.org/10.5281/zenodo.5655311)          | [![codecov](https://codecov.io/gh/ropensci/datefixR/branch/main/graph/badge.svg?token=zycOVwlq1m)](https://app.codecov.io/gh/ropensci/datefixR) | [![French localization](https://gitlocalize.com/repo/8364/fr/badge.svg)](https://gitlocalize.com/repo/8364/fr?utm_source=badge)|\n| ![website status](https://img.shields.io/website?down_color=red\u0026down_message=offline\u0026up_color=green\u0026up_message=online\u0026url=https%3A%2F%2Fdocs.ropensci.org%2FdatefixR%2F) | [![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/533_status.svg)](https://github.com/ropensci/software-review/issues/533) | [![Tidyverse style guide](https://img.shields.io/static/v1?label=Code%20Style\u0026message=Tidyverse\u0026color=1f1c30)](https://style.tidyverse.org) |[![Indonesian localization](https://gitlocalize.com/repo/8364/id/badge.svg)](https://gitlocalize.com/repo/8364/id?utm_source=badge) |\n| | | | [![Russian localization](https://gitlocalize.com/repo/8364/ru/badge.svg)](https://gitlocalize.com/repo/8364/ru?utm_source=badge) |\n\n\u003c!-- badges: end --\u003e\n\n**`datefixR` is an R package that automatically standardizes messy date data into consistent, machine-readable formats.** Whether you're dealing with free-text web form entries like \"02 05 92\", \"2020-may-01\", or \"le 3 mars 2013\", `datefixR` intelligently parses diverse date 
formats and converts them to R's standard Date class.\n\n[![CRAN Version](https://www.r-pkg.org/badges/version/datefixR)](https://cran.r-project.org/package=datefixR) \n[![R Version](https://img.shields.io/badge/R-≥4.1.0-blue)](https://www.r-project.org/)\n[![Development Version](https://img.shields.io/badge/dev-1.7.0.9000-orange)](https://github.com/ropensci/datefixR)\n\n**Key features:**\n\n- **Smart parsing**: Handles mixed date formats, separators, and representations in a single dataset\n- **Multilingual support**: Recognizes dates in English, French, German, Spanish, Indonesian, Russian, and Portuguese\n- **Missing data imputation**: User-controlled behavior for incomplete dates (missing days/months)\n- **Detailed error reporting**: Identifies exactly which dates couldn't be parsed and why\n- **Excel compatibility**: Supports both R and Excel numeric date representations\n- **Shiny integration**: Interactive web app for data exploration and cleaning\n\n\u003cimg src=\"man/figures/example.svg\" width=\"800\"/\u003e\n\n## Quick Start\n\nHere's a simple example showing how `datefixR` cleans messy date data:\n\n```{R}\nlibrary(datefixR)\n\n# Create some messy date data\nmessy_dates \u003c- c(\"02/05/92\", \"2020-may-01\", \"le 3 mars 2013\", \"1996\")\nmessy_df \u003c- data.frame(id = 1:4, dates = messy_dates)\nprint(messy_df)\n\n# Clean the dates\nclean_dates \u003c- fix_date_char(messy_dates)\nclean_df \u003c- fix_date_df(messy_df, \"dates\")\nprint(clean_df)\n```\n\nThe package automatically standardizes dates from different formats (US/European style, named months, various separators, incomplete dates) into R's standard `yyyy-mm-dd` format. 
When parts are missing (such as the day or month), it imputes them: by default, a missing day becomes the 1st and a missing month becomes July.\n\n## Installation\n\n### Stable Release (Recommended)\n\n`datefixR` is available on CRAN:\n\n```{R Cran, eval = FALSE}\ninstall.packages(\"datefixR\")\n```\n\n### Latest Stable (r-universe)\n\nFor the most up-to-date stable version via [r-universe](https://r-universe.dev/search):\n\n```{R dev, eval = FALSE}\n# Enable universe(s) by ropensci\noptions(repos = c(\n  ropensci = \"https://ropensci.r-universe.dev\",\n  CRAN = \"https://cloud.r-project.org\"\n))\n\ninstall.packages(\"datefixR\")\n```\n\n### Development Version\n\nFor bleeding-edge features (may be unstable):\n\n```{R, eval = FALSE}\nif (!require(\"remotes\")) install.packages(\"remotes\")\nremotes::install_github(\"ropensci/datefixR\", \"devel\")\n```\n\n**Version Compatibility**: `datefixR` requires R ≥ 4.1.0. Current stable version: `r packageVersion(\"datefixR\")`.\n\n## Getting Started\n\nNew to `datefixR`? Start here:\n\n1. **Install the package** (see above)\n2. **Try the Quick Start example** (see above)\n3. **Explore the Shiny app**: [https://nathansam.shinyapps.io/datefixr/](https://nathansam.shinyapps.io/datefixr/)\n4. **Read the full vignette**: `browseVignettes(\"datefixR\")` or visit [online documentation](https://docs.ropensci.org/datefixR/articles/datefixR.html)\n\n## Package vignette\n\n`datefixR` has a \"Getting Started\" vignette which describes how to use this\npackage in more detail than this page. View the vignette by either calling\n\n```{R, eval = FALSE}\nbrowseVignettes(\"datefixR\")\n```\n\nor visiting the vignette on the [package\nwebsite](https://docs.ropensci.org/datefixR/articles/datefixR.html).\n\n## Usage\n\n`datefixR` provides flexible date standardization capabilities across different data structures and formats. 
This section demonstrates various use cases with practical examples.\n\n### Character Vector Cleaning\n\nThe most basic use case involves cleaning a character vector of messy dates using `fix_date_char()`:\n\n```{r char-vector-example}\nlibrary(datefixR)\n\n# Mixed format dates\nmessy_dates \u003c- c(\n  \"02/05/92\", # US format, 2-digit year\n  \"2020-may-01\", # ISO with named month\n  \"le 3 mars 2013\", # French format\n  \"1996\", # Year only\n  \"22.07.1977\", # European format\n  \"jan 2020\" # Month-year only\n)\n\n# Clean all dates at once\nclean_dates \u003c- fix_date_char(messy_dates)\nprint(clean_dates)\n```\n\nThis function automatically handles various separators (\"-\", \"/\", \".\", spaces), different date orders, named months in multiple languages, and incomplete dates.\n\n### Data Frame Cleaning\n\nFor structured data, use `fix_date_df()` to clean multiple date columns simultaneously:\n\n```{r df-example}\n# Load example dataset\ndata(\"exampledates\")\nknitr::kable(exampledates)\n\n# Fix multiple columns\nfixed_df \u003c- fix_date_df(exampledates, c(\"some.dates\", \"some.more.dates\"))\nknitr::kable(fixed_df)\n```\n\nThe function preserves non-date columns and provides detailed error reporting if any dates fail to parse.\n\n### Excel Serial Numbers\n\n`datefixR` supports both R and Excel numeric date representations:\n\n```{r excel-serial-example}\n# R serial dates (days since 1970-01-01)\nr_serial \u003c- \"19539\" # Represents 2023-07-01\nfix_date_char(r_serial)\n\n# Excel serial dates (days since 1900-01-01, accounting for Excel's leap year bug)\nexcel_serial \u003c- \"45108\" # Also represents 2023-07-01\nfix_date_char(excel_serial, excel = TRUE)\n\n# Mixed serial and text dates\nmixed_dates \u003c- c(\"45108\", \"2023-07-01\", \"july 1 2023\")\nfix_date_char(mixed_dates, excel = TRUE)\n```\n\nThis is particularly useful when importing data from Excel spreadsheets where dates may have been converted to serial numbers.\n\n### Roman 
Numerals\n\n`datefixR` can handle Roman numerals in month positions, common in some European date formats:\n\n```{r roman-example}\n# Roman numeral months\nroman_dates \u003c- c(\n  \"15.VII.2023\", # July 15, 2023\n  \"3.XII.1999\", # December 3, 1999\n  \"1.I.2000\" # January 1, 2000\n)\n\nfix_date_char(roman_dates)\n```\n\nRoman numerals (I-XII) are automatically recognized and converted to the appropriate numeric months.\n\n### MDY vs DMY Detection\n\nBy default, `datefixR` assumes day-first (DMY) format when the date order is ambiguous. However, you can specify month-first (MDY) format:\n\n```{r mdy-dmy-example}\n# Ambiguous dates that could be interpreted as either MDY or DMY\nambiguous_dates \u003c- c(\"01/02/2023\", \"03/04/2023\", \"05/06/2023\")\n\n# Default: Day-first (DMY) interpretation\ndmy_result \u003c- fix_date_char(ambiguous_dates)\nprint(dmy_result)\n\n# Month-first (MDY) interpretation\nmdy_result \u003c- fix_date_char(ambiguous_dates, format = \"mdy\")\nprint(mdy_result)\n```\n\n\n### Missing Day/Month Imputation\n\n`datefixR` provides flexible control over how missing date components are imputed:\n\n```{r imputation-example}\n# Incomplete dates requiring imputation\nincomplete_dates \u003c- c(\"2023\", \"05/2023\", \"2023-08\", \"march 2022\")\n\n# Default imputation: missing month = July (07), missing day = 1st\ndefault_impute \u003c- fix_date_char(incomplete_dates)\nprint(default_impute)\n\n# Custom imputation: missing month = January (01), missing day = 15th\ncustom_impute \u003c- fix_date_char(incomplete_dates,\n  month.impute = 1,\n  day.impute = 15\n)\nprint(custom_impute)\n\n# For data frames, apply the same logic\nincomplete_df \u003c- data.frame(\n  id = 1:4,\n  dates = incomplete_dates\n)\n\nfixed_incomplete \u003c- fix_date_df(incomplete_df, \"dates\",\n  month.impute = 12, # December\n  day.impute = 31\n) # Last day\nknitr::kable(fixed_incomplete)\n```\n\nThis flexibility allows you to choose imputation strategies that make sense 
for your specific use case (e.g., fiscal year starts or survey periods).\n\nFor datasets with hundreds of thousands of rows where speed is critical and you can accept less flexible parsing, consider alternatives such as the `lubridate` or `clock` packages, which use compiled code and can be orders of magnitude faster.\n\n## Limitations\n\nDate and time data are often reported together in the same variable (known as\n\"datetime\"). However, datetime formats are not supported by `datefixR`. The\ncurrent rationale is that this package is mostly used to handle dates entered via\nfree-text web forms, and it is much less common for both date and time to be\nreported together in this input method. However, if there is significant demand\nfor datetime support in the future, this may be added.\n\nThe package is written solely in R and seems fast enough for my current use\ncases (a few hundred rows). However, I may convert the core for loop to C++ in\nthe future if speed becomes an issue.\n\n## Similar packages to datefixR\n\n### `lubridate`\n\n[`lubridate::guess_formats()`](https://lubridate.tidyverse.org/reference/guess_formats.html)\ncan be used to guess a date format and\n[`lubridate::parse_date_time()`](https://lubridate.tidyverse.org/reference/parse_date_time.html)\ncalls this function when it attempts to parse a vector into a POSIXct date-time\nobject. However:\n\n1.  When a date fails to parse in `{lubridate}`, the user is simply told how\n    many dates failed to parse. In `datefixR`, the user is told the ID (assumed\n    to be the first column by default but can be user-specified) corresponding\n    to the date which failed to parse and is shown the considered date, making it\n    much easier to figure out which of the supplied dates failed to parse and why.\n2.  When imputing a missing day or month, there is no user control over this\n    behavior. For example, when imputing a missing month, the user may wish to\n    impute July, the middle of the year, instead of January. 
However, January\n    will always be imputed in `{lubridate}`. In `datefixR`, this behavior can be\n    controlled by the `month.impute` argument.\n3.  These functions require all possible date formats to be specified in the\n    `orders` argument, which may result in a date format not being considered if\n    the user forgets to list one of the possible formats. By contrast,\n    `datefixR` only needs a format to be specified if month-first is to be\n    preferred over day-first when guessing a date.\n\nHowever, `{lubridate}` of course excels in general date manipulation and is an\nexcellent tool to use alongside `datefixR`.\n\n### `anytime`\n\nAn alternative function is\n[`anytime::anydate()`](https://dirk.eddelbuettel.com/code/anytime.html), which\nalso attempts to convert dates to a consistent format (POSIXct). However,\n`{anytime}` assumes year, month, and day have all been provided and does not\npermit imputation. Moreover, if a date cannot be parsed, then the date is\nconverted to an NA object and no warning is raised, which may lead to issues\nlater in the analysis.\n\n### `parsedate`\n\n`parsedate::parse_date()` also attempts to solve the problem of handling\narbitrary dates and parses dates into the `POSIXct` type. Unfortunately,\n`parse_date()` cannot handle years before 1970, instead imputing the year using\nthe current year without raising a warning.\n\n```{R}\nparsedate::parse_date(\"april 15 1969\")\n```\n\nMoreover, `parse_date()` assumes dates are in MDY format and does not allow the\nuser to specify otherwise. However, `{parsedate}` has excellent support for\nhandling dates in ISO 8601 formats.\n\n### `stringi`, `readr`, and `clock`\n\nThese packages all use the\n[ICU library](https://unicode-org.github.io/icu/userguide/format_parse/datetime/)\nwhen parsing dates (via `stringi::stri_datetime_parse()`, `readr::parse_date()`,\nor `clock::date_parse()`) and therefore all behave very similarly. 
Notably, all\nof these functions require the date format to be specified, including specifying\na priori whether part of a date is missing. Ultimately, this makes these packages unsuitable\nwhen numerous dates in different formats must be parsed.\n\n```{R}\nreadr::parse_date(\"02/2010\", \"%m/%Y\")\n```\n\nHowever, these packages have support for weekdays and months in around 211\nlocales, whereas `datefixR` supports far fewer languages because support for\neach additional language must be implemented by hand.\n\n\n### Performance Comparison\n\nThese alternative packages all use compiled code and therefore have the\npotential to be orders of magnitude faster than `datefixR`. However, performance\nvaries significantly based on use case and data characteristics.\n\n**Trade-offs to consider:**\n\n- **`datefixR`**: Excellent error reporting, flexible imputation, handles mixed formats automatically\n- **`lubridate`**: Faster performance but requires format specification, limited imputation control\n- **`stringi`/`readr`/`clock`**: Very fast but require exact format specification, support for around 211 locales\n- **`anytime`**: Variable performance, no imputation support, silent failures\n\nIf you have very large datasets with consistent formats and don't need detailed error reporting, consider `lubridate` or ICU-based packages. For messy, mixed-format data where usability and error handling are priorities, `datefixR` is the better fit, as it favors ease of use over raw speed.\n\n## Contributing to datefixR\n\nIf you are interested in contributing to `datefixR`, please read our\n[contributing\nguide](https://github.com/ropensci/datefixR/blob/main/.github/CONTRIBUTING.md).\n\nPlease note that this package is released with a [Contributor Code of\nConduct](https://ropensci.org/code-of-conduct/). By contributing to this\nproject, you agree to abide by its terms.\n\n## Citation\n\nIf you use this package in your research, please consider citing `datefixR`! 
An\nup-to-date citation can be obtained by running\n\n```{R, results = \"hide\"}\ncitation(\"datefixR\")\n```\n","funding_links":[],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2FdatefixR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fropensci%2FdatefixR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fropensci%2FdatefixR/lists"}