{"id":32204672,"url":"https://github.com/arg0naut91/neatranges","last_synced_at":"2025-10-22T05:00:16.070Z","repository":{"id":34931391,"uuid":"179554705","full_name":"arg0naut91/neatRanges","owner":"arg0naut91","description":"A tool for tidying up date/timestamp ranges","archived":false,"fork":false,"pushed_at":"2022-06-09T14:37:56.000Z","size":2411,"stargazers_count":3,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-10-22T05:00:00.092Z","etag":null,"topics":["collapse","date-range","intervals","partitioning","r","timestamp-ranges"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arg0naut91.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-04-04T18:33:47.000Z","updated_at":"2022-06-09T09:50:04.000Z","dependencies_parsed_at":"2022-08-08T03:00:33.861Z","dependency_job_id":null,"html_url":"https://github.com/arg0naut91/neatRanges","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arg0naut91/neatRanges","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arg0naut91%2FneatRanges","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arg0naut91%2FneatRanges/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arg0naut91%2FneatRanges/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arg0naut91%2FneatRanges/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arg0naut91","download_url":"https://codeload.github.com/arg0naut91/neatRanges/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arg0naut91%2FneatRanges/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280382997,"owners_count":26321423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-22T02:00:06.515Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["collapse","date-range","intervals","partitioning","r","timestamp-ranges"],"created_at":"2025-10-22T05:00:13.880Z","updated_at":"2025-10-22T05:00:16.051Z","avatar_url":"https://github.com/arg0naut91.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"neatRanges\n================\n\n[![CRAN\\_Status\\_Badge](http://www.r-pkg.org/badges/version-last-release/neatRanges)](https://cran.r-project.org/package=neatRanges)\n[![R-CMD-check](https://github.com/arg0naut91/neatRanges/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/arg0naut91/neatRanges/actions/workflows/check-standard.yaml)\n[![codecov](https://codecov.io/gh/arg0naut91/neatRanges/branch/master/graph/badge.svg?token=LTF8oTL8Qy)](https://codecov.io/gh/arg0naut91/neatRanges)\n[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)\n\nThe aim of `neatRanges` is to provide tools for working with date \u0026\ntimestamp ranges, namely:\n\n  - Collapsing,  \n  - Partitioning,  \n  - Combining,  \n  - Expanding,  \n  - Filling the gaps between ranges.\n\nIt primarily uses `data.table` in order to speed up the operations. One\nof the functions - `collapse_ranges` - also uses `Rcpp`, thanks to the\nidea from\n[Patrikios/customerRelationship](https://github.com/Patrikios/customerRelationship).\n\nYou can install it from CRAN by `install.packages('neatRanges')`.\n\nBelow is a quick overview of all functions, followed by a detailed\ndescription of each one of them.\n\n| Function          | Description                                           | Supports dates   | Supports timestamps |\n| ----------------- | ----------------------------------------------------- | ---------------- | ------------------- |\n| collapse\\_ranges  | Collapse consecutive ranges                           | Yes              | Yes                 |\n| partition\\_ranges | Split a range into multiple ranges (by month or year) | Months and years | No                  |\n| expand\\_\\*        | Expand a range                                        | Yes              | Yes                 |\n| combine\\_ranges   | Combine ranges from multiple tables                   | Yes              | Yes                 |\n| fill\\_ranges      | Add missing ranges                                    | Yes              | Yes                 |\n\n## collapse\\_ranges\n\nThis function can be useful when we want to collapse several consecutive\ndate or timestamp ranges.\n\nLet’s say we have a data frame of different financial actors and their\ncredit ratings for different time ranges.\n\nNow you can see that some of them are actually consecutive. Instead of 6\nrows, we’d therefore like to get only 4.\n\n``` \n    id rating start_date   end_date marketCapStart marketCapEnd\n1 1111     A+ 2014-01-01 2014-12-31            7.2          7.4\n2 1111     AA 2015-01-01 2015-12-31            7.3          7.5\n3 1111     AA 2016-01-01 2016-03-01            7.9          8.0\n4 2222     B- 2017-01-01 2017-01-31            6.5          6.9\n5 2222     B- 2018-01-01 2018-12-31            6.7          6.5\n6 2222     B- 2019-01-01 2020-02-01            6.8          7.0\n```\n\nWe can do this as below. A bit of explanation of some of the arguments:\n\n  - `dimension` defines whether we are dealing with dates or timestamps;\n    defaults to dates;\n\n  - `max_gap` defines what is considered as consecutive. By default, it\n    is set to 0, this means no gap whatsoever. If `dimension` is `date`,\n    this means 0 days; in case of `timestamp`, it refers to 0 seconds. A\n    gap of 1 day / 1 second would be expressed as `max_gap = 1L`;\n\n  - `fmt` defines the format. By default, it uses `%Y-%m-%d` for dates,\n    and `%Y-%m-%d %H:%M:%OS` for timestamps. If your ranges are in\n    another format, you need to modify that accordingly;\n\n  - `groups` argument is optional;\n\n  - `startAttr` and `endAttr` are attributes of every date/timestamp\n    record (optional as arguments). Once the ranges are collapsed, each\n    record will keep the `startAttr` from the very beginning of the\n    range while the `endAttr` will contain the attributes from the very\n    end of the range. It’s possible to include multiple columns both in\n    `startAttr` as well as `endAttr`. Note that all of them will be -\n    for safety reasons - converted to *character* beforehand.\n\n\u003c!-- end list --\u003e\n\n``` r\ndf_collapsed \u003c- collapse_ranges(df, \n                                groups = c(\"id\", \"rating\"), \n                                start_var = \"start_date\", \n                                end_var = \"end_date\",\n                                startAttr = \"marketCapStart\",\n                                endAttr = \"marketCapEnd\",\n                                max_gap = 0L,\n                                fmt = \"%Y-%m-%d\",\n                                dimension = \"date\"\n                                )\n\ndf_collapsed\n```\n\n``` \n    id rating start_date   end_date marketCapStart marketCapEnd\n1 1111     A+ 2014-01-01 2014-12-31            7.2          7.4\n2 1111     AA 2015-01-01 2016-03-01            7.3            8\n3 2222     B- 2017-01-01 2017-01-31            6.5          6.9\n4 2222     B- 2018-01-01 2020-02-01            6.7            7\n```\n\nWe can address timestamps in a similar way, only now we need to specify\nthe `dimension` as `timestamp`.\n\nNote that here two additional arguments are important:\n\n  - `tz` defines the time zone - by default, it is set to `UTC`;\n\n  - `origin` specifies the origin for indexing/converting the dates; by\n    default, it is set to `1970-01-01`.\n\nLet’s say that now we’re dealing with individuals and the way they spend\ntheir time:\n\n``` \n    id       diary          start_time            end_time\n1 1111     reading 2014-01-01 14:00:00 2014-01-01 14:59:59\n2 1111 watching TV 2014-01-01 15:00:00 2014-01-01 16:29:59\n3 1111 watching TV 2014-01-01 16:30:00 2014-01-01 19:00:00\n4 2222     working 2015-01-01 15:00:00 2015-01-01 15:59:59\n5 2222     working 2015-01-01 17:00:00 2015-01-01 18:59:59\n6 2222     working 2015-01-01 19:00:00 2015-01-01 21:00:00\n```\n\nIf we don’t specify the format, it will throw a warning:\n\n``` r\ndf_collapsed \u003c- collapse_ranges(df, \n                                groups = c(\"id\", \"diary\"), \n                                start_var = \"start_time\", \n                                end_var = \"end_time\", \n                                dimension = \"timestamp\",\n                                tz = 'UTC',\n                                origin = '1970-01-01'\n                                )\n```\n\n    Warning in collapse_ranges(df, groups = c(\"id\", \"diary\"), start_var =\n    \"start_time\", : Dimension 'timestamp' selected but format unchanged. Will try to\n    convert to '%Y-%m-%d %H:%M:%OS' ..\n\n``` r\ndf_collapsed\n```\n\n``` \n    id       diary          start_time            end_time\n1 1111     reading 2014-01-01 14:00:00 2014-01-01 14:59:59\n2 1111 watching TV 2014-01-01 15:00:00 2014-01-01 19:00:00\n3 2222     working 2015-01-01 15:00:00 2015-01-01 15:59:59\n4 2222     working 2015-01-01 17:00:00 2015-01-01 21:00:00\n```\n\n## partition\\_ranges\n\nThis function allows users to further split their ranges.\n\nCurrently, this allows only splitting of `Date` formats (partitioning by\neither year or month).\n\nConsider the following data frame:\n\n``` \n  group      start        end\n1     a 2017-05-01 2018-09-01\n2     a 2019-04-03 2020-04-03\n3     b 2011-03-03 2012-05-03\n4     b 2014-05-07 2016-04-02\n5     c 2017-02-01 2017-04-05\n```\n\nPartitioning by year (default mode) would look like:\n\n``` r\npart_by_year \u003c- partition_ranges(df,\n                                 start_var = \"start\",\n                                 end_var = \"end\",\n                                 partition_by = \"year\",\n                                 vars_to_keep = \"group\"\n                                 )\n\nhead(part_by_year)\n```\n\n``` \n  group      start        end\n1     a 2017-05-01 2017-12-31\n2     a 2018-01-01 2018-09-01\n3     a 2019-04-03 2019-12-31\n4     a 2020-01-01 2020-04-03\n5     b 2011-03-03 2011-12-31\n6     b 2012-01-01 2012-05-03\n```\n\nOn the other hand, partitioning by month would take the following\nformat:\n\n``` r\npart_by_month \u003c- partition_ranges(df,\n                                  start_var = \"start\",\n                                  end_var = \"end\",\n                                  partition_by = \"month\",\n                                  vars_to_keep = \"group\"\n                                  )\n\nhead(part_by_month)\n```\n\n``` \n  group      start        end\n1     a 2017-05-01 2017-05-31\n2     a 2017-06-01 2017-06-30\n3     a 2017-07-01 2017-07-31\n4     a 2017-08-01 2017-08-31\n5     a 2017-09-01 2017-09-30\n6     a 2017-10-01 2017-10-31\n```\n\nNote that the `vars_to_keep` argument is optional and basically\nspecifies which columns you’d like to keep.\n\nThere is also `fmt` argument that specifies the `Date` format (set to\n`%Y-%m-%d` by default).\n\n## expand\\_\\* family\n\nThese are light-weight functions that allow the user to expand\ndate/timestamp range into a column of elements of the sequence.\n\nLet’s say we have a data frame with `start` and `end` which we would\nlike to expand:\n\n``` \n    id gender      start        end\n1 1111      M 2018-01-01 2018-01-05\n2 2222      F 2019-01-01 2019-01-07\n3 3333      F 2020-01-01 2020-01-08\n```\n\nThis can be expanded with calling the `expand_dates` function with the\nfollowing arguments:\n\n  - `start_var` and `end_var` as column names of our range;\n\n  - `name` as the future name of our column. This is optional and\n    defaults to ‘Expanded’;\n\n  - `vars_to_keep` as indicators of which columns we’d like to keep when\n    expanding (optional);\n\n  - `unit` which defines the unit for expansion - defaults to each day\n    in the sequence.\n\nOther optional argument is also `fmt` where you can set the format of\nyour dates - by default, it is `%Y-%m-%d`.\n\n``` r\ndf_exp \u003c- expand_dates(df,\n                       start_var = \"start\",\n                       end_var = \"end\",\n                       name = \"exp_seqs\",\n                       vars_to_keep = c(\"id\", \"gender\"),\n                       unit = \"day\"\n                       )\nhead(df_exp)\n```\n\n``` \n    id gender   exp_seqs\n1 1111      M 2018-01-01\n2 1111      M 2018-01-02\n3 1111      M 2018-01-03\n4 1111      M 2018-01-04\n5 1111      M 2018-01-05\n6 2222      F 2019-01-01\n```\n\nYou can also tackle *timestamp* formats in a similar way with\n`expand_times`.\n\nThe arguments are pretty much the same, except that `unit` defaults to\n`hour`, `fmt` to `%Y-%m-%d %H:%M:%OS`, and you can set an additional\nargument `tz` (time zone; defaults to `UTC`).\n\n``` \n    id gender               start                 end\n1 1111      M 2018-01-01 15:00:00 2018-01-01 18:30:00\n2 2222      F 2019-01-01 14:00:00 2019-01-01 17:30:00\n3 3333      F 2020-01-01 19:00:00 2020-01-02 02:00:00\n```\n\n``` r\ndf_exp \u003c- expand_times(df,\n                       start_var = \"start\",\n                       end_var = \"end\",\n                       name = \"exp_seqs\",\n                       vars_to_keep = c(\"id\", \"gender\"),\n                       unit = \"hour\"\n                       )\nhead(df_exp)\n```\n\n``` \n    id gender            exp_seqs\n1 1111      M 2018-01-01 15:00:00\n2 1111      M 2018-01-01 16:00:00\n3 1111      M 2018-01-01 17:00:00\n4 1111      M 2018-01-01 18:00:00\n5 2222      F 2019-01-01 14:00:00\n6 2222      F 2019-01-01 15:00:00\n```\n\n## combine\\_ranges\n\nThis function is essentially a wrapper around `collapse_ranges` that\nallows you to combine ranges scattered around different tables into one\ntable with non-redundant splits \u0026 rows.\n\nLet’s say we have the following data frames `df1`, `df2` and `df3`:\n\n``` \n       start        end group infoScores\n1 2010-01-01 2010-08-05     a          0\n2 2012-06-01 2013-03-03     a          3\n3 2014-10-15 2015-01-01     b          2\n```\n\n``` \n         end group      start score\n1 2012-04-05     b 2009-01-15     8\n2 2014-06-09     a 2012-07-08     2\n3 2009-02-01     b 2008-01-01     3\n```\n\n``` \n         end group      start scoreInfo\n1 2011-04-05     a 2010-02-03         1\n2 2014-12-09     b 2014-07-08         2\n3 2009-02-01     c 2008-01-01         3\n```\n\nNote the column names: the range names (`start` and `end`) as well as\nthe grouping variables (`group`) are the same across all data frames.\nThis is a requirement for the function to run properly. However - as you\ncan see from the example -, there is no need for them to be in the same\norder.\n\nThe arguments are almost identical to those of `collapse_ranges`, except\nthat at the beginning it is possible to pass as many data frames as you\nwould like to combine. They need to be combined together in a list.\n\nFor instance, we can combine the above data frames as follows (we don’t\nneed to specify the `dimension` as it defaults to `date`):\n\n``` r\ndf \u003c- combine_ranges(dfs = list(df1, df2, df3), \n                     start_var = \"start\", \n                     end_var = \"end\", \n                     groups = \"group\"\n                     )\n\ndf\n```\n\n``` \n  group      start        end\n1     a 2010-01-01 2011-04-05\n2     a 2012-06-01 2014-06-09\n3     b 2008-01-01 2012-04-05\n4     b 2014-07-08 2015-01-01\n5     c 2008-01-01 2009-02-01\n```\n\nAs you have probably noticed, the output contains only the relevant\ncolumns: range variables \u0026 grouping variables. Note that the `groups`\nargument is optional. On the other hand, you can also use the\n`startAttr` and `endAttr` arguments, the same as in `collapse_ranges`.\n\n## fill\\_ranges\n\nThe function adds missing ranges to a table. It supports both `Date` and\n`POSIXct` formats; by default, it assumes the range columns are dates.\n\n``` \n  group      start        end cost score\n1     a 2007-01-01 2008-02-05  143    99\n2     a 2010-06-02 2013-04-05  144    33\n3     b 2009-04-05 2009-06-03  105    44\n4     b 2012-08-01 2013-02-17  153    22\n5     b 2019-03-19 2021-04-21  124    33\n6     c 2020-01-05 2020-01-09  105   105\n7     d 2014-01-01 2014-12-31  153   153\n8     d 2015-01-01 2016-12-31  124   124\n```\n\nThe arguments are almost identical to those of `collapse_ranges`.\n\nBy `dimension` argument you indicate whether your data frame contains\n`date` or `timestamp` values (defaults to dates). `groups` argument is\noptional.\n\nThe output based on the above data frame:\n\n``` r\ndf \u003c- fill_ranges(df, \n                  start_var = \"start\", \n                  end_var = \"end\", \n                  groups = \"group\"\n                  )\n\ndf\n```\n\n``` \n   group      start        end cost score\n1      a 2007-01-01 2008-02-05  143    99\n2      a 2008-02-06 2010-06-01   NA    NA\n3      a 2010-06-02 2013-04-05  144    33\n4      b 2009-04-05 2009-06-03  105    44\n5      b 2009-06-04 2012-07-31   NA    NA\n6      b 2012-08-01 2013-02-17  153    22\n7      b 2013-02-18 2019-03-18   NA    NA\n8      b 2019-03-19 2021-04-21  124    33\n9      c 2020-01-05 2020-01-09  105   105\n10     d 2014-01-01 2014-12-31  153   153\n11     d 2015-01-01 2016-12-31  124   124\n```\n\nAs you can see, all the original variables are returned.\n\nThe rows corresponding to added ranges will by default have `NA` in the\ncolumns that are not ranges or grouping variables (in the above case\n`cost` \u0026 `score` variables).\n\nYou can change this behaviour by adjusting the `fill` parameter, like\nbelow:\n\n``` r\ndf \u003c- fill_ranges(df, \n                  groups = \"group\", \n                  start_var = \"start\", \n                  end_var = \"end\", \n                  fill = \"cost = 0, score = Missing\"\n                  )\n\ndf\n```\n\n``` \n   group      start        end cost   score\n1      a 2007-01-01 2008-02-05  143      99\n2      a 2008-02-06 2010-06-01    0 Missing\n3      a 2010-06-02 2013-04-05  144      33\n4      b 2009-04-05 2009-06-03  105      44\n5      b 2009-06-04 2012-07-31    0 Missing\n6      b 2012-08-01 2013-02-17  153      22\n7      b 2013-02-18 2019-03-18    0 Missing\n8      b 2019-03-19 2021-04-21  124      33\n9      c 2020-01-05 2020-01-09  105     105\n10     d 2014-01-01 2014-12-31  153     153\n11     d 2015-01-01 2016-12-31  124     124\n```\n\nNote that this feature is somewhat experimental \u0026 currently the columns\nto be filled are - for safety reasons - automatically converted to\n*character*. Additional checks \u0026 more flexibility will be added in the\nfuture.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farg0naut91%2Fneatranges","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farg0naut91%2Fneatranges","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farg0naut91%2Fneatranges/lists"}