https://github.com/bisaloo/xlcutter

Parse Batches of 'xlsx' Files Based on a Template
data-extraction excel non-rectangular-data r r-package tidy-data


---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# xlcutter

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit/)
[![R-CMD-check](https://github.com/Bisaloo/xlcutter/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Bisaloo/xlcutter/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/Bisaloo/xlcutter/branch/main/graph/badge.svg)](https://app.codecov.io/gh/Bisaloo/xlcutter?branch=main)
[![lifecycle-concept](https://raw.githubusercontent.com/reconverse/reconverse.github.io/master/images/badge-concept.svg)](https://www.reconverse.org/lifecycle.html#concept)

This package allows you to parse entire folders of non-rectangular 'xlsx' files
into a single rectangular and tidy 'data.frame' based on a custom template file
defining the column names of the output.

## Installation

You can install the latest stable version of this package from CRAN:

``` r
install.packages("xlcutter")
```

or the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("remotes")
remotes::install_github("Bisaloo/xlcutter")
```

## Example

Non-rectangular Excel files are common in many domains. For a simple
demonstration here, we use the ["Blue
timesheet"](https://templates.office.com/en-us/blue-timesheet-tm77799521)
template, where employees can log their working hours.

A typical use case of xlcutter in this example would be a manager who wants
to get a single rectangular dataset with the timesheets from different
employees.

![Screenshot of timesheets from two fictitious employees](man/figures/screenshot_timesheets.png)

Your first step to extract the data is to define the various columns you want
in the output in a *template* file. You can mark the data cells to extract with
any custom marker, with the default being `{{ column_name }}`.

![Screenshot of a template for the timesheet example](man/figures/screenshot_template.png)

```{r}
library(xlcutter)

data_files <- list.files(
  system.file("example", "timesheet", package = "xlcutter"),
  pattern = "\\.xlsx$",
  full.names = TRUE
)

template_file <- system.file(
  "example", "timesheet_template.xlsx",
  package = "xlcutter"
)

xlsx_cutter(
  data_files,
  template_file
)
```
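
To make the role of the template more concrete, here is a minimal base R sketch of the underlying idea: marked cells in the template give both the position of a value and the name of its output column. This is only an illustration and not xlcutter's actual implementation. The character matrices standing in for worksheets, the employee values and the `marker` regular expression are all hypothetical.

``` r
# Illustration only: plain character matrices stand in for worksheets.
# This is NOT how xlcutter is implemented internally.

# Hypothetical template: cells wrapped in {{ }} name the output columns.
template <- matrix(
  c("Employee:", "{{ employee }}",
    "Manager:",  "{{ manager }}",
    "Hours:",    "{{ total_hours }}"),
  ncol = 2, byrow = TRUE
)

# Hypothetical data sheets that share the layout of the template.
sheets <- list(
  matrix(c("Employee:", "Ada", "Manager:", "Grace", "Hours:", "38"),
         ncol = 2, byrow = TRUE),
  matrix(c("Employee:", "Alan", "Manager:", "Grace", "Hours:", "41"),
         ncol = 2, byrow = TRUE)
)

# Regular expression matching a whole cell of the form {{ column_name }}.
marker <- "^\\{\\{\\s*(.*?)\\s*\\}\\}$"

# Linear indices of the marked cells, and the column names they carry.
idx <- grep(marker, template, perl = TRUE)
col_names <- sub(marker, "\\1", template[idx], perl = TRUE)

# Read the same cells from every sheet: one output row per sheet.
rows <- lapply(sheets, function(sheet) {
  setNames(as.list(sheet[idx]), col_names)
})
result <- do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))

result
```

Each sheet contributes one row to `result`, and every marked cell contributes one column. In this sketch all values stay as character strings, and any type conversion is left to the caller.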

## Other examples of use cases

Other typical use cases for this package could be:

- a hospital that wants to collate non-rectangular information sheets from
  different patients into a single rectangular dataset