https://github.com/rsheets/jailbreakr
Get out of Excel free.
- Host: GitHub
- URL: https://github.com/rsheets/jailbreakr
- Owner: rsheets
- Created: 2016-01-24T05:04:04.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-08-18T02:38:06.000Z (about 8 years ago)
- Last Synced: 2024-05-21T03:34:10.063Z (6 months ago)
- Language: R
- Size: 56.6 KB
- Stars: 89
- Watchers: 16
- Forks: 9
- Open Issues: 6
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - rsheets/jailbreakr - Get out of Excel free. (R)
README
# jailbreakr
**Warning: This project is in the early scoping stages; do not use for anything other than amusement/frustration purposes**
Data liberator. Extracts the tabular data that people put into non-tabular structures inside a program designed to hold tables.
![](http://i.giphy.com/SEp6Zq6ZkzUNW.gif)
## Installation
Requires the development version of xml2 (for `xml_find_lgl`), as well as [cellranger](https://github.com/rsheets/cellranger) and [linen](https://github.com/rsheets/linen). Chances are you'll want [rexcel](https://github.com/rsheets/rexcel) too.
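```r
# install.packages("devtools")  # if devtools is not already installed
# Development versions of the dependencies, followed by jailbreakr itself
devtools::install_github(c("hadley/xml2",
                           "rsheets/linen",
                           "rsheets/cellranger",
                           "rsheets/rexcel",
                           "rsheets/jailbreakr"))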
```

## Goals
There are two large Excel spreadsheet corpora; it would be nice to use them to get a feel for what fraction of spreadsheets we can handle, and for the range of non-table-like data out there.
![the things people do to data](http://replygif.net/i/514.gif)
The first is the [EUSES corpus](http://openscience.us/repo/spreadsheet/euses.html) of 4,447 spreadsheets (16,853 worksheets). These are all xls files (rather than xlsx), so they need either an [xls -> xlsx conversion](http://bit.ly/1P2rMGr) or xls support in jailbreakr.
The second, larger one is the [Enron corpus](http://www.felienne.com/archives/3634) of 15,770 spreadsheets (79,983 worksheets).
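A first step toward that survey is knowing how many files in a corpus are xls versus xlsx, since the xls files need converting (or xls support) before anything else can happen. The block below is a minimal base-R sketch, not part of jailbreakr: `tally_spreadsheet_formats()` is a hypothetical helper name, and the demo builds a throwaway directory rather than pointing at either real corpus.

```r
# Hypothetical helper, not part of jailbreakr: count spreadsheet files in a
# corpus directory by extension (matched case-insensitively).
tally_spreadsheet_formats <- function(corpus_dir) {
  files <- list.files(corpus_dir, pattern = "\\.xlsx?$",
                      recursive = TRUE, ignore.case = TRUE)
  ext <- tolower(tools::file_ext(files))
  # Fixed levels so both formats are reported, even when a count is zero
  table(factor(ext, levels = c("xls", "xlsx")))
}

# Demo on a throwaway directory standing in for a real corpus
corpus <- file.path(tempdir(), "corpus-demo")
dir.create(corpus, showWarnings = FALSE)
invisible(file.create(file.path(corpus, c("a.xls", "b.XLS", "c.xlsx", "notes.txt"))))
tally_spreadsheet_formats(corpus)  # counts: xls 2, xlsx 1
```

Pointed at a local copy of a corpus, the same call gives a quick sense of how much conversion work is needed before xlsx-only tooling can be tried.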
# Roadmap
* data structure package:
  - linen? A general representation of spreadsheet data, plus some limited low-level operations on that data
    - depends on cellranger, tibble
    - constructor function
    - print methods
    - subsetting, range extraction, etc.
    - plot method for quickly getting a feel for structure, or a shiny app
    - summary: this has n sheets, no formulae, 3 plots, etc.; things about the references between the sheets?
    - where it came from (Excel, Google Sheets, etc.), with filenames, reference ids, etc.
    - probably needs references to handle multiple sheets and the formulae within them (definitely if we need to do things with plots), but make them immutable at first?
    - md5 or other "id" so that we can see if the upstream source has changed. This is different for googlesheets, where the id is properly baked into the sheet
* low-level packages:
  - googlesheets
  - rexcel
  - these depend on linen, and will have to provide things like ids and filenames to support all the features that linen offers
* jailbreakr
  - uses output in linen format provided by googlesheets or rexcel

# Ideas
Can we feed things through OpenRefine or something?
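The roadmap's idea of giving linen an md5 or other "id", so that it can tell whether the upstream source has changed, can be sketched in a few lines of base R. This is an illustration under assumptions, not linen's API: `source_id()` and `source_changed()` are hypothetical names, and a plain temporary file stands in for a spreadsheet.

```r
# Hypothetical sketch, not the linen API: record the md5 checksum of a source
# file at import time so a later check can detect upstream changes.
source_id <- function(path) {
  list(path = normalizePath(path), md5 = unname(tools::md5sum(path)))
}

source_changed <- function(id) {
  # TRUE if the file on disk no longer matches the recorded checksum
  unname(tools::md5sum(id$path)) != id$md5
}

# Demo with a throwaway file standing in for a spreadsheet
f <- tempfile()
writeLines("version 1", f)
id <- source_id(f)
source_changed(id)  # FALSE: file unchanged since the id was recorded
writeLines("version 2", f)
source_changed(id)  # TRUE: file contents have changed
```

As the roadmap notes, this only makes sense for file-based sources; for googlesheets the id is already baked into the sheet.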