https://github.com/coatless/raw-data



# raw-data

This repository collects datasets in "raw form" or "flat file format". The goal is to make them easy to ingest into a statistical computing environment such as _R_ or _Python_.

## CSV

- [Salaries.csv](Salaries.csv) from the [`carData`](https://cran.r-project.org/package=carData) _R_ package. ([Details](https://cran.r-project.org/web/packages/carData/carData.pdf#page=43))
- [pima.csv](pima.csv) from the [`faraway`](https://cran.r-project.org/package=faraway) _R_ package. ([Details](https://cran.r-project.org/web/packages/faraway/faraway.pdf#page=74))
- [gap-every-five-years.csv](gap-every-five-years.csv) from [`gapminder`](https://github.com/jennybc/gapminder/tree/main/data-raw)
- [xbox-7-day-auctions.csv](xbox-7-day-auctions.csv) from [modelingonlineauctions.com](http://www.modelingonlineauctions.com/datasets).
- [1976-2020-senate.csv](1976-2020-senate.csv)
- [surreal-residual.csv](surreal-residual.csv) based on [Residual (Sur)Realism by Leonard A. Stefanski (2007)](https://doi.org/10.1198/000313007X190079)
- [vehicles.csv](vehicles.csv) from the [`fueleconomy`](https://cran.r-project.org/package=fueleconomy) _R_ package.
- [common.csv](common.csv) from the [`fueleconomy`](https://cran.r-project.org/package=fueleconomy) _R_ package.
- [flights.csv](flights.csv) from the [`nycflights13`](https://cran.r-project.org/package=nycflights13) _R_ package. ([Details](https://nycflights13.tidyverse.org/reference/flights.html))
- [ucla-binary-enrollment.csv](ucla-binary-enrollment.csv) from [UCLA's OARC](https://stats.oarc.ucla.edu)
- [tips.csv](tips.csv) from [`seaborn`'s data repository](https://github.com/mwaskom/seaborn-data/blob/master/tips.csv)
- [ramen-ratings-cleaned.csv](ramen-ratings-cleaned.csv) from [Kaggle's `residentmario/ramen-ratings`](https://www.kaggle.com/datasets/residentmario/ramen-ratings) (with minimal cleaning applied).
- [diamonds.csv](diamonds.csv) from [`ggplot2`'s data-raw directory](https://github.com/tidyverse/ggplot2/blob/main/data-raw/diamonds.csv)
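
As a minimal sketch of ingesting one of these files in Python, the standard-library `csv` module is enough. The helper name `read_csv_rows` and the inline sample rows are illustrative assumptions, not part of this repository; with a local clone you would pass an open file such as `open("tips.csv", newline="")` instead.

```python
import csv
import io


def read_csv_rows(handle):
    """Return the rows of a CSV file as a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


# Made-up stand-in data; replace with open("tips.csv", newline="") on a local clone.
sample = io.StringIO("name,score\nada,9.5\ngrace,8.0\n")
rows = read_csv_rows(sample)

# The csv module yields strings, so numeric columns need an explicit conversion.
scores = [float(row["score"]) for row in rows]
print(rows[0]["name"], sum(scores) / len(scores))
```

For larger files such as `flights.csv`, a data-frame library is likely a better fit than plain lists of dicts.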

## Text

- [subject_heights_tab.txt](subject_heights_tab.txt)
- [subject_heights.txt](subject_heights.txt)
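
The two text files can be read with a small delimiter-aware helper. This is a sketch under assumptions: the `_tab` suffix suggests tab-separated fields, while the delimiter of `subject_heights.txt` is not stated here, so the helper `read_delimited` (a hypothetical name) falls back to splitting on whitespace. The sample contents below are made up.

```python
import csv
import io


def read_delimited(handle, delimiter=None):
    """Parse a delimited text file into a list of rows (lists of strings).

    With delimiter=None, fields are split on any run of whitespace;
    otherwise the given single character (for example a tab) separates fields.
    """
    if delimiter is None:
        return [line.split() for line in handle if line.strip()]
    return [row for row in csv.reader(handle, delimiter=delimiter) if row]


# Made-up contents; the real column names and layout may differ.
tab_text = io.StringIO("id\theight\n1\t170\n2\t182\n")
space_text = io.StringIO("id height\n1 170\n2 182\n")

tab_rows = read_delimited(tab_text, delimiter="\t")
space_rows = read_delimited(space_text)
print(tab_rows == space_rows)
```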

## SQL

- [northwind-dump.sql](northwind-dump.sql) from [**jpwhite3/northwind-SQLite3**](https://github.com/jpwhite3/northwind-SQLite3). ([EER Diagram](https://raw.githubusercontent.com/jpwhite3/northwind-SQLite3/master/Northwind_ERD.png))
- [lahman2016.sqlite](lahman2016.sqlite) from [**jknecht/baseball-archive-sqlite**](https://github.com/jknecht/baseball-archive-sqlite/blob/master/lahman2016.sqlite) ([Details](https://www.seanlahman.com/baseball-archive/statistics/))
- [lahman2019.sqlite](lahman2019.sqlite) from [**WebucatorTraining/lahman-baseball-mysql**](https://github.com/WebucatorTraining/lahman-baseball-mysql/) ([Details](https://www.seanlahman.com/baseball-archive/statistics/), [EER Diagram](https://raw.githubusercontent.com/WebucatorTraining/lahman-baseball-mysql/master/lahman-model.png))
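
A SQL dump can be replayed into an in-memory SQLite database with Python's built-in `sqlite3` module. The sketch below assumes the dump uses SQLite-compatible SQL; the helper names `load_sql_dump` and `list_tables` and the tiny dump string are illustrative and not taken from the files above.

```python
import sqlite3


def load_sql_dump(dump_text):
    """Run a SQL dump against a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(dump_text)
    return conn


def list_tables(conn):
    """Return the names of all tables, sorted alphabetically."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


# Made-up dump standing in for the text of northwind-dump.sql.
dump = """
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO players (name) VALUES ('Ada'), ('Grace');
"""
conn = load_sql_dump(dump)
tables = list_tables(conn)
player_count = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]
print(tables, player_count)
```

The `.sqlite` files need no replay step: on a local clone, `sqlite3.connect("lahman2016.sqlite")` opens the database directly, and `list_tables` works on that connection as well.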

## Excel

- [subject_heights.xlsx](subject_heights.xlsx) built to showcase multiple sheets inside of a single workbook.

## HTML

- [webpage_sample.html](webpage_sample.html)
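
The structure of `webpage_sample.html` is not described here, so the following is only a generic sketch of parsing HTML with the standard-library `html.parser` module. The class name `LinkCollector` and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up markup; on a local clone, feed the text of webpage_sample.html instead.
html_text = '<html><body><p><a href="a.csv">A</a> <a href="b.csv">B</a></p></body></html>'
collector = LinkCollector()
collector.feed(html_text)
print(collector.links)
```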

## Parquet

- [2010-flights-summary.parquet.zip](2010-flights-summary.parquet.zip)
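
Because the Parquet data ships inside a zip archive, it has to be unpacked before a Parquet reader can use it. The sketch below covers only that unpacking step with the standard library, plus a cheap check for the `PAR1` magic bytes that open and close a Parquet file. It assumes the archive members end in `.parquet`; the helper names and the fake in-memory archive are illustrative.

```python
import io
import zipfile

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these four bytes.


def extract_parquet_members(zip_source):
    """Return {member_name: bytes} for each .parquet entry in a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if name.endswith(".parquet")
        }


def looks_like_parquet(data):
    """Cheap sanity check based on the leading and trailing magic bytes."""
    return len(data) >= 8 and data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC


# Build a tiny fake archive in memory; the payload is NOT a real Parquet file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.parquet", PARQUET_MAGIC + b"placeholder" + PARQUET_MAGIC)
    archive.writestr("notes.txt", "ignored")
buffer.seek(0)

members = extract_parquet_members(buffer)
checks = {name: looks_like_parquet(data) for name, data in members.items()}
print(checks)
```

Decoding the extracted bytes into a table requires a Parquet library; `pyarrow.parquet.read_table` is one option that accepts a file-like object such as `io.BytesIO(data)`.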