https://github.com/dlab-berkeley/awesome-dlab

# awesome-dlab
😎 Awesome lists about all kinds of topics and tools interesting to D-Labbers

**What is an awesome list?**

Only put stuff on the list that you or another D-Labber can personally recommend. It is better to leave stuff out than to include too much. Read the [Awesome Manifesto](https://github.com/sindresorhus/awesome/blob/master/awesome.md) to find out more about what this list is for.

Or if you'd like to check out stuff that is awesome to people outside of D-Lab, then start here: [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)

## Contents

- [Datasets](#datasets)
- [Natural Language Processing (NLP)](#natural-language-processing-nlp)
- [Rosetta Stones](#rosetta-stones)
- [R](#r)
- [PDF](#pdf)
- [Python](#python)
- [Databases](#databases)
- [Bash](#bash)
- [Systems Administration](#systems-administration)
- [Cloud Computing](#cloud-computing)
- [Reproducibility](#reproducibility)

## Datasets
* [Case.Law](https://case.law/) - All official, book-published United States case law: every volume designated as an official report of decisions by a court within the United States.
* [DEA Pain Pills Database](https://www.washingtonpost.com/national/2019/07/18/how-download-use-dea-pain-pills-database/) - The Washington Post published a significant portion of a database that tracks the path of every opioid pain pill, from manufacturer to pharmacy, in the United States between 2006 and 2012.
* [Awesome Public Data](https://github.com/awesomedata/awesome-public-datasets) - A topic-centric list of public data sources collected and tidied from blogs, answers, and user responses.
* [tidytweetjson](https://github.com/jaeyk/tidytweetjson) - R package for turning Tweet JSON files into a tidyverse-ready dataframe. The package takes 18 minutes to turn 1 million tweets into a dataframe.
* [tidyethnicnews](https://github.com/jaeyk/tidyethnicnews) - R package for turning one of the largest databases on ethnic newspapers and magazines (Ethnic NewsWatch) into a tidyverse-ready dataframe. The package takes 0.0005 seconds to turn 100 newspaper articles into a tidy dataframe.
* [California COVID Assessment Tool](https://github.com/StateOfCalifornia/CalCAT) - A Shiny application, usable with any US state, that helps assess the many different models available for understanding COVID-19 transmission and spread. It brings together several publicly available data sources and can be supplemented with your own data to improve the assessment.

## Natural Language Processing (NLP)
* [Tracking Progress in Natural Language Processing](https://github.com/sebastianruder/NLP-progress) - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

## Rosetta Stones
* [Rosetta](https://github.com/adamrossnelson/rosetta) - Python, R, and Stata Rosetta Stone. Projects implemented in each language side by side.
* [Stata to Pandas Cross-Walk](https://github.com/adamrossnelson/StataQuickReference/blob/master/spcrosswlk.md)
* [Data Science Rosetta Stone](http://www.datasciencerosettastone.com/) - A tutorial on, and translation between, data science programming languages.

## R
* [Awesome R](https://github.com/qinwf/awesome-R#readme) - more awesomeness related to this topic.
* [rio: A Swiss-Army Knife for Data I/O](https://cran.r-project.org/web/packages/rio/vignettes/rio.html) - Import, export, and convert data files, including web-based import, reading compressed files directly without explicit decompression, and a `convert()` function for converting between file types.
* [makereproducible](https://github.com/jaeyk/makereproducible) - R package for making a project computationally reproducible before sharing it.

## PDF
* [Working with PDFs in Python](https://stackabuse.com/working-with-pdfs-in-python-reading-and-splitting-pages/) - Describes a range of Python libraries, with examples, for working with PDFs: reading and splitting pages; adding images and watermarks; inserting, deleting, and reordering pages.

## Python
* [Awesome Python](https://github.com/vinta/awesome-python#readme) - more awesomeness related to this topic.

## Databases
* [SQLite](http://www.sqlite.org/) - A completely embedded, full-featured relational database in a few hundred kilobytes that you can include right in your project.
* [sqlitebiter](https://github.com/thombashi/sqlitebiter) - a CLI tool to convert CSV / Excel / HTML / JSON / and many other formats to a SQLite database file.
* [Awesome SQL](https://github.com/danhuss/awesome-sql) - more awesomeness related to this topic.
* [Fuzzy string matching with PostgreSQL](https://www.freecodecamp.org/news/fuzzy-string-matching-with-postgresql/) - Examples of different ways to match strings using PostgreSQL and its extensions.
* [binder-postgres](https://github.com/ouseful-template-repos/binder-postgres) - Demo of launching a BinderHub notebook server with a free-running Postgres server. [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ouseful-template-repos/binder-postgres/master?filepath=notebooks%2FTest%20Databases.ipynb)
* [SQL Join Types Explained in Visuals](https://dataschool.com/how-to-teach-people-sql/sql-join-types-explained-visually/) - Simple, useful visual explanation of joins in SQL.
* [Understanding Joins in Relational data | R for Data Science](https://r4ds.had.co.nz/relational-data.html#understanding-joins) - Visual explanation of joins, with the addition of R code and variables.
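
To try out the join types covered by the two resources above without installing a database server, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The `authors` and `papers` tables and their rows are made up for illustration; they do not come from any of the linked resources.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE papers  (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO papers  VALUES (10, 1, 'Notes'), (11, 2, 'Compilers'), (12, 2, 'COBOL');
""")

# INNER JOIN keeps only authors that have at least one matching paper.
inner_rows = conn.execute("""
    SELECT a.name, p.title
    FROM authors AS a
    INNER JOIN papers AS p ON p.author_id = a.id
    ORDER BY a.name, p.title
""").fetchall()

# LEFT JOIN keeps every author; missing papers show up as NULL (None in Python).
left_rows = conn.execute("""
    SELECT a.name, p.title
    FROM authors AS a
    LEFT JOIN papers AS p ON p.author_id = a.id
    ORDER BY a.name, p.title
""").fetchall()

conn.close()

print(inner_rows)
print(left_rows)
```

The only difference between the two results is the author without papers, who is dropped by the inner join and kept with a `None` title by the left join.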

## Bash
* [miller](https://github.com/johnkerl/miller) - With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, and positionally indexed data.
* [q](https://github.com/harelba/q) - Run SQL directly on CSV or TSV files.
* [jq](https://stedolan.github.io/jq) - jq is a lightweight and flexible command-line JSON processor.
* [jid](https://github.com/simeji/jid) - JSON Incremental Digger to drill down interactively by using filtering queries like jq.
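
If you want the "SQL on a CSV file" idea from inside a script rather than the shell, it can be sketched with Python's standard library alone. This is not how any of the tools above are invoked; `query_csv`, the `staff` table name, and the sample data are hypothetical names chosen for this example.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a CSV file on disk.
CSV_TEXT = """name,dept,salary
Ada,eng,120
Grace,eng,130
Linus,ops,100
"""


def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table and run one SQL query."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # Remaining rows are inserted as text; cast in SQL when numbers are needed.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


result = query_csv(
    CSV_TEXT,
    "staff",
    "SELECT dept, SUM(CAST(salary AS INTEGER)) FROM staff "
    "GROUP BY dept ORDER BY dept",
)
print(result)
```

Column values arrive as strings, so the query casts `salary` before summing. For anything beyond a quick look, the dedicated tools in this section are likely the better choice.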

## Systems Administration
* [Ops School](http://www.opsschool.org) - Comprehensive program that will help you learn to be an operations engineer.
* [Awesome Sysadmin](https://github.com/kahun/awesome-sysadmin) - more awesomeness related to this topic.

## Cloud Computing
* [Binder](https://mybinder.org/) - Turns a Git repo into a collection of interactive notebooks. A great tool for teaching workshops.

## Reproducibility
* [The Turing Way handbook](https://github.com/alan-turing-institute/the-turing-way#readme) - A handbook for reproducible, ethical, and collaborative data science.
* [MRAN Timemachine](https://mran.microsoft.com/timemachine) - For the purpose of reproducibility, MRAN hosts daily snapshots of the CRAN R packages and R releases as far back as Sept. 17, 2014.