awesome-dlab
😎 Awesome lists about all kinds of topics and tools interesting to D-Labbers
https://github.com/dlab-berkeley/awesome-dlab
-
Bash
- miller - Lets you work with named fields instead of counting positional indices, across familiar formats such as CSV, TSV, JSON, and positionally indexed text.
- q - Run SQL directly on CSV or TSV files.
- jid - JSON Incremental Digger: drill down into JSON interactively using jq-like filtering queries.
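The "named fields instead of positional indices" idea behind tools like miller can be sketched with Python's standard library. This is only an analogue under stated assumptions, not how miller itself is invoked: the sample data and the `visits_by_city` helper are hypothetical and do not come from any of the linked projects.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE_CSV = "name,city,visits\nana,Oakland,3\nbo,Berkeley,5\ncy,Berkeley,2\n"


def visits_by_city(csv_text):
    """Sum the 'visits' column per 'city', addressing fields by name."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # row["city"] rather than row[1]: no counting of column positions.
        totals[row["city"]] = totals.get(row["city"], 0) + int(row["visits"])
    return totals


if __name__ == "__main__":
    print(visits_by_city(SAMPLE_CSV))
```

Because the fields are looked up by header name, reordering the columns in the input would not change the result.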
-
Cloud Computing
- Binder - Turns a Git repo into a collection of interactive notebooks. A great tool for teaching workshops.
-
Databases
- SQLite - A completely embedded, full-featured relational database in a few hundred kilobytes that you can include right in your project.
- Fuzzy string matching with PostgreSQL - Examples of different ways to match strings using PostgreSQL and its extensions.
- SQL Join Types Explained in Visuals - Simple, useful visual explanation of joins in SQL.
- Understanding Joins in Relational data | R for Data Science - Visual explanation of joins, with accompanying R code and variables.
- sqlitebiter - A CLI tool to convert CSV, Excel, HTML, JSON, and many other formats to a SQLite database file.
- Awesome SQL - more awesomeness related to this topic.
- binder-postgres - Demo of launching a BinderHub notebook server with a free running Postgres server. Launch on Binder: https://mybinder.org/v2/gh/ouseful-template-repos/binder-postgres/master?filepath=notebooks%2FTest%20Databases.ipynb
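As a small companion to the join explainers listed above, the difference between an inner join and a left join can be tried out against an in-memory SQLite database using Python's built-in `sqlite3` module. This is a minimal sketch: the `authors` and `books` tables, their rows, and the `join_demo` helper are hypothetical and are not taken from the linked resources.

```python
import sqlite3


def join_demo():
    """Return (inner_join_rows, left_join_rows) for two tiny example tables."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data: Grace has no matching row in books.
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO books VALUES (1, 1, 'Notes');
        """
    )
    # INNER JOIN keeps only authors that have at least one book.
    inner = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "INNER JOIN books b ON b.author_id = a.id ORDER BY a.id"
    ).fetchall()
    # LEFT JOIN keeps every author; missing book columns come back as NULL (None).
    left = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "LEFT JOIN books b ON b.author_id = a.id ORDER BY a.id"
    ).fetchall()
    conn.close()
    return inner, left


if __name__ == "__main__":
    inner_rows, left_rows = join_demo()
    print("inner:", inner_rows)
    print("left: ", left_rows)
```

The author without a book disappears from the inner join but survives the left join with `None` in the title column, which is the distinction the visual guides illustrate.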
-
Datasets
- Case.Law - all official, book-published United States case law — every volume designated as an official report of decisions by a court within the United States.
- DEA Pain Pills Database - The Washington Post published a significant portion of a database that tracks the path of every opioid pain pill, from manufacturer to pharmacy, in the United States between 2006 and 2012.
- Awesome Public Data - A list of topic-centric public data sources collected and tidied from blogs, answers, and user responses.
- tidytweetjson - R package for turning Tweet JSON files into a tidyverse-ready dataframe. The package takes 18 minutes to turn 1 million tweets into a dataframe.
- tidyethnicnews - R package for turning one of the largest databases on ethnic newspapers and magazines (Ethnic NewsWatch) into a tidyverse-ready dataframe. The package takes 0.0005 seconds to turn 100 newspaper articles into a tidy dataframe.
- California COVID Assessment Tool - This repository contains a Shiny application, usable with any US state, that helps assess the many different models available for understanding COVID-19 transmission and spread. It brings together several publicly available data sources and can be supplemented with your own data to improve the assessment.
-
Natural Language Processing (NLP)
- Tracking Progress in Natural Language Processing - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
-
PDF
- Working with PDFs in Python - Describes a range of Python libraries and examples for working with PDFs: reading and splitting pages; adding images and watermarks; inserting, deleting, and reordering pages.
-
Python
- Awesome Python - more awesomeness related to this topic.
-
R
- Awesome R - more awesomeness related to this topic.
- rio: A Swiss-Army Knife for Data I/O - Import, Export, and Convert Data Files including web-based import, reading compressed files directly without explicit decompression, and 'convert()' function for converting between file types.
- makereproducible
-
Reproducibility
- MRAN Timemachine - For the purpose of reproducibility, MRAN hosts daily snapshots of the CRAN R packages and R releases as far back as Sept. 17, 2014.
- The Turing Way handbook - A handbook for reproducible, ethical, and collaborative data science.
-
Rosetta Stones
- Stata to Pandas Cross-Walk
- Data Science Rosetta Stone - A Tutorial of and Translation between Data Science Programming Languages
- Rosetta: Python, R, Stata Rosetta Stone. Projects implemented in each language side-by-side.
-
Systems Administration
- Ops School - Comprehensive program that will help you learn to be an operations engineer.
- Awesome Sysadmin - more awesomeness related to this topic.