Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-dataset-creation
Curated list of resources for creating original datasets for Data Science, Machine Learning, and AI research and projects.
https://github.com/jon-chun/awesome-dataset-creation
Tutorials
Reading Content
- The Unreasonable Effectiveness of Recurrent Neural Networks - Andrej Karpathy's intro to RNNs.
- Annotated Diffusion - Tutorial on the original diffusion model paper, with code.
Diffusion Models
- Learning to Generate Data by Estimating Gradients of the Data Distribution - Video by Yang Song from Stanford. Excellent theory and interesting applications.
Libraries
Text, Tabular and Time-Series
- Synthea - Synthetic Patient Population Simulator.
- synthpop - A tool for producing synthetic versions of microdata.
- gretel-synthetics - Generative models for structured and unstructured text, tabular, and multivariate time-series data, featuring differentially private learning.
- SDV - Synthetic Data Generator for tabular, relational, and time series data.
- ydata-synthetic - Synthetic structured data generators.
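
To make the idea of tabular synthetic data concrete, here is a minimal baseline sketch. It does not use or reproduce the API of any library listed above. It assumes a deliberately naive approach: fit each column independently (a Gaussian for numeric columns, observed frequencies for categorical ones) and sample new rows from those marginals. The function names `fit_marginals` and `sample_rows` and the toy records are hypothetical. Because columns are sampled independently, correlations between columns are lost, which is one reason dedicated generators exist.

```python
import random
import statistics
from collections import Counter


def fit_marginals(rows):
    """Fit one independent model per column (hypothetical helper).

    Numeric columns get a Gaussian (mean, population std dev);
    all other columns get their observed category frequencies.
    """
    model = {}
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            model[col] = ("numeric", statistics.mean(values), statistics.pstdev(values))
        else:
            counts = Counter(values)
            model[col] = ("categorical", list(counts.keys()), list(counts.values()))
    return model


def sample_rows(model, n, seed=0):
    """Draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numeric":
                _, mu, sigma = spec
                row[col] = rng.gauss(mu, sigma)
            else:
                _, categories, weights = spec
                row[col] = rng.choices(categories, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Toy "real" records, invented purely for illustration.
real = [
    {"age": 34, "plan": "basic"},
    {"age": 51, "plan": "pro"},
    {"age": 27, "plan": "basic"},
    {"age": 45, "plan": "pro"},
]

model = fit_marginals(real)
synthetic = sample_rows(model, 5)
for row in synthetic:
    print(row)
```

Sampled values only ever come from the fitted marginals, so categorical values always match ones seen in the input, while numeric values can fall outside the observed range.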
Image
- Contrastive Unpaired Translation - Contrastive unpaired image-to-image translation, with faster and lighter training than CycleGAN.
- StyleGAN 3 - Official PyTorch implementation of StyleGAN3 from NeurIPS 2021.
- Denoising Diffusion Pytorch - Implementation of DDPM (Denoising Diffusion Probabilistic Models) in PyTorch.
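
As background for the DDPM entry, the forward (noising) process can be sketched in a few lines of plain Python. This is an illustrative sketch only, not code from any repository listed here. It assumes the linear beta schedule from the DDPM paper (1e-4 to 0.02 over 1000 steps) and the closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta). The function names and the toy vector are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per diffusion step."""
    return [
        beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        for t in range(num_steps)
    ]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): the fraction of signal kept at step t."""
    result, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        result.append(product)
    return result


def q_sample(x0, t, abar, rng):
    """Sample x_t directly from x_0 using the closed-form forward process."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule(1000))

x0 = [0.5, -1.0, 2.0]  # toy "data point", invented for illustration
noisy_early = q_sample(x0, 0, abar, rng)    # almost identical to x0
noisy_late = q_sample(x0, 999, abar, rng)   # close to pure Gaussian noise

print(noisy_early)
print(noisy_late)
```

A trained model learns to reverse this process step by step; only the forward half is shown here.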
Simulation
- AirSim - A simulator for drones, cars, and more, built on the Unreal and Unity engines.
- Nvidia Dataset Synthesizer - NDDS is a UE4 plugin from NVIDIA that lets computer vision researchers export high-quality synthetic images with metadata.
- OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
- Unity Perception
Video
- Video Diffusion Pytorch - Implementation of video diffusion models in PyTorch.
Academic Papers
Services
Algorithmic Privacy
- List of Synthetic Data Startups in 2021 - Not all of these necessarily have APIs.
Datasets
Algorithmic Privacy
- HuggingFace Datasets - Library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks.
- Google Cloud Public Datasets - Publicly available and free machine learning and analytics datasets.
- Kaggle Datasets - Data science and machine learning datasets.
- /r/datasets - A place to share, find, and discuss Datasets.
- Papers with Code - Datasets - The mission of Papers with Code is to create a free and open resource with Machine Learning papers, code, datasets, methods and evaluation tables.
- Awesome Public Datasets - Topic-centric, high-quality public data sources.
- Google Research Dataset Search - Discover datasets hosted in thousands of repositories across the web.
Keywords
- deep-learning (6)
- artificial-intelligence (4)
- computer-vision (4)
- generative-adversarial-network (3)
- gans (3)
- machine-learning (3)
- synthetic-data (3)
- time-series (2)
- gan (2)
- deeplearning (2)
- generative-model (2)
- domain-randomization (2)
- object-detection (2)
- pose-estimation (2)
- synthetic-dataset-generation (2)
- pytorch (2)
- synthetic-data-generation (1)
- sdv (1)
- relational-datasets (1)
- datageneration (1)
- datagenerator (1)
- gan-architectures (1)
- python3 (1)
- tensorflow2 (1)
- multi-table (1)
- generativeai (1)
- generative-ai (1)
- data-generation (1)
- tensorflow (1)
- privacy (1)
- differential-privacy (1)
- opendata (1)
- datasets (1)
- awesome-public-datasets (1)
- video-generation (1)
- text-to-video (1)
- ddpm (1)
- segmentation (1)
- perception (1)
- detection (1)
- unreal-engine (1)
- simulator (1)
- self-driving-car (1)
- research (1)
- platform-independent (1)
- pixhawk (1)
- drones (1)
- deep-reinforcement-learning (1)
- cross-platform (1)
- control-systems (1)