Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with dataset-generation
A curated list of projects in awesome lists tagged with dataset-generation.
https://github.com/jmp/cluster-generator
A small script for generating random, two-dimensional Gaussian clusters for research purposes. Creates the data vectors based on given parameters, and outputs them into an external file.
clustering dataset-generation generator python research
Last synced: 10 Jan 2025
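The entry above describes the general technique: pick cluster centres, sample 2-D Gaussian points around each one, and write them to a file. The following is a minimal Python sketch of that idea. It is not the project's code. The function names, the uniform choice of centres inside a bounding box, the shared isotropic spread, and the CSV output format are all assumptions made for illustration.

```python
import csv
import random

def generate_clusters(n_clusters, points_per_cluster, spread,
                      bounds=(0.0, 100.0), seed=None):
    """Return (x, y, label) tuples drawn from 2-D isotropic Gaussian clusters.

    Hypothetical parameters: cluster centres are drawn uniformly inside
    `bounds`, and every cluster shares the same standard deviation `spread`.
    """
    rng = random.Random(seed)  # a private generator makes runs reproducible
    low, high = bounds
    points = []
    for label in range(n_clusters):
        # Choose a random centre for this cluster.
        cx, cy = rng.uniform(low, high), rng.uniform(low, high)
        for _ in range(points_per_cluster):
            # Sample each coordinate independently around the centre.
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread), label))
    return points

def write_points(points, path):
    """Write the points to a CSV file with a header row (assumed output format)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y", "cluster"])
        writer.writerows(points)
```

For example, `write_points(generate_clusters(3, 100, 5.0, seed=42), "clusters.csv")` writes 300 labelled points. Passing a fixed seed makes a research dataset reproducible across runs.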
https://github.com/swghosh/ra-face-partner
A simple iOS app that detects faces using OpenCV (via an ObjC++ wrapper) and stores the images directly in the Files app.
dataset-generation face-detection objective-c-plus-plus opencv
Last synced: 27 Dec 2024
https://github.com/alvinwan/ale-tracker
track and log actions for a human player via the Arcade Learning Environment
arcade-learning-environment dataset-generation
Last synced: 11 Jan 2025
https://github.com/alvinwan/gym-tracker
track and log actions for a human player via OpenAI Gym
dataset dataset-generation openai-gym
Last synced: 11 Jan 2025
https://github.com/viyadb/events-generator
Framework for simulating mobile application user activity
dataset-generation datasets events kafka mobile-user simulator user-value
Last synced: 19 Nov 2024
https://github.com/simatwa/movies-dataset
A collection of movie datasets for your ML projects or any other task.
data-science database dataset-generation datasets movie-data-analysis movie-dataset
Last synced: 15 Jan 2025
https://github.com/nisaaragharia/mass_summarization
Large Scale Dataset Cleaning (Summarization and Information Extraction) Using LLAMA2 70B
data-science dataset dataset-generation llama2 llms summarization
Last synced: 23 Nov 2024
https://github.com/ellisbrown/in100
Python script for generating an ImageNet-100 subset using symlinks to an existing ImageNet installation.
computer-vision dataset-generation deep-learning imagenet
Last synced: 28 Nov 2024
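The symlink approach mentioned above builds a subset without copying any images: each selected class folder in the new tree is a link back to the full installation. The following is a minimal Python sketch of that idea, not the project's script. It assumes an ImageNet layout of `<root>/<split>/<wnid>/` for both `train` and `val`, which means the validation images have already been sorted into class folders. The function name and the way the class list is supplied are hypothetical.

```python
from pathlib import Path

def make_subset(src_root, dst_root, class_ids, splits=("train", "val")):
    """Link the chosen class folders of an existing ImageNet tree into a new tree.

    Assumed layout: <src_root>/<split>/<wnid>/ for every split in `splits`.
    `class_ids` is a hypothetical list of WordNet IDs (e.g. 100 of them).
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    for split in splits:
        (dst_root / split).mkdir(parents=True, exist_ok=True)
        for wnid in class_ids:
            src = (src_root / split / wnid).resolve()
            if not src.is_dir():
                raise FileNotFoundError(f"missing class folder: {src}")
            dst = dst_root / split / wnid
            # Skip links that already exist so the script can be re-run safely.
            if dst.is_symlink() or dst.exists():
                continue
            # One directory symlink per class: no image files are copied.
            dst.symlink_to(src, target_is_directory=True)
```

Because only directory links are created, the subset takes almost no disk space. Any loader that reads the new root sees just the selected classes.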
https://github.com/soft/lastfm-archiver
Tool for archiving last.fm listening data into a local SQLite database
archiving database dataset-generation lastfm lastfm-api music sqlite
Last synced: 29 Nov 2024
https://github.com/shamspias/videolabelmagic
Automate video data creation. Extract frames, generate annotations, export in various formats (YOLOv8, YOLO1.1). Integrate with Roboflow, CVAT. Built with Python and Streamlit.
annotation-tool artificial-intelligence auto-annotation auto-labeling auto-segmentation computer-vision computer-vision-annotation data-processing dataset-generation image-annotation-tool image-labeling-tool labeling-tool machine-learning object-detection python rt-detr small-object-detection streamlit video-annotation yolo
Last synced: 04 Dec 2024
https://github.com/sam0x17/image_labeler
a utility for generating VOC image annotations
dataset-generation deep-learning image-annotation image-classification imagenet voc
Last synced: 20 Dec 2024
https://github.com/paragekbote/notus_aegis-v1_recipe
An innovative approach leveraging Large Language Models (LLMs) to synthesize high-quality datasets.
dataset dataset-creation dataset-generation huggingface natural-language-processing notus
Last synced: 17 Jan 2025
https://github.com/aliyildizoz/exchangedataextraction
Creating a dataset for a Machine Learning course.
csharp dataset-generation term-project
Last synced: 14 Jan 2025
https://github.com/pratap2018/dataset_generator
dataset-generation dataset-generator
Last synced: 14 Jan 2025
https://github.com/maxadams0/starryright
Dataset preparation for VikX
astronomy dataset-generation deep-neural-networks python
Last synced: 15 Jan 2025
https://github.com/jeffin07/datasetgenerator
A Python script that generates a dataset from your limited data.
dataset-generation deep-learning image-augmentation limited-data machine-learning python
Last synced: 16 Dec 2024
https://github.com/skywardai/cecilia
EDA tools and datasets generator for ML projects
dataset dataset-generation eda embeddings
Last synced: 11 Nov 2024
https://github.com/stupidcucumber/simplecoco
Simple COCO-format dataset creator.
annotating-images coco dataset-generation goal object-detection python yolo-dataset
Last synced: 18 Nov 2024
https://github.com/brettdavies/yt_dlp_async
Asynchronous tool for fetching and processing YouTube audio data using yt-dlp, designed to aid scalable and efficient data collection and analysis for academic research projects.
academic-project asynchronous-io audio-downloader audio-metadata data-collection dataset-generation parallel-processing-workerpool poetry poetryapp python-poetry research-project youtube yt-dlp yt-dlp-wrapper
Last synced: 13 Dec 2024
https://github.com/silviatulli/dyadic-minigrid
multiplayer game and chat for collecting data on human counterfactual explanations in a collaborative learning task
counterfactual-explanations data-driven-reinforcement-learning dataset-generation gym-minigrid multiplayer-game nodejs socket-io
Last synced: 18 Jan 2025
https://github.com/epsoft/dataset-generator
dataset generator
data dataset dataset-generation matplotlib matplotlib-figures tensorflow tensorflow-datasets
Last synced: 11 Jan 2025
https://github.com/Brandon82/llm-dataset-gen
Using LLMs (OpenAI API) to generate and add data to datasets
dataset-generation datasets openai openai-api
Last synced: 06 Jan 2025
https://github.com/atelierarith/segrcdb.jl
Unofficial Julia implementation of SegRCDB
dataset dataset-generation fdsl julia-language machine-learning-algorithms
Last synced: 21 Nov 2024
https://github.com/v411e/steghide
Dockerfile for http://www.steghide.sourceforge.net
dataset-generation docker docker-compose steganography steghide
Last synced: 25 Nov 2024
https://github.com/jack-development/redditfetch
RedditFetch is a robust tool for collecting and managing Reddit user data using Python and PRAW. It fetches posts and comments, assigns unique IDs, and structures the data seamlessly for easy access and analysis.
api dataset-generation praw pytorch reddit
Last synced: 23 Nov 2024
https://github.com/sanderhelleso/maxprofitstockmarket
The stock price fluctuates throughout each day. Given an array of stock prices, what is the most efficient way to determine the best time to buy and sell for maximum profit? You must buy before you sell.
csv dataset-generation profit-calculator stock-data stock-market
Last synced: 07 Dec 2024
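A common answer to the question above is a single O(n) pass. The pass tracks the cheapest price seen so far and treats each later day as a candidate sell day. The following Python sketch shows that standard approach. It is not necessarily the method used in the repository, and the function name and return shape are illustrative assumptions.

```python
def best_trade(prices):
    """Return (profit, buy_day, sell_day) for the most profitable single trade.

    Days are indices into `prices`. Returns (0, None, None) when no trade
    makes a profit, e.g. for strictly falling prices or fewer than two prices.
    """
    best = (0, None, None)
    min_day = None  # index of the cheapest price seen so far
    for day, price in enumerate(prices):
        # Selling today only counts if a strictly earlier buy day exists.
        if min_day is not None and price - prices[min_day] > best[0]:
            best = (price - prices[min_day], min_day, day)
        # Update the cheapest buy day after checking, so buy always precedes sell.
        if min_day is None or price < prices[min_day]:
            min_day = day
    return best
```

For `[7, 1, 5, 3, 6, 4]` this returns `(5, 1, 4)`: buy at 1 on day 1 and sell at 6 on day 4. The running minimum avoids the O(n²) comparison of every buy/sell pair.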
https://github.com/aitor-alvarez/mir-song-dataset-collection
Scripts to create Music Information Retrieval datasets from streaming services for singer identification tasks
audio-signal-processing dataset-generation deep-learning-dataset machine-learning-dataset music-information-retrieval singer-identification-tasks
Last synced: 25 Nov 2024
https://github.com/colddsam/modeyolo
ModeYOLO: Elevate image processing with this Python package. Seamlessly perform color space transformations, simplify dataset modification for deep learning, and leverage OpenCV and NumPy. Ideal for YOLO projects, computer vision tasks, and efficient machine learning workflows.
dataset dataset-generation open-source opencv python pythonpackage ultralytics yolo
Last synced: 17 Jan 2025
https://github.com/hqarroum/number-generator
🍪 A multi-threaded generator for creating a large dataset of random numbers.
dataset-generation generator number random
Last synced: 01 Jan 2025
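One way to structure the multi-threaded generation described above is to split the requested count into per-worker chunks and give each worker its own seeded generator. That way no random state is shared between threads. The following Python sketch shows this partitioning pattern under those assumptions. It is not the project's implementation, whose language and design are not described here. In CPython the global interpreter lock limits the speed-up of pure-Python number generation across threads, so the same structure would usually be moved to processes for real throughput.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def _generate_chunk(seed, count, low, high):
    # Each worker owns a private Random instance, so threads never share state.
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(count)]

def generate_numbers(total, workers=4, low=0, high=1_000_000, base_seed=0):
    """Generate `total` random integers in [low, high] using `workers` threads.

    Hypothetical interface: worker i is seeded with base_seed + i, so the
    combined output is reproducible for a given (total, workers, base_seed).
    """
    base, extra = divmod(total, workers)
    # Spread the remainder over the first `extra` workers.
    counts = [base + (1 if i < extra else 0) for i in range(workers)]
    seeds = [base_seed + i for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in submission order, keeping the output stable.
        chunks = pool.map(_generate_chunk, seeds, counts,
                          [low] * workers, [high] * workers)
        return [n for chunk in chunks for n in chunk]
```

Seeding each worker separately, rather than sharing one generator behind a lock, avoids contention and keeps the dataset reproducible.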
https://github.com/jobayer/oasis4-jpg-data-and-label-extractor
A Python script that extracts JPG images from the NIfTI data and categorizes them into different classes of dementia.
dataset dataset-generation machine-learning oasis oasis-4 oasis-dataset
Last synced: 25 Dec 2024
https://github.com/furk4nbulut/webtrafficqa-responder-ai-rag
A Question-Answering (Q&A) system leveraging web traffic logs. The system is designed to handle natural language questions from users, analyze the relevant traffic log data, and generate accurate and contextually appropriate responses.
dataset-generation faiss-vector-database rag
Last synced: 10 Jan 2025
https://github.com/dimits-ts/synthetic_moderation_experiments
Experiments relating to synthetic LLM user-agents and LLM facilitators in online discussions
data-analysis dataset-generation llms llms-reasoning nlp
Last synced: 27 Dec 2024