Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with dataset-generation

A curated list of projects in awesome lists tagged with dataset-generation.

https://github.com/jmp/cluster-generator

A small script for generating random, two-dimensional Gaussian clusters for research purposes. Creates the data vectors based on given parameters, and outputs them into an external file.

clustering dataset-generation generator python research

Last synced: 10 Jan 2025
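The cluster-generator entry describes drawing two-dimensional Gaussian clusters from given parameters and writing them to a file. Below is a minimal Python sketch of that idea. It assumes isotropic clusters that share one standard deviation and a whitespace-separated `x y label` output format. `generate_clusters` and `write_clusters` are hypothetical names, not the repository's actual interface:

```python
import random


def generate_clusters(centers, sigma, points_per_cluster, seed=None):
    """Draw points around each (cx, cy) center from an isotropic 2-D Gaussian.

    Assumption: every cluster shares the same standard deviation `sigma`.
    Returns a list of (x, y, label) tuples, where label is the center's index.
    """
    rng = random.Random(seed)  # private, seeded RNG for reproducible datasets
    data = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            data.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma), label))
    return data


def write_clusters(data, path):
    """Write one 'x y label' line per point to an external text file (assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        for x, y, label in data:
            f.write(f"{x:.6f} {y:.6f} {label}\n")
```

For example, `write_clusters(generate_clusters([(0, 0), (5, 5)], 1.0, 100, seed=7), "clusters.txt")` writes 200 labeled points. Keeping the label column makes the file usable for evaluating clustering results against the ground truth.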

https://github.com/swghosh/ra-face-partner

A simple iOS app that detects faces using OpenCV (via an ObjC++ wrapper) and stores the images directly in the Files app.

dataset-generation face-detection objective-c-plus-plus opencv

Last synced: 27 Dec 2024

https://github.com/alvinwan/ale-tracker

Track and log a human player's actions via the Arcade Learning Environment

arcade-learning-environment dataset-generation

Last synced: 11 Jan 2025

https://github.com/alvinwan/gym-tracker

Track and log a human player's actions via OpenAI Gym

dataset dataset-generation openai-gym

Last synced: 11 Jan 2025

https://github.com/viyadb/events-generator

Framework for simulating mobile application user activity

dataset-generation datasets events kafka mobile-user simulator user-value

Last synced: 19 Nov 2024

https://github.com/simatwa/movies-dataset

A collection of movie datasets for your ML project or any other task.

data-science database dataset-generation datasets movie-data-analysis movie-dataset

Last synced: 15 Jan 2025

https://github.com/nisaaragharia/mass_summarization

Large-Scale Dataset Cleaning (Summarization and Information Extraction) Using Llama 2 70B

data-science dataset dataset-generation llama2 llms summarization

Last synced: 23 Nov 2024

https://github.com/ellisbrown/in100

Python script for generating an ImageNet-100 subset using symlinks to an existing ImageNet installation.

computer-vision dataset-generation deep-learning imagenet

Last synced: 28 Nov 2024
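The in100 entry builds a subset by symlinking into an existing ImageNet installation rather than copying images. Below is a minimal sketch of that approach. It assumes the common `<split>/<class_id>/` directory layout (the one torchvision's `ImageFolder` expects) and links whole class directories. `make_subset` and `demo` are hypothetical helpers, and the WordNet IDs in `demo` are illustrative only, not the actual ImageNet-100 class list:

```python
import os
import tempfile
from pathlib import Path


def make_subset(src_root, dst_root, class_ids, splits=("train", "val")):
    """Symlink each selected class directory from src_root into dst_root.

    Assumes the <split>/<class_id>/ layout; no image files are copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    for split in splits:
        for class_id in class_ids:
            src = src_root / split / class_id
            if not src.is_dir():
                raise FileNotFoundError(f"missing class directory: {src}")
            dst = dst_root / split / class_id
            dst.parent.mkdir(parents=True, exist_ok=True)
            if not dst.is_symlink():  # skip links created by an earlier run
                dst.symlink_to(src.resolve(), target_is_directory=True)


def demo():
    """Build a tiny fake ImageNet tree in a temp dir and link a 2-class subset."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "imagenet", Path(tmp) / "subset"
        for split in ("train", "val"):
            for class_id in ("n01440764", "n01443537", "n01484850"):
                (src / split / class_id).mkdir(parents=True)
                (src / split / class_id / "img0.JPEG").touch()
        make_subset(src, dst, ["n01440764", "n01484850"])
        # Return the linked class names and whether a file is reachable through a link.
        linked = (dst / "val" / "n01484850" / "img0.JPEG").exists()
        return sorted(os.listdir(dst / "train")), linked
```

Linking directories instead of files makes subset creation near-instant and uses no extra disk space. The repository itself may link individual images instead. Symlinks may also require extra privileges on Windows.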

https://github.com/soft/lastfm-archiver

Tool for archiving last.fm listening data into a local SQLite database

archiving database dataset-generation lastfm lastfm-api music sqlite

Last synced: 29 Nov 2024

https://github.com/paragekbote/notus_aegis-v1_recipe

An innovative approach that leverages large language models (LLMs) to synthesize high-quality datasets.

dataset dataset-creation dataset-generation huggingface natural-language-processing notus

Last synced: 17 Jan 2025

https://github.com/aliyildizoz/exchangedataextraction

Building a dataset for a machine learning course.

csharp dataset-generation term-project

Last synced: 14 Jan 2025

https://github.com/jeffin07/datasetgenerator

A Python script that generates a dataset from your limited data

dataset-generation deep-learning image-augmentation limited-data machine-learning python

Last synced: 16 Dec 2024

https://github.com/skywardai/cecilia

EDA tools and a dataset generator for ML projects

dataset dataset-generation eda embeddings

Last synced: 11 Nov 2024

https://github.com/spraakbanken/mink-frontend

Vue frontend for Mink

dataset-generation nlp

Last synced: 29 Nov 2024

https://github.com/brettdavies/yt_dlp_async

Asynchronous tool for fetching and processing YouTube audio data using yt-dlp, designed to aid scalable and efficient data collection and analysis for academic research projects.

academic-project asynchronous-io audio-downloader audio-metadata data-collection dataset-generation parallel-processing-workerpool poetry poetryapp python-poetry research-project youtube yt-dlp yt-dlp-wrapper

Last synced: 13 Dec 2024

https://github.com/silviatulli/dyadic-minigrid

Multiplayer game and chat for collecting data on human counterfactual explanations in a collaborative learning task

counterfactual-explanations data-driven-reinforcement-learning dataset-generation gym-minigrid multiplayer-game nodejs socket-io

Last synced: 18 Jan 2025

https://github.com/Brandon82/llm-dataset-gen

Using LLMs (OpenAI API) to generate and add data to datasets

dataset-generation datasets openai openai-api

Last synced: 06 Jan 2025

https://github.com/atelierarith/segrcdb.jl

Unofficial Julia implementation of SegRCDB

dataset dataset-generation fdsl julia-language machine-learning-algorithms

Last synced: 21 Nov 2024

https://github.com/v411e/steghide

Dockerfile for http://www.steghide.sourceforge.net

dataset-generation docker docker-compose steganography steghide

Last synced: 25 Nov 2024

https://github.com/jack-development/redditfetch

RedditFetch is a robust tool for collecting and managing Reddit user data using Python and PRAW. It fetches posts and comments, assigns unique IDs, and structures the data seamlessly for easy access and analysis.

api dataset-generation praw pytorch reddit

Last synced: 23 Nov 2024

https://github.com/sanderhelleso/maxprofitstockmarket

The stock price fluctuates throughout each day. Given an array of stock prices, what is the most efficient way to determine the best time to buy and sell for maximum profit? You must buy before you sell.

csv dataset-generation profit-calculator stock-data stock-market

Last synced: 07 Dec 2024
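The maxprofitstockmarket entry poses the classic single-transaction buy/sell problem. One standard O(n) solution is shown below as a sketch, not necessarily the repository's implementation. It tracks the cheapest price seen so far and the best profit from selling at each later price. `best_trade` is a hypothetical name:

```python
def best_trade(prices):
    """Return (profit, buy_index, sell_index) for one buy followed by one sell.

    Single pass: remember the cheapest price so far and check the profit of
    selling at each later price. Returns (0, None, None) if no trade is profitable.
    """
    best = (0, None, None)
    min_index = None
    for i, price in enumerate(prices):
        if min_index is None or price < prices[min_index]:
            min_index = i  # a new lowest price is the best buy candidate from here on
        elif price - prices[min_index] > best[0]:
            best = (price - prices[min_index], min_index, i)
    return best
```

For prices `[7, 1, 5, 3, 6, 4]`, `best_trade` returns `(5, 1, 4)`: buy at 1 (index 1), sell at 6 (index 4). Because the buy index is always earlier than the sell index, the "buy before you sell" constraint holds automatically.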

https://github.com/aitor-alvarez/mir-song-dataset-collection

Scripts to create Music Information Retrieval datasets from streaming services for singer identification tasks

audio-signal-processing dataset-generation deep-learning-dataset machine-learning-dataset music-information-retrieval singer-identification-tasks

Last synced: 25 Nov 2024

https://github.com/colddsam/modeyolo

ModeYOLO: Elevate image processing with this Python package. Seamlessly perform color space transformations, simplify dataset modification for deep learning, and leverage OpenCV and NumPy. Ideal for YOLO projects, computer vision tasks, and efficient machine learning workflows.

dataset dataset-generation open-source opencv python pythonpackage ultralytics yolo

Last synced: 17 Jan 2025

https://github.com/hqarroum/number-generator

🍪 A multi-threaded generator that can create a large dataset of random numbers.

dataset-generation generator number random

Last synced: 01 Jan 2025
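The number-generator entry describes a multi-threaded generator for large random datasets. Below is a minimal Python sketch of the pattern. It assumes integer output and gives each worker chunk its own deterministically seeded RNG, so threads never share generator state and a given seed reproduces the same dataset. `generate_chunk` and `generate_dataset` are hypothetical names, and the repository's own language and design may differ:

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate_chunk(seed, count, low, high):
    """Draw `count` integers in [low, high] from a private RNG."""
    rng = random.Random(seed)  # one generator per chunk: no shared state between threads
    return [rng.randint(low, high) for _ in range(count)]


def generate_dataset(total, workers=4, low=0, high=1_000_000, base_seed=0):
    """Split `total` draws into one chunk per worker and run them on a thread pool."""
    base, extra = divmod(total, workers)
    sizes = [base + (1 if i < extra else 0) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(generate_chunk, base_seed + i, n, low, high)
            for i, n in enumerate(sizes)
        ]
        # Collect in submission order so the output is reproducible for a given seed.
        return [x for f in futures for x in f.result()]
```

In CPython, the GIL means threads mostly help when chunks are also written to I/O. For CPU-bound generation, `ProcessPoolExecutor` is a drop-in alternative, because `generate_chunk` is a top-level, picklable function.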

https://github.com/jobayer/oasis4-jpg-data-and-label-extractor

A Python script that extracts JPG images from the NIfTI data and categorizes them into different classes of dementia

dataset dataset-generation machine-learning oasis oasis-4 oasis-dataset

Last synced: 25 Dec 2024

https://github.com/furk4nbulut/webtrafficqa-responder-ai-rag

A Question-Answering (Q&A) system leveraging web traffic logs. The system is designed to handle natural language questions from users, analyze the relevant traffic log data, and generate accurate and contextually appropriate responses.

dataset-generation faiss-vector-database rag

Last synced: 10 Jan 2025

https://github.com/dimits-ts/synthetic_moderation_experiments

Experiments relating to synthetic LLM user-agents and LLM facilitators in online discussions

data-analysis dataset-generation llms llms-reasoning nlp

Last synced: 27 Dec 2024