Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with datasets
A curated list of projects in awesome lists tagged with datasets.
https://github.com/caesar0301/awesome-public-datasets
A topic-centric list of HQ open datasets.
aaron-swartz awesome-public-datasets datasets opendata
Last synced: 31 Jul 2024
https://github.com/huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
computer-vision datasets deep-learning hacktoberfest machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow
Last synced: 29 Sep 2024
https://github.com/HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
annotation annotation-tool annotations boundingbox computer-vision data-labeling dataset datasets deep-learning image-annotation image-classification image-labeling image-labelling-tool label-studio labeling labeling-tool mlops semantic-segmentation text-annotation yolo
Last synced: 31 Jul 2024
https://github.com/HumanSignal/label-studio?fbclid=IwAR30j2OmVMcB-TenAczkNwwUsObi8JAOpTNxGFzrmMrJ2pd4-gg_S0D3S78
Label Studio is a multi-type data labeling and annotation tool with standardized output format
annotation annotation-tool annotations boundingbox computer-vision data-labeling dataset datasets deep-learning image-annotation image-classification image-labeling image-labelling-tool label-studio labeling labeling-tool mlops semantic-segmentation text-annotation yolo
Last synced: 02 Aug 2024
https://github.com/heartexlabs/label-studio?fbclid=IwAR30j2OmVMcB-TenAczkNwwUsObi8JAOpTNxGFzrmMrJ2pd4-gg_S0D3S78
Label Studio is a multi-type data labeling and annotation tool with standardized output format
annotation annotation-tool annotations boundingbox computer-vision data-labeling dataset datasets deep-learning image-annotation image-classification image-labeling image-labelling-tool label-studio labeling labeling-tool mlops semantic-segmentation text-annotation yolo
Last synced: 18 Aug 2024
https://github.com/humansignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
annotation annotation-tool annotations boundingbox computer-vision data-labeling dataset datasets deep-learning image-annotation image-classification image-labeling image-labelling-tool label-studio labeling labeling-tool mlops semantic-segmentation text-annotation yolo
Last synced: 29 Sep 2024
https://github.com/heartexlabs/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
annotation annotation-tool annotations boundingbox computer-vision data-labeling dataset datasets deep-learning image-annotation image-classification image-labeling image-labelling-tool label-studio labeling labeling-tool mlops semantic-segmentation text-annotation yolo
Last synced: 30 Jul 2024
https://github.com/tonybeltramelli/pix2code
pix2code: Generating Code from a Graphical User Interface Screenshot
datasets deep-learning deep-neural-networks front-end-development graphical-user-interface
Last synced: 30 Sep 2024
https://github.com/doccano/doccano
Open source annotation tool for machine learning practitioners.
annotation-tool data-labeling dataset datasets machine-learning natural-language-processing nuxt nuxtjs python text-annotation vue vuejs
Last synced: 29 Sep 2024
https://github.com/simonw/datasette
An open source multi-tool for exploring and publishing data
asgi automatic-api csv datasets datasette datasette-io docker json python sql sqlite
Last synced: 29 Sep 2024
https://github.com/cleanlab/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
active-learning annotation data-analysis data-centric-ai data-cleaning data-curation data-labeling data-profiling data-quality data-science data-validation dataops dataquality datasets labeling llms noisy-labels out-of-distribution-detection outlier-detection weak-supervision
Last synced: 31 Jul 2024
https://github.com/jindaxiang/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock
Last synced: 03 Aug 2024
https://github.com/akfamily/akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock
Last synced: 29 Sep 2024
https://github.com/satellite-image-deep-learning/techniques
Techniques for deep learning with satellite & aerial imagery
convolutional-neural-networks dataset datasets deep-learning deep-neural-networks earth-observation image-classification keras machine-learning object-detection python pytorch remote-sensing satellite-data satellite-imagery satellite-images sentinel tensorflow
Last synced: 29 Sep 2024
https://github.com/activeloopai/Hub
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search
Last synced: 10 Aug 2024
https://github.com/activeloopai/deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search
Last synced: 01 Oct 2024
https://github.com/imanneo/fl_chart
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.
barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot
Last synced: 02 Oct 2024
https://github.com/imaNNeo/fl_chart
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.
barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot
Last synced: 31 Jul 2024
https://github.com/imaNNeoFighT/fl_chart
FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.
barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot
Last synced: 30 Jul 2024
https://github.com/liuruoze/easypr
(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design
artificial-intelligence artificial-neural-networks chinese-characters computer-vision datasets machine-learning opencv opencv3 plate-recognition supervised-learning support-vector-machines unconstrained-situation
Last synced: 30 Sep 2024
https://github.com/liuruoze/EasyPR
(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design
artificial-intelligence artificial-neural-networks chinese-characters computer-vision datasets machine-learning opencv opencv3 plate-recognition supervised-learning support-vector-machines unconstrained-situation
Last synced: 30 Jul 2024
https://github.com/tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
data dataset datasets jax machine-learning numpy tensorflow
Last synced: 29 Sep 2024
https://github.com/cluebenchmark/cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
chinese corpus datasets knowledge-graph machine-reading-comprehension machine-translation match ner nlp qa sentiment-analysis text-classification text-similarity text-summarization
Last synced: 30 Sep 2024
https://github.com/CLUEbenchmark/CLUEDatasetSearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
chinese corpus datasets knowledge-graph machine-reading-comprehension machine-translation match ner nlp qa sentiment-analysis text-classification text-similarity text-summarization
Last synced: 31 Jul 2024
https://github.com/arize-ai/phoenix
AI Observability & Evaluation
ai-monitoring ai-observability ai-roi aiengineering datasets llm-eval llmops ml-observability mlops model-observability
Last synced: 01 Oct 2024
https://github.com/roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets
Last synced: 31 Jul 2024
https://github.com/justinzm/gopup
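Data interfaces: Baidu, Google, Toutiao, and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, Qianlima and unicorn companies, Xinwen Lianbo transcripts, film and TV box office data, university lists, epidemic data…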
covid19-data data data-analysis data-science datasets economic-data gopup index-data python
Last synced: 30 Sep 2024
https://github.com/zhulf0804/3D-PointCloud
Papers and Datasets about Point Cloud.
autonomous-driving classification completion datasets detection generation monocular papers point-cloud registration segmentation
Last synced: 30 Jul 2024
https://github.com/microsoft/torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
computer-vision datasets deep-learning earth-observation geospatial models pytorch remote-sensing satellite-imagery torchvision transforms
Last synced: 30 Sep 2024
https://github.com/github/codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
bert cnn data data-science datasets deep-learning machine-learning machine-learning-on-source-code ml natural-language-processing neural-networks nlp nlp-machine-learning open-data programming-language-theory python representation-learning rnn self-attention tensorflow
Last synced: 26 Sep 2024
https://github.com/freedomintelligence/medical_nlp
Medical NLP Competition, dataset, large models, paper
collection datasets list medical models nlp
Last synced: 30 Sep 2024
https://github.com/jsbroks/coco-annotator
✏️ Web-based image segmentation tool for object detection, localization, and keypoints
annotate-images coco coco-annotator coco-format computer-vision datasets deep-learning detection image-annotation image-labeling image-segmentation label machine-learning
Last synced: 30 Sep 2024
https://github.com/colour-science/colour
Colour Science for Python
color color-science color-space color-spaces colorspace colorspaces colour colour-science colour-space colour-spaces colourspace colourspaces data dataset datasets python spectral-data spectral-dataset spectral-datasets
Last synced: 30 Sep 2024
https://github.com/prabhuomkar/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
artificial-intelligence autograd colab convolutional-neural-network cplusplus datasets generative-adversarial-network interactive-tutorials language-model libtorch machine-learning neural-network pytorch recurrent-neural-network scriptmodule-files tensors torch tutorial
Last synced: 25 Sep 2024
https://github.com/snap-stanford/ogb
Benchmark datasets, data loaders, and evaluators for graph machine learning
datasets deep-learning graph-machine-learning graph-neural-networks
Last synced: 30 Sep 2024
https://github.com/isl-org/open3d-ml
An extension of Open3D to address 3D Machine Learning tasks
3d-object-detection 3d-perception datasets lidar object-detection pretrained-models pytorch rgbd semantic-segmentation tensorflow visualization
Last synced: 30 Sep 2024
https://github.com/diffgram/diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
annotation annotation-tool annotations data data-analytics data-annotation data-science datasets datastore deep-learning image-annotation kubernetes labeling machine-learning training-data video-annotation
Last synced: 30 Jul 2024
https://github.com/chineseglue/chineseglue
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
albert bert chinese-corpus datasets glue language-understanding nlp pre-trained-model
Last synced: 30 Sep 2024
https://github.com/ChineseGLUE/ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
albert bert chinese-corpus datasets glue language-understanding nlp pre-trained-model
Last synced: 01 Aug 2024
https://github.com/isl-org/Open3D-ML
An extension of Open3D to address 3D Machine Learning tasks
3d-object-detection 3d-perception datasets lidar object-detection pretrained-models pytorch rgbd semantic-segmentation tensorflow visualization
Last synced: 31 Jul 2024
https://github.com/JuliaData/DataFrames.jl
In-memory tabular data in Julia
data data-frame dataframes datasets hacktoberfest julia tabular-data
Last synced: 01 Aug 2024
https://github.com/juliadata/dataframes.jl
In-memory tabular data in Julia
data data-frame dataframes datasets hacktoberfest julia tabular-data
Last synced: 30 Sep 2024
https://github.com/JuliaStats/DataFrames.jl
In-memory tabular data in Julia
data data-frame dataframes datasets hacktoberfest julia tabular-data
Last synced: 04 Aug 2024
https://github.com/jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
audio-dataset audio-datasets data dataset datasets noise voice voice-activity-detection voice-assistant voice-chat voice-commands voice-computing voice-control voice-conversion voice-dataset voice-datasets voice-recognition voice-synthesis
Last synced: 30 Sep 2024
https://github.com/logpai/loghub
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
anomaly-detection datasets log-analysis log-intelligence log-parsing logs unstructured-logs
Last synced: 30 Sep 2024
https://github.com/juand-r/entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
annotations corpora datasets entity-extraction entity-recognition named-entity-recognition natural-language-processing ner nlp nlp-resources
Last synced: 30 Sep 2024
https://github.com/explosion/projects
🪐 End-to-end NLP workflows from prototype to production
annotations datasets natural-language-processing nlp prodigy spacy
Last synced: 30 Sep 2024
https://github.com/PolyAI-LDN/conversational-datasets
Large datasets for conversational AI
conversational-ai datasets machine-learning
Last synced: 02 Aug 2024
https://github.com/eosphoros-ai/db-gpt-hub
A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL
database datasets fine-tuning gpt llm nl2sql sql text-to-sql text2sql
Last synced: 02 Aug 2024
https://github.com/pku-alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna
Last synced: 27 Sep 2024
https://github.com/PKU-Alignment/safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna
Last synced: 03 Aug 2024
https://github.com/RUC-NLPIR/FlashRAG
⚡FlashRAG: A Python Toolkit for Efficient RAG Research
benchmark datasets large-language-models retrieval-augmented-generation
Last synced: 11 Sep 2024
https://github.com/midas-research/audino
Open source audio annotation tool for humans
annotation-tool audio-annotation audio-processing datasets machine-learning python speech-processing
Last synced: 01 Oct 2024
https://github.com/caserec/Datasets-for-Recommender-Systems
A repository of topic-centric, high-quality public data sources for Recommender Systems (RS)
data-science database datasets public-data recommender-systems
Last synced: 08 Aug 2024
https://github.com/iamaziz/PyDataset
Instant access to many datasets in Python.
Last synced: 07 Aug 2024
https://github.com/mims-harvard/TDC
Therapeutics Commons: Artificial Intelligence Foundation for Therapeutic Science
artificial-intelligence benchmarks bioinformatics biology biomedicine biotech cheminformatics chemistry datasets deep-learning drug-discovery machine-learning medicine precision-medicine therapeutics
Last synced: 01 Aug 2024
https://github.com/JizhiziLi/GFM
[IJCV 2022] Bridging Composite and Real: Towards End-to-end Deep Image Matting
animal-matting composition datasets image-matting matting segmentation
Last synced: 30 Jul 2024
https://github.com/CLUEbenchmark/CLUECorpus2020
Large-scale Pre-training Corpus for Chinese (100G Chinese pre-training corpus)
albert bert chinese chinese-corpus corpus datasets nlp pretrain roberta
Last synced: 03 Aug 2024
https://github.com/zjunlp/prompt4reasoningpapers
[ACL 2023] Reasoning with Language Model Prompting: A Survey
arithmetic-reasoning artificial-intelligence awsome-list chain-of-thought chatgpt commonsense-reasoning datasets gpt-3 language-models large-language-models llm logical-reasoning natural-language-processing nlp paper-list prompt prompt-engineering reasoning survey symbolic-reasoning
Last synced: 02 Aug 2024
https://github.com/zjunlp/Prompt4ReasoningPapers
[ACL 2023] Reasoning with Language Model Prompting: A Survey
arithmetic-reasoning artificial-intelligence awsome-list chain-of-thought chatgpt commonsense-reasoning datasets gpt-3 language-models large-language-models llm logical-reasoning natural-language-processing nlp paper-list prompt prompt-engineering reasoning survey symbolic-reasoning
Last synced: 29 Jul 2024
https://github.com/ipeaGIT/geobr
Easy access to official spatial data sets of Brazil in R and Python
brazil datasets geopackage geopandas python r rstats sf shapefile spatial-data
Last synced: 30 Jul 2024
https://github.com/saltudelft/ml4se
A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering
ai4code ai4se code datasets deep-learning llm4code machine-learning ml4code ml4se papers research software-engineering theses tools tudelft
Last synced: 01 Aug 2024
https://github.com/st-tech/zr-obp
Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation
contextual-bandits datasets multi-armed-bandits off-policy-evaluation research
Last synced: 02 Aug 2024
https://github.com/huggingface/dataset-viewer
Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub
api-rest data datasets huggingface machine-learning nlp
Last synced: 09 Aug 2024
https://github.com/mahmoudnafifi/Exposure_Correction
Project page of the paper "Learning Multi-Scale Photo Exposure Correction" (CVPR 2021).
coarse-to-fine color-correction computational-photography cvpr cvpr2021 dataset datasets deep-learning deeplearning exposure-correction image-enhancement low-light-enhance low-light-image multi-scale overexposure-correction underexposure-correction
Last synced: 01 Aug 2024
https://github.com/openvinotoolkit/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
coco computer-vision dataset datasets deep-learning format-converter imagenet neural-networks openvino-toolkit pascal-voc yolo
Last synced: 02 Aug 2024
https://github.com/BaseModelAI/cleora
Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
ai cleora-embeddings datasets deepwalk embeddings entity graphs hypergraphs inductive-entity-embeddings machine-learning ml pytorch-biggraph synerise
Last synced: 02 Aug 2024
https://github.com/juliadata/dataframesmeta.jl
Metaprogramming tools for DataFrames
data data-frame dataframes dataframesmeta datasets hacktoberfest julia tabular-data
Last synced: 29 Sep 2024
https://github.com/JuliaData/DataFramesMeta.jl
Metaprogramming tools for DataFrames
data data-frame dataframes dataframesmeta datasets hacktoberfest julia tabular-data
Last synced: 31 Jul 2024
https://github.com/EagleW/PaperRobot
Code for PaperRobot: Incremental Draft Generation of Scientific Ideas
attention-mechanism datasets end-to-end-learning generation memory-networks natural-language-generation nlp paper-generation pytorch text-generation
Last synced: 04 Aug 2024
https://github.com/CLUEbenchmark/pCLUE
pCLUE: a 1,000,000+ multi-task prompt learning dataset
chinese clue datasets multi-task-learning prompt-learning promptclue zero-shot-learning
Last synced: 31 Jul 2024
https://github.com/opencsgs/csghub-server
csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and run Model Inference, Finetune, and Application Spaces.
ai datasets golang huggingface llm models platform
Last synced: 28 Sep 2024
https://github.com/scale3-labs/langtrace
Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector DBs, and more. Integrate using TypeScript or Python. 🚀💻📊
ai datasets evaluations gpt langchain llm llm-framework llmops observability open-source open-telemetry openai prompt-engineering tracing
Last synced: 26 Sep 2024
https://github.com/chaoswork/sft_datasets
A collection of open-source SFT datasets, updated on an ongoing basis
chinese-dataset datasets large-language-models llms supervised-finetuning
Last synced: 02 Aug 2024
https://github.com/Yuan-ManX/ai-audio-datasets
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
aigc artificial-intelligence audio audio-effect audio-generation datasets deep-learning machine-learning music-generation
Last synced: 31 Jul 2024
https://github.com/CelebV-HQ/CelebV-HQ
[ECCV 2022] CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
datasets gans generative-model
Last synced: 02 Aug 2024
https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience
Tutorials, datasets, and other material associated with textbook "A First Course in Network Science" by Menczer, Fortunato & Davis
datasets indiana-university network-science networkx python social-network textbook tutorials
Last synced: 26 Sep 2024
https://github.com/MOLAorg/mola
A Modular Optimization framework for Localization and mApping (MOLA)
computer-vision cxx cxx17 datasets graph-slam lidar lidar-point-cloud localization mobile-robots slam toolkit visual-slam
Last synced: 02 Aug 2024
https://github.com/Koziev/NLP_Datasets
My NLP datasets for Russian language
Last synced: 02 Aug 2024
https://cambridgeuniversitypress.github.io/FirstCourseNetworkScience/
Tutorials, datasets, and other material associated with textbook "A First Course in Network Science" by Menczer, Fortunato & Davis
datasets indiana-university network-science networkx python social-network textbook tutorials
Last synced: 02 Aug 2024
https://github.com/jumpingrivers/datasauRus
R Package 📦 Containing the Datasaurus Dozen datasets 📊
anscombesquartet datasaurus datasaurus-dozen datasets r r-package rstats summary-statistics
Last synced: 30 Jul 2024
https://github.com/weecology/retriever
Quickly download, clean up, and install public datasets into a database management system
data data-retrieval data-science dataset datasets hacktobefest python
Last synced: 01 Aug 2024
https://github.com/waico/SKAB
SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.
algorithms-evaluation anomaly-detection benchmark changepoint-detection collective-anomalies dataset datasets leaderboard outlier-detection skab skoltech
Last synced: 01 Aug 2024
https://github.com/waico/SkAB
SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.
algorithms-evaluation anomaly-detection benchmark changepoint-detection collective-anomalies dataset datasets leaderboard outlier-detection skab skoltech
Last synced: 30 Jul 2024
https://github.com/fjxmlzn/DoppelGANger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
dataset-generation datasets doppelganger fidelity gan gans generative-adversarial-network privacy synthetic-data synthetic-data-generation synthetic-data-generator synthetic-dataset-generation time-series timeseries
Last synced: 04 Aug 2024
https://github.com/merantix-momentum/squirrel-core
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed jax machine-learning ml natural-language-processing nlp python pytorch tensorflow
Last synced: 01 Aug 2024
https://github.com/visipedia/annotation_tools
Visipedia Annotation Tools
annotations computer-vision datasets
Last synced: 01 Aug 2024
https://github.com/amirshnll/Persian-Swear-Words
Persian Swear Dataset, which you can use in production to filter unwanted content. A dataset of inappropriate and offensive Persian words for filtering text.
dataset datasets farsi farsiswear farsiswearword nlp nlp-dataset persian persiandataset persianswearword swear sweardataset swearword
Last synced: 04 Aug 2024
https://github.com/SuperKogito/SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
audio audio-datasets datasets emotions emotions-recognition multimodal-emotion-recognition speech speech-emotion-recognition
Last synced: 02 Aug 2024
https://github.com/vega/vega-datasets
Common repository for example datasets used by Vega-related projects
Last synced: 12 Aug 2024
https://github.com/Farama-Foundation/Minari
A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
datasets gymnasium offline-rl reinforcement-learning
Last synced: 03 Aug 2024
https://github.com/intel/dffml
The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.
ai-inference ai-machine-learning ai-training analytics asyncio dag data-flow dataflows datasets dffml event-based flow-based-programming frameworks hyperautomation libraries machine-learning models pipelines python swrepo
Last synced: 03 Aug 2024
https://github.com/zjunlp/mol-instructions
[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
ai-for-science biomedical dataset datasets generation iclr2024 instruction instruction-following instructions large-language-models llama mol-instruction molecule natural-language-processing protein resource science
Last synced: 26 Sep 2024
https://github.com/AgaMiko/bird-recognition-review
A list of useful resources for bird sound (songs and calls) recognition, such as datasets, papers, links to open source projects, and competitions
bird-detection bird-recognition bird-songs bird-species classification convolutional-neural-network datasets review survey
Last synced: 31 Jul 2024
https://github.com/arthchan2003/AIDL_KB
A Knowledge Base for the FB Group Artificial Intelligence and Deep Learning (AIDL)
aidl artificial-intelligence artificial-neural-networks computer-vision datasets deep-learning machine-learning natural-language-processing scientific-papers
Last synced: 02 Aug 2024
https://github.com/ylogx/aesthetics
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
aesthetic aesthetics ava dataset dataset-creation dataset-generation datasets fisher-vectors image image-aesthetic-visual-analysis image-processing live
Last synced: 02 Aug 2024
https://github.com/OlafenwaMoses/IdenProf
The IdenProf dataset is a collection of images of identifiable professionals. It has been collected to enable the development of AI systems that can identify people and the nature of their job simply by looking at an image, just as humans can.
ai-systems computer-vision datasets humans idenprof-dataset identifying-people image-classification image-recognition images machine-intelligence machine-learning machine-vision people professionals
Last synced: 03 Aug 2024
https://github.com/obss/jury
Comprehensive NLP Evaluation System
datasets evaluate evaluation huggingface machine-learning metrics natural-language-processing nlp nlp-evaluation python pytorch transformers
Last synced: 02 Aug 2024
https://github.com/The-Osint-Toolbox/Data-Acquisition-OSINT
You can find links to data acquisition websites.
breach breach-check breach-compilation breached breaches combolist data datasets dehash dmp hash hashing leaks password pastebin pastes public stealer-logs stealers username
Last synced: 31 Jul 2024