Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with datasets

A curated list of projects in awesome lists tagged with datasets.

https://github.com/caesar0301/awesome-public-datasets

A topic-centric list of HQ open datasets.

aaron-swartz awesome-public-datasets datasets opendata

Last synced: 31 Jul 2024

https://github.com/huggingface/datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

computer-vision datasets deep-learning hacktoberfest machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

Last synced: 29 Sep 2024

https://github.com/tonybeltramelli/pix2code

pix2code: Generating Code from a Graphical User Interface Screenshot

datasets deep-learning deep-neural-networks front-end-development graphical-user-interface

Last synced: 30 Sep 2024

https://github.com/simonw/datasette

An open source multi-tool for exploring and publishing data

asgi automatic-api csv datasets datasette datasette-io docker json python sql sqlite

Last synced: 29 Sep 2024

https://github.com/jindaxiang/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 03 Aug 2024

https://github.com/akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 29 Sep 2024

https://github.com/activeloopai/Hub

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search

Last synced: 10 Aug 2024

https://github.com/activeloopai/deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search

Last synced: 01 Oct 2024

https://github.com/imanneo/fl_chart

FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.

barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot

Last synced: 02 Oct 2024

https://github.com/imaNNeo/fl_chart

FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.

barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot

Last synced: 31 Jul 2024

https://github.com/imaNNeoFighT/fl_chart

FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.

barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot

Last synced: 30 Jul 2024

https://github.com/liuruoze/easypr

(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design

artificial-intelligence artificial-neural-networks chinese-characters computer-vision datasets machine-learning opencv opencv3 plate-recognition supervised-learning support-vector-machines unconstrained-situation

Last synced: 30 Sep 2024

https://github.com/liuruoze/EasyPR

(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design

artificial-intelligence artificial-neural-networks chinese-characters computer-vision datasets machine-learning opencv opencv3 plate-recognition supervised-learning support-vector-machines unconstrained-situation

Last synced: 30 Jul 2024

https://github.com/tensorflow/datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

data dataset datasets jax machine-learning numpy tensorflow

Last synced: 29 Sep 2024

https://github.com/roapi/roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets

Last synced: 31 Jul 2024

https://github.com/justinzm/gopup

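Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "thousand-li horse" (high-growth) and unicorn companies; Xinwen Lianbo news broadcast transcripts; film and TV box office data; university lists; epidemic data…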

covid19-data data data-analysis data-science datasets economic-data gopup index-data python

Last synced: 30 Sep 2024

https://github.com/microsoft/torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

computer-vision datasets deep-learning earth-observation geospatial models pytorch remote-sensing satellite-imagery torchvision transforms

Last synced: 30 Sep 2024

https://github.com/freedomintelligence/medical_nlp

Medical NLP Competition, dataset, large models, paper

collection datasets list medical models nlp

Last synced: 30 Sep 2024

https://github.com/snap-stanford/ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

datasets deep-learning graph-machine-learning graph-neural-networks

Last synced: 30 Sep 2024

https://github.com/diffgram/diffgram

The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

annotation annotation-tool annotations data data-analytics data-annotation data-science datasets datastore deep-learning image-annotation kubernetes labeling machine-learning training-data video-annotation

Last synced: 30 Jul 2024

https://github.com/chineseglue/chineseglue

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard

albert bert chinese-corpus datasets glue language-understanding nlp pre-trained-model

Last synced: 30 Sep 2024

https://github.com/ChineseGLUE/ChineseGLUE

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard

albert bert chinese-corpus datasets glue language-understanding nlp pre-trained-model

Last synced: 01 Aug 2024

https://github.com/logpai/loghub

A large collection of system log datasets for AI-driven log analytics [ISSRE'23]

anomaly-detection datasets log-analysis log-intelligence log-parsing logs unstructured-logs

Last synced: 30 Sep 2024

https://github.com/juand-r/entity-recognition-datasets

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

annotations corpora datasets entity-extraction entity-recognition named-entity-recognition natural-language-processing ner nlp nlp-resources

Last synced: 30 Sep 2024

https://github.com/explosion/projects

🪐 End-to-end NLP workflows from prototype to production

annotations datasets natural-language-processing nlp prodigy spacy

Last synced: 30 Sep 2024

https://github.com/PolyAI-LDN/conversational-datasets

Large datasets for conversational AI

conversational-ai datasets machine-learning

Last synced: 02 Aug 2024

https://github.com/eosphoros-ai/db-gpt-hub

A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL

database datasets fine-tuning gpt llm nl2sql sql text-to-sql text2sql

Last synced: 02 Aug 2024

https://github.com/RUC-NLPIR/FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research

benchmark datasets large-language-models retrieval-augmented-generation

Last synced: 11 Sep 2024

https://github.com/caserec/Datasets-for-Recommender-Systems

A repository of topic-centric, high-quality public data sources for Recommender Systems (RS)

data-science database datasets public-data recommender-systems

Last synced: 08 Aug 2024

https://github.com/iamaziz/PyDataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 07 Aug 2024

https://github.com/JizhiziLi/GFM

[IJCV 2022] Bridging Composite and Real: Towards End-to-end Deep Image Matting

animal-matting composition datasets image-matting matting segmentation

Last synced: 30 Jul 2024

https://github.com/CLUEbenchmark/CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100 GB Chinese pre-training corpus)

albert bert chinese chinese-corpus corpus datasets nlp pretrain roberta

Last synced: 03 Aug 2024

https://github.com/ipeaGIT/geobr

Easy access to official spatial data sets of Brazil in R and Python

brazil datasets geopackage geopandas python r rstats sf shapefile spatial-data

Last synced: 30 Jul 2024

https://github.com/saltudelft/ml4se

A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering

ai4code ai4se code datasets deep-learning llm4code machine-learning ml4code ml4se papers research software-engineering theses tools tudelft

Last synced: 01 Aug 2024

https://github.com/st-tech/zr-obp

Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation

contextual-bandits datasets multi-armed-bandits off-policy-evaluation research

Last synced: 02 Aug 2024

https://github.com/huggingface/dataset-viewer

Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub

api-rest data datasets huggingface machine-learning nlp

Last synced: 09 Aug 2024

https://github.com/openvinotoolkit/datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.

coco computer-vision dataset datasets deep-learning format-converter imagenet neural-networks openvino-toolkit pascal-voc yolo

Last synced: 02 Aug 2024

https://github.com/BaseModelAI/cleora

Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.

ai cleora-embeddings datasets deepwalk embeddings entity graphs hypergraphs inductive-entity-embeddings machine-learning ml pytorch-biggraph synerise

Last synced: 02 Aug 2024

https://github.com/opencsgs/csghub-server

csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and run Model Inference, Finetune and Application Spaces.

ai datasets golang huggingface llm models platform

Last synced: 28 Sep 2024

https://github.com/scale3-labs/langtrace

Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations and metrics for popular LLMs, LLM frameworks, vectorDBs and more. Integrate using TypeScript or Python. 🚀💻📊

ai datasets evaluations gpt langchain llm llm-framework llmops observability open-source open-telemetry openai prompt-engineering tracing

Last synced: 26 Sep 2024

https://github.com/Yuan-ManX/ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

aigc artificial-intelligence audio audio-effect audio-generation datasets deep-learning machine-learning music-generation

Last synced: 31 Jul 2024

https://github.com/CelebV-HQ/CelebV-HQ

[ECCV 2022] CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

datasets gans generative-model

Last synced: 02 Aug 2024

https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience

Tutorials, datasets, and other material associated with textbook "A First Course in Network Science" by Menczer, Fortunato & Davis

datasets indiana-university network-science networkx python social-network textbook tutorials

Last synced: 26 Sep 2024

https://github.com/MOLAorg/mola

A Modular Optimization framework for Localization and mApping (MOLA)

computer-vision cxx cxx17 datasets graph-slam lidar lidar-point-cloud localization mobile-robots slam toolkit visual-slam

Last synced: 02 Aug 2024

https://github.com/Koziev/NLP_Datasets

My NLP datasets for Russian language

datasets nlp nlp-resources

Last synced: 02 Aug 2024

https://cambridgeuniversitypress.github.io/FirstCourseNetworkScience/

Tutorials, datasets, and other material associated with textbook "A First Course in Network Science" by Menczer, Fortunato & Davis

datasets indiana-university network-science networkx python social-network textbook tutorials

Last synced: 02 Aug 2024

https://github.com/jumpingrivers/datasauRus

R Package 📦 Containing the Datasaurus Dozen datasets 📊

anscombesquartet datasaurus datasaurus-dozen datasets r r-package rstats summary-statistics

Last synced: 30 Jul 2024

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 01 Aug 2024

https://github.com/waico/SKAB

SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.

algorithms-evaluation anomaly-detection benchmark changepoint-detection collective-anomalies dataset datasets leaderboard outlier-detection skab skoltech

Last synced: 01 Aug 2024

https://github.com/waico/SkAB

SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.

algorithms-evaluation anomaly-detection benchmark changepoint-detection collective-anomalies dataset datasets leaderboard outlier-detection skab skoltech

Last synced: 30 Jul 2024

https://github.com/fjxmlzn/DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

dataset-generation datasets doppelganger fidelity gan gans generative-adversarial-network privacy synthetic-data synthetic-data-generation synthetic-data-generator synthetic-dataset-generation time-series timeseries

Last synced: 04 Aug 2024

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed jax machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 01 Aug 2024

https://github.com/visipedia/annotation_tools

Visipedia Annotation Tools

annotations computer-vision datasets

Last synced: 01 Aug 2024

https://github.com/natasha/corus

Links to Russian corpora + Python functions for loading and parsing

corpora datasets nlp python russian

Last synced: 02 Aug 2024

https://github.com/amirshnll/Persian-Swear-Words

Persian Swear Dataset - you can use it in production to filter unwanted content. A dataset of inappropriate and offensive Persian words for filtering text.

dataset datasets farsi farsiswear farsiswearword nlp nlp-dataset persian persiandataset persianswearword swear sweardataset swearword

Last synced: 04 Aug 2024

https://github.com/SuperKogito/SER-datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

audio audio-datasets datasets emotions emotions-recognition multimodal-emotion-recognition speech speech-emotion-recognition

Last synced: 02 Aug 2024

https://github.com/vega/vega-datasets

Common repository for example datasets used by Vega-related projects

datasets

Last synced: 12 Aug 2024

https://github.com/Farama-Foundation/Minari

A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities

datasets gymnasium offline-rl reinforcement-learning

Last synced: 03 Aug 2024

https://github.com/intel/dffml

The easiest way to use Machine Learning. Mix and match underlying ML libraries and data set sources. Generate new datasets or modify existing ones with ease.

ai-inference ai-machine-learning ai-training analytics asyncio dag data-flow dataflows datasets dffml event-based flow-based-programming frameworks hyperautomation libraries machine-learning models pipelines python swrepo

Last synced: 03 Aug 2024

https://github.com/AgaMiko/bird-recognition-review

A list of useful resources for bird sound (song and call) recognition, such as datasets, papers, links to open source projects and competitions

bird-detection bird-recognition bird-songs bird-species classification convolutional-neural-network datasets review survey

Last synced: 31 Jul 2024

https://github.com/ylogx/aesthetics

Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader

aesthetic aesthetics ava dataset dataset-creation dataset-generation datasets fisher-vectors image image-aesthetic-visual-analysis image-processing live

Last synced: 02 Aug 2024

https://github.com/OlafenwaMoses/IdenProf

The IdenProf dataset is a collection of images of identifiable professionals. It has been collected to enable the development of AI systems that can identify people and the nature of their jobs simply by looking at an image, just as humans can.

ai-systems computer-vision datasets humans idenprof-dataset identifying-people image-classification image-recognition images machine-intelligence machine-learning machine-vision people professionals

Last synced: 03 Aug 2024