Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with datasets

A curated list of projects in awesome lists tagged with datasets.

https://github.com/huggingface/datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

computer-vision datasets deep-learning hacktoberfest machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

Last synced: 16 Dec 2024

https://github.com/tonybeltramelli/pix2code

pix2code: Generating Code from a Graphical User Interface Screenshot

datasets deep-learning deep-neural-networks front-end-development graphical-user-interface

Last synced: 16 Dec 2024

https://github.com/akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 16 Dec 2024

https://github.com/simonw/datasette

An open source multi-tool for exploring and publishing data

asgi automatic-api csv datasets datasette datasette-io docker json python sql sqlite

Last synced: 16 Dec 2024

https://github.com/activeloopai/deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops multi-modal python pytorch tensorflow vector-database vector-search

Last synced: 21 Dec 2024

https://github.com/activeloopai/Hub

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops multi-modal python pytorch tensorflow vector-database vector-search

Last synced: 08 Dec 2024

https://github.com/imanneo/fl_chart

FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.

barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot

Last synced: 16 Dec 2024

https://github.com/imaNNeo/fl_chart

FL Chart is a highly customizable Flutter chart library that supports Line Chart, Bar Chart, Pie Chart, Scatter Chart, and Radar Chart.

barchart chart charts datasets fl-chart flutter flutter-widget graph hacktoberfest linechart piechart radar-chart radar-graphs scatter-chart scatter-plot

Last synced: 30 Oct 2024

https://github.com/liuruoze/easypr

(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design

artificial-intelligence artificial-neural-networks chinese-characters computer-vision datasets machine-learning opencv opencv3 plate-recognition supervised-learning support-vector-machines unconstrained-situation

Last synced: 17 Dec 2024

https://github.com/liuruoze/EasyPR

(CGCSTCD'2017) An easy, flexible, and accurate plate recognition project for Chinese licenses in unconstrained situations. CGCSTCD = China Graduate Contest on Smart-city Technology and Creative Design

artificial-intelligence artificial-neural-networks chinese-characters computer-vision datasets machine-learning opencv opencv3 plate-recognition supervised-learning support-vector-machines unconstrained-situation

Last synced: 26 Oct 2024

https://github.com/tensorflow/datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

data dataset datasets jax machine-learning numpy tensorflow

Last synced: 17 Dec 2024

https://github.com/roapi/roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

analytics arrow blob-storage cloud-native columnar datafusion datasets delta-lake graphql in-memory-database parquet query query-frontends rest-api rust s3 sql static-datasets

Last synced: 30 Oct 2024

https://github.com/opencsgs/csghub

CSGHub is an open-source large model platform, similar to an on-premise version of Hugging Face. You can easily manage models and datasets, deploy model applications, and set up model fine-tuning or inference jobs through a user interface. CSGHub also provides a Python SDK that is fully compatible with the hf SDK. Join us to build a safer and more open platform ⭐️

ai datasets huggingface llm management-system models platform

Last synced: 07 Nov 2024

https://github.com/microsoft/torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

computer-vision datasets deep-learning earth-observation geospatial models pytorch remote-sensing satellite-imagery torchvision transforms

Last synced: 17 Dec 2024

https://github.com/justinzm/gopup

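Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; "Qianlima" and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…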

covid19-data data data-analysis data-science datasets economic-data gopup index-data python

Last synced: 19 Dec 2024

https://github.com/freedomintelligence/medical_nlp

Medical NLP Competition, dataset, large models, paper

collection datasets list medical models nlp

Last synced: 03 Dec 2024

https://github.com/snap-stanford/ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

datasets deep-learning graph-machine-learning graph-neural-networks

Last synced: 18 Dec 2024

https://github.com/diffgram/diffgram

The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

annotation annotation-tool annotations data data-analytics data-annotation data-science datasets datastore deep-learning image-annotation kubernetes labeling machine-learning training-data video-annotation

Last synced: 25 Oct 2024

https://github.com/logpai/loghub

A large collection of system log datasets for AI-driven log analytics [ISSRE'23]

anomaly-detection datasets log-analysis log-intelligence log-parsing logs unstructured-logs

Last synced: 04 Dec 2024

https://github.com/chineseglue/chineseglue

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard

albert bert chinese-corpus datasets glue language-understanding nlp pre-trained-model

Last synced: 21 Dec 2024

https://github.com/ChineseGLUE/ChineseGLUE

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard

albert bert chinese-corpus datasets glue language-understanding nlp pre-trained-model

Last synced: 06 Nov 2024

https://github.com/juand-r/entity-recognition-datasets

A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.

annotations corpora datasets entity-extraction entity-recognition named-entity-recognition natural-language-processing ner nlp nlp-resources

Last synced: 19 Dec 2024

https://github.com/eosphoros-ai/db-gpt-hub

A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL

database datasets fine-tuning gpt hacktoberfest llm nl2sql sql text-to-sql text2sql

Last synced: 19 Dec 2024

https://github.com/luqmaan/awesome-transit

Community list of transit APIs, apps, datasets, research, and software 🚌🌟🚋🌟🚂

awesome awesome-list bus datasets gtfs gtfs-analysis gtfs-converters gtfs-feed gtfs-files gtfs-libraries gtfs-realtime gtfs-utils gtfs-validator list realtime-data tools transit transit-agencies transit-data transit-map

Last synced: 04 Nov 2024

https://github.com/eosphoros-ai/DB-GPT-Hub

A repository that contains models, datasets, and fine-tuning techniques for DB-GPT, with the purpose of enhancing model performance in Text-to-SQL

database datasets fine-tuning gpt llm nl2sql sql text-to-sql text2sql

Last synced: 24 Oct 2024

https://github.com/explosion/projects

🪐 End-to-end NLP workflows from prototype to production

annotations datasets natural-language-processing nlp prodigy spacy

Last synced: 19 Dec 2024

https://github.com/PolyAI-LDN/conversational-datasets

Large datasets for conversational AI

conversational-ai datasets machine-learning

Last synced: 11 Nov 2024

https://github.com/RUC-NLPIR/FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research

benchmark datasets large-language-models retrieval-augmented-generation

Last synced: 11 Sep 2024

https://github.com/caserec/Datasets-for-Recommender-Systems

A repository of high-quality, topic-centric public data sources for Recommender Systems (RS)

data-science database datasets public-data recommender-systems

Last synced: 28 Nov 2024

https://github.com/dmitryryumin/iccv-2023-papers

ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!

3d-graphics 3d-reconstruction biometrics computer-vision datasets deep-learning explainable-ai face-recognition gesture-recognition iccv iccv2023 image-processing image-synthesis multimodal-learning pattern-recognition photogrammetry pose-estimation robotics transfer-learning video-synthesis

Last synced: 20 Dec 2024

https://github.com/iamaziz/PyDataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 27 Nov 2024

https://github.com/iamaziz/pydataset

Instant access to many datasets in Python.

data-science datasets python

Last synced: 21 Dec 2024

https://github.com/CLUEbenchmark/CLUECorpus2020

Large-scale Pre-training Corpus for Chinese 100G (Chinese pre-training corpus)

albert bert chinese chinese-corpus corpus datasets nlp pretrain roberta

Last synced: 16 Nov 2024

https://github.com/JizhiziLi/GFM

[IJCV 2022] Bridging Composite and Real: Towards End-to-end Deep Image Matting

animal-matting composition datasets image-matting matting segmentation

Last synced: 26 Oct 2024

https://github.com/cluebenchmark/cluecorpus2020

Large-scale Pre-training Corpus for Chinese 100G (Chinese pre-training corpus)

albert bert chinese chinese-corpus corpus datasets nlp pretrain roberta

Last synced: 09 Nov 2024

https://github.com/ipeagit/geobr

Easy access to official spatial data sets of Brazil in R and Python

brazil datasets geopackage geopandas python r rstats sf shapefile spatial-data

Last synced: 18 Dec 2024

https://github.com/ipeaGIT/geobr

Easy access to official spatial data sets of Brazil in R and Python

brazil datasets geopackage geopandas python r rstats sf shapefile spatial-data

Last synced: 25 Oct 2024

https://github.com/huggingface/dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.

api-rest data datasets huggingface machine-learning nlp

Last synced: 29 Nov 2024

https://github.com/saltudelft/ml4se

A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering

ai4code ai4se code datasets deep-learning llm4code machine-learning ml4code ml4se papers research software-engineering theses tools tudelft

Last synced: 05 Nov 2024

https://github.com/opencsgs/csghub-server

csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and run Model Inference, Finetune, and Application Spaces.

ai datasets golang huggingface llm models platform

Last synced: 21 Dec 2024

https://github.com/scale3-labs/langtrace

Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector DBs, and more. Integrate using TypeScript or Python. 🚀💻📊

ai datasets evaluations gpt langchain llm llm-framework llmops observability open-source open-telemetry openai prompt-engineering tracing

Last synced: 20 Dec 2024

https://github.com/st-tech/zr-obp

Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation

contextual-bandits datasets multi-armed-bandits off-policy-evaluation research

Last synced: 11 Nov 2024

https://github.com/jozu-ai/kitops

An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.

ai code datasets devops devops-tools gguf hacktoberfest kubernetes kubernetes-deployment ml mlops mlops-tools model-interpretability model-serving models opensource platform-engineering pytorch sklearn tensorflow

Last synced: 21 Dec 2024

https://github.com/satellite-image-deep-learning/datasets

Datasets for deep learning with satellite & aerial imagery

datasets earth-observation remote-sensing satellite-data satellite-imagery sentinel

Last synced: 06 Nov 2024

https://github.com/Synerise/cleora

Cleora AI is a general-purpose open-source model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data. Created by Synerise.com team.

ai cleora-embeddings datasets deepwalk embeddings entity graphs hypergraphs inductive-entity-embeddings machine-learning ml pytorch-biggraph synerise

Last synced: 14 Dec 2024

https://github.com/BaseModelAI/cleora

Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.

ai cleora-embeddings datasets deepwalk embeddings entity graphs hypergraphs inductive-entity-embeddings machine-learning ml pytorch-biggraph synerise

Last synced: 13 Nov 2024

https://github.com/openvinotoolkit/datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.

coco computer-vision dataset datasets deep-learning format-converter imagenet neural-networks openvino-toolkit pascal-voc yolo

Last synced: 13 Nov 2024

https://github.com/Yuan-ManX/ai-audio-datasets

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

aigc artificial-intelligence audio audio-effect audio-generation datasets deep-learning machine-learning music-generation

Last synced: 27 Oct 2024

https://github.com/CelebV-HQ/CelebV-HQ

[ECCV 2022] CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

datasets gans generative-model

Last synced: 10 Nov 2024

https://github.com/dmitryryumin/cvpr-2023-24-papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 cvpr2024 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition scene-analysis segmentation self-supervised-learning shape-analysis video-synthesis

Last synced: 15 Dec 2024

https://cambridgeuniversitypress.github.io/FirstCourseNetworkScience/

Tutorials, datasets, and other material associated with the textbook "A First Course in Network Science" by Menczer, Fortunato & Davis

datasets indiana-university network-science networkx python social-network textbook tutorials

Last synced: 11 Nov 2024

https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience

Tutorials, datasets, and other material associated with the textbook "A First Course in Network Science" by Menczer, Fortunato & Davis

datasets indiana-university network-science networkx python social-network textbook tutorials

Last synced: 26 Sep 2024

https://github.com/MOLAorg/mola

A Modular Optimization framework for Localization and mApping (MOLA)

computer-vision cxx cxx17 datasets graph-slam lidar lidar-point-cloud localization mobile-robots slam toolkit visual-slam

Last synced: 13 Nov 2024

https://github.com/Koziev/NLP_Datasets

My NLP datasets for Russian language

datasets nlp nlp-resources

Last synced: 13 Nov 2024

https://github.com/cleardusk/meglass

An eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.

3dface dataset datasets face-recognition

Last synced: 16 Dec 2024

https://github.com/chakki-works/chakin

Simple downloader for pre-trained word vectors

datasets machine-learning natural-language-processing word-embeddings word-vectors

Last synced: 21 Dec 2024

https://github.com/jovianhq/opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

data-science datasets machine-learning python

Last synced: 21 Dec 2024

https://github.com/langwatch/langwatch

🤖 Build AI applications with confidence ✅ DSPy Visualizer ✅ Understand how your users are using your LLM-app ✅ Get a full picture of the quality performance of your LLM-app ✅ Collaborate with your stakeholders in ONE platform ✅ Iterate towards the most valuable & reliable LLM-app.

ai analytics datasets evaluation gpt llm observability openai prompt-engineering

Last synced: 04 Dec 2024

https://github.com/src-d/datasets

source{d} datasets ("big code") for source code analysis and machine learning on source code

dataset datasets git github machine-learning mlosc

Last synced: 15 Dec 2024

https://github.com/arjunmann73/Data-Analytics-Projects

🔎 Data analysis with real-world data sets using Python 🔍

classification datasets machine-learning python regression

Last synced: 07 Nov 2024

https://github.com/jumpingrivers/datasauRus

R Package 📦 Containing the Datasaurus Dozen datasets 📊

anscombesquartet datasaurus datasaurus-dozen datasets r r-package rstats summary-statistics

Last synced: 25 Oct 2024

https://github.com/jumpingrivers/datasaurus

R Package 📦 Containing the Datasaurus Dozen datasets 📊

anscombesquartet datasaurus datasaurus-dozen datasets r r-package rstats summary-statistics

Last synced: 16 Dec 2024

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 04 Nov 2024

https://github.com/waico/SKAB

SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.

algorithms-evaluation anomaly-detection benchmark changepoint-detection collective-anomalies dataset datasets leaderboard outlier-detection skab skoltech

Last synced: 05 Nov 2024

https://github.com/waico/SkAB

SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.

algorithms-evaluation anomaly-detection benchmark changepoint-detection collective-anomalies dataset datasets leaderboard outlier-detection skab skoltech

Last synced: 26 Oct 2024