Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/arian-askari/ChatGPT-RetrievalQA

A dataset for training/evaluating Question Answering Retrieval models on ChatGPT responses, with the possibility of training/evaluating on real human responses.

ai chatgpt chatgpt-information-retrieval chatgpt-ir data-augmentation dataset deep-learning gpt-3 gpt2 gpt3 information-retrieval information-retrieval-chatgpt ir ir-chatgpt machine-learning nlp openai python sequence-to-sequence text-retrieval

Last synced: 03 Jul 2024

https://github.com/philipperemy/name-dataset

The Python library for names.

dataset name named-entity-recognition python

Last synced: 02 Jul 2024

https://github.com/antvis/data-set

State-driven, all-in-one data processing for data visualization.

antvis data-visualization dataset g2 state-driven statistics visualization

Last synced: 02 Jul 2024

https://github.com/jayleicn/animeGAN

A simple PyTorch Implementation of Generative Adversarial Networks, focusing on anime face drawing.

dataset generative-adversarial-network pytorch

Last synced: 02 Jul 2024

https://github.com/malllabiisc/cesi

WWW 2018: CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

canonicalization cesi dataset embeddings knowledge-graph knowledge-graph-embeddings www

Last synced: 02 Jul 2024

https://github.com/taivop/joke-dataset

A dataset of 200k English plaintext jokes.

dataset humor jokes

Last synced: 02 Jul 2024

https://github.com/wainshine/Chinese-Names-Corpus

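Chinese names corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Can be used for Chinese word segmentation and person-name entity recognition.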

corpus dataset dict names ner

Last synced: 02 Jul 2024

https://github.com/candlewill/Dialog_Corpus

Corpora for training Chinese and English dialogue systems. Datasets for training chatbot systems.

chatbot corpus dataset dialog system

Last synced: 02 Jul 2024

https://github.com/xiaoaoran/SynLiDAR

SynLiDAR: Synthetic LiDAR sequential point cloud dataset with point-wise annotations (AAAI2022)

dataset domain-adaptation point-cloud semantic-segmentation

Last synced: 01 Jul 2024

https://github.com/HuangCongQing/3D-Point-Clouds

🔥3D point cloud object detection & semantic segmentation (deep learning): SOTA methods, code, papers, datasets, etc.

3d-detection 3d-point-cloud 3d-point-clouds 3d-semantic-segmentation cpp dataset deep-learning pcl point-cloud python3 ros ros-melodic sota

Last synced: 01 Jul 2024

https://github.com/unmannedlab/RELLIS-3D

RELLIS-3D: A Multi-modal Dataset for Off-Road Robotics

3d-segmentation dataset image-segmentation lidar off-road ros-bag semantic-segmentation

Last synced: 01 Jul 2024

https://github.com/QingyongHu/SensatUrban

🔥Urban-scale point cloud dataset (CVPR 2021 & IJCV 2022)

benchmark city-modeling dataset large-scale photogrammetry pointcloud urban-scale

Last synced: 01 Jul 2024

https://github.com/OpenDriveLab/OpenLane

[ECCV 2022 Oral] OpenLane: Large-scale Realistic 3D Lane Dataset

autonomous-driving dataset deep-learning lane-detection

Last synced: 01 Jul 2024

https://github.com/shaohua0116/MultiDigitMNIST

Combine multiple MNIST digits to create datasets with 100/1000 classes for few-shot learning/meta-learning

dataset few-shot-learning image-classification meta-learning mnist mnist-classification

Last synced: 01 Jul 2024

https://github.com/yaoyao-liu/tiered-imagenet-tools

Tools for generating tieredImageNet dataset and processing batches

dataset few-shot few-shot-learning meta-learning tiered-imagenet

Last synced: 01 Jul 2024

https://github.com/HusseinYoussef/Arabic-OCR

OCR system for the Arabic language that converts images of typed text to machine-encoded text.

arabic character-segmentation computer-vision dataset image-processing machine-learning neural-network ocr opencv-python scikit-learn segmentation

Last synced: 30 Jun 2024

https://github.com/bigdata-ustc/EduData

EduData: Datasets in education and a convenient interface for downloading and preprocessing education datasets

assistment-dataset assistments cognitive-diagnosis dataset datasets-education ednet education junyi kddcup kddcup2010 knowledge-tracing luna math23k nips2020 oli pisa-data psychometrics

Last synced: 30 Jun 2024

https://github.com/visipedia/inat_comp

iNaturalist competition details

competition computer-vision dataset inaturalist

Last synced: 30 Jun 2024

https://github.com/neelabalan/mongodb-sample-dataset

Sample dataset used in MongoDB Atlas clusters, for local testing purposes

dataset docker mongodb

Last synced: 29 Jun 2024

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 29 Jun 2024

https://github.com/ESA-PhiLab/Major-TOM

Expandable Datasets for Earth Observation

dataset earth-observation multi-spectral remote-sensing sentinel-1 sentinel-2

Last synced: 28 Jun 2024

https://github.com/ART-Group-it/GASP

GASP! Dataset - Generating Abstracts of Scientific Papers from Abstracts of Cited Papers

corpus dataset machine-learning natural-language-processing nlp

Last synced: 28 Jun 2024

https://github.com/wenet-e2e/opencpop

Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis

aimusic dataset opencpop singing svs

Last synced: 28 Jun 2024

https://github.com/ppasupat/WikiTableQuestions

A dataset of complex questions on semi-structured Wikipedia tables

compositional-semantics dataset question-answering semantic-parsing

Last synced: 28 Jun 2024

https://github.com/CornellNLP/ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.

computational-social-science conversational-ai conversational-analysis conversations dataset dialogs machine-learning nlp toolkit

Last synced: 28 Jun 2024

https://github.com/bigscience-workshop/data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

dataset large-language-models multilingual

Last synced: 28 Jun 2024

https://github.com/Belval/TextRecognitionDataGenerator

A synthetic data generator for text recognition

data dataset fake ocr synthetic text text-recognition training-set-generator

Last synced: 27 Jun 2024

https://github.com/mohamad-dehghani/crawler

Crawls information on Persian song lyrics

dataset farsi persian scrapy scrapy-crawler

Last synced: 27 Jun 2024

https://github.com/letsencrypt/dns-lots-of-lookups

dnslol is a command line tool for performing lots of DNS lookups.

dataset dig dns

Last synced: 27 Jun 2024

https://github.com/yaoyao-liu/mini-imagenet-tools

Tools for generating mini-ImageNet dataset and processing batches

dataset few-shot few-shot-learning imagenet meta-learning mini-imagenet miniimagenet one-shot-learning

Last synced: 26 Jun 2024

https://github.com/elenanereiss/Legal-Entity-Recognition

A Dataset of German Legal Documents for Named Entity Recognition

blstm crf dataset german legal-texts ner

Last synced: 26 Jun 2024

https://github.com/m1-llie/TUMCC

[IP&M 2022] Corpus for identifying Chinese dark jargon in Telegram underground markets. Telegram Underground Market Chinese Corpus. Paper: Identification of Chinese Dark Jargons in Telegram Underground Markets Using Context-Oriented and Linguistic Features (IP&M, 2022).

chinese corpus dataset telegram

Last synced: 26 Jun 2024

https://github.com/tb0hdan/domains

World’s single largest Internet domains dataset

colly dataset internet-domains scrapy search-engines yacy

Last synced: 26 Jun 2024

https://mvc-datasets.github.io/MVC/

MULTI-VIEW CLOTHING DATASET

clothing dataset multiview

Last synced: 26 Jun 2024

https://github.com/M-3LAB/awesome-industrial-anomaly-detection

Paper list and datasets for industrial image anomaly/defect detection (continuously updated).

anomaly-detection anomaly-segmentation computer-vision dataset deep-learning defect-detection industrial-image

Last synced: 25 Jun 2024

https://github.com/ZHOUYI1023/awesome-radar-perception

A curated list of radar datasets, detection, tracking and fusion

autonomous-driving autonomous-vehicles dataset deep-learning detection fusion radar slam

Last synced: 25 Jun 2024

https://github.com/ai4ce/MARS

[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

3dgs collaborative-perception coperception cvpr2024 dataset multiagent multimodal-deep-learning nerf self-driving

Last synced: 25 Jun 2024

https://github.com/salcoast/deleted-tweets-archive

These tweets display several bad actors' most divisive uses of the Twitter platform.

dataset deleted-tweets twitter twitter-api

Last synced: 24 Jun 2024

https://github.com/Continvvm/continuum

A clean and simple data loading library for Continual Learning

continual-learning dataloader dataset incremental-learning lifelong-learning online-learning pytorch

Last synced: 24 Jun 2024

https://github.com/vahidk/tfrecord

TFRecord reader for PyTorch

dataset loader pytorch tensorflow tfrecord

Last synced: 24 Jun 2024

https://github.com/layumi/University1652-Baseline

ACM Multimedia 2020. University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization. It annotates 1652 buildings in 72 universities around the world.

awesome-list cross-view cvact cvusa dataset drone gem-pooling geo-localization image-retrieval multi-source-benchmark place-recognition pytorch remote-sensing satellite uav

Last synced: 24 Jun 2024

https://github.com/eddex/signals-dataset

A dataset with 5077 images of numbered signals and a script to create a train-test-split

annotations dataset hslu machine-learning train-test-split

Last synced: 24 Jun 2024

https://github.com/yhlleo/RoadNet

RoadNet: A Multi-task Benchmark Dataset for Road Detection, TGRS.

centerline-detection dataset edge-detection image-segmentation multi-task-learning road-detection

Last synced: 24 Jun 2024

https://github.com/ArashAmani/Kurdish-Dialect-Recognition

We extract the x-vectors and i-vectors of five Kurdish dialects and use these vectors to recognize Kurdish dialects.

dataset i-vector kurdish kurdish-dataset kurdish-dialects kurdish-language-processing recognition x-vector

Last synced: 23 Jun 2024

https://github.com/CAPTAIN-WHU/iSAID_Devkit

[CVPR'W19-Oral] Official repository for "iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images"

aerial-imagery cvpr19 dataset devkit dota evaluation-code instance-segmentation object-detection pytorch pytorch-implementation

Last synced: 23 Jun 2024

https://github.com/myungsub/CAIN

Source code for AAAI 2020 paper "Channel Attention Is All You Need for Video Frame Interpolation"

aaai2020 channel-attention dataset deep-convolutional-networks frame-interpolation pytorch video-frame-interpolation

Last synced: 23 Jun 2024

https://github.com/HillZhang1999/MuCGEC

Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"

dataset gec generation grammatical-error-correction naacl

Last synced: 23 Jun 2024

https://github.com/salesforce/WikiSQL

A large annotated semantic parsing corpus for developing natural language interfaces.

database dataset machine-learning natural-language natural-language-interface natural-language-processing

Last synced: 23 Jun 2024

https://github.com/zjunlp/KnowPrompt

Code and datasets for the WWW2022 paper "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction."

dataset dialogre few-shot-learning knowledge-informed-prompt-learning knowprompt prompt prompt-learning prompt-tuning pytorch re relation-extraction semeval tacred text-classification

Last synced: 23 Jun 2024

https://github.com/PhilipMay/stsb-multi-mt

Machine translated multilingual STS benchmark dataset.

dataset multilingual nlp

Last synced: 23 Jun 2024

https://github.com/victorsungo/MMDialog

The official site of the paper "MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation"

chat dataset

Last synced: 23 Jun 2024

https://github.com/krystalan/SGSum

CCKS'2021: "SGSum: A Manually Annotated Dataset for Sports Event Summarization"

dataset paper

Last synced: 23 Jun 2024

https://github.com/radi-cho/botbots

A dataset featuring diverse dialogues between two ChatGPT (gpt-3.5-turbo) instances with system messages written by GPT-4. Covering various contexts and tasks (task-oriented dialogue systems, abstract reasoning, brainstorming).

chatgpt dataset gpt-4

Last synced: 22 Jun 2024

https://github.com/Charmve/Surface-Defect-Detection

📈 Currently the largest collection of industrial defect detection datasets and papers. Continuously summarizes open source datasets and important papers in the field of surface defect research.

charmve dataset deep-learning defects image-segmentation paper pcb-surface-defect surface surface-defect-detection surface-defects surface-detection

Last synced: 22 Jun 2024

https://github.com/GanjinZero/awesome_Chinese_medical_NLP

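A collection of public resources for Chinese medical NLP: terminologies/corpora/word embeddings/pre-trained models/knowledge graphs/named entity recognition/QA/information extraction/models/papers/etc.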

dataset dictionary knowledge-graph medical model nlp resource

Last synced: 22 Jun 2024

https://github.com/cv-small-snails/Awesome-Table-Recognition

A curated list of resources dedicated to table recognition

dataset ocr ocr-recognition papers papers-with-code table-recognition

Last synced: 22 Jun 2024

https://github.com/citp/privacy-policy-historical

Historical website privacy policies spanning over two decades.

dataset privacy-policy

Last synced: 22 Jun 2024

https://github.com/zhouhaoyi/ETDataset

The Electricity Transformer dataset was collected to support further investigation of the long sequence forecasting problem.

dataset electricity-transformer-dataset forecasting long-sequence

Last synced: 22 Jun 2024

https://github.com/sutdcv/SUTD-TrafficQA

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

annotations cvpr cvpr2021 dataset multimodal multimodal-deep-learning paper traffic-events video-qa video-reasoning vqa vqa-dataset

Last synced: 22 Jun 2024

https://github.com/BaseAdresseNationale/adresse.data.gouv.fr

The official website of l'Adresse

adresse dataset open-data

Last synced: 21 Jun 2024

https://github.com/aa8y/docker-dataset

Docker database images with pre-populated data for testing and/or practice.

alpine databases-populated dataset docker-image dummy lightweight postgresql thin

Last synced: 21 Jun 2024

https://github.com/ThinkR-open/prenoms

French Baby Names 1900-2020

dataset r

Last synced: 20 Jun 2024

https://github.com/thammegowda/mtdata

A tool that locates, downloads, and extracts machine translation corpora

dataset machine-translation multilingual natural-language-generation natural-language-processing parallel-data

Last synced: 20 Jun 2024

https://github.com/franciellevargas/HateBR

HateBR is the first large-scale expert-annotated dataset of Brazilian Instagram comments for hate speech and offensive language detection on the web and social media.

brazilian-portuguese dataset hatespeech-detection machine-learning natural-language-processing text-classification

Last synced: 20 Jun 2024

https://github.com/flyyuan/Chinese-Medical-QA-Data

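Chinese disease diagnosis dataset (on the order of a million entries), usable for disease analysis and disease diagnosis for Chinese patients.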

dataset deep-learning

Last synced: 20 Jun 2024

https://bids-standard.github.io/bids-examples/

A set of BIDS-compatible datasets with empty raw data files that can be used for writing lightweight software tests.

bids data-standards dataset neuroimaging standards

Last synced: 19 Jun 2024

https://github.com/researchmm/img2poem

[MM'18] Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

code dataset poem-generator

Last synced: 19 Jun 2024

https://github.com/kushalkafle/DVQA_dataset

DVQA Dataset: A Bar chart question answering dataset presented at CVPR 2018

bar-chart cvpr2018 dataset deep-learning question-answering vqa

Last synced: 19 Jun 2024

https://github.com/lupantech/IconQA

Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".

commensense dataset mathai pytorch reasoning vqa

Last synced: 19 Jun 2024

https://github.com/mvdoc/budapest-fmri-data

Quality assurance analyses of fMRI data collected while participants watched The Grand Budapest Hotel by Wes Anderson.

dataset fmri fmri-dataset grand-budapest-hotel neuroimaging

Last synced: 18 Jun 2024

https://github.com/Psychic-DL/Awesome-Traffic-Agent-Trajectory-Prediction

This is a list of papers related to traffic agent trajectory prediction.

awesome dataset deep-learning papers source-code traffic-agent trajectory-prediction

Last synced: 18 Jun 2024

https://github.com/rodrigoberriel/satellite-crosswalk-classification

Deep Learning Based Large-Scale Automatic Satellite Crosswalk Classification (GRSL, 2017)

dataset deep-learning remote-sensing satellite-imagery

Last synced: 18 Jun 2024

https://github.com/osdg-ai/osdg-data

The OSDG Community Dataset (OSDG-CD) is a public dataset of thousands of text excerpts, validated by OSDG Community Platform (OSDG-CP) citizen scientists with respect to the Sustainable Development Goals (SDGs). The dataset is updated every quarter and published on Zenodo.

citizen-science citsci crowdsourcing dataset digital-public-goods machine-learning open-data public-good public-goods sdg sdg-data sdgs sustainability sustainable-development-goals united-nations

Last synced: 17 Jun 2024

https://github.com/justmarkham/trump-lies

Tutorial: Web scraping in Python with Beautiful Soup

beautiful-soup data-science dataset pandas python requests tutorial web-scraping

Last synced: 17 Jun 2024

https://github.com/takuseno/d4rl-pybullet

Datasets for data-driven deep reinforcement learning with PyBullet environments

data-driven-reinforcement-learning dataset deep-reinforcement-learning

Last synced: 16 Jun 2024

https://github.com/takuseno/d4rl-atari

Datasets for data-driven deep reinforcement learning with Atari (wrapper for datasets released by Google)

data-driven-reinforcement-learning dataset deep-reinforcement-learning

Last synced: 16 Jun 2024

https://github.com/studiomoniker/Quickdraw-appendix

Dataset of 25k penises: an appendix to the Quick, Draw! Dataset

censorship dataset machine-learning penis quickdraw quickdraw-dataset

Last synced: 16 Jun 2024