An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-collection

A curated list of projects in awesome lists tagged with data-collection.

https://github.com/naibowang/easyspider

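A visual no-code/code-free web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.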

batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www

Last synced: 12 May 2025

https://github.com/NaiboWang/EasySpider

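A visual no-code/code-free web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.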

batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www

Last synced: 20 Mar 2025

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 09 Sep 2025

https://github.com/cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.

airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql

Last synced: 16 May 2026

https://github.com/firecrawl/firecrawl-mcp-server

🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

batch-processing claude content-extraction data-collection firecrawl firecrawl-ai javascript-rendering llm-tools mcp mcp-server model-context-protocol search-api web-crawler web-scraping

Last synced: 07 Apr 2026

https://github.com/jitsucom/jitsu

Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake

Last synced: 18 May 2026

https://github.com/plan-player-analytics/Plan

Player Analytics plugin for Minecraft Server platforms - View player activity of your server with ease. 📅

analytics bukkit-plugin bungeecord-plugin data-collection fabric-mod hacktoberfest mysql nukkit-plugin spigot-plugin sponge-plugin sqlite statistics velocity-plugin visualization webserver

Last synced: 14 Mar 2025

https://github.com/plan-player-analytics/plan

Player Analytics plugin for Minecraft Server platforms - View player activity of your server with ease. 📅

analytics bukkit-plugin bungeecord-plugin data-collection fabric-mod hacktoberfest mysql nukkit-plugin spigot-plugin sponge-plugin sqlite statistics velocity-plugin visualization webserver

Last synced: 01 Mar 2026

https://github.com/getodk/collect

ODK Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments around the world. Contribute and make the world a better place! ✨📋✨

android data-collection global-development global-health java mhealth mobile-data-collection odk social-impact xforms

Last synced: 10 Apr 2026

https://github.com/chaoss/augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/

chaoss data-collection data-modeling data-visualization defined-metrics facade git github hacktoberfest hacktoberfest2020 health linux linux-foundation metrics open-source opensource python-library research sustainability unix

Last synced: 21 Jan 2026

https://github.com/pnoker/iot-dc3

IoT DC3 is a 100% open-source, distributed Internet of Things (IoT) platform built on Spring Cloud. It accelerates IoT project development and simplifies IoT device management, offering a comprehensive solution for building robust IoT systems.

data-collection dcs docker gateway iot java lwm2m modbus mqtt multi-protocol opc-ua plc rpc rtsp s7 socket spring-cloud tcp things

Last synced: 16 Jul 2025

https://github.com/zhaoyachao/zdh_web

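A big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.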

bigdata collection data data-collection datapipeline datax-web etl pipline scheduler spark sparketl

Last synced: 04 Apr 2025

https://github.com/scriptsmith/reaper

Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube

Last synced: 07 Apr 2025

https://github.com/ScriptSmith/reaper

Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube

Last synced: 04 Apr 2025

https://github.com/wq/wq.app

💻📱 wq's app library: a JavaScript framework powering offline-first web & native apps for geospatial data collection, mobile surveys, and citizen science. Powered by Redux, React, Material UI and Maplibre GL.

citizen-science data-collection geospatial gis mobile mobile-app offline offline-first survey wq-framework

Last synced: 26 Mar 2025

https://github.com/networkdynamics/pytok

A web scraper for TikTok using Playwright

data-collection tiktok tiktok-api tiktok-scraper web-scraper

Last synced: 19 Jan 2026

https://github.com/wq/wq.db

☁🌐 wq's db library, extending Django REST framework to support apps for geospatial field data collection, citizen science, and crowdsourcing.

citizen-science data-collection django django-rest-framework rest-api wq-framework

Last synced: 10 Jan 2026

https://github.com/bps-statistics/form-gear

FormGear is a framework engine for dynamic form creation and complex form processing and validation for data collection.

census data data-collection form-builder form-engine form-generator national-statistics official-statistics survey survey-builder survey-form

Last synced: 13 Oct 2025

https://github.com/Minipada/ros2_data_collection

Collect, validate and send data reliably from ROS 2 to create APIs and dashboards.

data-collection robotics ros2

Last synced: 13 May 2025

https://github.com/mxdldev/android-amap-track-collect

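A recent project required collecting users' movement-track data from their phones. This kind of feature is common: apps such as 咕咚 (Codoon) and 悦动圈 record running tracks, while 微信运动 (WeChat Sports) and 钉钉运动 (DingTalk Sports) count your daily steps. Recording a track depends on phone positioning, and counting steps depends on the gyroscope (angular velocity sensor). I spent a little over a day implementing real-time collection of location data.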

android data-collection gps location motion-track

Last synced: 17 Jul 2025

https://github.com/pantunes/xtcryptosignals

Cryptocurrencies price data collection, price tickers, signals notifications, charts, Telegram bot and more.

agregator altcoins api bitcoin crypto-currencies cryptocurrency data-collection ethereum exchange exchange-api notifications portfolio service signals-notifications ticker trading

Last synced: 30 Apr 2025

https://github.com/akvo/akvo-flow

A data collection and monitoring tool that works anywhere.

agpl akvo akvo-flow data-collection java

Last synced: 29 Aug 2025

https://github.com/melvynator/ELK_twitter

This is a data pipeline for Twitter (ETL) using the Elastic Stack: Elasticsearch, Logstash, and Kibana (version 6.1).

data-collection data-visualization elasticsearch elk elk-stack kibana logstash machine-learning natural-language-processing twitter twitter-api

Last synced: 30 Aug 2025

https://github.com/melvynator/elk_twitter

This is a data pipeline for Twitter (ETL) using the Elastic Stack: Elasticsearch, Logstash, and Kibana (version 6.1).

data-collection data-visualization elasticsearch elk elk-stack kibana logstash machine-learning natural-language-processing twitter twitter-api

Last synced: 14 Jul 2025

https://github.com/ilaria-manco/song-describer

Song Describer is a data collection platform for annotating music with textual descriptions.

annotations audio-captioning data-collection music-dataset

Last synced: 24 Sep 2025

https://github.com/getodk/javarosa

The core library that many of the ODK tools are built around. It's written in Java, implements the ODK XForms spec, and runs on mobile devices and cloud servers. ✨🏗✨

data-collection global-development global-health java mhealth mobile-data-collection odk xforms

Last synced: 26 Apr 2026

https://github.com/pzaino/thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.

automation blue-team-tool content-detection content-discovery crawler crawling cyber-security cybersecurity cybersecurity-tools data-collection data-science distributed-systems golang indexer indexing reconnaissance red-team-tools scraping search-engine vulnerability-detection

Last synced: 06 Feb 2026

https://github.com/ntivirikin/xeno-canto-py

Python wrapper for the xeno-canto.org API to aid in downloading and managing recordings.

api-wrapper birding birds birdsong classification data-collection data-mining json metadata python scraper song xeno-canto xenocanto

Last synced: 21 Feb 2026

https://github.com/davidberenstein1957/dataset-viber

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

data-collection data-quality evaluation human-feedback

Last synced: 06 Mar 2025

https://github.com/leogregianin/ibge

🌎 Data collection of geographical divisions of Brazil by IBGE

brasil brazil data-collection ibge json

Last synced: 23 Jul 2025

https://github.com/edgee-cloud/edgee

The full-stack edge platform for your edge oriented applications.

data-collection edge edge-computing edgee http https proxy rust wasm wasm-component webassembly

Last synced: 02 Jan 2026

https://github.com/documents-brasil/ibge

🌎 Data collection of geographical divisions of Brazil by IBGE

brasil brazil data-collection ibge json

Last synced: 12 Apr 2025

https://github.com/atapas/js-collections-map-set

Repository to have example code to demonstrate JavaScript Map and Set data structures.

data-collection data-structures javascript map set

Last synced: 12 Apr 2025

https://github.com/gaalcaras/mailinglistscraper

A python web scraper for public email lists.

data-collection mailinglist scraper scrapy spider webscraping

Last synced: 02 Aug 2025

https://github.com/fulldecent/google-voice-numbers

Retrieves the full list of available Google Voice numbers and finds the best ones

data-collection google-voice harvest harvest-data spider telephone-number telephony

Last synced: 30 Dec 2025

https://github.com/gidim/babler

Data Collection System For NLP/Speech Recognition

blogs data-collection forums language-modeling machine-learning nlp scraping

Last synced: 16 May 2025

https://github.com/nuhmanpk/webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets

Last synced: 21 Mar 2025

https://github.com/nuhmanpk/Webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets

Last synced: 08 Jul 2025

https://github.com/lcsrodriguez/ecocal

Worldwide economic calendar Python package (details, estimates, market news, ...)

data-collection economic-calendar financial-events multithreaded python webscraping

Last synced: 17 May 2026

https://github.com/esri/data-collection-dotnet

Data collection application built using the .NET Runtime SDK.

arcgis data-collection dotnet offline online open-source-app popup related-records runtime runtime-sdk wpf

Last synced: 07 Jul 2025

https://github.com/eurostat/pyrostat

API (Python) for Eurostat data collections upload

api data-collection eurostat

Last synced: 05 Feb 2026

https://github.com/lironmiz/pcep-30-0x

PCEP™ – Certified Entry-Level Python Programmer certification shows that the individual is familiar with universal computer programming concepts like data types, containers, functions, conditions, loops, as well as Python programming language syntax, semantics, and the runtime environment.

certificate control-flow course data-collection data-types education exceptions functions input-output learning-by-doing literals numeral-systems operations operators pcap practice python-syntax-and-semantics python3 runtime-environment variables

Last synced: 18 Mar 2025

https://github.com/mahtafetrat/manatts-persian-speech-dataset

ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset

Last synced: 08 Apr 2025

https://github.com/qarmin/system-info-collector

App to collect RAM/CPU usage from the OS and show it in pretty graphs

data data-collection system

Last synced: 06 Jul 2025

https://github.com/robotology/wearables

Code moved to https://github.com/robotology/human-dynamics-estimation

data-collection force-torque-sensor framework imu sensor wearable wearable-devices

Last synced: 16 Mar 2026

https://github.com/cph-cachet/carp.core-kotlin

Infrastructure-agnostic framework for distributed data collection.

data-collection ddd distributed-computing hacktoberfest mhealth research research-platform

Last synced: 19 Apr 2025

https://github.com/unicornunicode/FACT

FACT is a tool to collect, process and visualise forensic data from clusters of machines running in the cloud or on-premise.

cloud data-collection forensics

Last synced: 12 Jul 2025

https://github.com/MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset

Last synced: 01 Mar 2025

https://github.com/cardi/aws-spot-price-history

Automating AWS spot price history retrieval

aws-ec2 data-collection spot-instances

Last synced: 12 May 2025

https://github.com/sowinskibraeden/dayz-reforger

A general purpose Discord bot to handle DayZ Killfeed, stats, alarms and factions' armbands using Nitrado log files.

data-analytics data-collection dayz discord discord-bot discord-js fetch-api mongodb nitrado regex scalability

Last synced: 09 Jul 2025

https://github.com/pvernier/pykobo

A Python module to fetch data from the Kobo API

api data-collection kobo kobo-toolbox kobocollect kobotoolbox xlsform

Last synced: 07 Feb 2026

https://github.com/dadosjusbr/alba

System for scheduling and orchestrating executions, aimed at automating DadosJusBR processes

coleta-de-dados dados-abertos dadosabertos data-collection hacktoberfest open-data opendata

Last synced: 14 Jan 2026

https://github.com/abeltavares/marketpipe

🛠 Containerized and configurable Airflow ETL pipeline for collecting and storing stock and cryptocurrency market data.

airflow aws ci-cd cryptocurrency data-analysis data-collection data-storage docker iac oop pgadmin pipeline postgresql python sql stocks unit-testing

Last synced: 22 Apr 2025

https://github.com/harisbinzia/mastodoner

Mastodoner is a command line tool (and Python library) for archiving Mastodon, a decentralized micro-blogging social network.

data-collection mastodon social-network

Last synced: 12 Apr 2025

https://github.com/mumarshahbaz/oscilloscope-online-v2

Web Serial Plotter with as much customization as possible. Custom Colors, Automatic Timescale, Live data visualization! Plot as many graphs as you can with just a click of a button! Truly, an online Oscilloscope!

arduino automatic-time customizable data-collection data-visualization esp experiment online oscilloscope serial-plotter timescale web

Last synced: 13 Sep 2025

https://github.com/redayzarra/sleepapneadetection

My capstone project explores machine learning, hardware, and web development to create a smart home system for monitoring the health of homebound patients suffering from sleep apnea. The system includes data collection through sensors, embedded ML (TinyML) to analyze data, and web development for creating a medical dashboard.

arduino arduino-ide capstone capstone-project data-collection embedded-systems machine-learning machine-learning-algorithms medical mern mern-project mern-stack python tinyml web-development

Last synced: 08 Oct 2025

https://github.com/ikstream/dalec

Dalec is a project that aims to provide a privacy-preserving data collection method. It uses DNS for client/server separation and transmits data encrypted.

collection data data-collection dns exfiltration shell

Last synced: 11 Aug 2025

https://github.com/munroe-meyer-institute-vr-laboratory/cometrics

Clinical tool for coregistration of frequency and duration based behavior, physiological signals, and video data. Session tracking features streamline multi-session clinical data recording.

behavior behavior-analysis behavioral-sciences biometrics clinical-research data-annotation-machine-learning data-annotation-tools data-collection empatica-e4

Last synced: 17 Jan 2026

https://github.com/sodascience/social_science_inferences_with_llms

Addressing LLM-related measurement error in social science modeling research.

data-collection inference large-language-models llms

Last synced: 30 Jan 2026

https://github.com/exloud/windows-telemetry-disabler

PowerShell/Batch utility that permanently disables Windows telemetry and data-collection services on Windows 10/11.

batch-script data-collection disable disable-services disabler exloud log nsudo powershell-script privacy telemetry windows windows7 windows7-windows11 windows8-1

Last synced: 06 Oct 2025

https://github.com/ivopetkov/data-object

A familiar and powerful Data Object abstraction for PHP.

data-collection data-list data-object filter sort

Last synced: 26 Oct 2025

https://github.com/sowinskibraeden/dayz-data-collection

Collects Log Data from Nitrado DayZ server.

data-collection

Last synced: 09 Jul 2025

https://github.com/sferez/twitter_toolbox

Complete toolbox for scraping, streaming, interacting with the API, cleaning, preprocessing, and applying NLP to Twitter data

data-collection data-science nlp preprocessing twitter twitter-api twitter-scraping twitter-streaming-api

Last synced: 10 Apr 2025

https://github.com/aymane-maghouti/real-time-data-pipeline-using-kafka

This project implements a real-time data pipeline using Apache Kafka, Python's psutil library for metric collection, and SQL Server for data storage. The pipeline collects metrics data from the local computer, processes it through Kafka brokers, and loads it into a SQL Server database. Additionally, a real-time dashboard is created using Power BI.

apache-kafka data-collection data-streaming data-visualization powerbi python real-time real-time-data-pipeline

Last synced: 05 Jul 2025

https://github.com/beingvirus/jobminer

JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.

automation beautifulsoup career crawler data-collection data-mining hacktoberfest hacktoberfest-accepted hacktoberfest2025 job-scraper jobs open-source python selenium web-scraping

Last synced: 10 Oct 2025

https://github.com/amacsmith/macfly

The idea of the project is to let you add URLs, most likely of sites with live data, to a list. The scrapers do an initial scrape of each site and send the result, along with a prompt, to an AI model, which generates a page that displays charts or explains the retrieved data. Meanwhile, the scrapers keep running and push live scraped data to the AI-generated page.

ai data-collection ml possibilities scraper website-generator websockets

Last synced: 01 Feb 2026

https://github.com/yousefkotp/local-leads-finder

Local Leads Finder helps you uncover nearby business prospects in minutes: enter a keyword and city, watch real-time progress, and download clean lead lists ready for outreach. Perfect for agencies, freelancers, and growth teams who need consistent, enriched local data without the heavy work.

api-integration business-intelligence data-collection flask google-maps lead-finder lead-generation lead-generation-bot lead-generation-data lead-generation-tool leads local-business local-businesses marketing-automation prospecting python sales-tools web-scraping web-scraping-python

Last synced: 06 Apr 2026

https://github.com/alitahir4024/data-collecting-project

This is a simple data collection project for practising JS local storage skills

data-collection html-css-javascript localstorage

Last synced: 19 Mar 2026

https://github.com/dmdhrumilmistry/githubprofilescraper

Scrapes GitHub profiles and stores the data in JSON format

data-collection dmdhrumilmistry github-scraper python3 scraper

Last synced: 18 Jul 2025

https://github.com/kmader/easy_dash

A library for making Dash apps easier to build and particularly focusing on common data collection use cases

data-collection python reactive-programming scientific-computing web-gui widgets

Last synced: 25 Oct 2025

https://github.com/artucuno/guild-network-map

Create a map of all mutual guilds that members share with you on Discord.

data-collection data-visualization discord

Last synced: 26 Feb 2026