An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-collection

A curated list of projects in awesome lists tagged with data-collection.

https://github.com/naibowang/easyspider

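A visual no-code/code-free web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.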

batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www

Last synced: 12 May 2025

https://github.com/NaiboWang/EasySpider

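A visual no-code/code-free web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.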

batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www

Last synced: 20 Mar 2025

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 12 May 2025

https://github.com/jitsucom/jitsu

Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake

Last synced: 11 May 2025

https://github.com/plan-player-analytics/Plan

Player Analytics plugin for Minecraft Server platforms - View player activity of your server with ease. :calendar:

analytics bukkit-plugin bungeecord-plugin data-collection fabric-mod hacktoberfest mysql nukkit-plugin spigot-plugin sponge-plugin sqlite statistics velocity-plugin visualization webserver

Last synced: 14 Mar 2025

https://github.com/plan-player-analytics/plan

Player Analytics plugin for Minecraft Server platforms - View player activity of your server with ease. :calendar:

analytics bukkit-plugin bungeecord-plugin data-collection fabric-mod hacktoberfest mysql nukkit-plugin spigot-plugin sponge-plugin sqlite statistics velocity-plugin visualization webserver

Last synced: 13 Apr 2025

https://github.com/getodk/collect

ODK Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments around the world. Contribute and make the world a better place! ✨📋✨

android data-collection global-development global-health java mhealth mobile-data-collection odk social-impact xforms

Last synced: 25 Nov 2024

https://github.com/chaoss/augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/

chaoss data-collection data-modeling data-visualization defined-metrics facade git github hacktoberfest hacktoberfest2020 health linux linux-foundation metrics open-source opensource python-library research sustainability unix

Last synced: 15 May 2025

https://github.com/pnoker/iot-dc3

IoT DC3 is a 100% open-source, distributed Internet of Things (IoT) platform built on Spring Cloud. It accelerates IoT project development and simplifies IoT device management, offering a comprehensive solution for building robust IoT systems.

data-collection dcs docker gateway iot java lwm2m modbus mqtt multi-protocol opc-ua plc rpc rtsp s7 socket spring-cloud tcp things

Last synced: 27 Mar 2025

https://github.com/zhaoyachao/zdh_web

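A big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.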

bigdata collection data data-collection datapipeline datax-web etl pipline scheduler spark sparketl

Last synced: 04 Apr 2025

https://github.com/ScriptSmith/reaper

Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube

Last synced: 04 Apr 2025

https://github.com/scriptsmith/reaper

Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube

Last synced: 07 Apr 2025

https://github.com/wq/wq.app

💻📱 wq's app library: a JavaScript framework powering offline-first web & native apps for geospatial data collection, mobile surveys, and citizen science. Powered by Redux, React, Material UI and Maplibre GL.

citizen-science data-collection geospatial gis mobile mobile-app offline offline-first survey wq-framework

Last synced: 26 Mar 2025

https://github.com/wq/wq.db

☁🌐 wq's db library, extending Django REST framework to support apps for geospatial field data collection, citizen science, and crowdsourcing.

citizen-science data-collection django django-rest-framework rest-api wq-framework

Last synced: 29 Mar 2025

https://github.com/bps-statistics/form-gear

FormGear is a framework engine for dynamic form creation and complex form processing and validation for data collection.

census data data-collection form-builder form-engine form-generator national-statistics official-statistics survey survey-builder survey-form

Last synced: 01 Apr 2025

https://github.com/Minipada/ros2_data_collection

Collect, validate and send data reliably from ROS 2 to create APIs and dashboards.

data-collection robotics ros2

Last synced: 13 May 2025

https://github.com/pantunes/xtcryptosignals

Cryptocurrencies price data collection, price tickers, signals notifications, charts, Telegram bot and more.

agregator altcoins api bitcoin crypto-currencies cryptocurrency data-collection ethereum exchange exchange-api notifications portfolio service signals-notifications ticker trading

Last synced: 30 Apr 2025

https://github.com/melvynator/elk_twitter

This is a data pipeline for Twitter (ETL) using the Elastic Stack: Elasticsearch, Logstash and Kibana (version 6.1).

data-collection data-visualization elasticsearch elk elk-stack kibana logstash machine-learning natural-language-processing twitter twitter-api

Last synced: 17 Dec 2024

https://github.com/melvynator/ELK_twitter

This is a data pipeline for Twitter (ETL) using the Elastic Stack: Elasticsearch, Logstash and Kibana (version 6.1).

data-collection data-visualization elasticsearch elk elk-stack kibana logstash machine-learning natural-language-processing twitter twitter-api

Last synced: 27 Dec 2024

https://github.com/getodk/javarosa

The core library that many of the ODK tools are built around. It's written in Java, implements the ODK XForms spec, and runs on mobile devices and cloud servers. ✨🏗✨

data-collection global-development global-health java mhealth mobile-data-collection odk xforms

Last synced: 25 Nov 2024

https://github.com/mxdldev/android-amap-track-collect

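A recent project required collecting users' movement track data from their phones. Features like this are common: Codoon (咕咚) and Yuedongquan (悦动圈) record running tracks, while WeRun (微信运动) and DingTalk Sports (钉钉运动) count the steps you take each day. Recording a track depends on the phone's location services, and counting steps depends on the gyroscope (angular velocity sensor). It took a little over a day to implement real-time collection of location data.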

android data-collection gps location motion-track

Last synced: 23 Nov 2024

https://github.com/davidberenstein1957/dataset-viber

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

data-collection data-quality evaluation human-feedback

Last synced: 06 Mar 2025

https://github.com/edgee-cloud/edgee

The full-stack edge platform for your edge oriented applications.

data-collection edge edge-computing edgee http https proxy rust wasm wasm-component webassembly

Last synced: 05 Apr 2025

https://github.com/documents-brasil/ibge

🌎 Data collection of geographical divisions of Brazil by IBGE

brasil brazil data-collection ibge json

Last synced: 12 Apr 2025

https://github.com/atapas/js-collections-map-set

Repository to have example code to demonstrate JavaScript Map and Set data structures.

data-collection data-structures javascript map set

Last synced: 12 Apr 2025

https://github.com/gaalcaras/mailinglistscraper

A python web scraper for public email lists.

data-collection mailinglist scraper scrapy spider webscraping

Last synced: 06 Dec 2024

https://github.com/fulldecent/google-voice-numbers

Retrieves the full list of available Google Voice numbers and finds the best ones

data-collection google-voice harvest harvest-data spider telephone-number telephony

Last synced: 18 Jan 2025

https://github.com/nuhmanpk/webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets

Last synced: 21 Mar 2025

https://github.com/gidim/babler

Data Collection System For NLP/Speech Recognition

blogs data-collection forums language-modeling machine-learning nlp scraping

Last synced: 16 May 2025

https://github.com/nuhmanpk/Webtrench

A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.

audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets

Last synced: 20 Nov 2024

https://github.com/lironmiz/pcep-30-0x

PCEP™ – Certified Entry-Level Python Programmer certification shows that the individual is familiar with universal computer programming concepts like data types, containers, functions, conditions, loops, as well as Python programming language syntax, semantics, and the runtime environment.

certificate control-flow course data-collection data-types education exceptions functions input-output learning-by-doing literals numeral-systems operations operators pcap practice python-syntax-and-semantics python3 runtime-environment variables

Last synced: 18 Mar 2025

https://github.com/mahtafetrat/manatts-persian-speech-dataset

ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset

Last synced: 08 Apr 2025

https://github.com/unicornunicode/FACT

FACT is a tool to collect, process and visualise forensic data from clusters of machines running in the cloud or on-premise.

cloud data-collection forensics

Last synced: 21 Nov 2024

https://github.com/cph-cachet/carp.core-kotlin

Infrastructure-agnostic framework for distributed data collection.

data-collection ddd distributed-computing hacktoberfest mhealth research research-platform

Last synced: 19 Apr 2025

https://github.com/qarmin/system-info-collector

App to collect RAM/CPU usage from the OS and show it in pretty graphs

data data-collection system

Last synced: 22 Mar 2025

https://github.com/MahtaFetrat/ManaTTS-Persian-Speech-Dataset

ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.

data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset

Last synced: 01 Mar 2025

https://github.com/vrknetha/mcp-server-firecrawl

FireCrawl MCP Server is a powerful web scraping integration for Claude and other LLMs. It provides JavaScript rendering, batch processing, and search capabilities through a Model Context Protocol (MCP) interface. Now with support for self-hosted instances and advanced features like parallel processing, automatic retries, and content filtering

batch-processing claude content-extraction data-collection firecrawl firecrawl-ai javascript-rendering llm-tools mcp-server model-context-protocol search-api web-crawler web-scraping

Last synced: 20 Jan 2025

https://github.com/cardi/aws-spot-price-history

Automating AWS spot price history retrieval

aws-ec2 data-collection spot-instances

Last synced: 12 May 2025

https://github.com/sowinskibraeden/dayz-reforger

A general purpose Discord bot to handle DayZ Killfeed, stats, alarms and factions' armbands using Nitrado log files.

data-analytics data-collection dayz discord discord-bot discord-js fetch-api mongodb nitrado regex scalability

Last synced: 20 Nov 2024

https://github.com/abeltavares/marketpipe

🛠 Containerized and configurable Airflow ETL pipeline for collecting and storing stock and cryptocurrency market data.

airflow aws ci-cd cryptocurrency data-analysis data-collection data-storage docker iac oop pgadmin pipeline postgresql python sql stocks unit-testing

Last synced: 22 Apr 2025

https://github.com/harisbinzia/mastodoner

Mastodoner is a command line tool (and Python library) for archiving Mastodon, a decentralized micro-blogging social network.

data-collection mastodon social-network

Last synced: 12 Apr 2025

https://github.com/redayzarra/sleepapneadetection

My capstone project explores machine learning, hardware, and web development to create a smart home system for monitoring the health of homebound patients suffering from sleep apnea. The system includes data collection through sensors, embedded ML (TinyML) to analyze data, and web development for creating a medical dashboard.

arduino arduino-ide capstone capstone-project data-collection embedded-systems machine-learning machine-learning-algorithms medical mern mern-project mern-stack python tinyml web-development

Last synced: 26 Jan 2025

https://github.com/ikstream/dalec

Dalec is a project that aims to provide a privacy-preserving data collection method. It uses DNS for client/server separation while transmitting the data encrypted.

collection data data-collection dns exfiltration shell

Last synced: 15 May 2025

https://github.com/aymane-maghouti/real-time-data-pipeline-using-kafka

This project implements a real-time data pipeline using Apache Kafka, Python's psutil library for metric collection, and SQL Server for data storage. The pipeline collects metrics data from the local computer, processes it through Kafka brokers, and loads it into a SQL Server database. Additionally, a real-time dashboard is created using Power BI.

apache-kafka data-collection data-streaming data-visualization powerbi python real-time real-time-data-pipeline

Last synced: 17 Jan 2025

https://github.com/sferez/twitter_toolbox

Complete toolbox for scraping, streaming, interacting with the API, cleaning, preprocessing, and applying NLP to Twitter data

data-collection data-science nlp preprocessing twitter twitter-api twitter-scraping twitter-streaming-api

Last synced: 10 Apr 2025

https://github.com/sowinskibraeden/dayz-data-collection

Collects Log Data from Nitrado DayZ server.

data-collection

Last synced: 20 Nov 2024

https://github.com/ivopetkov/data-object

A familiar and powerful Data Object abstraction for PHP.

data-collection data-list data-object filter sort

Last synced: 14 Apr 2025

https://github.com/alitahir4024/data-collecting-project

This is a simple data collection project for practising JS local storage skills

data-collection html-css-javascript localstorage

Last synced: 28 Mar 2025

https://github.com/dmdhrumilmistry/githubprofilescraper

Scrapes github profiles and stores data in json format

data-collection dmdhrumilmistry github-scraper python3 scraper

Last synced: 02 Apr 2025

https://github.com/cpaxton/needlemaster

Free Android game for experiments in learning task structure from human demonstrations. User performances can be saved and exported for research use.

android-game data-collection game lfd needle-master phone

Last synced: 13 May 2025

https://github.com/artucuno/guild-network-map

Create a map of all mutual guilds that members share with you on Discord.

data-collection data-visualization discord

Last synced: 14 Apr 2025

https://github.com/kwokhing/exploratory-data-analysis-on-smrt-tweets

Demo of exploratory data analysis (EDA) on train service disruptions, based on scraped tweets (user-generated content) from the Twitter account of the train operator (SMRT)

data-analysis data-cleaning data-collection data-preparation exploratory-data-analysis exploratory-data-visualizations folium geospatial-data leaflet-map python python3 regex scraping selenium selenium-python social-media text-processing user-generated-content web-scraping webscraping

Last synced: 02 Dec 2024

https://github.com/shivendrra/web-graze

Scrape raw data from various sources on the internet, such as Wikipedia, Internet Archive, Britannica, YouTube, Unsplash, etc.

data-collection data-collection-system data-for-llm data-for-ml webscra webscraper webscrapper-python

Last synced: 13 May 2025

https://github.com/ikstream/dns-handler

Data collection server for the dalec user collection system

collection dalec data data-collection dns dns-server python python3

Last synced: 13 Mar 2025

https://github.com/kutukvpavel/gpibserver

General purpose GPIB data collection server

automation data-collection gpib ieee488 server

Last synced: 29 Apr 2025

https://github.com/reckadon/ml-harwithdts

Repository for Assignment 1 of team KAR.ai. - Human Activity Recognition (HAR) with Decision trees and LLMs

data-collection decision-trees groq-api human-activity-recognition jupyter machine-learning matplotlib pandas pca-analysis prompt-engineering sklearn tsfel

Last synced: 07 May 2025

https://github.com/aidinhamedi/advanced-arduino-datalogger

This project is an advanced datalogger that logs temperature, humidity, and air pressure. It uses an Adafruit SHT31 sensor for temperature and humidity, and a BMP180 for air pressure.

arduino arduino-ide arduino-mega bluetooth bmp180 c cpp data-collection datalogger sensors sht31 st7735 tft-display

Last synced: 06 Apr 2025

https://github.com/ozakboy/taiwan-news-crawlers

.NET-based crawlers for Taiwanese news, with the data returned as objects for ease of use

crawler data-collection dataset-generation dotnet news taiwan webcrawlers

Last synced: 15 Apr 2025

https://github.com/tathithienthanh/finaltest_database-sql-data-collection-for-ds

The final test of the "Database SQL and Data Collection for Data Science" course from The Ho Chi Minh City University of Science (19/09/2023)

chrome data-collection data-processing database final-test ipynb-jupyter-notebook mysql pymysql query scraping-websites selenium sql statistics visualization

Last synced: 05 May 2025

https://github.com/simonblanke/search-data-collector

Thread safe and atomic data collection into csv-files

csv data-collection hyperactive pandas python

Last synced: 05 May 2025

https://github.com/jinsyin/datalink

⚡ Data integration | DataLink is a lightweight data integration framework built on top of DataX, Spark and Flink

batch big-data bigdata cdc data data-collection data-exchange data-integration data-pipeline data-synchronization datalink etl flink flink-cdc framework integration pipeline spark streaming

Last synced: 13 Apr 2025

https://github.com/edgee-cloud/demo-html

Demo of Edgee integrated into a basic html website

analytics data-collection edge-computing edgee wasm wasm-component

Last synced: 22 Apr 2025

https://github.com/yukito0209/is6941-ml-social-media

Code repository for the group project of the IS6941 Machine Learning & Social Media Analytics course, exploring applications of machine learning in social media data analysis.

bert city-university-of-hong-kong crawler data-collection llama machine-learning python sentiment-analysis social-media

Last synced: 01 Apr 2025

https://github.com/code-jl/nfl-point-kicker-data-scraper

A Python-based web scraping toolkit that extracts and processes NFL kicking statistics from Pro-Football-Reference. This project automates the collection of comprehensive game data, with a particular focus on field goal attempts and environmental conditions.

automation beautifulsoup csv data-analysis data-collection field-goals football-statistics kicking-stats nfl python selenium sports-analysis statistics weather-data web-scraping

Last synced: 14 Apr 2025

https://github.com/mehdibo/PHPForms

A set of tools to help you create forms and export data as fast as possible

data-collection form-generator form-validation forms php-library php7

Last synced: 29 Apr 2025

https://github.com/mehdibo/phpforms

A set of tools to help you create forms and export data as fast as possible

data-collection form-generator form-validation forms php-library php7

Last synced: 28 Mar 2025