Projects in Awesome Lists tagged with data-collection
A curated list of projects in awesome lists tagged with data-collection.
https://github.com/naibowang/easyspider
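A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.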
batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www
Last synced: 12 May 2025
https://github.com/NaiboWang/EasySpider
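A visual no-code/code-free web crawler/spider. EasySpider (易采集): a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.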
batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www
Last synced: 20 Mar 2025
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 12 May 2025
https://github.com/snowplow/snowplow
The leader in Next-Generation Customer Data Infrastructure
analytics data data-collection data-pipeline marketing-analytics product-analytics snowplow snowplow-events snowplow-pipeline
Last synced: 13 May 2025
https://github.com/cloudquery/cloudquery
The developer first cloud governance platform
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 14 May 2025
https://github.com/jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
bigquery clickhouse data-collection data-connectors data-integration golang postgres redshift snowflake
Last synced: 11 May 2025
https://github.com/mendableai/firecrawl-mcp-server
Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.
batch-processing claude content-extraction data-collection firecrawl firecrawl-ai javascript-rendering llm-tools mcp-server model-context-protocol search-api web-crawler web-scraping
Last synced: 13 May 2025
https://github.com/plan-player-analytics/Plan
Player Analytics plugin for Minecraft Server platforms - View player activity of your server with ease. 📅
analytics bukkit-plugin bungeecord-plugin data-collection fabric-mod hacktoberfest mysql nukkit-plugin spigot-plugin sponge-plugin sqlite statistics velocity-plugin visualization webserver
Last synced: 14 Mar 2025
https://github.com/plan-player-analytics/plan
Player Analytics plugin for Minecraft Server platforms - View player activity of your server with ease. 📅
analytics bukkit-plugin bungeecord-plugin data-collection fabric-mod hacktoberfest mysql nukkit-plugin spigot-plugin sponge-plugin sqlite statistics velocity-plugin visualization webserver
Last synced: 13 Apr 2025
https://github.com/getodk/collect
ODK Collect is an Android app for filling out forms. It's been used to collect billions of data points in challenging environments around the world. Contribute and make the world a better place! ✨📋✨
android data-collection global-development global-health java mhealth mobile-data-collection odk social-impact xforms
Last synced: 25 Nov 2024
https://github.com/chaoss/augur
Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/
chaoss data-collection data-modeling data-visualization defined-metrics facade git github hacktoberfest hacktoberfest2020 health linux linux-foundation metrics open-source opensource python-library research sustainability unix
Last synced: 15 May 2025
https://github.com/pnoker/iot-dc3
IoT DC3 is a 100% open-source, distributed Internet of Things (IoT) platform built on Spring Cloud. It accelerates IoT project development and simplifies IoT device management, offering a comprehensive solution for building robust IoT systems.
data-collection dcs docker gateway iot java lwm2m modbus mqtt multi-protocol opc-ua plc rpc rtsp s7 socket spring-cloud tcp things
Last synced: 27 Mar 2025
https://github.com/chapmanjacobd/library
99+ CLI tools to build, browse, and blend your media library
broadcatching cli command-line curation data-collection datacuration datasette-tool ffmpeg ffprobe files folders gallery-dl media mpv music playlist qbittorrent-nox sqlite videos yt-dlp
Last synced: 14 May 2025
https://github.com/zhaoyachao/zdh_web
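Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.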
bigdata collection data data-collection datapipeline datax-web etl pipline scheduler spark sparketl
Last synced: 04 Apr 2025
https://github.com/ScriptSmith/reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube
Last synced: 04 Apr 2025
https://github.com/scriptsmith/reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
api data-collection data-mining data-scraping facebook gui pinterest reddit scraping socialmedia tumblr twitter youtube
Last synced: 07 Apr 2025
https://github.com/elbwalker/walkeros
Open-source event collection and tag management (gtag.js/GTM alternative)
consent-management data-capture data-collection event-tracking first-party gdpr measurement privacy-by-design server-side tag-manager tagging tracking user-behavior vendor-agnostic
Last synced: 15 May 2025
https://github.com/K3V1991/Disable-Firefox-Telemetry-and-Data-Collection
How to disable Firefox Telemetry and Data Collection
blocking browser config data data-collection disable firefox how-to list mozilla mozilla-firefox off options privacy reporting security server settings telemetry tutorial
Last synced: 13 Apr 2025
https://github.com/silverton-io/buz
Serverless multi-protocol + multi-destination event collection system.
analytics analytics-tracking cloudevents cloudevents-schema contracts data data-collection data-platform eventbridge jsonschema product-analytics redpanda redpanda-console schema-registry schema-validation snowplow-analytics streaming-analytics streaming-data webhook-receiver webhook-server
Last synced: 12 Apr 2025
https://github.com/wq/wq.app
💻📱 wq's app library: a JavaScript framework powering offline-first web & native apps for geospatial data collection, mobile surveys, and citizen science. Powered by Redux, React, Material UI and Maplibre GL.
citizen-science data-collection geospatial gis mobile mobile-app offline offline-first survey wq-framework
Last synced: 26 Mar 2025
https://github.com/CertifaiAI/classifai
🔥 One of the most comprehensive open-source data annotation platforms.
annotation annotation-tool big-data computervision data-annotation data-collection data-science deep-learning labelling machine-learning
Last synced: 11 May 2025
https://github.com/wq/wq.db
☁🌐 wq's db library, extending Django REST framework to support apps for geospatial field data collection, citizen science, and crowdsourcing.
citizen-science data-collection django django-rest-framework rest-api wq-framework
Last synced: 29 Mar 2025
https://github.com/bps-statistics/form-gear
FormGear is a framework engine for dynamic form creation and complex form processing and validation for data collection.
census data data-collection form-builder form-engine form-generator national-statistics official-statistics survey survey-builder survey-form
Last synced: 01 Apr 2025
https://github.com/douglasneuroinformatics/opendatacapture
An electronic data capture platform for administering remote and in-person clinical instruments
clinical data-collection electronic-data-capture esbuild form-builder full-stack monaco-editor monorepo multilingual nodejs prisma react research tailwindcss turborepo typescript
Last synced: 13 Apr 2025
https://github.com/Minipada/ros2_data_collection
Collect, validate and send data reliably from ROS 2 to create APIs and dashboards.
Last synced: 13 May 2025
https://github.com/ineffyble/genders.wtf
data-collection forms gender genders
Last synced: 05 Apr 2025
https://github.com/andreztz/pyradios
A Client for the Radio Browser API
api data-collection entertainment internet-radio internet-radio-stations music open-api python radio-browser radio-stations streaming
Last synced: 08 May 2025
https://github.com/pantunes/xtcryptosignals
Cryptocurrencies price data collection, price tickers, signals notifications, charts, Telegram bot and more.
agregator altcoins api bitcoin crypto-currencies cryptocurrency data-collection ethereum exchange exchange-api notifications portfolio service signals-notifications ticker trading
Last synced: 30 Apr 2025
https://github.com/melvynator/elk_twitter
This is a data pipeline for Twitter (ETL) using the elastic stack Elasticsearch, Logstash and Kibana (version 6.1)
data-collection data-visualization elasticsearch elk elk-stack kibana logstash machine-learning natural-language-processing twitter twitter-api
Last synced: 17 Dec 2024
https://github.com/graphlit/graphlit-mcp-server
Model Context Protocol (MCP) Server for Graphlit Platform
claude content-extraction content-ingestion data-collection llm-tools mcp-server model-context-protocol search-api unstructured-data web-crawler web-scraping
Last synced: 22 Mar 2025
https://github.com/melvynator/ELK_twitter
This is a data pipeline for Twitter (ETL) using the elastic stack Elasticsearch, Logstash and Kibana (version 6.1)
data-collection data-visualization elasticsearch elk elk-stack kibana logstash machine-learning natural-language-processing twitter twitter-api
Last synced: 27 Dec 2024
https://github.com/getodk/javarosa
The core library that many of the ODK tools are built around. It's written in Java, implements the ODK XForms spec, and runs on mobile devices and cloud servers. ✨🏗✨
data-collection global-development global-health java mhealth mobile-data-collection odk xforms
Last synced: 25 Nov 2024
https://github.com/mxdldev/android-amap-track-collect
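A recent project required collecting users' movement-track data from their phones. Features like this are common: apps such as Codoon (咕咚) and Yuedongquan (悦动圈) collect running-track data, while WeChat Sports (微信运动) and DingTalk Sports (钉钉运动) count your daily steps. Recording a track depends on phone positioning, and counting steps depends on the gyroscope (angular velocity sensor). It took a little over a day to implement real-time collection of location data.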
android data-collection gps location motion-track
Last synced: 23 Nov 2024
https://github.com/davidberenstein1957/dataset-viber
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
data-collection data-quality evaluation human-feedback
Last synced: 06 Mar 2025
https://github.com/wooster0/shifting
A privacy-focused list of alternatives to online services.
big-data browser data-collection facebook gmail google list microsoft open-source privacy privacy-enhancing-technologies privacy-protection privacy-tools search-engine security services youtube
Last synced: 05 May 2025
https://github.com/edgee-cloud/edgee
The full-stack edge platform for your edge oriented applications.
data-collection edge edge-computing edgee http https proxy rust wasm wasm-component webassembly
Last synced: 05 Apr 2025
https://github.com/documents-brasil/ibge
🌎 Data collection of geographical divisions of Brazil by IBGE
brasil brazil data-collection ibge json
Last synced: 12 Apr 2025
https://github.com/khuangaf/itri-speech-recognition-dataset-generation
Automatic Speech Recognition Dataset Generation
automatic data-collection mask-rcnn speech-recognition
Last synced: 30 Apr 2025
https://github.com/atapas/js-collections-map-set
Repository to have example code to demonstrate JavaScript Map and Set data structures.
data-collection data-structures javascript map set
Last synced: 12 Apr 2025
https://github.com/getodk/central-frontend
Vue.js based frontend for ODK Central
data-collection global-development global-health javascript mhealth odk social-impact vuejs
Last synced: 03 Apr 2025
https://github.com/gaalcaras/mailinglistscraper
A python web scraper for public email lists.
data-collection mailinglist scraper scrapy spider webscraping
Last synced: 06 Dec 2024
https://github.com/fulldecent/google-voice-numbers
Retrieves the full list of available Google Voice numbers and finds the best ones
data-collection google-voice harvest harvest-data spider telephone-number telephony
Last synced: 18 Jan 2025
https://github.com/nuhmanpk/webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets
Last synced: 21 Mar 2025
https://github.com/gidim/babler
Data Collection System For NLP/Speech Recognition
blogs data-collection forums language-modeling machine-learning nlp scraping
Last synced: 16 May 2025
https://github.com/nuhmanpk/Webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
audio-datasets data data-collection data-science dataset-generation deep-learning image-data-generator machine-learning python scarper text-datasets
Last synced: 20 Nov 2024
https://github.com/lironmiz/pcep-30-0x
PCEP™ – Certified Entry-Level Python Programmer certification shows that the individual is familiar with universal computer programming concepts like data types, containers, functions, conditions, loops, as well as Python programming language syntax, semantics, and the runtime environment.
certificate control-flow course data-collection data-types education exceptions functions input-output learning-by-doing literals numeral-systems operations operators pcap practice python-syntax-and-semantics python3 runtime-environment variables
Last synced: 18 Mar 2025
https://github.com/alttch/pulr
pull devices and transform data into events
automation data data-collection data-conversion ethernet-ip industiral modbus plc plc-programming snmp
Last synced: 28 Apr 2025
https://github.com/mahtafetrat/manatts-persian-speech-dataset
ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 08 Apr 2025
https://github.com/unicornunicode/FACT
FACT is a tool to collect, process and visualise forensic data from clusters of machines running in the cloud or on-premise.
cloud data-collection forensics
Last synced: 21 Nov 2024
https://github.com/cph-cachet/carp.core-kotlin
Infrastructure-agnostic framework for distributed data collection.
data-collection ddd distributed-computing hacktoberfest mhealth research research-platform
Last synced: 19 Apr 2025
https://github.com/qarmin/system-info-collector
App to collect RAM/CPU usage from the OS and show it in pretty graphs
Last synced: 22 Mar 2025
https://github.com/MahtaFetrat/ManaTTS-Persian-Speech-Dataset
ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 01 Mar 2025
https://github.com/vrknetha/mcp-server-firecrawl
FireCrawl MCP Server is a powerful web scraping integration for Claude and other LLMs. It provides JavaScript rendering, batch processing, and search capabilities through a Model Context Protocol (MCP) interface. Now with support for self-hosted instances and advanced features like parallel processing, automatic retries, and content filtering
batch-processing claude content-extraction data-collection firecrawl firecrawl-ai javascript-rendering llm-tools mcp-server model-context-protocol search-api web-crawler web-scraping
Last synced: 20 Jan 2025
https://github.com/cardi/aws-spot-price-history
automating aws spot price history retrieval
aws-ec2 data-collection spot-instances
Last synced: 12 May 2025
https://github.com/sowinskibraeden/dayz-reforger
A general purpose Discord bot to handle DayZ Killfeed, stats, alarms and factions' armbands using Nitrado log files.
data-analytics data-collection dayz discord discord-bot discord-js fetch-api mongodb nitrado regex scalability
Last synced: 20 Nov 2024
https://github.com/abeltavares/marketpipe
🛠 Containerized and configurable Airflow ETL pipeline for collecting and storing stock and cryptocurrency market data.
airflow aws ci-cd cryptocurrency data-analysis data-collection data-storage docker iac oop pgadmin pipeline postgresql python sql stocks unit-testing
Last synced: 22 Apr 2025
https://github.com/harisbinzia/mastodoner
Mastodoner is a command line tool (and Python library) for archiving Mastodon, a decentralized micro-blogging social network.
data-collection mastodon social-network
Last synced: 12 Apr 2025
https://github.com/redayzarra/sleepapneadetection
My capstone project explores machine learning, hardware, and web development to create a smart home system for monitoring the health of homebound patients suffering from sleep apnea. The system includes data collection through sensors, embedded ML (TinyML) to analyze data, and web development for creating a medical dashboard.
arduino arduino-ide capstone capstone-project data-collection embedded-systems machine-learning machine-learning-algorithms medical mern mern-project mern-stack python tinyml web-development
Last synced: 26 Jan 2025
https://github.com/ikstream/dalec
Dalec is a project that aims to provide a privacy-preserving data collection method. It uses DNS for client/server separation and transmits data in encrypted form.
collection data data-collection dns exfiltration shell
Last synced: 15 May 2025
https://github.com/dsacms/metrics
Experimentations in Open Source Repository Metrics
cmsgov data-collection data-visualization git github github-pages health html-css-javascript metrics opensource python website
Last synced: 13 Apr 2025
https://github.com/kartta-labs/noter-frontend
Photo annotation tool
annotation crowdsourcing data-collection historical-data historical-maps photos
Last synced: 14 Apr 2025
https://github.com/aymane-maghouti/real-time-data-pipeline-using-kafka
This project implements a real-time data pipeline using Apache Kafka, Python's psutil library for metric collection, and SQL Server for data storage. The pipeline collects metrics data from the local computer, processes it through Kafka brokers, and loads it into a SQL Server database. Additionally, a real-time dashboard is created using Power BI.
apache-kafka data-collection data-streaming data-visualization powerbi python real-time real-time-data-pipeline
Last synced: 17 Jan 2025
https://github.com/sferez/twitter_toolbox
Complete toolbox for scraping, streaming, interacting with the API, cleaning, preprocessing, and applying NLP to Twitter data
data-collection data-science nlp preprocessing twitter twitter-api twitter-scraping twitter-streaming-api
Last synced: 10 Apr 2025
https://github.com/mahtafetrat/gptinformal-persian-speech-dataset
A free licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts manatts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 06 Apr 2025
https://github.com/sowinskibraeden/dayz-data-collection
Collects Log Data from Nitrado DayZ server.
Last synced: 20 Nov 2024
https://github.com/ivopetkov/data-object
A familiar and powerful Data Object abstraction for PHP.
data-collection data-list data-object filter sort
Last synced: 14 Apr 2025
https://github.com/thatsinewave/guardianwatch-bot
Simple discord bot that grabs all the public data about each user inside a server and outputs a list
administrative-tools csv-export data-collection discord-api discord-bot discord-py discord-token educational good-first-contribution good-first-issue good-first-pr good-first-project google-sheets-api member-information mit-license open-source python server-management thatsinewave user-analytics
Last synced: 18 Jan 2025
https://github.com/shuyib/teaching_data_collection
Learn data collection by putting a couple of things into consideration
best-practices data-collection data-science data-structures data-visualization makefile matplotlib pandas-dataframe polars-dataframe
Last synced: 22 Mar 2025
https://github.com/naxzyu/dialogos
Dialogos: Pioneering Interactive Narratives and Language Proficiency with Enhanced AI in Unity
accessible-technology adaptive-feedback ai-integration collaborative-innovation data-collection gamified-education immersive-experiences interactive-storytelling language-learning model-training multilingual-communication narrative-design real-time-interaction unity-development user-engagement
Last synced: 09 Feb 2025
https://github.com/solrikk/datadigger
DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.
business-intelligence content-extraction data-analysis data-collection data-extraction data-mining go golang-api html-parser marketing-tools metadata-extraction research-tools seo-tools web-application web-crawling web-scraping web-tools
Last synced: 15 Apr 2025
https://github.com/alitahir4024/data-collecting-project
A simple data collection project for practising JavaScript local storage skills
data-collection html-css-javascript localstorage
Last synced: 28 Mar 2025
https://github.com/dmdhrumilmistry/githubprofilescraper
Scrapes github profiles and stores data in json format
data-collection dmdhrumilmistry github-scraper python3 scraper
Last synced: 02 Apr 2025
https://github.com/heiderjeffer/misalignment-between-ownership-and-contribution-affects-system-reliability
Research Proposals RP
archtecture data-analysis data-collection nvivo-software python qualitative-analysis quantative-analysis reliability-engineering software-engineering
Last synced: 08 Feb 2025
https://github.com/cpaxton/needlemaster
Free Android game for experiments in learning task structure from human demonstrations. User performances can be saved and exported for research use.
android-game data-collection game lfd needle-master phone
Last synced: 13 May 2025
https://github.com/tuvalabs/django-inapp-survey
In App Survey/Announcement for Django Application
angular announcement-banner campaign data-collection django django-application django-inapp-survey django-rest-framework question-and-answer survey survey-app
Last synced: 12 Feb 2025
https://github.com/artucuno/guild-network-map
Create a map of all mutual guilds that members share with you on Discord.
data-collection data-visualization discord
Last synced: 14 Apr 2025
https://github.com/kwokhing/exploratory-data-analysis-on-smrt-tweets
Demo of exploratory data analysis (EDA) on train service disruptions, based on scraped user-generated tweets from the train operator's (SMRT) Twitter account
data-analysis data-cleaning data-collection data-preparation exploratory-data-analysis exploratory-data-visualizations folium geospatial-data leaflet-map python python3 regex scraping selenium selenium-python social-media text-processing user-generated-content web-scraping webscraping
Last synced: 02 Dec 2024
https://github.com/firefly-cpp/succulent
Collect POST requests
data-collection data-preprocessing-pipelines data-science esp32 machine-learning raspberry-pi
Last synced: 13 Apr 2025
https://github.com/shivendrra/web-graze
Scrape raw data from various internet sources such as Wikipedia, Internet Archive, Britannica, YouTube, Unsplash, etc.
data-collection data-collection-system data-for-llm data-for-ml webscra webscraper webscrapper-python
Last synced: 13 May 2025
https://github.com/ikstream/dns-handler
Data collection server for the dalec user collection system
collection dalec data data-collection dns dns-server python python3
Last synced: 13 Mar 2025
https://github.com/kutukvpavel/gpibserver
General purpose GPIB data collection server
automation data-collection gpib ieee488 server
Last synced: 29 Apr 2025
https://github.com/reckadon/ml-harwithdts
Repository for Assignment 1 of team KAR.ai. - Human Activity Recognition (HAR) with Decision trees and LLMs
data-collection decision-trees groq-api human-activity-recognition jupyter machine-learning matplotlib pandas pca-analysis prompt-engineering sklearn tsfel
Last synced: 07 May 2025
https://github.com/edgee-cloud/meta-capi-component
Meta CAPI Edgee component
data-collection edge edge-computing edgee facebook meta rust wasm wasm-component
Last synced: 02 Mar 2025
https://github.com/aidinhamedi/advanced-arduino-datalogger
This project is an advanced datalogger that logs temperature, humidity, and air pressure. It uses an Adafruit SHT31 sensor for temperature and humidity, and a BMP180 for air pressure.
arduino arduino-ide arduino-mega bluetooth bmp180 c cpp data-collection datalogger sensors sht31 st7735 tft-display
Last synced: 06 Apr 2025
https://github.com/bc100dev/osintgramcxx
A reimplementation of Osintgram, but in C++
cli cli-app command-line command-line-interface command-line-tool cpp data-collection instagram linux-app linux-cli networking open-source osint osint-tool osintgram shell-prompt windows-app windows-cli
Last synced: 29 Dec 2024
https://github.com/ozakboy/taiwan-news-crawlers
.NET-based crawlers for Taiwanese news, with data returned as objects for ease of use
crawler data-collection dataset-generation dotnet news taiwan webcrawlers
Last synced: 15 Apr 2025
https://github.com/abdelhakim-gh/nlp_sentiment_analysis_darija
Fine-tuning a pre-trained model on the Darija dialect for the sentiment analysis task
data-collection data-integration data-preprocessing embeddings fine-tuning-llm gradio-interface llm lora natural-language-processing vizualisation
Last synced: 15 Apr 2025
https://github.com/tathithienthanh/finaltest_database-sql-data-collection-for-ds
The final test of the "Database SQL and Data Collection for Data Science" course from The Ho Chi Minh City University of Science (19/09/2023)
chrome data-collection data-processing database final-test ipynb-jupyter-notebook mysql pymysql query scraping-websites selenium sql statistics visualization
Last synced: 05 May 2025
https://github.com/simonblanke/search-data-collector
Thread safe and atomic data collection into csv-files
csv data-collection hyperactive pandas python
Last synced: 05 May 2025
https://github.com/aaronspindler/frcscouting.ca
Public Web Scouting for First Robotics Competition
data-collection first first-robotics first-robotics-competition first-robotics-scouting frc frc-robotics-scouting frc-scouting frc-scouts robot-data-collection robotics robotics-scouting robots scouting
Last synced: 04 Apr 2025
https://github.com/jinsyin/datalink
⚡ Data integration | DataLink is a lightweight data integration framework built on top of DataX, Spark and Flink
batch big-data bigdata cdc data data-collection data-exchange data-integration data-pipeline data-synchronization datalink etl flink flink-cdc framework integration pipeline spark streaming
Last synced: 13 Apr 2025
https://github.com/edgee-cloud/demo-html
Demo of Edgee integrated into a basic html website
analytics data-collection edge-computing edgee wasm wasm-component
Last synced: 22 Apr 2025
https://github.com/zeeshanahmad4/global-election-data-gathring-and-cleaning-election-dataset-
Building an election dataset that includes data from 172 countries, covering both presidential and legislative elections.
data-collection election-data international-relations legislative-elections natural-language-processing nlp presidential-elections
Last synced: 01 Apr 2025
https://github.com/yukito0209/is6941-ml-social-media
Code repository for the IS6941 Machine Learning & Social Media Analytics course group project, exploring applications of machine learning in social media data analysis.
bert city-university-of-hong-kong crawler data-collection llama machine-learning python sentiment-analysis social-media
Last synced: 01 Apr 2025
https://github.com/code-jl/nfl-point-kicker-data-scraper
A Python-based web scraping toolkit that extracts and processes NFL kicking statistics from Pro-Football-Reference. This project automates the collection of comprehensive game data, with a particular focus on field goal attempts and environmental conditions.
automation beautifulsoup csv data-analysis data-collection field-goals football-statistics kicking-stats nfl python selenium sports-analysis statistics weather-data web-scraping
Last synced: 14 Apr 2025
https://github.com/mehdibo/PHPForms
A set of tools to help you create forms and export data as fast as possible
data-collection form-generator form-validation forms php-library php7
Last synced: 29 Apr 2025
https://github.com/mehdibo/phpforms
A set of tools to help you create forms and export data as fast as possible
data-collection form-generator form-validation forms php-library php7
Last synced: 28 Mar 2025