Crawler | Ecosyste.ms: Awesome

https://github.com/flaribbit/pixiv-favorites-list

Crawl Pixiv favorites and save them in JSON format

crawler pixiv python

Last synced: 21 Apr 2026

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 05 Jun 2026

https://github.com/illm4tic/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 21 Apr 2026

https://github.com/v-bible/crawler

A collection of web crawlers to crawl Catholic resources in Vietnamese language

catholic corpus-linguistics crawler nlp playwright

Last synced: 22 Apr 2026

https://github.com/thc1006/nycu_timtable_crawler

🎓 NYCU Course Data Crawler & Timetable System | 國立陽明交通大學課程爬蟲與選課系統 - Python web scraper for course schedules, syllabi & educational data analysis. Crawls 18K+ courses with 98% success rate. Features: interactive timetable, JSON API, Google Colab support, batch processing, resume capability.

academic course course-selection crawler data-analysis education educational-data google-colab json-api nycu open-data python schedule student-tools syllabus taiwan timetable university web-automation web-scraping

Last synced: 24 Apr 2026

https://github.com/theshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 24 Apr 2026

https://github.com/dnlzrgz/excursionist

Scrapy-powered flight price crawler.

crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling

Last synced: 24 Apr 2026

https://github.com/monumentality/ifiend

Check latest YouTube uploads without leaving the comfort of your terminal.

crawler headless-chrome terminal-based youtube yt-dlp

Last synced: 25 Apr 2026

https://github.com/liu233w/ojhunt-lite

A lightweight async Python tool for querying Online Judge (OJ) statistics across multiple platforms. Track your accepted problems (AC) and total submissions from 29+ competitive programming platforms.

acm-icpc codechef-api codeforces-api crawler spoj-api

Last synced: 05 May 2026

https://github.com/palpitate-xus/sge_data_insert

Use GitHub Actions to automatically fetch SGE data and store it in a database

crawler mysql python

Last synced: 26 Apr 2026

https://github.com/bingxyz/btcethcrawler

Telegram broadcast channel for Bitcoin and Ethereum

bash bash-script crawler telegram-bot

Last synced: 26 Apr 2026

https://github.com/taiizor/gocrawler

A high-performance web crawler with concurrent processing capabilities written in Go.

crawler csv go golang golang-application golang-library json storage url web

Last synced: 26 Apr 2026

https://github.com/tetreum/price-crawler

Article price crawler

crawler nodejs

Last synced: 26 Apr 2026

https://github.com/mg98/ipfs-replicate

Replicate IPFS' distributed data structure locally, based on network traces.

crawler dag ipfs redisgraph scraper

Last synced: 02 May 2026

https://github.com/twknab/django_ajax_web_crawler

Web crawler which retrieves all links on any page. Python & Django-powered.

beautifulsoup4 crawler django-application

Last synced: 27 Apr 2026

https://github.com/martinkennelly/websitesearchcrawler

Website Crawler

crawler java website

Last synced: 27 Apr 2026

https://github.com/dearvn/crawl-mortgage-broker

A script to crawl data from website https://findamortgagebroker.com/

crawler findamortgagebroker mortgage-lenders mortgage-loans nmls php7 python3 seleniumbase

Last synced: 28 Apr 2026

https://github.com/zzzzer91/crash

A general-purpose multithreaded crawler framework.

crawler framework python

Last synced: 28 Apr 2026

https://github.com/justserpapi/web-html

JustSerpAPI Crawl Webpage HTML API Python SDK examples, with related Google Search API, Google Lens API, Google Maps API, Google News API, Google Shopping API, Google Scholar API, Google Finance API, Google Trends API, Google Jobs API, Google Patents API, Google Hotels API, and Web APIs.

crawler google-finance-api google-hotels-api google-jobs-api google-lens-api google-maps-api google-news-api google-patents-api google-scholar-api google-search-api google-shopping-api google-trends-api html-api justserpapi python serp-api web-crawling web-html-api web-scraping

Last synced: 08 Jun 2026

https://github.com/kkuvam/web-scrape

Web Scraping Technology Evaluation - Evaluation of different web scraping technologies in Python, with a focus on Requests, BeautifulSoup, and Scrapy. Benchmarked each technology for ease of use, performance, scalability, and maintainability

beautifulsoup crawler requests scraping scrapy

Last synced: 28 Apr 2026

https://github.com/cseas/crawler

Recursive web crawler

crawler python seed-webpage

Last synced: 28 Apr 2026

https://github.com/frobware/grawler

Web Crawler

crawler go

Last synced: 08 Jun 2026

https://github.com/josepedrodias/naivebot

Attempt to mimic Googlebot behaviour in Node.js with Nightmare.js

crawler googlebot nightmarejs nodejs robots

Last synced: 29 Apr 2026

https://github.com/chunkingz/youtubelinks-scraper

A Python script that scrapes YouTube links from a predefined website of your choice.

crawler python scraper spider websitescraper youtube

Last synced: 29 Apr 2026

https://github.com/antash-mishra/huskyai

Democratizing News Feed

celery crawler flask llama news nextjs

Last synced: 29 Apr 2026

https://github.com/ryu1kn/procedural-page-crawler

Page Crawler. Tell it where to go and what to look for.

crawler npm-package scraper

Last synced: 30 Apr 2026

https://github.com/antoniowd/crawly

A web crawler for exploring the web in search of specific information (emails, phone numbers, etc.)

crawler got jsdom nodejs webcrawler webscraping

Last synced: 01 May 2026

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese Wikipedia using the MediaWiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 01 May 2026

https://github.com/qqxs/usda_pomological_watercolors

Crawl data from the USDA Pomological Watercolors collection

crawler koa2 nodejs watercolors

Last synced: 01 May 2026

https://github.com/luciopaiva/dicio-crawler

Node.js crawler for dicio.com.br.

crawler nodejs scraper

Last synced: 02 May 2026

https://github.com/cold-bin/jwzx-mail

A cqupt-jwzx crawler application built with Go

crawler golang

Last synced: 09 Jun 2026

https://github.com/aristotelesbr/api_quotes

Project test for job.

crawler mongodb rails5

Last synced: 02 May 2026

https://github.com/alexnthnz/web-crawler

Scalable web crawler built with Python, Redis, and Cassandra, inspired by Alex Xu's design. Crawls, indexes, and stores web content with robots.txt compliance and duplicate detection.

crawler python

Last synced: 03 May 2026

https://github.com/soffits/oogc-resource-index

Spreadsheet-ready OOGC resource indexing with incremental crawl, authenticated download URLs, and Seafile export.

agpl-3 automation cli crawler python uv

Last synced: 03 May 2026

https://github.com/rebrowser/iaai-dataset

IAAI salvage auction data: vehicle listings with loss types, damage codes, title brands, mileage, drivetrain, condition grades, and branch locations. Updated daily.

automotive-data crawler data-collection data-science dataset iaai insurance-auto-auction open-data parquet salvage-auction salvage-vehicles scraper total-loss vehicle-auction web-scraping

Last synced: 03 May 2026

https://github.com/rebrowser/iheart-dataset

iHeart radio station database: 3,600+ stations with call letters, formats, markets, cume audience, stream URLs, and 185M+ daily airplay records. Updated daily.

airplay crawler data-collection data-science dataset datasets iheart music-data open-data radio radio-stations scraper web-scraping

Last synced: 03 May 2026

https://github.com/oleksandr-moik/spring-boot-web-crawler

Web crawler app built on Spring Boot. Fetches categories and the news relevant to each category.

crawler gradle java spring-boot

Last synced: 03 May 2026

https://github.com/yann-github/webcrawler-http

Command line application to crawl a website and generate a report of internal linking structure

crawler csv-format javascript jest node report tdd

Last synced: 03 May 2026

https://github.com/qeqqe/cog

An MCP-integrated intelligent RAG that gives relevant context to LLMs through crawled docs

backend-api claude-desktop crawl4ai crawler fastapi mcp python rag sementic-chunking

Last synced: 04 May 2026

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 04 May 2026

https://github.com/kareemsasa3/arachne

A resilient, concurrent web scraper service built in Go, featuring a REST API, Redis-backed job queue, and circuit breaker for fault tolerance.

asynchronous circuit-breaker concurrency crawler docker docker-compose go golang job-queue rate-limiting redis rest-api web-scraper web-scraping

Last synced: 04 May 2026

https://github.com/basemax/crawleryjc

This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.

crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news

Last synced: 05 May 2026

https://github.com/hileix/jjxy-lib-search

A crawler tool for searching library books

crawler expressjs nodejs phantomjs request

Last synced: 05 May 2026

https://github.com/yukihirai0505/streamcrawler

akka stream × crawler

akka-streams crawler elasticsearch instagram sbt scala

Last synced: 05 May 2026

https://github.com/lanesun/one-link

"One Link to rule them all."

crawler curl http svelte web-service

Last synced: 05 May 2026

https://github.com/fauzaanu/markdown-crawler

Python tool that crawls websites and neatly saves their text content into markdown files, providing a convenient way to archive the text content of the web locally

crawler llm markdown rag scraper

Last synced: 06 May 2026

https://github.com/tribecabrasil/tribeca-insights

Modular Python CLI for content extraction, term frequency analysis, and SEO reporting

analytics crawler django insights seo

Last synced: 06 May 2026

https://github.com/igapyon/selecrawler

Simple selenium based web crawler

chrome crawler java selenium web

Last synced: 06 May 2026

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 06 May 2026

https://github.com/hasdata/find-urls-from-any-domain

This repository provides practical examples of website link scraping using Python and Node.js.

ai-extraction crawler hasdata-api nodejs python sitemap-parser url-extraction web-crawling web-scraping

Last synced: 06 May 2026

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 07 May 2026

https://github.com/tylpk1216/new-taipei-parkinfo

Find the available parking in New Taipei, Taiwan.

crawler golang goverment-data

Last synced: 07 May 2026

https://github.com/zhqiang1989/youtube-graph-collector

A demo in Python of how to collect YouTube video engagement graph data

crawler graph video youtube

Last synced: 07 May 2026

https://github.com/ireddragonicy/booruprompt

A simple web application built with NextJS to extract tags from booru websites. Just paste the URL of a booru post, and this tool will fetch and display the associated tags, ready for you to copy.

booru cleaning-data crawler nextjs noobai tags typescript web

Last synced: 07 May 2026

https://github.com/wcygan/crawler

web crawler

crawler crawling tokio tokio-rs web-crawler

Last synced: 08 May 2026

https://github.com/tsaohucn/crawler_fb_page

A crawler that uses Selenium for Facebook pages

crawler facebook-page rails ruby selenium

Last synced: 09 May 2026

https://github.com/allotmentandy/socialmedialinkextractor

PHP Laravel package that extracts social media links from an array of links; used as part of a spider for checking londinium.com website links

crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube

Last synced: 09 May 2026

https://github.com/basemax/okala-product-ids

A PHP script to fetch and save product IDs from Okala's online store API across multiple categories and store branches.

crawler crawler-okala crawler-php crawlers data database ids ir iran json okala okala-crawler php php-crawler product

Last synced: 09 May 2026

https://github.com/xenia101/coro.na

A Web Map Service for the Corona-virus

coronavirus covid-19 crawler crawling flask flask-api json python requests urllib

Last synced: 09 May 2026

https://github.com/catbraaain/search-crawl

Search the web and crawl content stealthily, with optional extraction using LLMs.

crawl crawler fastapi playwright scrape scraping searxng

Last synced: 09 May 2026

https://github.com/a-b-z-b/web-spider

A Humble Web Crawler

crawler docker-compose go mongodb web-crawler

Last synced: 09 May 2026

https://github.com/victorbaumgartner/electron-crawler-ui

Desktop app built with Axios and Electron to crawl websites across multiple servers

app axios crawler desktop electronjs macos multiple-servers multithreading

Last synced: 09 May 2026

https://github.com/machinecyc/lotteryinsight

Uses a crawler to collect Taiwan Lotto data and save it to a local MySQL server.

crawler data docker lottery mysql-database python3 taiwan

Last synced: 09 May 2026

https://github.com/lopins/article-crawler

A simple web article crawling tool that lets you customize which fields to extract; simple and easy to get started with.

article crawler ftp mysql python sqlite3

Last synced: 10 May 2026

https://github.com/khanof89/twitter_scraper

Scrape tweet details from user profile using selenium

crawler scraper selenium twitter twitter-bot

Last synced: 11 May 2026

https://github.com/woshiluo/bilibilicomic-download

bilibili crawler downloader manga

Last synced: 11 May 2026

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 12 May 2026

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 10 Jun 2026

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 12 May 2026

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 13 May 2026

https://github.com/nextlevelshit/node-crawl

Webcrawler for nodejs

crawl crawler javascript nodejs

Last synced: 14 May 2026

https://github.com/scrape-do/dotnet-example

Best Rotating Proxy & Scraping API Alternative. C# Example.

captcha captcha-solver crawler crawlers crawling data-mining data-science data-scraping free free-proxy free-proxy-list proxy proxy-list proxylist rotating-proxy scraper scraping scraping-api scraping-tool

Last synced: 12 Jun 2026

https://github.com/jurooravec/knwldg

Datasets, scrapers, pipelines

companies crawler data dataset non-profit-organizations scraper scrapy

Last synced: 13 Jun 2026

https://github.com/soenneker/soenneker.playwrights.crawler

A configurable Playwright crawler with rich stealth and control options.

browser chrome chromium crawl crawler csharp dotnet playwright playwrightcrawler playwrights scrape scraper stealth util

Last synced: 14 Jun 2026

https://github.com/vhdm/twitter-hashtag-crawler

Twitter hashtag crawler by selenium, without using the Twitter API ;)

crawler python tor twitter

Last synced: 14 Jun 2026

https://github.com/tri613/nespresso

A mobile version of the Nespresso coffee website ☕

crawler nespresso node-js

Last synced: 15 Jun 2026

https://github.com/raspi/scrapy-corsair

Web crawler for Corsair (corsair.com)

crawler hardware memory scrapy spider

Last synced: 15 Jun 2026

https://github.com/arman-aminian/divar-text-exploring

The first practice of Dr. Asgari's NLP lesson - Data Exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 15 Jun 2026

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 15 Jun 2026

https://github.com/tpeterw/summariser

Summarizer for PDF and text-based uploads

crawler hackathon nlp node nodejs python

Last synced: 15 Jun 2026

https://github.com/zzzzer91/match_spider

Crawler for a certain gambling website; the site has since shut down 😥

crawler python

Last synced: 16 Jun 2026

https://github.com/zzzzer91/chinaxinge

chinaxinge crawler.

crawler python python3

Last synced: 17 Jun 2026

https://github.com/mach1el/openproject-crawler

Scraping data on OpenProject

crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling

Last synced: 17 Jun 2026

https://github.com/maxonary/simple-crawler

Streamlit Webscraper

crawler streamlit webscraping

Last synced: 20 Jun 2026

https://github.com/manchittlab/TheCrawler

Open-source web scraper + LLM-powered structured extraction. PDF/DOCX, markdown, JSON-LD, microdata, commerce data, forms, detection of 16 analytics trackers. Structured errors with retryable flags. Adaptive Cheerio->Playwright. CLI, npm, REST API, and MCP server. AGPL-3.0.

agpl apify cheerio crawler llm markdown mcp mcp-server model-context-protocol nodejs playwright rag scraper typescript web-scraping