An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/engageintellect/scrapers

A repository of web scrapers using Python & Scrapy

crawler python scrapy spider

Last synced: 31 Mar 2025

https://github.com/insectmk/douban-crawler

Douban Movie Top 250 crawler and data visualization

analysis crawler django echarts mysql python3 website

Last synced: 10 Mar 2026

https://github.com/splorg/sage

A scraper to get every quote from a book off of Goodreads.

books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping

Last synced: 12 Jun 2025

https://github.com/n3d1117/sisop17

Exercise for the Operating Systems exam - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 06 Apr 2025

https://github.com/46319943/ganji_community

Crawls residential community information for each city on Ganji.com

crawler ganji ganjispider

Last synced: 18 Jan 2026

https://github.com/pjt3591oo/python-parse

These are modules for URL parsing

crawler

Last synced: 04 Aug 2025

https://github.com/sxoxgxi/webcrawler

A multi-threaded web crawler

crawler python webcrawling

Last synced: 28 Jul 2025

https://github.com/sevenecks/web-crawler

Crawl a website, find pages, find links, find relationships between them, and report on 404 and other errors

404 checker crawler site web

Last synced: 21 Jun 2025

https://github.com/yuchenq/comp90055-project

This is the latest version of my project for Comp90055.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 16 Jul 2025

https://github.com/balintpethe/laravel-universal-scraper

Universal Scraper for Laravel

crawler laravel scraper web-scraper

Last synced: 13 Jan 2026

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 08 Apr 2025

https://github.com/evansuner/smartproxypool

Smart proxy pool that automatically obtains usable high-anonymity proxies

crawler fastapi proxy python

Last synced: 15 May 2026

https://github.com/diegojromerolopez/relwrac

A basic crawler developed with Python and asyncio

asyncio crawler page-rank python

Last synced: 11 Nov 2025

https://github.com/jefftriplett/pholcidae-demo

🕷 A Pholcidae demo for crawling/spidering a website

crawler csv pholcidae python scrapper scrapy-crawler spider toml

Last synced: 22 Jul 2025

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 19 Oct 2025

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 21 Mar 2025

https://github.com/forattini-dev/crawlex

The stealth crawler that actually looks like Chrome.

crawler stealth

Last synced: 14 May 2026

https://github.com/rayspock/go-web-crawler

A web crawler that fetches all the links from a given website using goroutines.

concurrency crawler golang goroutine

Last synced: 10 Jun 2026

https://github.com/maddevsio/spiderwoman

"Vertical" crawler, which main target is to count links (resolved, e.g. from bit.ly) to external domains from all pages of given resources

big-data count-links crawler golang

Last synced: 19 May 2026

https://github.com/andrepradika/scrape-medrecruit.medworld.com

🛠 A Playwright-based web scraper that extracts job listings from MedRecruit, including job title, department, location, job type, duration, and job URL, saving the data to an Excel file.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/andrepradika/scrape-xpel.com

📌 A Playwright-based web scraper that extracts installer details from XPEL’s Installer Locator and saves them to CSV and Excel files.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/lfsc09/crawl-this-go

Simple CLI tool for crawling pdf documents and html pages

crawler go

Last synced: 18 Jun 2025

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 08 Feb 2026

https://github.com/ilovebacteria/digikala-api

This Python package sends requests to the Digikala API and retrieves product details.

crawler digikala pypi

Last synced: 11 Feb 2026

https://github.com/casatrick/solana-transaction-crawler

Crawl & parse Solana transactions

crawler parser rust solana transaction

Last synced: 20 Jun 2026

https://github.com/lilchen96/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 28 Dec 2025

https://github.com/bersegosx/exparic

Web parser via yaml config

crawler parser yaml-configuration

Last synced: 21 Oct 2025

https://github.com/prorobot-ai/worker

A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).

crawler golang grpc-server scraper

Last synced: 29 Jul 2025

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 16 Jan 2026

https://github.com/athulmurali/flickr-api-docs-crawler

A Python-based crawler that extracts API documentation and writes it to a file as JSON. A documentation page can then be built from the JSON file using Docusaurus.

api beautifulsoup4 crawler documentation python3

Last synced: 18 Jun 2026

https://github.com/eneax/web-crawler

A web crawler built in Node.js

crawler javascript nodejs web-crawler

Last synced: 15 Apr 2026

https://github.com/sauerbraten/monzter

Link crawler with configurable maximum depth and rate limiting

crawler go golang web-crawler

Last synced: 23 May 2026

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 28 May 2026

https://github.com/weizujie/python3-spider

Some small crawler scripts written in Python

crawler python3

Last synced: 18 May 2026

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web scraper that scrapes the whole Codeforces problemset into a CSV or JSON file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 01 Jun 2026

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 15 Mar 2025

https://github.com/billy0402/scrapy-tutorial

A learning project from the book 'Scrapy一本就精通'.

course crawler docker mongodb mysql proxy python redis scrapy splash sqlite ubuntu

Last synced: 13 Apr 2026

https://github.com/agucova/needs-seeding

🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.

crawler sci-hub torrents

Last synced: 12 Oct 2025

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 03 Jan 2026

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 30 Jul 2025

https://github.com/zhou-chaoxian/ax-spider

A simple, powerful, and fast asynchronous Python crawler framework.

asyncio ax-spider crawler httpx python scrapy

Last synced: 18 Mar 2025

https://github.com/amirsorouri00/crawler

Page-Rank public Python 2 projects which have been converted to Python 3.

crawler page-rank python

Last synced: 05 Sep 2025

https://github.com/c17an/grade-tracer

👨‍💻 Crawler that tracks grade changes at Korea Aerospace University 🏑

concurrently crawler es6 express nodejs nodemon puppeteer react

Last synced: 13 Apr 2026

https://github.com/izh318/genie-music-artist-album-crawler

A Python script that crawls, in one go, the album information of a specific artist registered on Genie Music.

crawler genie genie-music gui

Last synced: 08 Nov 2025

https://github.com/shashankgroovy/crawler

Python crawler

crawler python webcrawler

Last synced: 30 Jul 2025

https://github.com/gabrielolobo/crawley

This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.

crawler poetry python scrapping

Last synced: 22 Jun 2025

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Jan 2026

https://github.com/cristiangreco/gcrawler

A simple (not concurrent) web crawler written in Java.

crawler java

Last synced: 30 Jul 2025

https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb

Crawl google scholar profiles with selenium, store the extracted data in the MongoDB and serve the queries with FastAPI.

crawler fastapi google-scholar mongodb python selenium

Last synced: 16 Apr 2026

https://github.com/ggteixeira/motorcycle-simulator

A toy project that fetches motorcycle prices from OLX and does some calculations for those who want to buy one.

crawler motorcycle olx scraper

Last synced: 28 Feb 2025

https://github.com/fusetim/bitcrawler

Small experiments to learn a bit more about BitTorrent, DHT, etc. Might also be a BitTorrent DHT crawler one day?

bittorrent crawler dht

Last synced: 30 Mar 2025

https://github.com/freakwill/mycrawlers

🕷 My Crawlers for Movies, Information, Encyclopedia...

baidu crawler douban movie quotes taobao

Last synced: 21 Mar 2025

https://github.com/jauharibill/animeindo-crawler

This crawler is for research use only. The creator doesn't take any responsibility for any harmful usage.

crawler python3 scrapy

Last synced: 08 Jul 2025

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 31 Mar 2025

https://github.com/iomarmochtar/imagecrawler

Simple image crawler that follows links recursively; no dependencies needed, for Python 2.7+

crawler python-library

Last synced: 14 May 2025

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 10 May 2026

https://github.com/imrany/spindle

An open-source, lightweight web crawler and scraper. It can discover links on the web (crawler) and extract structured data from webpages (scraper).

crawler go golang scraper

Last synced: 24 Sep 2025

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 15 Jul 2025

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler for Coursera video captions / subtitles

caption chromeless coursera crawler crx subtitle

Last synced: 31 Mar 2025

https://github.com/dappros/site_crawler

Site crawler used in Ethora platform as an option to import your specific business data into your AI agent chat bot.

crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing

Last synced: 20 Jan 2026

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 16 Mar 2025

https://github.com/guillempuche/news_crawler

Scrape news from Olot town hall (https://www.olot.cat) with TypeScript and Crawlee. Collects summaries and full articles, stored in separate datasets.

biomejs crawlee crawler news-crawler olot townhall yarn-berry

Last synced: 23 Oct 2025

https://github.com/linjonh/videowebsidesparser

This project parses a video website to remove ads.

crawler parser python

Last synced: 13 Jun 2025

https://github.com/zzzzer91/match_spider

Crawler for a certain gambling website; the site has since shut down 😥

crawler python

Last synced: 16 Jun 2026

https://github.com/oleksandr-moik/spring-boot-web-crawler

Web crawler app built on Spring Boot. Fetches categories and the news relevant to each category.

crawler gradle java spring-boot

Last synced: 03 May 2026

https://github.com/yann-github/webcrawler-http

Command line application to crawl a website and generate a report of internal linking structure

crawler csv-format javascript jest node report tdd

Last synced: 03 May 2026

https://github.com/tetreum/price-crawler

Article price crawler

crawler nodejs

Last synced: 26 Apr 2026

https://github.com/taiizor/gocrawler

A high-performance web crawler with concurrent processing capabilities written in Go.

crawler csv go golang golang-application golang-library json storage url web

Last synced: 26 Apr 2026

https://github.com/bingxyz/btcethcrawler

Telegram broadcast channel for Bitcoin and Ethereum

bash bash-script crawler telegram-bot

Last synced: 26 Apr 2026

https://github.com/qeqqe/cog

An MCP-integrated intelligent RAG that gives relevant context to LLMs through crawled docs

backend-api claude-desktop crawl4ai crawler fastapi mcp python rag sementic-chunking

Last synced: 04 May 2026

https://github.com/palpitate-xus/sge_data_insert

Uses GitHub Actions to automatically fetch SGE data and store it in a database

crawler mysql python

Last synced: 26 Apr 2026

https://github.com/bennettdams/vace-it-crawler

Python (Scrapy) crawler to access data of FACEIT.com

crawler python scrapy

Last synced: 03 Jun 2026

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 04 May 2026

https://github.com/zzzzer91/chinaxinge

chinaxinge crawler.

crawler python python3

Last synced: 17 Jun 2026

https://github.com/kareemsasa3/arachne

A resilient, concurrent web scraper service built in Go, featuring a REST API, Redis-backed job queue, and circuit breaker for fault tolerance.

asynchronous circuit-breaker concurrency crawler docker docker-compose go golang job-queue rate-limiting redis rest-api web-scraper web-scraping

Last synced: 04 May 2026

https://github.com/liu233w/ojhunt-lite

A lightweight async Python tool for querying Online Judge (OJ) statistics across multiple platforms. Track your accepted problems (AC) and total submissions from 29+ competitive programming platforms.

acm-icpc codechef-api codeforces-api crawler spoj-api

Last synced: 05 May 2026

https://github.com/igorbrizack/crawler-web

Web data collection application with ReactJS and Python - REST API

beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper

Last synced: 16 Apr 2026

https://github.com/basemax/crawleryjc

This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.

crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news

Last synced: 05 May 2026

https://github.com/hileix/jjxy-lib-search

Library book search crawler tool

crawler expressjs nodejs phantomjs request

Last synced: 05 May 2026

https://github.com/monumentality/ifiend

Check latest YouTube uploads without leaving the comfort of your terminal.

crawler headless-chrome terminal-based youtube yt-dlp

Last synced: 25 Apr 2026

https://github.com/lanesun/one-link

"One Link to rule them all."

crawler curl http svelte web-service

Last synced: 05 May 2026

https://github.com/fauzaanu/markdown-crawler

Python tool that crawls websites and neatly saves their text content into markdown files, providing a convenient way to archive the text content of the web locally

crawler llm markdown rag scraper

Last synced: 06 May 2026

https://github.com/rodrigorvsn/ace

🔥 Receive an email with the hottest promotions every day

crawler cronjob nextjs prisma puppeteer react-email resend

Last synced: 17 Apr 2026

https://github.com/tribecabrasil/tribeca-insights

Modular Python CLI for content extraction, term frequency analysis, and SEO reporting

analytics crawler django insights seo

Last synced: 06 May 2026

https://github.com/lsongdev/node-crawler

simple crawler

crawler node-crawler

Last synced: 06 May 2026

https://github.com/igapyon/selecrawler

Simple selenium based web crawler

chrome crawler java selenium web

Last synced: 06 May 2026

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 06 May 2026