An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/splorg/sage

A scraper to get every quote from a book off of Goodreads.

books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping

Last synced: 12 Jun 2025

https://github.com/forattini-dev/crawlex

The stealth crawler that actually looks like Chrome.

crawler stealth

Last synced: 14 May 2026

https://github.com/46319943/ganji_community

爬取赶集网上各个城市的小区信息

crawler ganji ganjispider

Last synced: 18 Jan 2026

https://github.com/aminehsan/datamining-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scraping

Last synced: 25 Jul 2025

https://github.com/n3d1117/sisop17

Esercizio per esame di Sistemi Operativi - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 06 Apr 2025

https://github.com/sxoxgxi/webcrawler

A multi threaded web crawler

crawler python webcrawling

Last synced: 28 Jul 2025

https://github.com/yggverse/yps

YPS - Yggdrasil Port Scanner

cli crawler network port scanner tcp tool udp yggdrasil

Last synced: 03 Jul 2025

https://github.com/diegojromerolopez/relwrac

A basic crawler developed with python and asyncio

asyncio crawler page-rank python

Last synced: 11 Nov 2025

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 19 Oct 2025

https://github.com/longluo/spider

My Python Spider / Crawler

crawler python spider twitter weibo weibo-crawler weibo-spider

Last synced: 11 Jun 2025

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scraper are way funnier!

crawler html-scraper scraper

Last synced: 24 Mar 2025

https://github.com/evansuner/smartproxypool

智能代理,自动获取可用高匿代理

crawler fastapi proxy python

Last synced: 15 May 2026

https://github.com/agucova/needs-seeding

🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.

crawler sci-hub torrents

Last synced: 12 Oct 2025

https://github.com/andrepradika/scrape-medrecruit.medworld.com

🛠 A Playwright-based web scraper that extracts job listings from MedRecruit, including job title, department, location, job type, duration, and job URL, saving the data to an Excel file.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/jefftriplett/pholcidae-demo

:spider: A Pholcidae demo for crawling/spidering a website

crawler csv pholcidae python scrapper scrapy-crawler spider toml

Last synced: 22 Jul 2025

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 21 Mar 2025

https://github.com/andrepradika/scrape-xpel.com

📌 A Playwright-based web scraper that extracts installer details from XPEL’s Installer Locator and saves them to CSV and Excel files.

crawler scrape

Last synced: 17 Mar 2025

https://github.com/pjt3591oo/python-parse

this are modules for url pasing

crawler

Last synced: 04 Aug 2025

https://github.com/rayspock/go-web-crawler

A web crawler to fetch all the links from a given website via go routines.

concurrency crawler golang goroutine

Last synced: 10 Jun 2026

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/maddevsio/spiderwoman

"Vertical" crawler, which main target is to count links (resolved, e.g. from bit.ly) to external domains from all pages of given resources

big-data count-links crawler golang

Last synced: 19 May 2026

https://github.com/yuchenq/comp90055-project

This is the lastest version of my project belong to Comp90055.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 16 Jul 2025

https://github.com/balintpethe/laravel-universal-scraper

Universal Scraper for Laravel

crawler laravel scraper web-scraper

Last synced: 13 Jan 2026

https://github.com/ronierisonmaciel/crawler

Um crawler utilizando BeautifulSoup tem como objetivo extrair informações de sites de maneira eficiente e estruturada. BeautifulSoup é uma biblioteca Python que facilita a análise e extração de dados de páginas HTML e XML. O projeto permite coletar e organizar informações relevantes.

beautifulsoup4 crawler crawling python python3

Last synced: 26 Mar 2025

https://github.com/casatrick/solana-transaction-crawler

crawl & parse solana transaction

crawler parser rust solana transaction

Last synced: 20 Jun 2026

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 08 Apr 2025

https://github.com/bersegosx/exparic

Web parser via yaml config

crawler parser yaml-configuration

Last synced: 21 Oct 2025

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web Scrapper that scrap the whole problemset of Codeforces into csv or json file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 01 Jun 2026

https://github.com/athulmurali/flickr-api-docs-crawler

A python based crawler that extracts the documentation of apis and writes it into a file as JSON. A beautiful documentation page can be built from the JSON file using Docusaurus

api beautifulsoup4 crawler documentation python3

Last synced: 18 Jun 2026

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 16 Jan 2026

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 08 Feb 2026

https://github.com/prorobot-ai/worker

A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).

crawler golang grpc-server scraper

Last synced: 29 Jul 2025

https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb

Crawl google scholar profiles with selenium, store the extracted data in the MongoDB and serve the queries with FastAPI.

crawler fastapi google-scholar mongodb python selenium

Last synced: 16 Apr 2026

https://github.com/weizujie/python3-spider

Python 写的一些爬虫小脚本

crawler python3

Last synced: 18 May 2026

https://github.com/billy0402/scrapy-tutorial

A learning project from the book 'Scrapy一本就精通'.

course crawler docker mongodb mysql proxy python redis scrapy splash sqlite ubuntu

Last synced: 13 Apr 2026

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 15 Jul 2025

https://github.com/eneax/web-crawler

A web crawler built in Node.js

crawler javascript nodejs web-crawler

Last synced: 15 Apr 2026

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 15 Mar 2025

https://github.com/sauerbraten/monzter

Link crawler with configurable maximum depth and rate limiting

crawler go golang web-crawler

Last synced: 23 May 2026

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 28 May 2026

https://github.com/amirsorouri00/crawler

Page-Rank Public python2 projects whice have been turned into python3.

crawler page-rank python

Last synced: 05 Sep 2025

https://github.com/c17an/grade-tracer

👨‍💻 항공대 성적변동 추적 크롤러 🏑

concurrently crawler es6 express nodejs nodemon puppeteer react

Last synced: 13 Apr 2026

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 03 Jan 2026

https://github.com/fusetim/bitcrawler

Small experiments to learn a bit more about BitTorrent, DHT and etc. Might also be a BitTorrent DHT crawler one day?

bittorrent crawler dht

Last synced: 30 Mar 2025

https://github.com/gabrielolobo/crawley

This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.

crawler poetry python scrapping

Last synced: 22 Jun 2025

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 30 Jul 2025

https://github.com/jauharibill/animeindo-crawler

this crawler is used for research only. the creator doesn't take any responsibility for any harmful usage

crawler python3 scrapy

Last synced: 08 Jul 2025

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Jan 2026

https://github.com/ggteixeira/motorcycle-simulator

A toy project that fetches prices from motorcycles from OLX and does some calculations for those who want to buy them..

crawler motorcycle olx scraper

Last synced: 28 Feb 2025

https://github.com/izh318/genie-music-artist-album-crawler

지니뮤직에 등록 되어 있는 특정 아티스트의 앨범 정보를 한 번에 크롤링 하는 Python Script 입니다.

crawler genie genie-music gui

Last synced: 08 Nov 2025

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 10 May 2026

https://github.com/freakwill/mycrawlers

🕷 My Crawlers for Movies、Information、Encyclopedia...

baidu crawler douban movie quotes taobao

Last synced: 21 Mar 2025

https://github.com/murilobsd/icrop-csv

Icrop-csv para automatizar o processo do download dos relatórios.

crawler csv-export python3

Last synced: 17 Nov 2025

https://github.com/zhou-chaoxian/ax-spider

A simple, powerful, and fast asynchronous Python crawler framework.

asyncio ax-spider crawler httpx python scrapy

Last synced: 18 Mar 2025

https://github.com/iomarmochtar/imagecrawler

Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+

crawler python-library

Last synced: 14 May 2025

https://github.com/tech-espm/misc-webbot

This project is aimed on creating personal assistants for replying messages about specifics issues.

classification-model crawler nlp

Last synced: 12 Jun 2026

https://github.com/shashankgroovy/crawler

Python crawler

crawler python webcrawler

Last synced: 30 Jul 2025

https://github.com/guillempuche/news_crawler

Scrape news from Olot town hall (https://www.olot.cat) with TypeScript and Crawlee. Collects summaries and full articles, stored in separate datasets.

biomejs crawlee crawler news-crawler olot townhall yarn-berry

Last synced: 23 Oct 2025

https://github.com/kweonminsung/crawl2toast

Real-time toast notification of crawled data with CSS selectors(Windows Only)

beautifulsoup4 crawler selenium tkinter toast-notifications

Last synced: 18 May 2026

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 16 Mar 2025

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler coursera video's caption / subtitle

caption chromeless coursera crawler crx subtitle

Last synced: 31 Mar 2025

https://github.com/linjonh/videowebsidesparser

This Project is used to parse a video web side to remove ads.

crawler parser python

Last synced: 13 Jun 2025

https://github.com/cristiangreco/gcrawler

A simple (not concurrent) web crawler written in Java.

crawler java

Last synced: 30 Jul 2025

https://github.com/rebrowser/iheart-dataset

iHeart radio station database: 3,600+ stations with call letters, formats, markets, cume audience, stream URLs, and 185M+ daily airplay records. Updated daily.

airplay crawler data-collection data-science dataset datasets iheart music-data open-data radio radio-stations scraper web-scraping

Last synced: 03 May 2026

https://github.com/arman-aminian/divar-text-exploring

The first practice of Dr. Asgari's NLP lesson - Data Exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 15 Jun 2026

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 15 Jun 2026

https://github.com/oleksandr-moik/spring-boot-web-crawler

Web Crawler app on Spring Boot. Getting categories and relevant news category.

crawler gradle java spring-boot

Last synced: 03 May 2026

https://github.com/yann-github/webcrawler-http

Command line application to crawl a website and generate a report of internal linking structure

crawler csv-format javascript jest node report tdd

Last synced: 03 May 2026

https://github.com/tetreum/price-crawler

Article price crawler

crawler nodejs

Last synced: 26 Apr 2026

https://github.com/taiizor/gocrawler

A high-performance web crawler with concurrent processing capabilities written in Go.

crawler csv go golang golang-application golang-library json storage url web

Last synced: 26 Apr 2026

https://github.com/bingxyz/btcethcrawler

telegram 比特幣、乙太幣廣播頻道

bash bash-script crawler telegram-bot

Last synced: 26 Apr 2026

https://github.com/qeqqe/cog

An MCP integerated intelligent RAG that gives relevent context to LLM's through crawled Docs

backend-api claude-desktop crawl4ai crawler fastapi mcp python rag sementic-chunking

Last synced: 04 May 2026

https://github.com/palpitate-xus/sge_data_insert

利用Github Actions实现自动获取sge数据并存入数据库

crawler mysql python

Last synced: 26 Apr 2026

https://github.com/tpeterw/summariser

summarizer for pdf and text based uploads

crawler hackathon nlp node nodejs python

Last synced: 15 Jun 2026

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 04 May 2026

https://github.com/zzzzer91/match_spider

某菠菜网站爬虫,该网站已倒闭:disappointed_relieved:

crawler python

Last synced: 16 Jun 2026

https://github.com/kareemsasa3/arachne

A resilient, concurrent web scraper service built in Go, featuring a REST API, Redis-backed job queue, and circuit breaker for fault tolerance.

asynchronous circuit-breaker concurrency crawler docker docker-compose go golang job-queue rate-limiting redis rest-api web-scraper web-scraping

Last synced: 04 May 2026

https://github.com/bennettdams/vace-it-crawler

Python (Scrapy) crawler to access data of FACEIT.com

crawler python scrapy

Last synced: 03 Jun 2026

https://github.com/liu233w/ojhunt-lite

A lightweight async Python tool for querying Online Judge (OJ) statistics across multiple platforms. Track your accepted problems (AC) and total submissions from 29+ competitive programming platforms.

acm-icpc codechef-api codeforces-api crawler spoj-api

Last synced: 05 May 2026

https://github.com/zzzzer91/chinaxinge

chinaxinge 爬虫。

crawler python python3

Last synced: 17 Jun 2026

https://github.com/basemax/crawleryjc

This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.

crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news

Last synced: 05 May 2026

https://github.com/hileix/jjxy-lib-search

图书馆书籍查询爬虫工具

crawler expressjs nodejs phantomjs request

Last synced: 05 May 2026

https://github.com/monumentality/ifiend

Check latest YouTube uploads without leaving the comfort of your terminal.

crawler headless-chrome terminal-based youtube yt-dlp

Last synced: 25 Apr 2026

https://github.com/lanesun/one-link

"One Link to rule them all."

crawler curl http svelte web-service

Last synced: 05 May 2026

https://github.com/fauzaanu/markdown-crawler

Python tool that crawls websites and neatly saves their text content into markdown files, providing a convenient way to archive the text content of the web locally

crawler llm markdown rag scraper

Last synced: 06 May 2026

https://github.com/tribecabrasil/tribeca-insights

Modular Python CLI for content extraction, term frequency analysis, and SEO reporting

analytics crawler django insights seo

Last synced: 06 May 2026

https://github.com/lsongdev/node-crawler

simple crawler

crawler node-crawler

Last synced: 06 May 2026

https://github.com/igapyon/selecrawler

Simple selenium based web crawler

chrome crawler java selenium web

Last synced: 06 May 2026

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler build on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 06 May 2026