An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
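The systematic browsing described above is commonly implemented as a breadth-first traversal of links. The following is a minimal sketch, not the method of any project listed here. The `crawl` function, the `fetch` callback, and the in-memory `PAGES` table are hypothetical names introduced for illustration, and a real crawler would also need politeness rules (robots.txt, rate limits) that are omitted.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a hypothetical callback: it takes a URL and returns the
    page HTML, or None when the page cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

The `seen` set keeps the crawler from revisiting pages that link to each other, and swapping `PAGES.get` for a real HTTP fetch function is the only change needed to point the sketch at live sites.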

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and historical numbers of Swisslos

crawler euromillions lotto rust swisslos

Last synced: 22 Mar 2025

https://github.com/geoffreybauduin/website-checker

Performs useful checks against a website, such as 404 error reporting, structured data validation...

crawler seo structured-data web-spider website

Last synced: 19 Apr 2025

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 15 May 2026

https://github.com/elky84/stock-crawler

Naver Stock Crawler & Mock Invest

asp-net asp-net-core crawler csharp dotnet

Last synced: 18 Apr 2026

https://github.com/amirespahbodi/url_crawler

Async Web Crawler for Website Title and Favicon

crawler fastapi pydantic python3 sqlalchemy

Last synced: 15 Apr 2026

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 08 Apr 2025

https://github.com/dean9703111/humandesign_nodejs

Uses a Node.js crawler to scrape information from Human Design web pages and save it to a cloud-hosted Google spreadsheet.

crawler googlesheetapi googlesheets nodejs

Last synced: 15 May 2026

https://github.com/tanja-4732/od-get

A Rust tool for recursively crawling & downloading data from open directories

cli crawler open-directory open-directory-downloader rust

Last synced: 26 May 2026

https://github.com/bingxyz/blackcat

Query Black Cat logistics shipments using a Telegram bot.

crawler nodejs telegram-bot

Last synced: 21 May 2026

https://github.com/sonhm3029/crawl-data-bot

This project builds a base bot for crawling data from the web, including text and image data.

crawler google medical vietnamese

Last synced: 08 Mar 2026

https://github.com/cameronnewman/cli.crawler

Simple cli web crawler

cli crawler golang

Last synced: 14 Jan 2026

https://github.com/opda0887/bahamut-crawler-to-gmail

Idea: use a Python crawler to fetch the latest forum posts from Bahamut boards and send them to myself via Gmail.

crawler crawler-python

Last synced: 21 Mar 2025

https://github.com/srx-2000/swaiter

A program that waits until a Selenium element has loaded.

crawler selenium selenium-python

Last synced: 18 May 2026

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, with a crawler backend in Rust

backend-server crawler data-analysis rust

Last synced: 05 Feb 2026

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base web pages, and all URLs appearing in them, for specific words (regex patterns).

crawler crawler-python crawling-python python3

Last synced: 15 Feb 2026

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 19 Apr 2026

https://github.com/zituocn/ziva

A golang crawler framework

crawler go golang

Last synced: 18 Jan 2026

https://github.com/win7user10/laraue.crawling

A set of tools for quickly writing crawlers on .NET

crawler csharp csharp-crawler parser

Last synced: 17 Aug 2025

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 09 Feb 2026

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 14 May 2026

https://github.com/mc256/node-static-webpage-crawler

Download an entire website along with its directory structure.

cache-server crawler nodejs static-site

Last synced: 16 Apr 2026

https://github.com/jorgeparavicini/medalytik-python

Python crawlers for a job mediation firm

crawler python scrapy

Last synced: 07 Jul 2025

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 27 Feb 2026

https://github.com/konradlinkowski/mailcrawler

Crawler that finds email addresses on websites

crawler scraper

Last synced: 05 Jan 2026
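Finding email addresses in fetched pages, as the mailcrawler entry above describes, is often done with a regular expression. This is a rough sketch under stated assumptions and is not taken from that repository: `EMAIL_RE` and `find_emails` are hypothetical names, and the pattern is a deliberately simple approximation that does not cover every address the mail standards allow.

```python
import re

# Simplified pattern (an assumption, not a full RFC 5322 matcher):
# local part, "@", one or more dot-separated labels, alphabetic TLD.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def find_emails(text):
    """Return unique addresses, lowercased, in first-seen order."""
    found = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in found:
            found.append(address)
    return found


sample = (
    'Contact <a href="mailto:Info@example.com">info@example.com</a> '
    "or sales@shop.example.org."
)
emails = find_emails(sample)
print(emails)
```

Lowercasing before deduplication treats `Info@example.com` and `info@example.com` as the same address, which is usually what a lead list wants even though local parts are technically case-sensitive.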

https://github.com/mazzasaverio/lean-jobs-crawler

(Let's build) A lean, high-performance web crawler specializing in job posting extraction directly from company websites. Uses LLM for intelligent URL discovery and data extraction.

crawler docker llm logfire neon openai python uv

Last synced: 15 Mar 2025

https://github.com/xcrypt0r/xcrawler

✂️ A crawling example for MapleStory in various languages using multi-threading

crawler crawling multithreading parsing regexp

Last synced: 14 Jun 2025

https://github.com/khdxsohee/email-miner-pro

EMail Miner Pro is designed specifically for professionals scraping data from search engines like Google, ensuring that generic emails (e.g., Gmail, Yahoo) are correctly linked to their business websites found on the page.

chrome crawler crawling email email-extractor extension-chrome lead-generation miner scraper

Last synced: 03 Feb 2026

https://github.com/javokhirbek1999/tez-spider

Distributed music scraper built in Go

concurrent crawler distributed-systems music-scraper

Last synced: 17 Jan 2026

https://github.com/injectrl/xhspicextractor

Tool for extracting original images from Xiaohongshu

crawler dotnet7 minimalapi okteto xiaohongshu

Last synced: 20 Jun 2026

https://github.com/richecr/pyhltv

Repository to extract information from the HLTV website.

crawler csgo hacktoberfest hltv hltv-api python3

Last synced: 21 May 2026

https://github.com/buttermiilk/sentakusha

simple (and badly written express.js) crawler for the washing machine game.

api crawler imagegeneration maimai

Last synced: 07 Apr 2025

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 26 May 2026

https://github.com/nelcifranmagalhaes/web_crawler

A web crawler for all Naruto characters

anime beautifulsoup characters crawler naruto python

Last synced: 14 Jul 2025

https://github.com/mkfsn/chronos

A light cron-like container service - create cron job easily.

crawler cron cronjob golang

Last synced: 20 Jul 2025

https://github.com/programming-with-love/skyeyesystem

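SkyEye system: crawls trending-search data from various platforms every ten minutes and stores it. Raw trending data is saved to MySQL; word-frequency statistics are saved to Redis.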

crawler mysql redis skyeye skyeyewall springboot

Last synced: 25 Sep 2025

https://github.com/danoctavian/proxy-master

Manage a set of HTTP proxies

crawler http-proxy node-proxy-server

Last synced: 27 May 2026

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 08 Apr 2026

https://github.com/wondervictor/spiderman

2017 Software Course Project

crawler distribute-crawler zhihu-crawler

Last synced: 21 Apr 2026

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 27 May 2026

https://github.com/fiandev/otaku-crawler

A simple way to scrape and collect the anime list from otakudesu

anime bun crawler nodejs scraper

Last synced: 08 May 2026

https://github.com/hedon954/go-crawler

A crawler system implemented in Go.

crawler go

Last synced: 15 Mar 2025

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 11 Jun 2026

https://github.com/flavien-hugs/scrapy-test

Working with the Scrapy library. A mini script that extracts all the cartoon characters listed on Wikipedia.

crawler python scraping scrapy

Last synced: 29 Mar 2025

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 09 Apr 2025

https://github.com/pymarcus/webscrapingiii

A crawler that takes products from a list and walks through Mercado Livre pages, collecting each product's price, name, and link.

crawler mercadolivre python webscraping

Last synced: 15 Sep 2025

https://github.com/anyparser/anyparser_core

Anyparser Python SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.

cache-augmented-generation crawler crewai etl-framework etl-pipeline knowledge-graph knowledgebase langchain langgraph llamaindex ms-office n8n ocr openai pdf python rag retrieval-augmented-generation search-engine typescript

Last synced: 05 Oct 2025

https://github.com/rogerchappel/crawldeck

Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.

agent-tools cli crawler local-first queue typescript

Last synced: 26 May 2026

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 15 May 2026

https://github.com/citiususc/polypus

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

analytics bigdata crawler scraper sentiment-analysis twitter

Last synced: 09 Feb 2026

https://github.com/skylightqp/namu2csv

A namuwiki crawler that converts header to csv file for kartrider wiki

crawler rust

Last synced: 24 Jun 2025

https://github.com/hwywl/mzitu-crawler

Crawls photos of girls from the mzitu website. Mind your nutrition.

crawler mzitu python

Last synced: 29 Apr 2026

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 24 Mar 2025

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 21 May 2026
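The DNS verification mentioned in the is-apple entry above is usually a forward-confirmed reverse DNS check: resolve the IP to a hostname, confirm the hostname belongs to the crawler's domain, then resolve the hostname back and confirm it yields the same IP. The sketch below illustrates that general pattern and is not the repository's code. `is_apple_crawler` and the injectable lookup parameters are hypothetical, and the `.applebot.apple.com` suffix is an assumption based on Apple's published Applebot guidance that should be checked against current documentation.

```python
import socket

# Assumption: Apple's crawler hosts reverse-resolve under this suffix.
APPLE_SUFFIX = ".applebot.apple.com"


def is_apple_crawler(ip, reverse_lookup=None, forward_lookup=None,
                     suffix=APPLE_SUFFIX):
    """Forward-confirmed reverse DNS check for a client IP.

    The lookup callbacks are hypothetical hooks so the logic can be
    exercised without network access. The defaults use the standard
    library; note that socket.gethostbyname_ex only handles IPv4.
    """
    reverse_lookup = reverse_lookup or (
        lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (
        lambda host: socket.gethostbyname_ex(host)[2])

    # Step 1: reverse lookup (PTR record) of the client IP.
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the hostname must end with the expected domain suffix.
    if not hostname.lower().rstrip(".").endswith(suffix):
        return False

    # Step 3: the forward lookup must map back to the original IP.
    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses


# Hypothetical records using documentation-only IP ranges.
FAKE_PTR = {
    "192.0.2.10": "crawler-1.applebot.apple.com",
    "203.0.113.9": "crawler.applebot.apple.com.evil.test",
    "198.51.100.7": "spoof.applebot.apple.com",
}
FAKE_A = {
    "crawler-1.applebot.apple.com": ["192.0.2.10"],
    "spoof.applebot.apple.com": ["192.0.2.99"],
}

for address in FAKE_PTR:
    print(address, is_apple_crawler(address, FAKE_PTR.__getitem__,
                                    FAKE_A.__getitem__))
```

The third step matters because anyone who controls the reverse zone for their own IP range can publish a PTR record naming an Apple hostname; only the forward lookup, which Apple controls, confirms the claim.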

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples using OOP concepts, plus a Laravel project

crawler laravel php

Last synced: 25 Apr 2026

https://github.com/mahmoudgalalz/pupt

A starter for web crawling using Puppeteer

crawler nodejs scraping

Last synced: 17 May 2026

https://github.com/beanwei/zmt-post-crawler

Crawler for the ZMT platform site: provide an author ID and get the post list. This project was coded for a friend.

crawler golang golang-ui

Last synced: 08 Nov 2025

https://github.com/khilnani/spidey.py

Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.

cli crawler python scaper web-spider

Last synced: 25 Mar 2025

https://github.com/droiddevgeeks/nodelearning

This is a Node learning demo. It covers all the basics of Node.

crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign

Last synced: 05 Apr 2026

https://github.com/bing-su/arcalive-crawler-python

Arcalive crawler

crawler python

Last synced: 21 Jun 2026

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 03 Apr 2025

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 05 Oct 2025

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher that retries requests until completed and provides a user agent 🚀🚀

crawler http proxy rails ruby

Last synced: 07 May 2026

https://github.com/pnguyen215/instagram-crawler

Instagram Crawler is a Python script to download posts from a specified Instagram account.

crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler

Last synced: 12 Jun 2026

https://github.com/taurusolson/jobscraper

I am looking for a developer position in France

crawler

Last synced: 23 Jun 2025

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 21 Mar 2025

https://github.com/andrew-ld/wowroms-downloader

Download all ROMs from wowroms

aiohttp asyncio crawler python3

Last synced: 17 Jan 2026

https://github.com/pythoript/pgn-scraper

PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.

7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip

Last synced: 16 Mar 2025

https://github.com/mmqnym/pyppeteer-use-case

Shows how to crawl the web with pyppeteer

crawl crawler pyppeteer python

Last synced: 24 Dec 2025

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 09 Mar 2026

https://github.com/noarche/darknoisy

Same as my Noisy, but on the Tor network. Logs links. Crawls onion sites.

crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks

Last synced: 08 Sep 2025

https://github.com/marcinrek/sauron

Basic page crawler written in Node.js

crawler json node-js nodejs requests

Last synced: 28 Apr 2025

https://github.com/rebrowser/autotrader-dataset

AutoTrader car listings database: new, used & CPO vehicles with make, model, trim, mileage, MSRP, KBB fair price range, deal rating, body style, fuel type, and seller state. Updated daily.

automotive autotrader car-listings car-prices crawler data-collection data-science dataset kbb open-data scraper used-cars vehicle-data web-scraping

Last synced: 03 May 2026

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 29 Jun 2026
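The clean-then-count pipeline named in the corpus-cleaner entry above can be illustrated briefly. This is a minimal sketch of n-gram generation in general and not that tool's implementation: `clean` and `ngrams` are hypothetical helpers, the tag stripping is a crude regex, and the tokenizer assumes plain English text.

```python
import re
from collections import Counter


def clean(text):
    """Strip HTML-like tags, lowercase, and split into word tokens."""
    text = re.sub(r"<[^>]+>", " ", text)  # crude tag removal (assumption)
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


raw = "<p>The cat sat. The cat ran!</p>"
tokens = clean(raw)
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))
```

When `n` exceeds the number of tokens, `ngrams` returns an empty list rather than raising, so short documents need no special handling.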

https://github.com/iamgideonidoko/web-crawler-with-php

Sample implementation of web crawler in PHP

crawler php webcrawler

Last synced: 21 Mar 2025

https://github.com/shimech/pokemon-db-maker

Generates a Pokédex via web crawling

beautifulsoup crawler docker pokemon scraper

Last synced: 25 Jan 2026

https://github.com/tasooshi/digslash

A site mapping and enumeration tool for web application analysis

crawler mapping sitemap spider

Last synced: 08 Apr 2026

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target Java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 22 Sep 2025

https://github.com/leandrols/scliper

CLI tool for simple web scraping.

cli-scripts crawler golang scraping

Last synced: 01 Nov 2025

https://github.com/songjiayang/china_repos

GitHub repo crawler

china crawler statistics

Last synced: 18 Jul 2025

https://github.com/fengdongfa1995/video-dl

Download videos from online video websites.

bilibili crawler pornhub python3 video

Last synced: 09 Apr 2026

https://github.com/ryanking13/bellorin

Multi-threaded Social Media Crawler 🔍

crawler instagram social-media

Last synced: 29 Jun 2025