An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
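To make "systematically browses" concrete, here is a minimal sketch of a breadth-first crawl loop. It is an illustration only, not the method of any project listed here. The names `LinkParser`, `crawl`, `fetch` and `PAGES` are hypothetical, and the pages are a made-up in-memory site so the example runs without network access. A real crawler would also need politeness delays, robots.txt handling and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    fetch is a caller-supplied function mapping a URL to HTML text,
    or None when the page is unavailable (an assumed interface).
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Made-up in-memory "site" standing in for real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c">C</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and `max_pages` bounds the work on large sites.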

https://github.com/konradlinkowski/wikipediafinder

Find words in a Wikipedia page

crawler scraper wikipedia

Last synced: 17 Sep 2025

https://github.com/willi-dev/dtcapp

dtcapp: distributed Twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 18 Sep 2025

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 15 Jun 2026

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor share details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Jul 2025

https://github.com/machu-gwu/crawlib-project

Tool set for crawler projects.

crawler framework mongodb python scrapy

Last synced: 01 Jul 2026

https://github.com/davidkhala/ml

classic AI index

crawler

Last synced: 17 Jan 2026

https://github.com/panakour/pkscraper

Extract structured data from the web

crawler crawling scraper scraping scraping-websites webcrawler

Last synced: 19 Feb 2026

https://github.com/fengdongfa1995/video-dl

Download videos from online video websites.

bilibili crawler pornhub python3 video

Last synced: 09 Apr 2026

https://github.com/leandrols/scliper

CLI tool for simple web scraping.

cli-scripts crawler golang scraping

Last synced: 01 Nov 2025

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 22 Sep 2025

https://github.com/lucaaszsx/spyder

A powerful schema-based web scraping library for Node.js built for fast, structured, and reliable data extraction.

cheerio crawler data dom dom-manipulation html json json-ld parser scraper web xml

Last synced: 11 Jun 2026

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 14 May 2026

https://github.com/yordadev/fenrisjs

A Node.js application that scrapes all links from a given input and writes the results to one of two files, external or internal, for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 10 May 2026

https://github.com/beanwei/zmt-post-crawler

Crawls the ZMT platform site: given an author ID, it fetches the post list. This project was written for a friend.

crawler golang golang-ui

Last synced: 08 Nov 2025

https://github.com/filsuin/linkedin-crawler

A Python tool for automating job searches on LinkedIn based on user-defined keywords.

crawler crawler-python linkedin offer

Last synced: 16 Jun 2025

https://github.com/raphaelm22/crawling

Set of crawlers that look for something on the internet and send a notification when they succeed.

caesb crawler growth-suplements gsuplementos

Last synced: 06 Mar 2026

https://github.com/programming-with-love/skyeyesystem

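SkyEye system: every ten minutes it crawls trending-search data from various platforms and stores it. Raw trending data is stored in MySQL, and word-frequency statistics are stored in Redis.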

crawler mysql redis skyeye skyeyewall springboot

Last synced: 25 Sep 2025

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 19 Apr 2026

https://github.com/arihantbansal/cybersec-python

Cybersec/CTF practice problems solved in Python

crawler cryptography ctf cybersecurity sockets webscraping

Last synced: 02 Aug 2025

https://github.com/dineshsprabu/concurrent-web-crawler

Flexible and concurrent web crawler implemented in 'go'

concurrent-web-crawler crawler go-crawler spider web-crawler

Last synced: 12 Jan 2026

https://github.com/kbychkov/simplecrawler-app

The GUI for Simplecrawler

crawler simplecrawler spider

Last synced: 12 Jun 2025

https://github.com/udaykiran2017/seo-reports

📊 Generate and analyze SEO reports effortlessly to enhance your website's visibility and performance across search engines.

audit broken-links cli crawler extraction google-lighthouse hreflang-checker hreflang-matrix puppeteer scan-website searchengineoptimization seo seo-macroscope seo-manager seo-meta seo-optimization web-scraping webmaster

Last synced: 16 May 2026

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT posts from the command line.

crawler ptt python

Last synced: 08 Aug 2025

https://github.com/khadkarajesh/aptoide

Aptoide app crawler using beautifulsoup

beautifulsoup4 crawler flask python3 web-application

Last synced: 19 May 2026

https://github.com/jfcherng/wiki-cgroup-crawler

This script fetches Wikipedia's public conversion group (CGroup) word lists and saves the results to external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 03 Oct 2025

https://github.com/thiiagoms/car-stealth

REST API for all cars that were stolen

api cars crawler student

Last synced: 16 Jun 2025

https://github.com/mindfiredigital/deepscanbot

It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.

bot crawl crawler go golang google webcrawler

Last synced: 10 Aug 2025

https://github.com/tungct/golangcrawler

Crawler goroutine Golang

crawler go

Last synced: 07 Jun 2026

https://github.com/weaming/simple-crawler

my simple crawler

crawler

Last synced: 13 Jun 2025

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 21 Mar 2025

https://github.com/igorbrizack/web-scraper

HTML data scraping application, built in Python.

crawler pytest python3 scraper

Last synced: 08 May 2026

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 05 Oct 2025

https://github.com/win7user10/laraue.crawling

A set of tools for quickly writing crawlers on .NET

crawler csharp csharp-crawler parser

Last synced: 17 Aug 2025

https://github.com/salman0ansari/sitefetch

Fetch a site and extract its readable content as Markdown (to be used with AI models).

ai chatgpt crawler fetcher golang scraping

Last synced: 19 Aug 2025

https://github.com/roswelly/solana-transaction-crawler

Crawl and parse Solana transactions

crawler parser rust solana transaction

Last synced: 15 May 2026

https://github.com/maxiroellplenty/gs-robot

Node.js tool to scrape gelbe-seiten

axios cheerio crawler gelbe-seiten nodejs scraper yargs

Last synced: 18 May 2026

https://github.com/basemax/kashan-university-phone-directory

This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.

crawler crawlers database html-scraper json kashan kashan-university scraper scraper-api scraper-html scrapers university university-of-kashan

Last synced: 18 May 2026

https://github.com/cameronnewman/cli.crawler

Simple cli web crawler

cli crawler golang

Last synced: 14 Jan 2026

https://github.com/curegit/nominium

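A crawler system that sends notifications, by email and other means, about newly listed items on consumer-to-consumer marketplace sites.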

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 12 Mar 2025

https://github.com/morungos/github-issue-crawler

GitHub crawler for public repositories, issues, and comments

crawler github issues

Last synced: 30 Apr 2026

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 18 May 2026

https://github.com/zhs007/lottery-crawler

A crawler based on jarvis-task, mainly used to crawl lottery data.

crawler jarvis-task

Last synced: 30 Oct 2025

https://github.com/altescy/mincrawler

A minimal web crawler.

configurable crawler python scraping

Last synced: 21 Mar 2025

https://github.com/teal33t/base_crawler

Simple scaffold for selenium based crawler bots

crawler scaffold-template selenium selenium-python

Last synced: 18 May 2026

https://github.com/opda0887/bahamut-crawler-to-gmail

Idea: use a Python crawler to fetch the latest forum posts from Bahamut boards and send them to myself via Gmail.

crawler crawler-python

Last synced: 21 Mar 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some Simple Information About Aparat Videos for DataScientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 17 May 2026

https://github.com/srx-2000/swaiter

A program that waits until a Selenium element has loaded.

crawler selenium selenium-python

Last synced: 18 May 2026

https://github.com/richecr/pyhltv

Repository to extract information from the HLTV website.

crawler csgo hacktoberfest hltv hltv-api python3

Last synced: 21 May 2026

https://github.com/rogerluo410/gcrawler

Google search crawler, Ruby version. Crawls each link's text and URL by keyword on Google.com.

crawler crawling google ruby

Last synced: 22 Jun 2026

https://github.com/flavien-hugs/scrapy-test

Working with the Scrapy library. A mini script that extracts all the cartoon characters listed on Wikipedia.

crawler python scraping scrapy

Last synced: 29 Mar 2025

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 08 May 2026

https://github.com/skylightqp/namu2csv

A namuwiki crawler that converts headers to a CSV file for the kartrider wiki

crawler rust

Last synced: 24 Jun 2025

https://github.com/mahmoudgalalz/pupt

A starter for web crawling using Puppeteer

crawler nodejs scraping

Last synced: 17 May 2026

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 14 Mar 2025

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples using OOP concepts, plus a Laravel project

crawler laravel php

Last synced: 25 Apr 2026

https://github.com/jiamingla/mvdis_i18n

Unofficial multilingual-friendly version of the motorcycle driver's license exam booking site

crawler jquery koa koajs nodejs supertest

Last synced: 04 Jan 2026

https://github.com/zephyrpersonal/github-trending-crawler

Transform GitHub trending repos into JSON data

cheerio crawler fetch github node repository spider trending

Last synced: 04 Jan 2026

https://github.com/droiddevgeeks/nodelearning

This is a Node learning demo. It covers the basics of Node.

crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign

Last synced: 05 Apr 2026

https://github.com/rogerchappel/crawldeck

Local-first crawl job deck for fixture-backed queues, health, and crawler adapter seams.

agent-tools cli crawler local-first queue typescript

Last synced: 26 May 2026

https://github.com/bing-su/arcalive-crawler-python

Arca Live crawler

crawler python

Last synced: 21 Jun 2026

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 03 Apr 2025

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 04 May 2026

https://github.com/adamfisher/scrapyrt.client

A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.

crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider

Last synced: 21 Mar 2025

https://github.com/yowenter/career-roadmap

Oh, how I hate this living death which has swallowed all my teens, if I am cursed with any, will be worn away!

career crawler findjob job-crawler roadmap search-engine

Last synced: 10 Apr 2025

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 11 Jun 2026

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 15 Feb 2026

https://github.com/dhsagaryt/multisearch

Search efficiently across different platforms with ease. Type your query and choose from multiple search engines, streamlining your experience.

browser crawler internet search search-algorithm search-engine searchbar searchengine webcrawler

Last synced: 14 Feb 2026

https://github.com/raphaelalmeidamartins/python-tech-news

Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course

crawler crawler-python data-science pytest python

Last synced: 22 May 2026

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 09 Mar 2026

https://github.com/jongwony/boardgame_finder

A project that crawls the entire board game category of Namuwiki in order to apply specific filters.

asyncio crawler namuwiki python38

Last synced: 27 Feb 2026

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 26 May 2026

https://github.com/iamgideonidoko/web-crawler-with-php

Sample implementation of web crawler in PHP

crawler php webcrawler

Last synced: 21 Mar 2025

https://github.com/buttermiilk/sentakusha

simple (and badly written express.js) crawler for the washing machine game.

api crawler imagegeneration maimai

Last synced: 07 Apr 2025

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 27 Feb 2026

https://github.com/mc256/node-static-webpage-crawler

Download an entire website with its directory structure.

cache-server crawler nodejs static-site

Last synced: 16 Apr 2026

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 09 Feb 2026

https://github.com/citiususc/polypus

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

analytics bigdata crawler scraper sentiment-analysis twitter

Last synced: 09 Feb 2026

https://github.com/xcrypt0r/xcrawler

✂️ A crawling example for maplestory with various languages using multi-threading

crawler crawling multithreading parsing regexp

Last synced: 14 Jun 2025

https://github.com/mazzasaverio/lean-jobs-crawler

(Let's build) A lean, high-performance web crawler specializing in job posting extraction directly from company websites. Uses LLM for intelligent URL discovery and data extraction.

crawler docker llm logfire neon openai python uv

Last synced: 15 Mar 2025

https://github.com/konradlinkowski/mailcrawler

Crawler to find emails on websites

crawler scraper

Last synced: 05 Jan 2026