An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
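
The systematic browsing described above is, at its core, a graph traversal. Here is a minimal Python sketch. It assumes a caller-supplied `get_links` function (hypothetical) that stands in for fetching a page and extracting its links. A FIFO frontier gives breadth-first order, a visited set prevents re-fetching, and `max_pages` bounds the crawl.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl starting at `seed`; returns URLs in visit order.

    `get_links` is a hypothetical stand-in for "fetch the page and extract
    its links"; a real crawler would issue an HTTP request and parse HTML here.
    """
    visited: set[str] = {seed}  # URLs already queued, so nothing is fetched twice
    frontier = deque([seed])    # FIFO queue gives breadth-first order
    order: list[str] = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                frontier.append(link)
    return order


if __name__ == "__main__":
    # Toy link graph in place of the real Web.
    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
    print(crawl("a", lambda u: graph.get(u, [])))  # ['a', 'b', 'c', 'd']
```

Replacing the deque with a priority queue would let a crawler visit higher-ranked URLs first instead of strictly by distance from the seed.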

https://github.com/bkdev98/ebooks-crawler

An ebook crawler for personal use, built with ReactJS.

crawler material-ui nodejs reactjs

Last synced: 12 Apr 2026

https://github.com/jorgeparavicini/medalytik-python

Python crawlers for a job mediation firm

crawler python scrapy

Last synced: 07 Jul 2025

https://github.com/mlibre/clean-web-scraper

A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖

ai artificial-intelligence clean crawler data-preprocessing dataset fine-tuning llm recursive-crawling scraper training

Last synced: 17 Mar 2025

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 27 Oct 2025

https://github.com/dimo414/pycrawl

Simple Python web crawler, primarily designed for inspecting and diagnosing your own website

crawler python

Last synced: 28 Oct 2025

https://github.com/amirespahbodi/url_crawler

Async web crawler that fetches website titles and favicons

crawler fastapi pydantic python3 sqlalchemy

Last synced: 15 Apr 2026

https://github.com/citiususc/polypus

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

analytics bigdata crawler scraper sentiment-analysis twitter

Last synced: 09 Feb 2026

https://github.com/piopi/behatcrawler

A Behat extension that crawls the links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 09 Feb 2026

https://github.com/mc256/node-static-webpage-crawler

Downloads an entire website, preserving its directory structure.

cache-server crawler nodejs static-site

Last synced: 16 Apr 2026

https://github.com/cameronnewman/cli.crawler

Simple cli web crawler

cli crawler golang

Last synced: 14 Jan 2026

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 27 Feb 2026

https://github.com/jongwony/boardgame_finder

A project that crawls the entire board game category of Namuwiki so that specific filters can be applied to it.

asyncio crawler namuwiki python38

Last synced: 27 Feb 2026

https://github.com/khdxsohee/email-miner-pro

EMail Miner Pro is designed specifically for professionals scraping data from search engines like Google, ensuring that generic emails (e.g., Gmail, Yahoo) are correctly linked to their business websites found on the page.

chrome crawler crawling email email-extractor extension-chrome lead-generation miner scraper

Last synced: 03 Feb 2026

https://github.com/sonhm3029/crawl-data-bot

A base web-crawling bot that collects text and image data.

crawler google medical vietnamese

Last synced: 08 Mar 2026

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 16 Mar 2025

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scrapers is way more fun!

crawler html-scraper scraper

Last synced: 24 Mar 2025

https://github.com/linjonh/videowebsidesparser

This project parses a video website to remove ads.

crawler parser python

Last synced: 13 Jun 2025

https://github.com/joaooliveirapro/trawlergo

TrawlerGo 🐛 is a basic HTTP crawler written in Go, designed to efficiently discover all URLs within a specified domain while capturing related HTTP request information.

crawler go golang http

Last synced: 09 Jun 2026

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 11 Mar 2026
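
A sitemap generator like the ones listed here ultimately serializes the crawled URLs into the sitemaps.org XML format. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. It emits only the required `<loc>` element and omits the optional `<lastmod>`, `<changefreq>`, and `<priority>`. `build_sitemap` is a hypothetical helper, not the API of sitemap-common or any other project here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> bytes:
    """Serialize crawled URLs as a minimal sitemaps.org <urlset> document (UTF-8)."""
    # Setting xmlns as a plain attribute makes it the default namespace.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):  # deduplicate and emit in a stable order
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" as the protocol requires.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap(["https://example.com/", "https://example.com/about"]).decode("utf-8"))
```

`xml_declaration` in `ET.tostring` requires Python 3.8 or later.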

https://github.com/blarc/windsurf-crawler

A simple crawler that collects windsurf board offers from different sites.

crawler windsurf

Last synced: 10 Sep 2025

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 11 Jul 2025

https://github.com/jjpaulo2/crawler-financeiro

Python module that extracts public data on pension plans from the SUSEP portal.

crawler docker ocr python selenium tesseract

Last synced: 11 Jul 2025

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen data on the 2021 Finnish municipal elections (kuntavaalit)

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma data on the 2021 Finnish municipal elections (kuntavaalit)

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia data on the 2021 Finnish municipal elections (kuntavaalit)

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/yuchenq/comp90055-project

This is the latest version of my COMP90055 project.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 16 Jul 2025

https://github.com/jonasrenault/pubchem-api-crawler

Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.

chemistry crawler molecular-formula pubchem python

Last synced: 15 May 2026

https://github.com/jackfsuia/chats-crawler

Crawls Discourse forum chat data and parses it on the fly, ready for direct use in LLM instruction (dialogue) fine-tuning.

crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser

Last synced: 09 Jul 2025

https://github.com/basemax/crawler-news-currency-gold-coins

PHP crawler that collects Persian news about currencies, coins, and gold.

crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler

Last synced: 05 Jul 2025

https://github.com/pyohei/rirakkuma-crawller

Crawler for my hobby.🐻

crawler python rirakkuma

Last synced: 29 Nov 2025

https://github.com/der3318/daily-pixiv

Integrated workflow: LINE notifications of top-ranked Pixiv illustrations

crawler line-notify pixiv workflow

Last synced: 03 Mar 2025

https://github.com/peterbencze/silene

Silene is an open source web crawler framework built upon Pyppeteer.

crawler framework pypp python scraper webcrawler

Last synced: 12 Jan 2026

https://github.com/shentengtu/cht-yp-crawler

Simple crawler for www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 09 May 2026

https://github.com/balintpethe/laravel-universal-scraper

Universal Scraper for Laravel

crawler laravel scraper web-scraper

Last synced: 13 Jan 2026

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 08 Apr 2025

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 08 Feb 2026

https://github.com/tylpk1216/favorite-youtube-to-video

Download your favorite YouTube videos with PHP

crawler php tool youtube

Last synced: 16 May 2026

https://github.com/lolyratul025/web-email-bundler

A lightweight Python web crawler that extracts valid email addresses from websites. Features domain-bound crawling, false-positive filtering (@1x.png etc.), proxy support, and polite delays.

crawler cybersecurity-tools email-extractor osint-tool python3 web-scraping

Last synced: 22 May 2026
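
Extracting email addresses while filtering false positives such as `@1x.png` can be sketched as a regular expression followed by a suffix filter. This is an illustrative Python sketch, not web-email-bundler's code. The simplified pattern, the list of image suffixes, and the `extract_emails` name are all assumptions.

```python
import re

# Simplified email pattern (an assumption; full RFC 5322 syntax is far broader).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Retina asset names such as "logo@2x.png" match the pattern but are not addresses.
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased email-like strings in order of first appearance,
    skipping matches that end in an image file extension."""
    seen: set[str] = set()
    result: list[str] = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address.endswith(ASSET_SUFFIXES) or address in seen:
            continue
        seen.add(address)
        result.append(address)
    return result


if __name__ == "__main__":
    page = "Contact sales@example.com or Sales@Example.com; icon: logo@2x.png"
    print(extract_emails(page))  # ['sales@example.com']
```

Lowercasing before deduplication treats addresses that differ only in case as one. That is safe for domains, but strictly speaking local parts may be case-sensitive.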

https://github.com/daviddavo/blogspot-crawler

Crawler for blogspot and blogger with beautifulsoup

crawler hacktoberfest python

Last synced: 19 Apr 2026

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 28 May 2026

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls the official student circles of Ibaraki University from the university's website

crawler python

Last synced: 25 Mar 2025

https://github.com/w3labkr/ipynb-scraper

A collection of frequently used Jupyter notebook code.

crawler ipynb jupyter jupyter-notebook python scrapper

Last synced: 19 Apr 2026

https://github.com/hvtuananh/twitter_crawler

Daemon that fetches tweets from the Twitter public streaming API

crawler java streaming-api tweets twitter twitter-crawler

Last synced: 11 Mar 2025

https://github.com/cls1991/gank.io-go

A simple crawler for fetching pictures from http://gank.io, implemented in golang.

crawler gankio goquery pictures

Last synced: 27 Feb 2025

https://github.com/patrik-fredon/python_wallpaper_crawler

Wallpaper Crawler is an advanced web scraping tool designed to crawl websites and download high-resolution wallpapers.

crawler crawling-python image image-recognition images python scraping-websites scrapper selenium-python uv

Last synced: 14 Sep 2025

https://github.com/timchen10001/crawler-711-taiwan

Python crawler that scrapes up-to-date information from 7-Eleven (711) Taiwan

711 crawler python python3 taiwan

Last synced: 27 Mar 2025

https://github.com/atasoglu/websense

A modular AI-powered web scraper for data pipelines.

ai automation crawler data-extraction llm parsing scraper structured-output web-scraping

Last synced: 31 Jan 2026

https://github.com/zhou-chaoxian/ax-spider

A simple, powerful, and fast asynchronous Python crawler framework.

asyncio ax-spider crawler httpx python scrapy

Last synced: 18 Mar 2025

https://github.com/rowyio/llm-web-crawler

Web Scraper and Crawler for LLM Apps and AI Workflows with NoCode / LowCode. Plug and play with your own logic and customize it flexibly and scalably on BuildShip.

ai automation crawler llm lowcode nocode scraper web web-crawler workflow

Last synced: 15 Jul 2025

https://github.com/ericc-ch/crawldown

Crawl websites and convert their pages into clean, readable Markdown content using Mozilla's Readability and Turndown.

crawler markdown scraper

Last synced: 05 Jul 2025

https://github.com/kimi0230/pstocks

Python stock market crawler

crawler numpy pandas python python3 stocks

Last synced: 07 Apr 2026

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 16 Jul 2025

https://github.com/zenoyang/webcrawler

Some web crawler code

crawler scrapy spider web-crawler

Last synced: 02 Aug 2025

https://github.com/matheusfaustino/jazzmaster_crawler

A crawler that retrieves the audio episodes of a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 14 Jun 2025

https://github.com/iamtonmoy0/sitemap-crawler

Sitemap crawler written in Go with goquery

crawler

Last synced: 23 Feb 2025

https://github.com/gustavooferreira/wcrawler

Simple Web Crawler CLI tool with "minimal" dependencies

cli crawler golang graph html links web

Last synced: 31 Jan 2026

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 31 Mar 2025

https://github.com/tsaohucn/crawler_fb_user_group

A crawler that uses Selenium to scrape Facebook user groups

crawler facebook-user-groups rails ruby

Last synced: 16 May 2026

https://github.com/tetreum/puppeteer-for-crawling

Everyday crawling methods for Puppeteer

crawler crawling puppeteer

Last synced: 12 Apr 2026

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrolls in open vacancies on the SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 03 Mar 2026

https://github.com/intina47/ee_error

An implementation of a web crawler in C++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 31 Jan 2026

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 18 Jul 2025

https://github.com/luminovrym/crawler-tools-js

Crawler Tools Js is an application for scraping data from a website

crawler crawler-js data js web-scraping

Last synced: 08 Sep 2025

https://github.com/johanbook/node-web-crawler

Nodejs CLI for web crawling

cli crawler nodejs typescript

Last synced: 11 Apr 2026

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 14 Apr 2026

https://github.com/rsheremeta/web-crawler

A tiny web crawler that finds links, extracts them, and prints them concurrently to the terminal

crawler go golang web-crawler webcrawler

Last synced: 12 Jun 2026

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media are downloaded, so that a Twitter profile can then be reconstructed in the front end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 17 May 2026

https://github.com/dubniczky/bad-robot

A Python crawler that disregards robots.txt rules and downloads disallowed resources

crawler osint-python osint-tool python robots-txt

Last synced: 31 Mar 2025
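
By contrast, a crawler that honors robots.txt checks each URL against the rules before fetching it. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. It assumes the robots.txt body has already been downloaded: here it is an inline string, and the `MyCrawler` user agent is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    print(rules.can_fetch("MyCrawler", "https://example.com/index.html"))      # True
    print(rules.can_fetch("MyCrawler", "https://example.com/private/a.html"))  # False
    print(rules.crawl_delay("MyCrawler"))                                       # 2
```

In a live crawler, `RobotFileParser.set_url()` followed by `read()` fetches the file directly. `crawl_delay()` (Python 3.6+) exposes the `Crawl-delay` value that polite crawlers use to space out requests.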

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links up to a specified depth.

crawler flask-api flask-restful

Last synced: 02 Apr 2025
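
Returning all links up to a given depth amounts to a breadth-first crawl that records each URL's distance from the start page. The following is a minimal sketch with the standard library's `html.parser` and `urllib.parse`. It assumes a caller-supplied `fetch` function (hypothetical) that returns a page's HTML, and it is not the simplecrawlerapi implementation.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop #fragments.
                    absolute, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(absolute)


def links_within_depth(start: str, fetch: Callable[[str], str], max_depth: int) -> dict[str, int]:
    """Map each reachable URL to its link distance from `start`, up to `max_depth`."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        extractor = LinkExtractor(url)
        extractor.feed(fetch(url))
        for link in extractor.links:
            if link not in depths:  # BFS guarantees the first depth seen is the shortest
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths
```

Because the traversal is breadth-first, each URL is recorded at its shortest link distance from the start page.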

https://github.com/dubniczky/webmap

Website mapping crawler implemented in python

crawler mapping mapping-tools package python scraping security

Last synced: 31 Mar 2025

https://github.com/sedrubal/webcrawler

Crawl sites and search for security issues.

crawler script security website-auditing

Last synced: 17 Mar 2025

https://github.com/basemax/okala-store-ids

A PHP script designed to systematically query the Okala API and extract a comprehensive list of valid store IDs. By automating the retrieval of store details, it enables users to efficiently compile and maintain an up-to-date dataset of active Okala stores for analysis, integration, or further processing.

crawler curl id ids ir iran okala okala-store okala-store-id php store store-okala

Last synced: 10 Jun 2025

https://github.com/Mahdijamebozorg/CryptoFundamentalAnalyzer

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 25 Sep 2025

https://github.com/earelin/jwraith

A Java clone of the Wraith website comparison tool.

crawler screenshots screenshots-comparison selenium webtest

Last synced: 17 May 2026

https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp

Web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as the title, date, venue, and links of the events.

anglesharp crawler minhaentrada

Last synced: 19 Jul 2025

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 02 Jul 2025

https://github.com/jpleorx/tagblender

A simple java API to retrieve hashtags from https://www.tagblender.net/

api crawler hashtags java jsoup parser

Last synced: 20 Mar 2025

https://github.com/edumucelli/rubybikes

A set of Bike Sharing System parsers in Ruby

bike-sharing crawler ruby

Last synced: 12 Apr 2025

https://github.com/leonardopinho/instagramfeed

Image list based on a tag for the Instagram feed.

crawler instagram python

Last synced: 28 Mar 2025

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 13 Jun 2026

https://github.com/xiangronglin/novel2go

Android app that creates a PDF from a website and sends it to your Kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 31 Jan 2026

https://github.com/tisfeng/bing-dict

A Bing command-line dictionary that retrieves Bing Dictionary query results by crawling.

bing-dictionary command-line crawler nodejs

Last synced: 13 May 2026

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 18 Oct 2025

https://github.com/waived/pastebin-ripper

Scrape all pastes from a Pastebin page and its sub-pages

crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper

Last synced: 24 Jun 2025

https://github.com/mnoalett/cscrawler

BSc degree thesis - crawler for www.couchsurfing.org

bsc-thesis couchsurfing crawler data-analysis database python

Last synced: 02 May 2026

https://github.com/suconghou/sitemap

A simple sitemap generator and page crawler

crawler html-parser nim-lang scraper sitemap spiders

Last synced: 15 May 2026

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 09 Apr 2025

https://github.com/dappros/site_crawler

Site crawler used in Ethora platform as an option to import your specific business data into your AI agent chat bot.

crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing

Last synced: 20 Jan 2026

https://github.com/imrany/spindle

An open-source, lightweight web crawler and scraper. It can discover links on the web (crawler) and extract structured data from webpages (scraper).

crawler go golang scraper

Last synced: 24 Sep 2025

https://github.com/surister/scrupy

Python library for creating web crawlers that aims to be powerful yet simple.

crawler crawling-framework crawling-python http library python scraping

Last synced: 15 May 2026

https://github.com/andresayac/cuevana3

Cuevana3 scraper is a content provider for the latest movies and TV shows, dubbed in Latin American Spanish or subtitled.

crawler cuevana3 php scraper

Last synced: 05 Apr 2025

https://github.com/ismoreirakt/spyder

The web is changing. Spyder sees it.

alerts automation crawler monitor

Last synced: 01 Mar 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

Ugly scripts that connect a Raspberry Pi to a VPN and attach a network share in order to periodically crawl the documents on it

crawler samba vpn

Last synced: 11 Mar 2025

https://github.com/ryoii/hook

A declarative Java crawler framework

crawler declarative java java-crawler-framework jdk11

Last synced: 18 Mar 2025