An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
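
The traversal at the heart of that definition can be sketched in a few lines. This is a minimal, hypothetical breadth-first crawler, not code from any project listed below. `crawl` and the `get_links` hook are illustrative names, and `get_links` stands in for "fetch the page and extract its links" so the frontier and visited-set logic runs without network access.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> List[str]:
    """Visit pages breadth-first from `seed`, returning URLs in visit order.

    `get_links(url)` is a hypothetical hook that fetches `url` and returns
    the URLs it links to.
    """
    frontier = deque([seed])  # URLs waiting to be visited (FIFO = breadth-first)
    seen = {seed}             # every URL ever queued, so no page is fetched twice
    visited: List[str] = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Usage with a tiny in-memory "web" standing in for real HTTP fetches.
fake_web = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
print(crawl("a", lambda url: fake_web.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `max_pages` cap is a simple politeness and safety bound; real crawlers usually add per-host rate limits and robots.txt checks on top.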

https://github.com/sadewadee/foxhound

Go scraping framework with native Camoufox anti-detection. Dual-mode fetching (TLS stealth + browser), 60+ identity profiles, human behavior simulation, adaptive parsing, 12-layer middleware, 9 export formats. 741 tests.

anti-detection camoufox crawler golang playwright proxy-rotation scraping stealth tls-fingerprint web-scraping

Last synced: 05 May 2026

https://github.com/aliubo/pixiv-crawler

Crawls images from the pixiv website; supports multiple crawling modes.

crawler pixiv python

Last synced: 04 Apr 2026

https://github.com/coalee/hotword

A chatbot that crawls recent news and plots its keywords.

beautifulsoup chatbot crawler dialogflow flask slackbot wordcloud

Last synced: 29 Apr 2026

https://github.com/schbenedikt/web-crawler

A simple web crawler using Python that stores the metadata of each web page in a database.

crawler database mariadb mysql python python-crawler web

Last synced: 14 Apr 2025

https://github.com/wangshouh/qzone_api

Uses Python to call Qzone's public APIs and retrieve information.

crawler python qzone requests

Last synced: 09 Jul 2025

https://github.com/sefinek/known-bots-ip-whitelist

A whitelist of trusted IP addresses used by legitimate crawlers and services such as Googlebot, Bingbot, AhrefsBot, UptimeRobot, Pingdom, Cloudflare, Bunny CDN, Stripe, Shodan, FacebookBot, TelegramBot, etc.

bot bots crawler firewall goodbot goodbots googlebot ip-address ip-addresses ipset safe safe-bots safety security whitelist whitelist-bot whitelists

Last synced: 21 Jun 2025
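
A whitelist like this is typically consumed by testing whether a request's source IP falls inside one of the listed CIDR ranges. The sketch below uses Python's standard `ipaddress` module. The function name `is_whitelisted` is hypothetical, the example ranges are reserved documentation prefixes rather than real crawler ranges, and the repository's actual file format is not assumed.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(ip: str, cidrs: Iterable[str]) -> bool:
    """Return True if `ip` (IPv4 or IPv6) lies inside any of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    # `in` is simply False when the address and network families differ.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


# Placeholder ranges from the reserved documentation prefixes (RFC 5737 / RFC 3849).
ranges = ["192.0.2.0/24", "2001:db8::/32"]
print(is_whitelisted("192.0.2.17", ranges))    # True
print(is_whitelisted("198.51.100.1", ranges))  # False
```

For large lists, parsing the networks once up front (rather than on every call) is the obvious next optimization.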

https://github.com/filipefilardi/crunchyroll_filters

Discover new anime by filtering the Crunchyroll database.

anime anime-list crawler crunchyroll flask

Last synced: 30 May 2026

https://github.com/vaibhavpandeyvpz/cbse-scraper

This script scrapes information about schools affiliated with CBSE for a given state.

cbse crawler data schools scraper

Last synced: 12 Jul 2025

https://github.com/tryagi/firecrawl

Generated C# SDK based on the official Firecrawl OpenAPI specification

ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk

Last synced: 12 Apr 2025

https://github.com/waynechang65/baha-crawler

baha-crawler is a web crawler module designed to scrape data from the Bahamut forum.

bahamut crawler javascript nodejs scraper spider webcrawler

Last synced: 22 Apr 2025

https://github.com/yoonje/soongsil-notice-crawler

A program that crawls notices from Soongsil University and its individual departments.

crawler soongsil

Last synced: 19 Jun 2025

https://github.com/ruedigervoigt/salted

Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files

asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python

Last synced: 27 Oct 2025

https://github.com/zebbern/reconx

🕷️ | ReconX is a live-website crawler built to gather critical information, with an option to take a screenshot of each crawled site.

crawler hacking information-gathering information-retrieval information-security livedata opsec osint osint-tool pentest python python-crawler search-engine security security-tools website website-crawler website-scraper website-security

Last synced: 03 Jul 2025

https://github.com/maximiliancw/crawlio

Asynchronous web crawling and scraping with Python for minimalists

asyncio crawler fastapi framework picocss python scraper vuejs

Last synced: 06 May 2025

https://github.com/mmqnym/etherscan_tracker

Shows how to track a wallet on etherscan.io.

crawler ethereum python

Last synced: 25 Dec 2025

https://github.com/igeligel/TeamFortressOutpostApi

🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.

bot bot-framework crawler steam steam-api steambot teamfortress2

Last synced: 05 May 2025

https://github.com/simoninithomas/news-crawler-parse-backend

A crawler built with Scrapy that crawls French news articles and sends them to your Parse.com backend.

crawler news parse scrapy

Last synced: 29 Oct 2025

https://github.com/guessi/youtube-search-crawler

YouTube Search Results Crawler

crawler flask youtube

Last synced: 11 Apr 2025

https://github.com/cospectrum/pubmed-scraper

Scrape PubMed without an API

asyncio crawler python requests scraper

Last synced: 16 Aug 2025

https://github.com/indatawetrust/reporter

Crawler queue creation tool for paging

crawler

Last synced: 05 May 2025

https://github.com/roccomuso/is-baidu

Verify that a request is from Baidu crawlers using DNS verification

baidu crawler dns ip js nodejs verification

Last synced: 19 May 2026
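
DNS verification of a crawler usually means forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it maps to the same IP. Below is a minimal Python sketch, assuming Baidu crawler hostnames end in `.baidu.com` or `.baidu.jp` (check Baidu's current documentation). `verify_crawler_ip` is a hypothetical name, not is-baidu's API, and the resolvers are injectable so the logic can be tested offline. As written, the forward step uses `socket.gethostbyname_ex`, which only returns IPv4 addresses.

```python
import socket
from typing import Callable, Iterable, List


def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]          # PTR lookup


def _forward(host: str) -> List[str]:
    return socket.gethostbyname_ex(host)[2]     # A records (IPv4 only)


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be a crawler."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror: no PTR record, lookup failed
        return False
    # Leading dots in the suffixes stop look-alikes such as "evilbaidu.com".
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        # A spoofed PTR record is not enough: the hostname must resolve back to `ip`.
        return ip in forward(host)
    except OSError:
        return False


# Assumed Baidu crawler domains; confirm against Baidu's current guidance.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")
```

Calling `verify_crawler_ip(request_ip, BAIDU_SUFFIXES)` with the default resolvers performs live DNS lookups, so results are worth caching per IP.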

https://github.com/arthur3486/google-play-scraper-kotlin

Library for scraping application data from the Google Play Store.

crawler google-play google-play-store java kotlin kotlin-library scraper scraping

Last synced: 14 Jan 2026

https://github.com/aicore/app_info_extracter

An application for extracting information about apps from the internet.

android appreview apps crawler googleplaystore

Last synced: 26 Oct 2025

https://github.com/vivekg13186/easy_web_crawler

Web crawler built around Puppeteer for crawling AJAX/JavaScript-enabled pages.

crawler spider web

Last synced: 24 Oct 2025

https://github.com/caido-community/crawler

Crawler for Caido

caido crawler plugin

Last synced: 16 Feb 2026

https://github.com/ozansz/github-crawler

A basic utility for crawling GitHub users and their email addresses.

crawler github python python3

Last synced: 15 May 2026

https://github.com/spa5k/quick-scraper

An easy, lightweight scraper built with TypeScript for a good developer experience.

crawler dx easy-to-use esbuild scraper typescript

Last synced: 18 Aug 2025

https://github.com/dnlzrgz/winzig

A tiny search engine for personal use.

async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3

Last synced: 06 Apr 2025

https://github.com/rafaelglikis/sinama

Web scraping library

crawler crawling scraper scraping

Last synced: 12 Jan 2026

https://github.com/YektaDev/Krawler

A configurable HTML Crawler written in Kotlin (JVM), powered by Coroutines, Kotlin Serialization (JSON), Ktor Client, Exposed, and SQLite.

crawl crawler crawlers crawling

Last synced: 21 Oct 2025

https://github.com/chenmozhijin/mediawikiextractor

A Python script for extracting data from a MediaWiki website and saving it as JSON.

crawler crawler-python crawling extractor json mediawiki python regex web-crawler

Last synced: 14 Feb 2026

https://github.com/xiaoyvyv/androidcrawlerengine

A dynamic crawler plug-in framework for Android based on dynamic DEX loading: it loads and executes DEX plug-in packages at runtime, enabling real-time updates to crawlers and other functionality.

android apk class crawler dex dynamic execute java jsoup jvm kotlin module okhttp pak plugin reflection scrapy spider web webmagic

Last synced: 22 Jan 2026

https://github.com/joshuaquek/docusite-to-pdf

Given a URL, generates PDF documents covering the whole site within the bounds of that URL path. This code repo is for educational purposes only.

crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper

Last synced: 28 Feb 2026

https://github.com/zhoudaxia233/unilogo

A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.

crawler python visualization

Last synced: 02 Apr 2025

https://github.com/zebbern/regex-crawler

A regex web crawler that matches custom regexes while crawling each site to find the information you're looking for.

bug-bounty bugbounty crawler information-gathering information-retrieval osint osint-tool pentest python regex regex-engine regex-match regex-pattern regex-tool toolkit tools website

Last synced: 14 Apr 2025

https://github.com/hrvadl/goweekly

An application that queries top articles from https://golangweekly.com/, translates them into Ukrainian, and sends them to a Telegram channel.

article chatgpt crawler go golang openai-api telegram telegram-bot

Last synced: 14 Feb 2026

https://github.com/basemax/jadi-net-blog

This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.

blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp

Last synced: 13 Oct 2025
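
The fetch, parse, and save steps described above can be sketched with the standard library alone. This is a hypothetical illustration, not the script's actual code: `save_rss_posts` is an invented name, the feed text is passed in (fetching it, e.g. with `urllib.request.urlopen`, is left to the caller), and it assumes an RSS 2.0 feed where WordPress puts the full post body in `content:encoded`, falling back to `description`.

```python
import html
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Namespace of the RSS content module, which WordPress uses for <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def save_rss_posts(rss_xml: str, out_dir: Path) -> List[Path]:
    """Write each <item> of an RSS 2.0 feed to its own HTML file in `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    root = ET.fromstring(rss_xml)
    written: List[Path] = []
    for i, item in enumerate(root.iter("item"), start=1):
        title = item.findtext("title") or "untitled"
        # Prefer the full post body; fall back to the (often truncated) summary.
        body = item.findtext(CONTENT_NS + "encoded") or item.findtext("description") or ""
        # ASCII slug for the filename; non-Latin titles fall back to "post".
        slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower() or "post"
        path = out_dir / f"{i:03d}-{slug}.html"
        # The title is escaped; the post body is already HTML and is kept as-is.
        page = (
            '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
            f"<title>{html.escape(title)}</title></head>\n"
            f"<body><h1>{html.escape(title)}</h1>\n{body}\n</body></html>\n"
        )
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

The numeric prefix keeps filenames unique even when several titles reduce to the same slug, which matters for non-Latin (e.g. Persian) titles.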

https://github.com/arthurc0102/ntub-bot

Teaching evaluation bot for NTUB (National Taipei University of Business)

bot crawler ntub

Last synced: 22 Mar 2025

https://github.com/glutexo/onigumo

Parallel web scraping framework

crawler

Last synced: 07 Oct 2025

https://github.com/zenrows/crawling-from-scratch

Final code for the "Mastering Web Scraping in Python: Crawling from Scratch" blog post.

crawler crawling python python3 scraping

Last synced: 21 Apr 2026

https://github.com/roccomuso/is-duckduck

Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo

crawler duckduck duckduckbot duckduckgo ip js nodejs verify web

Last synced: 14 Sep 2025

https://github.com/avennn/daily-wallpaper

⏱️ A CLI tool for fetching Bing wallpapers regularly

bing crawler cron daemon nodejs wallpaper

Last synced: 12 Feb 2026

https://github.com/veasion/automation_testing

An automated testing framework (runs automated tests via JS scripts)

automation crawler

Last synced: 29 Jun 2025

https://github.com/igaozp/jobwitcher

JobWitcher: a collection of crawlers for job-recruitment websites

crawler python3 redis scrapy spider

Last synced: 03 Nov 2025

https://github.com/hyeockjinkim/baekjoon-management

A management program for BOJ (Baekjoon Online Judge)

crawler parsing python

Last synced: 20 Mar 2025

https://github.com/jmkim/stock-crawler

Universal Stock Crawler

crawler stock stock-market yahoo-finance

Last synced: 28 Jul 2025

https://github.com/scottstraughan/simple-python-url-crawler

Super simple Python3 website URL scraper/crawler. Multi-threaded.

crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple

Last synced: 29 Jul 2025

https://github.com/ernesto-jimenez/crawler

Easily crawl websites in Go.

crawler golang

Last synced: 14 Feb 2026

https://github.com/arshadkazmi42/scraplink

Scraplink library for scraping link and image URLs from a webpage

crawler mongdb nodejs scraplink url web

Last synced: 20 Mar 2025

https://github.com/ericz99/go-crawler

A simple, lightweight crawler that finds all endpoints on any website.

crawler golang

Last synced: 07 Oct 2025

https://github.com/exp-codes/python-crawler-template

Python crawler development template

crawler programming template

Last synced: 27 Feb 2026

https://github.com/ceylonai/apps-article-reader

📚 A powerful desktop app that extracts and analyzes web content using LLaMA AI. Features real-time processing, keyword extraction, and smart summarization. Built with Python + Tkinter.

ai crawler gpt ollama openai

Last synced: 23 Sep 2025

https://github.com/v-braun/hero-scrape

Find the hero (main) image of a URL

crawler fastimage hero hero-image opengraph webscraping

Last synced: 05 Mar 2025

https://github.com/n3wjack/sitecrawler

A command-line based web crawler

crawler tool webcrawler webcrawling webdevelopment

Last synced: 07 Mar 2026

https://github.com/ribeirogab/technology-insights

A program that uses data from Stack Overflow Insights 2020 to generate informative graphs.

crawler python scraping typescript

Last synced: 15 May 2025

https://github.com/misaka10843/yamibo-downloader

A downloader for batch-downloading manga from the Yamibo (百合会) forum, with support for saving as CBZ

comic crawler downloader python yamibo

Last synced: 17 Jan 2026

https://github.com/danielfillol/crawler_lawsuitsesaj

Crawler for lawsuit data in ESAJ systems

brasil crawler direito esaj law lawsuit tjsp

Last synced: 10 Mar 2026

https://github.com/tonnytg/webreq

A lightweight way to make GET and POST web requests using the standard library's http package. A handy module for everyday use.

crawler prd tools webrequest

Last synced: 08 Feb 2026

https://github.com/ktont/curlas

A Node.js spider tool

chrome-extension crawler spider

Last synced: 22 Sep 2025

https://github.com/astef/artlebedev-dj-crawler

Listen online and download music from artlebedev.ru/dj

artlebedev crawler dj music

Last synced: 27 Mar 2026

https://github.com/testica/a3hrgo-sdk

a3HRgo SDK to automate your reports

a3hrgo crawler javascript puppeteer

Last synced: 25 Oct 2025

https://github.com/knourian/freelancer.com-category-scrapping

Scraping categories from Freelancer.com, along with the number of projects in each category, using Scrapy

crawler freelancer python3 scrapy web-crawler

Last synced: 12 Sep 2025

https://github.com/marcbperez/python-webcrawler

Crawls HTML pages for prices and other pieces of data.

crawler docker gradle python

Last synced: 13 Apr 2026

https://github.com/gill-singh-a/crawler

A program that crawls the web from a given starting page, following the internal links it finds and searching them for keywords

crawler multithreading osint python python3 requests scraper

Last synced: 07 Jul 2025

https://github.com/cls1991/gank.io

Scrapes image resources from 干货集中营 (Gank, http://gank.io)

crawler curl gankio picture

Last synced: 29 Apr 2025

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 09 Feb 2026

https://github.com/erikjiang/book_crawler

🦎 book_crawler

crawler douban golang

Last synced: 15 Dec 2025

https://github.com/a-x-/scian

Simple cian stat

cian crawler static-site

Last synced: 15 May 2026

https://github.com/vinouno/BilibiliDanmuCrawler

A Python project that crawls danmaku (bullet comments) from bilibili.com and generates word clouds

crawler python

Last synced: 16 Mar 2025

https://github.com/sanmak/queue-web-crawler

An application that crawls a website using a queue that limits the number of concurrent connections, finds all hyperlinks on the site, and saves them to a CSV file.

async chai crawler csv hyperlinks mocha nodejs queue scrapper web

Last synced: 12 Mar 2026
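
A queue-based crawl with a cap on concurrent connections can be sketched with `asyncio`: a fixed pool of worker tasks drains a shared queue, so at most `max_concurrency` fetches are in flight at once. This is a hypothetical Python sketch, not the project's Node.js code. `crawl_to_csv` and the injected `fetch_links` coroutine are assumed names, and only links on the seed's host are followed.

```python
import asyncio
import csv
from typing import Awaitable, Callable, Iterable, TextIO
from urllib.parse import urlsplit


async def crawl_to_csv(
    seed: str,
    fetch_links: Callable[[str], Awaitable[Iterable[str]]],
    out: TextIO,
    max_concurrency: int = 5,
) -> int:
    """Crawl the seed's site and write one (page, link) CSV row per hyperlink found.

    Returns the number of rows written (excluding the header).
    """
    host = urlsplit(seed).netloc
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait(seed)
    seen = {seed}
    writer = csv.writer(out)
    writer.writerow(["page", "link"])
    rows = 0

    async def worker() -> None:
        nonlocal rows
        while True:
            url = await queue.get()
            try:
                links = list(await fetch_links(url))
            except Exception:
                links = []  # skip pages that fail; real code would log the error
            for link in links:
                writer.writerow([url, link])
                rows += 1
                # Only same-host links are queued, so the crawl stays on one site.
                if link not in seen and urlsplit(link).netloc == host:
                    seen.add(link)
                    queue.put_nowait(link)
            queue.task_done()

    # A fixed pool of workers caps the number of fetches in flight.
    workers = [asyncio.create_task(worker()) for _ in range(max_concurrency)]
    await queue.join()  # resolves once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return rows
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends.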

https://github.com/mouday/httpserver

A simple server for testing crawler request headers, built with Python + Flask

crawler flask python spider

Last synced: 06 Jul 2025

https://github.com/johnroyer/crawler-php

simple PHP web crawler

crawler php

Last synced: 16 Jun 2025

https://github.com/bitebait/curry

🍛 Curry is a web crawler written in Go that checks the US dollar to Brazilian real exchange rate (USDxBRL) at several stores in Paraguay.

api brasil crawler currency-exchange-rates go golang paraguay webcrawler

Last synced: 15 Jan 2026

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrape data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 08 Apr 2025

https://github.com/agmmnn/nis-scraper

Scrapy script to scrape nisanyansozluk.com

cli crawler python scraper

Last synced: 08 Apr 2025

https://github.com/cmagnobarbosa/crawler_tiktok

Open tool to get TikTok Data - Crawler Tiktok

crawler scraper

Last synced: 17 Jan 2026

https://github.com/mazzasaverio/structured-data-jobs

A data pipeline that scrapes job opportunities from company websites and uses OpenAI to structure the data. Initially focused on tech roles, but easily adaptable for any job type.

crawler docker llm logfire neon openai python uv

Last synced: 14 May 2026

https://github.com/chalkpe/dimibob-py

A crawler for Korea Digital Media High School's school meal data

beautifulsoup crawler dimigo python3

Last synced: 14 Jan 2026

https://github.com/mouday/freeipproxy

Maintains a pool of working proxies by scraping free proxy IPs

crawler proxy python spider

Last synced: 16 Feb 2026

https://github.com/sauerbraten/chef

Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.

crawler extinfo go sauerbraten spy stalker

Last synced: 27 Dec 2025

https://github.com/mikirasora/osuplayedbeatmapscrawler

A crawler that fetches and downloads the osu! beatmaps you have played

crawler osu

Last synced: 26 Mar 2026

https://github.com/marzzzello/appstore_crawler

(mirror) download the IDs and metadata of all apps in the apple appstore

apple appstore crawler metadata scrapy

Last synced: 16 Feb 2026

https://github.com/xanke/nscan

Node.js web page scraper

crawler javascript nodejs

Last synced: 28 Apr 2026

https://github.com/rimiti/ping-urls

🏓 Ping URLs by batch.

cache crawler ping prerender prerendering seo

Last synced: 03 Jul 2025

https://github.com/zabuzard/mplogger

Uses a bot to save item market prices, based on transactions in the game 'http://www.freewar.de/', to a database, then processes the data and creates corresponding market price articles on 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 03 Feb 2026

https://github.com/Anakeyn/website-contextual-links

Retrieving a website's contextual links with R.

crawler gephi r

Last synced: 17 Jul 2025

https://github.com/markoczy/crawler

A Web Crawler based on Go and Chromedp

cli crawler golang

Last synced: 17 Jan 2026

https://github.com/lockblock-dev/crawlarr

Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.

crawler golang

Last synced: 18 Mar 2025
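
The "search for anchor tags and follow links" step can be illustrated with Python's standard `html.parser`. This sketch is not Crawlarr's Go implementation; `AnchorExtractor` and `extract_links` are hypothetical names. It resolves relative `href` values against the page URL, drops `#fragment`s, and keeps only http(s) links, which a crawler would then feed back into its queue.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin


class AnchorExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> matches too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative hrefs and drop #fragments, so "/docs#intro"
                # and "/docs" count as the same page.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html_text: str, base_url: str) -> List[str]:
    parser = AnchorExtractor(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Filtering out `mailto:` and `javascript:` links at extraction time keeps the crawl queue limited to pages that can actually be fetched.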

https://github.com/nakabonne/staticcollector

Application to analyze static files of competing sites

crawler go golang

Last synced: 19 May 2026