An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/airtoxin/stackable-crawler

Middleware-based lightweight crawler framework

crawler javascript lightweight

Last synced: 13 Apr 2025

https://github.com/liinen/vocalist-backend

vloom backend implementation on a cloud service, with a dataset crawled from a karaoke website

connection-pool crawler express mysql ncloud-server pagination python3 selenium

Last synced: 13 Apr 2026

https://github.com/jemaf/stackoverflow-jobs

A wrapper for crawling data from the Stack Overflow Jobs portal

crawler jobs python stack-overflow

Last synced: 14 Jan 2026

https://github.com/thiiagoms/dict-crawler

Simple crawler on UOL dictionary

beautifulsoup4 crawler dic python pythonic

Last synced: 26 May 2026

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 18 Jun 2026

https://github.com/nazanin1369/searchengine

Implementing a search engine using Java, AngularJS and Elasticsearch

angularjs crawler elasticsearch java search-engine

Last synced: 12 Apr 2026

https://github.com/tvrcgo/collect

Data collection

crawler scraper

Last synced: 06 Apr 2025

https://github.com/telanflow/scrago

A micro crawler framework, implemented in Golang.

crawler go micro-framework spider

Last synced: 25 Jun 2025

https://github.com/joelkoen/wls

Easily crawl multiple sitemaps and list URLs

crawler sitemap url

Last synced: 12 Apr 2025

https://github.com/der3318/zijfhchat-crawler

Chat-room crawler and real-time lookup for the mobile game 「紫禁繁花」

crawler dashboard line-notify

Last synced: 04 Oct 2025

https://github.com/yukito0209/is6941-ml-social-media

Code repository for the IS6941 Machine Learning & Social Media Analytics course group project, exploring applications of machine learning in social media data analysis.

bert city-university-of-hong-kong crawler data-collection llama machine-learning python sentiment-analysis social-media

Last synced: 01 Apr 2025

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing from the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 10 Mar 2025

https://github.com/qianbinbin/moebooru-crawler

Retrieve links of images from moebooru-based sites, like yande.re and konachan.com.

crawler moebooru shell

Last synced: 22 Oct 2025

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 21 Jan 2026

https://github.com/dean9703111/shopee_find_mac

Find a cheap Mac that matches your required specs as quickly as possible

argparse crawler mac pip python python2 xlsxwriter

Last synced: 14 Apr 2026

https://github.com/pedrohs1771/hyenzy-x-anime-scraper

A powerful all-in-one media scraper for Anime and Games with 4K Upscale (MPV) and Discord RPC.

anime-scrapper anime4k crawler discord-rpc game-downloader mpv-player playwright python upscale

Last synced: 30 May 2026

https://github.com/vmandic/tris-web-crawler

Tris is a simple NodeJS web crawler tool to help you collect links from visited links of a website's domain.

crawler data-tools nodejs scraping seo-tools web-scraper

Last synced: 20 May 2026

https://github.com/nakabonne/netsurfer

netsurfer is a very lightweight scraping framework

crawler go library scraping

Last synced: 01 Apr 2025

https://github.com/nueip/curl

NUEiP Curl Lib

crawler php

Last synced: 11 Jun 2025

https://github.com/codeforequity-at/botium-crawler

Botium Crawler - Like a Website Crawler, just for Conversation Flows

botium chatbots crawler

Last synced: 23 Apr 2025

https://github.com/anyparser/anyparserjs

Anyparser Typescript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.

anyparser artificial-intelligence cache-augmented-generation crawler etl-pipeline graph-rag knowledgebase langchain microsoft-office microsoft-word ms-office n8n-nodes ocr pdf-extraction rag retrieval-augmented-generation text-extraction web-crawler

Last synced: 17 Feb 2026

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 13 Apr 2025

https://github.com/zabuzard/songcrawler

Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 09 Jun 2026

https://github.com/jxeng/site-info-crawler

A tool for batch crawling websites' titles, descriptions, and favicons.

crawler favicon title

Last synced: 30 May 2026

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data sources, like YouTube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 07 Sep 2025

https://github.com/zekrotja/r34-crawler

A simple CLI tool to fetch and download images from rule34.xxx

crawler go rest-api rule34 worker-pool xml

Last synced: 06 Mar 2026

https://github.com/travorlzh/temperature-analyzer

Python crawler that helps fetch temperature of Beijing, China

crawler homework python variance

Last synced: 25 Aug 2025

https://github.com/bimmr/site-crawler

Chromium Extension: Crawl a website

chrome-extension crawler downloader sitemap

Last synced: 12 Mar 2026

https://github.com/leo9960/waimai_crawler

Scrapes merchant information from food-delivery platforms

crawler

Last synced: 23 Apr 2025

https://github.com/nbdy/prntscrngrb

prnt.sc / lightshot crawler, nudity detection and text extraction to a sqlite database

crawler nudity-detection prntsc text-extraction

Last synced: 04 Oct 2025

https://github.com/first-coding/django-and-web

This is a Django web project with a separated front end and back end.

crawler django python

Last synced: 16 Feb 2026

https://github.com/exca-dk/node-util

Useful utils for analyzing p2p crypto networks.

crawler ethereum mev p2p scanner

Last synced: 16 May 2026

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 07 Feb 2026

https://github.com/qiubits2007/xml-sitemap

Multi-domain XML sitemap generator with support for robots.txt, meta tags, email logging & search engine pinging

crawler generator gzip multi-domain php8 robots-txt seo seotools sitemap-builder sitemap-generator sitemap-xml

Last synced: 25 Feb 2026

https://github.com/lucky845/animetimeline

Uses a Python script to crawl an anime information timeline and save it as a Markdown file.

anime crawler python-script

Last synced: 09 Jul 2025

https://github.com/supadata-ai/py

Official Python SDK for the Supadata API.

ai api crawler llm markdown scraping sdk transcript web-scraper youtube

Last synced: 22 Mar 2025

https://github.com/xiantang/mini_scrapy

A lightweight crawler framework modeled on Scrapy

crawler python3 requets scrapy

Last synced: 27 Mar 2025

https://github.com/jjlibra/bake-mediacrawler

NanmiCoder's self-media data crawling software

crawler learning

Last synced: 06 May 2025

https://github.com/norconex/committer-neo4j

Implementation of Norconex Committer for Neo4j.

crawler neo4j neo4j-committer norconex-committer

Last synced: 19 Jan 2026

https://github.com/afsh7n/crawly-automation

Crawly Automation is a lightweight, modular, and extensible web crawling framework built on top of Puppeteer. Whether you need to scrape data, automate browser interactions, manage CAPTCHAs, or handle advanced data extraction, Crawly Automation simplifies the process.

automation crawler nodejs puppeteer webscraping

Last synced: 25 Feb 2026

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 07 Mar 2026

https://github.com/microlinkhq/ua

Simple Redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 18 Mar 2026

https://github.com/stangirard/crawlycolly

Website Crawler to extract all urls

colly crawler discover golang sitemap

Last synced: 04 Mar 2025

https://github.com/sc0vu/jspachong

Js crawler library.

crawler pachong

Last synced: 06 Feb 2026

https://github.com/denrydu/baiduimagecrawler

Two self-written scripts for crawling Baidu Images, making it easier for CV researchers to build datasets. Two ways to download images from Baidu, a useful tool for making CV datasets!

baidu crawler dynamic python3

Last synced: 04 Nov 2025

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 06 Feb 2026

https://github.com/antoinegagne/treewalker

A web crawler in Erlang that respects `robots.txt`.

crawler erlang webcrawler

Last synced: 11 Feb 2026

https://github.com/byt3n33dl3/thc-katanax

The next generation of Samurai blades: a crawling and spidering framework.

cli crawler domain framework golang hacking http pentesting subdomain subfinder tls

Last synced: 16 Apr 2025

https://github.com/superreal/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 14 May 2026

https://github.com/darealfreak/figure-tracker

Application to keep watch on wished-for figures on multiple sites and notify you about auctions, sales or sudden price drops

crawler figure-tracker monitoring

Last synced: 30 Mar 2025

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 Updates CoinData every 12 hours. High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from the CoinGecko API. Collects market data and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 28 Jan 2026

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only so the best Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 30 Apr 2026

https://github.com/rebrowser/stubhub-dataset

StubHub secondary ticket market data: event listings with section, row, quantity, delivery type, ticket class, and 500+ venues across US, Canada, and Europe. Updated daily.

concert-tickets crawler data-collection data-science dataset event-tickets live-events open-data resale-tickets scraper secondary-market sports-tickets stubhub tickets web-scraping

Last synced: 03 May 2026

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific subreddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 30 Apr 2026

https://github.com/kapitanluffy/sunny-crawler

That moment when I tried learning things about "Big Data" and "Inverted Indexes"

big-data crawler inverted-index php search

Last synced: 30 Apr 2026

https://github.com/restuwahyu13/node-scraper-content

Example Node scraper for all programming content, using Puppeteer

crawler nodejs puppeter scrapper

Last synced: 14 May 2026

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED] Crawl 4 kinds of finance newspaper corpus (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 28 Apr 2026

https://github.com/YGGverse/pulsarss

RSS Aggregator for Gemini Protocol

aggregator cli crawler daemon feed gemini gemini-protocol gemtext parser rss rust

Last synced: 15 Jun 2026

https://github.com/yuminn-k/crawling-tabelog

Crawling store information from tabelog

crawler python3

Last synced: 08 Jun 2026

https://github.com/eduardozepeda/go-web-crawler

A concurrent web crawler written in Go that looks for exposed .git and .env URIs.

crawler environment-variables git go pentesting security-audit

Last synced: 16 Apr 2026

https://github.com/elky84/lol-crawler

Notifications when a LoL friend's game starts and ends.

crawler csharp docker dotnet web-crawler

Last synced: 07 May 2026

https://github.com/maraf/staticsitecrawler

A simple util for crawling links from a root URL and saving HTML documents.

crawler static-site-generator

Last synced: 21 Apr 2026

https://github.com/shunk031/lineblogscraper

Scraper for LINE Blog in Scrapy

crawler lineblog scraper scrapy

Last synced: 17 Jun 2026

https://github.com/arshamroshannejad/scrapify

Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.

403-bypass arkose cloudflare crawler golang http-client scraper

Last synced: 18 Apr 2026

https://github.com/fbielejec/nagger

nag reviewers of PRs

bot crawler github slack

Last synced: 04 May 2026

https://github.com/mashukui/xhs_pic_tool

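Xiaohongshu image collection software developed in Python. Supports downloading watermark-free images from Xiaohongshu notes and collecting note data, comment data, and more. Xiaohongshu crawler | Xiaohongshu watermark-free images | Xiaohongshu watermark-free download | Xiaohongshu comment crawler | Xiaohongshu collection tool | Xiaohongshu comment collection | Xiaohongshu collection software | Xiaohongshu data crawling | xiaohongshu | xhs | XHS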

crawler gui gui-application python-spider spider xhs xhs-downloader xhs-spider xiaohongshu xiaohongshu-downloader

Last synced: 04 Apr 2026

https://github.com/buaadreamer/buaastar

BUAA Star website: major assignment for the English-taught Python course at Beihang University, summer term 2021

crawler css flask html javascript python

Last synced: 28 Apr 2026

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 29 Apr 2026

https://github.com/zanmato/shouting-robin

SEO Crawler focused on E-commerce

crawler developer-tools seo seo-tools

Last synced: 21 Jun 2026

https://github.com/manojahi/is-there-any-song-reference-in-article

Tells whether an article from a website contains any song references.

crawler lyrics-search python webscraping

Last synced: 28 Mar 2026

https://github.com/gitzhiqing/netprogcode

Network programming lab code

crawler network socket

Last synced: 24 Apr 2026

https://github.com/gnujoow/crawl-repo

Crawls basic info about GitHub repositories

crawler github github-api python3

Last synced: 03 May 2026

https://github.com/nava45/simplempcrawler

Simple Multiprocessing Crawler in python

crawler multiprocessing python

Last synced: 22 Jun 2026

https://github.com/viclafouch/pe-crawler

📌 An automated system that serves data extracted from the Google Help Center

crawler javascript nodejs postgresql sequelize

Last synced: 17 Apr 2026

https://github.com/kahsolt/allchan

An image crawler for xChan (4chan/8ch/...) image boards.

4chan 4chan-downloader 8chan crawler image-crawler

Last synced: 23 Jun 2026

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 23 Jun 2026

https://github.com/natshah/natshah-crawler

Natshah Crawler crawls a selected domain with all its internal links and internal pages.

crawler database filter natshah-crawler

Last synced: 29 Apr 2026

https://github.com/marabesi/social-crawler

Easy way to find emails from social networks

crawler emails php social-crawler social-network

Last synced: 02 Mar 2026

https://github.com/devkoriel/teslalarm-kr

🚀 Teslalarm KR Real-time, AI-powered Tesla news & price alerts tailored for the Korean market. Stay updated on price changes, new model releases, and more – delivered directly to your Telegram. 🔔 Join us and help revolutionize Tesla news in Korea!

crawler telegram-bot tesla

Last synced: 04 Apr 2026

https://github.com/mauricelambert/cr0wl3r

Full and discreet web crawler for pentest, red-teaming or hacking discovery using simple HTTP requests or Selenium.

crawler discovery links pentest scan scraper security selenium uri url web web-links

Last synced: 11 Jun 2026

https://github.com/eduardosbcabral/desafio-tecnico-mp

Challenge: a file generator in C# that uses a web crawler and buffers to write the file to disk.

crawler csharp dotnet

Last synced: 08 May 2026

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 19 Apr 2026

https://github.com/cyberdolfi/serverrawler

ServerRawler is a Minecraft Server Crawler, written in Rust

crawler minecraft ratatui-rs rust seeker servercrawler serverseeker

Last synced: 04 Mar 2026

https://github.com/ewertoncodes/mind-crawler

A simple API written in Rails to extract quotations from the Quotes to Scrape site.

crawler ruby ruby-on-rails

Last synced: 14 May 2026

https://github.com/fi1a/crawler

PHP crawler

crawler php

Last synced: 29 Apr 2026

https://github.com/dizys/weibo-crawler

A nodejs weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 19 Apr 2026

https://github.com/chenty2333/tiktok-youtube_commentscraper

This tool allows you to collect public comments from TikTok and YouTube videos, either via direct video URLs or keyword-based search. It's useful for data analysis, opinion mining, and building datasets for machine learning tasks. A lightweight TikTok and YouTube comment crawler that supports batch collection of comment data by video link or keyword, suited to data collection for sentiment analysis, text mining, and machine learning.

comment crawler nlp scraper sentiment-analysis tiktok youtube

Last synced: 20 Apr 2026

https://github.com/luukalindgren/jobposts-utu

Web site for a database that holds job post data of IT jobs.

crawler docker fastapi mariadb react virtual-machine

Last synced: 29 Apr 2026

https://github.com/zukahai/formosa-views

View Formosa employee profile, salary, bonus year

bonus-year crawler css formosa html javascript nodejs python salary views

Last synced: 29 Apr 2026