An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
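As a rough illustration of what "systematically browses" means in practice, here is a minimal sketch of a breadth-first crawl loop: a queue of URLs still to visit plus a set of URLs already seen. It is not taken from any project listed here. The names `LinkExtractor`, `crawl`, `fetch`, and `PAGES`, and the `example.test` URLs, are assumptions made for this example. The in-memory `PAGES` dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a hypothetical callable that maps a URL to its HTML,
    or returns None when the page is unavailable.
    """
    frontier = deque([start_url])  # URLs waiting to be visited
    seen = {start_url}             # URLs already queued, to avoid loops
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical in-memory "web" used instead of real network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c">C</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A crawler run against real sites would normally also honor robots.txt and rate-limit its requests. Both are left out here to keep the sketch short.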

https://github.com/wahengchang/node-dcard-scraper

An example of a cheerio scraper that extracts images from Dcard.

cheerio crawler dcard example javascript nodejs npm scraper tutorial

Last synced: 11 Apr 2025

https://github.com/DiscovAI/DiscovAI-crawl

🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.

ai api crawler embedding vector-database web-scraping

Last synced: 11 Sep 2025

https://github.com/twiny/wbot

A simple & efficient web crawler.

big-data crawler golang scraper seo spider

Last synced: 27 Aug 2025

https://github.com/szczyglis-dev/php-ultra-small-proxy

[PHP] Lightweight proxy with full support for sessions, cookies, POST/FORM submissions, and URL rewriting. The proxy offers two methods of URL rewriting: XML and Regex. It also includes features such as HTTP Auth, caching, and more.

cookies crawler crawler-php css http-client http-proxy networking proxy proxy-server webbrowser website www

Last synced: 05 Oct 2025

https://github.com/sunsetmkt/bilibili-video-reply-crawler

A Python crawler that fetches comments on Bilibili videos and column articles.

bilibili crawler github-actions python python3 spider

Last synced: 11 Apr 2025

https://github.com/DavideViolante/socialblade-com-api

Unofficial APIs for socialblade.com website.

crawler scraper scraping social social-media socialblade

Last synced: 30 Jun 2025

https://github.com/vignif/crawler-google-scholar

This bot crawls and downloads statistics and pictures of researchers from Google Scholar.

crawler downloading-statistics google-scholar indexes statistics

Last synced: 07 Apr 2025

https://github.com/pourmand1376/PersianCrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 09 Jul 2025

https://github.com/lixi5338619/lxparse

A library for parsing links on list pages and extracting content from detail pages.

crawler htmlparse python

Last synced: 27 Oct 2025

https://github.com/nothing12321/proxy-grabber

Python-based Massive Proxy Grabber. This bot grabs proxies from public websites so you can use them.

bot checker crawler grabber javascript parser proxies proxies-scraper proxy proxy-checker proxy-list proxy-parser proxy-scraper proxy-scrapper proxy-tool proxygrabber python socks socks4 socks5

Last synced: 15 Apr 2025

https://github.com/shaoxiongdu/skyeye

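A Spring Boot-based crawler project for trending topics across the web. Raw trending-search data is saved to the database, and word-segmentation statistics are stored in Redis for later data analysis.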

crawler crawlers mysql redis spring spring-boot

Last synced: 31 Jul 2025

https://github.com/knovour/json-web-crawler

Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 06 Oct 2025

https://github.com/bitscoper/bitscoper_cyberkit

A Flutter App: Bluetooth LE Scanner, IPv4 Subnet Scanner, mDNS Scanner, UPnP Scanner, Route Tracer, TCP Port Scanner, Pinger, File Hash Calculator, String Hash Calculator, CVSS Calculator, Base Encoder, Morse Code Translator, QR Code Generator, OGP Data Extractor, Series URI Crawler, DNS Record Retriever, WHOIS Retriever, and Wi-Fi Details Viewer

android calculator crawler cybersecurity dart decoder docker encoder extractor flutter github-action ios mac retriever scanner tracer translator web windows

Last synced: 03 Apr 2026

https://github.com/Knovour/json-web-crawler

Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 10 May 2025

https://github.com/victormartinez/shub_cli

A CLI for dealing with the features of ScrapingHub

cli crawler scrapinghub scrapinghub-api scrapy shub-cli spider spiders

Last synced: 08 Feb 2026

https://github.com/arshadkazmi42/github-scanner-local

Locally scan all the repositories of a github organization

bounty bug bug-bounty crawler github local no-api scanner

Last synced: 12 Aug 2025

https://github.com/ariya/penjabarberita

Extract the article list from its raw news HTML

articles cheerio crawler headlines html indexer indonesia news scraper spider

Last synced: 30 Apr 2025

https://github.com/rsoury/serverless-web-crawler

Serverless Web Crawler that executes for an indefinite amount of time. Perfect for Crawling Jobs that last longer than a minute and only need to be executed once or twice a month.

boilerplate crawler fargate serverless serverless-framework template

Last synced: 23 Apr 2025

https://github.com/shadawck/recon-archy

LinkedIn tools (and possibly other sources later) to reconstruct a company hierarchy by scraping relations and job titles.

automation company-data crawler cybersecurity geckodriver golang linkedin organisational-analysis osint osinttool reconnaissance scraper selenium

Last synced: 13 Apr 2025

https://github.com/davideviolante/socialblade-com-api

Unofficial APIs for socialblade.com website.

crawler scraper scraping social social-media socialblade

Last synced: 07 May 2025

https://github.com/achannarasappa/locust

Distributed web data discovery and collection framework built for serverless

aws-lambda crawler locust scraping serverless

Last synced: 13 May 2025

https://github.com/risyasin/arachnod

High performance crawler for Nodejs

cheerio crawler javascript nodejs redis scraper spider

Last synced: 05 Apr 2025

https://github.com/gambolputty/newscorpus

A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as its database.

corpus crawler news newsarticles scraper

Last synced: 16 Jan 2026

https://github.com/wuseman/wmirror

wmirror allows you to download any website from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.

crawler mirror website wget

Last synced: 10 Apr 2025

https://github.com/pceuropa/youtube-crawler

Youtube crawler & scraper based on scrapy. Written in Python3.

crawler csv mariadb python3 scraper scrapy sqlalchemy youtube

Last synced: 04 May 2025

https://github.com/cable8mm/water-melon

Water Melon is a simple melon.com API SDK for PHP.

composer crawler kpop laravel melon package php

Last synced: 09 Apr 2025

https://github.com/twtrubiks/eynycrawlermega

A crawler for eyny movie Mega and Google links, written in Python.

crawler eyny mega python

Last synced: 15 Apr 2025

https://github.com/fooock/robots.txt

🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API.

antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot

Last synced: 14 Feb 2026

https://github.com/valmisson/ytubes

Search for videos, playlists, channels, movies, live streams, and music on YouTube without an API key.

channel crawler live movie nodejs playlist scraper search typescript videos youtube youtube-api youtube-music youtube-search ytube

Last synced: 28 Oct 2025

https://github.com/kirillplatonov/proxy_manager

Ruby proxy manager. A gem for easy proxy usage in parsers and web bots.

crawler parser proxy ruby

Last synced: 24 Apr 2025

https://github.com/ze3kr/wheres-my-offer

University Admission Portal Checker

crawler offer university university-admission

Last synced: 03 Oct 2025

https://github.com/selmi-karim/img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL.

buffer crawler crawling downloader image-downloader image-downloading nodejs phantomjs webpage

Last synced: 12 Jul 2025

https://github.com/omarhashem123/venom

A tool designed for fast crawling and endpoint extraction.

crawler python python3 spider

Last synced: 12 Jul 2025

https://github.com/tn3w/flask-humanify

A strong bot protection system for Flask with many features: rate limiting, special rules for users, web crawler detection, and automatic bot detection.

bot-protection captcha crawler ddos flask python rate-limiting robot

Last synced: 01 Jul 2025

https://github.com/toannd96/crawler_web_js

Uses scrapy-splash combined with a Lua script to crawl websites that rely on JavaScript (websosanh).

crawler javascript lua-script scrapy scrapy-splash splash

Last synced: 13 May 2025

https://github.com/Selbi182/SpotifyDiscoveryBot

A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!

bot crawler java music spotify spring-boot springboot sqlite

Last synced: 17 Mar 2025

https://github.com/jsrei/javascript-window-listener-library

A basic component for JavaScript reverse engineering that listens for changes to window.

crawler js-library js-reverse reverse-engineering web-security-research

Last synced: 19 Apr 2025

https://github.com/agenty/scrapingai

Build web scraping agents with Agenty that use AI to auto-extract data from websites, capture screenshots, generate PDFs from URLs, and crawl the web.

crawler crawling datascraping extract-data scraping webscraper webscraping

Last synced: 12 Apr 2025

https://github.com/wux1an/fake-useragent

Provides a random user agent.

crawler random spider ua user-agent useragent

Last synced: 11 Sep 2025

https://github.com/kasthack-labs/kasthack.osp

Generator of raw VK user dumps.

crawler crawling data-mining kasthack programmable-web vk vk-api vkapi vkontakte

Last synced: 29 Sep 2025

https://github.com/ruichongliu/Crawler_pubg.op.gg

This is a web crawler for pubg.op.gg, written by Ruichong Liu. It scrapes PUBG (PlayerUnknown's Battlegrounds) game data.

beautifulsoup4 crawler pubg python3 scrape selenium

Last synced: 25 Mar 2025

https://github.com/hoangsonww/ai-gov-content-curator

💡An end-to-end solution for aggregating, summarizing, and displaying news articles using an AI-powered backend, an automated CRON crawler, and a responsive Next.js frontend. It integrates technologies like Express.js, MongoDB, Puppeteer, and GenAI/LLMs to deliver up-to-date, curated content to government staff and other users.

artificial-intelligence axios cheerio crawler cron cronjob docker express expressjs google-generative-ai mongodb mongoose nextjs nodejs puppeteer react shadcn-ui tailwindcss typescript vercel

Last synced: 09 Apr 2025

https://github.com/betta-cyber/netease_music_api

NetEase Cloud Music API for Python.

crawler data-analysis netease-cloud-music

Last synced: 30 Jul 2025

https://github.com/danhje/dead-link-crawler

An efficient, asynchronous crawler that identifies broken links on a given domain.

async broken-links crawler dead-links python python3

Last synced: 23 Jun 2025

https://github.com/gajus/headless-crawler

A crawler implemented using a headless browser (Chrome).

chrome crawler headless puppeteer spider

Last synced: 15 Apr 2025

https://github.com/shavit/crawlero

Distributed web crawlers. Fault tolerance, user-agent randomizer, RabbitMQ, Tor, PostgreSQL.

crawler marketing-automation marketing-tools pbn proxy rabbitmq tor

Last synced: 15 Jul 2025

https://github.com/ikergarcia1996/questionclustering

A question classifier written in Python 3, as implemented in the following video: https://youtu.be/qnlW1m6lPoY

clustering crawler deep-learning inteligencia-artificial machine-learning natural-language-processing nlp pln sentiment-analysis techonology unsupervised-machine-learning word-embeddings

Last synced: 05 Oct 2025

https://github.com/douglasdcm/caqui

Run synchronous and asynchronous commands in WebDrivers

appium asynchronous crawler python scraper synchronous webdriver winappdriver winium

Last synced: 01 Apr 2026

https://github.com/clasense4/scrapy-bhinneka-crawler

Scraping bhinneka.com, just for fun

crawler python scrapy

Last synced: 17 Dec 2025

https://github.com/refraction-ray/wos-statistics

A crawler for Web of Science data, focused especially on the analysis of citation data.

aiohttp citation crawler webofscience

Last synced: 14 Oct 2025

https://github.com/src-d/rovers

Rovers is a service to retrieve repository URLs from multiple repository hosting providers.

bitbucket cgit crawler github

Last synced: 05 May 2025

https://github.com/ravern/gollum

Robots.txt parser and fetcher for Elixir

crawler elixir robots-parser robots-txt

Last synced: 11 Dec 2025

https://github.com/burnzz/scrapy-twitter

Web scraper based on Scrapy to fetch tweets from a list of user accounts

bot crawler scraping scrapy twitter

Last synced: 11 Sep 2025

https://github.com/isolateob/exiainvasion

An open-source Chrome extension that obtains Nikke character data from blablalink and generates a progress tracker.

chrome-extension crawler javascript material-ui nikke-goddess-of-victory python react vite

Last synced: 09 Apr 2026

https://github.com/twtrubiks/crawler_click_tutorial

Click tutorial (crawler) using Python.

click command-line-tool crawler python tutorial

Last synced: 07 Oct 2025

https://github.com/matheuscas/pynfce

Fetches and extracts data from an NFCe, given its access URL.

crawler nfce python3

Last synced: 17 Jun 2025

https://github.com/fedebotu/neurips2022-openreviewdata

Crawl & Visualize NeurIPS 2022 Data from OpenReview

crawler dataset neurips neurips-2022 openreview peer-review review scraper

Last synced: 09 Apr 2025

https://github.com/aurelg/linkbak

linkbak is a web page archiver: it reads a list of links and dumps the corresponding pages in HTML and PDF.

archive backup crawler html pdf python3

Last synced: 04 Apr 2025

https://github.com/gabrielguarisa/brdata

Brazilian financial market data sources

brasil crawler data finance

Last synced: 12 Apr 2025

https://github.com/dev-chenxing/jjwxc-crawler

A Scrapy-based crawler for Jinjiang (jjwxc.net) that downloads the non-V chapters of any novel by its book number and generates an editable Word (.docx) document. Built with Python and Scrapy.

chinese cli crawler docx download jjwxc open-source python scraping scrapy terminal word

Last synced: 09 Apr 2025

https://github.com/cybercongress/crawler

A toolchain for bringing web2 to web3

cosmos-sdk crawler cyber cyberd ipfs web3 wiki

Last synced: 15 Dec 2025

https://github.com/chinmayrane16/scraping-amazon-for-mobile-details-with-scrapy

Scraping the Amazon website through proxies to extract mobile details.

amazon-scraper crawler googlebot json proxy pycharm pypiwin32 scrapy user-agents

Last synced: 18 Mar 2025

https://github.com/zamhown/limit-up-stock-crawler

📈 A crawler for limit-up stock data from the Shanghai and Shenzhen stock markets.

crawler python python3 stock

Last synced: 25 Aug 2025

https://github.com/qieguo2016/doffy

A web automation library based on headless Chrome.

casper chrome-headless crawler nightmare uitest

Last synced: 13 Jul 2025

https://github.com/a3r0id/httpscan

Scan a host for open HTTP ports and gain information about the services present.

crawler hacking hacking-tool http low-level penetration-testing pentest pentesting portscan portscanner scan scanner scanner-web scraper security service-discovery

Last synced: 06 Apr 2025

https://github.com/maxgio92/krawler

A crawler for kernel releases distributed by the major Linux distributions.

crawler kernel linux scraping

Last synced: 22 Mar 2025

https://github.com/minicli/curly

Simple Curl Client

crawler curl hacktoberfest php

Last synced: 22 Jun 2025

https://github.com/supadata-ai/mcp

Official Supadata MCP Server - Adds powerful video & web scraping to Cursor, Claude and any other LLM clients.

ai crawler llm mcp scrape tiktok transcript whisper youtube

Last synced: 14 Oct 2025

https://github.com/wuxudong/rxcrawler

A Java crawler based on RxJava.

crawler nio rxjava

Last synced: 02 Aug 2025

https://github.com/wearetyomsmnv/gptbuster

Generative web directory fuzzer, crawler, and subdomain checker based on ChatGPT.

crawler gpt hacking pentesting python3 reconnaissance web

Last synced: 13 Apr 2025

https://github.com/xiaoluoboding/metafy-svg

Easily crawl a website's metadata and generate SVG as a service.

crawler metadata saas serverless-functions svg vercel-serverless

Last synced: 23 Mar 2025

https://github.com/hoc081098/comic_app_server_nodejs

Node.js server for an Android comic app | https://comic-app-081098.herokuapp.com/

comic-app crawler nodejs nodejs-crawler nodejs-typescript typescript

Last synced: 06 Mar 2026

https://github.com/niloysikdar/go-imdb-crawler

Want to know which celebrities have a common birthday with yours? 👀 Get the full data about them. Made using Go + Colly

colly crawler golang imdb

Last synced: 23 Oct 2025

https://github.com/koshqua/scrapio

Simple and easy-to-use scraper and crawler in Go.

crawler framework go golang json scraper spider

Last synced: 14 Jan 2026

https://github.com/floschnell/flatcrawl-processors

A set of processors that instantly inform users, via a set of channels (e.g. Telegram), of new flats found on different rental websites.

bot crawler flatcrawl flats real-estate rentals-search telegram

Last synced: 01 Feb 2026

https://github.com/bgadrian/warmcache

A simple tool that scans your website to keep your cache hot and ready. A helper tool for Prerender, Squid, CDNs, etc.

cache cdn crawler go golang prerender prerenderio squid

Last synced: 13 Apr 2025

https://github.com/gridaco/figma-archives

Figma Files Scraper for Research & Studies

crawler dataset design-database figma machine-learning scrapy selenium

Last synced: 06 Oct 2025

https://github.com/saltyshiomix/web-master

Web mastering tools for my personal services

crawler javascript nodejs scraper typescript web

Last synced: 16 Mar 2025

https://github.com/dxsooo/shortvideocrawl

Short video crawler based on scrapy

crawler kuaishou scrapy spider video-crawler

Last synced: 26 Jul 2025

https://github.com/freekatz/jd_sentiment_analysis

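A simple hands-on project covering crawling, processing, visualization, sentiment analysis, and model evaluation of JD.com product reviews.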

crawler jd spider

Last synced: 09 Apr 2025

https://github.com/1491270550/xueqiu_spider_lqh_lzq

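A Xueqiu crawler that efficiently scrapes recent comments on Shanghai and Shenzhen A-share stocks and automatically generates a sentiment analysis report in PDF.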

crawler python3 spider xueqiu xueqiu-stock

Last synced: 12 Jun 2025

https://github.com/fanhuaandluomu/qqzoneparse

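Simulates logging in to QQ Zone (Qzone), retrieves friends' information, and analyzes it (age distribution, gender distribution, location distribution, etc.). For details, see the documentation and the analysis results shown in the 1049755192 folder.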

crawler python27 qqzone spider

Last synced: 01 May 2025

https://github.com/wangy8961/python3-concurrency-pics-01

A crawler that uses multithreading or asynchronous downloads to fetch the pictures of beautiful women shared at http://gank.io/api/data/%E7%A6%8F%E5%88%A9/1000/1

aiohhtp asyncio coroutine crawler progressbar python3 requests threadpool

Last synced: 10 Jun 2025

https://github.com/amirzenoozi/insta-downloader

Download Instagram posts with this script.

crawler crawling downloader instagram

Last synced: 20 Jul 2025