An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
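As a rough illustration of what "systematically browses" means in the definition above, here is a minimal breadth-first crawl sketch in Python. It is not taken from any project listed on this page. The `crawl` function, the `LinkExtractor` class, the `fetch` callback, and the in-memory `PAGES` site are all hypothetical names chosen for the example; a real crawler would fetch over HTTP and would typically also honor `robots.txt` and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # page unavailable: skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site held in memory, so the sketch runs offline.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b#top">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and `max_pages` bounds the work on sites with very many pages.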

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 07 Mar 2026

https://github.com/Anakeyn/website-contextual-links

Retrieving the contextual links of a website with R.

crawler gephi r

Last synced: 17 Jul 2025

https://github.com/skulltech/arachnid

Crawling Instagram for reasons.

crawler instagram instagram-scraper python3 scraper scrapy

Last synced: 13 Jun 2025

https://github.com/krishpranav/spider

A Ruby web spidering tool that can spider a single site, multiple domains, certain links, or run indefinitely

crawler ruby spider web-crawler web-scraping

Last synced: 14 Feb 2026

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 13 Apr 2025

https://github.com/zhaoweih/meizitu-crawler

🕷️ Meizitu crawler (Scrapy)

crawler meizitu python scrapy spider

Last synced: 11 Apr 2025

https://github.com/joelkoen/wls

Easily crawl multiple sitemaps and list URLs

crawler sitemap url

Last synced: 12 Apr 2025

https://github.com/viper373/xovideos

A personalized video download tool built for users

boto3 crawler downloader githubactions m3u8 mongodb mp4 pornhub python s3-storage

Last synced: 16 Jun 2025

https://github.com/izumisy/scalable-crawler

Scalable crawler, fully managed by Google Cloud Platform

crawler docker gcp golang ruby

Last synced: 12 Apr 2026

https://github.com/pjt3591oo/exchange-crawler

Upbit and Coinone crawler

crawler data exchange python

Last synced: 27 Oct 2025

https://github.com/anyparser/anyparserjs

Anyparser Typescript SDK for RAG/ETL Pipelines - File Content Extraction. Supports extraction from various file formats including PDF, Microsoft Office documents, OCR/Image to Text, Audio to Text, and Website to Text.

anyparser artificial-intelligence cache-augmented-generation crawler etl-pipeline graph-rag knowledgebase langchain microsoft-office microsoft-word ms-office n8n-nodes ocr pdf-extraction rag retrieval-augmented-generation text-extraction web-crawler

Last synced: 17 Feb 2026

https://github.com/exca-dk/node-util

Useful utils for analyzing p2p crypto networks.

crawler ethereum mev p2p scanner

Last synced: 16 May 2026

https://github.com/keosariel/ramby

Ramby is a simple way to set up a web scraper

beautifulsoup crawler python3 webscraping

Last synced: 27 Mar 2025

https://github.com/exp-codes/sina-crawler

Sina Blog crawler

crawler programming

Last synced: 03 Apr 2025

https://github.com/diogoazevedos/x-ray-build

A helper that builds an x-ray based on a schema

crawler schema scraper structure x-ray

Last synced: 24 Feb 2026

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 06 Feb 2026

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 Updates CoinData every 12 hours. High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from the CoinGecko API. Collects market data and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 28 Jan 2026

https://github.com/lucky845/animetimeline

Uses a Python script to crawl the anime release schedule and save it as a Markdown file.

anime crawler python-script

Last synced: 09 Jul 2025

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 06 Feb 2026

https://github.com/xiantang/mini_scrapy

A lightweight crawler framework modeled on Scrapy

crawler python3 requets scrapy

Last synced: 27 Mar 2025

https://github.com/sc0vu/jspachong

Js crawler library.

crawler pachong

Last synced: 06 Feb 2026

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full-stack knowledge graph visualization project

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 11 Mar 2026

https://github.com/zhuruili/spider

Some simple crawler code, updated from time to time; hope it helps you

crawler drissionpage python requests

Last synced: 18 Mar 2025

https://github.com/afsh7n/crawly-automation

Crawly Automation is a lightweight, modular, and extensible web crawling framework built on top of Puppeteer. Whether you need to scrape data, automate browser interactions, manage CAPTCHAs, or handle advanced data extraction, Crawly Automation simplifies the process.

automation crawler nodejs puppeteer webscraping

Last synced: 25 Feb 2026

https://github.com/leona/go-crawler

Concurrent web crawler built in Golang

crawler golang scraper spider

Last synced: 17 Jan 2026

https://github.com/qiaocco/crawler

Crawlers for Baidu Tieba, Jinri Toutiao (阳光宽频网), and Biquge (笔趣阁)

crawler python

Last synced: 27 Mar 2025

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 07 Feb 2026

https://github.com/qiubits2007/xml-sitemap

Multi-domain XML sitemap generator with support for robots.txt, meta tags, email logging & search engine pinging

crawler generator gzip multi-domain php8 robots-txt seo seotools sitemap-builder sitemap-generator sitemap-xml

Last synced: 25 Feb 2026

https://github.com/vmandic/tris-web-crawler

Tris is a simple NodeJS web crawler tool to help you collect links from visited links of a website's domain.

crawler data-tools nodejs scraping seo-tools web-scraper

Last synced: 20 May 2026

https://github.com/yggverse/aquatic-crawler

SSD-friendly FS crawler for the Aquatic BitTorrent tracker, based on librqbit API

api aquatic bencode bittorrent btracker crawler daemon info-hash ipv6 librqbit magnet parser resolver rqbit torrent tracker

Last synced: 11 Mar 2026

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 09 Apr 2025

https://github.com/panyanyany/vps_spider

VPS Spider powering https://findallvps.com

crawler spider vps

Last synced: 28 Feb 2025

https://github.com/basemax/fakefaces

This repository contains a crawler that downloads thousands of fake human face images from various sources on the internet. Additionally, the repository includes a dataset of thousands of face images of fake humans.

crawler crawler-php crawler-testing crawlers curl dataset datasets face face-fake faces fake-face fake-faces php php-curl

Last synced: 27 Apr 2026

https://github.com/benderpan/fakeagent.net

Fake Agent for .Net Standard.

agent crawler fake-agent http-headers

Last synced: 12 Apr 2025

https://github.com/yidas/tw-stock-crawler-php

PHP crawler for Taiwan stock data

crawler stock taiwan taiwan-stock-information taiwan-stock-market

Last synced: 25 Mar 2025

https://github.com/marvnc/pixiv-dump

Pixiv Encyclopedia DB Dumps, updated daily

crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping

Last synced: 12 Jan 2026

https://github.com/travorlzh/temperature-analyzer

Python crawler that helps fetch the temperature of Beijing, China

crawler homework python variance

Last synced: 25 Aug 2025

https://github.com/wangshouh/icourse163_script

A Python script for liking and commenting on China University MOOC (中国大学MOOC).

crawler icourse163 python requests

Last synced: 28 Mar 2025

https://github.com/nazanin1369/searchengine

Implementing a search engine using Java, AngularJS and Elasticsearch

angularjs crawler elasticsearch java search-engine

Last synced: 12 Apr 2026

https://github.com/jxeng/site-info-crawler

A tool for batch crawling websites' titles, descriptions, and favicons.

crawler favicon title

Last synced: 30 May 2026

https://github.com/madis/flatcrawl

Clojure app for crawling apartment information from http://kv.ee

clojure crawler real-estate webapp

Last synced: 05 Jul 2025

https://github.com/yukito0209/is6941-ml-social-media

Code repository for the IS6941 Machine Learning & Social Media Analytics course group project, exploring applications of machine learning in social media data analysis.

bert city-university-of-hong-kong crawler data-collection llama machine-learning python sentiment-analysis social-media

Last synced: 01 Apr 2025

https://github.com/arpan404/spidey

Spidey is a powerful asynchronous web crawler built in Python that can crawl websites and download files with specified extensions. It's designed to be efficient, configurable, and easy to use.

asynchronous crawler dataminer opensource python webscraper

Last synced: 04 Feb 2026

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing from the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 10 Mar 2025

https://github.com/nakabonne/netsurfer

netsurfer is a very lightweight scraping framework

crawler go library scraping

Last synced: 01 Apr 2025

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 25 Feb 2026

https://github.com/codeforequity-at/botium-crawler

Botium Crawler - Like a Website Crawler, just for Conversation Flows

botium chatbots crawler

Last synced: 23 Apr 2025

https://github.com/leo9960/waimai_crawler

Scrapes merchant information from food delivery platforms

crawler

Last synced: 23 Apr 2025

https://github.com/raspi/scrapy-kuntavaalit2021-yle

Fetch YLE municipal elections (kuntavaalit) 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 26 Apr 2025

https://github.com/first-coding/django-and-web

This is a Django web project with separated front end and back end.

crawler django python

Last synced: 16 Feb 2026

https://github.com/tikazyq/colly-crawlers

Crawlers using Golang-based web crawling framework Colly

crawler

Last synced: 15 Jun 2025

https://github.com/ozakboy/taiwan-news-crawlers

.NET-based crawlers for Taiwan news, with data exposed as objects for ease of use

crawler data-collection dataset-generation dotnet news taiwan webcrawlers

Last synced: 15 Apr 2025

https://github.com/amirhoseinsalimi/boxapi-python

Python client for https://boxapi.ir to crawl and read Instagram data.

crawler instagram instagram-api python python3

Last synced: 26 May 2026

https://github.com/akagi201/spy

A lightweight distributed web crawler

crawler distributed lightweight nsq

Last synced: 26 Feb 2025

https://github.com/pedrohs1771/hyenzy-x-anime-scraper

A powerful all-in-one media scraper for Anime and Games with 4K Upscale (MPV) and Discord RPC.

anime-scrapper anime4k crawler discord-rpc game-downloader mpv-player playwright python upscale

Last synced: 30 May 2026

https://github.com/microlinkhq/ua

Simple Redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 18 Mar 2026

https://github.com/saadali1996/goose-rest-api

https://github.com/advancedlogic/GoOse based REST API for article content extraction

crawler golang rest-api

Last synced: 09 Mar 2026

https://github.com/zabuzard/songcrawler

Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 09 Jun 2026

https://github.com/maxbubblegum47/spotydump

Spotify Scraper combined with a Genius Scraper. Scrape artists from a certain time period or region of the world and dump all their songs!

crawler dump genius lyrics python spotify unimore-informatica

Last synced: 22 Mar 2025

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 07 Sep 2025

https://github.com/harryandriyan/21scrap

Cinema XXI movie data scraper

crawler python scrapy

Last synced: 14 Mar 2025

https://github.com/bimmr/site-crawler

Chromium Extension: Crawl a website

chrome-extension crawler downloader sitemap

Last synced: 12 Mar 2026

https://github.com/fernandod1/yahoo-finance-scraper

This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them in a local text file.

crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api

Last synced: 24 Aug 2025

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 10 Jan 2026

https://github.com/ductnn/curls

Simple tool to crawl URLs from a domain

colly crawler domain golang scanning url

Last synced: 04 Apr 2025

https://github.com/0fatal/zjxxc-crawl

Zaizhexue (在浙学) crawler: assignment status and login

crawler

Last synced: 03 Apr 2025

https://github.com/pinpox/go-random-downloader

Download HTML using "Random Page"

crawler golang

Last synced: 17 Aug 2025

https://github.com/der3318/zijfhchat-crawler

Chat room crawler and real-time lookup for the mobile game 「紫禁繁花」

crawler dashboard line-notify

Last synced: 04 Oct 2025

https://github.com/nbdy/prntscrngrb

prnt.sc / lightshot crawler, nudity detection and text extraction to a sqlite database

crawler nudity-detection prntsc text-extraction

Last synced: 04 Oct 2025

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 18 Jun 2026

https://github.com/eklem/browsercrawler

Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 02 Aug 2025

https://github.com/jofaval/webscraping

WebScraper providing tools to scrape tons of websites with the same base

crawler e-commerce python scraper webscraper webscraping

Last synced: 06 Oct 2025

https://github.com/eeriemyxi/nosori

Online image viewer for https://coomer.su and https://kemono.su

api coomer crawler docker image javascript kemono server typescript video viewer web

Last synced: 01 Aug 2025

https://github.com/litingyes/cobweb

Collect, store and distribute meaningful static data

apis bing-image bing-wallpapers crawler image random-image

Last synced: 31 Jul 2025

https://github.com/joeri-abbo/python-credly-scraper

This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an

badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling

Last synced: 23 Sep 2025

https://github.com/nextlevelshit/fick

Fucking Incredible Command line King. Add CLI flavour to any website you like to.

cli crawler

Last synced: 17 Feb 2026

https://github.com/zekrotja/r34-crawler

A simple CLI tool to fetch and download images from rule34.xxx

crawler go rest-api rule34 worker-pool xml

Last synced: 06 Mar 2026

https://github.com/darkoatanasovski/htmltags

A useful Golang package for stripping HTML tags from a string

crawler go golang html package strip tags web

Last synced: 17 Jan 2026

https://github.com/sieep-coding/web-crawler

A simple web crawler implemented in Go.

crawler go golang web-crawler

Last synced: 09 Mar 2026

https://github.com/overcat/orianna

Another DHT crawler written in Node.js

bittorrent crawler dht

Last synced: 17 Jan 2026

https://github.com/destan0098/go-agent

You can use this package to generate a random user agent

crawler security security-tools user-agent user-agents

Last synced: 20 Sep 2025

https://github.com/sachin-kumar-2003/seocrawler

SEO Link Checker | Find Broken Links & Improve SEO. I have built an SEO Link Checker that helps businesses, marketers, and site owners scan their websites, detect broken or harmful links, and fix them fast. This improves site health, user experience, and search rankings. Features: scan the entire website for broken internal and external links

beautifulsoup crawler fastapi reactjs seo seo-optimization

Last synced: 15 Apr 2026

https://github.com/laurybueno/crawler-olhovivo

Collector of mappable data on public bus transport in São Paulo

api crawler docker olhovivo python sptrans

Last synced: 27 Apr 2026

https://github.com/jjlibra/bake-mediacrawler

NanmiCoder's self-media data crawling software

crawler learning

Last synced: 06 May 2025

https://github.com/marcus-v-freitas/crawlerbrazilgovdata

ASP.NET Core (.NET 5) project for extracting and parsing São Paulo government data, integrated with S3 buckets and AWS SQS queues, with persistence via EF Core on MySQL.

api-rest aspnetcore automapper aws crawler csharp efcore government-data htmlagilitypack linux memory-cache mysql net5 onion-architecture parallel-computing parser s3-bucket serilog sqs-queue swagger

Last synced: 17 Jan 2026

https://github.com/darealfreak/figure-tracker

Application to keep watch on wished-for figures on multiple sites and notify you about auctions, sales, or sudden price drops

crawler figure-tracker monitoring

Last synced: 30 Mar 2025

https://github.com/antoinegagne/treewalker

A web crawler in Erlang that respects `robots.txt`.

crawler erlang webcrawler

Last synced: 11 Feb 2026

https://github.com/supadata-ai/py

Official Python SDK for the Supadata API.

ai api crawler llm markdown scraping sdk transcript web-scraper youtube

Last synced: 22 Mar 2025

https://github.com/lockblock-dev/crawlarr

Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.

crawler golang

Last synced: 18 Mar 2025

https://github.com/galaxiat/galaxiat.serve.seo

Node.JS package to serve React app and prerender path (cron)

crawler cron puppeteer seo seo-optimization ssr

Last synced: 31 Jan 2026