Crawler | Ecosyste.ms: Awesome

https://github.com/ronin-rb/ronin-web-spider

A collection of common web spidering routines

crawler infosec recon ruby scraper spider utils web websecurity

Last synced: 01 Aug 2025

https://github.com/xunzhuo/airspider

A Fast and Light Python Spider Framework 🕷️

asynchronous crawler crawler-python distributed python3 redis spider spider-framework web

Last synced: 23 Mar 2025

https://github.com/helingfeng/stay-reader

📚Miniprogram Book Reader

crawler laravel-application miniprogram php

Last synced: 30 Jul 2025

https://github.com/aquilax/opendirindexer

Open directory indexer

crawler go indexing

Last synced: 11 Jul 2025

https://github.com/adileo/MicroFrontier

A lightweight crawler frontier implementation in TypeScript using Redis.

crawler frontier microservice redis robots-txt spider

Last synced: 07 May 2025

https://github.com/michaelradu/web-crawler

A Web Crawler developed in Python.

crawler crawler-python crawlers python python-3 python-script python3 script scripting scripting-language scripts web web-crawler web-crawler-python web-crawlers web-crawling webcrawl webcrawler webcrawling

Last synced: 25 Jul 2025

https://github.com/nerohin/millions-crawler

Homework III of the NCKU course WEB RESOURCE DISCOVERY AND EXPLOITATION. I used a distributed crawler to crawl over a million web pages.

crawler distributed scrapy spider web-crawler

Last synced: 09 Feb 2026

https://github.com/n370/springer-scraper

academic book crawler ebook free scraper springer

Last synced: 18 Jul 2025

https://github.com/gbolmier/newspaper-crawler

:spider: An autonomous French newspaper crawler based on Scrapy framework

crawler scrapy

Last synced: 11 Apr 2025

https://github.com/599316527/nakeyouku

Scrapes Youku video information

crawler headless-chrome youku

Last synced: 14 Apr 2025

https://github.com/ruanwenjun/crawl-demo

A simple Java crawler project that scrapes trending search terms from Weibo, Baidu, and other sites

crawler java

Last synced: 15 Apr 2025

https://github.com/sdq/kaggle-crawler

simple scrapy project for kaggle.com

crawler kaggle

Last synced: 15 May 2026

https://github.com/twtrubiks/google-play-store-spider-selenium

Google-Play-Store-spider uses Selenium + Beautiful Soup on Python

beautifulsoup chrome crawler firefox python selenium spider sqlite

Last synced: 15 Apr 2025

https://github.com/flute/instagram-crawler

instagram crawler, downloads all video and photos from users or tags

crawler instagram instagram-crawler instagram-downloader

Last synced: 11 Jun 2025

https://github.com/foolishway/imagespider

Ultra-lightweight multi-goroutine Baidu image crawler

baidu crawler go goroutine image spider

Last synced: 14 Jan 2026

https://github.com/anikhasibul/stackoverflow-scraper-messenger-bot

A messenger bot that answers messages by scraping stackoverflow questions and answers

chatbot crawler messenger-bot scrapper stackoverflow

Last synced: 09 Apr 2025

https://github.com/ammarfaizi2/newsscraper

News Scraper

api api-service crawler scraper web-scrapper

Last synced: 14 Apr 2025

https://github.com/SupervisedCo/HyperCrawlTurbo

HypercrawlTurbo is a turbocharged web scraper for extracting URLs from a webpage.

ai crawler ml nlp retrieval retrieval-augmented-generation

Last synced: 29 Jul 2025

https://github.com/vndee/visee

Just a typical search engine in this universe :fire::fire::fire:

crawler django docker e-commerce elasticsearch flask kafka python visual-search

Last synced: 26 Jun 2025

https://github.com/alaouimehdi1995/simplified-search-engine

Multithreaded Web Crawler, Scraper, Indexer

container crawl crawler crawling database docker docker-compose engine index indexer indexing mongodb python python-3 scraper scraping search-algorithm search-engine searching

Last synced: 06 Mar 2025

https://github.com/pi-2r/devoxxfr2025-tock-studio-ia-gen

Project from the Devoxx France 2025 codelab “À la recherche du RAG perdu”: a 3-hour workshop on building an autonomous, local, offline Generative AI chatbot based solely on open source frameworks

ai chatbot crawler devoxx devoxx-fr-2025 docker generative-ai jailbreak kotlin langchain langfuse localai mistral ollama open-source rag scrapoxy scrapy

Last synced: 07 Oct 2025

https://github.com/dori-dev/flask-corona-info

Live Corona statistics and information site with flask.

coronavirus-real-time coronavirus-tracking crawler flask python python3 scrapy spider

Last synced: 13 Sep 2025

https://github.com/dachcom-digital/pimcore-dynamic-search-data-provider-crawler

A Spider Crawler Extension for Pimcore Dynamic Search.

crawler index pimcore scraper search

Last synced: 10 Apr 2025

https://github.com/kevincobain2000/go-app-reviews-scraper

Apple app store reviews and ratings scraper.

applestore applestoreconnect crawler ios iosapp ratings ratings-extractor reviews reviewscrapper scraper

Last synced: 12 May 2025

https://github.com/jack482653/go-ptt-web-craweler

Crawler for the web version of PTT

crawler golang ptt

Last synced: 14 Jan 2026

https://github.com/azimjohn/musicspider

MusicSpider API

crawler flask google-assistant music

Last synced: 01 Mar 2025

https://github.com/utkucanbykl/webcrawler

Web Crawler, Search Links, Submit Button or Search any HTML tag

crawler flask module python web

Last synced: 14 Aug 2025

https://github.com/wisecirno/wechat-official-account-toolkit

A toolkit for processing WeChat Official Account articles

crawler image-downloader wechat wechat-official-account

Last synced: 15 Jun 2025

https://github.com/omkarcloud/web-scraping-template

🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING WEB SCRAPING BOTS. 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 24 Oct 2025

https://github.com/sinramyeon/go_slack_bot

A Go-based Slack crawling bot. Interactive Slack bot written in Go, including RSS feed parsing, web crawling, and GitHub commit alerts

bot crawler github-api go golang rss-feed rss-feed-scraper slack slackapi slackbot

Last synced: 16 Jan 2026

https://github.com/Ivan-Alone/InstaStories-Saver

Program for saving Instagram Stories

api backup crawler grambler insta instagram instagram-stories instastories-saver instastory stories

Last synced: 13 Jul 2025

https://github.com/tosone/githubtraveler

Travel all of the GitHub users, orgs, repos.

crawler github golang

Last synced: 11 Jun 2025

https://github.com/shawon922/jobs-crawler

Crawl IT/Telecommunication jobs from bdjobs.com

beautifulsoup4 crawler python3

Last synced: 22 Apr 2025

https://github.com/visuellverstehen/t3fetch

Fetches a website (including all subpages), so the TYPO3 cache gets filled.

cache crawler fetch typo3 typo3-extension

Last synced: 03 Aug 2025

https://github.com/tsoliangwu0130/spotify-news

A Flask application to retrieve the singers' latest news according to your Spotify current playing song.

bootstrap crawler flask oauth2 python3 restful-api spotify-api

Last synced: 26 Apr 2025

https://github.com/blesstosam/registerappleid

A Node.js program for registering an Apple ID automatically

crawler nodejs

Last synced: 14 Mar 2026

https://github.com/pawod/gis-berlin-rents

A web crawler for ImmobilienScout24.de that was implemented for a small project at the institute of geographic sciences of the Free University of Berlin.

apartment-rents berlin crawler gis immobilienscout24

Last synced: 03 Apr 2025

https://github.com/webcoast-dk/versatile-crawler

Extendable and easy to use crawler extension for TYPO3 CMS

crawler extendable indexing search typo3

Last synced: 18 Oct 2025

https://github.com/igeligel/BackpackLogin

:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.

bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2

Last synced: 05 May 2025

https://github.com/68publishers/crawler

:spider_web: Awesome scenario based crawler

crawlee crawler crawling node nodejs scraper scraping

Last synced: 22 Apr 2025

https://github.com/mediamonks/symfony-crawler-bundle

Implements the crawler package into Symfony

crawler php symfony symfony-bundle

Last synced: 28 Jul 2025

https://github.com/chipscoco/oceanmonkey

OceanMonkey is a High-Level Distributed Web Crawling and Web Scraping framework based on multi-process and multi-coroutines, used to crawl websites and extract structured data from their pages like the classical scrapy framework.

coroutines crawler multiprocessing python python3 scraper scraping spider

Last synced: 11 Sep 2025

https://github.com/public-law/nevada-revised-statutes-parser

Parses the Nevada NRS into well formed JSON

crawler haskell legaltech opengov parser scraper

Last synced: 11 Nov 2025

https://github.com/a252937166/quick-selenium

Mainly uses the quick-spring and selenium frameworks to scrape information from various dynamic web pages

crawler quickstart selenium

Last synced: 20 Jul 2025

https://github.com/sabinbajracharya/insta-crawler

Pulls data from instagram and saves it to Firebase for storage and Algolia for search

accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper

Last synced: 03 Jul 2025

https://github.com/spekulatius/spatie-crawler-cached-queue-example

Example to demonstrate the usage of cached queues across multiple requests.

crawler crawler-engine laravel php-crawler php-scraper queues spatie-crawler

Last synced: 01 May 2025

https://github.com/bioinformatist/py3_scripts

Life is short, *****.

blast crawler gtf pacbio scrapy

Last synced: 24 Apr 2025

https://github.com/bbc2/discolinks

Command-line tool which checks a website for broken links.

broken-links crawler html http link-checker link-checkers link-checking validator web

Last synced: 22 Mar 2025

https://github.com/duongdev/facebook-group-crawler

Facebook Groups Discussions Crawler

crawler facebook groups puppeteer

Last synced: 01 May 2025

https://github.com/insign/spatie-crawler-queue-with-laravel-model

Spatie's Crawler with Laravel Model as Queue

cache crawler eloquent laravel queues spatie spatie-crawler

Last synced: 15 Apr 2025

https://github.com/twtrubiks/pttstatistics

Counts PTT board comments or popular keywords in article titles, in Python

crawler ptt ptt-hot-key python statistics

Last synced: 15 Apr 2025

https://github.com/umihico/minigun-requests

Web scraping API to outsource tons of GET & xpath to cloud computing

crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping

Last synced: 13 Apr 2025

https://github.com/mmqnym/nft-market-sniper

This bot helps people get more information (e.g. floor price) automatically from Ebisu's Bay (the NFT market on Cronos).

crawl crawler discord nft pycord python

Last synced: 29 Jan 2026

https://github.com/activatedgeek/winemag-dataset

Dataset of Wine Reviews from Wine Enthusiast Magazine :grapes: :wine_glass: :earth_asia:

crawler dataset python3 scrapy scrapy-spider vega-lite visualization wine wine-tasting

Last synced: 09 Oct 2025

https://github.com/samiahmedsiddiqui/http-auth

Provides comprehensive security during development by protecting your entire site and your admin pages from brute-force attacks.

admin auth authentication brute-force brute-force-attacks crawl crawler http-auth http-authentication locked login restrict-pages restrict-site wordpress wordpress-plugin

Last synced: 12 Apr 2025

https://github.com/capturr/jsonld-extract

A damn simple tool to extract json-ld metadata from webpage using jquery like api (jQuery, Cheerio, CashDom ...).

cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript

Last synced: 24 Mar 2025

https://github.com/bdadam/metatag-crawler

This is a simple node.js module for scraping meta information from web pages.

crawler metadata nodejs parser

Last synced: 26 Jun 2025

https://github.com/sabinbajracharya/Insta-crawler

Pulls data from instagram and saves it to Firebase for storage and Algolia for search

accounts algolia algolia-search crawler firebase firebase-database instagram instagram-feed instagram-post javascript nodejs public scraper

Last synced: 12 Apr 2025

https://github.com/toddlerya/learn_scrapy

learn Scrapy 1.4.0

crawler demo python scrapy tutorial

Last synced: 07 May 2025

https://github.com/trungdq88/movie-showtimes

Web Service & Android Application to look up Vietnam movie showtimes

crawler java movie-showtimes theater

Last synced: 12 Apr 2025

https://github.com/0memo07/web-crawler

Web Crawler with Python

beautifulsoup4 bs4 crawler crawlers crawling crawling-python web-crawler web-crawler-python web-crawling webcrawler

Last synced: 24 Apr 2025

https://github.com/irq0/llar

🖖 Live Long and Read! A self-hosted news aggregator focused on customizability.

clojure crawler feed-reader hackernews-api news-aggregator news-reader reddit-api rss rss-reader

Last synced: 30 Jan 2026

https://github.com/petersonjr/MetadataCrawler

A simple tool to extract metadata from relational databases

avro crawler database-schemas java jdbc metadata rdms relational-databases

Last synced: 06 May 2025

https://github.com/keul/allanon

A Web crawler that visits a predictable set of URLs and automatically downloads the resources you want from them

crawler python

Last synced: 28 Apr 2025

https://github.com/flute/coub-crawler

coub.com crawler, download all videos.

coub coub-com-crawler coub-crawler crawler

Last synced: 01 Mar 2026

https://github.com/itwars/golang-scraping-colly

Examples of retrieving unstructured data with the Golang framework COLLY

bigdata colly crawler crawling data forecast golang scraper scraping sports

Last synced: 17 May 2025

https://github.com/igeligel/backpacklogin

:arrow_forward: A .NET core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2. Community item pricing, item trading and stats, and much more.

bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2

Last synced: 15 May 2025

https://github.com/pps-22-scooby/pps-22-scooby

Scala application that allows web crawling and web scraping of web pages given as input with the use of special rules passed to it through the use of a DSL.

crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers

Last synced: 24 Oct 2025

https://github.com/sebobo/shel.crawler

Neos based crawler for nodes and sites

crawler neos-cms

Last synced: 12 Apr 2025

https://github.com/piotrpdev/WeBuy-Cex-Price-Tracker

A python script that gets the prices of certain Cex products and uploads them to google sheets

cex cex-api cex-products crawler google-sheets gspread prices python-script webuy webuy-api webuy-cex webuycex

Last synced: 10 Mar 2025

https://github.com/eight04/ptt-mail-backup

A BBS bot for fetching PTT in-site mail

bbs cli crawler ptt ptt-crawler python python3

Last synced: 05 Jul 2025

https://github.com/29dch/word_cloud

A Python word cloud project

crawler jieba wordcloud

Last synced: 05 Oct 2025

https://github.com/luizppa/web-crawler

A web crawler that collects and indexes web pages. Made with chilkat and gumbo parser.

chilkat cpp crawler webcrawler

Last synced: 17 Aug 2025

https://github.com/luyadev/luya-module-crawler

Crawl a website and provide intelligent search results

crawler hacktoberfest intelligent-search luya search yii2

Last synced: 25 Oct 2025

https://github.com/fanzeyi/torchic

A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.

bm25 crawler search-engine

Last synced: 28 Apr 2025

https://github.com/bajins/scripts_python

Python scripts

crawler faker faker-generator python-3 python3 rclone rclone-client rclone-config rclone-configuration reptile reptile-image reptiles scraper spider

Last synced: 03 Oct 2025

https://github.com/yerkopalma/bash-crawler

:computer: Get a site links with bash

bash crawler

Last synced: 05 Aug 2025

https://github.com/rggh/scrapy18

Scrapy start_urls from csv demo

crawler linkextractor scrapy

Last synced: 03 Aug 2025

https://github.com/aurelius84/pycrawler

A flexible spider based on mysql

crawler etl mysql scrapy spider

Last synced: 10 Apr 2025

https://github.com/bfwg/node-tinycrawler

Tiny web-crawler in a nutshell for Node.js

crawler nodejs redis

Last synced: 10 Nov 2025

https://github.com/jayantgoel001/artyvistechnologies

crawler csv json scrapy

Last synced: 02 May 2025

https://github.com/aerogo/crawler

:rowboat: Web crawler.

crawler go

Last synced: 10 Apr 2025

https://github.com/yggverse/yggstate

Yggdrasil Network Explorer

analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate

Last synced: 14 Jan 2026

https://github.com/adambankz/tiktok-scraper

A simple, no download scraper for social media platforms like TikTok. Just input parameters and parse useful data. Download TikTok videos with no watermark

crawler no-watermark parse scraper scraper-site tiktok-no-watermark tiktok-scraper