Crawler | Ecosyste.ms: Awesome

https://github.com/wenyalintw/job-scraper-bot

A fun Telegram bot made for a friend, deployed to Heroku

amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping

Last synced: 13 Sep 2025

https://github.com/baraja-core/webcrawler

Simple website crawling by following links.

bot crawler crawling-websites fast php robot speed

Last synced: 03 Sep 2025

https://github.com/rggh/scrapy18

Demo of loading Scrapy start_urls from a CSV file

crawler linkextractor scrapy

Last synced: 03 Aug 2025

https://github.com/vmarcosp/supervise-crawler

:male_detective: Supervise crawler

crawler esy ocaml reasonml webcrawler

Last synced: 13 May 2025

https://github.com/alexqi/webphantom

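An open-source crawler framework for web data collection tasks. It supports core features such as API calls, task scheduling, and session management, and is suited to building automated collection systems that can cope with some anti-scraping measures (Douyin | Xiaohongshu | Taobao | JD.com)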

crawler douyin qps scheduler taobao xiaohonghsu

Last synced: 22 Jun 2026

https://github.com/windfarer/biu

biubiubiu~~ I'm a tiny web crawler framework

crawler python spider spider-framework web-crawler

Last synced: 23 Mar 2025

https://github.com/appliedsoul/promise-crawler

Promise support for node-crawler (Web Crawler/Spider for NodeJS + server-side jQuery)

crawler node-crawler nodejs promise-node-crawler spider

Last synced: 28 Feb 2026

https://github.com/suqingdong/pubmed2

NCBI PubMed Crawler

crawler ncbi pubmed python

Last synced: 25 Dec 2025

https://github.com/khaleddallah/LinkedinScraper

A Python Scrapy project that parses people profiles from LinkedIn search results and arranges the results in Excel and JSON files

crawler excel json linkedin python scraper scrapy spider

Last synced: 06 Apr 2025

https://github.com/yggverse/yggstate

Yggdrasil Network Explorer

analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate

Last synced: 14 Jan 2026

https://github.com/bfwg/node-tinycrawler

A tiny web crawler in a nutshell for Node.js

crawler nodejs redis

Last synced: 10 Nov 2025

https://github.com/chrisweb/universal-nodejs-scraper

Universal Node.js scraper: a simple tool to crawl web pages and extract content that can then be stored in CSV files (sheets) or directly in a database

crawler harvester javascript nodejs scraper typescript

Last synced: 13 Jul 2025

https://github.com/kissaki/website-downloader

A website crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 13 May 2025

https://github.com/ntthanh2603/crawl-analysis-data-facebook

📊 Project: Analysis & Data Crawling for Two Football Pages – Manchester United & Liverpool FC ⚽🔍

ana crawler facebook-tools jupyter-notebook numpy pandas selenium

Last synced: 26 Jun 2025

https://github.com/alexmili/reachable

Check if a URL exists and is reachable

crawler health-check monitoring reachability webscraping

Last synced: 14 Aug 2025

https://github.com/markmelnic/mobile-de-crawler

A crawler for mobile.de to index all car listings on the website.

crawler requests scraper sqlite3

Last synced: 08 Oct 2025

https://github.com/stopka/fedicrawl

Collect feeds to follow on Fediverse nodes.

crawler docker fediverse nodejs prisma typescript

Last synced: 04 Apr 2025

https://github.com/bugfishtm/bugfish-image-downloader

🖥️ Windows 🚀 Effortless web image downloads, subsite exploration, and HD selection. Windows app, .NET 4.5, no registry usage.

bugfish bugfish-windows bugfishtm crawler downloader downloadmanager downloadtool gplv3 image imagedownloader imagedownloadertool imageprocessing portable-executable portableapps software utilityapp webscraping windows windows-desktop

Last synced: 26 Jan 2026

https://github.com/jacobsteves/crawlperl

A web crawler made with Perl. Great for grabbing or searching for data off the web, or ensuring that your own site files are secure and hidden.

crawler perl scripting web-crawler

Last synced: 14 Apr 2025

https://github.com/dori-dev/quotes-crawler

Quotes crawler using Scrapy and Python.

crawler crawling python scraping-python scraping-websites scrapy scrapy-crawler scrapy-spider web-scraper

Last synced: 08 Oct 2025

https://github.com/xixu-me/library-data-assistant

Java-based client-server application for managing library book data with web crawling capabilities

crawler crawling database java mysql

Last synced: 08 Oct 2025

https://github.com/box-archived/vlive-py

VLIVE(vlive.tv) parser for python

api-wrapper crawler kpop parser python vlive

Last synced: 14 Jan 2026

https://github.com/rodyherrera/cdrake-se

✨ Search the internet for free, without limits and without APIs. Find videos, images, sites, books, and other resources using the engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).

bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube

Last synced: 19 Apr 2026

https://github.com/the1812/bingwallpapers

A tool for downloading wallpapers from Bing.

crawler csharp wpf

Last synced: 03 Apr 2025

https://github.com/yerkopalma/bash-crawler

:computer: Get a site's links with Bash

bash crawler

Last synced: 05 Aug 2025

https://github.com/xvc323/omnidocs

Automated documentation crawler that generates LLM-friendly Markdown from any docs site. Export as single or multi-file, ready for AI ingestion.

crawler documentation llm markdown

Last synced: 27 Jun 2025

https://github.com/Hound-fm/podcatcher

Audio media crawler for LBRY.

crawler lbry python

Last synced: 12 May 2025

https://github.com/bernabe9/render-it

Render any JavaScript content to create static sites ready for SEO

crawler javascript prerender prerenderio puppeteer render seo seo-tools server-side-rendering static-site static-site-generator

Last synced: 12 Jun 2025

https://github.com/idealchain/dhtcrawler-cluster

BitTorrent DHT crawling cluster

cluster crawler dht docker-images torrent

Last synced: 27 Sep 2025

https://github.com/hypervapor/bilibili-crawler

Backend application for crawling Bilibili video information based on a list of keywords.

bilibili crawler express nodejs

Last synced: 14 Apr 2025

https://github.com/luckyzxl2016/go-spider

concurrent crawler golang spider

Last synced: 29 Jul 2025

https://github.com/mrrfv/webarchive

Crawls websites and saves found URLs to a file.

archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping

Last synced: 18 Mar 2025

https://github.com/antosser/web-crawler

Rust Web Crawler that finds every page, image, and script on a website (and downloads it)

crawler html rust seo web

Last synced: 04 Sep 2025

https://github.com/moehmeni/ezweb

Easy to use web page analyzer

analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www

Last synced: 06 Apr 2025

https://github.com/marshhu/ma-tools

A Go tool package

crawler go golang htmltopdf mapping

Last synced: 23 Jan 2026

https://github.com/ivangrana/minerador-noticias-labsc

News scraper based on keywords, using the BeautifulSoup library in Python

crawler python

Last synced: 17 Oct 2025

https://github.com/dist1ll/hltv-rust

A client to fetch and parse data from HLTV.org

api crawler hltv parser rust

Last synced: 03 Oct 2025

https://github.com/rvegas/dota_crawler

Crawler for dotapedia. Fills a MongoDB database and a PostgreSQL database with game data.

crawler dota dota2 flask mongodb postgresql python3 regex scrapy

Last synced: 05 Sep 2025

https://github.com/lucasboscatti/mercado-livre-crawler

A beginner data engineering project that scrapes offers from https://www.mercadolivre.com.br/ofertas, stores them in a PostgreSQL database, and analyzes the scraped data.

crawler docker docker-compose heroku mercado-livre postgresql python scrapy sqlalchemy

Last synced: 06 Mar 2025

https://github.com/brucewind/fear-and-greed-index-alarm

A notification alert that signals when the CNN Fear and Greed Index is out of range.

crawler fear-and-greed fear-greed-index investment sctock stock-market us-stock-market

Last synced: 21 Jul 2025

https://github.com/AmirAref/Torobot

An inline Telegram bot for easily accessing and searching torob.com products from Telegram.

crawler python python-telegram-bot scraper telegtam-bot

Last synced: 13 Jul 2025

https://github.com/dylanhogg/legaldata

Provides access to Australian legal data

crawler data law lawtech legal legaltech

Last synced: 21 Jul 2025

https://github.com/akiosarkiz/manga-collector

The manga collector is a library designed to easily scrape manga content from various websites. This package is licensed under the MIT License and is fully covered by tests.

api crawler manga scraper

Last synced: 10 Jul 2025

https://github.com/xlisp/ai-auto-crawler

ai-auto-crawler: puppeteer + autogen

autogen crawler gpt puppeteer

Last synced: 31 Aug 2025

https://github.com/leo9960/bilibili_live_danmu_crawler

Bullet-comment (danmu) scraping for Bilibili live streams

bilibili crawler danmu live

Last synced: 23 Apr 2025

https://github.com/systemfsoftware/youtube-autocomplete-scraper

YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.

actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api

Last synced: 25 Jun 2025

https://github.com/markelog/map

Simple sitemap generator; supports a couple of reporters, depth levels, etc.

crawler map sitemap spider

Last synced: 11 Apr 2025

https://github.com/52cik/creeper

Simple crawler engine (Creeper)

crawler node-crawler

Last synced: 22 Apr 2025

https://github.com/beingvirus/jobminer

JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.

automation beautifulsoup career crawler data-collection data-mining hacktoberfest hacktoberfest-accepted hacktoberfest2025 job-scraper jobs open-source python selenium web-scraping

Last synced: 10 Oct 2025

https://github.com/arshadkazmi42/blc

Broken link checker

blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python

Last synced: 30 Oct 2025

https://github.com/frectonz/rampilo

A Telegram crawler

crawler rust telegram telegram-crawler

Last synced: 07 Sep 2025

https://github.com/btlmd/thuhole_crawler

A crawler to save holes from the now-defunct thuhole

crawler

Last synced: 16 Jun 2025

https://github.com/mashukui/dy_trans_tool

crawler douyin douyin-api gui gui-application python3

Last synced: 04 Apr 2026

https://github.com/hctilg/taaghche-dl

Save books purchased from taaghche.com!

crawler downloader pillow-library python3 selenium taaghche

Last synced: 12 May 2025

https://github.com/thaddeusjiang/campcat

Bot that monitors campsite reservation information

bot crawler telegram

Last synced: 01 Aug 2025

https://github.com/basemax/googleplaydatabasemirror

A PHP crawler script that updates a mirror database from Google Play.

crawl crawl-pages crawler crawlers crawling database database-schema google-play mysql php

Last synced: 24 Sep 2025

https://github.com/simsso/vision-based-page-rank-estimation

Student research project on PageRank estimation with deep graph networks

cnn crawler deep-learning graph-networks page-rank student-research-project

Last synced: 24 Apr 2025

https://github.com/twtrubiks/pttcrawlercontent

PTT crawler content in Python (PTT article crawler)

crawler gossiping ptt python

Last synced: 15 Apr 2025

https://github.com/doroudi/imdb-crawler

imdb.com movie crawler in Scrapy

crawler data-mining python scrapy

Last synced: 22 Jun 2025

https://github.com/bitscoper/bitscoper_cyber_toolbox

A Flutter application consisting of TCP Port Scanner, Route Tracer, Pinger, File Hash Calculator, String Hash Calculator, Base Encoder, Morse Code Translator, Open Graph Protocol Data Extractor, Series URI Crawler, DNS Record Retriever, and WHOIS Retriever.

android calculator crawler cybersecurity dart decoder docker encoder extractor flutter github-action ios mac retriever scanner tracer translator web windows

Last synced: 31 Jul 2025

https://github.com/bingxyz/tg-earthquake-warning

Telegram broadcast channel for Taiwan earthquake reports

bash crawler telegram-bot-api

Last synced: 11 Jul 2025

https://github.com/giscafer/ziroom-crawler

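Ziroom Youjia rental crawler: scrapes listings and monitors their status, with the goal of grabbing rooms as soon as they become available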

crawler nodejs

Last synced: 28 Apr 2025

https://github.com/labic/ze-the-scraper

brazil crawler mongodb news newspaper portals scraper

Last synced: 09 Jul 2025

https://github.com/ajcerejeira/base.gov.pt

A crawler that fetches data from base.gov.pt

crawler csv python scrapy

Last synced: 14 Jul 2025

https://github.com/prdx23/async-crawler

A recursive async crawler which creates a graph of connected webpages

async crawler python3

Last synced: 17 Jan 2026

https://github.com/busterc/crwlr

🕷 A minimal Puppeteer crawler API

crawl crawler crawling puppeteer spider walker

Last synced: 23 Apr 2025

https://github.com/poyea/coronaflight-hkg

😷 Crawler and history manager for dangerous, coronavirus-infected flights to Hong Kong (VHHH)

corona coronaflight-hkg coronavirus coronavirus-analysis coronavirus-info coronavirus-tracker coronavirus-tracking crawl crawler crawlers crawling hacktoberfest hong-kong hongkong javascript json json-api node node-js nodejs

Last synced: 24 Mar 2025

https://github.com/jonasgeiler/Iconmonstr-API

An unofficial API to access icons from iconmonstr.com

api collection collections crawler eps font icon icon-font iconmonstr iconmonstr-api icons image images png psd scraper svg unofficial vector vector-graphics

Last synced: 10 Mar 2025

https://github.com/dynesshely/everydaynews

A repository that fetches news and information, then stores and organizes it.

crawler data fetcher network news

Last synced: 22 Feb 2026

https://github.com/exp-codes/jzone-crawler

QQ Zone (Qzone) crawler (Java version)

crawler programming

Last synced: 15 Jun 2025

https://github.com/jean-baptiste-camps/iiif-crawler

Interrogate IIIF servers and get images of manuscripts

crawler iiif iiif-image manuscripts

Last synced: 29 Oct 2025

https://github.com/gabfl/sitecrawl

Simple Python module to crawl a website and extract URLs

crawl crawler crawler-python crawling-sites

Last synced: 10 Apr 2025

https://github.com/amirzenoozi/persian-news-crawler

Simple script to crawl data from Persian news agencies, including Fars and Mehr.

cli crawler database fars-news farsi-datasets kaggle-dataset mehr-news news news-agencies newspaper python python3 script shargh-news sqlite3 tensorflow tensorflow2

Last synced: 13 Apr 2025

https://github.com/dotenorio/freeloader-of-data

A simple crawler or scraper to get open graph and other meta data from any website.

crawler graph hacktoberfest meta-data open-graph scraper

Last synced: 13 Mar 2025

https://github.com/AmirAref/DivarCrawler

A script to crawl divar.ir and extract phone numbers

crawler scraper selenium

Last synced: 13 Jul 2025

https://github.com/meysam81/scry

Your website has problems you can't see. Scry finds them. Crawl your entire website across SEO, security, performance, and accessibility. No browser, no subscription.

accessibility cli command-line-tool crawler devops golang hreflang lighthouse link-checker pagespeed sarif security-headers seo seo-tools site-audit structured-data technical-seo web-performance web-security website-audit

Last synced: 14 Jun 2026

https://github.com/eric2788/platformscrawler

Multi-platform crawler with modular management, used to collect data and publish it via Redis pub/sub

bilibili crawler crawling pubsub redis twitter youtube

Last synced: 27 Oct 2025

https://github.com/oldkingcone/pbandj

Pastebin crawler that crawls the URL https://pastebin.com/archive

crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver

Last synced: 26 Sep 2025

https://github.com/mcstreetguy/crawler

An advanced web-crawler written in PHP.

composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler

Last synced: 09 Apr 2025

https://github.com/integralist/go-web-crawler

A web crawler built in the Go programming language

concurrency crawler go golang web-crawler

Last synced: 26 Oct 2025

https://github.com/lon9/arxiv-crawler

Crawler for arxiv.org

arxiv crawler golang

Last synced: 24 Jul 2025

https://github.com/surelle-ha/dogma

Dogma is a CLI tool that enables interaction with the GitHub API for the purpose of searching .env files with specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.

cli crawler github nodejs

Last synced: 22 Jun 2025