Crawler | Ecosyste.ms: Awesome

https://github.com/jjpaulo2/crawler-financeiro

Módulo em Python que extrai dados públicos de planos de previdência do portal da SUSEP.

crawler docker ocr python selenium tesseract

Last synced: 11 Jul 2025

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 07 Jun 2026

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 11 Jul 2025

https://github.com/blarc/windsurf-crawler

A simple crawler that collects windsurf boards offers from different sites.

crawler windsurf

Last synced: 10 Sep 2025

https://github.com/sc0vu/gocrawl

Simple crawl for golang

crawler golang

Last synced: 23 Jul 2025

https://github.com/claudio-code/nap-web-crawler

Created It crawler to find broken links in docs of framework and languages

crawler

Last synced: 07 Jul 2025

https://github.com/jregistr/laker-parser

Small program to scrape and sanitize scheduling data.

crawler gradle htmlunit lakers oswego scraping suny

Last synced: 16 May 2026

https://github.com/mattmoony/webcrawler.py

A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 29 Apr 2026

https://github.com/berecat/selenium_facebook_scraper

A simple python3 script used to download a users's friend list from facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 24 Jul 2025

https://github.com/microlinkhq/cloudflare-bot-directory

CloudFlare Radar verified bots directory – 500+ web crawlers and user agents as JSON.

bot-detection bots cloudflare cloudflare-radar crawler crawlers dataset datasets googlebot user-agent user-agents user-agents- verified-bots web-crawler web-scraping

Last synced: 20 Apr 2026

https://github.com/juangesino/ah-bonus-crawler

React + Express application that crawls Albert Heijn's promotions.

crawler crawling express expressjs headless-chrome nodejs react reactjs

Last synced: 06 May 2026

https://github.com/basemax/okala-database-crawler

A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.

crawler crawler-php curl data json okala okala-com okalacom php php-crawler scraper

Last synced: 01 May 2026

https://github.com/aminehsan/datamining-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scraping

Last synced: 25 Jul 2025

https://github.com/pjt3591oo/python-parse

this are modules for url pasing

crawler

Last synced: 04 Aug 2025

https://github.com/victorbaumgartner/electron_app

Testing electron app for macOS

crawl4ai crawler electron mac multithreading python3 scraping sitemaps

Last synced: 06 May 2026

https://github.com/prorobot-ai/worker

A concurrent web worker written in Go (Golang) designed to crawl websites efficiently while respecting basic crawling policies. The worker stops automatically after crawling a specified number of links (default: 64).

crawler golang grpc-server scraper

Last synced: 29 Jul 2025

https://github.com/eneax/web-crawler

A web crawler built in Node.js

crawler javascript nodejs web-crawler

Last synced: 15 Apr 2026

https://github.com/sauerbraten/monzter

Link crawler with configurable maximum depth and rate limiting

crawler go golang web-crawler

Last synced: 23 May 2026

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 12 Jun 2026

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 03 Jan 2026

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 30 Jul 2025

https://github.com/izh318/genie-music-artist-album-crawler

지니뮤직에 등록 되어 있는 특정 아티스트의 앨범 정보를 한 번에 크롤링 하는 Python Script 입니다.

crawler genie genie-music gui

Last synced: 08 Nov 2025

https://github.com/shashankgroovy/crawler

Python crawler

crawler python webcrawler

Last synced: 30 Jul 2025

https://github.com/cristiangreco/gcrawler

A simple (not concurrent) web crawler written in Java.

crawler java

Last synced: 30 Jul 2025

https://github.com/imrany/spindle

An open-source, lightweight web crawler and scraper. It can discover links on the web (crawler) and extract structured data from webpages (scraper).

crawler go golang scraper

Last synced: 24 Sep 2025

https://github.com/dappros/site_crawler

Site crawler used in Ethora platform as an option to import your specific business data into your AI agent chat bot.

crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing

Last synced: 20 Jan 2026

https://github.com/Mahdijamebozorg/CryptoFundamentalAnalyzer

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 25 Sep 2025

https://github.com/zenoyang/webcrawler

一些爬虫代码

crawler scrapy spider web-crawler

Last synced: 02 Aug 2025

https://github.com/is0383kk/ai-docs-sync-workflow

claude claude-code crawler github-actions python python-script workflow

Last synced: 16 May 2026

https://github.com/juliocesarscheidt/stock-trader

aws-alb aws-ecs aws-xray crawler flask github-actions mongodb python rabbitmq terraform

Last synced: 09 Apr 2026

https://github.com/tom-draper/wiki-crawl

A game of path finding through Wikipedia topics.

api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search

Last synced: 09 Mar 2026

https://github.com/seart-group/github-keyword-crawler

A simple and easy-to-deploy script for mining mentions of keywords across various :octocat: API endpoints

api-mining crawler dockerized github-api miner mongodb-database python-script

Last synced: 04 Aug 2025

https://github.com/whateverzpy/douban_comments

HITSZ 2025 秋季的大数据导论课程作业内容

bigdata crawler scrapy

Last synced: 01 Oct 2025

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 07 Aug 2025

https://github.com/pixlcrashr/stwhh-mensa

Better STWHH Mensa menu data / interface / notifier

api crawler data food studierendenwerk-hamburg university website

Last synced: 07 Aug 2025

https://github.com/alonecandies/golwarc

All-in-One crawlers for Golang

crawler crawling go golang scraper scraping

Last synced: 12 Jan 2026

https://github.com/dyslab/otglite

Online TXT Grabee Lite Edition :bee:

crawler expressjs jquery nodejs sqlite3

Last synced: 09 Apr 2026

https://github.com/anshiii/pixder

🤔 A spider for pixiv.net

crawler pixiv spider

Last synced: 09 Aug 2025

https://github.com/iamkushvanth/real-time-data-analysis-using-kafka

In this project, you will execute an End-To-End Data Engineering Project on Real-Time Stock Market Data using Kafka. We are going to use different technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.

athena aws aws-ec2 aws-s3 crawler glue kafka kafka-consumer python sql

Last synced: 18 Jun 2026

https://github.com/clumsyme/ziroom_watcher

crawler email python ziroom

Last synced: 12 Apr 2025

https://github.com/dylancl/sitemap-crawler

Verify the status of each url in a (hosted) sitemap XML file.

crawler parser scraper sitemap xml

Last synced: 04 Oct 2025

https://github.com/jul10l1r4/objetive

This software is a mini-crawler that aims to grab some text parts from some website or ip that responds http*

bigdata crawler data-science security-tools web

Last synced: 12 Aug 2025

https://github.com/casoon/astro-crawler-policy

Policy-first crawler control for Astro — generates robots.txt and llms.txt with presets, per-bot rules, AI crawler registry, and build-time audits.

ai-crawler astro astro-integration crawler llms-txt robots-txt seo typescript

Last synced: 24 May 2026

https://github.com/uinaf/lincrawl

Local-first Linear work-graph archive CLI

age-encryption archive cli crawler crawlkit linear sqlite

Last synced: 24 May 2026

https://github.com/billy0402/tibame-python-data-analysis

A learning project from TibaMe Python data analysis course.

ai course crawler jupyter-notebook matplotlib pandas python requests

Last synced: 10 Apr 2026

https://github.com/hong539/ip_lookup

For ip_lookup with some Public or Private API

crawler ipv4 ipwhois python

Last synced: 19 Aug 2025

https://github.com/luickk/vulnerability-crawler

Small python program meant to analyze random sites found on google for any vulnerabilities!

crawler xss

Last synced: 20 Aug 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into mongodb database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 10 Apr 2026

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 22 Aug 2025

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 07 May 2026

https://github.com/leegeunhyeok/python-gongucrawler

파이썬3 공유마당 이미지 및 상세정보 크롤러

crawler python

Last synced: 24 Aug 2025

https://github.com/hoosnick/olx-parser

OLX Real Estate Parser

crawler olx

Last synced: 25 Aug 2025

https://github.com/kahsolt/qzone_mood_dumper

Dump your qzone mood(说说) history to local SQL database storage

crawler dumper qzone-mood

Last synced: 25 Aug 2025

https://github.com/ferru97/jsketchfabcrawler

jSketchfabCrawler is a java for the automatic crawling of model's information from sketchfab.com

crawler data database java sketchfab sql

Last synced: 03 Jan 2026

https://github.com/orkan/tlc

Simple PHP/cURL/FlareSolverr framework with Logger, Cache and more!

crawler curl flaresolverr net scrap

Last synced: 27 Aug 2025

https://github.com/ekojs/web-crawler

Web Crawler untuk mengambil judul penelitian pada Google Scholar

crawler nodejs web-crawler

Last synced: 12 Apr 2026

https://github.com/jmousqueton/check-broken-link

Multi-threaded Python tool for crawling and checking all internal links on a website, with live Rich dashboard, broken link export (CSV), and detailed source tracking.

check crawler error400 error404 error500 links

Last synced: 29 Aug 2025

https://github.com/orshahar91/crawler

Simple Web Crawler

crawler crawling-websites image-crawler java servlets webcrawler

Last synced: 11 Nov 2025

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Aug 2025

https://github.com/diegojromerolopez/relwrac

A basic crawler developed with python and asyncio

asyncio crawler page-rank python

Last synced: 11 Nov 2025

https://github.com/chenbingwei1201/threads_scraper

A Python package for scraping Threads posts.

chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites

Last synced: 03 Feb 2026

https://github.com/fusetim/bitcrawler

Small experiments to learn a bit more about BitTorrent, DHT and etc. Might also be a BitTorrent DHT crawler one day?

bittorrent crawler dht

Last synced: 30 Mar 2025

https://github.com/tech-espm/misc-webbot

This project is aimed on creating personal assistants for replying messages about specifics issues.

classification-model crawler nlp

Last synced: 12 Jun 2026

https://github.com/alphabs/navercafeclient

네이버 카페 글 목록 크롤링을 위한 닷넷 라이브러리

crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping

Last synced: 06 May 2026

https://github.com/govau/warcraider

Convert WARC files into Avro for big data processing

avro bigquery crawler rust warc

Last synced: 16 May 2026

https://github.com/jiusanzhou/reaper

Distributed Elegant Scraper and Crawler Framework for Rust.

crawler data-scraping rust scraper spider

Last synced: 24 Jul 2025

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 01 May 2026

https://github.com/burakkaygusuz/web-security-scanner

A Java-based web security browser, it detects common web vulnerabilities such as SQL Injection, XSS and sensitive information disclosure.

crawler java vulnerability-scanner web-security xss

Last synced: 16 May 2026

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is software development kit for DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 11 Jul 2025

https://github.com/fengzixu/crawlinganything

如果你对数据有兴趣，那么就应该立即行动起来

crawler python

Last synced: 15 Jun 2026

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 18 Jun 2026

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 09 Aug 2025

https://github.com/jonesrussell/pipelinex

Firecrawl-style web intelligence pipeline powered by North Cloud

crawler pipeline vue

Last synced: 09 Mar 2026

https://github.com/moparisthebest/nginx-limit-crawlers

rate limit crawlers in nginx

ai crawler nginx

Last synced: 14 Mar 2025

https://github.com/engineer2b/cure_crawl

Cure afvalbeheer kalender crawler

afval afvalwijzer browser crawler kalender

Last synced: 22 Oct 2025

https://github.com/patrickschababerle/schabbi-webscraper

Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.

crawler puppeteer scraper scraping web-crawler

Last synced: 04 Apr 2025

https://github.com/crosscutsaw/iscsicrawler

iscsicrawler is a bash script that crawls files in the iscsi targets with ease.

crawler iscsi iscsi-target iscsiadm

Last synced: 16 Jan 2026

https://github.com/alphadev3296/scrap-www.floridabar.org

automation crawler csv playwriht python scraper selenium xlsx

Last synced: 26 Dec 2025

https://github.com/tetreum/xupopter_chrome_extension

Extension to easily create crawling recipes

crawler scrapper scrapping webscraper

Last synced: 04 Apr 2025

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 10 Jun 2026

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 22 May 2026

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 29 May 2026

https://github.com/abx123/crawler

Simple lambda function to crawl daily web novel updates.

crawler firebase-database golang lambda-functions

Last synced: 28 Mar 2025

https://github.com/insectmk/douban-crawler

豆瓣电影Top250爬虫及数据展示

analysis crawler django echarts mysql python3 website

Last synced: 10 Mar 2026

https://github.com/leshniak/robotstxt-debug

A tool for debugging robots.txt

crawler debugger indexing robots-txt seo seo-optimization seo-tools tester

Last synced: 25 Jun 2025

https://github.com/ghsaboias/alpha-agent

An intelligent web research assistant that combines web crawling, search functionality, and AI-powered analysis using Anthropic's Claude API.

ai claude crawler search web

Last synced: 14 Mar 2025

https://github.com/rabattkarte/free-domain-scanner

crawler dns domain domain-name domain-names go golang scanner whois

Last synced: 26 May 2026

https://github.com/sanskar107/c-subject-predictor

Predicts topic of a code.

crawler nlp rnn

Last synced: 14 Mar 2025

https://github.com/pmuens/crawler

Multi-threaded Web crawler with support for custom fetching and persisting logic

crawler crawler-engine rust rust-lang web-crawler web-crawling

Last synced: 15 May 2025

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 15 Jul 2025

https://github.com/agucova/needs-seeding

🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.

crawler sci-hub torrents

Last synced: 12 Oct 2025

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 16 May 2026

https://github.com/mohammadreza-mohammadi94/python-webscraper-projects

A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.

bs4 crawler object-oriented-programming python requests scrapy webscraping

Last synced: 13 Jul 2025

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 20 Mar 2025

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 23 Mar 2025

https://github.com/lillyschramm/spiegel.de-miner

A bot that automatically saves any posts created at Spiegel.de

crawler spiegel-online

Last synced: 01 Sep 2025

https://github.com/gnehs/twse-financial-ratios-crawler

透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊，並自動計算平均

crawler nodejs

Last synced: 29 Apr 2026

https://github.com/jayzhan211/python-crawler-startups

python crawler learning

crawler python

Last synced: 20 Mar 2025

https://github.com/phanletrunghieu/webcrawler

A web crawler with Spring MVC

crawler java servlet spring-mvc springframework

Last synced: 23 Mar 2025

https://github.com/dinofizz/sitemapper

sitemapper is a site mapping tool which provides a JSON output listing each internal URL and the internal links found at that URL. The crawl depth is configurable, as well as the mode of operation: "synchronous", "concurrent" and "concurrent limited". The tool runs stand-alone or as a distributed crawl engine running in a Kubernetes cluster.

astradb cassandra concurrency crawler go golang kubernetes nats sitemap

Last synced: 16 Jan 2026