Crawler | Ecosyste.ms: Awesome

https://github.com/gnuns/raspa

data mining stuff

crawler robot scraper web-scraper web-scraping web-spider

Last synced: 06 Jul 2025

https://github.com/muhfalihr/pyxdtelebot

PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.

crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x

Last synced: 06 Apr 2025

https://github.com/uranusx86/dcard-crawler-analyzer

Get Dcard & Meteor forum content and analyze it!

crawl crawler dcard nlp python word-cloud word-count word-frequency

Last synced: 14 Jul 2025

https://github.com/tibiasolutions/sharp-parser

Tibia.com information parser in C#

crawler nuget parsed-data tibia tibia-parser

Last synced: 17 May 2026

https://github.com/matheusfelipeog/google-doodles

Map and download Google Doodles.

crawler google google-doodle python web-scraping

Last synced: 13 Jul 2025

https://github.com/panakour/pkscraper

Extract structured data from the web

crawler crawling scraper scraping scraping-websites webcrawler

Last synced: 19 Feb 2026

https://github.com/jimmy-ly00/dhe-prime-grabber

Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.

certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3

Last synced: 26 Dec 2025

https://github.com/elektrostudios/bt4g-torrent-magnet-scraper

Scrapes BT4G magnet links using configurable search and filtering rules.

bt4g command-line console-applications crawler dotnet magnet magnet-link scraper scraping searchengine torrent torrents vbnet web-crawler web-spider webcrawler webspider windows windows-10 windows-app

Last synced: 24 Jun 2026

https://github.com/pjullrich/link-crawler

Python crawler that reports broken links on a given website and its sub-pages

asyncio breadth-first-search broken-links crawler python

Last synced: 11 Jul 2025

https://github.com/beomi/pycon2017

PyCon 2017 presentation materials: <Web Crawlers from the Ground Up>

crawler pyconkr python

Last synced: 10 Jan 2026

https://github.com/sreejoy/crawlerfriend

A lightweight crawler that returns search results as HTML or as a dictionary, given URLs and keywords.

crawler python-crawler python-scraper python27 scrapper

Last synced: 12 Jun 2025

https://github.com/oglinuk/goccer

Go Concurrent Crawler Library

concurrency crawler go library

Last synced: 06 Jul 2025

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 16 May 2026

https://github.com/henkman/crawlers

:squirrel: some crawlers and downloaders

crawler

Last synced: 28 May 2026

https://github.com/davidkhala/ml

classic AI index

crawler

Last synced: 17 Jan 2026

https://github.com/sinipelto/repo-license-crawler

Collects and summarizes license information on Python and NPM packages into output files.

crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3

Last synced: 09 May 2026

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 28 Jul 2025

https://github.com/orsinium-labs/gpcc

Python library and CLI tool to fetch information from GPC Browser (https://gpc-browser.gs1.org/)

crawler gpc gs1

Last synced: 19 Jun 2025

https://github.com/mindfiredigital/deepscanbot

It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.

bot crawl crawler go golang google webcrawler

Last synced: 10 Aug 2025

https://github.com/sudolife/shopify

An easy-to-use crawler to keep track of reviews of an app on Shopify.

crawler go golang shopify

Last synced: 16 May 2026

https://github.com/mlibre/clean-web-scraper

A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖

ai artificial-intelligence clean crawler data-preprocessing dataset fine-tuning llm recursive-crawling scraper training

Last synced: 17 Mar 2025

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler built on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 14 Feb 2026

https://github.com/lencx/hero-crawler

⚔️ Hero Info(King Of Glory)

crawler hero

Last synced: 01 Jul 2025

https://github.com/akashrajpurohit/node-crawler

Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs on that domain

crawler node-crawler nodejs url

Last synced: 27 Apr 2026

https://github.com/orafaelfragoso/itunes-crawler

Retrieves information about an artist by crawling the iTunes API and iTunes Page

api crawler itunes itunes-api

Last synced: 31 Jul 2025

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 16 Apr 2026

https://github.com/andmerk93/scrapy_parser_pep

A learning project built with Scrapy; it parses PEPs and outputs the results in two formats

crawler scrapy

Last synced: 17 Mar 2025

https://github.com/sachin21/dmm-crawler

Fetches DMM.R18 data with a crawler. Currently, all art for dojin and eroge is crawlable.

crawler dmm dojin doujin gem ruby

Last synced: 12 Sep 2025

https://github.com/dangdungcntt/crawl-fb-v2

Simple script to detect email addresses and phone numbers in Facebook comments.

crawler facebook

Last synced: 26 Apr 2026

https://github.com/jfcherng/wiki-cgroup-crawler

This script fetches Wikipedia's public conversion group (CGroup) dictionaries and saves the results to external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 03 Oct 2025

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 10 Jul 2025

https://github.com/godbout/htmlpagedom

jQuery-inspired DOM manipulation extension for Symfony's Crawler

crawler dom html htmlpagedom php symfony

Last synced: 14 Jan 2026

https://github.com/ccrashzer0/web_crawler

A Python-based web crawler

crawler internet python python3 webcrawler

Last synced: 22 Mar 2025

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 10 Jul 2025

https://github.com/turtiesocks/zendriver-rs

Async-first, undetectable browser automation in Rust via the Chrome DevTools Protocol. Stealth-by-default port of zendriver — no WebDriver, no JS shim.

anti-detection async automation bot browser-automation cdp chrome-devtools-protocol chromium cloudflare-bypass crawler headless-chrome playwright-alternative rust scraping stealth tokio undetectable-chromedriver web-scraping web-testing zendriver

Last synced: 13 Jun 2026

https://github.com/homuchen/instagram-crawler

Instagram crawler

crawler instagram nodejs-crawler

Last synced: 24 Mar 2025

https://github.com/machu-gwu/crawlib-project

tool set for crawler project.

crawler framework mongodb python scrapy

Last synced: 20 Sep 2025

https://github.com/scrwdrv/siege-crawler

This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs, until the server crashes (or the benchmark ends). It is used to test a server's maximum capacity or to find glitches that users might encounter.

benchmark cli crawler ddos debug siege tool

Last synced: 05 Apr 2025

https://github.com/meilisearch/actions

Meilisearch GitHub Actions

crawler meilisearch

Last synced: 26 Jun 2025

https://github.com/gozeon/weibo-crawler

Weibo crawler

crawler web-crawler

Last synced: 21 Mar 2025

https://github.com/maxgio92/package-crawler

A package crawler for most known Linux distros

crawler go linux package

Last synced: 20 Apr 2026

https://github.com/eea/eea-crawler

EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).

airflow-dags crawler elasticsearch etl-pipeline indexing

Last synced: 20 May 2026

https://github.com/greatdrake/contributecounter

Crawl Wikipedia for contributors

crawler python scraping

Last synced: 02 Apr 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some Simple Information About Aparat Videos for Data Scientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 17 May 2026

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Jul 2025

https://github.com/hamidrabedi/digikala-crawler

A crawler for Digikala built with the Django framework, Selenium, and a REST API. It also scrapes data from the gathered URLs.

crawler digikala digikala-crawler django python scraper

Last synced: 16 May 2026

https://github.com/arshadkazmi42/gh-crawl

Crawler for GitHub repositories. Finds all the broken links in the repositories.

bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python

Last synced: 20 Jan 2026

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 28 Mar 2025

https://github.com/greycloudss/greave

Greave is a fast, multi-mode scanner for locating sensitive information in both local filesystems and Confluence pages.

armourer confluence crawler python reconnaissance security

Last synced: 07 Oct 2025

https://github.com/victorpre/erlich

Erlich Bachman - Hacker Hostel

chatbot crawler elixir housing umbrella

Last synced: 28 Mar 2025

https://github.com/rxcai/python3-weibo-crawler

A small Weibo crawler implemented in Python 3

crawler python python3 spider weibo

Last synced: 22 Mar 2025

https://github.com/jiamingla/mvdis_i18n

Multilingual-friendly version of the motorcycle driver's license exam booking site (unofficial)

crawler jquery koa koajs nodejs supertest

Last synced: 04 Jan 2026

https://github.com/devidw/google-untitled-spam-spider

A spam spider that targets 'Untitled' spam pages in Google search results.

crawler crawling crawling-algorithm crawling-python crawling-sites crawling-tool google-untitled python python3 spam spam-detection spammer untitled untitled-spam

Last synced: 28 Mar 2025

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 09 Apr 2025

https://github.com/shiritai/wallpaper_master

My first individual project!

crawler file-explorer javafx-application maven-shade mini-system wallpaper wallpaper-master

Last synced: 16 May 2026

https://github.com/dhsagaryt/multisearch

Search efficiently across different platforms with ease. Type your query and choose from multiple search engines, streamlining your experience.

browser crawler internet search search-algorithm search-engine searchbar searchengine webcrawler

Last synced: 14 Feb 2026

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 15 Jun 2026

https://github.com/exp-codes/pyzone-crawler

Qzone (QQ空间) crawler (Python version)

crawler programming

Last synced: 03 Apr 2025

https://github.com/willi-dev/dtcapp

dtcapp: distributed Twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 18 Sep 2025

https://github.com/konradlinkowski/wikipediafinder

Find words in a wiki page

crawler scraper wikipedia

Last synced: 17 Sep 2025

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT posts from the command line.

crawler ptt python

Last synced: 08 Aug 2025

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 08 Oct 2025

https://github.com/basemax/rondircrawler

A crawler for extracting a list of top sim cards and tel numbers from the Rond.ir website. (PHP)

crawle-php crawler crawler-testing crawlers crawlers-php php php-crawler rondir

Last synced: 03 Apr 2025

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 27 Feb 2025

https://github.com/udaykiran2017/seo-reports

📊 Generate and analyze SEO reports effortlessly to enhance your website's visibility and performance across search engines.

audit broken-links cli crawler extraction google-lighthouse hreflang-checker hreflang-matrix puppeteer scan-website searchengineoptimization seo seo-macroscope seo-manager seo-meta seo-optimization web-scraping webmaster

Last synced: 16 May 2026

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 27 Mar 2025

https://github.com/gesugao-san/pcgw-crawler

Digital assistant for working hard on PCGW.

bad-code bad-coding-style crawler javascript js nodejs pcgamingwiki pcgw shitty spaghetti-code

Last synced: 12 Apr 2026

https://github.com/abdus/scrape-web

A simple web scraper for Node.js

crawler web-scraping web-scrapper

Last synced: 25 Mar 2025

https://github.com/developerjosh/gogo-crawler

The tool kit for making an anime website with a database full of anime

crawler crawler-js gogoanime gogoanime-api gogoanime-scraper

Last synced: 07 Aug 2025

https://github.com/thiagopanini/datadelivery

An open-source Terraform module that provides a complete infrastructure toolkit so that users can start exploring Analytics services on AWS.

analytics athena aws catalog crawler data datamesh glue s3 terraform

Last synced: 29 Nov 2025

https://github.com/vinzdef/apartment-crawler

Crawler for housing in the Netherlands. It scrapes FB groups and Kamernet listings

amsterdam crawler fb-groups housing phantomjs

Last synced: 31 Mar 2025

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 25 Jul 2025

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 04 May 2026

https://github.com/yordadev/fenrisjs

A Node.js application that scrapes all links from a given input and writes the results to one of two files, external or internal, for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 10 May 2026

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 14 Apr 2026

https://github.com/baerwang/sec_craw

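A crawler that makes it easy for security researchers to get daily security news. It currently covers 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and other lab blogs, and is updated continuously.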

crawler security security-tools threat threat-intelligence

Last synced: 04 Jul 2025

https://github.com/zabuzard/wslotter

WSlotter is a Selenium-driven tool for signing up for events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 10 Oct 2025

https://github.com/fnkr/gocrawl

Simple web crawler.

crawler http-client

Last synced: 23 Mar 2025

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 08 May 2026

https://github.com/nirjharlo/complete-google-seo-scan

WordPress Plugin with inbuilt SEO crawler

crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin

Last synced: 12 Oct 2025

https://github.com/trixsec/zeuscrawler

The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.

crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper

Last synced: 07 Apr 2025

https://github.com/basemax/doostihaacrawler

A PHP-implemented crawler for Doostihaa.com. (Database of thousands of movies)

crawler crawler-example crawler-php crawler-testing crawlers database-movie database-movies doostihaa doostihaa-com movie movie-database movie-database-api movie-database-website movie-db movies movies-database php php-crawler

Last synced: 13 Sep 2025

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawl Wikipedia with Python, powered by BeautifulSoup4; supports GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 05 Sep 2025

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 14 Mar 2025

https://github.com/mdazlaanzubair/amazon-scraper-api

A web scraper that crawls Amazon to extract product information and return it in JSON format.

amazon crawler expressjs json-api nodejs webscraping

Last synced: 14 Apr 2026

https://github.com/dineshsprabu/concurrent-web-crawler

Flexible and concurrent web crawler implemented in 'go'

concurrent-web-crawler crawler go-crawler spider web-crawler

Last synced: 12 Jan 2026

https://github.com/bac0id/wayback-machine-auto-save

A crawler that submits a list of web pages to Save Page Now on the Internet Archive's Wayback Machine.

crawler internet-archive python save-page-now wayback-machine

Last synced: 28 May 2026

https://github.com/roswelly/solana-transaction-crawler

Crawl & parse Solana transactions

crawler parser rust solana transaction

Last synced: 15 May 2026

https://github.com/im-perativa/public_crawler

A collection of crawler projects for Indonesian datasets

crawler indonesia indonesia-api scrapy

Last synced: 20 Mar 2025

https://github.com/phanikmr/linkcrawler

A LinkCrawler is a Python module that takes a URL on the web (e.g. http://python.org), fetches the web page at that URL, and parses all the links on that page into a repository of links. Next, it fetches the content of any URL from the repository just created, parses the links from this new content into the repository, and continues this process for all links in the repository until it is stopped or until a given number of links have been fetched.

async crawler linkcrawler parse python scrapy spider