Crawler | Ecosyste.ms: Awesome

https://github.com/bitxx/pholcus

Fixes and improvements to the Go-based henrylee2cn/pholcus crawler framework, made to meet my own needs

crawler golang pholcus

Last synced: 21 Nov 2024

https://github.com/xiongwilee/techweekly

A highly configurable tool for pushing technical weekly newsletters by email

crawler nodejs techweekly

Last synced: 08 Nov 2024

https://github.com/wwwwwydev/crawlist

A universal solution for web crawling lists

crawl crawler crawler-python python reptile

Last synced: 12 Nov 2024

https://github.com/rzo1/crawler4j

Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j

crawler crawler4j java spider web-crawler web-spider

Last synced: 29 Sep 2024

https://github.com/alanshaw/libp2p-dht-scrape-aas

🧹 A libp2p DHT scraper as a service allowing anyone to collect, consume and use to generate useful reports & visualisations.

crawler dht kademlia libp2p p2p scraper

Last synced: 31 Dec 2024

https://github.com/tokahuke/lopez

Crawling and scraping the Web for fun and profit

crawler rust scraper seo web-scraping

Last synced: 14 Nov 2024

https://github.com/capjamesg/indieweb-search

Source code for the IndieWeb search engine.

crawler indieweb search search-engine

Last synced: 16 Nov 2024

https://github.com/mechazawa/redbetter-wm2

Better.php crawler for Redacted that uses WhatManager

crawler flac redacted seedbox transcoding whatcd whatmanager

Last synced: 06 Nov 2024

https://github.com/Actomaton/ActoCrawler

🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.

crawler swift

Last synced: 29 Nov 2024

https://github.com/RuedigerVoigt/exoskeleton

A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend

crawler crawling-framework database machine-learning mariadb network python python-3 scraping

Last synced: 08 Nov 2024

https://github.com/thaoshibe/crawl-original-google-images

Python scripts for crawling original images from Google Images

chrome-extension crawler crawling crawling-python google google-images pafy scraper youtube youtube-dl youtube-search

Last synced: 11 Oct 2024

https://github.com/nvk681/gumo

A crawler that extracts data from a dynamic webpage. Written in Node.js.

crawler elasticsearch neo4j nodejs

Last synced: 11 Oct 2024

https://github.com/tokenmill/crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

crawler crawling crawling-framework elasticsearch java scraping storm storm-crawler vaadin

Last synced: 10 Nov 2024

https://github.com/yokawasa/scrapy-azuresearch-crawler-samples

Scrapy as a Web Crawler for Azure Search Samples

azure azure-search crawler python python3 scrapy search

Last synced: 30 Oct 2024

https://github.com/gruppio/slackwebhooksgithubcrawler

Search for Slack Webhooks token publicly exposed on Github

crawler crawling hack messages nodejs puppeteer slack slack-bot slack-webhook slackbot webhook

Last synced: 16 Nov 2024

https://github.com/fanhuaandluomu/qqspider

Crawls QQ user information (basic details such as QQ number, nickname, birthday, and address) and performs a brief analysis.

crawler python qq spider

Last synced: 12 Nov 2024

https://github.com/asing1001/movierater

A useful website for finding movie ratings in Chinese and English by crawling Yahoo, Ptt, and IMDb.

apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript

Last synced: 07 Nov 2024

https://github.com/norconex/collector-filesystem

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.

crawler filesystem-crawler java norconex-filesystem-collector search-engine

Last synced: 11 Nov 2024

https://github.com/capturr/scraper

All-in-one API to easily scrape data from any website, without worrying about captchas and bot detection mechanisms.

captcha cheerio crawler crawling data declarative extract growth-hacking html javascript json jsonld nodejs recaptcha scraper scraping spider typescript web web-scraping

Last synced: 06 Dec 2024

https://github.com/casprwang/sse-option-crawler

SSE 50 index options data crawler

crawler python python3 sina stock stock-market stocks

Last synced: 07 Dec 2024

https://github.com/petehouston/udemy-crawler

Crawls Udemy course info and saves it in JSON format.

crawler crawling node node-cli udemy udemy-api udemy-crawl

Last synced: 23 Oct 2024

https://github.com/fmw666/python

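🍋 Python basics, Pygame game programming, Python algorithms and interview questions, four commonly used Python web frameworks, crawlers, data visualization, and machine learning. Seven major Python topics in all!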

algorithm basis crawler files gui learning-notes markdown pygame pyqt5 python3 script web

Last synced: 18 Dec 2024

https://github.com/loomisloud/onion-crawler

Tor website crawler (specific for Alphabay at the time)

crawler onion parser python tor

Last synced: 17 Nov 2024

https://github.com/p0dalirius/crawlersuseragents

Python script to check whether an application's responses differ when the request comes from a search engine's crawler.

bugbounty crawler crawlers pentest request tool user-agent web

Last synced: 30 Dec 2024

https://github.com/xiyuan-fengyu/ppspider_example

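ppspider crawler examples: crawling Bilibili video information and comments, QQ Music information and comments, and Twitter topic comments and user information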

bilibili cheerio crawler ppspider puppeteer qq-music spider twitter

Last synced: 07 Nov 2024

https://github.com/s045pd/sharingan

We will try to find as much of your visible basic footprint on social media as possible - 😤 more sites are coming soon

asyncio crawler httpx python38 social-network

Last synced: 07 Nov 2024

https://github.com/waynechang65/ptt-crawler

ptt-crawler is a web crawler module designed to scrape data from Ptt.

crawler javascript nodejs ptt scraper scraping spider web-crawler webcrawler

Last synced: 19 Oct 2024

https://github.com/zyszys/zhengfang_system_spider

:bug: A small crawler that logs in to the Zhengfang academic affairs management system and crawls data

crawler python spider zhengfang

Last synced: 19 Nov 2024

https://github.com/fernandod1/producthunt-scraper

Scraper script for the well-known website Producthunt.com. Scrapes all offers and saves them to an Excel spreadsheet file.

crawler crawling crawling-sites data-mining datamining producthunt producthunt-api producthunt-users python python-script python3 scrape scraped-data scraper scraper-engine scraping scraping-bot scraping-python scraping-tool scraping-websites

Last synced: 12 Nov 2024

https://github.com/sigoden/rag-crawler

Crawl a website to generate a knowledge file for RAG

crawler knowledge llm rag

Last synced: 06 Dec 2024

https://github.com/chairco/2017_pycontw_talk

crawler django django-q pycontw scheduled-tasks task

Last synced: 25 Nov 2024

https://github.com/archiveteam/webarchiver

Decentralized web archiving

archiver archiving crawler decentralized python warc web webarchiving

Last synced: 19 Nov 2024

https://github.com/ArchiveTeam/WebArchiver

Decentralized web archiving

archiver archiving crawler decentralized python warc web webarchiving

Last synced: 06 Nov 2024

https://github.com/iflycn/hero

百万英雄 (Million Heroes) quiz assistant - compatible with all quiz apps

adb android crawler orc python3

Last synced: 20 Nov 2024

https://github.com/tower1229/crawler

Nodejs crawler for cnbeta.com

crawler nodejs

Last synced: 14 Oct 2024

https://github.com/mediamonks/crawler

Crawl your own website with various clients for SEO and indexing purposes.

browserkit crawler crawling php prerender prerenderio seo spider

Last synced: 03 Dec 2024

https://github.com/paambaati/websight

🕷A simple but *really* fast crawler built with Node.js & TypeScript

coding-challenge crawler interview-questions javascript monzo nodejs typescript

Last synced: 03 Dec 2024

https://github.com/wahengchang/node-dcard-scraper

An example of implementing a cheerio scraper that extracts images from Dcard

cheerio crawler dcard example javascript nodejs npm scraper tutorial

Last synced: 08 Dec 2024

https://github.com/inspirehep/hepcrawl

Scrapy project for feeds into INSPIRE-HEP

crawler harvest-data publishing python

Last synced: 12 Jan 2025

https://github.com/lupino/grapy

Grapy, a fast high-level web crawling framework for Python 3.3 or later based on asyncio.

crawler python-library python3 spider

Last synced: 21 Nov 2024

https://github.com/cristianzsh/python-hacking-tools

Python tools for ethical hacking

arp-spoofing backdoor code-injection crawler dns interceptor keylogger mac malware network packet python scanner scapy scapy-arp send-email sniffer spoofing tool tools

Last synced: 17 Nov 2024

https://github.com/twtrubiks/youtube-trends-spider

Crawls YouTube trends using Selenium with Python

crawler python selenium tutorial youtube-trends-spider

Last synced: 16 Nov 2024

https://github.com/smolijar/offensive-fortune

A script for generating a fortune cookie from the funniest and most offensive stuff collected off the Internet.

crawler fortune fortune-cookie vilejoke

Last synced: 07 Nov 2024

https://github.com/DiscovAI/DiscovAI-crawl

🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.

ai api crawler embedding vector-database web-scraping

Last synced: 06 Jan 2025

https://github.com/josecelano/my-favourite-appliances

Laravel CRUD sample

crawler crud laravel sample

Last synced: 29 Oct 2024

https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel

A toolkit for Spatie's Crawler and Laravel.

crawler laravel laravel-crawler php-crawler php-scraper spatie-crawler

Last synced: 12 Nov 2024

https://github.com/PadishahIII/SecretScraper

SecretScraper is a web scraper that crawls through target websites, scrapes HTTP responses, and extracts secret information via regular expressions.

crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper

Last synced: 04 Dec 2024

https://github.com/bkeepers/spiderman

your friendly neighborhood web crawler

crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping

Last synced: 28 Dec 2024

https://github.com/mauriceconrad/xml-parser

A Node.js XML DOM, Parser & Stringifier.

crawler crawling dom html html-parser html-parsing xml xml-parser xml-parsing xml-schema

Last synced: 28 Oct 2024

https://github.com/alinebastos/crawler

Web Crawler created with Node.js and Puppeteer

crawler fs javascript nodejs puppeteer scraping

Last synced: 05 Nov 2024

https://github.com/nothing12321/proxy-grabber

Python-based Massive Proxy Grabber. This bot grabs proxies from public websites so you can use them.

bot checker crawler grabber javascript parser proxies proxies-scraper proxy proxy-checker proxy-list proxy-parser proxy-scraper proxy-scrapper proxy-tool proxygrabber python socks socks4 socks5

Last synced: 28 Nov 2024

https://github.com/enijkamp/supermonkey

A crawler for automated Android UI testing.

ai android crawler

Last synced: 09 Nov 2024

https://github.com/omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 08 Nov 2024

https://github.com/pourmand1376/PersianCrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 20 Nov 2024

https://github.com/pourmand1376/persiancrawler

Open source crawler for Persian websites.

crawler machine-learning news python scrapy tasnim text-classification

Last synced: 11 Oct 2024

https://github.com/knovour/json-web-crawler

Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 27 Nov 2024

https://github.com/Knovour/json-web-crawler

Use JSON to list all elements (with css 3 and jquery selector) that you want to crawl.

crawler javascript jquery json web-crawler

Last synced: 16 Nov 2024

https://github.com/tim-saijun/gpt-web-crawler

A web crawler for GPTs to build knowledge bases

chatgpt crawler gpt-crawler knowledge-base

Last synced: 21 Nov 2024

https://github.com/racinmat/premium-downloader

crawler pornhub pornhub-downloader python

Last synced: 06 Nov 2024

https://github.com/neuralegion/bright-cli

Command Line Interface (CLI) tool for NeuraLegion's solutions.

api cli crawler cyber-security devops har nexploit oas secops security typescript

Last synced: 08 Jan 2025

https://github.com/lixi5338619/lxparse

A library for parsing list-page links and extracting detail-page content

crawler htmlparse python

Last synced: 05 Nov 2024

https://github.com/vignif/crawler-google-scholar

This bot crawls and downloads statistics and pictures of researchers from Google Scholar.

crawler downloading-statistics google-scholar indexes statistics

Last synced: 06 Nov 2024

https://github.com/twiny/wbot

A simple & efficient web crawler.

big-data crawler golang scraper seo spider

Last synced: 17 Dec 2024

https://github.com/ariya/penjabarberita

Extract the article list from its raw news HTML

articles cheerio crawler headlines html indexer indonesia news scraper spider

Last synced: 08 Jan 2025

https://github.com/chainski/chino-proxy-scraper

A Python script that scrapes proxies from frequently updated proxy sources.

crawler http https proxies proxy proxy-api proxygrabber proxyscrape-api proxyscraper proxytool python python3 scraper socks4 socks5

Last synced: 10 Nov 2024

https://github.com/achannarasappa/locust

Distributed web data discovery and collection framework built for serverless

aws-lambda crawler locust scraping serverless

Last synced: 18 Nov 2024

https://github.com/shadawck/recon-archy

LinkedIn tools (and maybe other sources later) to reconstruct a company hierarchy by scraping relations and job titles

automation company-data crawler cybersecurity geckodriver golang linkedin organisational-analysis osint osinttool reconnaissance scraper selenium

Last synced: 15 Nov 2024

https://github.com/cable8mm/water-melon

Water Melon is a simple melon.com API SDK for PHP

composer crawler kpop laravel melon package php

Last synced: 12 Oct 2024

https://github.com/twtrubiks/eynycrawlermega

Crawler for eyny movie Mega and Google links, using Python

crawler eyny mega python

Last synced: 16 Nov 2024

https://github.com/kasthack-labs/kasthack.osp

Generator of raw VK user dumps.

crawler crawling data-mining kasthack programmable-web vk vk-api vkapi vkontakte

Last synced: 26 Sep 2024

https://github.com/MontFerret/worker

Containerized Ferret worker

chrome crawler docker dsl ferret go hacktoberfest hacktoberfest2020 scraping scraping-websites service worker

Last synced: 04 Nov 2024

https://github.com/fooock/robots.txt

:robot: robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API

antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot

Last synced: 27 Oct 2024

https://github.com/Selbi182/SpotifyDiscoveryBot

A Java-based bot that automatically crawls for new releases by your followed artists on Spotify. Never miss a release again!

bot crawler java music spotify spring-boot springboot sqlite

Last synced: 27 Oct 2024

https://github.com/natlee/ehentai-crawler

Clone a panda yourself.

anime chrome crawler downloader ehentai ehentai-crawler exhentai python selenium

Last synced: 21 Nov 2024

https://github.com/selmi-karim/img-cli

An interactive command-line interface built in Node.js for downloading one or more images to disk from a URL

buffer crawler crawling downloader image-downloader image-downloading nodejs phantomjs webpage

Last synced: 08 Nov 2024

https://github.com/fanyong920/crawlitem

A Chrome extension for crawling Taobao and Tmall web pages

crawler javascript taobao tmall

Last synced: 27 Oct 2024

https://github.com/toannd96/crawler_web_js

Uses scrapy-splash combined with a Lua script to crawl websites that use JavaScript (websosanh)

crawler javascript lua-script scrapy scrapy-splash splash

Last synced: 18 Nov 2024

https://github.com/kirillplatonov/proxy_manager

Ruby proxy manager. A gem for easy proxy usage in parsers and web bots.

crawler parser proxy ruby

Last synced: 06 Dec 2024

https://github.com/ze3kr/wheres-my-offer

University Admission Portal Checker

crawler offer university university-admission

Last synced: 18 Dec 2024

https://github.com/omarhashem123/venom

Tool designed to crawl quickly and extract endpoints

crawler python python3 spider

Last synced: 21 Nov 2024

https://github.com/ruichongliu/Crawler_pubg.op.gg

This is a web crawler for pubg.op.gg, written by Ruichong Liu. It scrapes PUBG (PlayerUnknown's Battlegrounds) game data.

beautifulsoup4 crawler pubg python3 scrape selenium

Last synced: 29 Oct 2024

https://github.com/nanitefactory/chromebot

Run headless Chrome using Go.

automation bot chrome-devtools chromebot crawler developer-tools golang headless-browser headless-chrome testing web

Last synced: 23 Dec 2024

https://github.com/abhineetraj1/phonenumber-scraper

This will tell you which carrier your SIM belongs to. Make sure you have an internet connection before running this!

crawler phone-number-information phone-number-validation python3 scraper

Last synced: 28 Nov 2024

https://github.com/gajus/headless-crawler

A crawler implemented using a headless browser (Chrome).

chrome crawler headless puppeteer spider

Last synced: 13 Jan 2025

https://github.com/twtrubiks/crawler_click_tutorial

Click tutorial (crawler) using Python

click command-line-tool crawler python tutorial

Last synced: 16 Nov 2024

https://github.com/shavit/crawlero

Distributed web crawlers. Fault tolerance, user-agent randomizer, RabbitMQ, Tor, PostgreSQL.

crawler marketing-automation marketing-tools pbn proxy rabbitmq tor

Last synced: 23 Nov 2024

https://github.com/ikergarcia1996/questionclustering

Question classifier written in Python 3, implemented in the following video: https://youtu.be/qnlW1m6lPoY

clustering crawler deep-learning inteligencia-artificial machine-learning natural-language-processing nlp pln sentiment-analysis techonology unsupervised-machine-learning word-embeddings

Last synced: 10 Jan 2025

https://github.com/sadeghhayeri/twitter-friend-connections

Visualizing Twitter Friend Connections

crawler data gephi gephi-visualizations graph jupyter-notebook network-analysis networkx twitter twitter-api twitter-crawler visualization

Last synced: 12 Nov 2024

https://github.com/matheuscas/pynfce

Fetches and extracts data from an NFCe given its access URL.

crawler nfce python3

Last synced: 30 Dec 2024

https://github.com/sunsetmkt/bilibili-video-reply-crawler

Python crawler for fetching Bilibili video and column comments

bilibili crawler github-actions python python3 spider

Last synced: 14 Nov 2024

https://github.com/danhje/dead-link-crawler

An efficient, asynchronous crawler that identifies broken links on a given domain.

async broken-links crawler dead-links python python3

Last synced: 04 Nov 2024

https://github.com/gabrielguarisa/brdata

Brazilian financial market data sources

brasil crawler data finance

Last synced: 25 Nov 2024

https://github.com/aurelg/linkbak

linkbak is a web page archiver: it reads a list of links and dumps the corresponding pages in HTML and PDF.

archive backup crawler html pdf python3

Last synced: 05 Nov 2024

https://github.com/ptsochantaris/bloo

Your search engine on your device

crawler ios ios-app macos macos-app productivity search-engine spotlight spotlight-search swift testflight

Last synced: 07 Nov 2024

https://github.com/montferret/worker

Containerized Ferret worker

chrome crawler docker dsl ferret go hacktoberfest hacktoberfest2020 scraping scraping-websites service worker

Last synced: 14 Nov 2024

https://github.com/codingcrush/aiocrawler

Async crawler framework based on aiohttp and asyncio for running fast.

aiofiles aiohttp asyncio crawler uvloop

Last synced: 17 Nov 2024

https://github.com/maxgio92/krawler

A crawler for kernel releases distributed by the major Linux distributions.

crawler kernel linux scraping

Last synced: 28 Oct 2024

https://github.com/somnisomni/twitter-account-data-crawler

Crawl and track the follower count of a Twitter account

crawler crawling follower-count follower-tracker selenium selenium-python twitter twitter-api twitter-crawler twitter-crawling

Last synced: 21 Nov 2024

https://github.com/valmisson/ytubes

Search for videos, playlists, channels, movies, live streams, and music on YouTube without an API key.

channel crawler live movie nodejs playlist scraper search typescript videos youtube youtube-api youtube-music youtube-search ytube

Last synced: 11 Oct 2024