Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-01-05 00:06:18 UTC

https://github.com/Hound-fm/podcatcher

Audio media crawler for lbry.

crawler lbry python

Last synced: 18 Nov 2024

https://github.com/markelog/map

Simple sitemap generator; supports a couple of reporters, depth levels, etc.

crawler map sitemap spider

Last synced: 25 Nov 2024

https://github.com/oscarnevarezleal/ecommerce-crawler

Parallel ecommerce crawler using Docker and Puppeteer on GCP

crawler gcp nodejs pubnub puppeteer

Last synced: 29 Nov 2024

https://github.com/jean-baptiste-camps/iiif-crawler

Interrogate IIIF servers and get images of manuscripts

crawler iiif iiif-image manuscripts

Last synced: 11 Oct 2024

https://github.com/52cik/creeper

Simple crawler engine (Creeper)

crawler node-crawler

Last synced: 29 Nov 2024

https://github.com/dynesshely/everydaynews

A repo that fetches most news and information, then stores and organizes it.

crawler data fetcher network news

Last synced: 22 Dec 2024

https://github.com/gabfl/sitecrawl

Simple Python module to crawl a website and extract URLs

crawl crawler crawler-python crawling-sites

Last synced: 13 Oct 2024

https://github.com/capturr/jsonld-extract

A damn simple tool to extract JSON-LD metadata from a webpage using a jQuery-like API (jQuery, Cheerio, CashDom ...).

cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript

Last synced: 28 Oct 2024

https://github.com/eric2788/platformscrawler

Multi-platform crawler with modular management, used to collect data and publish it via Redis pub/sub

bilibili crawler crawling pubsub redis twitter youtube

Last synced: 11 Oct 2024

https://github.com/bingxyz/tg-earthquake-warning

Telegram broadcast channel for Taiwan earthquake reports

bash crawler telegram-bot-api

Last synced: 21 Nov 2024

https://github.com/nobodxbodon/chromecrawlerwildspider

Chrome extension that crawls web pages by loading them into browser tabs in parallel.

chrome-extension crawler localstorage spider

Last synced: 30 Nov 2024

https://github.com/ajcerejeira/base.gov.pt

A crawler that fetches data from base.gov.pt

crawler csv python scrapy

Last synced: 06 Nov 2024

https://github.com/yggverse/yggstate

Yggdrasil Network Explorer

analytics crawler explorer geo-ip geo-location geolite2 mysql php search-engine sphinx spider yggdrasil yggdrasil-api yggdrasil-network yggdrasil-php-api yggdrasilctl yggstate

Last synced: 06 Nov 2024

https://github.com/sayyid5416/links-extractor

Extract links from any file or website.

crawler extract-links extractor links-extraction scraper web-crawler web-scraper

Last synced: 28 Oct 2024

https://github.com/AmirAref/DivarCrawler

A script to crawl divar.ir and extract phone numbers

crawler scraper selenium

Last synced: 22 Nov 2024

https://github.com/twtrubiks/pttcrawlercontent

PTT Crawler Content in Python (PTT article crawler)

crawler gossiping ptt python

Last synced: 16 Nov 2024

https://github.com/exp-codes/jzone-crawler

Qzone (QQ空间) crawler (Java version)

crawler programming

Last synced: 16 Dec 2024

https://github.com/chusiang/crawler-book-info

A crawler for quickly parsing book information

book crawler python

Last synced: 07 Nov 2024

https://github.com/arshadkazmi42/blc

Broken link checker

blc broken-link-checker broken-link-finder bug-bounty bugbounty crawler python

Last synced: 28 Oct 2024

https://github.com/luizppa/web-crawler

A web crawler that collects and indexes web pages. Made with chilkat and gumbo parser.

chilkat cpp crawler webcrawler

Last synced: 28 Oct 2024

https://github.com/sweeticelolly/sao_title_bot

A bot that generates flashy paper titles

chrome-dr chromedriver crawler generator language-learning language-model numpy python robot scholar scholarly-articles selenium selenium-webdriver

Last synced: 24 Nov 2024

https://github.com/busterc/crwlr

🕷a minimal puppeteer crawler api

crawl crawler crawling puppeteer spider walker

Last synced: 12 Dec 2024

https://github.com/integralist/go-web-crawler

A web crawler built in the Go programming language

concurrency crawler go golang web-crawler

Last synced: 11 Oct 2024

https://github.com/simin75simin/libgencrawl

crawl all books from a library genesis search

crawler free-software libgen python3 scraper

Last synced: 05 Nov 2024

https://github.com/rvegas/dota_crawler

Crawler for dotapedia. Fills a Mongo and a PG database with game data.

crawler dota dota2 flask mongodb postgresql python3 regex scrapy

Last synced: 01 Jan 2025

https://github.com/0memo07/web-crawler

Web Crawler with Python

beautifulsoup4 bs4 crawler crawlers crawling crawling-python web-crawler web-crawler-python web-crawling webcrawler

Last synced: 17 Nov 2024

https://github.com/giscafer/ziroom-crawler

Crawler for Ziroom (自如友家) rental listings that monitors listing status in order to grab rooms quickly

crawler nodejs

Last synced: 17 Nov 2024

https://github.com/Antosser/web-crawler

Rust Web Crawler that finds every page, image, and script on a website (and downloads it)

crawler html rust seo web

Last synced: 24 Sep 2024

https://github.com/luckyzxl2016/go-spider

concurrent crawler golang spider

Last synced: 11 Oct 2024

https://github.com/hybridx/webscraper

Web crawler made with Beautiful Soup

crawler flask google-dorks javascript python3 search-engine

Last synced: 13 Dec 2024

https://github.com/vshawn/tutiempo_crawler

a crawler for climate data on en.tutiempo.net

climate-data crawler tutiempo-crawler

Last synced: 19 Nov 2024

https://github.com/juliandavidmr/raptor

Lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages for websites to visit, with automatic indexing.

crawler kotlin mysql spider

Last synced: 09 Nov 2024

https://github.com/inishchith/python-scripts

Some Scripts & Projects

crawler python-script python3 scripts youtube

Last synced: 19 Dec 2024

https://github.com/floscha/genius-lyrics-crawler

A concurrent crawler to retrieve song lyrics from Genius

celery crawler fluentd genius lyrics mongodb python

Last synced: 09 Nov 2024

https://github.com/roccomuso/is-bing

Verify that a request is from Bing crawlers using Bing's DNS verification steps

bing bot check crawler dns ip js nodejs verify

Last synced: 17 Oct 2024

https://github.com/aprilnea/xjtlu

This is how to get all the network resources of XJTLU.

crawler gateway http-auth python spider web-crawler xjtlu

Last synced: 15 Nov 2024

https://github.com/kernelerr/pixivsync

Pixiv image download and sync tool

crawler pixiv pixiv-crawler python

Last synced: 19 Nov 2024

https://github.com/wenyalintw/job-scraper-bot

A fun Telegram bot made for a friend, deployed to Heroku

amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping

Last synced: 11 Nov 2024

https://github.com/ivan-alone/instastories-saver-cpp

Program for saving Instagram Stories, rewritten in C++

api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories

Last synced: 19 Dec 2024

https://github.com/karambir/ugc-colleges

Python Script to extract college names from UGC, India website.

college crawler extract html-parser python python-script ugc

Last synced: 12 Dec 2024

https://github.com/cr0hn/feed-to-exporter

Get an RSS feed and export it as a WordPress post

crawler feed rss wordpress

Last synced: 07 Nov 2024

https://github.com/giscafer/airlevel-crawler

a demo of crawler for air-level.com

crawler java nodejs

Last synced: 17 Nov 2024

https://github.com/frectonz/rampilo

A telegram crawler

crawler rust telegram telegram-crawler

Last synced: 14 Nov 2024

https://github.com/hktalent/scrapysite

ScrapySite: Go web crawler (spider) for scraping and intelligence gathering

crawler elasticsearch go scraping site spider web

Last synced: 19 Nov 2024

https://github.com/vinouno/BilibiliDanmuCrawler

A Python project that crawls danmaku (bullet comments) from bilibili.com and generates word clouds

crawler python

Last synced: 27 Oct 2024

https://github.com/holmofy/spring-spider

Spring Spider App Utility Library.

crawler java spider spring spring-spider

Last synced: 27 Oct 2024

https://github.com/zain-ul-din/lgu-crawler

LGU timetable Crawler

contribute crawler lahore-garrison-university lahore-garrison-university-timetable open-source

Last synced: 10 Dec 2024

https://github.com/librecodecoop/querido-diario-php

Brazilian government gazettes, accessible to everyone.

civic-tech crawler data-science gazette-crawler governments-gazettes govtech hacktoberfest open-data php php7 politics spider

Last synced: 29 Nov 2024

https://github.com/haxzie-xx/crode.js-node-web-crawler

Node.js Crawler built for open FTP sites for movie link collection.

crawler nodejs

Last synced: 19 Dec 2024

https://github.com/hxr16f/ss-grabber

Automation script for downloading user screenshots.

automation crawler downloader grabber lightshot screenshot script

Last synced: 27 Nov 2024

https://github.com/pjt3591oo/golang-crawler

Building a crawler with Golang

crawler golang

Last synced: 26 Dec 2024

https://github.com/trudi-group/mc-crawler

A MobileCoin network crawler. Corresponding preprint available on arXiv (https://arxiv.org/pdf/2111.12364.pdf).

crawler mobilecoin rust

Last synced: 02 Dec 2024

https://github.com/pjt3591oo/rust-exchange-crawler

A crawler built as a way of learning Rust

crawler rust

Last synced: 26 Dec 2024

https://github.com/ernesto-jimenez/crawler

Easily crawl websites in Go.

crawler golang

Last synced: 25 Nov 2024

https://github.com/moqsien/scrapx

A customized and enhanced version of Scrapy for managing hundreds or even thousands of spiders.

crawler framework pymongo scrapy spider

Last synced: 20 Nov 2024

https://github.com/sanmak/queue-web-crawler

Crawls a website using a queue that limits the number of concurrent connections, finds all hyperlinks on the site, and saves them to a CSV file.

async chai crawler csv hyperlinks mocha nodejs queue scrapper web

Last synced: 28 Nov 2024

https://github.com/surelle-ha/dogma

Dogma is a CLI tool that enables interaction with the GitHub API for the purpose of searching .env files with specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.

cli crawler github nodejs

Last synced: 10 Nov 2024

https://github.com/eished/tujigu_crawler

Multi-threaded Node.js crawler for tujigu.com (图集谷)

crawler node nodejs

Last synced: 02 Dec 2024

https://github.com/mirocow/yii2-crawler

Concurrent HTTP crawler for Yii2

concurrency crawler guzzle yii2-extension

Last synced: 16 Nov 2024

https://github.com/coghost/iparse

Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support

crawler parser parser-library python xkcd yaml

Last synced: 09 Nov 2024

https://github.com/pjt3591oo/news-crawler

crawler data python

Last synced: 06 Nov 2024

https://github.com/vitorebatista/horoscopefree

Astrology REST API for daily horoscopes

crawler horoscope horoscope-crawler horoscopes-api

Last synced: 30 Nov 2024

https://github.com/iml1111/toonkor_collector

Toonkor comics collector

crawler python

Last synced: 09 Dec 2024

https://github.com/mrrfv/webarchive

Crawls websites and saves found URLs to a file.

archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping

Last synced: 27 Oct 2024

https://github.com/danielmorell/se_bot_checker

Validate search engine user agents and IP addresses.

crawler googlebot python search-engine spider

Last synced: 15 Oct 2024

https://github.com/alishahbazi81/jobcrawler

Job crawler bot that finds jobs on job boards such as LinkedIn, Glassdoor, and Indeed based on their posting time and sends them to a Telegram channel

asp-net-core crawler jobs jobsearch telegram telegram-bot

Last synced: 11 Nov 2024

https://github.com/foolin/scrago

A simple, fast, extensible page-crawling framework for Golang

crawler go scrago scrapy

Last synced: 09 Nov 2024

https://github.com/leelow/nightmare-screenshot-selector

👻 📷 A Nightmare plugin to easily take screenshots.

crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler

Last synced: 15 Nov 2024

https://github.com/birkhofflee/blizzard_forum.js

An unofficial Node.js API for Blizzard Forums. (works in 2019)

api crawler web

Last synced: 18 Nov 2024

https://github.com/cuerz/douban-top

Golang crawler that scrapes Douban ranking lists

crawler douban golang goquery

Last synced: 08 Nov 2024

https://github.com/stopka/fedicrawl

Collect feeds to follow on Fediverse nodes.

crawler docker fediverse nodejs prisma typescript

Last synced: 05 Nov 2024

https://github.com/sayakie/pixiv-crawler

Crawls images from Pixiv 🚀

crawler nodejs pixiv typescript

Last synced: 28 Oct 2024

https://github.com/kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 14 Dec 2024

https://github.com/mcstreetguy/crawler

An advanced web-crawler written in PHP.

composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler

Last synced: 12 Oct 2024

https://github.com/vmdang/historycrawler

An OOP project that collects and displays historical data about Vietnam

crawler gson java javafx jsoup

Last synced: 11 Oct 2024

https://github.com/hangyan/generate-cs-word-dict

Generate a word dict for CS from stackoverflow/github tags

crawler dict github python word

Last synced: 05 Dec 2024

https://github.com/doroudi/imdb-crawler

imdb.com movies crawler in scrapy

crawler data-mining python scrapy

Last synced: 12 Dec 2024

https://github.com/omerdogan3/kitapp-crawler

Web crawler application for KitApp. Gets data from booksellers and inserts it into a database.

book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler

Last synced: 13 Dec 2024

https://github.com/moehmeni/ezweb

Easy to use web page analyzer

analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www

Last synced: 05 Nov 2024

https://github.com/marzzzello/appstore_crawler

(mirror) download the IDs and metadata of all apps in the apple appstore

apple appstore crawler metadata scrapy

Last synced: 05 Nov 2024

https://github.com/dist1ll/hltv-rust

A client to fetch and parse data from HLTV.org

api crawler hltv parser rust

Last synced: 14 Oct 2024

https://github.com/robmch/mindfactory_crawling

A Python 3 Crawler for Mindfactory.de

crawler crawling data webcrawler webcrawling

Last synced: 17 Nov 2024

https://github.com/spencerlepine/readme-crawler

A Node.js web crawler to download README files and follow contained links. Fetch repositories from a valid GitHub URL

crawler javascript node nodejs readme scraper web-crawler webcrawer

Last synced: 13 Nov 2024

https://github.com/zurdi15/nbz

Bot to automate internet browsing

automation bot browser-automation browsermob-proxy crawler selenium testing web

Last synced: 15 Oct 2024

https://github.com/yjyoon-dev/nara-crawler

Crawler for National Archives Catalog

crawler python scrapy

Last synced: 20 Nov 2024

https://github.com/itszeeshan/crawlinit

A web crawler written in python3

appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web

Last synced: 12 Oct 2024

https://github.com/code-inside/sloader

Worker that loads and retrieves data from "slow" endpoints.

crawler drop json yml

Last synced: 16 Nov 2024

https://github.com/liyifeng1994/go-crawler

Distributed crawler project based on Golang

crawler elastic elasticsearch golang

Last synced: 12 Nov 2024

https://github.com/leo9960/bilibili_live_danmu_crawler

Danmaku (bullet comment) scraper for Bilibili live streams

bilibili crawler danmu live

Last synced: 10 Nov 2024

https://github.com/elliotxx/readnewspaper

Automatically fetches electronic editions of newspapers for convenient daily reading

crawler lxml newspaper pypdf2 python requests

Last synced: 24 Dec 2024

https://github.com/licoy/java-crawler

Crawls data in Java using the jsoup crawler framework

crawler java jsoup

Last synced: 08 Dec 2024

https://github.com/vinitkumar/pycrawler

Crawler for Python 3.7, 3.8, 3.9, and PyPy3

crawler python python35 python36 utils

Last synced: 28 Oct 2024

https://github.com/lon9/arxiv

For scraping arxiv.org

arxiv crawler golang

Last synced: 01 Dec 2024

https://github.com/tikazyq/github-crawler

Github repositories crawler

crawler scrapy

Last synced: 17 Dec 2024

https://github.com/manuel-lang/autonomous-semantic-search-engine

Submission for HackDataKIBots 2018 - Web crawler combined with document analysis

crawler hackathon machine-learning mannheim microsoft natural-language-processing natural-language-understanding nextiteration rnv semantic-search textract

Last synced: 13 Nov 2024

https://github.com/leomaurodesenv/smm-course-search

A package for searching Super Mario Maker courses

bookmark-site crawler javascript json mario-game mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/t-rekttt/tlu-schedule

chatfuel crawler nodejs vuejs

Last synced: 09 Dec 2024

https://github.com/vivekg13186/easy_web_crawler

Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.

crawler spider web

Last synced: 09 Dec 2024

https://github.com/maicss/1024img

1024 image nodejs crawler

1024 crawler nodejs

Last synced: 31 Dec 2024

https://github.com/thiiagoms/dict-crawler

Simple crawler on UOL dictionary

beautifulsoup4 crawler dic python pythonic

Last synced: 15 Nov 2024

Crawler Awesome Lists

awesome-crawler (101)
awesome-python-primer (68)
awesome-digital-preservation (45)
awesome-fingerprinting (48)

Crawler Categories

2.6 Machine learning (50)
Research (31)
Python (18)
Replay tools (18)
1.1 Language basics (14)
Libraries & Projects (13)
Fingerprinting Evasion (13)
Sites (12)
2.4 Web front end (10)
2.1 Crawler basics (9)
3. Databases (8)
2.5 Data analysis (7)
Java (7)
Web archiving (7)
Other digital objects (6)
4. Async I/O (6)
Social Networks (4)
Standards and specifications (4)
2.3 Django framework (4)
2.2 Flask framework (4)