Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/dotenorio/freeloader-of-data

A simple crawler or scraper to get open graph and other meta data from any website.

crawler graph hacktoberfest meta-data open-graph scraper

Last synced: 25 Oct 2024

https://github.com/busterc/crwlr

🕷 A minimal Puppeteer crawler API

crawl crawler crawling puppeteer spider walker

Last synced: 12 Dec 2024

https://github.com/chusiang/crawler-book-info

A crawler for quickly parsing book information

book crawler python

Last synced: 07 Nov 2024

https://github.com/bernabe9/render-it

Render any JavaScript content to create static sites ready for SEO

crawler javascript prerender prerenderio puppeteer render seo seo-tools server-side-rendering static-site static-site-generator

Last synced: 07 Nov 2024

https://github.com/s045pd/magicworld

A crawler for the "Magic World" (神奇世界) section of Huanqiu.com (环球网)

crawler python3 sanic telepot

Last synced: 07 Nov 2024

https://github.com/giscafer/ziroom-crawler

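Ziroom (自如友家) shared rentals: a listing crawler and listing-status monitor, meant for grabbing a room as soon as it becomes available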

crawler nodejs

Last synced: 17 Nov 2024

https://github.com/akiosarkiz/manga-collector

The manga collector is a library designed to easily scrape manga content from various websites. This package is licensed under the MIT License and is fully test-covered

api crawler manga scraper

Last synced: 20 Nov 2024

https://github.com/nobodxbodon/chromecrawlerwildspider

Chrome extension to crawl web pages by loading them into browser tabs in parallel.

chrome-extension crawler localstorage spider

Last synced: 30 Nov 2024

https://github.com/twtrubiks/pttcrawlercontent

PTT article content crawler in Python

crawler gossiping ptt python

Last synced: 16 Nov 2024

https://github.com/synacktraa/crawl

Web crawler designed to efficiently retrieve unique href, script and form links from a web application.

bash crawler regex shell web-spidering

Last synced: 26 Nov 2024

https://github.com/gabfl/sitecrawl

Simple Python module to crawl a website and extract URLs

crawl crawler crawler-python crawling-sites

Last synced: 13 Oct 2024

https://github.com/Antosser/web-crawler

Rust Web Crawler that finds every page, image, and script on a website (and downloads it)

crawler html rust seo web

Last synced: 24 Sep 2024

https://github.com/bitscoper/bitscoper_cyber_toolbox

A Flutter application consisting of TCP Port Scanner, Route Tracer, Pinger, File Hash Calculator, String Hash Calculator, Base Encoder, Morse Code Translator, Open Graph Protocol Data Extractor, Series URI Crawler, DNS Record Retriever, and WHOIS Retriever.

android calculator crawler cybersecurity dart decoder docker encoder extractor flutter github-action ios mac retriever scanner tracer translator web windows

Last synced: 05 Dec 2024

https://github.com/integralist/go-web-crawler

A web crawler built in the Go programming language

concurrency crawler go golang web-crawler

Last synced: 11 Oct 2024

https://github.com/52cik/creeper

A simple crawler engine (Creeper)

crawler node-crawler

Last synced: 29 Nov 2024

https://github.com/AmirAref/DivarCrawler

A script to crawl divar.ir and extract phone numbers

crawler scraper selenium

Last synced: 22 Nov 2024

https://github.com/bugfishtm/bugfish-image-downloader

💾 Bugfish Image Downloader: Effortless web image downloads, subsite exploration, and HD selection. Windows app, .NET 4.5, no registry usage. Download now!

bugfish bugfish-software bugfishtm crawler downloader downloadmanager downloadtool gplv3 image imagedownloader imagedownloadertool imageprocessing portable-executable portableapps software utilityapp webscraping windows windows-desktop

Last synced: 06 Nov 2024

https://github.com/oscarnevarezleal/ecommerce-crawler

Parallel ecommerce crawler using Docker and Puppeteer on GCP

crawler gcp nodejs pubnub puppeteer

Last synced: 29 Nov 2024

https://github.com/jacobsteves/crawlperl

A web crawler made with Perl. Great for grabbing or searching for data off the web, or ensuring that your own site files are secure and hidden.

crawler perl scripting web-crawler

Last synced: 27 Nov 2024

https://github.com/lon9/arxiv-crawler

Crawler for arxiv.org

arxiv crawler golang

Last synced: 01 Dec 2024

https://github.com/feedeo/youtube-channel-crawler

YouTube Channel 📺 Crawler

crawler youtube youtube-channel

Last synced: 11 Oct 2024

https://github.com/AmirAref/Torobot

An inline Telegram bot for easily accessing and searching torob.com products from Telegram.

crawler python python-telegram-bot scraper telegtam-bot

Last synced: 22 Nov 2024

https://github.com/simin75simin/libgencrawl

Crawl all books from a Library Genesis search

crawler free-software libgen python3 scraper

Last synced: 05 Nov 2024

https://github.com/hangyan/generate-cs-word-dict

Generate a CS word dictionary from Stack Overflow/GitHub tags

crawler dict github python word

Last synced: 05 Dec 2024

https://github.com/frectonz/rampilo

A telegram crawler

crawler rust telegram telegram-crawler

Last synced: 14 Nov 2024

https://github.com/dist1ll/hltv-rust

A client to fetch and parse data from HLTV.org

api crawler hltv parser rust

Last synced: 14 Oct 2024

https://github.com/hxr16f/ss-grabber

Automation script for downloading user screenshots.

automation crawler downloader grabber lightshot screenshot script

Last synced: 27 Nov 2024

https://github.com/karambir/ugc-colleges

Python Script to extract college names from UGC, India website.

college crawler extract html-parser python python-script ugc

Last synced: 12 Dec 2024

https://github.com/pjt3591oo/golang-crawler

Building a crawler with Golang

crawler golang

Last synced: 26 Dec 2024

https://github.com/vinitkumar/pycrawler

Crawler in Python 3.7, 3.8, 3.9, and PyPy3

crawler python python35 python36 utils

Last synced: 28 Oct 2024

https://github.com/pjt3591oo/rust-exchange-crawler

A crawler built as a way to learn Rust

crawler rust

Last synced: 26 Dec 2024

https://github.com/haxzie-xx/crode.js-node-web-crawler

Node.js Crawler built for open FTP sites for movie link collection.

crawler nodejs

Last synced: 19 Dec 2024

https://github.com/iml1111/toonkor_collector

Toonkor comics collector

crawler python

Last synced: 09 Dec 2024

https://github.com/cuerz/douban-top

Golang crawler that scrapes Douban ranking lists

crawler douban golang goquery

Last synced: 08 Nov 2024

https://github.com/sanmak/queue-web-crawler

This application crawls a website using a queue that limits the number of concurrent connections, finds all hyperlinks on the site, and saves them to a CSV file.

async chai crawler csv hyperlinks mocha nodejs queue scrapper web

Last synced: 28 Nov 2024

https://github.com/trudi-group/mc-crawler

A MobileCoin network crawler. Corresponding preprint available on arXiv (https://arxiv.org/pdf/2111.12364.pdf).

crawler mobilecoin rust

Last synced: 02 Dec 2024

https://github.com/floscha/genius-lyrics-crawler

A concurrent crawler to retrieve song lyrics from Genius

celery crawler fluentd genius lyrics mongodb python

Last synced: 09 Nov 2024

https://github.com/elliotxx/readnewspaper

Automatically fetches electronic editions of newspapers for convenient daily reading

crawler lxml newspaper pypdf2 python requests

Last synced: 24 Dec 2024

https://github.com/omerdogan3/kitapp-crawler

Web crawler application for KitApp: gets data from booksellers and inserts it into a database.

book bookseller crawler mysql nodejs puppeteer scrapper-script web-crawler

Last synced: 13 Dec 2024

https://github.com/coghost/iparse

Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support

crawler parser parser-library python xkcd yaml

Last synced: 09 Nov 2024

https://github.com/marzzzello/appstore_crawler

(mirror) Download the IDs and metadata of all apps in the Apple App Store

apple appstore crawler metadata scrapy

Last synced: 05 Nov 2024

https://github.com/ernesto-jimenez/crawler

Easily crawl websites in Go.

crawler golang

Last synced: 25 Nov 2024

https://github.com/alishahbazi81/jobcrawler

Job crawler bot that finds jobs on job board platforms like LinkedIn, Glassdoor, and Indeed based on their post time and sends them to a Telegram channel

asp-net-core crawler jobs jobsearch telegram telegram-bot

Last synced: 11 Nov 2024

https://github.com/cr0hn/feed-to-exporter

Get an RSS feed and export it as a WordPress post

crawler feed rss wordpress

Last synced: 07 Nov 2024

https://github.com/lon9/arxiv

For scraping arxiv.org

arxiv crawler golang

Last synced: 01 Dec 2024

https://github.com/vitorebatista/horoscopefree

A REST API for daily astrology horoscopes

crawler horoscope horoscope-crawler horoscopes-api

Last synced: 30 Nov 2024

https://github.com/roccomuso/is-bing

Verify that a request is from Bing crawlers using Bing's DNS verification steps

bing bot check crawler dns ip js nodejs verify

Last synced: 17 Oct 2024

https://github.com/leelow/nightmare-screenshot-selector

👻 📷 A Nightmare plugin to easily take screenshots.

crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler

Last synced: 15 Nov 2024

https://github.com/aprilnea/xjtlu

This is how to get all the network resources of XJTLU.

crawler gateway http-auth python spider web-crawler xjtlu

Last synced: 15 Nov 2024

https://github.com/giscafer/airlevel-crawler

A demo crawler for air-level.com

crawler java nodejs

Last synced: 17 Nov 2024

https://github.com/doroudi/imdb-crawler

imdb.com movies crawler in scrapy

crawler data-mining python scrapy

Last synced: 12 Dec 2024

https://github.com/vinouno/BilibiliDanmuCrawler

A Python project that crawls danmaku (bullet comments) from bilibili.com and generates a word cloud

crawler python

Last synced: 27 Oct 2024

https://github.com/hktalent/scrapysite

ScrapySite: a Go web crawler (spider) for scraping and intelligence gathering

crawler elasticsearch go scraping site spider web

Last synced: 19 Nov 2024

https://github.com/mirocow/yii2-crawler

Http concurrent crawler for Yii2

concurrency crawler guzzle yii2-extension

Last synced: 16 Nov 2024

https://github.com/leo9960/bilibili_live_danmu_crawler

Danmaku (bullet comment) scraper for Bilibili live streams

bilibili crawler danmu live

Last synced: 10 Nov 2024

https://github.com/leomaurodesenv/smm-course-search

A package for searching courses in Super Mario Maker

bookmark-site crawler javascript json mario-game mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/sayakie/pixiv-crawler

Crawls images from Pixiv 🚀

crawler nodejs pixiv typescript

Last synced: 28 Oct 2024

https://github.com/juliandavidmr/raptor

Lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages looking for websites to visit, with automatic indexing.

crawler kotlin mysql spider

Last synced: 09 Nov 2024

https://github.com/foolin/scrago

A simple, fast, extensible page-crawling framework for Golang

crawler go scrago scrapy

Last synced: 09 Nov 2024

https://github.com/code-inside/sloader

Worker that loads and retrieves data from "slow" endpoints.

crawler drop json yml

Last synced: 16 Nov 2024

https://github.com/spencerlepine/readme-crawler

A Node.js web crawler to download README files and follow contained links. Fetch repositories from a valid GitHub URL

crawler javascript node nodejs readme scraper web-crawler webcrawer

Last synced: 13 Nov 2024

https://github.com/stopka/fedicrawl

Collect feeds to follow on Fediverse nodes.

crawler docker fediverse nodejs prisma typescript

Last synced: 05 Nov 2024

https://github.com/eished/tujigu_crawler

Multithreaded Node.js crawler for tujigu.com (图集谷)

crawler node nodejs

Last synced: 02 Dec 2024

https://github.com/vmdang/historycrawler

An OOP project that collects and displays historical data about Vietnam

crawler gson java javafx jsoup

Last synced: 11 Oct 2024

https://github.com/tikazyq/github-crawler

Github repositories crawler

crawler scrapy

Last synced: 17 Dec 2024

https://github.com/kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 14 Dec 2024

https://github.com/holmofy/spring-spider

Spring Spider App Utility Library.

crawler java spider spring spring-spider

Last synced: 27 Oct 2024

https://github.com/kernelerr/pixivsync

Pixiv image download and sync tool

crawler pixiv pixiv-crawler python

Last synced: 19 Nov 2024

https://github.com/robmch/mindfactory_crawling

A Python 3 Crawler for Mindfactory.de

crawler crawling data webcrawler webcrawling

Last synced: 17 Nov 2024

https://github.com/moqsien/scrapx

A customized and enhanced version of Scrapy for managing hundreds or even thousands of spiders.

crawler framework pymongo scrapy spider

Last synced: 20 Nov 2024

https://github.com/surelle-ha/dogma

Dogma is a CLI tool that enables interaction with the GitHub API for the purpose of searching .env files with specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.

cli crawler github nodejs

Last synced: 10 Nov 2024

https://github.com/birkhofflee/blizzard_forum.js

An unofficial Node.js API for Blizzard Forums. (works in 2019)

api crawler web

Last synced: 18 Nov 2024

https://github.com/danielmorell/se_bot_checker

Validate search engine user agents and IP addresses.

crawler googlebot python search-engine spider

Last synced: 15 Oct 2024

https://github.com/yjyoon-dev/nara-crawler

Crawler for National Archives Catalog

crawler python scrapy

Last synced: 20 Nov 2024

https://github.com/liyifeng1994/go-crawler

A distributed crawler project based on Golang

crawler elastic elasticsearch golang

Last synced: 12 Nov 2024

https://github.com/licoy/java-crawler

Scraping data in Java with the jsoup crawling framework

crawler java jsoup

Last synced: 08 Dec 2024

https://github.com/exp-codes/python-crawler-template

Python crawler development template

crawler programming template

Last synced: 16 Dec 2024

https://github.com/hrvadl/goweekly

Application for querying top articles from https://golangweekly.com/, translating them to Ukrainian, and sending them to a Telegram channel

article chatgpt crawler go golang openai-api telegram telegram-bot

Last synced: 13 Oct 2024

https://github.com/rimiti/ping-urls

🏓 Ping URLs by batch.

cache crawler ping prerender prerendering seo

Last synced: 28 Dec 2024

https://github.com/tokenmill/crawling-framework-example

Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.

crawler crawling-framework elasticsearch storm-crawler

Last synced: 10 Nov 2024

https://github.com/waynechang65/baha-crawler

baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.

bahamut crawler javascript nodejs scraper spider webcrawler

Last synced: 19 Oct 2024

https://github.com/glutexo/onigumo

Parallel web scraping framework

crawler

Last synced: 20 Dec 2024

https://github.com/dylanhogg/legaldata

Provides access to Australian legal data

crawler data law lawtech legal legaltech

Last synced: 06 Dec 2024