Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
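To make "systematically browses" concrete, here is a minimal sketch of a breadth-first crawl loop. It is illustrative only and is not taken from any project listed on this page. The `fetch` callable, the `PAGES` dictionary, and the `example.test` URLs are hypothetical stand-ins, so the sketch runs without network access; a real crawler would also need politeness delays, robots.txt handling, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    or None if the page cannot be retrieved.
    """
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # frontier of URLs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Toy in-memory "web" (an assumption for illustration only).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c#top">C</a>',
}

if __name__ == "__main__":
    for url in crawl("http://example.test/", PAGES.get):
        print(url)
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and `max_pages` bounds the work on large sites.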

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-02-10 00:06:28 UTC

https://github.com/sayakie/pixiv-crawler

Crawls images from Pixiv 🚀

crawler nodejs pixiv typescript

Last synced: 28 Oct 2024

https://github.com/robmch/mindfactory_crawling

A Python 3 Crawler for Mindfactory.de

crawler crawling data webcrawler webcrawling

Last synced: 17 Nov 2024

https://github.com/mcstreetguy/crawler

An advanced web-crawler written in PHP.

composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler

Last synced: 12 Oct 2024

https://github.com/elliotxx/readnewspaper

Automatically fetches electronic editions of newspapers for convenient daily reading

crawler lxml newspaper pypdf2 python requests

Last synced: 24 Dec 2024

https://github.com/franjid/filmaffinity-crawler

Crawl and scrape films from filmaffinity.com (with Node.js)

crawler filmaffinity javascript node nodejs scraper

Last synced: 27 Jan 2025

https://github.com/leo9960/bilibili_live_danmu_crawler

Danmaku (bullet comment) scraper for Bilibili live streams

bilibili crawler danmu live

Last synced: 10 Nov 2024

https://github.com/lon9/arxiv

For scraping arxiv.org

arxiv crawler golang

Last synced: 29 Jan 2025

https://github.com/pjt3591oo/rust-exchange-crawler

A crawler built while learning Rust

crawler rust

Last synced: 26 Dec 2024

https://github.com/pjt3591oo/golang-crawler

Building a crawler with Golang

crawler golang

Last synced: 26 Dec 2024

https://github.com/hangyan/generate-cs-word-dict

Generate a word dict for CS from Stack Overflow/GitHub tags

crawler dict github python word

Last synced: 05 Dec 2024

https://github.com/yjyoon-dev/nara-crawler

Crawler for National Archives Catalog

crawler python scrapy

Last synced: 20 Nov 2024

https://github.com/manuel-lang/autonomous-semantic-search-engine

Submission for HackDataKIBots 2018 - Web crawler combined with document analysis

crawler hackathon machine-learning mannheim microsoft natural-language-processing natural-language-understanding nextiteration rnv semantic-search textract

Last synced: 13 Nov 2024

https://github.com/holmofy/spring-spider

Spring Spider App Utility Library.

crawler java spider spring spring-spider

Last synced: 27 Oct 2024

https://github.com/leomaurodesenv/smm-course-search

A package for searching courses in Super Mario Maker

bookmark-site crawler javascript json mario-game mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/haxzie-xx/crode.js-node-web-crawler

Node.js Crawler built for open FTP sites for movie link collection.

crawler nodejs

Last synced: 19 Dec 2024

https://github.com/alishahbazi81/jobcrawler

Job crawler robot that finds jobs on job board platforms like LinkedIn, Glassdoor, and Indeed based on their post time and sends them to a Telegram channel

asp-net-core crawler jobs jobsearch telegram telegram-bot

Last synced: 11 Nov 2024

https://github.com/zurdi15/nbz

Bot to automate internet browsing

automation bot browser-automation browsermob-proxy crawler selenium testing web

Last synced: 15 Oct 2024

https://github.com/dist1ll/hltv-rust

A client to fetch and parse data from HLTV.org

api crawler hltv parser rust

Last synced: 14 Oct 2024

https://github.com/vinouno/BilibiliDanmuCrawler

A Python project that crawls danmaku (bullet comments) from bilibili.com and generates word clouds

crawler python

Last synced: 27 Oct 2024

https://github.com/aprilnea/xjtlu

This is how to get all the network resources of XJTLU.

crawler gateway http-auth python spider web-crawler xjtlu

Last synced: 15 Nov 2024

https://github.com/mrrfv/webarchive

Crawls websites and saves found URLs to a file.

archive archiveteam archiving crawler crawling ia internet-archive scraper web-archiving web-scraping

Last synced: 27 Oct 2024

https://github.com/birkhofflee/blizzard_forum.js

An unofficial Node.js API for Blizzard Forums. (works in 2019)

api crawler web

Last synced: 19 Jan 2025

https://github.com/aminehsan/crawler-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scarping

Last synced: 31 Jan 2025

https://github.com/fzdwx/go-pachong

Go web crawler that can continuously crawl data starting from an entry URL

crawler go golang

Last synced: 08 Feb 2025

https://github.com/foolin/scrago

A simple, fast, extensible page-crawling framework for Golang

crawler go scrago scrapy

Last synced: 05 Jan 2025

https://github.com/wangshouh/sdufelib_seat_crawler

SDUFE Library Reservation Seat Monitoring Crawler

crawler python

Last synced: 02 Feb 2025

https://github.com/wenyalintw/job-scraper-bot

A fun Telegram bot made for a friend, deployed to Heroku

amazon-web-services aws-s3 boto3 crawler google-drive google-drive-api heroku heroku-deployment python-telegram-bot scraper scraping scrapy telegram telegram-bot telegram-bot-api web-scraping

Last synced: 11 Nov 2024

https://github.com/frectonz/rampilo

A telegram crawler

crawler rust telegram telegram-crawler

Last synced: 14 Nov 2024

https://github.com/giscafer/airlevel-crawler

A demo crawler for air-level.com

crawler java nodejs

Last synced: 17 Nov 2024

https://github.com/juliandavidmr/raptor

Lightweight tool for scanning web sites that works as a spider. Once executed, it starts scanning pages looking for websites to visit, with automatic indexing.

crawler kotlin mysql spider

Last synced: 09 Nov 2024

https://github.com/leelow/nightmare-screenshot-selector

👻 📷 A Nightmare plugin to easily take screenshots.

crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler

Last synced: 15 Nov 2024

https://github.com/spencerlepine/readme-crawler

A Node.js web crawler to download README files and follow contained links. Fetch repositories from a valid GitHub URL

crawler javascript node nodejs readme scraper web-crawler webcrawer

Last synced: 13 Nov 2024

https://github.com/vitorebatista/horoscopefree

A REST API for daily astrology horoscopes

crawler horoscope horoscope-crawler horoscopes-api

Last synced: 30 Nov 2024

https://github.com/cr0hn/feed-to-exporter

Get an RSS feed and export it as a WordPress post

crawler feed rss wordpress

Last synced: 07 Nov 2024

https://github.com/natlee/myanimelist-comment-crawler

Crawl all reviews and information of Anime works on MyAnimeList. ;)

anime crawler data-analysis data-mining data-science kaggle kaggle-dataset myanimelist python requests scrapy-crawler sqlite

Last synced: 21 Jan 2025

https://github.com/coghost/iparse

Extracts HTML/JSON content identified by CSS selectors (with bs4), with YAML config support

crawler parser parser-library python xkcd yaml

Last synced: 09 Nov 2024

https://github.com/code-inside/sloader

Worker that loads and retrieves data from "slow" endpoints.

crawler drop json yml

Last synced: 16 Nov 2024

https://github.com/floscha/genius-lyrics-crawler

A concurrent crawler to retrieve song lyrics from Genius

celery crawler fluentd genius lyrics mongodb python

Last synced: 09 Nov 2024

https://github.com/trudi-group/mc-crawler

A MobileCoin network crawler. Corresponding preprint available on arXiv (https://arxiv.org/pdf/2111.12364.pdf).

crawler mobilecoin rust

Last synced: 02 Dec 2024

https://github.com/YektaDev/Krawler

A configurable HTML Crawler written in Kotlin (JVM), powered by Coroutines, Kotlin Serialization (JSON), Ktor Client, Exposed, and SQLite.

crawl crawler crawlers crawling

Last synced: 06 Feb 2025

https://github.com/tokenmill/crawling-framework-example

Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.

crawler crawling-framework elasticsearch storm-crawler

Last synced: 06 Jan 2025

https://github.com/joshuaquek/docusite-to-pdf

Provide a URL and this will generate multiple PDF documents of the whole site within the bounds of the URL path. This code repo is for educational purposes only.

crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper

Last synced: 12 Jan 2025

https://github.com/mikirasora/osuplayedbeatmapscrawler

A crawler that fetches and downloads the osu! beatmaps you have played

crawler osu

Last synced: 01 Jan 2025

https://github.com/agmmnn/nis-scraper

Scrapy script to scrape nisanyansozluk.com

cli crawler python scraper

Last synced: 21 Dec 2024

https://github.com/hrvadl/goweekly

Application for querying top articles from https://golangweekly.com/, translating them to Ukrainian, and sending them to a Telegram channel

article chatgpt crawler go golang openai-api telegram telegram-bot

Last synced: 13 Oct 2024

https://github.com/a-x-/scian

Simple cian stat

cian crawler static-site

Last synced: 11 Jan 2025

https://github.com/alexmili/reachable

Check if a URL exists and is reachable

crawler health-check monitoring reachability webscraping

Last synced: 10 Dec 2024

https://github.com/ribeirogab/technology-insights

A program that uses data from Stack Overflow Insights 2020 to generate informative graphs.

crawler python scraping typescript

Last synced: 19 Nov 2024

https://github.com/dnlzrgz/winzig

A tiny search engine for personal use.

async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3

Last synced: 05 Nov 2024

https://github.com/igaozp/jobwitcher

JobWitcher: a collection of crawlers for job recruitment websites

crawler python3 redis scrapy spider

Last synced: 27 Dec 2024

https://github.com/igeligel/TeamFortressOutpostApi

🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.

bot bot-framework crawler steam steam-api steambot teamfortress2

Last synced: 13 Nov 2024

https://github.com/simoninithomas/news-crawler-parse-backend

This is a crawler made with Scrapy.py to crawl French news articles and send them to your Parse.com backend

crawler news parse scrapy

Last synced: 17 Jan 2025

https://github.com/spa5k/quick-scraper

An easy, lightweight scraper built using TypeScript for a good developer experience.

crawler dx easy-to-use esbuild scraper typescript

Last synced: 13 Nov 2024

https://github.com/developerdavi/meli-crawler

Basic web crawler API for getting products from MercadoLibre (BRL | MLB)

api crawler meli-crawler mercadolibre mercadolibre-sdk mercadolivre mercadolivre-sdk nextjs now products react zeit

Last synced: 25 Nov 2024

https://github.com/jmkim/stock-crawler

Universal Stock Crawler

crawler stock stock-market yahoo-finance

Last synced: 26 Jan 2025

https://github.com/ceylonai/apps-article-reader

📚 A powerful desktop app that extracts and analyzes web content using LLaMA AI. Features real-time processing, keyword extraction, and smart summarization. Built with Python + Tkinter.

ai crawler gpt ollama openai

Last synced: 15 Jan 2025

https://github.com/vaibhavpandeyvpz/cbse-scraper

This script scrapes information about schools affiliated with CBSE for a given state.

cbse crawler data schools scraper

Last synced: 09 Nov 2024

https://github.com/xanke/nscan

Node.js web page collector

crawler javascript nodejs

Last synced: 29 Jan 2025

https://github.com/erikjiang/book_crawler

🦎 book_crawler

crawler douban golang

Last synced: 28 Nov 2024

https://github.com/thaddeusjiang/campcat

Bot that monitors campsite reservation information

bot crawler telegram

Last synced: 25 Oct 2024

https://github.com/huzecong/film-spider

Spiders crawling for film listing websites.

crawler

Last synced: 11 Jan 2025

https://github.com/reycn/china-drug-trials-crawler

A web crawler for Chinadrugtrials.org.cn, written in Python 3.6+.

china crawler drug python scraper

Last synced: 12 Jan 2025

https://github.com/feliz-szk/berserk

Berserk: Crawler to increase web traffic (based on Tor and Privoxy)

anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser

Last synced: 12 Jan 2025

https://github.com/mouday/freeipproxy

Maintains a pool of working proxies by scraping free proxy IPs

crawler proxy python spider

Last synced: 26 Jan 2025

https://github.com/mouday/httpserver

A simple server for testing crawler request headers, built with Python + Flask

crawler flask python spider

Last synced: 26 Jan 2025

https://github.com/yakuza8/coronavirus-timeseries-predictor

Timeseries analyzer for coronavirus with recurrent neural network

asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper

Last synced: 24 Jan 2025

https://github.com/t-rekttt/tlu-schedule

chatfuel crawler nodejs vuejs

Last synced: 09 Dec 2024

https://github.com/mrmarble/mineseek

Minecraft server scanner

crawler minecraft minecraft-server scanner slp

Last synced: 17 Jan 2025

https://github.com/pyaesoneaungrgn/2d-crawler

2D crawler for set.or.th

2d 2d-crawler crawler myanmar php

Last synced: 09 Nov 2024

https://github.com/ozansz/github-crawler

A basic utility for crawling users and their e-mail addresses

crawler github python python3

Last synced: 02 Feb 2025

https://github.com/fanyong920/crawlitem-puppeteer

An example of scraping product items with Puppeteer

chromnium crawler javascript nodejs puppeteer scrapy

Last synced: 23 Dec 2024

https://github.com/mmqnym/etherscan_tracker

Shows how to track a wallet on etherscan.io

crawler ethereum python

Last synced: 18 Jan 2025

https://github.com/vivekg13186/easy_web_crawler

Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.

crawler spider web

Last synced: 04 Feb 2025

https://github.com/archan937/webhead

An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing

Last synced: 10 Nov 2024

https://github.com/capturr/price-extract

Performant way to extract the price amount and metadata (currency, decimal and thousands separators) from any string.

amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript

Last synced: 07 Jan 2025

https://github.com/ktont/curlas

A Node.js spider tool

chrome-extension crawler spider

Last synced: 13 Jan 2025

https://github.com/indatawetrust/reporter

Crawler queue creation tool for paging

crawler

Last synced: 13 Dec 2024

https://github.com/v-braun/hero-scrape

Find the hero (main) image of a URL

crawler fastimage hero hero-image opengraph webscraping

Last synced: 15 Jan 2025

https://github.com/oxylabs/web-crawler

Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.

api crawler github-python scraper web-crawler web-crawler-python web-scraping web-scraping-api webscraping

Last synced: 17 Nov 2024

https://github.com/arshadkazmi42/scraplink

Scraplink library for scraping links and image URLs from a webpage

crawler mongdb nodejs scraplink url web

Last synced: 28 Oct 2024

https://github.com/rimiti/ping-urls

🏓 Ping URLs by batch.

cache crawler ping prerender prerendering seo

Last synced: 28 Dec 2024

https://github.com/sauerbraten/chef

Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.

crawler extinfo go sauerbraten spy stalker

Last synced: 14 Nov 2024

https://github.com/roccomuso/is-duckduck

Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo

crawler duckduck duckduckbot duckduckgo ip js nodejs verify web

Last synced: 07 Jan 2025

https://github.com/bitebait/curry

🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real (USDxBRL) exchange rate at some stores in Paraguay.

api brasil crawler currency-exchange-rates go golang paraguay webcrawler

Last synced: 14 Nov 2024

https://github.com/hctilg/pinterest-crawler

Downloads all images suitable for search

crawler pinterest

Last synced: 07 Nov 2024

https://github.com/achannarasappa/locust-cli

Developer tools to accelerate development of Locust jobs

cli crawler headless-chrome puppeteer scraper

Last synced: 19 Jan 2025

https://github.com/qin2dim/istockphoto-go

📸 Gracefully download dataset from iStockPhoto.

colly crawler istockphoto

Last synced: 28 Dec 2024

https://github.com/obaskly/kikfriender.com-bot

A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. Moreover, it can check a given wordlist, print out Kik usernames, and store them in a new text file.

ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist