Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-02-05 00:06:37 UTC
JSON Representation

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese wikipedia using Media wiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 25 Dec 2024

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 25 Dec 2024

https://github.com/tylpk1216/favorite-youtube-to-video

Download your favorite youtube video in PHP

crawler php tool youtube

Last synced: 26 Jan 2025

https://github.com/mohitk05/drstrange

A simple breadth-first search web crawler

bfs crawler

Last synced: 01 Feb 2025

https://github.com/949886/pixiv-crawler

Pixiv illustration info crawler to local MySQL database.

crawler mysql pixiv

Last synced: 28 Dec 2024

https://github.com/bandie91/extip

Fetch external IP from known ext. ip providers

address cli crawler external ip ipv4-address parallel

Last synced: 03 Jan 2025

https://github.com/zzzzer91/match_spider

某菠菜网站爬虫，该网站已倒闭:disappointed_relieved:

crawler python

Last synced: 10 Jan 2025

https://github.com/zzzzer91/crash

通用多线程爬虫框架。

crawler framework python

Last synced: 10 Jan 2025

https://github.com/spider-rs/spider-clients

Clients to use with the hosted spider service - spider.cloud

ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping

Last synced: 05 Nov 2024

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application which crawls profiles on Twitter for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 03 Jan 2025

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Jan 2025

https://github.com/brianmacintosh/wikicrawler

Sandbox project for manipulating Wikimedia wikis

c-sharp crawler mediawiki-bot wikipedia-bot

Last synced: 30 Dec 2024

https://github.com/tatamiya/gas-new-books-crawler

Crawling new book information from 版元ドットコム(https://www.hanmoto.com/)

crawler gas

Last synced: 21 Jan 2025

https://github.com/tylpk1216/new-taipei-parkinfo

Find the available parking in New Taipei, Taiwan.

crawler golang goverment-data

Last synced: 26 Jan 2025

https://github.com/thejoin95/free-proxies.info

API service for get anonymous and non proxy, filter by latency, country, updatetime and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 06 Jan 2025

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler build on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 10 Jan 2025

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 03 Jan 2025

https://github.com/naem1023/comic-crawler

Comic crawler.

beautifulsoup crawler python3

Last synced: 26 Jan 2025

https://github.com/reineimi/va2crawl

Website crawler, validator and SEO optimizer

crawler seo-optimization seotools validator website-crawler

Last synced: 10 Jan 2025

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 28 Dec 2024

https://github.com/ayoubzulfiqar/spidy

The DART Libraray for Data Crawling & Scrapping

crawler dart flutter scraper scraping spider

Last synced: 03 Jan 2025

https://github.com/xiangronglin/novel2go

Android app to create pdf from website and send to your kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 21 Dec 2024

https://github.com/timpletin/comming-soon

Coming Soon Page - Simple and clean design fully responsive on all screen, Count the days, hours, minutes and seconds for coming event

crawler css java javaweb nextjs nextjs-boilerplate nextjs-typescript nextjs14-typescript object-detection paypal python tailwindui tensorflow typescript

Last synced: 21 Jan 2025

https://github.com/mach1el/openproject-crawler

Scraping data on OpenProject

crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling

Last synced: 10 Jan 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 29 Dec 2024

https://github.com/diego3/python-apps

crawler python webserver

Last synced: 10 Jan 2025

https://github.com/rmncldyo/google-reverse-image-search

A simple python wrapper designed for leveraging Google's search by image capabilities to perform reverse image searches programatically.

beautifulsoup beautifulsoup4 crawler google google-image google-image-crawler google-image-scraper google-image-search google-images google-reverse-image-crawler google-reverse-image-scraper google-reverse-image-search image image-search python python3 requests reverse-image-search scraper search-by-image

Last synced: 04 Jan 2025

https://github.com/yukihirai0505/streamcrawler

akka stream × crawler

akka-streams crawler elasticsearch instagram sbt scala

Last synced: 13 Jan 2025

https://github.com/martinkennelly/websitesearchcrawler

Website Crawler

crawler java website

Last synced: 02 Feb 2025

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 30 Jan 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into mongodb database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 21 Dec 2024

https://github.com/cold-bin/jwzx-mail

use golang to construct cqupt-jwzx crawler application

crawler golang

Last synced: 11 Jan 2025

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... for scrape the data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 21 Dec 2024

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls official circles in Ibaraki University from university's website

crawler python

Last synced: 30 Jan 2025

https://github.com/tri613/nespresso

A mobile version for nespresso coffee website :coffee:

crawler nespresso node-js

Last synced: 04 Jan 2025

https://github.com/zhqiang1989/youtube-graph-collector

A demo in python on how to collect youtube video engagement graph data

crawler graph video youtube

Last synced: 11 Jan 2025

https://github.com/lopins/article-crawler

一个简单的网页文章爬取工具，可以自定义抽取自己所需要的字段内容，简单容易上手。

article crawler ftp mysql python sqlite3

Last synced: 21 Dec 2024

https://github.com/monumentality/ifiend

Check latest YouTube uploads without leaving the comfort of your terminal.

crawler headless-chrome terminal-based youtube yt-dlp

Last synced: 11 Jan 2025

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 20 Dec 2024

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 04 Jan 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Jan 2025

https://github.com/hackthedev/botnet

Tool to find IP's on the Web and check SSH availability and brute force login with a wordlist. Educationally only !!!

botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web

Last synced: 23 Jan 2025

https://github.com/thomas-rothe/symfonywebcrawler

PHP project for helping in SEO

crawler docker php php8 seo sitemap-xml symfony7

Last synced: 17 Jan 2025

https://github.com/kimseogyu/crawling-music-ranks

음원순위 크롤링 코드

crawler jest typescript

Last synced: 21 Dec 2024

https://github.com/josepedrodias/naivebot

attempt to mimic googlebot behaviour in nodejs with nightmarejs

crawler googlebot nightmarejs nodejs robots

Last synced: 21 Jan 2025

https://github.com/qqxs/usda_pomological_watercolors

爬取美国农业部果树水彩的数据

crawler koa2 nodejs watercolors

Last synced: 18 Jan 2025

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 10 Jan 2025

https://github.com/stephanebruckert/gocrawl

Crawl every pages and assets of a web domain

crawler python

Last synced: 21 Dec 2024

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshot of websites based on headless chrome (puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 19 Jan 2025

https://github.com/billy0402/scrapy-tutorial

A learning project from the book 'Scrapy一本就精通'.

course crawler docker mongodb mysql proxy python redis scrapy splash sqlite ubuntu

Last synced: 14 Jan 2025

https://github.com/billy0402/python-application

A learning project from the book 'Python 技術者們'.

course crawler matplotlib opencv pandas python requests selenium sklearn

Last synced: 14 Jan 2025

https://github.com/billy0402/tibame-python-data-analysis

A learning project from TibaMe Python data analysis course.

ai course crawler jupyter-notebook matplotlib pandas python requests

Last synced: 14 Jan 2025

https://github.com/tetreum/xupopter_client

Simple interface to manage Xupopter recipes aswell as it's runners.

crawler scrapper scrapping webscraper

Last synced: 17 Dec 2024

https://github.com/tetreum/xupopter_runner

Executes crawling recipes coming from Xupopter Chrome Extension.

crawler scrapper scrapping webscraper

Last synced: 17 Dec 2024

https://github.com/mirusu400/berryz-dl

Batch download berryz webshare files recursively!

berryz berryz-webshare crawler downloader scraper

Last synced: 26 Dec 2024

https://github.com/marcosvbras/twitton

A simple Python library to make Twitter Search API easily to use

crawler crawling python spider twitter twitter-api

Last synced: 01 Feb 2025

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 14 Dec 2024

https://github.com/jonasrenault/pubchem-api-crawler

Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.

chemistry crawler molecular-formula pubchem python

Last synced: 27 Jan 2025

https://github.com/coding-dream/aspider

A spider run on Android Platform

crawler jsoup spider

Last synced: 11 Jan 2025

https://github.com/estavadormir/scrappist

A web scrapper that takes an URL/URLs and converts into a PDF.

bun cli crawler pdf-generation

Last synced: 11 Jan 2025

https://github.com/danielvigaru/easyreach

crawler for faster amazon reach

amazon crawler python

Last synced: 01 Jan 2025

https://github.com/dnknth/robot.py

Simple web spider

crawler curio python

Last synced: 23 Jan 2025

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: 20대 국회 법안 크롤러

crawler python scraper

Last synced: 12 Jan 2025

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links in a specific depth.

crawler flask-api flask-restful

Last synced: 12 Jan 2025

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 02 Jan 2025

https://github.com/wilmsn/simple_deye_crawler

A simple crawler to get data from the Deye Inverter using the status webpage

crawler deye fhem inverter shell-script

Last synced: 18 Jan 2025

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 14 Jan 2025

https://github.com/edumucelli/rubybikes

A set of Bike Sharing System parsers in Ruby

bike-sharing crawler ruby

Last synced: 24 Dec 2024

https://github.com/k0nxt3d/web-scrapers

Web Scraping Scripts in PhP and Bash

bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget

Last synced: 12 Jan 2025

https://github.com/xoraus/revieworacle

The proposed system assists users in deciding which product to buy. It gathers reviews along with the details from multiple websites, which sell the product. Other than that the system is trained to analyze the polarity of the product.

ai crawler datascience machinelearning scrappy selenium-webdriver

Last synced: 13 Jan 2025

https://github.com/flaribbit/pixiv-favorites-list

爬取P站收藏夹保存为json格式

crawler pixiv python

Last synced: 27 Jan 2025

https://github.com/iyowei/fs-deep-walk

专注于深度扫描指定磁盘位置。

crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker

Last synced: 29 Dec 2024

https://github.com/tetreum/price-crawler

Article price crawler

crawler nodejs

Last synced: 17 Dec 2024

https://github.com/jenting/compare-drugstore-price

Compare price between cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 01 Feb 2025

https://github.com/igor-karpukhin/web-crawler

Web site crawler

crawler go website

Last synced: 03 Feb 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/wcygan/crawler

web crawler

crawler crawling tokio tokio-rs web-crawler

Last synced: 12 Jan 2025

https://github.com/g-ongenae/morphalou-crawler

A Crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 15 Oct 2024

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 19 Jan 2025

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 12 Jan 2025

https://github.com/splorg/sage

A scraper to get every quote from a book off of Goodreads.

books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping

Last synced: 21 Jan 2025

https://github.com/moj124/web_crawler

The web_crawler is a asynchoronous gevent link crawler that maps all the associated local links constrained by the input webpage url.

crawler crawler-python links-spider

Last synced: 20 Jan 2025

https://github.com/waived/google-drive-crawler

Proxy-based crawler to expose public (shared) Google Drive links

crawler crawler-python file-crawler google-drive-api shared-folders web-spider

Last synced: 01 Feb 2025

https://github.com/yyj08070631/web-spider

一个网络蜘蛛

crawler spider webspider

Last synced: 01 Feb 2025

https://github.com/yosh1/mio-crawler

A crawler that acquires data usage of iijmio .

crawler iijmio mio ruby

Last synced: 12 Jan 2025

https://github.com/radityaharya/sitesweeper

Sitesweeper is a python package to help you automate your web scraping process, outputting pages to a file

crawler pdf python website-crawler

Last synced: 01 Feb 2025

https://github.com/ryu1kn/procedural-page-crawler

Page Crawler. Tell it where to go and what to look for.

crawler npm-package scraper

Last synced: 03 Feb 2025

https://github.com/igapyon/selecrawler

Simple selenium based web crawler

chrome crawler java selenium web

Last synced: 06 Jan 2025

https://github.com/dalthviz/csapp

Crawler-Scrapper for the playstore

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 12 Jan 2025

https://github.com/jurooravec/knwldg

Datasets, scrapers, pipelines

companies crawler data dataset non-profit-organizations scraper scrapy

Last synced: 12 Jan 2025

https://github.com/vaenow/crawler-chromeless

A chromeless crawler for coursera

chromeless coursera crawler puppeteer

Last synced: 13 Dec 2024

https://github.com/ndoolan360/go-crawler

A simple web crawling program written in Go in an afternoon. 🕷️🕸️

afternoon-project crawler scraper

Last synced: 18 Jan 2025

https://github.com/m1/smap

smap is a site-mapping engine written in Go.

crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling

Last synced: 10 Dec 2024

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler coursera video's caption / subtitle

caption chromeless coursera crawler crx subtitle

Last synced: 13 Dec 2024

https://github.com/kofj/octopus

Octopus an open source software to collect data from web pages.

crawler

Last synced: 27 Jan 2025

https://github.com/apurvsikka/mediaverse

MediaVerse is a versatile search engine for various media types such as anime, books and drama

anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv

Last synced: 03 Feb 2025

https://github.com/ssv445/js-rendering-proxy-docker

JS Rendering Proxy API to Handle JS Website in Your Crawler.

crawler proxy puppeteer

Last synced: 18 Jan 2025

https://github.com/tomfran/crawler

A web crawler written in Rust

bloom-filter crawler rust simhash

Last synced: 06 Jan 2025

https://github.com/ronierisonmaciel/crawler

Um crawler utilizando BeautifulSoup tem como objetivo extrair informações de sites de maneira eficiente e estruturada. BeautifulSoup é uma biblioteca Python que facilita a análise e extração de dados de páginas HTML e XML. O projeto permite coletar e organizar informações relevantes.

beautifulsoup4 crawler crawling python python3

Last synced: 30 Jan 2025

https://github.com/intina47/ee_error

implementation of a web crawler using c++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 01 Feb 2025

Crawler Awesome Lists

awesome-crawler 101 awesome-python-primer 68 awesome-fingerprinting 48 awesome-digital-preservation 45

Crawler Categories

2.6 机器学习 50 Research 31 Replay tools 18 Python 18 1.1 语言基础 16 Libraries & Projects 13 Fingerprinting Evasion 13 Sites 12 2.4 Web 前端 10 2.1 爬虫基础 9 3\. 数据库 8 Web archiving 7 Java 7 2.5 数据分析 7 Other digital objects 6 4\. 异步IO 6 2.3 Django 框架 4 Standards and specifications 4 Social Networks 4 2.2 Flask 框架 4