Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
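The systematic browsing described above is usually a queue-driven graph traversal: fetch a page, extract its links, and enqueue the ones not yet seen. What follows is a minimal sketch of that loop, not the method of any project listed here. The names `crawl`, `LinkExtractor`, and `PAGES` are hypothetical, and the "web" is an in-memory dictionary on the made-up host `example.test` so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at seed.

    fetch(url) is a caller-supplied function (an assumption of this sketch)
    that returns the page HTML as a string, or None if the page is missing.
    Returns the URLs of successfully fetched pages in visit order.
    """
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip dead links
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts so the
            # same page is not queued twice under different names.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" used in place of real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c#top">C</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/gone">x</a>',
    "http://example.test/c": "no links here",
}

if __name__ == "__main__":
    for page in crawl("http://example.test/", PAGES.get):
        print(page)
```

To crawl real sites, `fetch` would wrap an HTTP client. A production crawler would also honour robots.txt and rate-limit its requests, both of which this sketch leaves out.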

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-02-06 00:06:18 UTC

https://github.com/tylpk1216/new-taipei-parkinfo

Find the available parking in New Taipei, Taiwan.

crawler golang goverment-data

Last synced: 26 Jan 2025

https://github.com/tetreum/xupopter_runner

Executes crawling recipes from the Xupopter Chrome extension.

crawler scrapper scrapping webscraper

Last synced: 17 Dec 2024

https://github.com/gabrielolobo/crawley

This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.

crawler poetry python scrapping

Last synced: 11 Jan 2025

https://github.com/gnehs/twse-financial-ratios-crawler

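Automatically fetches financial ratio data from the Market Observation Post System (公開資訊觀測站) for a specified list of stock codes and computes the averages.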

crawler nodejs

Last synced: 26 Dec 2024

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 11 Jan 2025

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 26 Dec 2024

https://github.com/sahaavi/web-scraping

Learn web scraping using BeautifulSoup, Selenium and Scrapy with hands-on projects!

beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping

Last synced: 26 Dec 2024

https://github.com/lsongdev/node-crawler

simple crawler

crawler node-crawler

Last synced: 02 Jan 2025

https://github.com/guanbinrui/img-crawler

An image crawler.

crawler

Last synced: 26 Dec 2024

https://github.com/mohammadreza-mohammadi94/python-webscraper-projects

A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.

bs4 crawler object-oriented-programming python requests scrapy webscraping

Last synced: 26 Dec 2024

https://github.com/ggteixeira/motorcycle-simulator

A toy project that fetches motorcycle prices from OLX and does some calculations for those who want to buy one.

crawler motorcycle olx scraper

Last synced: 11 Jan 2025

https://github.com/naem1023/comic-crawler

Comic crawler.

beautifulsoup crawler python3

Last synced: 26 Jan 2025

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in ReactJs.

blockchain crawler ethereum

Last synced: 10 Jan 2025

https://github.com/theabbie/shopcrawler

Crawler for Discovering Product URLs on E-commerce Websites (assignment)

crawler

Last synced: 17 Jan 2025

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 23 Jan 2025

https://github.com/tetreum/xupopter_client

Simple interface to manage Xupopter recipes as well as its runners.

crawler scrapper scrapping webscraper

Last synced: 17 Dec 2024

https://github.com/bandie91/extip

Fetch the external IP address from known external IP providers

address cli crawler external ip ipv4-address parallel

Last synced: 03 Jan 2025

https://github.com/zzzzer91/match_spider

Crawler for a certain gambling website; the site has since shut down. :disappointed_relieved:

crawler python

Last synced: 10 Jan 2025

https://github.com/zzzzer91/crash

A general-purpose multithreaded crawler framework.

crawler framework python

Last synced: 10 Jan 2025

https://github.com/martinkennelly/websitesearchcrawler

Website Crawler

crawler java website

Last synced: 02 Feb 2025

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and author data. The main data is stored in an SQLite database and all media are downloaded, so that a Twitter profile can then be reconstructed in the front end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 03 Jan 2025

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Jan 2025

https://github.com/flaribbit/pixiv-favorites-list

Crawls Pixiv bookmarks and saves them in JSON format

crawler pixiv python

Last synced: 27 Jan 2025

https://github.com/mnemocron/VPNNetworkShareCrawler

Ugly scripts to connect a Raspberry Pi to a VPN and attach a network share in order to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/billy0402/tibame-python-data-analysis

A learning project from the TibaMe Python data analysis course.

ai course crawler jupyter-notebook matplotlib pandas python requests

Last synced: 14 Jan 2025

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 10 Jan 2025

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 03 Jan 2025

https://github.com/reineimi/va2crawl

Website crawler, validator and SEO optimizer

crawler seo-optimization seotools validator website-crawler

Last synced: 10 Jan 2025

https://github.com/billy0402/python-application

A learning project from the book 'Python 技術者們'.

course crawler matplotlib opencv pandas python requests selenium sklearn

Last synced: 14 Jan 2025

https://github.com/ayoubzulfiqar/spidy

The Dart library for data crawling and scraping

crawler dart flutter scraper scraping spider

Last synced: 03 Jan 2025

https://github.com/billy0402/scrapy-tutorial

A learning project from the book 'Scrapy一本就精通'.

course crawler docker mongodb mysql proxy python redis scrapy splash sqlite ubuntu

Last synced: 14 Jan 2025

https://github.com/mach1el/openproject-crawler

Scraping data on OpenProject

crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling

Last synced: 10 Jan 2025

https://github.com/diego3/python-apps

crawler python webserver

Last synced: 10 Jan 2025

https://github.com/rmncldyo/google-reverse-image-search

A simple Python wrapper designed for leveraging Google's search-by-image capabilities to perform reverse image searches programmatically.

beautifulsoup beautifulsoup4 crawler google google-image google-image-crawler google-image-scraper google-image-search google-images google-reverse-image-crawler google-reverse-image-scraper google-reverse-image-search image image-search python python3 requests reverse-image-search scraper search-by-image

Last synced: 04 Jan 2025

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 30 Jan 2025

https://github.com/cold-bin/jwzx-mail

A cqupt-jwzx crawler application built with Go

crawler golang

Last synced: 11 Jan 2025

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls Ibaraki University's official circles from the university's website

crawler python

Last synced: 30 Jan 2025

https://github.com/tri613/nespresso

A mobile version of the Nespresso coffee website :coffee:

crawler nespresso node-js

Last synced: 04 Jan 2025

https://github.com/zhqiang1989/youtube-graph-collector

A Python demo of how to collect YouTube video engagement graph data

crawler graph video youtube

Last synced: 11 Jan 2025

https://github.com/monumentality/ifiend

Check latest YouTube uploads without leaving the comfort of your terminal.

crawler headless-chrome terminal-based youtube yt-dlp

Last synced: 11 Jan 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Jan 2025

https://github.com/thomas-rothe/symfonywebcrawler

PHP project for helping in SEO

crawler docker php php8 seo sitemap-xml symfony7

Last synced: 17 Jan 2025

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 04 Jan 2025

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 28 Dec 2024

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 19 Jan 2025

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 19 Jan 2025

https://github.com/stephanebruckert/gocrawl

Crawl every page and asset of a web domain

crawler python

Last synced: 21 Dec 2024

https://github.com/coding-dream/aspider

A spider that runs on the Android platform

crawler jsoup spider

Last synced: 11 Jan 2025

https://github.com/estavadormir/scrappist

A web scraper that takes one or more URLs and converts them into a PDF.

bun cli crawler pdf-generation

Last synced: 11 Jan 2025

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / Elasticsearch / BeautifulSoup :: crawler for bills of the 20th National Assembly (Korea)

crawler python scraper

Last synced: 12 Jan 2025

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links within a specified depth.

crawler flask-api flask-restful

Last synced: 12 Jan 2025

https://github.com/wilmsn/simple_deye_crawler

A simple crawler to get data from the Deye Inverter using the status webpage

crawler deye fhem inverter shell-script

Last synced: 18 Jan 2025

https://github.com/k0nxt3d/web-scrapers

Web scraping scripts in PHP and Bash

bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget

Last synced: 12 Jan 2025

https://github.com/moj124/web_crawler

web_crawler is an asynchronous gevent link crawler that maps all the associated local links, constrained by the input webpage URL.

crawler crawler-python links-spider

Last synced: 20 Jan 2025

https://github.com/wcygan/crawler

web crawler

crawler crawling tokio tokio-rs web-crawler

Last synced: 12 Jan 2025

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 12 Jan 2025

https://github.com/yosh1/mio-crawler

A crawler that retrieves IIJmio data usage.

crawler iijmio mio ruby

Last synced: 12 Jan 2025

https://github.com/dalthviz/csapp

Crawler and scraper for the Play Store

crawler csapp keyword nlp playstore rating review scrapper

Last synced: 12 Jan 2025

https://github.com/jurooravec/knwldg

Datasets, scrapers, pipelines

companies crawler data dataset non-profit-organizations scraper scrapy

Last synced: 12 Jan 2025

https://github.com/ndoolan360/go-crawler

A simple web crawling program written in Go in an afternoon. 🕷️🕸️

afternoon-project crawler scraper

Last synced: 18 Jan 2025

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 10 Jan 2025

https://github.com/yyj08070631/web-spider

A web spider

crawler spider webspider

Last synced: 01 Feb 2025

https://github.com/ssv445/js-rendering-proxy-docker

JS rendering proxy API for handling JavaScript websites in your crawler.

crawler proxy puppeteer

Last synced: 18 Jan 2025

https://github.com/qqxs/usda_pomological_watercolors

Crawls data from the USDA Pomological Watercolors collection

crawler koa2 nodejs watercolors

Last synced: 18 Jan 2025

https://github.com/ronierisonmaciel/crawler

A crawler using BeautifulSoup that aims to extract information from websites in an efficient and structured way. BeautifulSoup is a Python library that makes it easy to parse and extract data from HTML and XML pages. The project makes it possible to collect and organize relevant information.

beautifulsoup4 crawler crawling python python3

Last synced: 30 Jan 2025

https://github.com/kbychkov/simplecrawler-app

The GUI for Simplecrawler

crawler simplecrawler spider

Last synced: 18 Jan 2025

https://github.com/xiangronglin/novel2go

Android app that creates a PDF from a website and sends it to your Kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 21 Dec 2024

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 19 Jan 2025

https://github.com/josepedrodias/naivebot

An attempt to mimic Googlebot behaviour in Node.js with NightmareJS

crawler googlebot nightmarejs nodejs robots

Last synced: 21 Jan 2025

https://github.com/datamine/twitter-name-and-shame

Crawler to find Twitter accounts following more than a million users

crawler flask python python-2 twitter

Last synced: 19 Jan 2025

https://github.com/amirsorouri00/crawler

Public PageRank Python 2 projects that have been converted to Python 3.

crawler page-rank python

Last synced: 19 Jan 2025

https://github.com/grayhat12/grawler

A web-based crawler that takes two inputs (search item, number of sites to search) and currently displays readable content in text format; the code can be modified to display the HTML instead.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 01 Feb 2025

https://github.com/mattmoony/webcrawler.py

A very simple Python web crawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python and web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 19 Jan 2025

https://github.com/cseas/crawler

Recursive web crawler

crawler python seed-webpage

Last synced: 27 Dec 2024

https://github.com/juliocesarscheidt/stock-trader

aws-alb aws-ecs aws-xray crawler flask github-actions mongodb python rabbitmq terraform

Last synced: 24 Jan 2025

https://github.com/onetail/crawler-with-kafka-docker

Homework on crawling and analysis

analysis crawler kafka-docker

Last synced: 24 Jan 2025

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 24 Jan 2025

https://github.com/indrasaputra/sulong

Simple application that crawls a specific fundraising website and notifies users if there is a new project

bot crawler go golang telegram telegram-bot

Last synced: 19 Jan 2025

https://github.com/dpbm/opendatasus-crawler

A simple crawler using puppeteer

brazil chrome crawler csv datasus nodejs opendatasus pdf puppeteer screenshot sus

Last synced: 19 Jan 2025

https://github.com/lilchen96/pokemon-crawler

Crawl JSON-formatted data for Pokémon, based on the PokeAPI.

crawler pokemon

Last synced: 19 Jan 2025

https://github.com/avsbharadwaj/web_crawler

A basic web crawler that recursively prints the links and descriptions present on a website

crawler web

Last synced: 19 Jan 2025

https://github.com/lightbeem3296/scrap-www.floridabar.org

automation crawler csv playwriht python scraper selenium xlsx

Last synced: 19 Jan 2025

https://github.com/rayc2045/ghibli-crawler

Automatically download 1,178 photos of Studio Ghibli's works

axios crawler ghibli node node-js nodejs puppeteer rest-api restful restful-api

Last synced: 26 Jan 2025

https://github.com/triekai/review-radar

An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.

crawler google-maps nextjs openai react

Last synced: 24 Jan 2025

https://github.com/tsaohucn/crawler_fb_user_group

A crawler that uses Selenium for Facebook user groups

crawler facebook-user-groups rails ruby

Last synced: 20 Jan 2025

https://github.com/khanof89/twitter_scraper

Scrape tweet details from a user profile using Selenium

crawler scraper selenium twitter twitter-bot

Last synced: 10 Jan 2025

https://github.com/timpletin/comming-soon

Coming Soon page: simple, clean design, fully responsive on all screens. Counts down the days, hours, minutes, and seconds to an upcoming event

crawler css java javaweb nextjs nextjs-boilerplate nextjs-typescript nextjs14-typescript object-detection paypal python tailwindui tensorflow typescript

Last synced: 21 Jan 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 29 Dec 2024

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 26 Jan 2025

https://github.com/byjrk/pal-dos-datebase

crawler python

Last synced: 25 Jan 2025

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 20 Dec 2024

https://github.com/kimseogyu/crawling-music-ranks

Music chart ranking crawling code

crawler jest typescript

Last synced: 21 Dec 2024

https://github.com/jayzhan211/python-crawler-startups

python crawler learning

crawler python

Last synced: 25 Jan 2025

https://github.com/aminehsan/datamining-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scraping

Last synced: 31 Jan 2025

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 25 Jan 2025

https://github.com/jianlizh429/crawler

crawler spider

Last synced: 25 Jan 2025

https://github.com/vivekg13186/lucas

A web crawler

crawler crawler-engine crawling-framework java

Last synced: 04 Feb 2025

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 08 Jan 2025

https://github.com/kofj/octopus

Octopus is open source software for collecting data from web pages.

crawler

Last synced: 27 Jan 2025

https://github.com/apurvsikka/mediaverse

MediaVerse is a versatile search engine for various media types such as anime, books and drama

anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv

Last synced: 03 Feb 2025

Crawler Awesome Lists

awesome-crawler (101), awesome-python-primer (68), awesome-fingerprinting (48), awesome-digital-preservation (45)

Crawler Categories

2.6 Machine Learning (50), Research (31), Replay tools (18), Python (18), 1.1 Language Basics (16), Libraries & Projects (13), Fingerprinting Evasion (13), Sites (12), 2.4 Web Front End (10), 2.1 Crawler Basics (9), 3. Databases (8), Web archiving (7), Java (7), 2.5 Data Analysis (7), Other digital objects (6), 4. Asynchronous I/O (6), 2.3 Django Framework (4), Standards and specifications (4), Social Networks (4), 2.2 Flask Framework (4)