Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
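The "systematically browses" part of this definition can be illustrated with a minimal breadth-first sketch. It is not taken from any project listed on this page. The names `LinkParser`, `crawl`, `fetch`, and `PAGES`, and the `example.test` URLs, are hypothetical and chosen only for illustration; pages come from an in-memory dictionary so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML, or to None if the page is unavailable.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c#top">C</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A crawler run against real sites would typically replace `PAGES.get` with an HTTP client and would also need to respect robots.txt and rate limits, which this sketch leaves out.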

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2024-11-18 00:06:04 UTC

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 15 Oct 2024

https://github.com/keizerzilla/search4dwango9

My attempt to help solve the DWANGO9 WAD mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 05 Nov 2024

https://github.com/zfael/scrape-it-all

Modular web scraper for Node.js

crawler scraper scraping scraping-websites web-scraping

Last synced: 05 Nov 2024

https://github.com/fritz-c/itunes-stats

Fetch info on podcasts, etc. from iTunes RSS data

crawler itunes

Last synced: 08 Nov 2024

https://github.com/dizys/weibo-crawler

A Node.js Weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 07 Nov 2024

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 14 Nov 2024

https://github.com/eghuro/crawlcheck

Extensible web crawler

configuration crawler http plugin python robots-txt sitemap

Last synced: 13 Nov 2024

https://github.com/frobware/grawler

Web Crawler

crawler go

Last synced: 15 Nov 2024

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 27 Oct 2024

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web scraper that scrapes the entire Codeforces problem set into a CSV or JSON file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 15 Nov 2024

https://github.com/s3rgeym/wscrap

Command line web scraping tool.

crawler scraping

Last synced: 05 Nov 2024

https://github.com/johanbook/node-web-crawler

Node.js CLI for web crawling

cli crawler nodejs typescript

Last synced: 16 Nov 2024

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 09 Nov 2024

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 14 Nov 2024

https://github.com/n3d1117/sisop17

Exercise for the Operating Systems exam - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 31 Oct 2024

https://github.com/lopins/article-crawler

A simple web article crawling tool that lets you customize which fields to extract; simple and easy to get started with.

article crawler ftp mysql python sqlite3

Last synced: 04 Nov 2024

https://github.com/filipsedivy/tachometer-check

🚘 MDČR - odometer check

crawler czech-republic mdcr

Last synced: 05 Nov 2024

https://github.com/orshahar91/crawler

Simple Web Crawler

crawler crawling-websites image-crawler java servlets webcrawler

Last synced: 07 Nov 2024

https://github.com/tungct/facebookcrawler

crawler graphapi python2

Last synced: 14 Nov 2024

https://github.com/zenoyang/webcrawler

Some crawler code

crawler scrapy spider web-crawler

Last synced: 16 Nov 2024

https://github.com/guanbinrui/img-crawler

An image crawler.

crawler

Last synced: 26 Oct 2024

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 06 Nov 2024

https://github.com/twknab/django_ajax_web_crawler

Web crawler which retrieves all links on any page. Python & Django-powered.

beautifulsoup4 crawler django-application

Last synced: 06 Nov 2024

https://github.com/ndoolan360/go-crawler

A simple web crawling program written in Go in an afternoon. 🕷️🕸️

afternoon-project crawler scraper

Last synced: 17 Nov 2024

https://github.com/andresayac/cuevana3

Cuevana3 scraper is a content provider for the latest movies and TV shows, dubbed in Latin American Spanish or subtitled.

crawler cuevana3 php scraper

Last synced: 31 Oct 2024

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 02 Nov 2024

https://github.com/edumucelli/rubybikes

A set of Bike Sharing System parsers in Ruby

bike-sharing crawler ruby

Last synced: 06 Nov 2024

https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp

A web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as event title, date, venue, and links.

anglesharp crawler minhaentrada

Last synced: 08 Nov 2024

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 15 Oct 2024

https://github.com/matheusfaustino/phrawl

Phrawl: A web crawling framework in PHP (or it seems so)

crawler crawling crawling-framework php scraper wip

Last synced: 07 Nov 2024

https://github.com/krishpranav/gozap

⚡️ Multiple-target ZAP scanning, written in Go

cli crawler go go-crawler golang zap

Last synced: 15 Oct 2024

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 07 Nov 2024

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 08 Nov 2024

https://github.com/peaky-xd/peakys-hub

Open Source

api cpanel crack crawler python scrap scrapy sqli

Last synced: 08 Nov 2024

https://github.com/danielvigaru/easyreach

crawler for faster amazon reach

amazon crawler python

Last synced: 08 Nov 2024

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 15 Nov 2024

https://github.com/cold-bin/jwzx-mail

A cqupt-jwzx crawler application built with Go

crawler golang

Last synced: 12 Nov 2024

https://github.com/jarircse16/bot_detection_firewall

Detects and blocks generic crawlers on your website.

bot crawler php

Last synced: 08 Nov 2024

https://github.com/bing-su/arcalive-crawler-python

Arca Live crawler

crawler python

Last synced: 09 Nov 2024

https://github.com/bennettdams/vace-it-crawler

Python (Scrapy) crawler to access data of FACEIT.com

crawler python scrapy

Last synced: 14 Nov 2024

https://github.com/tom-draper/wiki-crawl

A game of path finding through Wikipedia topics.

api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search

Last synced: 08 Nov 2024

https://github.com/rayspock/go-web-crawler

A web crawler that fetches all the links from a given website using goroutines.

concurrency crawler golang goroutine

Last synced: 14 Nov 2024

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 08 Nov 2024

https://github.com/danielemoraschi/go-sitemap-app

crawler golang sitemap sitemap-generator

Last synced: 08 Nov 2024

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 08 Nov 2024

https://github.com/hsiehbocheng/usa-tourist-recommend

crawler mongodb python tableau

Last synced: 14 Nov 2024

https://github.com/russellsteadman/netscrape

A Node.js framework for creating good bots

bot crawler crawling exclusion rfc9309 scraper scraping web-scraping

Last synced: 09 Nov 2024

https://github.com/mohammadreza-mohammadi94/python-webscraper-projects

Webscraper and crawlers projects

crawler object-oriented-programming python webscraping

Last synced: 07 Nov 2024

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 14 Nov 2024

https://github.com/bandie91/extip

Fetch external IP from known ext. ip providers

address cli crawler external ip ipv4-address parallel

Last synced: 09 Nov 2024

https://github.com/humbertodias/go-nie-crawler

Simple crawler that extracts some useful information from sede.administracionespublicas.gob.es.

crawler golang

Last synced: 14 Nov 2024

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and author data. The main data is stored in an SQLite database and all media are downloaded. It can then reconstruct a Twitter profile in the front end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 09 Nov 2024

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 12 Nov 2024

https://github.com/der3318/daily-pixiv

Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations

crawler line-notify pixiv workflow

Last synced: 14 Nov 2024

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 25 Oct 2024

https://github.com/tetreum/puppeteer-for-crawling

Daily use crawling methods for puppeteer

crawler crawling puppeteer

Last synced: 21 Oct 2024

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 09 Nov 2024

https://github.com/kahsolt/qzone_mood_dumper

Dump your Qzone mood (说说) history to local SQL database storage

crawler dumper qzone-mood

Last synced: 09 Nov 2024

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 28 Oct 2024

https://github.com/fmind/fincrawl

Crawl documents, metadata, and files from financial institutions

crawler documents finance python scrapy

Last synced: 06 Nov 2024

https://github.com/naem1023/comic-crawler

Comic crawler.

beautifulsoup crawler python3

Last synced: 13 Oct 2024

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 09 Nov 2024

https://github.com/matheusfaustino/jazzmaster_crawler

A crawler that fetches the audio programs of a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 07 Nov 2024

https://github.com/gesiscss/github_traffic_crawler

Retrieve data from the repositories (insights, usage, commits)

crawler github traffic

Last synced: 09 Nov 2024

https://github.com/ayoubzulfiqar/spidy

The Dart library for data crawling and scraping

crawler dart flutter scraper scraping spider

Last synced: 09 Nov 2024

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from CoinGecko API. Collects market data, social metrics, and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 16 Nov 2024

https://github.com/bradsec/gofindfiles

Crawl websites to find and download files with matching file types. For use as an OSINT or recon intelligence collection tool.

crawler osint osint-tool recon scraper web-scraper

Last synced: 10 Nov 2024

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 10 Nov 2024

https://github.com/ceylonai/apps-article-reader

📚 A powerful desktop app that extracts and analyzes web content using LLaMA AI. Features real-time processing, keyword extraction, and smart summarization. Built with Python + Tkinter.

ai crawler gpt ollama openai

Last synced: 15 Nov 2024

https://github.com/joeri-abbo/python-credly-scraper

This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an

badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling

Last synced: 15 Nov 2024

https://github.com/lencx/hero-crawler

⚔️ Hero Info(King Of Glory)

crawler hero

Last synced: 10 Nov 2024

https://github.com/tech-espm/misc-webbot

This project aims to create personal assistants that reply to messages about specific issues.

classification-model crawler nlp

Last synced: 12 Nov 2024

https://github.com/tisfeng/bing-dict

A command-line Bing dictionary that obtains query results from Bing Dictionary using a crawler.

bing-dictionary command-line crawler nodejs

Last synced: 09 Nov 2024

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 18 Nov 2024

https://github.com/capturr/json-deep-equal

Check whether JSON objects contain the same values (ignoring array order).

array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript

Last synced: 10 Nov 2024

https://github.com/snwfdhmp/3gm-bot

Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 15 Nov 2024

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 17 Nov 2024

https://github.com/tryagi/firecrawl

Generated C# SDK based on official Firecrawl OpenAPI specification

ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk

Last synced: 14 Oct 2024

https://github.com/brianmacintosh/wikicrawler

Sandbox project for manipulating Wikimedia wikis

c-sharp crawler mediawiki-bot wikipedia-bot

Last synced: 08 Nov 2024

https://github.com/shaoxiongdu/skyeye

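A Spring Boot-based crawler project for trending topics across the web. Raw trending-search data is stored in a database, and word-segmentation statistics are stored in Redis, which makes later data analysis easier.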

crawler crawlers mysql redis spring spring-boot

Last synced: 16 Nov 2024

https://github.com/davelongdev/link-report-crawler

A web crawler using Node.js that crawls a site and returns a report showing all internal links.

crawler crawling javascript seo seo-tools

Last synced: 09 Nov 2024

https://github.com/diego3/python-apps

crawler python webserver

Last synced: 12 Nov 2024

https://github.com/stephanebruckert/gocrawl

Crawl every page and asset of a web domain

crawler python

Last synced: 03 Nov 2024

https://github.com/tri613/nespresso

A mobile version of the Nespresso coffee website ☕

crawler nespresso node-js

Last synced: 09 Nov 2024

https://github.com/mattmoony/webcrawler.py

A very simple Python web crawler. This is just a fun little side project, which I used to gain some valuable experience with advanced Python and web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler