Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-12 00:06:36 UTC
- JSON Representation
https://github.com/raspi/scrapy-kuntavaalit2021-sanoma
Fetch Sanoma kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/antoniowd/crawly
Un web crawler para explorar la web en busca de determinada informacion (email, telefonos, etc...)
crawler got jsdom nodejs webcrawler webscraping
Last synced: 06 Feb 2025
https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen
Fetch Keskisuomalainen kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 15 Jan 2025
https://github.com/rabattkarte/free-domain-scanner
crawler dns domain domain-name domain-names go golang scanner whois
Last synced: 17 Jan 2025
https://github.com/dnlzrgz/excursionist
Scrapy-powered flight price crawler.
crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling
Last synced: 23 Dec 2024
https://github.com/shaharashe/url-crawler
crawler design-patterns http-requests java
Last synced: 06 Jan 2025
https://github.com/daviddavo/blogspot-crawler
Crawler for blogspot and blogger with beautifulsoup
Last synced: 23 Jan 2025
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 23 Dec 2024
https://github.com/tryagi/firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk
Last synced: 14 Oct 2024
https://github.com/solracsf/perplexitybot-ips
Collected PerplexityBot IPs
bots crawler ip ipset perplexity
Last synced: 09 Feb 2025
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 04 Feb 2025
https://github.com/fredcodee/pexel.com-image-scrapper
download images from pexel.com
Last synced: 08 Jan 2025
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Jan 2025
https://github.com/abdymm/abtelegrambot-sample
sample using Telegram Bot
crawler football php scheduler telegram-bot webhook
Last synced: 10 Jan 2025
https://github.com/marcosvbras/twitton
A simple Python library to make Twitter Search API easily to use
crawler crawling python spider twitter twitter-api
Last synced: 01 Feb 2025
https://github.com/webdevcave/directory-crawler-php
Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.
crawler crawling directory path php php-library
Last synced: 09 Nov 2024
https://github.com/octcarp/sustech_cs209a-java2_f24_proj
(Spring Boot + Vue3) Stack Overflow data crawling and visualization: Our project of CS209A 2024 Fall: Computer System Design and Applications A (a.k.a. Java 2), SUSTech. Taught by Yida Tao @yidatao .
crawler spring-boot stackexchange sustech visualization
Last synced: 01 Jan 2025
https://github.com/keizerzilla/ssh-hunter
Script que caça por Raspberry Pis vulneráveis na internet (porta SSH aberta e senha padrão não modificada).
Last synced: 23 Dec 2024
https://github.com/keizerzilla/search4dwango9
My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 23 Dec 2024
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 23 Dec 2024
https://github.com/sanhphanvan96/php-training-crawler
Simple php crawler for training purpose
crawler docker docker-compose nginx php php-fpm
Last synced: 10 Jan 2025
https://github.com/jenting/compare-drugstore-price
Compare price between cosmeceutical shops
cosmed crawler golang poya side-project watsons
Last synced: 01 Feb 2025
https://github.com/rkeytech/scrapy-adventure
A Scrapy Framework project on scraping a site with pagination, extracting item information, and outputing it to a csv file or inserting it to a database.
beginner-friendly beginner-project crawler csv database python python3 scrapy scrapy-framework spider web-scraping
Last synced: 10 Feb 2025
https://github.com/tinoco/ticapsoriginal_div2png
Ticapsoriginal programmatically div design to png generator of html code from url
beutifulsoup crawler data design div2png generated-art generator html2image parse programmatically-layout pycodestyle python requests ticapsoriginal url urllib
Last synced: 09 Jan 2025
https://github.com/devindon/movie-crawler
Movie crawler for douban.com, pianku.tv, etc.
Last synced: 02 Feb 2025
https://github.com/cls1991/gank.io-go
A simple crawler for fetching pictures from http://gank.io, implemented in golang.
crawler gankio goquery pictures
Last synced: 10 Jan 2025
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MQCS Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 27 Jan 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/capturr/json-deep-equal
Check if json objects contains the same values (ignoring arrays order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 07 Jan 2025
https://github.com/wafflecomposite/yggdrasil-crawler-python
Small Yggdrasil network crawler with CLI, written in Python3
crawler mesh-networks no-dependencies python python3 yggdrasil yggdrasil-api yggdrasil-network
Last synced: 23 Jan 2025
https://github.com/waived/google-drive-crawler
Proxy-based crawler to expose public (shared) Google Drive links
crawler crawler-python file-crawler google-drive-api shared-folders web-spider
Last synced: 01 Feb 2025
https://github.com/jefftriplett/pholcidae-demo
:spider: A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 10 Jan 2025
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 08 Jan 2025
https://github.com/lukas-bear/awesome-web-scraping
Best scraping tools collection in town. Find everything you need for scraping, crawling, and processing data from the web
anti-bot bot captcha crawler go java javascript network nodejs perl php proxies proxy proxy-server python ruby rust tools webscraping xml
Last synced: 07 Feb 2025
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 24 Dec 2024
https://github.com/tisfeng/bing-dict
A Bing command line dictionary, which obtains the query results of bing dictionary by crawler.
bing-dictionary command-line crawler nodejs
Last synced: 03 Jan 2025
https://github.com/ryoii/hook
A declarative Java crawler framework
crawler declarative java java-crawler-framework jdk11
Last synced: 24 Jan 2025
https://github.com/tech-espm/misc-webbot
This project is aimed on creating personal assistants for replying messages about specifics issues.
classification-model crawler nlp
Last synced: 11 Jan 2025
https://github.com/bradsec/gofindfiles
Crawl websites attempting to find and download files with matching file types. For use as OSINT or RECON intelligence collection tool.
crawler osint osint-tool recon scraper web-scraper
Last synced: 07 Jan 2025
https://github.com/radityaharya/sitesweeper
Sitesweeper is a python package to help you automate your web scraping process, outputting pages to a file
crawler pdf python website-crawler
Last synced: 01 Feb 2025
https://github.com/earelin/jwraith
A Java clone of the Wraith website comparison tool.
crawler screenshots screenshots-comparison selenium webtest
Last synced: 19 Dec 2024
https://github.com/jofaval/open-graph-visualizer
Web Scraping showcase of how crawlers retrieve site's details through the Open Graph Protocol
crawler javascript opengraph scraping web web-scraping
Last synced: 04 Feb 2025
https://github.com/bockstaller/europarl-crawler
Crawler for the documents published by the European Parliament
crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union
Last synced: 06 Jan 2025
https://github.com/pranavj1001/webcrawler
A simple Web Crawler
crawler java javascript nodejs web-crawler
Last synced: 02 Feb 2025
https://github.com/mindfiredigital/deepscanbot
It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.
bot crawl crawler go golang google webcrawler
Last synced: 28 Dec 2024
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 02 Feb 2025
https://github.com/jesseokeya/linkedin-scraper
Selenium webDriver used to get information from linkedIn
chromedriver crawler linkedin os python scraper selenium-webdriver
Last synced: 25 Dec 2024
https://github.com/snwfdhmp/3gm-bot
Bot for the online french indie game 3gm.fr implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 15 Jan 2025
https://github.com/twknab/django_ajax_web_crawler
Web crawler which retrieves all links on any page. Python & Django-powered.
beautifulsoup4 crawler django-application
Last synced: 25 Dec 2024
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 01 Jan 2025
https://github.com/ekojs/web-crawler
Web Crawler untuk mengambil judul penelitian pada Google Scholar
Last synced: 08 Jan 2025
https://github.com/oleksandr-moik/spring-boot-web-crawler
Web Crawler app on Spring Boot. Getting categories and relevant news category.
crawler gradle java spring-boot
Last synced: 02 Feb 2025
https://github.com/der3318/daily-pixiv
Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations
crawler line-notify pixiv workflow
Last synced: 13 Jan 2025
https://github.com/tylpk1216/new-taipei-parkinfo
Find the available parking in New Taipei, Taiwan.
Last synced: 26 Jan 2025
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 21 Jan 2025
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite youtube video in PHP
Last synced: 26 Jan 2025
https://github.com/jackfsuia/chats-crawler
Discourse chat data crawling and on-the-way parsing straight for LLM instruction finetuning. 论坛数据爬取和解析,直接用于对话微调。
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 13 Jan 2025
https://github.com/arman-aminian/divar-text-exploring
The first practice of Dr. Asgari's NLP lesson - Data Exploration
crawler natural-language-processing nlp preprocessing scrapy
Last synced: 08 Jan 2025
https://github.com/berecat/selenium_facebook_scraper
A simple python3 script used to download a users's friend list from facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 08 Jan 2025
https://github.com/grayhat12/grawler
A web based Crawler that takes two inputs(search item, number of sites to search)and curently displays Readable Content in Text Format but the Code can be modified to display the HTML code.
crawler scraping scraping-websites scrapper scrapy-crawler
Last synced: 01 Feb 2025
https://github.com/smikodanic/dex8-sdk
DEX8 SDK is software development kit for DEX8.com platform.
crawler crawler-engine data-extraction dex8 scraper scraping-websites spider
Last synced: 26 Dec 2024
https://github.com/terminaldweller/crawley
A creepy crawler that runs as a sleepy daemon.
Last synced: 26 Dec 2024
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
crawler crawler-engine rust rust-lang web-crawler web-crawling
Last synced: 26 Dec 2024
https://github.com/krishpranav/gozap
⚡️ Multiple target ZAP Scanning made in go
cli crawler go go-crawler golang zap
Last synced: 01 Feb 2025
https://github.com/viko16/hatcher
🐣[WIP] Provides APIs by simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 26 Jan 2025
https://github.com/basemax/css-properties
The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.
crawler css css-properties css-property css3
Last synced: 14 Jan 2025
https://github.com/gabrielolobo/crawley
This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.
crawler poetry python scrapping
Last synced: 11 Jan 2025
https://github.com/gnehs/twse-financial-ratios-crawler
透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊,並自動計算平均
Last synced: 26 Dec 2024
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scraps corpora, automatically cleans it up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 11 Jan 2025
https://github.com/patrickschababerle/schabbi-webscraper
Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.
crawler puppeteer scraper scraping web-crawler
Last synced: 09 Feb 2025
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 08 Feb 2025
https://github.com/sahaavi/web-scraping
Learn Web-Scraping using BeautifulSoup, Selenium and Scrapy with hands on projects!
beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping
Last synced: 26 Dec 2024