Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-12 00:06:36 UTC
- JSON Representation
https://github.com/raspi/scrapy-kuntavaalit2021-almamedia
Fetch Almamedia kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/radityaharya/sitesweeper
Sitesweeper is a python package to help you automate your web scraping process, outputting pages to a file
crawler pdf python website-crawler
Last synced: 01 Feb 2025
https://github.com/raspi/scrapy-kuntavaalit2021-sanoma
Fetch Sanoma kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/antoniowd/crawly
Un web crawler para explorar la web en busca de determinada informacion (email, telefonos, etc...)
crawler got jsdom nodejs webcrawler webscraping
Last synced: 06 Feb 2025
https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen
Fetch Keskisuomalainen kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 15 Jan 2025
https://github.com/solracsf/perplexitybot-ips
Collected PerplexityBot IPs
bots crawler ip ipset perplexity
Last synced: 09 Feb 2025
https://github.com/rabattkarte/free-domain-scanner
crawler dns domain domain-name domain-names go golang scanner whois
Last synced: 17 Jan 2025
https://github.com/dnlzrgz/excursionist
Scrapy-powered flight price crawler.
crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling
Last synced: 23 Dec 2024
https://github.com/truongdd03/searchengine
A search engine written in c++.
cpp crawler search search-engine
Last synced: 20 Dec 2024
https://github.com/daviddavo/blogspot-crawler
Crawler for blogspot and blogger with beautifulsoup
Last synced: 23 Jan 2025
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 23 Dec 2024
https://github.com/shaharashe/url-crawler
crawler design-patterns http-requests java
Last synced: 06 Jan 2025
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 04 Feb 2025
https://github.com/fredcodee/pexel.com-image-scrapper
download images from pexel.com
Last synced: 08 Jan 2025
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Jan 2025
https://github.com/webdevcave/directory-crawler-php
Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.
crawler crawling directory path php php-library
Last synced: 09 Nov 2024
https://github.com/abdymm/abtelegrambot-sample
sample using Telegram Bot
crawler football php scheduler telegram-bot webhook
Last synced: 10 Jan 2025
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 02 Feb 2025
https://github.com/octcarp/sustech_cs209a-java2_f24_proj
(Spring Boot + Vue3) Stack Overflow data crawling and visualization: Our project of CS209A 2024 Fall: Computer System Design and Applications A (a.k.a. Java 2), SUSTech. Taught by Yida Tao @yidatao .
crawler spring-boot stackexchange sustech visualization
Last synced: 01 Jan 2025
https://github.com/keizerzilla/ssh-hunter
Script que caça por Raspberry Pis vulneráveis na internet (porta SSH aberta e senha padrão não modificada).
Last synced: 23 Dec 2024
https://github.com/keizerzilla/search4dwango9
My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 23 Dec 2024
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 23 Dec 2024
https://github.com/sanhphanvan96/php-training-crawler
Simple php crawler for training purpose
crawler docker docker-compose nginx php php-fpm
Last synced: 10 Jan 2025
https://github.com/tinoco/ticapsoriginal_div2png
Ticapsoriginal programmatically div design to png generator of html code from url
beutifulsoup crawler data design div2png generated-art generator html2image parse programmatically-layout pycodestyle python requests ticapsoriginal url urllib
Last synced: 09 Jan 2025
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 01 Feb 2025
https://github.com/devindon/movie-crawler
Movie crawler for douban.com, pianku.tv, etc.
Last synced: 02 Feb 2025
https://github.com/vhdm/twitter-hashtag-crawler
Twitter hashtag crawler by selenium, without using the Twitter API ;)
Last synced: 05 Jan 2025
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MQCS Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 27 Jan 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/cls1991/gank.io-go
A simple crawler for fetching pictures from http://gank.io, implemented in golang.
crawler gankio goquery pictures
Last synced: 10 Jan 2025
https://github.com/capturr/json-deep-equal
Check if json objects contains the same values (ignoring arrays order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 07 Jan 2025
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 08 Jan 2025
https://github.com/tisfeng/bing-dict
A Bing command line dictionary, which obtains the query results of bing dictionary by crawler.
bing-dictionary command-line crawler nodejs
Last synced: 03 Jan 2025
https://github.com/tech-espm/misc-webbot
This project is aimed on creating personal assistants for replying messages about specifics issues.
classification-model crawler nlp
Last synced: 11 Jan 2025
https://github.com/wafflecomposite/yggdrasil-crawler-python
Small Yggdrasil network crawler with CLI, written in Python3
crawler mesh-networks no-dependencies python python3 yggdrasil yggdrasil-api yggdrasil-network
Last synced: 23 Jan 2025
https://github.com/docongminh/vinbdi-crawler
crawl data using scrapy + bs4
bs4-requests crawler scrapy splash
Last synced: 28 Dec 2024
https://github.com/jefftriplett/pholcidae-demo
:spider: A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 10 Jan 2025
https://github.com/lukas-bear/awesome-web-scraping
Best scraping tools collection in town. Find everything you need for scraping, crawling, and processing data from the web
anti-bot bot captcha crawler go java javascript network nodejs perl php proxies proxy proxy-server python ruby rust tools webscraping xml
Last synced: 07 Feb 2025
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 24 Dec 2024
https://github.com/bradsec/gofindfiles
Crawl websites attempting to find and download files with matching file types. For use as OSINT or RECON intelligence collection tool.
crawler osint osint-tool recon scraper web-scraper
Last synced: 07 Jan 2025
https://github.com/luickk/vulnerability-crawler
Small python program meant to analyze random sites found on google for any vulnerabilities!
Last synced: 28 Dec 2024
https://github.com/jofaval/open-graph-visualizer
Web Scraping showcase of how crawlers retrieve site's details through the Open Graph Protocol
crawler javascript opengraph scraping web web-scraping
Last synced: 04 Feb 2025
https://github.com/grayhat12/grawler
A web based Crawler that takes two inputs(search item, number of sites to search)and curently displays Readable Content in Text Format but the Code can be modified to display the HTML code.
crawler scraping scraping-websites scrapper scrapy-crawler
Last synced: 01 Feb 2025
https://github.com/snwfdhmp/3gm-bot
Bot for the online french indie game 3gm.fr implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 15 Jan 2025
https://github.com/ekojs/web-crawler
Web Crawler untuk mengambil judul penelitian pada Google Scholar
Last synced: 08 Jan 2025
https://github.com/krishpranav/gozap
⚡️ Multiple target ZAP Scanning made in go
cli crawler go go-crawler golang zap
Last synced: 01 Feb 2025
https://github.com/izh318/genie-music-artist-album-crawler
지니뮤직에 등록 되어 있는 특정 아티스트의 앨범 정보를 한 번에 크롤링 하는 Python Script 입니다.
Last synced: 28 Dec 2024
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 08 Feb 2025
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 02 Feb 2025
https://github.com/jesseokeya/linkedin-scraper
Selenium webDriver used to get information from linkedIn
chromedriver crawler linkedin os python scraper selenium-webdriver
Last synced: 25 Dec 2024
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 21 Jan 2025
https://github.com/twknab/django_ajax_web_crawler
Web crawler which retrieves all links on any page. Python & Django-powered.
beautifulsoup4 crawler django-application
Last synced: 25 Dec 2024
https://github.com/lillyschramm/spiegel.de-miner
A bot that automatically saves any posts created at Spiegel.de
Last synced: 01 Jan 2025
https://github.com/arman-aminian/divar-text-exploring
The first practice of Dr. Asgari's NLP lesson - Data Exploration
crawler natural-language-processing nlp preprocessing scrapy
Last synced: 08 Jan 2025
https://github.com/patrickschababerle/schabbi-webscraper
Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.
crawler puppeteer scraper scraping web-crawler
Last synced: 09 Feb 2025
https://github.com/tryagi/firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk
Last synced: 14 Oct 2024
https://github.com/berecat/selenium_facebook_scraper
A simple python3 script used to download a users's friend list from facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 08 Jan 2025
https://github.com/basemax/css-properties
The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.
crawler css css-properties css-property css3
Last synced: 14 Jan 2025
https://github.com/thamindur/ir-project
Search Engine for Sri Lankan MPs
crawler elasticsearch python scraping search-engine
Last synced: 09 Feb 2025
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 13 Jan 2025
https://github.com/smikodanic/dex8-sdk
DEX8 SDK is software development kit for DEX8.com platform.
crawler crawler-engine data-extraction dex8 scraper scraping-websites spider
Last synced: 26 Dec 2024
https://github.com/terminaldweller/crawley
A creepy crawler that runs as a sleepy daemon.
Last synced: 26 Dec 2024
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
crawler crawler-engine rust rust-lang web-crawler web-crawling
Last synced: 26 Dec 2024
https://github.com/tsaohucn/crawler_fb_page
This is crawler use selenium for facebook pages
crawler facebook-page rails ruby selenium
Last synced: 20 Jan 2025
https://github.com/davelongdev/link-report-crawler
A web crawler using Node.js that crawls a site and returns a report showing all internal links.
crawler crawling javascript seo seo-tools
Last synced: 02 Jan 2025
https://github.com/gabrielolobo/crawley
This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.
crawler poetry python scrapping
Last synced: 11 Jan 2025
https://github.com/gnehs/twse-financial-ratios-crawler
透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊,並自動計算平均
Last synced: 26 Dec 2024
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scraps corpora, automatically cleans it up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 11 Jan 2025
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 05 Jan 2025
https://github.com/hvtuananh/twitter_crawler
Daemon to call and get tweets from Twitter Public Stream API
crawler java streaming-api tweets twitter twitter-crawler
Last synced: 23 Oct 2024
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 21 Jan 2025
https://github.com/jauharibill/animeindo-crawler
this crawler is used for research only. the creator doesn't take any responsibility for any harmful usage
Last synced: 29 Dec 2024
https://github.com/luminovrym/crawler-tools-js
Crawler Tools Js adalah sebuah aplikasi yang digunakan untuk scrapping data pada sebuah web
crawler crawler-js data js web-scraping
Last synced: 02 Jan 2025