Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-11 00:06:38 UTC
https://github.com/mohammadreza-mohammadi94/python-webscraper-projects
A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.
bs4 crawler object-oriented-programming python requests scrapy webscraping
Last synced: 26 Dec 2024
https://github.com/ggteixeira/motorcycle-simulator
A toy project that fetches motorcycle prices from OLX and does some calculations for those who want to buy them.
crawler motorcycle olx scraper
Last synced: 11 Jan 2025
https://github.com/jarircse16/bot_detection_firewall
Detects and blocks generic crawlers on your website.
Last synced: 30 Dec 2024
https://github.com/fredcodee/pexel.com-image-scrapper
Download images from pexel.com.
Last synced: 08 Jan 2025
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 10 Jan 2025
https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler
StackOverFlow Tag Generator Using a WebCrawler.
Last synced: 22 Dec 2024
https://github.com/dpbm/opendatasus-crawler
A simple crawler using puppeteer
brazil chrome crawler csv datasus nodejs opendatasus pdf puppeteer screenshot sus
Last synced: 19 Jan 2025
https://github.com/theabbie/shopcrawler
Crawler for Discovering Product URLs on E-commerce Websites (assignment)
Last synced: 17 Jan 2025
https://github.com/lilchen96/pokemon-crawler
Crawl JSON-formatted data for Pokémon, based on the PokeAPI.
Last synced: 19 Jan 2025
https://github.com/thamindur/ir-project
Search Engine for Sri Lankan MPs
crawler elasticsearch python scraping search-engine
Last synced: 09 Feb 2025
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 14 Jan 2025
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 23 Jan 2025
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 13 Jan 2025
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 04 Feb 2025
https://github.com/rcmilan/ex-web-scraping
Web scraping with F#
crawler f-sharp fsharp fsharp-data scraper web-scraping xplot
Last synced: 17 Jan 2025
https://github.com/mg98/ipfs-replicate
Replicate IPFS' distributed data structure locally, based on network traces.
crawler dag ipfs redisgraph scraper
Last synced: 29 Jan 2025
https://github.com/ekojs/web-crawler
Web crawler for retrieving research titles from Google Scholar
Last synced: 08 Jan 2025
https://github.com/avsbharadwaj/web_crawler
A basic web crawler that recursively prints out the links and descriptions present on a website
Last synced: 19 Jan 2025
https://github.com/russellsteadman/netscrape
A Node.js framework for creating good bots
bot crawler crawling exclusion rfc9309 scraper scraping web-scraping
Last synced: 03 Jan 2025
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 08 Jan 2025
https://github.com/murilobsd/icrop-csv
Icrop-csv automates the process of downloading reports.
Last synced: 28 Dec 2024
https://github.com/bandie91/extip
Fetch external IP from known ext. ip providers
address cli crawler external ip ipv4-address parallel
Last synced: 03 Jan 2025
https://github.com/iomarmochtar/imagecrawler
Simple image crawler that follows links recursively; no dependencies needed, for Python 2.7+
Last synced: 25 Dec 2024
https://github.com/zzzzer91/match_spider
Crawler for a certain gambling website; the site has since shut down :disappointed_relieved:
Last synced: 10 Jan 2025
https://github.com/erickj3/strike-api
A web scraping API built with NestJS
api crawler flow nestjs scraping typescript
Last synced: 24 Jan 2025
https://github.com/lightbeem3296/scrap-www.floridabar.org
automation crawler csv playwriht python scraper selenium xlsx
Last synced: 19 Jan 2025
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and author data. The main data is stored in an SQLite database and all media are downloaded, so the application can then reconstruct a Twitter profile in the front end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 03 Jan 2025
https://github.com/tsaohucn/crawler_fb_page
A crawler that uses Selenium for Facebook pages
crawler facebook-page rails ruby selenium
Last synced: 20 Jan 2025
https://github.com/tormol/zenphoto-dl
A script for recursively downloading all pictures from zenphoto-based photo albums.
Last synced: 30 Jan 2025
https://github.com/joyceannie/moviespider
This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres, and director.
crawler datascience python scrapy spider webscraper
Last synced: 29 Jan 2025
https://github.com/luciopaiva/dicio-crawler
Node.js crawler for dicio.com.br.
Last synced: 11 Feb 2025
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 19 Jan 2025
https://github.com/triekai/review-radar
An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.
crawler google-maps nextjs openai react
Last synced: 24 Jan 2025
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 14 Jan 2025
https://github.com/docongminh/vinbdi-crawler
Crawl data using Scrapy + bs4
bs4-requests crawler scrapy splash
Last synced: 28 Dec 2024
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 23 Jan 2025
https://github.com/eneax/web-crawler
A web crawler built in Node.js
crawler javascript nodejs web-crawler
Last synced: 22 Dec 2024
https://github.com/tsaohucn/crawler_fb_user_group
A crawler that uses Selenium for Facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 20 Jan 2025
https://github.com/jnbdz/xtamia-crawler
(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux
crawler electron foundation foundation-css javascript scraper vuejs xtamia
Last synced: 10 Jan 2025
https://github.com/waived/pastebin-ripper
Scrape all pastes from a Pastebin page and its sub-pages
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 29 Jan 2025
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 03 Jan 2025
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 05 Jan 2025
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 10 Jan 2025
https://github.com/n3d1117/sisop17
Exercise for the Operating Systems exam - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 19 Dec 2024
https://github.com/hvtuananh/twitter_crawler
Daemon to call and get tweets from Twitter Public Stream API
crawler java streaming-api tweets twitter twitter-crawler
Last synced: 23 Oct 2024
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 28 Dec 2024
https://github.com/mattmoony/webcrawler.py
A very simple Python web crawler. This is just a fun little side project that I used to gain some valuable experience with advanced Python and web techniques. 🐍
beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler
Last synced: 19 Jan 2025
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 07 Jan 2025
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 24 Jan 2025
https://github.com/sanskar107/c-subject-predictor
Predicts the topic of a piece of code.
Last synced: 21 Jan 2025
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 05 Feb 2025
https://github.com/zawlinnnaing/my-wiki-crawler
A simple program for crawling the Burmese Wikipedia using the MediaWiki API.
crawler myanmar-tools python wikipedia-api
Last synced: 25 Dec 2024
https://github.com/zenoyang/webcrawler
Some crawler code
crawler scrapy spider web-crawler
Last synced: 17 Jan 2025
https://github.com/mach1el/openproject-crawler
Scraping data on OpenProject
crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling
Last synced: 10 Jan 2025
https://github.com/robin98sun/structured-web-data-crawler
crawler multi-thread structured-web-data
Last synced: 23 Jan 2025
https://github.com/nextlevelshit/node-crawl
Web crawler for Node.js
crawl crawler javascript nodejs
Last synced: 20 Jan 2025
https://github.com/lig8t555/ecommerce
MERN Stack Ecommerce Store | Running In Production | MVP
baidu-tieba baotu bootstrap crawler douban-music ecommerce-platform fofa mongoose quanjing redux shopping-cart shopping-cart-solution stripe taobao-spider
Last synced: 29 Jan 2025
https://github.com/rmncldyo/google-reverse-image-search
A simple Python wrapper designed for leveraging Google's search by image capabilities to perform reverse image searches programmatically.
beautifulsoup beautifulsoup4 crawler google google-image google-image-crawler google-image-scraper google-image-search google-images google-reverse-image-crawler google-reverse-image-scraper google-reverse-image-search image image-search python python3 requests reverse-image-search scraper search-by-image
Last synced: 04 Jan 2025
https://github.com/949886/pixiv-crawler
Crawls Pixiv illustration info into a local MySQL database.
Last synced: 28 Dec 2024
https://github.com/basemax/crawler-news-currency-gold-coins
PHP crawler to get Persian news related to currency, coins, and gold.
crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler
Last synced: 09 Feb 2025
https://github.com/tinoco/ticapsoriginal_website_score_overview
Ticapsoriginal website sitemaps checker score overview
advertools beautifulsoup behave bs4 chart crawler linkbuilding matplotlib metrics metrics-visualization parser python requests score sitemaps ticapsoriginal tqdm unittesting urllib
Last synced: 09 Jan 2025
https://github.com/claudio-code/nap-web-crawler
A crawler created to find broken links in the docs of frameworks and languages
Last synced: 05 Feb 2025
https://github.com/moojing/coinmarketcap-crypto-crawler
A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.
Last synced: 07 Feb 2025
https://github.com/spider-rs/spider-clients
Clients to use with the hosted spider service - spider.cloud
ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping
Last synced: 05 Nov 2024
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 30 Dec 2024
https://github.com/tatamiya/gas-new-books-crawler
Crawls new book information from 版元ドットコム (https://www.hanmoto.com/)
Last synced: 21 Jan 2025
https://github.com/bingxyz/btcethcrawler
Telegram broadcast channel for Bitcoin and Ether
bash bash-script crawler telegram-bot
Last synced: 22 Jan 2025
https://github.com/roc41d/http-web-crawler
HTTP web crawler with Node.js + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 21 Jan 2025
https://github.com/basemax/crawleryjc
This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.
crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news
Last synced: 09 Feb 2025
https://github.com/grayhat12/grawler
A web-based crawler that takes two inputs (search item and number of sites to search) and currently displays readable content in text format; the code can be modified to display the HTML instead.
crawler scraping scraping-websites scrapper scrapy-crawler
Last synced: 01 Feb 2025
https://github.com/apexcaptain/allergy-alert
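Crawls the cafeteria information published on a certain university's website for today's date and reports the menus, grouped by dining hall and menu category, along with the allergy-inducing ingredients in each menu item.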
crawler docker expressjs puppeteer reactjs sqlite typescript
Last synced: 29 Jan 2025