Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-13 00:06:45 UTC
- JSON Representation
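As a rough illustration of what "systematically browses" means in the definition above, here is a minimal sketch of a breadth-first crawl, assuming a toy in-memory set of pages instead of real HTTP requests. The names `crawl`, `LinkExtractor`, `fetch`, and `PAGES`, and the `example.test` URLs, are hypothetical and not taken from any project listed here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    `fetch` is a hypothetical callable mapping a URL to an HTML string,
    or to None when the page is unavailable.
    Returns the successfully fetched URLs in visiting order.
    """
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Toy in-memory "web" standing in for real HTTP responses (an assumption).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A</a> <a href="/missing">gone</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A crawler run against real sites would typically also honor robots.txt and rate-limit its requests; both are left out of this sketch.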
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 24 Dec 2024
https://github.com/tetreum/xupopter_chrome_extension
Extension to easily create crawling recipes
crawler scrapper scrapping webscraper
Last synced: 09 Feb 2025
https://github.com/longluo/spider
My Python Spider / Crawler
crawler python spider twitter weibo weibo-crawler weibo-spider
Last synced: 06 Jan 2025
https://github.com/sahaavi/web-scraping
Learn Web-Scraping using BeautifulSoup, Selenium and Scrapy with hands on projects!
beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping
Last synced: 26 Dec 2024
https://github.com/mohammadreza-mohammadi94/python-webscraper-projects
A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.
bs4 crawler object-oriented-programming python requests scrapy webscraping
Last synced: 26 Dec 2024
https://github.com/thiiagoms/car-stealth
REST API for all cars that were stolen
Last synced: 16 Jan 2025
https://github.com/ggteixeira/motorcycle-simulator
A toy project that fetches motorcycle prices from OLX and does some calculations for those who want to buy them.
crawler motorcycle olx scraper
Last synced: 11 Jan 2025
https://github.com/bockstaller/europarl-crawler
Crawler for the documents published by the European Parliament
crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union
Last synced: 06 Jan 2025
https://github.com/iamtonmoy0/sitemap-crawler
Sitemap crawler with Golang and goquery
Last synced: 05 Jan 2025
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 10 Jan 2025
https://github.com/jayzhan211/python-crawler-startups
python crawler learning
Last synced: 25 Jan 2025
https://github.com/theabbie/shopcrawler
Crawler for Discovering Product URLs on E-commerce Websites (assignment)
Last synced: 17 Jan 2025
https://github.com/mindfiredigital/deepscanbot
It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.
bot crawl crawler go golang google webcrawler
Last synced: 28 Dec 2024
https://github.com/bruce-lee-ly/crawler
Several fun crawler cases implemented in Python.
Last synced: 16 Jan 2025
https://github.com/brianbruggeman/vax
A vaccination signup tool
covid-19 crawler signup vaccination
Last synced: 16 Jan 2025
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 23 Jan 2025
https://github.com/timzatko/fiit-vinf-1
School project - data crawling, storing using ElasticSearch and visualisation.
Last synced: 08 Feb 2025
https://github.com/jul10l1r4/objetive
This software is a mini-crawler that aims to grab parts of the text from a website or IP that responds over http*
bigdata crawler data-science security-tools web
Last synced: 20 Jan 2025
https://github.com/oleksandr-moik/spring-boot-web-crawler
Web crawler app built on Spring Boot. Gets categories and the relevant news for each category.
crawler gradle java spring-boot
Last synced: 02 Feb 2025
https://github.com/notreeceharris/webstalker
🕸 A Powerful Relational Web Crawler
Last synced: 14 Jan 2025
https://github.com/aminehsan/datamining-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scraping
Last synced: 31 Jan 2025
https://github.com/joyceannie/moviespider
This project is used to crawl movie data from IMDb. Scrapy framework is used to extract relevant information like movie title, datePublished, summary, genres, director etc.
crawler datascience python scrapy spider webscraper
Last synced: 29 Jan 2025
https://github.com/russellsteadman/netscrape
A Node.js framework for creating good bots
bot crawler crawling exclusion rfc9309 scraper scraping web-scraping
Last synced: 03 Jan 2025
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 19 Jan 2025
https://github.com/mustafadalga/website-crawler
A web crawler script that lists links by scanning the target website.
crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling
Last synced: 18 Jan 2025
https://github.com/kestarumper/imagecrawler
Downloads images from given URL
Last synced: 06 Jan 2025
https://github.com/xoraus/revieworacle
The proposed system assists users in deciding which product to buy. It gathers reviews and product details from multiple websites that sell the product. The system is also trained to analyze the polarity of the product.
ai crawler datascience machinelearning scrappy selenium-webdriver
Last synced: 13 Jan 2025
https://github.com/muhfalihr/pyxdtelebot
PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.
crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x
Last synced: 12 Feb 2025
https://github.com/bandie91/extip
Fetch external IP from known ext. ip providers
address cli crawler external ip ipv4-address parallel
Last synced: 03 Jan 2025
https://github.com/zzzzer91/match_spider
Crawler for a certain betting website; the site has since shut down :disappointed_relieved:
Last synced: 10 Jan 2025
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 23 Jan 2025
https://github.com/eneax/web-crawler
A web crawler built in Node.js
crawler javascript nodejs web-crawler
Last synced: 22 Dec 2024
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio~
Last synced: 22 Jan 2025
https://github.com/ymdarake/otenki-crawler
Yet another weather data scraper.
Last synced: 16 Jan 2025
https://github.com/shentengtu/cht-yp-crawler
Simple Crawler of www.iyp.com.tw.
crawler node-js nodejs yellow-pages yellowpages
Last synced: 11 Jan 2025
https://github.com/waived/pastebin-ripper
Scrape all pastes from pastebin page + sub-pages
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 29 Jan 2025
https://github.com/kartikmehta8/pycrawler
PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.
Last synced: 16 Jan 2025
https://github.com/pinpox/go-random-downloader
Download HTML using "Random Page"
Last synced: 28 Jan 2025
https://github.com/iyowei/fs-deep-walk
Focused on deep scanning of a specified disk location.
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 29 Dec 2024
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media are downloaded. It can then reconstruct a Twitter profile in the front end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 03 Jan 2025
https://github.com/jeanluc162/prnt-sc-crawler
Crawler for the Website prnt.sc
crawler net5 net50 prntsc screenshots
Last synced: 16 Jan 2025
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 14 Jan 2025
https://github.com/tormol/zenphoto-dl
A script for recursively downloading all pictures from zenphoto-based photo albums.
Last synced: 30 Jan 2025
https://github.com/andresayac/cuevana3
Cuevana3 scraper is a content provider of the latest movies and TV shows, dubbed in Latin American Spanish or subtitled.
Last synced: 10 Feb 2025
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 24 Jan 2025
https://github.com/sanskar107/c-subject-predictor
Predicts the topic of a piece of code.
Last synced: 21 Jan 2025
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 05 Feb 2025
https://github.com/zahraarshia/cti_crawl
This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.
crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper
Last synced: 09 Jan 2025
https://github.com/machinecyc/lotteryinsight
Uses a crawler to collect Taiwan Lotto data and saves it to a local MySQL server.
crawler data docker lottery mysql-database python3 taiwan
Last synced: 01 Feb 2025
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 01 Feb 2025
https://github.com/phanletrunghieu/webcrawler
A web crawler with Spring MVC
crawler java servlet spring-mvc springframework
Last synced: 28 Jan 2025
https://github.com/robin98sun/structured-web-data-crawler
crawler multi-thread structured-web-data
Last synced: 23 Jan 2025
https://github.com/basemax/crawler-news-currency-gold-coins
PHP crawler to get Persian news related to currency, coins, and gold.
crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler
Last synced: 09 Feb 2025
https://github.com/tinoco/ticapsoriginal_website_score_overview
Ticapsoriginal website sitemaps checker score overview
advertools beautifulsoup behave bs4 chart crawler linkbuilding matplotlib metrics metrics-visualization parser python requests score sitemaps ticapsoriginal tqdm unittesting urllib
Last synced: 09 Jan 2025
https://github.com/jnbdz/xtamia-crawler
(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux
crawler electron foundation foundation-css javascript scraper vuejs xtamia
Last synced: 10 Jan 2025
https://github.com/lig8t555/ecommerce
MERN Stack Ecommerce Store | Running In Production | MVP
baidu-tieba baotu bootstrap crawler douban-music ecommerce-platform fofa mongoose quanjing redux shopping-cart shopping-cart-solution stripe taobao-spider
Last synced: 29 Jan 2025
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 03 Jan 2025
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 09 Jan 2025
https://github.com/der3318/daily-pixiv
Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations
crawler line-notify pixiv workflow
Last synced: 13 Jan 2025
https://github.com/jlenon7/sef_automation
📑 Crawler that automatically enrolls in open vacancies on the SEF website.
athenna crawler esm nodejs playwright portugal residence sef typescript
Last synced: 13 Dec 2024
https://github.com/claudio-code/nap-web-crawler
A crawler created to find broken links in the docs of frameworks and languages
Last synced: 05 Feb 2025
https://github.com/tetreum/puppeteer-for-crawling
Daily use crawling methods for puppeteer
Last synced: 04 Feb 2025
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 10 Jan 2025
https://github.com/zhanziyuan/webdownloader
Download elements from the specified website.
crawler downloader image image-downloader python python-crawler web
Last synced: 08 Jan 2025
https://github.com/raspi/scrapy-kuntavaalit2021-almamedia
Fetch Almamedia kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/raspi/scrapy-kuntavaalit2021-sanoma
Fetch Sanoma kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/roc41d/http-web-crawler
Http web crawler with Nodejs + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 21 Jan 2025
https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen
Fetch Keskisuomalainen kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/apexcaptain/allergy-alert
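Crawls the cafeteria information published on a certain university's website for today's date and reports the menus, grouped by hall and by menu category, along with the allergy-inducing foods in each menu item.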
crawler docker expressjs puppeteer reactjs sqlite typescript
Last synced: 29 Jan 2025
https://github.com/zaneh/ocw-crawler
Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.
crawler kimurai mit ocw opencourseware spider
Last synced: 15 Jan 2025
https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez
Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.
beautifulsoup crawler immigration web
Last synced: 21 Jan 2025
https://github.com/basemax/crawleryjc
This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.
crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news
Last synced: 09 Feb 2025
https://github.com/eklem/vinmonopolet-crawler
Crawling Vinmonopolet-data and indexing it to a norch search index
crawler dataset javascript norch search-engine
Last synced: 01 Feb 2025
https://github.com/solracsf/perplexitybot-ips
Collected PerplexityBot IPs
bots crawler ip ipset perplexity
Last synced: 09 Feb 2025
https://github.com/jyasskin/pbot-crawler
Crawler for PBOT's website to show what has changed.
Last synced: 28 Jan 2025
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 04 Feb 2025
https://github.com/fredcodee/pexel.com-image-scrapper
download images from pexel.com
Last synced: 08 Jan 2025
https://github.com/mach1el/openproject-crawler
Scraping data on OpenProject
crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling
Last synced: 10 Jan 2025
https://github.com/madret/selenium_crawler
Selenium Webcrawler based on the chromedriver.
chromedriver crawler human-like selenium selenium-webdriver webcrawler
Last synced: 15 Jan 2025
https://github.com/chenbingwei1201/threads_scraper
A Python package for scraping Threads posts.
chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites
Last synced: 05 Feb 2025
https://github.com/pranavj1001/webcrawler
A simple Web Crawler
crawler java javascript nodejs web-crawler
Last synced: 02 Feb 2025