Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-26 00:06:25 UTC
- JSON Representation
https://github.com/spaceemotion/goodreads-browser
Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍
Last synced: 26 Dec 2024
https://github.com/ewertoncodes/mind-crawler
A simple api written in Rails to extract quotations from the Quotes to Scrape site.
Last synced: 23 Jan 2025
https://github.com/omkarcloud/dentalkart-scraper
🚀 SCRAPE 1000'S OF PRODUCTS FROM DENTALKART 🤖
beautifulsoup crawler crawling crawling-framework crawling-python dentalkart dentalkart-product-scraper dentalkart-scraper dentalkart-scraping node-crawler scraper scraping scraping-framework scraping-python selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 02 Jan 2025
https://github.com/shunk031/lineblogscraper
Scraper for LINE Blog in Scrapy
crawler lineblog scraper scrapy
Last synced: 10 Jan 2025
https://github.com/yjg30737/pyqt-google-image-crawler
Crawling image files from Google search result with Python and icrawler
beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application
Last synced: 03 Jan 2025
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 13 Jan 2025
https://github.com/imkrunalkanojiya/seo-checker
Resolve your SEO related issue by using SEO Checker Rest API
crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools
Last synced: 03 Jan 2025
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 11 Jan 2025
https://github.com/systemfsoftware/youtube-autocomplete-scraper
YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.
actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api
Last synced: 11 Jan 2025
https://github.com/tufayellus/linkedin-cv-downloader
A Python based GUI automation software for downloading bulk LinkedIn CV / LinkedIn Resume from a list of profile links
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 23 Jan 2025
https://github.com/idanhoro/nasa-heat-maps-prediction
In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms
beautifulsoup crawler machine-learning pillow prediction python sklearn
Last synced: 20 Jan 2025
https://github.com/highbreed/web-crawler
A web crawler script that crawls the target website and lists its links
Last synced: 13 Jan 2025
https://github.com/fernandod1/yahoo-finance-scraper
This python script scraps "Open" and "Previous Close" values from any company in Yahoo Finance and save them in a local text file.
crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api
Last synced: 12 Jan 2025
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 12 Jan 2025
https://github.com/zabuzard/songcrawler
Crawles all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.
command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler
Last synced: 12 Jan 2025
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 24 Jan 2025
https://github.com/denrydu/baiduimagecrawler
自己写的两个用来爬取百度图片的脚本,方便CV研究者制作数据集。Two ways to download images from baidu, useful tool for making cv datasets!
Last synced: 27 Dec 2024
https://github.com/0000xffff/webgrab
web page: crawler / file scanner / downloader
crawler download downloader scrape scraper webcrawler
Last synced: 19 Jan 2025
https://github.com/telanflow/scrago
A micro crawler framework. achieved by GOLANG.
crawler go micro-framework spider
Last synced: 19 Jan 2025
https://github.com/ph-7/gettermails
GetterMails, Scraper
bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver
Last synced: 19 Jan 2025
https://github.com/thiiagoms/dict-crawler
Simple crawler on UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 16 Jan 2025
https://github.com/litingyes/cobweb
Collect, store and distribute meaningful static data
apis bing-image bing-wallpapers crawler image random-image
Last synced: 05 Dec 2024
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Nov 2024
https://github.com/chunkingz/youtubelinks-scraper
A python script that scrapes Youtube links from a predefined website of choice.
crawler python scraper spider websitescraper youtube
Last synced: 02 Jan 2025
https://github.com/thomashirtz/douban-crawler
A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.
Last synced: 25 Dec 2024
https://github.com/arghyadipchak/craww
Gemini (protocol) crawler written in Rust
crawler gemini gemini-protocol rust
Last synced: 04 Jan 2025
https://github.com/gnaneshkunal/book-miner
Web crawler for Book reviews (Goodreads)
Last synced: 16 Dec 2024
https://github.com/alatiera/ellinofreneia-crawler
Crawler of ellinofreneianet.gr for offline content consumption
Last synced: 01 Jan 2025
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 28 Dec 2024
https://github.com/tsoliangwu0130/ex-dividend-date-notification
crawler email-notification python3 stock-market vanguard
Last synced: 08 Jan 2025
https://github.com/zhs007/lottery-crawler
基于jarvis-task的爬虫,主要用来爬取lottery数据。
Last synced: 03 Jan 2025
https://github.com/tsoliangwu0130/ptt-search
A simple Python script to fetch PTT post from the command line.
Last synced: 08 Jan 2025
https://github.com/leomaurodesenv/smm-maker-profile
A package to fetching the maker profile - Super Mario Maker
crawler javascript json mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/developerjosh/gogo-crawler
The tool kit for making an anime website with a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 17 Jan 2025
https://github.com/im-perativa/public_crawler
A collection of crawler project for Indonesia dataset
crawler indonesia indonesia-api scrapy
Last synced: 25 Jan 2025
https://github.com/droiddevgeeks/nodelearning
This is node learning demo. It has covered all basics of node.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 13 Jan 2025
https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper
Crawls exclusive video games released for the platforms specified on GameFAQs website to generate a table in Markdown format with the crawled titles.
console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox
Last synced: 01 Dec 2024
https://github.com/chen0040/ios-stock-tracker
Stock tracker implemented using Objective-C for iOS
crawler ios-app objective-c stock-prices
Last synced: 16 Dec 2024
https://github.com/0xpr03/clantool
CF Management & Data Analysis Tool, crawler backend in rust
backend-server crawler data-analysis rust
Last synced: 02 Jan 2025
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 06 Dec 2024
https://github.com/projectx3193275578/prjctxx8264
A simple, open-source, easy to use, and free download manager for malware samples.
crawler downloader malware manager samples
Last synced: 05 Jan 2025
https://github.com/pxlrbt/website-diff
Utility tool that bundles a crawler and BackstopJS for visual regression testing.
backstopjs crawler visual-regression-testing
Last synced: 26 Jan 2025
https://github.com/m-osource/cassiopeiabot
C++ multithread Linux Web Crawler
algorithm berkeleydb bot cassiopeia cplusplus crawler download engine hashing html-parser information-retrieval link-analysis multithread open-source regex search web web-crawler webcrawler www
Last synced: 08 Jan 2025
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and iTunes Page
Last synced: 19 Dec 2024
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 08 Jan 2025
https://github.com/anjackson/scrapy-url-frontier
A Scrapy module for URL Frontier integration
crawler frontier scrapy spider
Last synced: 05 Jan 2025
https://github.com/igorbrizack/web-scraper
Aplicação de raspagem de dados HTML, construída em python.
crawler pytest python3 scraper
Last synced: 26 Jan 2025
https://github.com/dizys/weibo-crawler
A nodejs weibo crawler
crawler nodejs typescript weibo-spider
Last synced: 27 Dec 2024
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 22 Jan 2025
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed js at the end of Unit 35 (Computer Science Module) of the Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 18 Jan 2025
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 31 Dec 2024
https://github.com/redco/goose-phantom-environment
Environment for Goose parser which allows to run it in PhantomJS
crawler environment goose goose-parser nodejs parse parser phantomjs scraper
Last synced: 22 Dec 2024
https://github.com/maxgio92/package-crawler
A package crawler for most known Linux distros
Last synced: 26 Jan 2025
https://github.com/sefinek/niedlascamu.pl-tracker
Śledzenie zmian na stronie niedlascamu.pl.
crawl crawler niedlascamu tracker tracking
Last synced: 07 Dec 2024
https://github.com/maxiroellplenty/gs-robot
NodeJs tool to scrap gelbe-seiten
axios cheerio crawler gelbe-seiten nodejs scraper yargs
Last synced: 23 Jan 2025
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 14 Jan 2025
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 23 Jan 2025
https://github.com/devidw/google-untitled-spam-spider
A spam spider which is targeting 'Untitled' spam pages from the Google search results.
crawler crawling crawling-algorithm crawling-python crawling-sites crawling-tool google-untitled python python3 spam spam-detection spammer untitled untitled-spam
Last synced: 07 Dec 2024
https://github.com/hamidrabedi/digikala-crawler
a crawler for digikala with django framework, selenium and rest api. also scraping data from gathered urls
crawler digikala digikala-crawler django python scraper
Last synced: 14 Dec 2024
https://github.com/pjullrich/link-crawler
Python Crawler that reports broken links on a given website and its sup-pages
asyncio breadth-first-search broken-links crawler python
Last synced: 23 Jan 2025
https://github.com/christopher-besch/therapy_search
Compute Call Times from arztsuche-bw into a Calendar.
appointments calendar crawler gatsby therapy time-management typescript
Last synced: 28 Dec 2024
https://github.com/rogerluo410/gcrawler
Google search crawler for Ruby version. Crawling each links' text and url by keywords on Google.com.
Last synced: 02 Jan 2025
https://github.com/sinkaroid/webnovelcrawler
Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.
Last synced: 23 Dec 2024
https://github.com/piopi/behatcrawler
A Behat extension that crawls links on a website and executes user-defined function on each one of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 19 Dec 2024
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 17 Dec 2024
https://github.com/efishery/wpi-kkp-crawler
This is crawler for fisheries price on wpi.kkp.go.id
Last synced: 02 Jan 2025
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 23 Jan 2025
https://github.com/j-hoplin/naver_news_headtopic_news_scraper
네이버 뉴스에서 헤드라인 뉴스 스크레이핑
Last synced: 11 Dec 2024
https://github.com/orsinium-labs/gpcc
Python library and CLI tool to fetch information from GCP Browser (https://gpc-browser.gs1.org/)
Last synced: 17 Jan 2025
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 29 Dec 2024
https://github.com/kahsolt/allchan
An image crawler for xChan(4chan/8ch/...) image board.
4chan 4chan-downloader 8chan crawler image-crawler
Last synced: 03 Jan 2025
https://github.com/greatdrake/contributecounter
crawl Wikipedia for contributers
Last synced: 14 Dec 2024
https://github.com/enansari/guess-price-car
Car price estimation based on the information of a car sales site | final project of Maktabkhooneh | حدس قیمت خودرو با ماشین لرنینگ | پروژه نهایی مکتبخونه
crawler jadi machine-learning maktabkhoone maktabkhooneh python
Last synced: 09 Jan 2025
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 28 Dec 2024
https://github.com/schbenedikt/web-crawler
A simple web crawler using Python that stores the metadata of each web page in a database.
crawler database mariadb mysql python python-crawler web
Last synced: 08 Nov 2024
https://github.com/zhoudaxia233/unilogo
A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.
Last synced: 15 Dec 2024
https://github.com/abdus/scrape-web
A simple web scrapper for Node.js
crawler web-scraping web-scrapper
Last synced: 03 Dec 2024
https://github.com/richecr/pyhltv
Repository to extract information from the HLTV website.
crawler csgo hacktoberfest hltv hltv-api python3
Last synced: 20 Jan 2025
https://github.com/khoinguyen2k/web-crawler
about crawl data
crawler jsoup-library scraper selenium-java
Last synced: 17 Jan 2025
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawling the Wikipedia with Python powered by BeautifulSoup4, Supporting GUI/CUI
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 03 Jan 2025
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 21 Dec 2024
https://github.com/maxmindlin/swarm
Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.
Last synced: 06 Dec 2024
https://github.com/cseas/shares-monitor
Web crawler to fetch and monitor shares details.
crawler python python3 scraper scraping-websites shares
Last synced: 27 Dec 2024
https://github.com/woorim960/nate.com-comments-crawler
nate.com-comments-crawler
chromedriver crawler python3 selenium
Last synced: 28 Dec 2024