Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-09 00:09:38 UTC
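As a rough illustration of the "systematically browses" part of the definition above, here is a minimal breadth-first crawl sketch. It is not taken from any project listed on this page. The `crawl` function, the `fetch` callable, and the `example.test` pages are hypothetical names chosen for the example, and the "web" is an in-memory dictionary so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a hypothetical callable mapping a URL to HTML text,
    or None when the page is unavailable.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # FIFO frontier gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Tiny in-memory "web" (assumed data) standing in for real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c#top">C</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real crawler would replace `PAGES.get` with an HTTP client and would typically also respect robots.txt, rate limits, and a domain allow-list. Those concerns are left out here to keep the traversal logic visible.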
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS and Elasticsearch
angularjs crawler elasticsearch java search-engine
Last synced: 07 Jan 2025
https://github.com/nirjharlo/complete-google-seo-scan
WordPress Plugin with inbuilt SEO crawler
crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin
Last synced: 27 Oct 2024
https://github.com/gill-singh-a/crawler
A program that crawls the web starting from a given page, following the internal links it finds and searching them for keywords
crawler multithreading osint python python3 requests scraper
Last synced: 09 Nov 2024
https://github.com/sangupta/shopify-burst-crawler
Simple crawler to download meta information for all stock pics from Shopify Burst website
burst crawler java shopify stock-photos
Last synced: 08 Nov 2024
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 31 Dec 2024
https://github.com/akagi201/spy
A lightweight distributed web crawler
crawler distributed lightweight nsq
Last synced: 08 Jan 2025
https://github.com/litingyes/cobweb
Collect, store and distribute meaningful static data
apis bing-image bing-wallpapers crawler image random-image
Last synced: 05 Dec 2024
https://github.com/ysh329/stock-newspaper-crawler
[UNMAINTAINED] Crawls four kinds of financial newspaper corpora (from CCSTOCK.CN).
corpus crawled-data crawler database stock-newspaper-crawler
Last synced: 16 Dec 2024
https://github.com/linkspreed/twig
Twig🔍 - the fastest and safest search engine📐 for the web🌐, images🤳, news📰 and much more
crawler engine search search-engine web5
Last synced: 03 Jan 2025
https://github.com/z3ntl3/redeye
Crawl real, recent user agents from the two major databases.
crawler header ua user-agents useragents
Last synced: 16 Dec 2024
https://github.com/shunk031/lineblogscraper
Scraper for LINE Blog in Scrapy
crawler lineblog scraper scrapy
Last synced: 12 Nov 2024
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 24 Nov 2024
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Nov 2024
https://github.com/eduardosbcabral/desafio-tecnico-mp
Challenge - File generator in C# that uses a web crawler and buffers to write the file to disk.
Last synced: 13 Nov 2024
https://github.com/camara94/crawlers
Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.
crawler python scraping scrapy spider
Last synced: 23 Dec 2024
https://github.com/telanflow/scrago
A micro crawler framework, implemented in Go.
crawler go micro-framework spider
Last synced: 18 Nov 2024
https://github.com/darealfreak/figure-tracker
Application that watches wish-listed figures on multiple sites and notifies you about auctions, sales or sudden price drops
crawler figure-tracker monitoring
Last synced: 11 Dec 2024
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 20 Dec 2024
https://github.com/Anakeyn/website-contextual-links
Retrieving the contextual links of a website with R.
Last synced: 24 Nov 2024
https://github.com/zekrotja/r34-crawler
A simple CLI tool to fetch and download images from rule34.xxx
crawler go rest-api rule34 worker-pool xml
Last synced: 17 Dec 2024
https://github.com/foufou-exe/yspeed
Yspeed is a library that scrapes the Speedtest site
crawler python rich scraper scraping selenium selenium-python speedtest
Last synced: 08 Jan 2025
https://github.com/manojahi/is-there-any-song-reference-in-article
Tells whether an article from a website contains any song references.
crawler lyrics-search python webscraping
Last synced: 01 Jan 2025
https://github.com/ph-7/gettermails
GetterMails, Scraper
bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver
Last synced: 18 Nov 2024
https://github.com/tbarnes94/fortnite-weapons-bot
A bot that returns fortnite weapon statistics based on input from Discord users. Written in TypeScript.
crawler discord discord-bot discord-js typescript2
Last synced: 05 Dec 2024
https://github.com/fernandod1/yahoo-finance-scraper
This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them to a local text file.
crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api
Last synced: 12 Nov 2024
https://github.com/nava45/simplempcrawler
Simple Multiprocessing Crawler in python
crawler multiprocessing python
Last synced: 05 Jan 2025
https://github.com/ging-dev/sitemap-crawler
Collect links through the sitemap.xml or robots.txt
crawler php php8 sitemap sitemap-crawler
Last synced: 18 Nov 2024
https://github.com/jofaval/webscraping
WebScraper providing tools to scrape tons of websites with the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 09 Dec 2024
https://github.com/qianbinbin/moebooru-crawler
Retrieve links of images from moebooru-based sites, like yande.re and konachan.com.
Last synced: 17 Dec 2024
https://github.com/sebi75/lightweight-sitemapper
A lightweight sitemapper written in TypeScript, built on top of fast-xml-parser and relying on few dependencies
Last synced: 21 Dec 2024
https://github.com/omkarcloud/multiple-account-generation-template
🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING MULTIPLE ACCOUNTS ON A WEBSITE. 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 02 Jan 2025
https://github.com/tsonglew/spidreat
Article Spider with Python & Node.js :beetle:
Last synced: 19 Dec 2024
https://github.com/polakosz/smf-scraper
You know, just for backup 😄 - The only, and therefore the best, Simple Machines Forum C# scraper on GitHub 🐱
crawler csharp forum machines php scraper simple simplemachines smf
Last synced: 18 Dec 2024
https://github.com/aicore/app_info_extracter
This application extracts information about apps from the internet
android appreview apps crawler googleplaystore
Last synced: 13 Nov 2024
https://github.com/yidas/tw-stock-crawler-php
PHP Crawler for Taiwan Stock Data (台股資料爬蟲)
crawler stock taiwan taiwan-stock-information taiwan-stock-market
Last synced: 29 Oct 2024
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 31 Oct 2024
https://github.com/jiannei/github-trending
GitHub trending crawler based on Lumen.
crawler github-trending lumen php
Last synced: 09 Nov 2024
https://github.com/becky-dai/flower-knowledge-graph-visualization
A full-stack project for knowledge graph visualization
crawler css django echarts html js knowledge-graph neo4j python
Last synced: 21 Dec 2024
https://github.com/natshah/natshah-crawler
Natshah Crawler crawls a selected domain, including all its internal links and internal pages.
crawler database filter natshah-crawler
Last synced: 14 Dec 2024
https://github.com/kapitanluffy/sunny-crawler
That moment when I tried learning things about "Big Data" and "Inverted Indexes"
big-data crawler inverted-index php search
Last synced: 14 Dec 2024
https://github.com/santhoshse7en/alcoholics-anonymous
Research Project to analyse the knowledge about Alcoholics Anonymous in public
aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api
Last synced: 14 Nov 2024
https://github.com/fabrix-app/spool-scraper
Spool: Webscraper
cheerio crawler fabrix nodejs scraping spools typescript webscraper
Last synced: 14 Nov 2024
https://github.com/anzo52/jcrawl
Java web crawler
crawler java java-web-crawler web web-crawler
Last synced: 01 Jan 2025
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 20 Oct 2024
https://github.com/xiantang/mini_scrapy
A lightweight crawler framework modeled on Scrapy
crawler python3 requets scrapy
Last synced: 06 Dec 2024
https://github.com/imthaghost/gocloneold
Website Cloner - Uses powerful goroutines to clone websites to your computer within seconds.
Last synced: 19 Dec 2024
https://github.com/spaceemotion/goodreads-browser
Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍
Last synced: 26 Dec 2024
https://github.com/sean2077/leetcode_anki
Leetcode Anki card factory.
anki crawler leetcode leetcode-anki scrapy
Last synced: 12 Nov 2024
https://github.com/buaadreamer/buaastar
BUAA Star (北航星球) website: major course project for the summer 2021 English-language Python course at Beihang University (BUAA)
crawler css flask html javascript python
Last synced: 05 Jan 2025
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 23 Oct 2024
https://github.com/zabuzard/mplogger
Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. It then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.
bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api
Last synced: 19 Dec 2024
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 05 Jan 2025
https://github.com/coverified/spider
A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)
akka crawler graphql hacktoberfest microservice spider
Last synced: 25 Dec 2024
https://github.com/mrmarble/mineseek
Minecraft server scanner
crawler minecraft minecraft-server scanner slp
Last synced: 17 Nov 2024
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 13 Dec 2024
https://github.com/omkarcloud/dentalkart-scraper
🚀 SCRAPE 1000'S OF PRODUCTS FROM DENTALKART 🤖
beautifulsoup crawler crawling crawling-framework crawling-python dentalkart dentalkart-product-scraper dentalkart-scraper dentalkart-scraping node-crawler scraper scraping scraping-framework scraping-python selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 02 Jan 2025
https://github.com/keosariel/ramby
Ramby is a simple way to setup a webscraper
beautifulsoup crawler python3 webscraping
Last synced: 06 Dec 2024
https://github.com/nakabonne/staticcollector
Application to analyze static files of competing sites
Last synced: 14 Dec 2024
https://github.com/joelkoen/wls
Easily crawl multiple sitemaps and list URLs
Last synced: 07 Nov 2024
https://github.com/elektrostudios/fhm-crawler-freehardmusic.com
Crawls download urls of albums from freehardmusic.com website
albums crawl crawler crawling desktop-app desktop-application dotnet music web-crawler web-crawling web-scraper web-scraping webcrawler webcrawling webscraper webscraping windows windows-app windowsapp winforms
Last synced: 01 Dec 2024
https://github.com/yjg30737/pyqt-google-image-crawler
Crawling image files from Google search result with Python and icrawler
beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application
Last synced: 03 Jan 2025
https://github.com/eklem/browsercrawler
Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 19 Dec 2024
https://github.com/tikazyq/colly-crawlers
Crawlers using Golang-based web crawling framework Colly
Last synced: 02 Jan 2025
https://github.com/rudrakshi99/web_crawler
A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.
Last synced: 22 Nov 2024
https://github.com/kokseen1/chii
A minimal marketplace bot maker.
auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction
Last synced: 13 Nov 2024
https://github.com/erikmueller/jazmax
Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator
crawler data-science efficiency green heatpump jaz
Last synced: 01 Dec 2024
https://github.com/kluhan/kraken
Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.
celery crawler google-play-store python web-crawling
Last synced: 15 Dec 2024
https://github.com/krishpranav/spider
A ruby web spidering tool that can spider a site, multiple domains, certain links or infinitely
crawler ruby spider web-crawler web-scraping
Last synced: 06 Dec 2024
https://github.com/e73b025/simple-python-url-crawler
Super simple Python3 website URL scraper/crawler. Multi-threaded.
crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple
Last synced: 11 Nov 2024
https://github.com/ericz99/go-crawler
Simple, lightweight crawler that will find all endpoints on any website.
Last synced: 30 Nov 2024
https://github.com/truethari/fcrawler
Python application that can be used to copy files of a given file type from a folder directory.
copy copy-files crawl crawler crawler-python file files
Last synced: 07 Jan 2025
https://github.com/gnujoow/crawl-repo
Crawls basic information about GitHub repositories
crawler github github-api python3
Last synced: 14 Dec 2024
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for crawling images from Baidu Images, a handy tool for CV researchers building datasets.
Last synced: 27 Dec 2024
https://github.com/raspi/scrapy-kuntavaalit2021-yle
Fetch YLE municipal election (kuntavaalit) 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/joshuaquek/docusite-to-pdf
Provide a URL and this will generate multiple PDF documents of the whole site within the bounds of the URL path. This code repo is for educational purposes only.
crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper
Last synced: 13 Nov 2024
https://github.com/panyanyany/vps_spider
VPS Spider powering https://findallvps.com
Last synced: 12 Nov 2024
https://github.com/krishealty/whoknows
All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.
bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance
Last synced: 07 Jan 2025
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 25 Dec 2024
https://github.com/amirsorouri00/dsl-se
This is an MVP based on the "Search Engine And Data Mining" course. The idea behind this project comes from the forked project, whose link is
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 18 Nov 2024
https://github.com/jovijovi/ether-crawler
A transaction crawler for the Ethereum ecosystem.
blockchain crawler ether ethereum transaction
Last synced: 15 Nov 2024