Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-16 00:06:25 UTC
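The definition above describes the loop most crawlers in this list share: start from a seed URL, download each page, extract its links, and queue the ones not yet seen. The following is a minimal breadth-first sketch in Python. It assumes the caller supplies a `fetch(url) -> html` function, a hypothetical hook that lets the example run offline; in practice it would wrap an HTTP client. The sketch omits politeness features a production crawler needs, such as robots.txt handling and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`, returning successfully fetched URLs in visit order.

    `fetch` is a hypothetical hook mapping a URL to its HTML.
    """
    frontier = deque([seed])  # FIFO queue gives breadth-first order
    seen = {seed}             # every URL ever queued, to avoid revisits
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Usage with an in-memory "web" instead of real HTTP requests:
pages = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": '<a href="https://example.com/">home</a>',
}
print(crawl("https://example.com/", pages.__getitem__))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

The `seen` set is what keeps the crawl from looping on cyclic links. The `max_pages` cap bounds the total work.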
https://github.com/krishealty/whoknows
All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.
bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance
Last synced: 07 Jan 2025
https://github.com/suddi/fundscraper
Collection of web crawlers to scrape fund data using Scrapy
Last synced: 11 Oct 2024
https://github.com/arghyadipchak/craww
Gemini (protocol) crawler written in Rust
crawler gemini gemini-protocol rust
Last synced: 04 Jan 2025
https://github.com/tufayellus/linkedin-cv-downloader
A Python-based GUI automation tool for bulk-downloading LinkedIn CVs/resumes from a list of profile links.
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 23 Nov 2024
https://github.com/tsoliangwu0130/ex-dividend-date-notification
crawler email-notification python3 stock-market vanguard
Last synced: 08 Jan 2025
https://github.com/tsoliangwu0130/ptt-search
A simple Python script to fetch PTT posts from the command line.
Last synced: 08 Jan 2025
https://github.com/opda0887/bahamut-crawler-to-gmail
Idea: use a Python crawler to fetch the latest posts from Bahamut forum boards and send them to myself via Gmail.
Last synced: 27 Nov 2024
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 31 Dec 2024
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 27 Nov 2024
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 28 Dec 2024
https://github.com/zephyrpersonal/github-trending-crawler
Transforms GitHub Trending repos into JSON data.
cheerio crawler fetch github node repository spider trending
Last synced: 28 Nov 2024
https://github.com/pxlrbt/website-diff
Utility tool that bundles a crawler and BackstopJS for visual regression testing.
backstopjs crawler visual-regression-testing
Last synced: 28 Nov 2024
https://github.com/victorhuu/amazonmovieintegration
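This repository is the first individual assignment for the Data Warehouse course at Tongji University: using a crawler and ETL tools to organize Amazon movie data.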
crawler data-warehouse movies pandas scrapy xpath
Last synced: 28 Nov 2024
https://github.com/konradlinkowski/mailcrawler
Crawler that finds email addresses on websites
Last synced: 28 Nov 2024
https://github.com/danielemoraschi/go-sitemap-common
Simple Go sitemap generator and crawler.
crawler golang sitemap sitemap-generator
Last synced: 31 Dec 2024
https://github.com/khilnani/spidey.py
Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.
cli crawler python scaper web-spider
Last synced: 02 Dec 2024
https://github.com/ccrashzer0/web_crawler
A python based web crawler
crawler internet python python3 webcrawler
Last synced: 28 Nov 2024
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 28 Nov 2024
https://github.com/soulyma/web_crawler
A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.
beautifulsoup4 crawler csv data json python structured-data
Last synced: 13 Dec 2024
https://github.com/nelcifranmagalhaes/web_crawler
A web crawler for all Naruto characters
anime beautifulsoup characters crawler naruto python
Last synced: 03 Dec 2024
https://github.com/flavien-hugs/scrapy-test
Working with the Scrapy library. A small script that extracts all the cartoon characters listed on Wikipedia.
crawler python scraping scrapy
Last synced: 09 Dec 2024
https://github.com/pnguyen215/instagram-crawler
Instagram Crawler is a Python script to download posts from a specified Instagram account.
crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler
Last synced: 12 Jan 2025
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 09 Dec 2024
https://github.com/xcrypt0r/xcrawler
✂️ A multithreaded crawling example for MapleStory, implemented in various languages
crawler crawling multithreading parsing regexp
Last synced: 09 Jan 2025
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 25 Dec 2024
https://github.com/fa7ad/aiub-notes-dl
Download all notes from AIUB's portal
Last synced: 24 Oct 2024
https://github.com/aminehsan/crawler-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scarping
Last synced: 04 Dec 2024
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 28 Oct 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods and mass-download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 23 Dec 2024
https://github.com/spraakbanken/svt-crawler
Programme for crawling SVT's API for news articles and converting the data to XML.
Last synced: 29 Nov 2024
https://github.com/chen0040/ios-stock-tracker
Stock tracker implemented using Objective-C for iOS
crawler ios-app objective-c stock-prices
Last synced: 16 Dec 2024
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 05 Jan 2025
https://github.com/stevieflyer/quokka
An easy-to-use web crawler framework that supports parallel crawling without writing a line of code, as well as headless execution.
crawler parallel web-automation
Last synced: 14 Dec 2024
https://github.com/sinkaroid/webnovelcrawler
A simple PHP cURL and getRequest tool that grabs Light Novel and WebNovel content, then creates a parser with DOMPDF.
Last synced: 23 Dec 2024
https://github.com/davideferre/covid19-data-crawler-ita
COVID-19 Italian data crawler
coronavirus covid19 crawler hacktoberfest hacktoberfest2021 python
Last synced: 11 Jan 2025
https://github.com/j-hoplin/naver_news_headtopic_news_scraper
Scraping headline news from Naver News
Last synced: 11 Dec 2024
https://github.com/dizys/weibo-crawler
A Node.js Weibo crawler
crawler nodejs typescript weibo-spider
Last synced: 27 Dec 2024
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 21 Dec 2024
https://github.com/bitscoper/bitscoper_crawler
Crawls the titles of webpages in series by number and creates a list of the available links.
Last synced: 05 Dec 2024
https://github.com/somnisomni/trawler-csharp
The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#
crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper
Last synced: 05 Jan 2025
https://github.com/anjackson/scrapy-url-frontier
A Scrapy module for URL Frontier integration
crawler frontier scrapy spider
Last synced: 05 Jan 2025
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 17 Dec 2024
https://github.com/iarsham/scrapify
Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.
403-bypass arkose cloudflare crawler golang http-client scraper
Last synced: 12 Dec 2024
https://github.com/ilsonlasmar/inovamind
Inovamind challenge: a crawler in Ruby on Rails with Sidekiq + Redis
Last synced: 10 Jan 2025
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 22 Dec 2024
https://github.com/mdazlaanzubair/amazon-scraper-api
A web scraper that crawls Amazon to extract product information and returns it in JSON format.
amazon crawler expressjs json-api nodejs webscraping
Last synced: 10 Jan 2025
https://github.com/leomaurodesenv/smm-maker-profile
A package for fetching a Super Mario Maker maker profile
crawler javascript json mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/tsaohucn/crawler_fb_group
A crawler for Facebook groups that uses Selenium
crawler facebook-groups rails ruby
Last synced: 19 Nov 2024
https://github.com/afuntw/misc-crawler
Some small crawlers for specific websites
Last synced: 12 Jan 2025
https://github.com/gnaneshkunal/book-miner
Web crawler for Book reviews (Goodreads)
Last synced: 16 Dec 2024
https://github.com/bkdev98/ebooks-crawler
An ebook crawler for personal use, built with ReactJS.
crawler material-ui nodejs reactjs
Last synced: 01 Jan 2025
https://github.com/knourian/freelancer.com-category-scrapping
Scraping categories from Freelancer.com using Scrapy, with the number of projects in each category
crawler freelancer python3 scrapy web-crawler
Last synced: 05 Jan 2025
https://github.com/chunkingz/youtubelinks-scraper
A Python script that scrapes YouTube links from a predefined website of choice.
crawler python scraper spider websitescraper youtube
Last synced: 02 Jan 2025
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 18 Dec 2024
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to discover even more URLs, until the server crashes (or the benchmark ends). It is used to test a server's maximum capacity or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 18 Dec 2024
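The first step siege-crawler describes is finding same-domain URLs in a page. One way to do that is sketched below in Python. This is not the tool's implementation. The function name `same_domain_links` is hypothetical, and "same domain" is assumed here to mean an exactly matching host (`netloc`), so subdomains count as different hosts.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefCollector(HTMLParser):
    """Collects raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def same_domain_links(page_url: str, html: str) -> list[str]:
    """Return unique absolute links in `html` whose host matches `page_url`'s host."""
    host = urlparse(page_url).netloc
    collector = _HrefCollector()
    collector.feed(html)
    result: list[str] = []
    for href in collector.hrefs:
        # Resolve relative hrefs against the page and strip #fragments.
        absolute, _ = urldefrag(urljoin(page_url, href))
        # mailto: and other host-less links have an empty netloc and are dropped.
        if urlparse(absolute).netloc == host and absolute not in result:
            result.append(absolute)
    return result


html = (
    '<a href="intro">x</a>'
    '<a href="https://other.org/">y</a>'
    '<a href="/docs/intro#s2">z</a>'
)
print(same_domain_links("https://example.com/docs/", html))
# ['https://example.com/docs/intro']
```

Feeding each returned URL back into the same function, with a visited set, gives the recursive "find even more URLs" behavior the description mentions.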
https://github.com/m-osource/cassiopeiabot
C++ multithread Linux Web Crawler
algorithm berkeleydb bot cassiopeia cplusplus crawler download engine hashing html-parser information-retrieval link-analysis multithread open-source regex search web web-crawler webcrawler www
Last synced: 08 Jan 2025
https://github.com/amirespahbodi/url_crawler
url crawler
crawler fastapi pydantic python3 sqlalchemy
Last synced: 02 Jan 2025
https://github.com/arshadkazmi42/gh-crawl
Crawler for GitHub repositories that finds all the broken links in them
bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python
Last synced: 21 Dec 2024
https://github.com/shiritai/wallpaper_master
My first individual project!
crawler file-explorer javafx-application maven-shade mini-system wallpaper wallpaper-master
Last synced: 01 Jan 2025
https://github.com/skylightqp/namu2csv
A NamuWiki crawler that converts headers to CSV files for the KartRider wiki
Last synced: 08 Dec 2024
https://github.com/vindecodex/automated-crawler-wget
Using wget to crawl a site
Last synced: 01 Jan 2025
https://github.com/lykmapipo/producthunt-python-scrapy-scraper
Python Scrapy spiders that scrape data from producthunt.com
crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper
Last synced: 21 Dec 2024
https://github.com/vietdoo/sg-property-hub
SG Property Hub is a comprehensive platform for managing and analyzing property data.
airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping
Last synced: 13 Dec 2024
https://github.com/rogerluo410/gcrawler
Ruby version of a Google search crawler. Crawls the text and URL of each link in Google.com results for the given keywords.
Last synced: 02 Jan 2025
https://github.com/akashrajpurohit/node-crawler
A Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs on the domain
crawler node-crawler nodejs url
Last synced: 25 Dec 2024
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 28 Dec 2024
https://github.com/efishery/wpi-kkp-crawler
A crawler for fisheries prices on wpi.kkp.go.id
Last synced: 02 Jan 2025
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 29 Dec 2024
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawls Wikipedia with Python and BeautifulSoup4, supporting both GUI and CUI
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 03 Jan 2025
https://github.com/lukasherz/22fs-sc-twitter-crawler
Used for a research project in social computing at UZH (FS22)
crawler crawling database twitter twitter-api-v2
Last synced: 25 Dec 2024
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 25 Dec 2024
https://github.com/alatiera/ellinofreneia-crawler
Crawler of ellinofreneianet.gr for offline content consumption
Last synced: 01 Jan 2025
https://github.com/hantang/list-movies-top
Scheduled scraping of top-movie lists from Douban (douban.com), IMDb (imdb.com), Mtime (mtime.com), and Maoyan (maoyan.com)
Last synced: 07 Jan 2025
https://github.com/richecr/pyhltv
Repository to extract information from the HLTV website.
crawler csgo hacktoberfest hltv hltv-api python3
Last synced: 19 Nov 2024
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
Last synced: 30 Dec 2024
https://github.com/0xpr03/clantool
CF Management & Data Analysis Tool, crawler backend in rust
backend-server crawler data-analysis rust
Last synced: 02 Jan 2025
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 14 Jan 2025