Projects in Awesome Lists tagged with webcrawling
A curated list of projects in awesome lists tagged with webcrawling.
https://github.com/internetarchive/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
heritrix java warc webcrawling
Last synced: 15 May 2025
https://github.com/scrapinghub/scrapyrt
HTTP API for Scrapy spiders
crawler crawling hacktoberfest hacktoberfest2021 python scraper scrapy twisted webcrawler webcrawling
Last synced: 15 May 2025
https://github.com/jaeksoft/opensearchserver
Open-source Enterprise Grade Search Engine Software
crawler custom-search enterprise indexing java lucene ocr opensearchserver search search-engine synonyms webcrawler webcrawling
Last synced: 04 Apr 2025
https://github.com/mehmetozkaya/dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping
Last synced: 11 May 2025
https://github.com/dedsecinside/gotor
This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and a REST API.
cli command-line command-line-tool docker go golang golang-server hacktoberfest http-server information-extraction osint osint-tools rest-api service tor torbot webcrawler webcrawling webscraping
Last synced: 09 Apr 2025
https://github.com/feddelegrand7/ralger
ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.
dataextraction r rstats webcrawling webscraper-website webscraping
Last synced: 06 Apr 2025
https://github.com/voliveirajr/seleniumcrawler
An example using Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site
asp-net python scraper scraping scraping-websites scrapper scrapy selenium selenium-webdriver webcrawler webcrawling
Last synced: 11 Oct 2025
https://github.com/andersonkrs/malheatmap
An extension for tracking your activities on myanimelist.net
myanimelist rails ruby webcrawling
Last synced: 01 Feb 2026
https://github.com/aavache/llmwebcrawler
A Web Crawler based on LLMs implemented with Ray and Huggingface. The embeddings are saved into a vector database for fast clustering and retrieval. Use it for your RAG.
api distributed-computing fastapi huggingface large-language-models llm machine-learning milvus nlp pydantic python rag ray raylib transformer vector-database webcrawler webcrawling
Last synced: 23 Oct 2025
https://github.com/datawizard1337/ARGUS
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9
crawling python scraping scrapy scrapyd webcrawling webscraping
Last synced: 20 Mar 2025
https://github.com/flickz/newspaperjs
News extraction and scraping, and article parsing.
crawler news news-aggregator nodejs scraper webcrawling webscraping
Last synced: 02 Jun 2026
https://github.com/crawler-commons/url-frontier
API definition, resources and reference implementation of URL Frontiers
grpc url-frontier urlfrontier web-crawlers webcrawling
Last synced: 14 Jan 2026
https://github.com/galarzaa90/tibia.py
API to parse tibia.com content into python objects.
beautifulsoup crawling-python python python3 tibia webcrawling
Last synced: 06 Apr 2025
https://github.com/marcel0024/cococrawler
A declarative, easy-to-use web crawler and scraper in C#
cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper
Last synced: 10 Apr 2025
https://github.com/dhyeythumar/search-engine
Application made with Node.js and Python.
beautifulsoup4 express-js express-session lemmatization mysql2 natural nltk node-js python textblob webcrawling webspider
Last synced: 09 Oct 2025
https://github.com/michaelradu/web-crawler
A Web Crawler developed in Python.
crawler crawler-python crawlers python python-3 python-script python3 script scripting scripting-language scripts web web-crawler web-crawler-python web-crawlers web-crawling webcrawl webcrawler webcrawling
Last synced: 25 Jul 2025
https://github.com/querateam/dataanalysis_bootcamp_crawler
Web scraper implementations for a variety of websites.
beautifulsoup beautifulsoup4 bootcamp bs4 data-analysis python quera scrapy selenium webcrawling webscraping
Last synced: 08 Jul 2025
https://github.com/moehmeni/ezweb
Easy-to-use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 06 Apr 2025
https://github.com/robmch/mindfactory_crawling
A Python 3 Crawler for Mindfactory.de
crawler crawling data webcrawler webcrawling
Last synced: 07 May 2025
https://github.com/n3wjack/sitecrawler
A command-line based web crawler
crawler tool webcrawler webcrawling webdevelopment
Last synced: 07 Mar 2026
https://github.com/starsbit/bardle
A Blue Archive Wordle game variant. Guess the character based on the given attributes.
angular blue-archive bluearchive python webcrawling wordle
Last synced: 08 Apr 2026
https://github.com/elektrostudios/fhm-crawler-freehardmusic.com
Crawls album download URLs from the freehardmusic.com website
albums crawl crawler crawling desktop-app desktop-application dotnet music web-crawler web-crawling web-scraper web-scraping webcrawler webcrawling webscraper webscraping windows windows-app windowsapp winforms
Last synced: 19 Jul 2025
https://github.com/mominurr/social-media-scraping
Social Media Scraping – Scrapes data from TikTok, LinkedIn, Facebook, and Twitter (X.com), including user profiles, posts, engagement metrics, and comments.
datascraping facebook-scraper linkedin-scraper pandas python scraper scraping selenium tiktok-scraper twitter-scraper webcrawler webcrawling webscraping
Last synced: 13 Apr 2026
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 25 Feb 2026
https://github.com/dearopen/django-easy-scraper
Django apps to easily scrape data from web pages
automation django django-rest-framework python python3 webcrawler webcrawling webscraper webscraping
Last synced: 14 May 2026
https://github.com/th3-c0der/web-crawler
A simple WebCrawler for exploring and downloading content from web pages within a given domain/url.
th3-c0der th3-coder th3c0der th3coder tool tools web-tool webcrawl webcrawler webcrawlers webcrawling
Last synced: 19 Mar 2026
https://github.com/nobrainghost/golamv2
Lightweight web crawler for emails, keywords, dead links, and dead domains, written in Go. Suitable for low-resource environments.
Last synced: 16 Jun 2025
https://github.com/oussemabenhassena5/crawl4deepseek
Crawl4DeepSeek = Crawl4AI + DeepSeek 🚀 Smart, efficient, and built for deep web exploration! 🌐🤖
crawl4ai deepseek python webcrawling webscraping
Last synced: 09 Apr 2025
https://github.com/make-school-labs/makescraper
🕷Create your very own web scraper and crawler using Golang!
bew2-5 go golang makeschool webcrawling webscraping
Last synced: 19 May 2026
https://github.com/localizethedocs/scrapy-docs-l10n
Localization of The Scrapy Documentation
crowdin python scrapy sphinx translation webcrawling webscraping
Last synced: 17 May 2026
https://github.com/theghostyced/dictionary-json
👻 A generated json dictionary 📚 using Python
dictionary json pipenv python3 requests webcrawling
Last synced: 10 Sep 2025
https://github.com/iamfarrokhnejad/murkmaw
A web crawler using Rust.
functional functional-programming rust rust-lang web-crawler web-crawling webcrawler webcrawling
Last synced: 28 Mar 2025
https://github.com/rishabhverma17/webcrawler
Python based WebCrawler
pyhton3-app pyhton3-web-app pyhton3samplecode python python-web-crawler python-webapp python3 web-crawler webcrawler webcrawling
Last synced: 25 Aug 2025
https://github.com/kardbord/web-crawler
A very simple web crawler written in Go
go golang webcrawler webcrawling
Last synced: 18 Mar 2025
https://github.com/mominurr/web-scraping-projects
Explore a variety of web scraping projects showcasing my skills and experience in extracting valuable data and solving complex challenges.
amazon-scraper automation datascraping ecommerce-scraper google-map-scraper python real-estate-scraper scraper scraping selenium social-media-scraper tiktok-scraper webcrawler webcrawling webscraper webscraping yellowpages-scraper zillow-scraper
Last synced: 24 Apr 2026
https://github.com/mominurr/google-map-scraping
Google Maps scraper that collects all available Google Maps data and gathers email addresses from business websites.
datascraping google-map-scraper google-map-scraping python scraping selenium webcrawler webcrawling webscraper webscraping
Last synced: 16 May 2026
https://github.com/shilongdai/halcrawler
A web crawler framework
html java java-8 java-library jsoup webcrawler webcrawling webscraper webscrapping
Last synced: 17 Dec 2025
https://github.com/prosenjitjoy/web-crawling-with-goquery
Simple project to learn web crawling with Goquery using channels, goroutines, and a semaphore.
goquery goroutine theguardian webcrawling
Last synced: 05 Apr 2025
https://github.com/jaewonson37/python_programming
algorithms data-structures encapsulation exception-handling functions loop matplotlib monte-carlo numpy object-oriented-programming pandas prototyping pseudo-random-number-generation python recursion top-down unit-testing visualization webcrawling
Last synced: 09 May 2026
https://github.com/mominurr/yellow-pages-data-scraping
Yellow Pages Data Scraping – Automates the extraction of business details (name, email, phone, address, website) from Yellow Pages directories, providing structured and accurate data.
datascraping pandas python scraper scraping selenium webcrawler webcrawling webscraping yellowpages-scraper
Last synced: 15 Feb 2026
https://github.com/mominurr/stackoverflow.com
A web scraper collecting Stack Overflow questions for NLP, using threading and user-agent rotation
datascraping pandas python requests stackoverflow stackoverflowscraper webcrawler webcrawling webscraper webscraping
Last synced: 18 May 2026
https://github.com/amirespahbodi/google-maps-scraper
Google Maps scraper. Extracts the title, phone, address, latitude and longitude, category, website URL, rating, number of reviews, email, active_hours, reviews, and first picture of each listing.
dynamic-website google-map-scraper google-map-scraping google-maps-scraper google-maps-scraper-python google-maps-scraping playwright playwright-python python3 web-crawling web-scraping webcrawling webscraping
Last synced: 02 May 2026
https://github.com/glasswalk3r/app-spamcupng
Perl web crawler for finishing SpamCop.net reports automatically
perl spam spamcop-reports webcrawling
Last synced: 31 Oct 2025
https://github.com/mpschrader/mpi-webcrawlling-tutorium
Material for a single day web crawling workshop in Python
Last synced: 20 May 2026
https://github.com/jimmaphy/pokedex
A Pokédex project built as an Android app (Xamarin.Android, C#) with image recognition (Azure) and web scraping (Python) for the 'We are in IT together' conference.
android azure csharp image-recognition pokedex pokemon python webcrawling webscraping xamarin
Last synced: 08 Apr 2026
https://github.com/soyeon207/imax_crawling
🎥 Building a movie ticket booking opening notifier with Python
python telegram-bot webcrawling
Last synced: 24 Mar 2025
https://github.com/sebastianenger1981/cpan
Webcrawler and SEO Web Spider: software I have published on CPAN.org and METACPAN.org
cpan metacpan perl5 sourcecode spider tcp-client tcp-client-server tcp-server webcrawl webcrawler webcrawling webspider
Last synced: 28 Jan 2026
https://github.com/mominurr/cars.com
Cars.com Scraper – Extracts car listings (make, model, year, price, seller details) from cars.com using Selenium and BeautifulSoup, saving data in CSV format.
datascraping pandas python scraper scraping webcrawler webcrawling webscraping
Last synced: 06 May 2026
https://github.com/mominurr/realself.com_scraper
realself.com data scraper that scrapes all information from the website and bypasses IP blocking and press-and-hold CAPTCHAs.
datascraper datascraping python security-bypass webcrawler webcrawling webscraper webscraping
Last synced: 25 Mar 2025
https://github.com/splorg/sage
A scraper to get every quote from a book off of Goodreads.
books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping
Last synced: 12 Jun 2025
https://github.com/kalana99/url_listing
[Test] Web crawling tool that lists the URLs for a given domain and builds a tree structure
python3 selenium-webdriver tree-structure webcrawling
Last synced: 21 Jan 2026
https://github.com/medson/ocrawl
A simple crawler to map the relations between sites
charts golang goquery webcrawling
Last synced: 13 Mar 2026
https://github.com/beefy/workoutassistant
Email based AI assistant
ai-agent ai-assistant algotrading cryptocurrency email llm moltbook raspberry-pi solana webcrawling
Last synced: 03 Apr 2026
https://github.com/ajaythorve/data-structures-and-algorithms
Compilation of all the data structures and algorithms I implement in Java
algorithm-challenges ctci datastructures graph-algorithms java webcrawling
Last synced: 25 Jun 2026