An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with webcrawling

A curated list of projects in awesome lists tagged with webcrawling.

https://github.com/internetarchive/heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

heritrix java warc webcrawling

Last synced: 15 May 2025

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. This library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/dedsecinside/gotor

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

cli command-line command-line-tool docker go golang golang-server hacktoberfest http-server information-extraction osint osint-tools rest-api service tor torbot webcrawler webcrawling webscraping

Last synced: 09 Apr 2025

https://github.com/feddelegrand7/ralger

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

dataextraction r rstats webcrawling webscraper-website webscraping

Last synced: 06 Apr 2025

https://github.com/voliveirajr/seleniumcrawler

An example that uses Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site

asp-net python scraper scraping scraping-websites scrapper scrapy selenium selenium-webdriver webcrawler webcrawling

Last synced: 11 Oct 2025

https://github.com/andersonkrs/malheatmap

An extension for tracking your activities on myanimelist.net

myanimelist rails ruby webcrawling

Last synced: 01 Feb 2026

https://github.com/aavache/llmwebcrawler

A Web Crawler based on LLMs implemented with Ray and Huggingface. The embeddings are saved into a vector database for fast clustering and retrieval. Use it for your RAG.

api distributed-computing fastapi huggingface large-language-models llm machine-learning milvus nlp pydantic python rag ray raylib transformer vector-database webcrawler webcrawling

Last synced: 23 Oct 2025

https://github.com/datawizard1337/ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

crawling python scraping scrapy scrapyd webcrawling webscraping

Last synced: 20 Mar 2025

https://github.com/flickz/newspaperjs

News extraction and scraping, with article parsing

crawler news news-aggregator nodejs scraper webcrawling webscraping

Last synced: 02 Jun 2026

https://github.com/crawler-commons/url-frontier

API definition, resources and reference implementation of URL Frontiers

grpc url-frontier urlfrontier web-crawlers webcrawling

Last synced: 14 Jan 2026

https://github.com/galarzaa90/tibia.py

API to parse tibia.com content into python objects.

beautifulsoup crawling-python python python3 tibia webcrawling

Last synced: 06 Apr 2025

https://github.com/robmch/mindfactory_crawling

A Python 3 Crawler for Mindfactory.de

crawler crawling data webcrawler webcrawling

Last synced: 07 May 2025

https://github.com/n3wjack/sitecrawler

A command-line based web crawler

crawler tool webcrawler webcrawling webdevelopment

Last synced: 07 Mar 2026

https://github.com/starsbit/bardle

A Blue Archive Wordle game variant. Guess the character based on the given attributes.

angular blue-archive bluearchive python webcrawling wordle

Last synced: 08 Apr 2026

https://github.com/mominurr/social-media-scraping

Social Media Scraping – Scrapes data from TikTok, LinkedIn, Facebook, and Twitter (X.com), including user profiles, posts, engagement metrics, and comments.

datascraping facebook-scraper linkedin-scraper pandas python scraper scraping selenium tiktok-scraper twitter-scraper webcrawler webcrawling webscraping

Last synced: 13 Apr 2026

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 25 Feb 2026

https://github.com/th3-c0der/web-crawler

A simple WebCrawler for exploring and downloading content from web pages within a given domain/url.

th3-c0der th3-coder th3c0der th3coder tool tools web-tool webcrawl webcrawler webcrawlers webcrawling

Last synced: 19 Mar 2026

https://github.com/nobrainghost/golamv2

Lightweight web crawler, written in Go, for emails, keywords, dead links, and dead domains. Suitable for low-resource environments

golang webcrawler webcrawling

Last synced: 16 Jun 2025

https://github.com/oussemabenhassena5/crawl4deepseek

Crawl4DeepSeek = Crawl4AI + DeepSeek 🚀 Smart, efficient, and built for deep web exploration! 🌐🤖

crawl4ai deepseek python webcrawling webscraping

Last synced: 09 Apr 2025

https://github.com/make-school-labs/makescraper

🕷Create your very own web scraper and crawler using Golang!

bew2-5 go golang makeschool webcrawling webscraping

Last synced: 19 May 2026

https://github.com/localizethedocs/scrapy-docs-l10n

Localization of The Scrapy Documentation

crowdin python scrapy sphinx translation webcrawling webscraping

Last synced: 17 May 2026

https://github.com/theghostyced/dictionary-json

👻 A generated json dictionary 📚 using Python

dictionary json pipenv python3 requests webcrawling

Last synced: 10 Sep 2025

https://github.com/kardbord/web-crawler

A very simple web crawler written in Go

go golang webcrawler webcrawling

Last synced: 18 Mar 2025

https://github.com/mominurr/google-map-scraping

Google Maps scraper that collects all available Google Maps data and gathers email addresses from business websites.

datascraping google-map-scraper google-map-scraping python scraping selenium webcrawler webcrawling webscraper webscraping

Last synced: 16 May 2026

https://github.com/prosenjitjoy/web-crawling-with-goquery

Simple project to learn web crawling with Goquery using channels, goroutines and semaphore.

goquery goroutine theguardian webcrawling

Last synced: 05 Apr 2025

https://github.com/mominurr/yellow-pages-data-scraping

Yellow Pages Data Scraping – Automates the extraction of business details (name, email, phone, address, website) from Yellow Pages directories, providing structured and accurate data.

datascraping pandas python scraper scraping selenium webcrawler webcrawling webscraping yellowpages-scraper

Last synced: 15 Feb 2026

https://github.com/mominurr/stackoverflow.com

A web scraper collecting Stack Overflow questions for NLP, using threading and user-agent rotation

datascraping pandas python requests stackoverflow stackoverflowscraper webcrawler webcrawling webscraper webscraping

Last synced: 18 May 2026

https://github.com/amirespahbodi/google-maps-scraper

Google Maps scraper. Extracts title, phone, address, latitude and longitude, category, website URL, rating, number of reviews, email, active_hours, reviews, and the first picture of each listing

dynamic-website google-map-scraper google-map-scraping google-maps-scraper google-maps-scraper-python google-maps-scraping playwright playwright-python python3 web-crawling web-scraping webcrawling webscraping

Last synced: 02 May 2026

https://github.com/glasswalk3r/app-spamcupng

Perl web crawler for finishing SpamCop.net reports automatically

perl spam spamcop-reports webcrawling

Last synced: 31 Oct 2025

https://github.com/sxoxgxi/webcrawler

A multi-threaded web crawler

crawler python webcrawling

Last synced: 28 Jul 2025

https://github.com/mpschrader/mpi-webcrawlling-tutorium

Material for a single-day web crawling workshop in Python

python tutorial webcrawling

Last synced: 20 May 2026

https://github.com/jimmaphy/pokedex

A Pokédex project built as an Android app (Xamarin.Android, C#) with image recognition (Azure) and web scraping (Python) for the 'We are in IT together' conference.

android azure csharp image-recognition pokedex pokemon python webcrawling webscraping xamarin

Last synced: 08 Apr 2026

https://github.com/soyeon207/imax_crawling

🎥 Building a movie ticket booking-opening notifier with Python

python telegram-bot webcrawling

Last synced: 24 Mar 2025

https://github.com/sebastianenger1981/cpan

Webcrawler and SEO Web Spider: software that I have published on CPAN.org and METACPAN.org

cpan metacpan perl5 sourcecode spider tcp-client tcp-client-server tcp-server webcrawl webcrawler webcrawling webspider

Last synced: 28 Jan 2026

https://github.com/mominurr/cars.com

Cars.com Scraper – Extracts car listings (make, model, year, price, seller details) from cars.com using Selenium and BeautifulSoup, saving data in CSV format.

datascraping pandas python scraper scraping webcrawler webcrawling webscraping

Last synced: 06 May 2026

https://github.com/mominurr/realself.com_scraper

realself.com data scraper that scrapes all information from the website and bypasses IP blocking and press-and-hold CAPTCHAs.

datascraper datascraping python security-bypass webcrawler webcrawling webscraper webscraping

Last synced: 25 Mar 2025

https://github.com/splorg/sage

A scraper to get every quote from a book off of Goodreads.

books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping

Last synced: 12 Jun 2025

https://github.com/kalana99/url_listing

[Test] Web crawling tool that lists the URLs for a given domain and builds a tree structure

python3 selenium-webdriver tree-structure webcrawling

Last synced: 21 Jan 2026

https://github.com/medson/ocrawl

A simple crawler to map relations between sites

charts golang goquery webcrawling

Last synced: 13 Mar 2026

https://github.com/ajaythorve/data-structures-and-algorithms

Compilation of all the data structures and algorithms I implement in Java

algorithm-challenges ctci datastructures graph-algorithms java webcrawling

Last synced: 25 Jun 2026