Projects in Awesome Lists tagged with web-data-extraction
A curated list of projects in Awesome Lists tagged with web-data-extraction.
https://github.com/diffbot/diffbot-python
Python client library for Diffbot APIs
crawler knowledge-graph natural-language-processing web-data web-data-extraction
Last synced: 12 Jun 2026
https://github.com/mohamedhmini/iww
AI-based web wrapper for web content extraction
ai data-mining information-extraction library python web-content-extractor web-data-extraction web-mining web-scraping
Last synced: 26 Oct 2025
https://github.com/luminati-io/java-web-scraping
A quick guide, with code examples, on how to use Java for web scraping
java maven scraping-websites web-data-extraction
Last synced: 22 Apr 2025
https://github.com/jjonescz/awe
AI-based web extractor
deep-learning information-extraction structured-web-data web-data-extraction web-scraping
Last synced: 10 Oct 2025
https://github.com/kaizenplatform/facebookinsightsconnector
A Tableau Web Data Connector for the Facebook Insights API
facebook facebook-insights tableau web-data-extraction
Last synced: 12 Oct 2025
https://github.com/lekhmanrus/real-shot-pdf
RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.
ai-assistant angular browser-extension chrome-extension data-preservation gpt-integration knowledge-base knowledgebase link-parsing local-data-processing pdf pdf-downloader pdf-generation pdf-generator pdf-merger web-content-capture web-crawling web-data-extraction web-scraping webpage-to-pdf
Last synced: 14 Sep 2025
https://github.com/ranajahanzaib/wdx
A web data extraction library written in golang.
go-scraper mongodb nextjs scraper web-data-extraction
Last synced: 22 Feb 2026
https://github.com/yumeangelica/store_data_extractor
A Python-based web data extractor designed to monitor online stores and track product updates in real time. It is developed as a standalone module but is also part of the larger jirai_sweeties project, where it is integrated with additional features.
aiohttp data-extraction lxml python3 store-monitoring web-data-extraction
Last synced: 11 Nov 2025
https://github.com/sc10ntech/extract-site-metadata
Metadata extractor for the sprawling web ⚙️
metadata-extraction open-graph-protocol web-data-extraction
Last synced: 01 Feb 2026
https://github.com/wyattowalsh/proxywhirl
Rotating proxy system
data data-extraction dataextraction proxy proxy-checker proxy-list proxy-scraper proxy-server proxypool python python3 rotating-proxy sqlite sqlite3 web-data-extraction
Last synced: 03 Mar 2026