Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with metadata-storage
A curated list of projects in awesome lists tagged with metadata-storage .
https://github.com/simonpierreboucher/crawler
A robust, modular web crawler built in Python for extracting and saving content from websites. This crawler is specifically designed to extract text content from both HTML and PDF files, saving them in a structured format with metadata.
concurrent-crawling content-extraction data-collection data-extraction-pipeline data-preservation-and-recovery data-scraping error-handling html-parsing http-requests metadata-storage modular-design pdf-text-extraction python-crawler rate-limiting structured-data-storage text-processing url-normalization web-crawling yaml-configuration
Last synced: 12 Dec 2024