https://github.com/hudson-newey/website-text-extractor

A project to systematically extract all readable text from a web page (currently works only on very simple pages).
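As a rough illustration of what extracting readable text from a very simple page can involve, here is a minimal Ruby sketch. It is not taken from the repository: the helper name `extract_readable_text`, the `ENTITIES` table, and the regex-based approach are all assumptions made for this example, and a real implementation would more likely use an HTML parser.

```ruby
# Hypothetical sketch, NOT the repository's code: a naive readable-text
# extractor that is only adequate for very simple, well-formed HTML.

# Small, deliberately incomplete table of HTML entities (an assumption).
ENTITIES = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
  "&quot;" => "\"", "&#39;" => "'", "&nbsp;" => " "
}.freeze

def extract_readable_text(html)
  # Drop HTML comments.
  text = html.gsub(/<!--.*?-->/m, " ")
  # Drop <script> and <style> elements together with their contents.
  text = text.gsub(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, " ")
  # Replace every remaining tag with a space so adjacent words stay separate.
  text = text.gsub(/<[^>]+>/, " ")
  # Decode the handful of entities listed above in a single pass.
  text = text.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/) { |entity| ENTITIES[entity] }
  # Collapse runs of whitespace and trim the ends.
  text.gsub(/\s+/, " ").strip
end
```

Calling `extract_readable_text("<p>Hello <b>world</b></p>")` returns `"Hello world"`. Regex-based stripping like this breaks on malformed or script-heavy markup, which is consistent with the stated limitation to very simple pages.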

reader-mode text text-classification text-processing website-scraper



Host: GitHub
URL: https://github.com/hudson-newey/website-text-extractor
Owner: hudson-newey
License: unlicense
Created: 2021-06-12T14:08:01.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2022-11-06T15:31:35.000Z (over 3 years ago)
Last Synced: 2025-11-23T07:17:36.542Z (7 months ago)
Topics: reader-mode, text, text-classification, text-processing, website-scraper
Language: Ruby
Homepage:
Size: 5.86 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- License: LICENSE
