https://github.com/hudson-newey/website-text-extractor
A project that systematically extracts all readable text from a web page. At the moment it only works on very simple pages.
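As a rough illustration of what "extracting readable text from a very simple page" can mean, here is a minimal Ruby sketch. It is not taken from the repository: the method name `extract_readable_text` and the regex-based approach are assumptions made for illustration, and the approach is only reasonable for small, well-formed pages.

```ruby
require "cgi"

# Hypothetical helper, not the repository's implementation: a naive,
# regex-based extractor that only suits very simple, well-formed pages.
def extract_readable_text(html)
  text = html.dup
  # Drop elements whose contents are never shown to a reader.
  text.gsub!(%r{<(script|style|noscript)\b[^>]*>.*?</\1\s*>}mi, " ")
  # Drop HTML comments.
  text.gsub!(/<!--.*?-->/m, " ")
  # Replace every remaining tag with a space so adjacent words stay separated.
  text.gsub!(/<[^>]+>/, " ")
  # Decode basic entities such as &amp; and &lt;.
  text = CGI.unescapeHTML(text)
  # Collapse runs of whitespace into single spaces.
  text.gsub(/\s+/, " ").strip
end

page = "<html><head><title>Hi</title><style>p{color:red}</style></head>" \
       "<body><h1>Hello</h1><p>Tom &amp; Jerry</p><script>alert(1)</script></body></html>"
puts extract_readable_text(page)
```

Regular expressions cannot parse arbitrary HTML reliably, so a real HTML parser would be the safer choice for anything beyond primitive pages.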
- Host: GitHub
- URL: https://github.com/hudson-newey/website-text-extractor
- Owner: hudson-newey
- License: unlicense
- Created: 2021-06-12T14:08:01.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2022-11-06T15:31:35.000Z (over 3 years ago)
- Last Synced: 2025-11-23T07:17:36.542Z (7 months ago)
- Topics: reader-mode, text, text-classification, text-processing, website-scraper
- Language: Ruby
- Homepage:
- Size: 5.86 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- License: LICENSE