https://github.com/rednafi/html-to-text
Extract pure text from any webpage
- Host: GitHub
- URL: https://github.com/rednafi/html-to-text
- Owner: rednafi
- License: mit
- Created: 2023-07-22T14:58:21.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-08T04:08:09.000Z (9 months ago)
- Last Synced: 2024-04-08T05:23:33.280Z (9 months ago)
- Topics: asgi, flyio, html, llm, python, text
- Language: Python
- Homepage: https://html-text.rednafi.com
- Size: 271 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# HTML to TEXT
>> Extract pure text from any webpage <<
## Why
LLMs with huge context windows, like [Claude 2], make it practical to paste large blobs of
text and ask questions about them. Often, I want to copy the entire content of a
webpage and pipe it into a chat window. One specific use case is when I want to grok
Python PEPs with the help of an LLM. This little [ASGI] tool allows you to parse the HTML
content of any publicly available page and turn it into pure text that's ingestible by a
language model.

## Exploration
* Go to [html-text.rednafi.com] and paste a publicly accessible page URL. Then click
**Submit** and the parsed text content appears in the adjacent text box:

![screenshot-a]
* Copy the text content by clicking on the **Copy** button.
* Click **Clear** if you need a blank canvas.
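
To give a rough idea of what "turning a page into pure text" involves, here is a minimal sketch that uses only the Python standard library. It is not the code that powers the app; `TextExtractor`, `html_to_text`, and `fetch_text` are hypothetical names picked for this illustration, and the lists of skipped and block-level tags are assumptions rather than anything taken from the repo.

```python
import re
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect readable text from HTML, ignoring non-content elements."""

    # Assumed set of elements whose content should never reach the output.
    SKIP = {"script", "style", "noscript", "template"}
    # Assumed set of elements that should start and end on their own line.
    BLOCK = {
        "p", "div", "br", "li", "tr", "pre", "blockquote", "section",
        "article", "h1", "h2", "h3", "h4", "h5", "h6",
    }

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        raw = "".join(self._chunks)
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (re.sub(r"[ \t\r\f\v]+", " ", ln).strip() for ln in raw.splitlines())
        return "\n".join(ln for ln in lines if ln)


def html_to_text(html: str) -> str:
    """Convert an HTML string into plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a public page and return its plain-text content."""
    req = urllib.request.Request(url, headers={"User-Agent": "html-to-text-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


if __name__ == "__main__":
    print(html_to_text("<h1>PEP 8</h1><p>Style guide</p>"))
    # PEP 8
    # Style guide
```

Calling `fetch_text` with a public URL returns text you can paste straight into a chat window. Pages that render their content with JavaScript would need a different approach, since this sketch only reads the HTML the server returns.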
## Development
* Ensure that Docker is installed on your system.
* Clone the repo and head over to the root directory.
* Build and run the service locally:
```sh
docker build -t html-to-text . \
&& docker run -p "5001:5000" html-to-text
```
* Head over to [http://localhost:5001] in your browser and explore the app.
* Run the linter:
```sh
make lint
```

* Run the tests:
```sh
make test
```

## Deployment
The app is built with Python 3.12 and is automagically deployed to [fly.io] via GitHub
Actions.
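
If you want a quick way to confirm that an instance is responding, whether that is the local container on port 5001 or the deployed app, a small check along these lines can help. It is only a sketch: `is_up` is a hypothetical helper and not part of the repo, and it assumes that a plain `GET` on the root URL returns a 2xx status when the app is healthy.

```python
import http.client
import sys
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` ends in a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    # Default to the local container described in the Development section.
    urls = sys.argv[1:] or ["http://localhost:5001"]
    for url in urls:
        print(url, "up" if is_up(url) else "down")
```

Pass one or more URLs as arguments to check something other than the local container, for example `https://html-text.rednafi.com`.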
✨ 🍰 ✨

[screenshot-a]: https://github.com/rednafi/html-to-text/assets/30027932/20bb63bd-c4a8-48bf-8cda-d83857548b48
[http://localhost:5001]: http://localhost:5001
[html-text.rednafi.com]: https://html-text.rednafi.com
[fly.io]: https://fly.io
[claude 2]: https://www.anthropic.com/index/claude-2
[asgi]: https://asgi.readthedocs.io/en/latest/