https://github.com/gianlucatruda/recommendation_extractor
A pair of scripts for (1) extracting URLs from HTML-esque documents and (2) fetching the title, description, and permalink for each URL.
- Host: GitHub
- URL: https://github.com/gianlucatruda/recommendation_extractor
- Owner: gianlucatruda
- License: gpl-3.0
- Created: 2019-01-28T11:21:14.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-09-09T12:52:18.000Z (about 6 years ago)
- Last Synced: 2025-04-03T01:26:12.519Z (6 months ago)
- Topics: evernote, http, ifttt, python, requests, url
- Language: Python
- Size: 18.6 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Recommendations Extractor
This solves a little problem of my own. If you have the same (or a very similar) problem, this may help you somewhat.
## My Little Problem
I use [IFTTT](https://ifttt.com) to send the URLs of media (podcasts, videos, articles, etc.) to an Evernote doc. Sifting through these IFTTT-customised links (shortened `ift.tt` URLs) is a mission, so I wanted a way to recover the original URL, the title, and a description for each one.
I can export the Evernote file in HTML format. From there, this Python 3 code uses regular expressions, basic string and file processing, and the awesome [requests](https://github.com/requests/requests) library to extract the URLs and retrieve their information. The output is written to a .txt file of the user's choice.
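As a rough illustration of the extraction step, the sketch below pulls URLs out of exported HTML with a regular expression. The pattern, the `extract_urls` name, and the sample markup are assumptions made for this example; they are not taken from `extract.py`, whose regex may well differ.

```python
import re

# Assumed pattern: an http(s) scheme followed by any run of characters that
# are not whitespace, quotes, or angle brackets. Hypothetical, not the
# repository's actual regex.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(html_text):
    """Return the URLs found in html_text, in order, without duplicates."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(html_text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical snippet of an exported note; the link appears twice
# (once in the href, once as the link text) but is reported once.
sample = '<div>via Pocket <a href="http://ift.tt/12iIegy">http://ift.tt/12iIegy</a></div>'
print(extract_urls(sample))  # ['http://ift.tt/12iIegy']
```

Deduplicating while preserving order keeps the intermediary file short when an exported note repeats a link in both the `href` attribute and the visible text.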
## Example
Input
```html
via Pocket http://ift.tt/12iIegy
```
Intermediary
```
http://ift.tt/12iIegy
```
Output
```
Bill Gates’ five favorite books of 2014
For the business-minded bookworm in your life.
https://qz.com/308144/bill-gates-five-favorite-books-of-2014/
[http://ift.tt/12iIegy]
```
## Usage
```shell
python3 extract.py urls.txt
python3 fetch.py urls.txt
```
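Each record in the example output has four lines: title, description, resolved URL, and the original short link in brackets. The sketch below shows one way such a record could be produced. It is not the code in `fetch.py`: the names `MetadataParser`, `parse_metadata`, `format_record`, and `fetch_record` are hypothetical, and reading the description from a `description` or `og:description` meta tag is an assumption about where pages keep it.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the first <title> text and a description <meta> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" and not self.title:
            self._in_title = True
        elif tag == "meta" and not self.description:
            # Assumption: accept the standard or the Open Graph description.
            key = attrs.get("name") or attrs.get("property") or ""
            if key.lower() in ("description", "og:description"):
                self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_metadata(html_text):
    """Return (title, description) parsed from a page's HTML."""
    parser = MetadataParser()
    parser.feed(html_text)
    return parser.title.strip(), parser.description


def format_record(title, description, permalink, short_url):
    """Lay out one record in the four-line shape of the example output."""
    return f"{title}\n{description}\n{permalink}\n[{short_url}]"


def fetch_record(short_url, timeout=10):
    """Follow redirects from short_url and build a record for the final page."""
    import requests  # third-party; imported here so the helpers above work without it

    response = requests.get(short_url, timeout=timeout)
    response.raise_for_status()
    title, description = parse_metadata(response.text)
    # response.url is the final URL after any redirects were followed.
    return format_record(title, description, response.url, short_url)


# Offline demonstration with a made-up page and a placeholder permalink.
sample_page = (
    "<html><head><title>An Example Article</title>"
    '<meta name="description" content="A short summary."></head></html>'
)
title, description = parse_metadata(sample_page)
print(format_record(title, description, "https://example.com/article", "http://ift.tt/12iIegy"))
```

Keeping the parsing and formatting separate from the network call means they can be checked against saved HTML without fetching anything; `fetch_record` only adds the `requests.get` call and the redirect-resolved URL.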