https://github.com/meyt/linkpreview
Get link preview in python
- Host: GitHub
- URL: https://github.com/meyt/linkpreview
- Owner: meyt
- License: mit
- Created: 2020-02-10T17:47:24.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-09-27T18:51:05.000Z (about 2 months ago)
- Last Synced: 2024-10-13T09:31:22.576Z (30 days ago)
- Language: Python
- Size: 71.3 KB
- Stars: 46
- Watchers: 3
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# linkpreview
[![Build Status](https://github.com/meyt/linkpreview/actions/workflows/main.yaml/badge.svg)](https://github.com/meyt/linkpreview/actions)
[![Coverage Status](https://coveralls.io/repos/github/meyt/linkpreview/badge.svg?branch=master)](https://coveralls.io/github/meyt/linkpreview?branch=master)
[![pypi](https://img.shields.io/pypi/pyversions/linkpreview.svg)](https://pypi.python.org/pypi/linkpreview)

Get link preview in Python.

Gathering data from:
1. [OpenGraph](https://ogp.me/) meta tags
2. [TwitterCard](https://developer.twitter.com/en/docs/tweets/optimize-with-cards/overview/abouts-cards) meta tags
3. Microdata meta tags
4. [JSON-LD](https://en.wikipedia.org/wiki/JSON-LD) meta tags
5. HTML Generic tags (`h1`, `p`, `img`)
6. URL readable parts

## Install
```
pip install linkpreview
```

## Usage
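
Before the library examples, it may help to see how several metadata sources can feed a single field. The sketch below is not linkpreview's implementation; it uses only the standard library and assumes that richer sources (OpenGraph, then TwitterCard) are preferred over the plain `<title>` tag. The names `MetaCollector` and `pick_title` are hypothetical, and Microdata and JSON-LD are left out for brevity.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> property/name values and the first <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # OpenGraph uses "property", TwitterCard usually uses "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                self.meta.setdefault(key, attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None and data.strip():
            self.title = data.strip()


def pick_title(html):
    """Return the first title found, trying richer sources first (assumed order)."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    for key in ("og:title", "twitter:title"):
        if key in collector.meta:
            return collector.meta[key]
    return collector.title


html = (
    "<head><title>Plain</title>"
    '<meta property="og:title" content="From OpenGraph"></head>'
)
print(pick_title(html))
print(pick_title("<title>Plain</title>"))
```

With both tags present, the OpenGraph value wins; with only a `<title>` tag, the generic value is used as the fallback.
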
### Basic
```python
from linkpreview import link_preview

url = "http://localhost"
content = """
<html>
  <head>
    <title>a title</title>
  </head>
</html>
"""
preview = link_preview(url, content)
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```

### Fetch link content automatically
```python
from linkpreview import link_preview

preview = link_preview("http://github.com/")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```

### `lxml` as XML parser
Highly recommended for better performance.
[Install](https://lxml.de/installation.html) `lxml` and use it like this:
```python
from linkpreview import link_preview

preview = link_preview("http://github.com/", parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```

### Advanced
```python
from linkpreview import Link, LinkPreview, LinkGrabber

url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20,
    maxsize=1048576,
    receive_timeout=10,
    chunk_size=1024,
)
content, url = grabber.get_content(url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```

Extend default headers:
```python
content, url = grabber.get_content(url, headers={'user-agent': 'Twitterbot'})
```

Ignore default headers:
```python
content, url = grabber.get_content(
    url,
    headers={'user-agent': 'Twitterbot', 'accept': '*/*'},
    replace_headers=True,
)
```

Use preset headers:
```python
content, url = grabber.get_content(url, headers='googlebot')
```

Available presets:
`firefox`,
`chrome`,
`googlebot`,
`twitterbot`,
`telegrambot`,
`imessagebot`

If you already have a parsed `BeautifulSoup` object:
```python
from bs4 import BeautifulSoup
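from linkpreview import Link, LinkPreview

url = "http://example.com"
# Minimal HTML document; the exact markup here is an assumption.
content = "<html><head><title>Hello</title></head></html>"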
soup = BeautifulSoup(content, "html.parser")
link = Link(url, content)
preview = LinkPreview(link, soup=soup)
print("title:", preview.title)
```
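
The examples above print both `image` and `absolute_image` (and likewise `favicon` and `absolute_favicon`). Presumably the `absolute_*` variants resolve a possibly relative reference against the page URL. The sketch below illustrates that idea with the standard library's `urljoin`; it is an assumption about the behaviour, not code taken from linkpreview, and `to_absolute` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def to_absolute(page_url, ref):
    """Resolve ref against page_url; return None when there is nothing to resolve."""
    if not ref:
        return None
    return urljoin(page_url, ref)


page_url = "http://example.com/blog/post.html"

# Root-relative path: resolved against the site root.
print(to_absolute(page_url, "/static/cover.png"))
# Document-relative path: resolved against the page's directory.
print(to_absolute(page_url, "thumb.png"))
# Already absolute: returned unchanged.
print(to_absolute(page_url, "https://cdn.example.com/a.png"))
```

A consumer that needs a fetchable URL (for example, to download a thumbnail) would use the resolved form, while the raw value mirrors what the page itself declared.
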