https://github.com/meyt/linkpreview
Get link preview in python
- Host: GitHub
- URL: https://github.com/meyt/linkpreview
- Owner: meyt
- License: mit
- Created: 2020-02-10T17:47:24.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-27T18:51:05.000Z (over 1 year ago)
- Last Synced: 2025-03-29T07:09:46.311Z (9 months ago)
- Language: Python
- Size: 71.3 KB
- Stars: 48
- Watchers: 2
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# linkpreview
[Build Status](https://github.com/meyt/linkpreview/actions)
[Coverage Status](https://coveralls.io/github/meyt/linkpreview?branch=master)
[PyPI](https://pypi.python.org/pypi/linkpreview)
Get link previews in Python.
Data is gathered from:
1. [OpenGraph](https://ogp.me/) meta tags
2. [TwitterCard](https://developer.twitter.com/en/docs/tweets/optimize-with-cards/overview/abouts-cards) meta tags
3. Microdata meta tags
4. [JSON-LD](https://en.wikipedia.org/wiki/JSON-LD) meta tags
5. HTML Generic tags (`h1`, `p`, `img`)
6. URL readable parts
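
The list above reads like a fallback chain. The sketch below illustrates that idea for the title only, using the standard library. It is not the library's implementation: the `TitleSources` class and `pick_title` function are hypothetical names, it assumes the list order is the order of preference, and it skips Microdata and JSON-LD for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleSources(HTMLParser):
    """Collect candidate titles from a few of the sources listed above."""

    def __init__(self):
        super().__init__()
        self.found = {}       # source name -> title text
        self._capture = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key == "og:title":
                self.found.setdefault("opengraph", attrs.get("content"))
            elif key == "twitter:title":
                self.found.setdefault("twittercard", attrs.get("content"))
        elif tag in ("title", "h1") and tag not in self.found:
            self._capture = tag

    def handle_data(self, data):
        if self._capture and data.strip():
            self.found[self._capture] = data.strip()
            self._capture = None

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None


def pick_title(url, html):
    """Return the first title found, walking the sources in order."""
    parser = TitleSources()
    parser.feed(html)
    for source in ("opengraph", "twittercard", "title", "h1"):
        if parser.found.get(source):
            return parser.found[source]
    # Last resort: a readable form of the URL (host plus path).
    parts = urlparse(url)
    return (parts.netloc + parts.path).rstrip("/")


html = '<head><meta property="og:title" content="OG title"><title>Plain</title></head>'
print(pick_title("http://example.com/a", html))
```

When no tag yields a title, the sketch falls back to the host and path of the URL, which mirrors the "URL readable parts" entry in the list.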
## Install
```
pip install linkpreview
```
## Usage
### Basic
```python
from linkpreview import link_preview
url = "http://localhost"
content = """
<html>
    <head>
        <title>a title</title>
    </head>
</html>
"""
preview = link_preview(url, content)
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
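
The example prints both `image` and `absolute_image` (and likewise `favicon` and `absolute_favicon`). A plausible reading is that the `absolute_*` values are the same paths resolved against the page URL. The sketch below shows that resolution with the standard library; `make_absolute` is a hypothetical helper, not part of `linkpreview`, and the URLs are made up.

```python
from urllib.parse import urljoin


def make_absolute(page_url, resource):
    """Resolve a possibly relative resource path against the page URL."""
    if not resource:
        return None
    return urljoin(page_url, resource)


page = "http://localhost/blog/post"
print(make_absolute(page, "/static/cover.png"))              # root-relative path
print(make_absolute(page, "img/cover.png"))                  # path relative to the page
print(make_absolute(page, "https://cdn.example.com/a.png"))  # already absolute
```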
### Automatically fetch link content
```python
from linkpreview import link_preview
preview = link_preview("http://github.com/")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
### `lxml` as XML parser
Highly recommended for better performance.
[Install](https://lxml.de/installation.html) `lxml` and use it like this:
```python
from linkpreview import link_preview
preview = link_preview("http://github.com/", parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
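
The examples in this README print the same eight attributes each time. If that gets repetitive in your own code, a small helper can collect them into a dictionary. This is only a sketch: `PREVIEW_FIELDS`, `preview_to_dict`, and `print_preview` are hypothetical names, and the demo uses a stand-in object rather than a real preview so that it runs without network access.

```python
from types import SimpleNamespace

# Attribute names taken from the print calls in the examples above.
PREVIEW_FIELDS = (
    "title",
    "description",
    "image",
    "force_title",
    "absolute_image",
    "site_name",
    "favicon",
    "absolute_favicon",
)


def preview_to_dict(preview):
    """Read each known attribute, using None when one is missing."""
    return {name: getattr(preview, name, None) for name in PREVIEW_FIELDS}


def print_preview(preview):
    """Print every field as 'name: value', one per line."""
    for name, value in preview_to_dict(preview).items():
        print(f"{name}:", value)


# Stand-in object for demonstration; pass a real preview object instead.
fake_preview = SimpleNamespace(title="a title", site_name="localhost")
print_preview(fake_preview)
```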
### Advanced
```python
from linkpreview import Link, LinkPreview, LinkGrabber
url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20,
    maxsize=1048576,
    receive_timeout=10,
    chunk_size=1024,
)
content, url = grabber.get_content(url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
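
Going by their names, `maxsize`, `receive_timeout`, and `chunk_size` suggest that the body is read in fixed-size chunks and abandoned once it grows too large or takes too long. The sketch below illustrates that pattern only. It is an assumption about what the parameters mean, not the code of `LinkGrabber`, and `read_limited`, `iter_chunks`, `TooLarge`, and `TooSlow` are hypothetical names.

```python
import time


class TooLarge(Exception):
    """Raised when the body exceeds the size limit."""


class TooSlow(Exception):
    """Raised when reading the body exceeds the time limit."""


def iter_chunks(data, chunk_size=1024):
    """Yield successive slices of at most chunk_size bytes."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def read_limited(chunks, maxsize=1048576, receive_timeout=10, clock=time.monotonic):
    """Join chunks, aborting once the total size or elapsed time passes its limit."""
    started = clock()
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > maxsize:
            raise TooLarge(f"body is larger than {maxsize} bytes")
        if clock() - started > receive_timeout:
            raise TooSlow(f"body took longer than {receive_timeout} seconds")
    return bytes(received)


body = b"x" * 3000
print(len(read_limited(iter_chunks(body, chunk_size=1024))))
```

Checking the limits after every chunk keeps memory use bounded to roughly `maxsize` plus one chunk, which is the usual reason for reading in chunks rather than all at once.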
Extend default headers:
```python
content, url = grabber.get_content(url, headers={'user-agent': 'Twitterbot'})
```
Ignore default headers:
```python
content, url = grabber.get_content(
    url,
    headers={'user-agent': 'Twitterbot', 'accept': '*/*'},
    replace_headers=True,
)
```
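
The difference between extending and ignoring the default headers can be pictured as a dictionary merge versus a replacement. The sketch below is illustrative only: `build_headers` is a hypothetical function, and the values in `DEFAULT_HEADERS` are placeholders, not the library's actual defaults.

```python
# Placeholder defaults for illustration; the real defaults are not shown in this README.
DEFAULT_HEADERS = {"user-agent": "example-agent/1.0", "accept": "text/html"}


def build_headers(headers=None, replace_headers=False, defaults=DEFAULT_HEADERS):
    """Merge custom headers over the defaults, or use only the custom ones."""
    custom = {name.lower(): value for name, value in (headers or {}).items()}
    if replace_headers:
        return custom
    return {**defaults, **custom}


# Extend: the default "accept" survives and "user-agent" is overridden.
print(build_headers({"user-agent": "Twitterbot"}))

# Ignore defaults: only the headers passed in are sent.
print(build_headers({"user-agent": "Twitterbot", "accept": "*/*"}, replace_headers=True))
```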
Use preset headers:
```python
content, url = grabber.get_content(url, headers='googlebot')
```
Available presets:
`firefox`,
`chrome`,
`googlebot`,
`twitterbot`,
`telegrambot`,
`imessagebot`
If you already have a parsed `BeautifulSoup` object:
```python
from bs4 import BeautifulSoup
from linkpreview import Link, LinkPreview
url = "http://example.com"
content = "<html><head><title>Hello</title></head></html>"
soup = BeautifulSoup(content, "html.parser")
link = Link(url, content)
preview = LinkPreview(link, soup=soup)
print("title:", preview.title)
```