https://github.com/meyt/linkpreview

Get link preview in python

Host: GitHub
URL: https://github.com/meyt/linkpreview
Owner: meyt
License: mit
Created: 2020-02-10T17:47:24.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2024-09-27T18:51:05.000Z (over 1 year ago)
Last Synced: 2025-03-29T07:09:46.311Z (9 months ago)
Language: Python
Size: 71.3 KB
Stars: 48
Watchers: 2
Forks: 9
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# linkpreview

[![Build Status](https://github.com/meyt/linkpreview/actions/workflows/main.yaml/badge.svg)](https://github.com/meyt/linkpreview/actions)

[![Coverage Status](https://coveralls.io/repos/github/meyt/linkpreview/badge.svg?branch=master)](https://coveralls.io/github/meyt/linkpreview?branch=master)

[![pypi](https://img.shields.io/pypi/pyversions/linkpreview.svg)](https://pypi.python.org/pypi/linkpreview)

Get link previews in Python.

Gathering data from:

1. [OpenGraph](https://ogp.me/) meta tags

2. [TwitterCard](https://developer.twitter.com/en/docs/tweets/optimize-with-cards/overview/abouts-cards) meta tags

3. Microdata meta tags

4. [JSON-LD](https://en.wikipedia.org/wiki/JSON-LD) meta tags

5. HTML Generic tags (`h1`, `p`, `img`)

6. URL readable parts
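
If the numbered sources are read as a fallback order (an assumption, since the list above only names them), the last resort is a title derived from the URL itself. The following is a minimal standard-library sketch of that idea. `title_from_url` and `first_non_empty` are hypothetical helpers, not part of `linkpreview`, and the library's own logic may differ.

```python
from urllib.parse import urlparse, unquote


def title_from_url(url: str) -> str:
    """Build a readable title from the last path segment, or the host if the path is empty."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        return parsed.netloc
    last = unquote(segments[-1])
    # Drop a file extension such as ".html", then turn separators into spaces.
    stem = last.rsplit(".", 1)[0] if "." in last else last
    return stem.replace("-", " ").replace("_", " ").strip()


def first_non_empty(*candidates):
    """Return the first candidate that is neither None nor empty."""
    for value in candidates:
        if value:
            return value
    return None


# Hypothetical values as they might come from each source, highest priority first.
og_title = None
twitter_title = ""
url = "https://example.com/blog/my-first-post.html"
title = first_non_empty(og_title, twitter_title, title_from_url(url))
print(title)
```

Here both meta-tag sources are empty, so the title falls through to the URL-derived value.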

## Install

```
pip install linkpreview
```

## Usage

### Basic

```python
from linkpreview import link_preview

url = "http://localhost"

# Minimal HTML document; the preview title is read from its <title> tag.
content = """
<html>
  <head>
    <title>a title</title>
  </head>
  <body>
  </body>
</html>
"""

preview = link_preview(url, content)
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
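
The `absolute_image` and `absolute_favicon` attributes presumably resolve a relative path against the page URL. The sketch below shows that kind of resolution with the standard library's `urljoin`. It illustrates the concept only, it is not the library's code, and the paths are made up.

```python
from urllib.parse import urljoin

page_url = "http://localhost/articles/post.html"

# Hypothetical image paths as they might appear in a page's meta tags.
root_relative = "/static/cover.png"
page_relative = "img/cover.png"
already_absolute = "https://cdn.example.com/cover.png"

print(urljoin(page_url, root_relative))     # resolved against the host root
print(urljoin(page_url, page_relative))     # resolved against the page's directory
print(urljoin(page_url, already_absolute))  # absolute URLs pass through unchanged
```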

### Fetch link content automatically

```python
from linkpreview import link_preview

preview = link_preview("http://github.com/")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```

### `lxml` as parser

Highly recommended for better performance.

[Install](https://lxml.de/installation.html) `lxml` and use it like this:

```python
from linkpreview import link_preview

preview = link_preview("http://github.com/", parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```

### Advanced

```python
from linkpreview import Link, LinkPreview, LinkGrabber

url = "http://github.com"
grabber = LinkGrabber(
    initial_timeout=20,
    maxsize=1048576,
    receive_timeout=10,
    chunk_size=1024,
)
content, url = grabber.get_content(url)
link = Link(url, content)
preview = LinkPreview(link, parser="lxml")
print("title:", preview.title)
print("description:", preview.description)
print("image:", preview.image)
print("force_title:", preview.force_title)
print("absolute_image:", preview.absolute_image)
print("site_name:", preview.site_name)
print("favicon:", preview.favicon)
print("absolute_favicon:", preview.absolute_favicon)
```
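
The names `maxsize` and `chunk_size` suggest that the response body is read in chunks and abandoned once it grows past a size cap. That reading is an assumption, since the grabber's internals are not described here. The sketch below shows the general technique with the standard library. `read_capped` and `ContentTooLarge` are hypothetical names, not part of `linkpreview`.

```python
import io


class ContentTooLarge(Exception):
    """Hypothetical error raised when the body exceeds the size cap."""


def read_capped(stream, maxsize=1048576, chunk_size=1024):
    """Read a binary stream chunk by chunk, refusing to buffer more than maxsize bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        total += len(chunk)
        if total > maxsize:
            raise ContentTooLarge(f"body exceeds {maxsize} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# An in-memory stream stands in for a network response.
body = read_capped(io.BytesIO(b"x" * 3000), maxsize=4096, chunk_size=1024)
print(len(body))
```

Capping the size this way keeps a very large or endless response from exhausting memory when only the page head is needed for a preview.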

Extend default headers:

```python
content, url = grabber.get_content(url, headers={'user-agent': 'Twitterbot'})
```

Ignore default headers:

```python
content, url = grabber.get_content(
    url,
    headers={'user-agent': 'Twitterbot', 'accept': '*/*'},
    replace_headers=True,
)
```
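
The difference between extending and replacing the defaults can be pictured as a dictionary merge. This is a sketch only: `DEFAULT_HEADERS` and `build_headers` are hypothetical, and the library's real default headers are not listed in this README.

```python
# Hypothetical defaults, used only to illustrate the two modes.
DEFAULT_HEADERS = {"user-agent": "example-agent/1.0", "accept": "text/html"}


def build_headers(headers=None, replace_headers=False):
    """Return the headers to send: custom ones merged over the defaults, or custom ones alone."""
    if replace_headers:
        return dict(headers or {})
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers or {})  # custom values win on key conflicts
    return merged


extended = build_headers({"user-agent": "Twitterbot"})
replaced = build_headers({"user-agent": "Twitterbot"}, replace_headers=True)
print(extended)
print(replaced)
```

In the extended case the hypothetical `accept` default survives next to the custom `user-agent`. In the replaced case only the headers passed in are sent.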

Use preset headers:

```python
content, url = grabber.get_content(url, headers='googlebot')
```

Available presets:

`firefox`, `chrome`, `googlebot`, `twitterbot`, `telegrambot`, `imessagebot`

If you already have a parsed `BeautifulSoup` object:

```python
from bs4 import BeautifulSoup
from linkpreview import Link, LinkPreview

url = "http://example.com"
content = "<h1>Hello</h1>"
soup = BeautifulSoup(content, "html.parser")
link = Link(url, content)
preview = LinkPreview(link, soup=soup)
print("title:", preview.title)
```
