https://github.com/harel/metatron
A Python 3.x HTML Meta tag parser, with emphasis on OpenGraph and complex meta tag schemes
- Host: GitHub
- URL: https://github.com/harel/metatron
- Owner: harel
- License: mit
- Created: 2018-01-30T10:07:55.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2021-01-02T23:36:06.000Z (about 5 years ago)
- Last Synced: 2025-09-24T23:51:34.086Z (4 months ago)
- Topics: meta-tags, opengraph, parser, python, python3, twitter-card
- Language: Python
- Homepage:
- Size: 18.6 KB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE.txt
.. image:: https://badge.fury.io/py/metatron.svg
    :target: https://badge.fury.io/py/metatron

Metatron
========
HTML Meta tag parser, with emphasis on OpenGraph/Twitter Cards, and
complex (and custom) meta tag schemes. Supports Python 3.x and up.
The Metatron object extends dict, and all parsed meta tag data is stored in it.
Installation
------------
You know the drill:

::

    pip install metatron

Straight to an example
----------------------
Simple meta tag collector
^^^^^^^^^^^^^^^^^^^^^^^^^
This example collects top-level meta tags without a schema:

::

    > from metatron import Metatron
    > mt = Metatron(url='https://www.bbc.co.uk')
    > mt.traverse()
    {
        'x-country': 'gb',
        'x-audience': 'Domestic',
        'CPS_AUDIENCE': 'Domestic',
        'CPS_CHANGEQUEUEID': '115204091',
        'theme-color': 'bb1919',
        'msapplication-TileColor': '#bb1919'
    }
    > mt['x-country']
    'gb'

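To make the idea of a flat, schema-less collection concrete, here is a minimal sketch that uses only the standard library. It is not Metatron's implementation; ``FlatMetaCollector`` and ``collect_flat_meta`` are hypothetical names, and the sketch assumes that "top level" means ``meta`` tags carrying both a ``name`` and a ``content`` attribute.

```python
from html.parser import HTMLParser


class FlatMetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs into a plain dict.

    Hypothetical helper for illustration; not part of Metatron.
    """

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags that lack either a name or a content attribute,
        # e.g. OpenGraph tags that use property="og:..." instead.
        if name and content is not None:
            self.tags[name] = content


def collect_flat_meta(html):
    """Return a dict of every named meta tag found in the given HTML."""
    collector = FlatMetaCollector()
    collector.feed(html)
    return collector.tags


sample = """
<head>
  <meta name="x-country" content="gb">
  <meta name="theme-color" content="bb1919">
  <meta property="og:title" content="ignored here">
</head>
"""
flat = collect_flat_meta(sample)
print(flat)
```

The result is an ordinary dict keyed by tag name, which is the same access pattern as ``mt['x-country']`` above.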
Collect structured OpenGraph meta tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::

    > mt = Metatron(url='https://www.bbc.co.uk', schemas=['og'])
    > mt.traverse()
    {
        'og': {
            'title': 'Home - BBC News',
            'description': 'Visit BBC News for up-to-the-minute news....',
            'site_name': 'BBC News',
            'locale': 'en_GB',
            'article': {
                'author': 'https://www.facebook.com/bbcnews',
                'section': 'Home'
            },
            'url': 'http://www.bbc.co.uk/news',
            'image': '//m.files.bbci.co.uk/modules/bbc-morph-news-waf-page-meta/2.1.0/bbc_news_logo.png'
        }
    }

Supports OpenGraph arrays (and can receive content as input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
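::

    > content = """
        <meta property="og:title" content="First title tag" />
        <meta property="og:title" content="Second title tag" />
        <meta property="og:description" content="description tag" />
        <meta property="og:image" content="http://example.com/image.jpg" />
        <meta property="og:image:secure_url" content="https://secure.example.com/image.jpg" />
        <meta property="og:image:type" content="image/jpeg" />
        <meta property="og:image:width" content="400" />
        <meta property="og:image:height" content="300" />
        <meta property="og:image:alt" content="First image description" />
        <meta property="og:image" content="http://example.com/image2.jpg" />
        <meta property="og:image:secure_url" content="https://secure.example.com/ogp.jpg" />
        <meta property="og:image:type" content="image/jpeg" />
        <meta property="og:image:width" content="500" />
        <meta property="og:image:height" content="600" />
        <meta property="og:image:alt" content="A shiny green apple with a bite taken out" />
      """
    > mt = Metatron(content=content, schemas=['og'])
    > mt.traverse()
    {
        'og': {
            'description': 'description tag',
            'image': [
                {
                    'alt': 'First image description',
                    'height': '300',
                    'image': 'http://example.com/image.jpg',
                    'secure_url': 'https://secure.example.com/image.jpg',
                    'type': 'image/jpeg',
                    'width': '400'
                },
                {
                    'alt': 'A shiny green apple with a bite taken out',
                    'height': '600',
                    'image': 'http://example.com/image2.jpg',
                    'secure_url': 'https://secure.example.com/ogp.jpg',
                    'type': 'image/jpeg',
                    'width': '500'
                }
            ],
            'title': [
                'First title tag',
                'Second title tag'
            ]
        }
    }
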
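The array output above follows from the OpenGraph convention that a repeated property starts a new item and that a structured property such as ``og:image:width`` belongs to the most recent ``og:image``. The sketch below shows one way to implement that grouping with the standard library only. It is an illustration under those assumptions, not Metatron's code, and ``OGPairs`` and ``group_og`` are hypothetical names.

```python
from html.parser import HTMLParser


class OGPairs(HTMLParser):
    """Record (property, content) pairs for og:* meta tags, in document order."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            # Store the key without its "og:" prefix.
            self.pairs.append((prop[3:], attributes.get("content") or ""))


def group_og(pairs):
    """Group ordered OpenGraph pairs into strings, dicts, and lists."""
    result = {}
    for key, value in pairs:
        root, _, sub = key.partition(":")
        if not sub:
            # Plain property: a repeat turns the entry into a list.
            if root not in result:
                result[root] = value
            elif isinstance(result[root], list):
                result[root].append(value)
            else:
                result[root] = [result[root], value]
            continue
        # Structured property: attach it to the most recent item for this root.
        current = result.get(root)
        if current is None:
            result[root] = {sub: value}
        elif isinstance(current, list):
            if not isinstance(current[-1], dict):
                current[-1] = {root: current[-1]}
            current[-1][sub] = value
        elif isinstance(current, dict):
            current[sub] = value
        else:
            # Promote a bare string to a dict keyed by the root name.
            result[root] = {root: current, sub: value}
    return result


sample = """
<meta property="og:title" content="First title tag" />
<meta property="og:title" content="Second title tag" />
<meta property="og:image" content="http://example.com/image.jpg" />
<meta property="og:image:width" content="400" />
<meta property="og:image" content="http://example.com/image2.jpg" />
<meta property="og:image:width" content="500" />
"""
parser = OGPairs()
parser.feed(sample)
grouped = group_og(parser.pairs)
print(grouped)
```

Because grouping depends on the most recent root property, tag order in the page matters: a ``og:image:width`` that appears before any ``og:image`` would be stored on its own rather than attached to an image.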
Add your own meta tag schema
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can provide Metatron with your own meta tag schema spec:

::

    > from metatron import add_schema_spec
    > my_spec = {
        'name': 'boom',
        'attribute': 'name',
        'value': 'value'
    }
    > add_schema_spec(my_spec)

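You can then parse tags like:

::

    <meta name="boom:title" value="Boom title" />
    <meta name="boom:description" value="Boom description" />

Using:

::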

    > mt = Metatron(url='http://example.com', schemas=['boom'])
    > mt.traverse()
    {
        'boom': {
            'title': 'Boom title',
            'description': 'Boom description'
        }
    }

Note that `add_schema_spec` accepts an optional second argument: the name to register the schema under.
This is useful if you have multiple schema definitions with the same name (or no name).
Custom schema - custom tag
^^^^^^^^^^^^^^^^^^^^^^^^^^
You can add a `tag` key to the custom schema spec
to look up tags other than `meta`.
If it is not provided, `meta` is used by default.
*Added in 0.4*
::

    > from metatron import add_schema_spec
    > my_spec = {
        'tag': 'link',
        'name': '',
        'attribute': 'rel',
        'value': 'href'
    }
    > # Register the schema as 'link' to avoid duplicating existing no-name schema
    > add_schema_spec(my_spec, 'link')
    > content = """
    """
    > mt = Metatron(content=content, schemas=['link'])
    > mt.traverse()

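To illustrate how the four spec keys (`tag`, `name`, `attribute`, `value`) could drive a lookup, here is a standard-library sketch. It is a guess at the general idea, not Metatron's implementation: ``SpecCollector`` and ``collect`` are hypothetical names, the sample markup is invented, and the shape of the result Metatron actually returns for a ``link`` schema is not shown in this README.

```python
from html.parser import HTMLParser


class SpecCollector(HTMLParser):
    """Collect values for one schema spec (hypothetical helper, not Metatron's code)."""

    def __init__(self, spec):
        super().__init__()
        self.tag = spec.get("tag", "meta")  # `meta` is the default tag
        self.prefix = spec["name"]          # e.g. 'boom', or '' for no prefix
        self.attribute = spec["attribute"]  # attribute holding the key
        self.value = spec["value"]          # attribute holding the value
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        attributes = dict(attrs)
        key = attributes.get(self.attribute)
        value = attributes.get(self.value)
        if key is None or value is None:
            return
        if self.prefix:
            # Named schema: keep only keys such as "boom:title".
            head, sep, rest = key.partition(":")
            if head != self.prefix or not sep:
                return
            key = rest
        self.found[key] = value


def collect(html, spec):
    """Run one spec over the HTML and return the collected key/value pairs."""
    collector = SpecCollector(spec)
    collector.feed(html)
    return collector.found


page = """
<link rel="canonical" href="https://example.com/page">
<link rel="icon" href="/favicon.ico">
<meta name="boom:title" value="Boom title">
"""
link_spec = {"tag": "link", "name": "", "attribute": "rel", "value": "href"}
boom_spec = {"name": "boom", "attribute": "name", "value": "value"}

links = collect(page, link_spec)
boom = collect(page, boom_spec)
print(links)
print(boom)
```

The same function handles both specs: the ``link`` spec matches on tag name alone because its `name` is empty, while the ``boom`` spec filters ``meta`` tags by prefix.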
Can run from the command line
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::

    $ make run URL=http://bbc.co.uk/news SCHEMA=og

or

::

    $ python -m metatron.metatron http://bbc.co.uk/news og
    Getting: http://bbc.co.uk/news (schemas: og)
    {'og': {'description': 'Visit BBC News for up-to-the-minute news, breaking '
                           'news, video, audio and feature stories. BBC News '
                           'provides trusted World and UK news as well as local '
                           'and regional perspectives. Also entertainment, '
                           'business, science, technology and health news.',
            'image': '//m.files.bbci.co.uk/modules/bbc-morph-news-waf-page-meta/2.2.1/bbc_news_logo.png',
            'locale': 'en_GB',
            'section': 'Home',
            'site_name': 'BBC News',
            'title': 'Home - BBC News',
            'type': 'website',
            'url': 'http://www.bbc.co.uk/news'}}

Dependencies
^^^^^^^^^^^^
- requests
- beautifulsoup4