https://github.com/mariocesar/htmlmetadata

Extract metadata from html pages using Open Graph metadata, HTML metadata, and a series of fallbacks
metadata-extraction metatags python-module python3 schema-org

Host: GitHub
URL: https://github.com/mariocesar/htmlmetadata
Owner: mariocesar
License: mit
Created: 2019-11-20T01:42:13.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2021-10-01T21:39:15.000Z (almost 5 years ago)
Last Synced: 2025-04-13T23:15:39.743Z (over 1 year ago)
Topics: metadata-extraction, metatags, python-module, python3, schema-org
Language: Python
Size: 24.4 KB
Stars: 4
Watchers: 0
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# HTMLmetadata

Extract metadata from HTML pages using Open Graph metadata, HTML metadata, and a series of fallbacks.

> Inspired by https://metascraper.js.org

# Install

```bash
pip install htmlmetadata
```

# Use

You can use it by calling the module directly.

```
python -m htmlmetadata http://schema.org/docs/about.html
{
  "request": {
    "url": "http://schema.org/docs/about.html"
  },
  "summary": {
    "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n    structured data on their web pages for use by search engines and other applications.",
    "title": "about page - schema.org",
    "language": "en"
  }
}
```

Or use it directly in your code.

```python
from htmlmetadata import extract_metadata

data = extract_metadata("http://schema.org/docs/about.html")
```
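
To illustrate the general idea of "Open Graph first, then plain HTML metadata, then fallbacks", here is a minimal sketch that uses only the standard library. It is not the code of `htmlmetadata`; the names `MetaCollector`, `first_of`, and `summarize` are hypothetical, and the fallback order shown is an assumption made for the example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> values, the <title> text, and the <html lang> attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # maps a meta property/name to its content
        self.title = ""
        self.lang = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.lang = attrs.get("lang")
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # Keep the first value seen for each key.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def first_of(*candidates):
    """Return the first non-empty candidate, or None if all are empty."""
    return next((c for c in candidates if c), None)


def summarize(html):
    """Build a small summary, preferring Open Graph values (assumed order)."""
    parser = MetaCollector()
    parser.feed(html)
    meta = parser.meta
    return {
        "title": first_of(meta.get("og:title"), parser.title.strip()),
        "description": first_of(meta.get("og:description"), meta.get("description")),
        "language": parser.lang,
    }


sample = """<html lang="en"><head>
<title>about page - schema.org</title>
<meta name="description" content="Schema.org is a set of extensible schemas.">
</head><body></body></html>"""

print(summarize(sample))
```

The sample page has no Open Graph tags, so every field comes from a fallback: the `<title>` element, the `description` meta tag, and the `lang` attribute.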
