Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mariocesar/htmlmetadata
Extract metadata from html pages using Open Graph metadata, HTML metadata, and a series of fallbacks
metadata-extraction metatags python-module python3 schema-org
- Host: GitHub
- URL: https://github.com/mariocesar/htmlmetadata
- Owner: mariocesar
- License: mit
- Created: 2019-11-20T01:42:13.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-10-01T21:39:15.000Z (over 3 years ago)
- Last Synced: 2024-10-11T09:23:43.312Z (3 months ago)
- Topics: metadata-extraction, metatags, python-module, python3, schema-org
- Language: Python
- Size: 24.4 KB
- Stars: 3
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# HTMLmetadata
Extract metadata from HTML pages using Open Graph metadata, HTML metadata, and a series of fallbacks.

> Inspired by https://metascraper.js.org
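
To illustrate what "a series of fallbacks" can mean, here is a minimal sketch using only the standard library. It is not htmlmetadata's actual implementation: the `MetaCollector` class and the `first_available` and `summarize` helpers are hypothetical names, and the chosen order (Open Graph first, then plain HTML tags) is an assumption.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> tag contents and the <title> text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # maps a meta tag's property/name to its content
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                # Keep the first value seen for each key.
                self.meta.setdefault(key.lower(), attrs["content"].strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.title = (self.title or "") + data.strip()


def first_available(*candidates):
    """Return the first non-empty candidate, or None if all are empty."""
    return next((c for c in candidates if c), None)


def summarize(html):
    """Prefer Open Graph values, then fall back to plain HTML metadata."""
    parser = MetaCollector()
    parser.feed(html)
    return {
        "title": first_available(parser.meta.get("og:title"), parser.title),
        "description": first_available(
            parser.meta.get("og:description"), parser.meta.get("description")
        ),
    }


page = """<html><head><title>about page - schema.org</title>
<meta name="description" content="Schema.org is a set of extensible schemas."></head></html>"""
print(summarize(page))
```

The sample page has no Open Graph tags, so both fields come from the fallbacks: the `<title>` element and the plain `description` meta tag.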
# Install
```bash
pip install htmlmetadata
```

# Use
You can use it by calling the module directly.
```
python -m htmlmetadata http://schema.org/docs/about.html
{
"request": {
"url": "http://schema.org/docs/about.html"
},
"summary": {
"description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n structured data on their web pages for use by search engines and other applications.",
"title": "about page - schema.org",
"language": "en"
}
}
```

Or use it directly in your code.
```python
from htmlmetadata import extract_metadata

data = extract_metadata("http://schema.org/docs/about.html")
```
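
If you capture the command-line output, it can be read like any other JSON document. The sketch below parses the sample output shown earlier and assumes only the `request` and `summary` keys that appear there. The README does not say whether `extract_metadata` returns the same structure, so treat that as unconfirmed.

```python
import json

# Sample output copied from the command-line example in this README.
# A raw string keeps the JSON "\n" escape intact for json.loads.
raw_output = r"""
{
  "request": {"url": "http://schema.org/docs/about.html"},
  "summary": {
    "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n structured data on their web pages for use by search engines and other applications.",
    "title": "about page - schema.org",
    "language": "en"
  }
}
"""

result = json.loads(raw_output)
summary = result["summary"]

# Use .get() with defaults, since a page may lack some fields.
title = summary.get("title", "(untitled)")
language = summary.get("language", "unknown")

# Collapse the embedded newline and extra spaces in the description.
description = " ".join(summary.get("description", "").split())

print(f"{title} [{language}]")
print(description)
```

Reading fields with `.get()` is a defensive choice here, on the assumption that not every page will yield a title, description, or language.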