https://github.com/hackersandslackers/jsonld-scraper-tutorial
🌎 🖥 Supercharge your scraper to extract quality page metadata by parsing JSON-LD data via Python's extruct library.
- Host: GitHub
- URL: https://github.com/hackersandslackers/jsonld-scraper-tutorial
- Owner: hackersandslackers
- License: mit
- Created: 2020-07-02T12:32:38.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-04-20T22:03:01.000Z (6 months ago)
- Last Synced: 2025-04-28T11:27:15.711Z (5 months ago)
- Topics: beautifulsoup, extruct, json-ld, python, scraper, structured-data, tutorial
- Language: Python
- Homepage: https://hackersandslackers.com/scrape-metadata-json-ld/
- Size: 503 KB
- Stars: 14
- Watchers: 1
- Forks: 2
- Open Issues: 24
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Codeowners: .github/CODEOWNERS
# Structured Data Scraping Tutorial




Supercharge your scraper to extract quality page metadata by parsing JSON-LD data via Python's [extruct](https://github.com/scrapinghub/extruct) library.
This repository contains source code for the accompanying tutorial on Hackers and Slackers: https://hackersandslackers.com/scrape-metadata-json-ld/
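To make the idea concrete, here is a minimal sketch of what "parsing JSON-LD" means, written with only the Python standard library. It is not the repository's code and does not use extruct; the `JSONLDParser` class, the `extract_jsonld` function, and the `SAMPLE_HTML` page are all hypothetical names invented for illustration. It assumes the page embeds its metadata in `<script type="application/ld+json">` tags, which is the usual JSON-LD convention.

```python
import json
from html.parser import HTMLParser


class JSONLDParser(HTMLParser):
    """Collect the decoded contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False  # True while inside a JSON-LD script tag
        self._buffer = []        # text chunks of the current script tag
        self.items = []          # decoded JSON-LD objects found so far

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return a list of JSON-LD objects embedded in an HTML string."""
    parser = JSONLDParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page used only for this illustration.
SAMPLE_HTML = """
<html><head>
<title>Example</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Hello", "author": {"@type": "Person", "name": "Jane Doe"}}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_HTML):
        print(item["@type"], "-", item["headline"])
```

extruct covers this case and several other metadata syntaxes in a single call, typically something like `extruct.extract(html, base_url=url, syntaxes=["json-ld"])`, so the sketch above is only meant to show what is being extracted, not to replace the library.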
## Installation
**Installation via `requirements.txt`**:
```shell
$ git clone https://github.com/hackersandslackers/jsonld-scraper-tutorial.git
$ cd jsonld-scraper-tutorial
$ python3 -m venv myenv
$ source myenv/bin/activate
$ pip3 install -r requirements.txt
$ python3 main.py
```

**Installation via [Pipenv](https://pipenv-fork.readthedocs.io/en/latest/)**:
```shell
$ git clone https://github.com/hackersandslackers/jsonld-scraper-tutorial.git
$ cd jsonld-scraper-tutorial
$ pipenv shell
$ pipenv update
$ python3 main.py
```

**Installation via [Poetry](https://python-poetry.org/)**:
```shell
$ git clone https://github.com/hackersandslackers/jsonld-scraper-tutorial.git
$ cd jsonld-scraper-tutorial
$ poetry shell
$ poetry update
$ poetry run python3 main.py
```

## Usage
To change the URL targeted by this script, update the `URL` variable in **config.py**.
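As a rough sketch of that edit, assuming **config.py** stores `URL` as a plain module-level string: only the `URL` name comes from this README, while the placeholder address and the `is_valid_url` helper are hypothetical additions for illustration.

```python
from urllib.parse import urlparse

# Page to scrape; replace with the address you want to target.
# Placeholder value for illustration, not the repository's default.
URL = "https://example.com/some-article/"


def is_valid_url(url: str) -> bool:
    """Return True when the string looks like an absolute http(s) URL."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    print(URL, "looks valid:", is_valid_url(URL))
```

A quick check like this catches a missing scheme (for example `example.com/page`) before the scraper tries to fetch it.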

-----
**Hackers and Slackers** tutorials are free of charge. If you found this tutorial helpful, a [small donation](https://www.buymeacoffee.com/hackersslackers) would be greatly appreciated to keep us in business. All proceeds go towards coffee, and all coffee goes towards more content.