https://github.com/zembrodt/pymdb

Python package to both parse datasets provided by IMDb and scrape information from imdb.com
Host: GitHub
URL: https://github.com/zembrodt/pymdb
Owner: zembrodt
License: mit
Created: 2019-09-28T16:11:23.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-12-08T07:43:52.000Z (over 3 years ago)
Last Synced: 2025-08-19T05:23:19.232Z (10 months ago)
Topics: actor, actress, api, cinema, composer, director, film, imdb, imdb-api, imdb-dataset, imdb-movies, movie-database, moviedb-api, movies, movies-api, pymdb, tvdb, webscraper, webscrapping, writer
Language: Python
Homepage: https://pymdb.com
Size: 377 KB
Stars: 6
Watchers: 1
Forks: 0
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# PyMDb

[![PyPI](https://img.shields.io/pypi/v/py-mdb.svg)](https://pypi.org/project/py-mdb/)

[![Python Versions](https://img.shields.io/pypi/pyversions/py-mdb.svg)](https://pypi.org/project/py-mdb/)

[![License](https://img.shields.io/pypi/l/py-mdb.svg)](https://github.com/zembrodt/pymdb/blob/master/LICENSE)

[![Build Status](https://travis-ci.com/zembrodt/pymdb.svg?branch=master)](https://travis-ci.com/zembrodt/pymdb)

PyMDb is a package for both parsing the [datasets provided by IMDb](https://datasets.imdbws.com/) and scraping information from their web pages.

This package is able to gather information on people, titles, and companies provided by IMDb and is split into two separate modules: one for parsing the IMDb datasets, and one for scraping webpages on [imdb.com](http://imdb.com/).

## Installation

The latest release of PyMDb can be installed from PyPI with:

```pip install py-mdb```

If downloading the source from GitHub, PyMDb requires the following packages:

- [requests](https://github.com/psf/requests)

- [selectolax](https://github.com/rushter/selectolax)

## Usage

```python
>>> import pymdb
>>> from collections import defaultdict
>>>
>>> parser = pymdb.PyMDbParser(gunzip_files=True)
>>> genre_count = defaultdict(int)
>>> for title in parser.get_title_basics("path/to/files"):
...     for genre in title.genres:
...         genre_count[genre] += 1
...
>>> for genre in genre_count:
...     print(f"{genre}: {genre_count[genre]}")
...
Documentary: 600184
Short: 837912
Animation: 312227
    ...
Talk-Show: 584252
Reality-TV: 307037
Adult: 178493
>>>
>>> scraper = pymdb.PyMDbScraper(rate_limit=500)
>>> title = scraper.get_title("tt0076759")
>>> print(f"{title.display_title} came out in {title.release_date.year}!")
Star Wars: Episode IV - A New Hope came out in 1977!
```
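To give a feel for the data the parser example above reads, here is a minimal standard-library sketch that counts genres in a gzipped, tab-separated file laid out like IMDb's `title.basics.tsv.gz`. It does not use PyMDb and is not how PyMDb is implemented. The sample rows and the `count_genres` helper are made up for illustration; the column names and the `\N` missing-value marker follow IMDb's published dataset format.

```python
import csv
import gzip
import io
from collections import Counter

# Hypothetical sample rows in the title.basics.tsv.gz column layout.
SAMPLE_TSV = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult"
    "\tstartYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tshort\tExample One\tExample One\t0\t1894\t\\N\t1\tDocumentary,Short\n"
    "tt0000002\tshort\tExample Two\tExample Two\t0\t1892\t\\N\t5\tAnimation,Short\n"
    "tt0000003\tmovie\tExample Three\tExample Three\t0\t1900\t\\N\t\\N\t\\N\n"
)


def count_genres(gz_stream):
    """Count genres in a gzipped title.basics-style TSV stream (hypothetical helper)."""
    counts = Counter()
    with gzip.open(gz_stream, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters in titles as ordinary text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row["genres"] != r"\N":  # \N marks a missing value
                counts.update(row["genres"].split(","))
    return counts


# Compress the sample in memory so the sketch runs without any downloaded files.
compressed = io.BytesIO(gzip.compress(SAMPLE_TSV.encode("utf-8")))
genre_count = count_genres(compressed)
for genre, total in genre_count.most_common():
    print(f"{genre}: {total}")
```

Using `Counter.most_common()` prints the genres from most to least frequent, which may be easier to read than the insertion order of a plain dictionary.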

## Documentation

Full documentation can be found at the [PyMDb Read the Docs](https://pymdb.readthedocs.io/) page.

## Disclaimer

PyMDb is still in a pre-release state and has only been tested with a small amount of data found on [imdb.com](http://imdb.com/).

The web scraper has a customizable rate limit. Please be kind to IMDb.
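As a rough illustration of what a rate limit does, the sketch below enforces a minimum gap between consecutive calls. It is a generic stand-in, not PyMDb's implementation: the `MinIntervalLimiter` class and its `interval_ms` parameter are hypothetical, and it assumes the limit is a delay in milliseconds, which this README does not state.

```python
import time


class MinIntervalLimiter:
    """Hypothetical helper: keep at least `interval_ms` between calls to wait()."""

    def __init__(self, interval_ms, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval_ms / 1000.0  # assumed unit: milliseconds
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to honor the interval; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.interval - elapsed)
            if delay:
                self._sleep(delay)
        self._last = self._clock()
        return delay


limiter = MinIntervalLimiter(interval_ms=50)
first = limiter.wait()   # no previous call, so no delay
second = limiter.wait()  # called right away, so this one sleeps
```

Calling `wait()` before each request would space requests out by at least the chosen interval, and a larger value is gentler on the remote site.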

If any bugs or issues are found, please do not hesitate to create an issue or make a pull request on [GitHub](https://github.com/zembrodt/pymdb).

Suggestions for features to be added to PyMDb in future releases are also welcome!

## License

This project is licensed under the MIT License. Please see the [LICENSE](https://github.com/zembrodt/pymdb/blob/master/LICENSE) file for details.
