https://github.com/zembrodt/pymdb
Python package to both parse datasets provided by IMDb and scrape information from imdb.com
actor actress api cinema composer director film imdb imdb-api imdb-dataset imdb-movies movie-database moviedb-api movies movies-api pymdb tvdb webscraper webscrapping writer
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/zembrodt/pymdb
- Owner: zembrodt
- License: mit
- Created: 2019-09-28T16:11:23.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T07:43:52.000Z (over 3 years ago)
- Last Synced: 2025-08-19T05:23:19.232Z (10 months ago)
- Topics: actor, actress, api, cinema, composer, director, film, imdb, imdb-api, imdb-dataset, imdb-movies, movie-database, moviedb-api, movies, movies-api, pymdb, tvdb, webscraper, webscrapping, writer
- Language: Python
- Homepage: https://pymdb.com
- Size: 377 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# PyMDb
PyMDb is a package for both parsing the [datasets provided by IMDb](https://datasets.imdbws.com/) and scraping information from their web pages.
This package can gather information on people, titles, and companies provided by IMDb. It is split into two separate modules: one for parsing the IMDb datasets and one for scraping web pages on [imdb.com](http://imdb.com/).
## Installation
The latest release of PyMDb can be installed from PyPI with:
```pip install py-mdb```
If you download the source from GitHub instead, PyMDb requires the following packages:
- [requests](https://github.com/psf/requests)
- [selectolax](https://github.com/rushter/selectolax)
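When working from a source checkout, it can help to confirm that both dependencies are importable before using the package. The snippet below is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name and is not part of PyMDb.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["requests", "selectolax"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```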
## Usage
```python
>>> import pymdb
>>> from collections import defaultdict
>>>
>>> parser = pymdb.PyMDbParser(gunzip_files=True)
>>> genre_count = defaultdict(int)
>>> for title in parser.get_title_basics("path/to/files"):
...     for genre in title.genres:
...         genre_count[genre] += 1
...
>>> for genre in genre_count:
...     print(f"{genre}: {genre_count[genre]}")
...
Documentary: 600184
Short: 837912
Animation: 312227
...
Talk-Show: 584252
Reality-TV: 307037
Adult: 178493
>>>
>>> scraper = pymdb.PyMDbScraper(rate_limit=500)
>>> title = scraper.get_title("tt0076759")
>>> print(f"{title.display_title} came out in {title.release_date.year}!")
Star Wars: Episode IV - A New Hope came out in 1977!
```
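To make the genre count above more concrete, here is a rough standard-library sketch of the same tally over data in the `title.basics` layout published by IMDb (tab-separated, a header row, `\N` for missing values, comma-separated genres). It is an illustration only and not how PyMDb is implemented; `count_genres` and the sample rows are hypothetical and the rows are not real IMDb data.

```python
import csv
import io
from collections import Counter


def count_genres(tsv_file):
    """Count genre occurrences in a title.basics-style TSV stream."""
    # Quoting is disabled because titles may contain literal quote characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        genres = row["genres"]
        if genres == "\\N":  # IMDb marks missing values with \N
            continue
        counts.update(genres.split(","))
    return counts


# Made-up sample rows in the title.basics column layout (not real IMDb data).
HEADER = ["tconst", "titleType", "primaryTitle", "originalTitle", "isAdult",
          "startYear", "endYear", "runtimeMinutes", "genres"]
ROWS = [
    ["tt9999991", "short", "Example A", "Example A", "0", "1999", "\\N", "5",
     "Documentary,Short"],
    ["tt9999992", "short", "Example B", "Example B", "0", "2001", "\\N", "7",
     "Animation,Short"],
    ["tt9999993", "movie", "Example C", "Example C", "0", "2005", "\\N", "90",
     "\\N"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER, *ROWS]) + "\n"

genre_count = count_genres(io.StringIO(SAMPLE))
for genre, total in genre_count.items():
    print(f"{genre}: {total}")
```

For a real dataset file, the same function could be fed a handle from `gzip.open(path, "rt", encoding="utf-8", newline="")` instead of the in-memory sample.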
## Documentation
Full documentation can be found at the [PyMDb Read the Docs](https://pymdb.readthedocs.io/) page.
## Disclaimer
PyMDb is still in a pre-release state and has only been tested with a small amount of data found on [imdb.com](http://imdb.com/).
The web scraper includes a rate limiter whose value you can customize; please be kind to IMDb.
If any bugs or issues are found, please do not hesitate to create an issue or make a pull request on [GitHub](https://github.com/zembrodt/pymdb).
Suggestions for features to be added to PyMDb in future releases are also welcome!
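
As background on the rate limiter mentioned above, the sketch below shows one common way to space out requests: enforcing a minimum interval between successive calls. This is not PyMDb's implementation, and this README does not state the unit of `rate_limit`; the class name `MinIntervalLimiter` and the millisecond unit are assumptions made purely for illustration.

```python
import time


class MinIntervalLimiter:
    """Hypothetical helper: enforce a minimum delay between successive calls."""

    def __init__(self, min_interval_ms, clock=time.monotonic, sleep=time.sleep):
        # Assumption: the interval is given in milliseconds.
        self.min_interval = min_interval_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


if __name__ == "__main__":
    limiter = MinIntervalLimiter(500)
    for i in range(3):
        limiter.wait()  # call this before each outgoing request
        print(f"request {i} allowed")
```

The clock and sleep functions are injectable so the behaviour can be checked without real delays.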
## License
This project is licensed under the MIT License. Please see the [LICENSE](https://github.com/zembrodt/pymdb/blob/master/LICENSE) file for details.