https://github.com/deflix-tv/imdb2meta
A service for getting movie and TV show metadata for an IMDb ID via HTTP or gRPC
go golang grpc http imdb imdb-dataset metadata web-service
- Host: GitHub
- URL: https://github.com/deflix-tv/imdb2meta
- Owner: Deflix-tv
- License: agpl-3.0
- Created: 2020-11-21T21:12:46.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2021-01-16T19:37:26.000Z (over 5 years ago)
- Last Synced: 2024-12-29T15:44:32.990Z (over 1 year ago)
- Topics: go, golang, grpc, http, imdb, imdb-dataset, metadata, web-service
- Language: Go
- Homepage:
- Size: 72.3 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# imdb2meta
A service for getting movie and TV show metadata for an IMDb ID via HTTP or gRPC, using the official IMDb datasets
## Content
- [Content](#content)
- [Usage](#usage)
  1. [Import data](#1-import-data)
  2. [Run service](#2-run-service)
  3. [Query service](#3-query-service)
- [Protocol buffer generation](#protocol-buffer-generation)
- [⚠ Warning](#⚠-warning)
## Usage
First, import the IMDb dataset into a database. Then start the web service, which is backed by that database. Finally, query the service via HTTP or gRPC.
### 1. Import data
First you need to import the data of the IMDb dataset into a database. We support [BadgerDB](https://github.com/dgraph-io/badger) and [bbolt](https://github.com/etcd-io/bbolt).
Steps:
1. Download the `title.basics.tsv.gz` dataset from
   - For more info about IMDb datasets see
   - > ⚠ Warning: `IMDb.com, Inc` is the copyright owner of the data in the IMDb datasets. You may only use the data for personal and non-commercial use. For more info see ["Can I use IMDb data in my software?"](https://help.imdb.com/article/imdb/general-information/can-i-use-imdb-data-in-my-software/G5JTRESSHJBBHTGX) and their [copyright/conditions of use](https://www.imdb.com/conditions) statement.
2. Extract the TSV file somewhere
3. Run the import tool with the appropriate CLI arguments
   - Example: `imdb2meta-import -tsvPath "/home/john/Downloads/data.tsv" -badgerPath "/home/john/imdb2meta/badger"`
> Note: The import takes a while (and much longer with bbolt than with BadgerDB), requires a lot of memory, and produces a fairly big database.
> With a 6-core, 12-thread CPU and a mid-range SSD, an import of all data (7351639 rows as of 2020-11-21) into BadgerDB takes 4 minutes and uses up to 1.03 GB of memory; the final DB size is 1.29 GB.
> When skipping TV episodes and storing only the minimal metadata, it takes 1 minute and 5 seconds and uses up to 530 MB of memory; the final DB size is 314 MB.
CLI reference:
```text
Usage of imdb2meta-import:
  -badgerPath string
        Path to the directory with the BadgerDB files
  -boltPath string
        Path to the bbolt DB file
  -limit int
        Limit the number of rows to process (excluding the header row)
  -minimal
        Only store minimal metadata (ID, type, title, release/start year)
  -skipEpisodes
        Skip storing individual TV episodes
  -skipMisc
        Skip title types like "videoGame", "audiobook" and "radioSeries"
  -tsvPath string
        Path to the "data.tsv" file that's inside the "title.basics.tsv.gz" archive
```
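To make the effect of `-skipEpisodes` more concrete, here is a minimal sketch of how one row of the TSV file could be parsed and filtered. It is not the import tool's actual code: the `Title` struct and the `parseRow` and `keep` functions are hypothetical names chosen for illustration. The column order follows IMDb's published `title.basics` layout (`tconst`, `titleType`, `primaryTitle`, `originalTitle`, `isAdult`, `startYear`, `endYear`, `runtimeMinutes`, `genres`), where `\N` marks a missing value. The second sample row is made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Title is a hypothetical struct holding a subset of one title.basics row.
type Title struct {
	ID           string
	Type         string
	PrimaryTitle string
	StartYear    int // 0 when the dataset lists the year as `\N`
	Genres       []string
}

// parseRow splits one tab-separated data row into a Title.
func parseRow(line string) (Title, error) {
	cols := strings.Split(line, "\t")
	if len(cols) != 9 {
		return Title{}, fmt.Errorf("expected 9 columns, got %d", len(cols))
	}
	title := Title{ID: cols[0], Type: cols[1], PrimaryTitle: cols[2]}
	if cols[5] != `\N` {
		year, err := strconv.Atoi(cols[5])
		if err != nil {
			return Title{}, fmt.Errorf("bad startYear %q: %w", cols[5], err)
		}
		title.StartYear = year
	}
	if cols[8] != `\N` {
		title.Genres = strings.Split(cols[8], ",")
	}
	return title, nil
}

// keep reports whether a title would be stored for the given flag value.
// Assumption: episodes are identified by the title type "tvEpisode".
func keep(title Title, skipEpisodes bool) bool {
	return !(skipEpisodes && title.Type == "tvEpisode")
}

func main() {
	rows := []string{
		"tt1254207\tshort\tBig Buck Bunny\tBig Buck Bunny\t0\t2008\t\\N\t10\tAnimation,Comedy,Short",
		// Made-up episode row, only here to show the filter.
		"tt0000000\ttvEpisode\tExample Episode\tExample Episode\t0\t\\N\t\\N\t\\N\t\\N",
	}
	for _, row := range rows {
		title, err := parseRow(row)
		if err != nil {
			fmt.Println("skipping malformed row:", err)
			continue
		}
		fmt.Printf("%s %q stored=%v\n", title.ID, title.PrimaryTitle, keep(title, true))
	}
}
```

A real importer would also need to skip the header row and tolerate rows that don't split cleanly into nine columns.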
### 2. Run service
After importing the data you can start the web service.
Example: `imdb2meta-service -badgerPath "/home/john/imdb2meta/badger"`
CLI reference:
```text
Usage of imdb2meta-service:
  -badgerPath string
        Path to the directory with the BadgerDB files
  -bindAddr string
        Local interface address to bind to. "localhost" only allows access from the local host. "0.0.0.0" binds to all network interfaces. (default "localhost")
  -boltPath string
        Path to the bbolt DB file
  -grpcPort int
        Port to listen on for gRPC requests (default 8081)
  -httpPort int
        Port to listen on for HTTP requests (default 8080)
```
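As a rough illustration of how the `-bindAddr`, `-httpPort` and `-grpcPort` flags relate to each other, the sketch below combines them into two listen addresses. It is an assumption about how such flags are typically wired up in Go, not the service's actual code; `listenAddrs` is a hypothetical helper, and only the flag names and defaults are taken from the CLI reference above.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
	"strconv"
)

// listenAddrs joins the bind address with the HTTP and gRPC ports.
// net.JoinHostPort also handles IPv6 literals by adding brackets.
func listenAddrs(bindAddr string, httpPort, grpcPort int) (httpAddr, grpcAddr string) {
	httpAddr = net.JoinHostPort(bindAddr, strconv.Itoa(httpPort))
	grpcAddr = net.JoinHostPort(bindAddr, strconv.Itoa(grpcPort))
	return httpAddr, grpcAddr
}

func main() {
	// Flag names and defaults mirror the CLI reference; the rest is a sketch.
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	bindAddr := fs.String("bindAddr", "localhost", "Local interface address to bind to")
	httpPort := fs.Int("httpPort", 8080, "Port to listen on for HTTP requests")
	grpcPort := fs.Int("grpcPort", 8081, "Port to listen on for gRPC requests")
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}

	httpAddr, grpcAddr := listenAddrs(*bindAddr, *httpPort, *grpcPort)
	fmt.Println("HTTP:", httpAddr)
	fmt.Println("gRPC:", grpcAddr)
}
```

With the defaults, both listeners are reachable only from the local host. Passing `-bindAddr "0.0.0.0"` makes them reachable on all network interfaces.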
#### Docker
You can also run the service as a Docker container.
1. Update the image: `docker pull doingodswork/imdb2meta-service`
2. Start the container: `docker run --name imdb2meta -v /path/to/badger:/data -p 8080:8080 -p 8081:8081 doingodswork/imdb2meta-service -badgerPath "/data"`
   - > Note: `Ctrl-C` only detaches from the container. It doesn't stop it.
   - When detached, you can attach again with `docker attach imdb2meta`
3. To stop the container: `docker stop imdb2meta`
4. To start the (still existing) container again: `docker start imdb2meta`
### 3. Query service
After starting the web service you can query it via HTTP or gRPC:
#### HTTP
Example request: `curl "http://localhost:8080/meta/tt1254207"`
Example response:
```json
{
  "id": "tt1254207",
  "titleType": "SHORT",
  "primaryTitle": "Big Buck Bunny",
  "startYear": 2008,
  "runtime": 10,
  "genres": [
    "Animation",
    "Comedy",
    "Short"
  ]
}
```
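The same request can be made from Go. The following is a minimal client sketch, assuming the service runs on `localhost:8080` as in the example above. The `Meta` struct only covers the fields visible in the example response (the service may return more), and `decodeMeta` and `fetchMeta` are hypothetical helper names, not part of this project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// Meta mirrors the fields shown in the example response.
// Fields not listed here are ignored during decoding.
type Meta struct {
	ID           string   `json:"id"`
	TitleType    string   `json:"titleType"`
	PrimaryTitle string   `json:"primaryTitle"`
	StartYear    int      `json:"startYear"`
	Runtime      int      `json:"runtime"`
	Genres       []string `json:"genres"`
}

// decodeMeta parses a JSON response body into a Meta value.
func decodeMeta(body []byte) (Meta, error) {
	var meta Meta
	err := json.Unmarshal(body, &meta)
	return meta, err
}

// fetchMeta requests /meta/<imdbID> from the service at baseURL.
func fetchMeta(client *http.Client, baseURL, imdbID string) (Meta, error) {
	resp, err := client.Get(baseURL + "/meta/" + url.PathEscape(imdbID))
	if err != nil {
		return Meta{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return Meta{}, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return Meta{}, err
	}
	return decodeMeta(body)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	meta, err := fetchMeta(client, "http://localhost:8080", "tt1254207")
	if err != nil {
		// For example, when the service isn't running locally.
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("%s (%d), %d min, %v\n", meta.PrimaryTitle, meta.StartYear, meta.Runtime, meta.Genres)
}
```

Keeping the decoding separate from the HTTP call makes it possible to test the JSON mapping without a running service.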
#### gRPC
Example request (using [grpcurl](https://github.com/fullstorydev/grpcurl)): `grpcurl -plaintext -d '{"id":"tt1254207"}' localhost:8081 imdb2meta.MetaFetcher/Get`
(On Windows/PowerShell you have to escape the inner quotes: `'{\"id\":\"tt1254207\"}'`)
Example response:
```json
{
  "id": "tt1254207",
  "titleType": "SHORT",
  "primaryTitle": "Big Buck Bunny",
  "startYear": 2008,
  "runtime": 10,
  "genres": [
    "Animation",
    "Comedy",
    "Short"
  ]
}
```
## Protocol buffer generation
To re-generate the `meta.pb.go` file from the `meta.proto` file, run: `protoc -I="./protos" --go_out=./pb --go_opt=paths=source_relative meta.proto`
To re-generate the `service.pb.go` and `service_grpc.pb.go` files from the `service.proto` file, run: `protoc -I="./protos" --go_out=./pb --go_opt=paths=source_relative --go-grpc_out=./pb --go-grpc_opt=paths=source_relative service.proto`
## ⚠ Warning
`IMDb.com, Inc` is the copyright owner of the data in the IMDb datasets. You may only use the data for personal and non-commercial use. For more info see ["Can I use IMDb data in my software?"](https://help.imdb.com/article/imdb/general-information/can-i-use-imdb-data-in-my-software/G5JTRESSHJBBHTGX) and their [copyright/conditions of use](https://www.imdb.com/conditions) statement.