https://github.com/burntsushi/imdb-rename
A command line tool to rename media files based on titles from IMDb.
- Host: GitHub
- URL: https://github.com/burntsushi/imdb-rename
- Owner: BurntSushi
- License: unlicense
- Created: 2018-04-12T21:51:06.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-09-25T20:47:03.000Z (about 1 year ago)
- Last Synced: 2025-04-06T03:54:23.679Z (6 months ago)
- Language: Rust
- Size: 346 KB
- Stars: 233
- Watchers: 5
- Forks: 20
- Open Issues: 10
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: COPYING
imdb-rename
===========
A command line tool to rename media files based on titles from IMDb.
imdb-rename downloads the official IMDb data set and creates a local index to
use for fast fuzzy searching.

Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).

### Installation
**[Archives of precompiled binaries for imdb-rename are available for Windows,
macOS and Linux.](https://github.com/BurntSushi/imdb-rename/releases)**

Otherwise, users are expected to compile imdb-rename from source:

```
$ git clone https://github.com/BurntSushi/imdb-rename
$ cd imdb-rename
$ cargo build --release
$ ./target/release/imdb-rename --help
```

Alternatively, if you have
[Cargo installed](https://rustup.rs),
then you can install imdb-rename directly from
[crates.io](https://crates.io):

```
$ cargo install imdb-rename
```

imdb-rename's minimum supported Rust version is **1.28.0**.

#### Arch Linux
An AUR package is available: [imdb-rename](https://aur.archlinux.org/packages/imdb-rename/).

### Quick example
Ever since Season 1 of The Simpsons came out on DVD, I've been collecting them
and ripping them on to my hard drive. My process is somewhat manual, but I
wind up with a directory that looks like this:

```
S18E01.mkv S18E05.mkv S18E09.mkv S18E13.mkv S18E17.mkv S18E21.mkv
S18E02.mkv S18E06.mkv S18E10.mkv S18E14.mkv S18E18.mkv S18E22.mkv
S18E03.mkv S18E07.mkv S18E11.mkv S18E15.mkv S18E19.mkv
S18E04.mkv S18E08.mkv S18E12.mkv S18E16.mkv S18E20.mkv
```

It would be much nicer if these files had their proper episode titles.
imdb-rename can rename these files automatically using episode titles from
IMDb:

```
$ imdb-rename -q 'the simpsons {show}' *.mkv
```

This command ran a query with the `-q` flag to identify the TV show, provided
the files to rename, and... presto!

```
S18E01 - The Mook, the Chef, the Wife and Her Homer.mkv
S18E02 - Jazzy & The Pussycats.mkv
S18E03 - Please Homer, Don't Hammer 'Em.mkv
S18E04 - Treehouse of Horror XVII.mkv
S18E05 - G.I. (Annoyed Grunt).mkv
S18E06 - Moe'N'a Lisa.mkv
S18E07 - Ice Cream of Margie: With the Light Blue Hair.mkv
S18E08 - The Haw-Hawed Couple.mkv
S18E09 - Kill Gil, Vol. 1 & 2.mkv
S18E10 - The Wife Aquatic.mkv
S18E11 - Revenge Is a Dish Best Served Three Times.mkv
S18E12 - Little Big Girl.mkv
S18E13 - Springfield Up.mkv
S18E14 - Yokel Chords.mkv
S18E15 - Rome-old and Juli-eh.mkv
S18E16 - Homerazzi.mkv
S18E17 - Marge Gamer.mkv
S18E18 - The Boys of Bummer.mkv
S18E19 - Crook and Ladder.mkv
S18E20 - Stop or My Dog Will Shoot.mkv
S18E21 - 24 Minutes.mkv
S18E22 - You Kent Always Say What You Want.mkv
```

### Fancier example
imdb-rename isn't limited to just renaming TV episodes based on season/episode
numbers. It can also perform a fuzzy match based on the contents of the
file name. For example, given this file:

```
Thor.Ragnarok.2017.1080p.WEB-DL.DD5.1.H264-FGT.mkv
```

We can "clean it up" and rename it to a nice title like so:

```
$ imdb-rename Thor.Ragnarok.2017.1080p.WEB-DL.DD5.1.H264-FGT.mkv
```

which gives us:

```
Thor: Ragnarok (2017).mkv
```

### Freeform searching
We can also use imdb-rename to search IMDb, which is the default behavior
when a `-q/--query` is provided without any file names:

```
$ imdb-rename -q 'homey loves flanders'
#  score  id         kind       title                 year  tv
1  1.000  tt0773646  tvEpisode  Homer Loves Flanders  1994  S05E16 The Simpsons
2  0.646  tt2101691  tvEpisode  Tiny Loves Flowers    N/A   S02E08 Dinosaur Train
3  0.568  tt3203408  tvEpisode  Courtney Loves Love   2014  S01E05 Courtney Loves Dallas
4  0.561  tt1722576  short      In Flanders Fields    2010
5  0.561  tt2253780  tvSeries   In Vlaamse Velden     2014
6  0.555  tt4528474  video      My Lovely Homeland    2011
7  0.551  tt0220646  tvMovie    Moll Flanders         1975
[... results truncated ...]
```

Notice that our query had a typo in it. imdb-rename does its best to find the
most relevant results. It is also fast. Even though the above query searches
through all 6 million names in IMDb, it runs in under 100ms. This is thanks to
using an inverted index memory mapped from disk.

### How does it work?
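The typo-tolerant match in the previous section ("homey" still finding
"Homer") falls out of indexing character ngrams rather than whole words. The
following is a minimal sketch of that idea, and is not imdb-rename's actual
implementation: it builds a toy in-memory trigram index over three titles and
ranks them with the textbook Okapi BM25 formula. The names `trigrams`, `Index`
and `search`, and the parameters `k1 = 1.2` and `b = 0.75`, are assumptions
made for illustration only.

```rust
use std::collections::HashMap;

/// Split a name into lowercase character trigrams (hypothetical helper).
fn trigrams(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    chars.windows(3).map(|w| w.iter().collect()).collect()
}

/// A toy inverted index: trigram -> (name id -> occurrences in that name).
/// Everything lives in memory here; nothing is memory mapped from disk.
struct Index {
    names: Vec<String>,
    lens: Vec<usize>, // number of trigrams in each name
    postings: HashMap<String, HashMap<usize, usize>>,
}

impl Index {
    fn new(names: &[&str]) -> Index {
        let mut idx = Index {
            names: Vec::new(),
            lens: Vec::new(),
            postings: HashMap::new(),
        };
        for (id, name) in names.iter().enumerate() {
            let grams = trigrams(name);
            idx.names.push(name.to_string());
            idx.lens.push(grams.len());
            for gram in grams {
                *idx.postings.entry(gram).or_default().entry(id).or_insert(0) += 1;
            }
        }
        idx
    }

    /// Rank names against a query using textbook Okapi BM25 over trigrams.
    fn search(&self, query: &str) -> Vec<(f64, &str)> {
        let (k1, b) = (1.2, 0.75); // assumed, commonly used defaults
        let n = self.names.len() as f64;
        let avg_len = self.lens.iter().sum::<usize>() as f64 / n;
        let mut scores: HashMap<usize, f64> = HashMap::new();
        for gram in trigrams(query) {
            if let Some(docs) = self.postings.get(&gram) {
                // Rare trigrams get a higher inverse document frequency.
                let df = docs.len() as f64;
                let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                for (&id, &tf) in docs {
                    let tf = tf as f64;
                    // Longer names are penalized relative to the average.
                    let norm = 1.0 - b + b * self.lens[id] as f64 / avg_len;
                    *scores.entry(id).or_insert(0.0) +=
                        idf * tf * (k1 + 1.0) / (tf + k1 * norm);
                }
            }
        }
        let mut ranked: Vec<(f64, &str)> = scores
            .into_iter()
            .map(|(id, score)| (score, self.names[id].as_str()))
            .collect();
        // Highest score first.
        ranked.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap());
        ranked
    }
}

fn main() {
    let index = Index::new(&[
        "Homer Loves Flanders",
        "Tiny Loves Flowers",
        "Moll Flanders",
    ]);
    // The misspelled query still shares most of its trigrams with the
    // intended title, so that title ranks first.
    for (score, name) in index.search("homey loves flanders") {
        println!("{:.3} {}", score, name);
    }
}
```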
imdb-rename works by downloading
[approved datasets from IMDb](https://www.imdb.com/interfaces/),
and creating an inverted index based on ngrams extracted
from the names in IMDb's data. The inverted index provides a
quick way to search and rank results using techniques from
[information retrieval](https://nlp.stanford.edu/IR-book/)
such as
[Okapi-BM25](https://en.wikipedia.org/wiki/Okapi_BM25).

### Motivation
My motivation for building this tool is somewhat idiosyncratic, but three-fold:
1. I find it very convenient to have a tool to rename media files
automatically. imdb-rename is my third iteration on this tool. The first was
an unpublished hodge podge of Python scripts and a MySQL database. The
second was a
[Go program with a PostgreSQL database](https://github.com/BurntSushi/goim).
The Go program served me well, but IMDb retired their old data format, which
required me to build a new tool to adapt.
2. I've been working on a low-level information retrieval library off-and-on
for a couple years, and initially built this tool on top of that library as
a form of dogfooding. It didn't work out as well as I'd hoped, so I scrapped
the generic library and built out a specific solution tailored to IMDb. I'm
no longer dogfooding directly, but I've established a useful baseline.
3. I want more people to learn about information retrieval, and I believe this
tool can serve to teach others. In particular, imdb-rename is a complete
end-to-end information retrieval system that is fast, solves a real problem,
is only a few thousand lines of code and comes with a built-in
evaluation that is easy to run.

This tool is perhaps a bit over engineered, but I had fun with it. Believe it
or not, parts of imdb-rename are intentionally simple at the cost of both query
speed and size on disk!

### Evaluation
It is possible to run an evaluation to compare the various parameters available
for searching. The evaluation system is available as a separate tool called
imdb-eval, which is included in this repository. To use it, we must first build
it:

```
$ git clone https://github.com/BurntSushi/imdb-rename
$ cd imdb-rename
$ cargo build --release --all
$ ./target/release/imdb-eval --help
```

Running an evaluation is simple. We can run an evaluation on all combinations
of scorer and similarity function, along with ngram sizes of 3 and 4, like so
(this uses truth data that is built into the `imdb-eval` binary):

```
$ ./target/release/imdb-eval --ngram-size 3 --ngram-size 4 | tee eval.csv
```

This will output the results of running a search on every item in the truth
data. The results include the rank of the expected answer. The results can be
summarized into a single score called the
[Mean Reciprocal Rank](https://en.wikipedia.org/wiki/Mean_reciprocal_rank)
(which is itself a specific instance of MAP, or mean average precision)
with the `--summarize` flag like so:

```
$ ./target/release/imdb-eval --summarize eval.csv
```

If you have [xsv](https://github.com/BurntSushi/xsv) installed, then the
results can be easily sorted and formatted:

```
$ ./target/release/imdb-eval --summarize eval.csv | xsv sort -R -s mrr | xsv table
```

If you want to tweak the truth data, then you might consider starting with the
bundled truth data (assuming you're at the root of the imdb-rename repository):

```
$ $EDITOR data/eval/truth.toml
$ ./target/release/imdb-eval --ngram-size 3 --ngram-size 4 --truth data/eval/truth.toml
```

### What does this tool not do?
imdb-rename is a tool for renaming media files, and to the extent that searching
IMDb facilitates renaming files, it is also a search tool. There is no
intent to develop this further to explore all IMDb data, such as cast/crew
information.

Folks interested in building a different type of IMDb tool may be interested
in the [`imdb-index`](https://docs.rs/imdb-index) crate, which provides
programmatic access to the index created by imdb-rename.

### IMDb licensing
The data used by imdb-rename is retrieved from
[IMDb datasets](https://www.imdb.com/interfaces/).
In particular, imdb-rename will never scrape imdb.com, and only uses the data
provided by IMDb in the `tsv` files.

Additionally, imdb-rename must only be used for non-commercial and personal
uses.