Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/welpo/shuku
Shrink media to keep only the dialogue. Optimise your language immersion!
https://github.com/welpo/shuku
comprehensible-input condensation ffmpeg language-immersion language-learning language-learning-app media-condensation media-processing
Last synced: 12 days ago
JSON representation
Shrink media to keep only the dialogue. Optimise your language immersion!
- Host: GitHub
- URL: https://github.com/welpo/shuku
- Owner: welpo
- License: gpl-3.0
- Created: 2024-11-21T15:08:36.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-12-20T20:56:37.000Z (14 days ago)
- Last Synced: 2024-12-20T21:40:36.172Z (13 days ago)
- Topics: comprehensible-input, condensation, ffmpeg, language-immersion, language-learning, language-learning-app, media-condensation, media-processing
- Language: Python
- Homepage:
- Size: 38.6 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: COPYING
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
README
shuku
Shrink media to keep only the dialogue.
shuku — from 縮しゅく小しょう: "minification"Designed for language learners, `shuku` creates dialogue-only versions of media (show episodes, films, or YouTube videos) using subtitles. Useful to revisit content efficiently and improve comprehension.
https://github.com/user-attachments/assets/ae06b259-7043-423a-8895-5e8a60a32e37
## Table of contents
- [Features](#features)
- [Comparison with similar tools](#comparison-with-similar-tools)
- [Installation](#installation)
- [Usage](#quick-start)
- [Configuration](#configuration)
- [General options](#general-options)
- [Condensed audio options](#condensed-audio-options)
- [Condensed video options](#condensed-video-options)
- [Condensed subtitles options](#condensed-subtitles-options)
- [Audio quality settings](#audio-quality-settings)
- [Command line options](#command-line-options)
- [Examples](#examples)
- [Support](#support)
- [Contributing](#contributing)
- [License](#license)## Features
- Create condensed audio, video, and subtitle files based on dialogue
- Multiplatform support: GNU+Linux, macOS, and Windows
- Extensive configuration options, including codec, quality, and custom FFmpeg arguments
- Smart audio/subtitle track selection
- Fuzzy matching for external subtitles
- Pad subtitle timings for context and smoother transitions
- Shift subtitle timing without extra tools
- Skip unwanted subtitle lines (e.g., lyrics or sound effects)
- Skip chapters (e.g., openings, previews, credits)
- Smart metadata extraction (season, episode, title)
- Logging and progress tracking
- Generate LRC files from subtitles to use with music players
- Batch processing of multiple files and directories## Comparison with similar tools
| Feature | [shuku](#comparison-with-other-tools) | [impd](https://github.com/Ajatt-Tools/impd) | [Condenser](https://github.com/ercanserteli/condenser) |
|---------|:-------:|:------:|:-----------:|
| **Platform support** |
| GNU+Linux | ✅ | ✅ | ❌ |
| macOS | ✅ | ❌ | ❌ |
| Windows | ✅ | ❌ | ✅ |
| **Output** |
| Condensed audio | ✅ | ✅ | ✅ |
| Condensed subtitles | ✅ | ❌ | ✅ |
| Condensed video | ✅ | ❌ | ❌ |
| **Subtitle handling** |
| Support internal and external subs | ✅ | ✅ | ✅ |
| [Fuzzy subtitle search](#external_subtitle_search) | ✅ | ❌ | ❌ |
| [Language preference settings](#subtitle_languages) | ✅ | ✅ | ❌ |
| [Subtitle timing padding](#padding) | ✅ | ✅ | ✅ |
| [Shift subtitle timing](#--sub-delay-ms) | ✅ | ❌ | ❌ |
| [Skip chapters (credits, opening…)](#skip_chapters) | ✅ | ❌ | ❌ |
| [Skip subtitle lines based on patterns](#line_skip_patterns) | Unlimited patterns | One pattern | Filters content in parentheses & list of characters |
| **File management** |
| [Clean filenames](#clean_output_filename) | ✅ | ❌ | ❌ |
| [Custom filename suffix](#output_suffix) | ✅ | ❌ | ✅ |
| YouTube video processing | ❌ | ✅ | ❌ |
| Playlist management | ❌ | ✅ | ❌ |
| Automatic archiving of old files | ❌ | ✅ | ❌ |
| **User interface** |
| Graphical user interface | ❌ | ❌ | For user prompts |
| Command line interface | ✅ | ✅ | ✅ |
| **Configuration** |
| [Custom codec](#audio_codec) | ✅ | ❌ | ✅ |
| [Custom quality](#audio-quality-settings) | ✅ | ❌ | ❌ |
| [Custom ffmpeg arguments](#custom_ffmpeg_args) | ✅ | ✅ | ❌ |
| [Can extract without reencoding](#copy) | ✅ | ❌ | ❌ |
| Configuration format | TOML | Key-value | JSON |
| **Metadata** |
| Automatic metadata generation | ✅ | ✅ | ❌ |
| Episode number | ✅ | ✅ | ❌ |
| Season number | ✅ | ❌ | ❌ |
| Clean media title | ✅ | ❌ | ❌ |
| **Repository metrics** |
| Resolution time | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/welpo/shuku.svg)](http://isitmaintained.com/project/welpo/shuku) | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/Ajatt-Tools/impd.svg)](http://isitmaintained.com/project/Ajatt-Tools/impd) | [![Average time to resolve an issue](http://isitmaintained.com/badge/resolution/ercanserteli/condenser.svg)](http://isitmaintained.com/project/ercanserteli/condenser) |
| Open issues | [![Percentage of issues still open](http://isitmaintained.com/badge/open/welpo/shuku.svg)](http://isitmaintained.com/project/welpo/shuku) | [![Percentage of issues still open](http://isitmaintained.com/badge/open/Ajatt-Tools/impd.svg)](http://isitmaintained.com/project/Ajatt-Tools/impd) | [![Percentage of issues still open](http://isitmaintained.com/badge/open/ercanserteli/condenser.svg)](http://isitmaintained.com/project/ercanserteli/condenser) |
| License | [![License](https://img.shields.io/github/license/welpo/shuku?style=flat-square)](https://github.com/welpo/shuku/blob/main/COPYING) | [![License](https://img.shields.io/github/license/Ajatt-Tools/impd?style=flat-square)](https://github.com/Ajatt-Tools/impd/blob/master/LICENSE) | [![License](https://img.shields.io/github/license/ercanserteli/condenser?style=flat-square)](https://github.com/ercanserteli/condenser/blob/master/LICENSE) |
| Stars | [![Stars](https://img.shields.io/github/stars/welpo/shuku?style=flat-square)](https://github.com/welpo/shuku/stargazers) | [![Stars](https://img.shields.io/github/stars/Ajatt-Tools/impd?style=flat-square)](https://github.com/Ajatt-Tools/impd/stargazers) | [![Stars](https://img.shields.io/github/stars/ercanserteli/condenser?style=flat-square)](https://github.com/ercanserteli/condenser/stargazers) |
| Contributors | [![Contributors](https://img.shields.io/github/contributors/welpo/shuku?style=flat-square)](https://github.com/welpo/shuku/graphs/contributors) | [![Contributors](https://img.shields.io/github/contributors/Ajatt-Tools/impd?style=flat-square)](https://github.com/Ajatt-Tools/impd/graphs/contributors) | [![Contributors](https://img.shields.io/github/contributors/ercanserteli/condenser?style=flat-square)](https://github.com/ercanserteli/condenser/graphs/contributors) |
| Last Commit | [![Last Commit](https://img.shields.io/github/last-commit/welpo/shuku?style=flat-square)](https://github.com/welpo/shuku/commits) | [![Last Commit](https://img.shields.io/github/last-commit/Ajatt-Tools/impd?style=flat-square)](https://github.com/Ajatt-Tools/impd/commits) | [![Last Commit](https://img.shields.io/github/last-commit/ercanserteli/condenser?style=flat-square)](https://github.com/ercanserteli/condenser/commits) |
| Language | [![Top Language](https://img.shields.io/github/languages/top/welpo/shuku?style=flat-square)](https://github.com/welpo/shuku/search?l=python) | [![Top Language](https://img.shields.io/github/languages/top/Ajatt-Tools/impd?style=flat-square)](https://github.com/Ajatt-Tools/impd/search?l=shell) | [![Top Language](https://img.shields.io/github/languages/top/ercanserteli/condenser?style=flat-square)](https://github.com/ercanserteli/condenser/search?l=python) |
| Code Coverage | [![Code Coverage](https://img.shields.io/codecov/c/gh/welpo/shuku?style=flat-square)](https://codecov.io/gh/welpo/shuku) | ![Code Coverage](https://img.shields.io/codecov/c/gh/Ajatt-Tools/impd?style=flat-square) | [![Code Coverage](https://img.shields.io/codecov/c/gh/ercanserteli/condenser?style=flat-square)](https://codecov.io/gh/ercanserteli/condenser) |> [!IMPORTANT]
> If you feel the comparisons are incomplete or unfair, please [file an issue](https://github.com/welpo/shuku/issues/new?&labels=bug&template=2_bug_report.yml) so this page can be improved. Even better, submit a pull request!## Installation
1. Make sure FFmpeg is installed on your system. You can download it from [ffmpeg.org](https://ffmpeg.org/download.html) or install it using your package manager:
```bash
# Debian, Linux Mint, Ubuntu…
sudo apt install ffmpeg# Arch Linux, Manjaro…
sudo pacman -S ffmpeg# macOS with brew (https://brew.sh/)
brew install ffmpeg
```For Windows, here's [a good guide](https://phoenixnap.com/kb/ffmpeg-windows).
2. Install shuku with [pipx](https://github.com/pypa/pipx) (recommended) or pip:
```bash
pipx install shuku
# or
pip install shuku
```Alternatively, download a [pre-built binary](https://github.com/welpo/shuku/releases).
## Quick start
If you run:
```bash
shuku video.mkv
````shuku` will create `video (condensed).ogg` containing only the dialog from the video, next to the original file.
For this to work, `video.mp4` must either have internal subtitles, or a matching subtitle file in the same directory.
## Configuration
To dump the default configuration, run:
```bash
shuku --init
```This will create a `shuku.toml` file in your system's default configuration directory:
- On Windows: `%APPDATA%\shuku\shuku.toml`
- On Unix-like systems (GNU+Linux, macOS):
- By default: `~/.config/shuku/shuku.toml`
- If `XDG_CONFIG_HOME` is set: `$XDG_CONFIG_HOME/shuku/shuku.toml`The configuration file is self-documented and should be easy to understand.
The options are split into four categories: general options, condensed audio options, condensed video options, and condensed subtitles options.
### General options
#### `loglevel`
Sets the verbosity of log messages. Only messages of the selected level or higher will be displayed.
Default: `'info'`
Choices: `debug`, `info`, `success`, `warning`, `error`, `critical`#### `clean_output_filename`
If `true`, removes quality indicators, release group tags, and other technical information from output filenames. For example, `[GROUP] Show.S01E01.1080p.x264-GROUP.mkv` becomes `Show S01E01.ext` (where `.ext` is `.ogg`, `.srt`, etc.).
Default: `true`.
#### `output_directory`
Specifies the directory where output files will be saved. If not set, outputs to the same directory as the input file.
#### `output_suffix`
The suffix added to output files.
Default: `' (condensed)'`
#### `if_file_exists`
What to do when output file exists. Can be:
- `'ask'`: Prompt user to overwrite, rename, or skip
- `'overwrite'`: Overwrite without prompting
- `'rename'`: Automatically rename with timestamp
- `'skip'`: Skip without promptingDefault: `'ask'`
#### `padding`
The number of seconds to add before and after each subtitle timing.
Default: `0.5`
#### `subtitle_directory`
Specifies a directory to search for external subtitle files. Overridden by the `--subtitles` command-line argument.
#### `audio_languages`
Specifies preferred languages for audio tracks, in order of preference.
Example: `['jpn', 'jp', 'ja', 'eng']`
#### `subtitle_languages`
Specifies preferred languages for subtitle tracks, in order of preference.
Example: `['jpn', 'jp', 'ja', 'eng']`
#### `external_subtitle_search`
Determines how external subtitles are matched to video files. `disabled` turns off external subtitle search, `exact` requires perfect filename matches, and `fuzzy` allows for inexact matches.
Default: `'fuzzy'`
#### `subtitle_match_threshold`
When using fuzzy matching, this sets the minimum similarity score for subtitle files. Lower values allow more lenient matching but risk false positives.
Default: `0.6`
#### `skip_chapters`
A list of chapter titles to skip when processing. Case-insensitive. Useful for skipping openings, previews, credits…
Default:
```toml
skip_chapters = ['avant', '1. opening credits', 'logos/opening credits', 'opening titles', 'opening', 'op', 'ending', 'ed', 'start credit', 'credits', 'end credits', 'end credit', 'closing credits', 'next episode', 'preview', 'avante', 'trailer']
```#### `line_skip_patterns`
Regular expression patterns for subtitle lines to skip. Useful for removing song lyrics, sound effects, etc. Use single quotes to enclose patterns.
Default:
```toml
line_skip_patterns = [
# Skip music.
'^(~|〜)?♪.*',
'^♬(~|〜)$',
'^♪?(~|〜)♪?$',
# Skip lines containing only '・~'
'^・(~|〜)$',
# Skip lines entirely enclosed in various types of brackets.
'^\\([^)]*\\)$', # Parentheses ()
'^([^)]*)$', # Full-width parentheses ()
'^\\[.*\\]$', # Square brackets []
'^\\{[^\\}]*\\}$', # Curly braces {}
'^<[^>]*>$', # Angle brackets <>
]
```### Condensed audio options
#### `enabled`
Whether to create condensed audio files.
Default: `true`
#### `audio_codec`
Audio codec for the condensed audio.
Default: `'libopus'`
Choices: `libmp3lame`, `aac`, `libopus`, `flac`, `pcm_s16le`, `copy`, `mp3`, `wav`, `opus`, `ogg`#### `audio_quality`
See [Audio quality settings](#audio-quality-settings) for details.
Default: `'48k'`
#### `custom_ffmpeg_args`
Extra FFmpeg arguments to use when creating the final audio.
Example:
```toml
custom_ffmpeg_args = {
"af" = 'loudnorm=I=-16:LRA=6:TP=-1,acompressor=threshold=-12dB:ratio=3:attack=200:release=1000'
}
```### Condensed video options
#### `enabled`
Whether to create condensed video files.
Default: `false`
#### `audio_codec`
Audio codec for the condensed video.
Default: `'copy'`
Choices: `libmp3lame`, `aac`, `libopus`, `flac`, `pcm_s16le`, `copy`, `mp3`, `wav`, `opus`, `ogg`#### `audio_quality`
See [Audio quality settings](#audio-quality-settings) for details.
Example: `'128k'`
#### `video_codec`
Video codec to use. `'copy'` copies the video stream without re-encoding.
Default: `'copy'`
#### `video_quality`
Video quality.
### Video quality settings
The `video_quality` setting depends on the chosen video codec:
#### `libx264` / `libx265` (H.264 / H.265)
Modern video codecs that offer excellent compression. H.265 generally achieves better compression than H.264 but may take longer to encode.
- **How to set quality**: Uses Constant Rate Factor (CRF). Specify a number between 0-51. Lower values = better quality & larger files. A change of ±6 roughly doubles/halves the file size.
- **Examples**:
- `'23'`: Default, good quality
- `'18'`: Very high quality, visually lossless
- `'28'`: Acceptable quality, smaller files#### `libvpx-vp9` (VP9)
An open video codec developed by Google, offering quality comparable to H.265.
- **How to set quality**: Uses CRF like H.264/H.265. Specify a number between 0-63. Lower values = better quality & larger files.
- **Examples**:
- `'31'`: Default balanced quality
- `'24'`: High quality
- `'35'`: More compression, smaller files#### Other codecs
For other video codecs, quality is set using bitrate.
- **How to set quality**: Specify the bitrate in bits per second with optional 'k' or 'M' suffix
- `'1000k'` or `'1M'` = 1 Mbps
- Lower values = lower quality & smaller files- **Examples**:
- `'1M'` or `'1000k'`: Medium quality
- `'2M'` or `'2000k'`: High quality
- `'500k'`: Lower quality, smaller file#### `copy`
Copies the video stream without re-encoding. Use this to maintain original quality and for fastest processing.
- **Quality Setting**: `video_quality` is ignored when using `copy`
#### `custom_ffmpeg_args`
Custom FFmpeg arguments for processing the final video.
Example:
```toml
custom_ffmpeg_args = { "preset" = 'faster', "crf" = '23', "threads" = '0', "tune" = 'film' }
```### Condensed subtitles options
#### `enabled`
Whether to create condensed subtitle files.
Default: `false`
#### `format`
Output format for condensed subtitles. `'auto'` matches the input format.
Default: `'auto'`
Choices: `auto`, `srt`, `ass`, `lrc`### Audio quality settings
The `audio_quality` setting depends on the chosen audio codec:
#### `libopus` (`ogg`, `opus`) (Default)
A modern audio codec known for high quality and efficiency at low bitrates. The best size/quality ratio. If your media player supports it, don't even think about it.
- **How to set quality**: Specify the desired bitrate either as a number representing bits per second (e.g., `48000` for 48 kbps) or using the 'k' suffix (e.g., `'48k'`, `'128k'`)."
- **Examples**:
- `'48k'` or `48000`: Good quality with small file size (default).
- `'128k'` or `128000`: Higher quality, larger file size.#### `libmp3lame` (`mp3`)
A widely supported format compatible with most devices.
- **How to set quality**:
- **Variable Bitrate (VBR)**: Use `'V'` followed by a number from `0` to `9` (e.g., `'V0'`, `'V5'`). Lower numbers mean better quality.
- **Constant Bitrate (CBR)**: Specify the bitrate in kilobits per second with `'k'` (e.g., `'128k'`, `'320k'`).- **Examples**:
- `'V3'`: Good balance between quality and file size.
- `'192k'`: High-quality CBR MP3.#### `aac`
Offers better sound quality than MP3 at similar bitrates.
- **How to set quality**:
- **Variable Bitrate (VBR)**: Use a single digit from `1` to `5` (e.g., `'2'`, `'5'`). Higher numbers mean better quality.
- **Bitrate**: Specify the bitrate in kilobits per second with `'k'` (e.g., `'128k'`).- **Examples**:
- `'2'`: Good quality with reasonable file size.
- `'128k'`: Standard quality.#### `flac`
A lossless codec that preserves original audio quality but results in larger files.
- **How to set quality**: Specify the compression level from `0` to `12`. Higher numbers provide better compression but take longer to encode.
- **Examples**:
- `5`: Default compression level, good balance.
- `12`: Maximum compression, slower encoding.#### `pcm_s16le` (`wav`)
An uncompressed audio format resulting in very large files. The only reason you might want to use it over FLAC is software compatibility.
- **Quality Setting**: `audio_quality` is ignored.
#### `copy`
Copies the audio stream without any re-encoding. Use this if you want to keep the original audio as-is.
- **Quality Setting**: `audio_quality` is ignored.
## Command line options
### `--init`
Create a default configuration file in the following locations depending on your operating system:
- **Windows**: `C:\Users\\AppData\Roaming\shuku\shuku.toml`
- **Unix-like systems (GNU+Linux, macOS)**: `~/.config/shuku/shuku.toml`### `-c , --config `
Path to a configuration file, or "none" to use the default configuration.
Default: `shuku.toml` in the user config directory (e.g., `~/.config/shuku/shuku.toml`).
### `-s , --subtitles `
Path to subtitle file or directory containing subtitle files to match to input videos.
### `-o , --output `
Path to the output directory. If not specified, the input file's directory will be used.
### `--audio-track-id `
ID of the audio track to use. You can use `ffprobe {file}` to list available tracks.
### `--sub-track-id `
ID of the subtitle track to use.
### `--sub-delay `
Delay subtitles by `` milliseconds. Can be negative.
### `-v {level}, --loglevel {level}`
Set the logging level. Choices: `debug`, `info`, `success`, `warning`, `error`, `critical`.
### `--log-file `
Logs will be written to this file in addition to the terminal.
### `-h, --help`
Show the help message and exit.
### `-V, --version`
Print the version number and exit.
## Examples
1. Process a single video file:
```bash
shuku video.mkv
```2. Process multiple files:
```bash
shuku video1.mp4 path/to/directory_with_videos/
```3. Use a specific subtitle file:
```bash
shuku video.mp4 -s subtitles.srt
```4. Set output directory and logging level:
```bash
shuku video.mkv -o ~/condensed -v debug
```5. Use a specific audio track and apply subtitle delay:
```bash
shuku video.mp4 --audio-track-id 2 --sub-delay 500
```6. Use the default configuration, show all logs and save them to a file:
```bash
shuku file.mkv -c none -v debug --log-file shuku.log
```## Support
Something not working? Have an idea? Let us know!
- Questions? → [Start a discussion](https://github.com/welpo/shuku/discussions)
- Found a bug? → [Report it here](https://github.com/welpo/shuku/issues/new?&labels=bug&template=2_bug_report.yml)
- Feature request? → [Tell us more!](https://github.com/welpo/shuku/issues/new?&labels=feature&template=3_feature_request.yml)## Contributing
Please do! We appreciate bug reports, feature or documentation improvements (however minor), feature requests…
Take a look at the [Contributing Guidelines](/CONTRIBUTING.md) to learn more.
## License
`shuku` is free software: you can redistribute it and/or modify it under the terms of the [GNU General Public License as published by the Free Software Foundation](./COPYING), either version 3 of the License, or (at your option) any later version.