{"id":13585436,"url":"https://github.com/zmwangx/caterpillar","last_synced_at":"2025-07-15T21:39:14.475Z","repository":{"id":57416983,"uuid":"115477120","full_name":"zmwangx/caterpillar","owner":"zmwangx","description":"Hardened HLS merger","archived":false,"fork":false,"pushed_at":"2020-11-16T03:03:01.000Z","size":340,"stargazers_count":51,"open_issues_count":3,"forks_count":9,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-12T06:04:15.103Z","etag":null,"topics":["ffmpeg","hls","http-live-streaming","m3u8"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zmwangx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-27T03:24:44.000Z","updated_at":"2024-07-25T11:27:08.000Z","dependencies_parsed_at":"2022-09-11T14:01:19.995Z","dependency_job_id":null,"html_url":"https://github.com/zmwangx/caterpillar","commit_stats":null,"previous_names":[],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmwangx%2Fcaterpillar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmwangx%2Fcaterpillar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmwangx%2Fcaterpillar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zmwangx%2Fcaterpillar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zmwangx","download_url":"https://codeload.github.com/zmwangx/caterpillar/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","rep
ositories_count":248525144,"owners_count":21118618,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ffmpeg","hls","http-live-streaming","m3u8"],"created_at":"2024-08-01T15:04:56.430Z","updated_at":"2025-04-12T06:05:11.331Z","avatar_url":"https://github.com/zmwangx.png","language":"Python","funding_links":[],"categories":["HarmonyOS","Python"],"sub_categories":["Windows Manager"],"readme":"\u003ch1 align=\"center\"\u003e\n  Caterpillar\u003cbr\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/zmwangx/caterpillar/master/assets/logo.png\" alt=\"Caterpillar\" width=\"691\" height=\"112\"\u003e\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.python.org/pypi/caterpillar-hls\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/caterpillar-hls.svg?maxAge=3600\" alt=\"pypi\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-3.6,%203.7-blue.svg?maxAge=86400\" alt=\"python: 3.6, 3.7\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-blue.svg?maxAge=86400\" alt=\"license: MIT\"\u003e\n  \u003ca href=\"https://github.com/zmwangx/caterpillar/actions\"\u003e\u003cimg src=\"https://github.com/zmwangx/caterpillar/workflows/test/badge.svg?branch=master\" alt=\"Build Status\" /\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\n`caterpillar` is a hardened HLS merger. It takes an HTTP Live Streaming VOD URL (typically an .m3u8 URL), downloads the video segments, and attempts to merge them into a single, coherent file. 
It is specially designed to combat timestamp discontinuities (symptom: a naive FFmpeg run spews tons of \"Non-monotonous DTS in output stream\" warning messages and ends up with a useless file with completely broken timestamps).\n\n`caterpillar` supports [up to version 3](https://tools.ietf.org/html/rfc8216#section-7) of the HTTP Live Streaming protocol (VOD only; non-VOD playlists are treated as VOD, and may result in unexpected consequences).\n\n## Contents\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n\n- [Dependencies](#dependencies)\n- [Installation](#installation)\n  - [For end users](#for-end-users)\n  - [For developers and beta testers](#for-developers-and-beta-testers)\n  - [For application developers](#for-application-developers)\n- [Usage](#usage)\n- [Batch mode](#batch-mode)\n- [Configuration](#configuration)\n- [Notes and limitations](#notes-and-limitations)\n- [Etymology](#etymology)\n- [Copyright](#copyright)\n\n\u003c!-- END doctoc generated TOC please keep comment here to allow auto update --\u003e\n\n## Dependencies\n\nA recent version of [FFmpeg](https://ffmpeg.org/download.html). 
FFmpeg 3.3.4 is known to work with caterpillar; FFmpeg 3.2.4 is known to NOT work.\n\n## Installation\n\nPython 3.6 or later is required.\n\n*If in doubt, check out the detailed \"[Installation Guide for Novices](https://github.com/zmwangx/caterpillar/wiki/Installation-Guide-for-Novices)\".*\n\n### For end users\n\nTo install,\n\n```\npip install caterpillar-hls\n```\n\nTo upgrade to the latest version,\n\n```\npip install -U caterpillar-hls\n```\n\n### For developers and beta testers\n\nTo install from the master branch,\n\n```\ngit clone https://github.com/zmwangx/caterpillar.git\ncd caterpillar\npython setup.py develop\ncaterpillar -h\n```\n\nTo update to the latest master,\n\n```\ncd /path/to/caterpillar\ngit pull origin master\n```\n\n### For application developers\n\nShort of calling `caterpillar.caterpillar.main` with `sys.argv` set appropriately, you can access caterpillar's functionality through `caterpillar.caterpillar.process_entry` and `caterpillar.caterpillar.process_batch`. Warning: there's no stability guarantee to these interfaces, although I won't break compatibility without a very compelling reason.\n\n`process_entry` and `process_batch` additionally support event hooks (a feature not exposed to end users). 
See [`caterpillar.caterpillar.events`](https://github.com/zmwangx/caterpillar/blob/master/src/caterpillar/events.py) for types of events emitted and associated data attributes.\n\n## Usage\n\n```console\n$ caterpillar -h\nusage: caterpillar [-h] [-b] [-e] [-f] [-j JOBS] [-k]\n                   [-m {concat_demuxer,concat_protocol,0,1}] [-r RETRIES]\n                   [--remove-manifest-on-success] [--workdir WORKDIR]\n                   [--workroot WORKROOT] [--wipe] [-v] [--progress]\n                   [--no-progress] [-q] [--debug] [-V]\n                   m3u8_url [output]\n\npositional arguments:\n  m3u8_url              the VOD URL, or the batch mode manifest file\n  output                path to the final output file (default is a .ts file\n                        in the current directory with the basename of the VOD\n                        URL)\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -b, --batch           run in batch mode (see the \"Batch Mode\" section in\n                        docs)\n  -e, --exist-ok        skip existing targets (only works in batch mode)\n  -f, --force           overwrite the output file if it already exists\n  -j JOBS, --jobs JOBS  maximum number of concurrent downloads (default is\n                        twice the number of CPU cores, including virtual\n                        cores)\n  -k, --keep            keep intermediate files even after a successful merge\n  -m {concat_demuxer,concat_protocol,0,1}, --concat-method {concat_demuxer,concat_protocol,0,1}\n                        method for concatenating intermediate files (default\n                        is 'concat_demuxer'); see\n                        https://github.com/zmwangx/caterpillar/#notes-and-limitations\n                        for details\n  -r RETRIES, --retries RETRIES\n                        number of times to retry when a possibly recoverable\n                        error (e.g. 
download issue) occurs; default is 2, and\n                        0 turns off retries\n  --remove-manifest-on-success\n                        remove manifest file if all downloads are successful\n                        (only works in batch mode)\n  --workdir WORKDIR     working directory to store downloaded segments and\n                        other intermediate files (default is automatically\n                        determined based on URL and output file)\n  --workroot WORKROOT   if nonempty, this path is used as the root directory\n                        for all processing, under which both the working\n                        directory and final destination are mapped; after\n                        merging is done, the artifact is eventually moved to\n                        the destination (use cases: destination on a slow HDD\n                        with workroot on a fast SSD; destination on a\n                        networked drive with workroot on a local drive)\n  --wipe                wipe all downloaded files (if any) and start over\n  -v, --verbose         increase logging verbosity (can be specified multiple\n                        times)\n  --progress            show download progress bar regardless of verbosity\n                        level\n  --no-progress         suppress download progress bar regardless of verbosity\n                        level\n  -q, --quiet           decrease logging verbosity (can be specified multiple\n                        times)\n  --debug               output debugging information (also implies highest\n                        verbosity)\n  -V, --version         show program's version number and exit\n\nenvironment variables:\n  CATERPILLAR_USER_CONFIG_DIR\n                        custom directory for caterpillar.conf\n  CATERPILLAR_USER_DATA_DIR\n                        custom directory for certain data cached by\n                        caterpillar\n  CATERPILLAR_NO_USER_CONFIG\n                        
when set to a non-empty value, do not load\n                        options from user config file\n  CATERPILLAR_NO_CACHE  when set to a non-empty value, do not read or\n                        write caterpillar's cache\n\nconfiguration file:\n  \u003can operating system and user-dependent path\u003e\n\n```\n\nSee the [wiki page](https://github.com/zmwangx/caterpillar/wiki/Usage-Examples) for usage examples.\n\n## Batch mode\n\nIn normal mode, `caterpillar` deals with only one stream. There is also a batch mode for downloading multiple streams at once. In this mode, you specify a manifest file on the command line in the place of the VOD URL, where the manifest file contains a VOD URL and a filename (or path) separated by a tab on each line, e.g., `caterpillar -b manifest.txt`, where `manifest.txt` contains\n\n```\nhttps://example.com/hls/1.m3u8\t1.mp4\nhttps://example.com/hls/2.m3u8\t2.mp4\nhttps://example.com/hls/3.m3u8\t3.mp4\n```\n\nThe filenames (or paths) are relative to the parent directory of the manifest file. The tab character is not allowed in the filenames (or paths).\n\nComments that start with `#` are allowed in the manifest file.\n\nMost options for normal mode are also allowed in the batch mode, as are options set in the configuration file.\n\n## Configuration\n\nTo save some retyping, `caterpillar` supports the configuration of default options in an operating system and user-dependent configuration file. The path is usually `~/Library/Application Support/caterpillar/caterpillar.conf` on macOS, `%AppData%\\org.zhimingwang\\caterpillar\\caterpillar.conf` on Windows, and `~/.config/caterpillar/caterpillar.conf` on Linux. 
Run `caterpillar -h` to view the actual path.\n\nThe syntax of the configuration file is documented in the template (automatically created for you if possible), duplicated below:\n\n```\n# You may configure default options here so that you don't need to\n# specify the same options on the command line every time.\n#\n# Each option, along with its argument (if any), should be on a separate\n# line; unlike on the command line, you don't need to quote or escape\n# whitespace or other special characters in an argument, e.g., a line\n#\n#     --workdir Temporary Directory\n#\n# is interpreted as two command line arguments \"--workdir\" and\n# \"Temporary Directory\".\n#\n# Positional arguments are not allowed, i.e., option lines must begin\n# with -.\n#\n# Blank lines and lines starting with a pound (#) are ignored.\n#\n# You can always override the default options here on the command line.\n#\n# Examples:\n#\n#     --jobs 32\n#     --concat-method concat_protocol\n```\n\n## Notes and limitations\n\n- [`EXT-X-MEDIA`](https://tools.ietf.org/html/rfc8216#section-4.3.4.1) for alternative renditions are not supported since multiple playlists need to be downloaded, merged, and muxed, which is beyond the current scope of this tool.\n\n- [`EXT-X-STREAM-INF`](https://tools.ietf.org/html/rfc8216#section-4.3.4.2) variant streams are only partially supported: when two or more variant streams are present, the best one (selected based on higher resolution, higher average bandwidth, and higher bandwidth, in that order) is automatically selected; there is no mechanism for users to select another variant at the moment. In addition, `AUDIO`, `VIDEO`, `SUBTITLES` or `CLOSED-CAPTIONS` attributes referencing `EXT-X-MEDIA` tags are not supported.\n\n  Efforts could be made to extract the variant streams and show them to the user, and it is even feasible to proceed with the download if only one variant stream is present. 
Contribution is welcome for this feature.\n\n- **A note on `-m, --concat-method`**: The final step of `caterpillar` is concatenating one or more parts (generated from split playlists with FFmpeg's `hls` demuxer) into a single output file. In this step we provide two methods of choice: the [concat demuxer](https://ffmpeg.org/ffmpeg-all.html#concat-1) and the [concat protocol](https://ffmpeg.org/ffmpeg-all.html#concat-2) (the former is the default). To pick the non-default `concat_protocol`, specify `--concat-method concat_protocol` on the command line, or as a shortcut, `-m 1` (`0` is an alias for `concat_demuxer`, and `1` is an alias for `concat_protocol`).\n\n  Each of these two methods may work better in certain cases. For instance, for [this stream](http://ts.snh48.com/chaoqing/8001/20171201185235-playlist.m3u8?beginTime=20171201205500\u0026endTime=20171201210500), the concat demuxer simply fails with loads of error messages like \"Application provided duration: 7980637472 / timestamp: 7994129672 is out of range for mov/mp4 format\", whereas the concat protocol works fine. However, for [this stream](http://live.us.sinaimg.cn/000XDYqUjx07gRaRHSCz070d010002TZ0k01.m3u8), the concat protocol dumps a bunch of worrisome warnings like \"DTS out of order\" or \"Non-monotonous DTS in output stream\", whereas the concat demuxer doesn't.\n\n  *In short, if the default fails you (either doesn't work outright, or the generated video is bad in some way), try `-m 1`.*\n\n- \u003ca id=\"encoding\"\u003e**Encoding-related issues.**\u003c/a\u003e `caterpillar`, since version 1.0, reads and writes every on-disk file in UTF-8 encoding, period. In particular, user config file and batch mode manifests need to be in UTF-8. 
Legacy config file might need to be migrated to UTF-8.\n\n  If you're unfamiliar with the concept of character encodings, you may find the classic [*Joel on Software* article on charsets and Unicode](https://goo.gl/kSwEYj) informative (gosh, it's fifteen years old now). On Unix-like systems, [`uchardet`](https://www.freedesktop.org/wiki/Software/uchardet/) is my go-to tool for encoding detection; it's not readily packaged on Windows, and it's not officially supported and a royal pain to compile by yourself (to a non-Windows developer like me, and probably to whomever is reading this note). I don't know a good alternative on Windows, but I have a small web app that does encoding detection entirely in the browser: \u003chttps://chardet.tcl.sh/\u003e.\n\n  Converting a file to UTF-8, once you know its contents, is actually pretty easy. Any text editor worth its salt (and still afloat) should be able to save as UTF-8; heck, even Windows Notepad can do this (it even appears to be the default).\n\n  This note is mostly only relevant to Windows. I can't overstate how I absolutely fucking detest dealing with encodings surrounding Microsoft products. If you ever saw �'s on web pages in places where there should be quotation marks — that's Latin-1, or Windows-1252, being decoded as UTF-8. That's just the tip of the iceberg. If you ever deal with CJK, or another alphabet other than Latin, oh baby, you're in for a world of pain. I can't count the number of times I opened a file or unzipped an archive from a Windows user, only to find garbage filenames or garbage content or both, and had to guess the encoding to restore the meaning. Also, meet Windows cmd or PowerShell; you have to fucking `chcp 65001` just to work with Unicode, and even then for whatever fucking reason I still get ??? for Chinese characters in input commands – oh and I use Cmder, which allows a secondary font for CJK, an infinite improvement over stock cmd or PowerShell already, but still no luck. 
You may also want to meet the latest and greatest Office for Mac, where you still can't import a UTF-8-encoded CSV file with non-ASCII characters (I know a workaround, for fuck's sake). Not sure about the Windows version, and not enthusiastic enough to find out. (In all fairness, Apple's HFS+ with its infamous NFD-based UTF-8-MAC encoding in filenames was hell, but other than that Mac OS X and its successors have been pretty good encoding-wise; and APFS is finally rid of the pseudo-NFD garbage — it might have had its own problems at first, but I've yet to run into any. HFS+ encoding problems are peanuts compared to Microsoft's encoding problems, by the way. I'm not even bringing fonts into the mix.)\n\n  In addition, if you ever have Chinese filenames in the English version of Windows — gosh, they're so ugly they almost bring tears to my eyes. [I had to swap the entire fucking system UI font via a registry hack](https://gist.github.com/zmwangx/0d638102c04818f84176dc83096fb46e) to keep tears at bay. Last I heard there are plenty of Chinese folks working in Redmond — hell, I even have acquaintances there. How they managed to tolerate this shit for so many years is beyond me.\n\n  Pretty sure I can tell you a lot more if I spent more time with Microsoft products (which is close to zero at the moment).\n\n  Windows is more than thirty years old now, and the UN is more than seventy years old. Yet Redmond still hasn't got the memo on globalization.\n\n  This dissolved into another Microsoft rant. Oh well.\n\n## Etymology\n\nThe word \"caterpillar\" starts with `cat(1)`, and the body of a caterpillar is segmented.\n\n## Copyright\n\nCopyright © 2017 Zhiming Wang\n\nThis project is licensed under [the MIT license](https://opensource.org/licenses/MIT). 
See `COPYING` for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzmwangx%2Fcaterpillar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzmwangx%2Fcaterpillar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzmwangx%2Fcaterpillar/lists"}