{"id":13907382,"url":"https://github.com/dukebw/lintel","last_synced_at":"2025-09-03T00:41:05.857Z","repository":{"id":41142813,"uuid":"126611816","full_name":"dukebw/lintel","owner":"dukebw","description":"A Python module to decode video frames directly, using the FFmpeg C API.","archived":false,"fork":false,"pushed_at":"2019-04-06T18:09:04.000Z","size":75,"stargazers_count":262,"open_issues_count":18,"forks_count":38,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-24T10:11:11.089Z","etag":null,"topics":["action-recognition","ffmpeg","python","video-processing"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dukebw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-24T15:16:39.000Z","updated_at":"2025-03-13T21:08:04.000Z","dependencies_parsed_at":"2022-09-09T21:20:56.966Z","dependency_job_id":null,"html_url":"https://github.com/dukebw/lintel","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dukebw%2Flintel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dukebw%2Flintel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dukebw%2Flintel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dukebw%2Flintel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dukebw","download_url":"https://codeload.github.com/dukebw/lintel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","k
ind":"github","repositories_count":248667159,"owners_count":21142383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["action-recognition","ffmpeg","python","video-processing"],"created_at":"2024-08-06T23:01:54.838Z","updated_at":"2025-04-13T05:26:18.734Z","avatar_url":"https://github.com/dukebw.png","language":"C","funding_links":[],"categories":["HarmonyOS"],"sub_categories":["Windows Manager"],"readme":"# lintel\n[![Anaconda badge](https://anaconda.org/conda-forge/lintel/badges/version.svg)](https://anaconda.org/conda-forge/lintel)\n\nLintel is a Python module that can be used to decode videos, and return a byte\narray of all of the frames in the video, using the FFmpeg C interface directly.\n\nLintel was created for the purpose of developing machine learning algorithms\nusing video datasets such as the\n[NTU RGB+D](http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp),\n[Kinetics](https://deepmind.com/research/open-source/open-source-datasets/kinetics/)\nand [Charades](http://allenai.org/plato/charades/) action recognition datasets.\n\nThe foremost advantages of Lintel are:\n\n1. Lintel provides a simple and fast Python interface to video decoding that\n   can be dropped into existing machine learning training scripts.\n\n2. By decoding video on the fly in a data processing pipeline, input pipelines\n   can be made to circumvent an I/O bottleneck that would become a problem if\n   data were stored as frames in an encoded image format such as JPEG.\n\n3. Decoding videos on the fly using the FFmpeg C API provides a high degree of\n   control over the input. 
For example, video can be decoded at dynamic\n   framerates, with no loss in efficiency.\n\n   An example of this control is in the implementation of `loadvid_frame_nums`,\n   where a list of frame indices is passed to indicate which specific frames\n   are to be decoded.\n\n4. By using the FFmpeg C API directly, as opposed to piping input to the\n   `ffmpeg` command line tool, a number of issues and complications surrounding\n   interfacing the `ffmpeg` command line tool from a performance-intensive\n   machine learning application are completely avoided.\n\n\n# Pre-requisites\n\nPython 3 is required. To use Python 2, I believe a (potentially small) amount\nof code would have to be changed in `lintel/py_ext/lintelmodule.c`.\n\nA version of FFmpeg that supports the `avcodec_send_packet()` and\n`avcodec_receive_frame()` API for decoding video is required. For example,\nFFmpeg version 3.3.6 should work fine, as should downloading and installing the\nlatest development version of FFmpeg.\n\nIf the version of FFmpeg distributed with your system is too old, see the\nInstalling FFmpeg from Source section below to install a newer version.\n\n\n# Installation\n\n\n## From Source\n\nRun the following to install a locally editable version of the library, with pip:\n\n`pip3 install --editable . --user`\n\n\n## Conda\n\nRun: `conda install -c conda-forge lintel`.\n\nOnly Mac and Linux are supported.\n\n\n# Testing Lintel\n\n1. After installing, run:\n\n   `lintel_test --filename \u003cvideo-filename\u003e --width \u003cwidth\u003e --height \u003cheight\u003e`\n\n   Pass criteria: decoded frames from the video should show up without\n   distortion, decoding each clip in \u003c 500ms.\n\n2. 
Run:\n\n   `lintel_test --filename \u003cvideo-filename\u003e --width \u003cwidth\u003e --height \u003cheight\u003e --frame-nums --should-seek`\n\n   to test the frame number API.\n\nPassing `--width 0 --height 0` will test the dynamic resizing.\n\n\n# Usage in a data processing pipeline\n\nThe `lintel.loadvid` interface can be used in a Python input pipeline as\nfollows:\n\n```python\nimport lintel\nimport numpy as np\n\n\ndef _sample_frame_sequence_to_4darray(video, dataset, should_random_seek, fps_cap):\n    \"\"\"Called to extract a frame sequence `dataset.num_frames` long, sampled\n    uniformly from inside `video`, to a 4D numpy array.\n\n    Args:\n        video: Encoded video.\n        dataset: Dataset meta-info, e.g., width and height.\n        should_random_seek: If set to `True`, then `lintel.loadvid` will start\n            decoding from a uniformly random seek point in the video (with\n            enough space to decode the requested number of frames).\n\n            The seek distance will be returned, so that if the label of the\n            data depends on the timestamp, then the label can be dynamically\n            set.\n        fps_cap: The _maximum_ framerate that will be captured from the video.\n            Excess frames will be dropped, e.g., if `fps_cap` is 30 for a video\n            with a 60 fps framerate, every other frame will be dropped.\n\n    Returns:\n        A tuple (frames, seek_distance) where `frames` is a 4-D numpy array\n        loaded from the byte array returned by `lintel.loadvid`, and\n        `seek_distance` is the number of seconds into `video` that decoding\n        started from.\n\n    Note that the random seeking can be turned off.\n\n    Use _sample_frame_sequence_to_4darray in your PyTorch Dataset object, which\n    subclasses torch.utils.data.Dataset. Call _sample_frame_sequence_to_4darray\n    in __getitem__. 
This means that for every minibatch, for each example, the decoder seeks\n    to a random keyframe in the video and decodes num_frames frames from\n    there. num_frames would normally tend to be small (if you were going\n    to use them as input to a 3D ConvNet or optical flow algorithm), e.g., 32\n    frames.\n    \"\"\"\n    video, seek_distance = lintel.loadvid(\n        video,\n        should_random_seek=should_random_seek,\n        width=dataset.width,\n        height=dataset.height,\n        num_frames=dataset.num_frames,\n        fps_cap=fps_cap)\n    video = np.frombuffer(video, dtype=np.uint8)\n    video = np.reshape(\n        video, (dataset.num_frames, dataset.height, dataset.width, 3))\n\n    return video, seek_distance\n```\n\nThe `lintel.loadvid_frame_nums` API can be used similarly:\n\n```python\ndef _load_frame_nums_to_4darray(video, dataset, frame_nums):\n    \"\"\"Decodes a specific set of frames from `video` to a 4D numpy array.\n\n    Args:\n        video: Encoded video.\n        dataset: Dataset meta-info, e.g., width and height.\n        frame_nums: Indices of the specific frames to decode, e.g.,\n            [1, 10, 30, 35] will return four frames: the first, 10th, 30th and\n            35th frames in `video`. 
Indices must be in strictly increasing order.\n\n    Returns:\n        A numpy array, loaded from the byte array returned by\n        `lintel.loadvid_frame_nums`, containing the specified frames, decoded.\n    \"\"\"\n    decoded_frames = lintel.loadvid_frame_nums(video,\n                                               frame_nums=frame_nums,\n                                               width=dataset.width,\n                                               height=dataset.height)\n    decoded_frames = np.frombuffer(decoded_frames, dtype=np.uint8)\n    # One decoded frame is returned per requested index.\n    decoded_frames = np.reshape(\n        decoded_frames,\n        (len(frame_nums), dataset.height, dataset.width, 3))\n\n    return decoded_frames\n```\n\nBoth APIs can be used without passing a width and height, in which case the\nwidth and height of the video will be determined by `libavcodec` and returned\nin the result tuple.\n\n```python\ndecoded_frames, width, height = lintel.loadvid_frame_nums(\n    video, frame_nums=frame_nums)\n\nvideo, width, height, seek_distance = lintel.loadvid(\n    video,\n    should_random_seek=should_random_seek,\n    num_frames=dataset.num_frames,\n    fps_cap=fps_cap)\n```\n\n\n# Installing FFmpeg from Source\n\nIt may be necessary to compile FFmpeg from source, e.g. if there is no way to\nget the development FFmpeg files from the package manager. To do so, nasm, x264\nand FFmpeg must all be installed.\n\nNote in the following I assume that you have created a directory `$HOME/.local`\nfor local installations, and that `$HOME/.local/include`, `$HOME/.local/bin`\nand `$HOME/.local/lib` are in your `CPATH`, `PATH` and `LD_LIBRARY_PATH` (as\nwell as `LIBRARY_PATH`) environment variables, respectively.\n\n1. Download and install nasm:\n\n```\nwget http://www.nasm.us/pub/nasm/releasebuilds/2.13.01/nasm-2.13.01.tar.bz2\n\ntar xvjf nasm-2.13.01.tar.bz2 \u0026\u0026 cd nasm-2.13.01\n\n./configure --prefix=$HOME/.local/\n\nmake -j$(nproc) \u0026\u0026 make install\n```\n\n2. 
Download and install x264:\n\n```\ngit clone git://git.videolan.org/x264.git \u0026\u0026 cd x264\n\n./configure --enable-static --enable-shared --prefix=$HOME/.local\n\nmake -j$(nproc) \u0026\u0026 make install\n```\n\n3. Download and install FFmpeg:\n\n```\ngit clone https://github.com/FFmpeg/FFmpeg.git \u0026\u0026 cd FFmpeg\n\n./configure --enable-shared --enable-gpl --enable-libx264 --enable-pic --enable-runtime-cpudetect --cc=\"gcc -fPIC\" --prefix=$HOME/.local\n\nmake -j$(nproc) \u0026\u0026 make install\n```\n\n\n## Installation Debugging\n\nThe following error:\n\n`ImportError: \u003clintel-path\u003e/_lintel.cpython-36m-x86_64-linux-gnu.so: undefined symbol: avcodec_receive_frame`\n\ncan be debugged as follows.\n\nOne way to see what shared objects a binary is linking to is using ldd:\n`LD_DEBUG=libs ldd \u003cbinary-name\u003e`.\n\nE.g.,\n\n`LD_DEBUG=libs ldd \u003clintel-path\u003e/_lintel.cpython-36m-x86_64-linux-gnu.so`\n\nIt should spit out a bunch of information, including a line like this:\nlibavcodec.so.57 =\u003e /export/mlrg/bduke/.local/lib/libavcodec.so.57\n(0x00007ff4b997a000). This libavcodec.so.57 =\u003e line should point to the new\nlibavcodec.so that you compiled and installed.\n\nIt is possible that this issue may occur if `LIBRARY_PATH` (different from\n`LD_LIBRARY_PATH`) is not set during compile time of lintel. `LIBRARY_PATH`\nshould also point to wherever libavcodec.so lives, the same place as\n`LD_LIBRARY_PATH`, but `LIBRARY_PATH` is used at compile time instead of\nruntime (i.e., be sure that `LIBRARY_PATH` includes a directory with your new\nlibavcodec.so in it when you run `pip install` on lintel). 
I suspect that the\nlibavcodec.so.57 symbol name is baked into the lintel CPython shared object at\ncompile time.\n\n\n# Citing\n\nIf you find Lintel useful for an academic publication, then please use the\nfollowing BibTeX to cite it:\n\n```\n@misc{lintel,\n  author = {Duke, Brendan},\n  title = {Lintel: Python Video Decoding},\n  year = {2018},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/dukebw/lintel}},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdukebw%2Flintel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdukebw%2Flintel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdukebw%2Flintel/lists"}