https://github.com/kingsdigitallab/framesense
A modular command-line tool to process video collections.
- Host: GitHub
- URL: https://github.com/kingsdigitallab/framesense
- Owner: kingsdigitallab
- License: mit
- Created: 2025-06-17T13:39:08.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2026-05-11T17:03:26.000Z (about 1 month ago)
- Last Synced: 2026-05-11T19:08:09.579Z (about 1 month ago)
- Topics: command-line-tool, reusable, video-processing
- Language: Python
- Homepage:
- Size: 88.9 MB
- Stars: 10
- Watchers: 2
- Forks: 2
- Open Issues: 19
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codemeta: codemeta.json
# 🎞️FrameSense
FrameSense is a highly modular command line tool designed to process your video collections.
**Current focus**: consolidation phase aiming to improve stability, performance and architecture.
## Requirements
* Python 3.10+
* Docker or Singularity
## Initial setup
1. Clone this repository anywhere on your system;
2. Copy [`docs/collections.json`](docs/collections.json) to the folder containing your video collections and adapt its contents to your needs. All paths in `collections.json` are relative to that file.
3. Copy [`docs/.env`](docs/.env) to the folder that contains `framesense.py` and adapt the values to your environment. `FRAMESENSE_COLLECTIONS` should be the absolute path to your copy of `collections.json`.
4. Create the Python virtual environment: `python3 -m venv venv`
5. Activate the virtual environment: `source venv/bin/activate`
6. Install the required packages: `pip install -r requirements.txt`
## Concepts
In FrameSense, your collection is broken down into a hierarchy of smaller units. A `Collection` contains `Videos`, which are made of `Clips`. Each Clip is a sequence of `Shots`, and each Shot is composed of a series of still `Frames`.
Each operation provided by FrameSense works on a particular unit in that hierarchy.
A video can be manually annotated in an `annotation file` that contains a list of annotations. An `annotation` describes and locates a clip within the video.
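
As an illustration only, the hierarchy described above can be modelled with a few nested data classes. The class and field names below are hypothetical and are not taken from the FrameSense code base; this is a minimal sketch of the Collection > Video > Clip > Shot > Frame nesting.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    index: int
    # Paths of the still frames extracted from this shot (hypothetical field).
    frames: List[str] = field(default_factory=list)


@dataclass
class Clip:
    name: str
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Video:
    name: str
    clips: List[Clip] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    videos: List[Video] = field(default_factory=list)


def count_frames(collection: Collection) -> int:
    """Count every frame reachable from a collection by walking the hierarchy."""
    return sum(
        len(shot.frames)
        for video in collection.videos
        for clip in video.clips
        for shot in clip.shots
    )
```

Each operator would then target one level of this nesting, for example all clips or all shots.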
## Expected folder structure of your collections
* COLLECTIONS
    * collections.json*
    * COLLECTION1*
        * VIDEO1*
            * VIDEO1.mp4
            * CLIP1
                * CLIP1.mp4
                * shots
                    * SHOT_INDEX # three digits, zero-padded
                        * shot.mp4
                        * 01-XXX.jpg # first frame
                        * 02-XXX.jpg # middle frame
                        * 03-XXX.jpg # last frame
                        * frames.json
    * ANNOTATIONS1
        * VIDEO1.json
In the above tree, a file or folder name in capitals can be named whichever way you like.
Files or folders with an asterisk are mandatory. Names in lowercase are predefined; you can't change them.
Initially each video folder must have either a video file (e.g. godfather/godfather.mp4) or at least one clip file (e.g. godfather/godfather/baptism/baptism.mp4).
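
The rule above can be checked with a short script. This is a minimal sketch, not part of FrameSense: the function name is hypothetical, and it assumes that video and clip files use the `.mp4` extension and are named after their containing folder, as in the tree shown earlier.

```python
from pathlib import Path


def has_source_material(video_dir: Path, suffix: str = ".mp4") -> bool:
    """Return True if a video folder holds a video file or at least one clip file.

    Assumption: a video file is named VIDEO/VIDEO.mp4 and a clip file
    is named VIDEO/CLIP/CLIP.mp4.
    """
    video_dir = Path(video_dir)
    if not video_dir.is_dir():
        return False
    # Case 1: the full video file sits directly in the video folder.
    if (video_dir / f"{video_dir.name}{suffix}").is_file():
        return True
    # Case 2: at least one clip sub-folder contains its clip file.
    return any(
        (clip_dir / f"{clip_dir.name}{suffix}").is_file()
        for clip_dir in video_dir.iterdir()
        if clip_dir.is_dir()
    )
```

Calling `has_source_material(Path("godfather"))` before running any operator is one way to spot empty video folders early.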

## Usage
The tool offers a command-line interface to a series of modular operators acting on your video collections.
To see the list of available operators:
`python framesense.py operators`
For instance, to see the collections and the videos they contain:
`python framesense.py collections -v`
### Arguments
* `-f FILTER` : only input files whose path contains the FILTER string (case insensitive) will be processed by the operator
* `-r` : forces the operator to **redo** the operation, even if the output already exists. **USE WITH CAUTION** as it can destroy outputs from previous operations
* `-v` : **verbose** mode, prints more detailed output
* `--dry-run` : does not make any change on disk. A way of testing an operation's scope before running it
The operator will return an error if you try to use an argument that it doesn't support (yet).
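
To make the `-f FILTER` behaviour concrete, here is a minimal sketch of a case-insensitive substring match on a file path. The function name is hypothetical and the actual matching code in FrameSense may differ.

```python
from typing import Optional


def matches_filter(path: str, filter_text: Optional[str]) -> bool:
    """Return True if the path should be processed under the given filter.

    An empty or missing filter selects every path; otherwise the filter
    must appear somewhere in the path, ignoring case.
    """
    if not filter_text:
        return True
    return filter_text.lower() in str(path).lower()


paths = [
    "films/godfather/baptism/baptism.mp4",
    "films/casablanca/casablanca.mp4",
]
selected = [p for p in paths if matches_filter(p, "GodFather")]
```

Under this reading, combining `-f` with `--dry-run` is a low-risk way to preview which files an operator would touch, provided the operator supports both arguments.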
## Architectural principles
* **Modular**: open architecture so each operator has its own module and containerised dependencies; easy to extend or swap operations;
* **Incremental**: an operation builds on top of outputs from other operations, enabling caching, reuse of intermediate results and custom pipelines;
* **HPC-friendly**: non-interactive command line tool which is easy to install and run on SLURM;
* **Portable**: should be easy to run on different machines, including lower-end personal computers
## Operators
FrameSense comes with a battery of built-in operators.
For a complete and up-to-date list, please use `python framesense.py operators`.
Check the README.md under each operator folder for a specification card. (Work in progress)
### Built-in operators
#### Information
* **operators**:
List all available operators
* **annotations**:
List all annotation files
* **collections**:
List all collections
#### Segmentation
* **[make_clips_ffmpeg](operators/make_clips_ffmpeg/)**:
Extract clips from videos based on timecodes in annotation files
* **[make_shots_scenedetect](operators/make_shots_scenedetect)**:
Extract shots from clips using PySceneDetect
* **[make_frames_ffmpeg](operators/make_frames_ffmpeg)**:
Extract frames from shots using ffmpeg
#### Detection
* **[scale_frames_sssabet](operators/scale_frames_sssabet)**:
Shot scale classification from frames based on https://github.com/sssabet/Shot_Type_Classification
#### Sound
* **[extract_sound_ffmpeg](operators/extract_sound_ffmpeg/)**:
Extract clip audio channel into a sound file
#### Transcoding
* **[transcode_clips_ffmpeg](operators/transcode_clips_ffmpeg/)**:
Convert a clip from one format to another
#### Question answering
* **[answer_videos_vlm](operators/answer_videos_vlm/)**:
Answer questions about a video file using a video/vision language model
* **[answer_transcription_ollama](operators/answer_transcription_ollama/)**:
Answer questions about a clip transcription using a large language model
* **[answer_frames_vlm](operators/answer_frames_vlm/)**:
Answer questions about a frame using a vision language model
### Design principles
It is expected that each operator:
* is functionally minimal ("does one thing and does it well");
* is atomic (only valid and complete outputs are persisted);
* implements a single method or strategy;
* should only process the input if its output doesn't already exist;
* uses containers to isolate its software dependencies;
* works on all files at one specific level in the hierarchy (e.g. make_shots splits all your clips into shots);
* has a name which reflects what it does, on what unit, with which method (e.g. make_clips_ffmpeg);
* is written as a Python class within a module `operator.py` under a package whose name matches the name of the operator (e.g. `operators/make_clips_ffmpeg/operator.py`);
* inherits from the [base operator](operators/base/operator.py).
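
Two of the principles above, atomic output and skipping work whose output already exists, can be sketched in plain Python. This is an illustration under stated assumptions, not the base operator's API: the function name and its `produce` callback are hypothetical. The output is written to a temporary file first and only renamed into place once it is complete.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_output_atomically(
    output_path: Path, produce: Callable[[], str], redo: bool = False
) -> bool:
    """Write produce() to output_path; return True if the work was done.

    Existing output is left untouched unless redo is True (mirrors `-r`).
    """
    output_path = Path(output_path)
    if output_path.exists() and not redo:
        return False  # incremental: reuse the earlier result
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temporary file in the target folder so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=output_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            tmp_file.write(produce())
        os.replace(tmp_name, output_path)  # the output appears only when complete
    except BaseException:
        # On failure, discard the partial file and leave no output behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

If `produce` raises, no file appears at `output_path`, so a later run will retry instead of trusting a partial result.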
### Containers
Any software dependency needed for an operator to run
should be packaged into a container image.
FrameSense supports Docker and Singularity container engines.
**The first time an operator is executed the container image
will be built** from its Dockerfile.
After that, the container image will only be rebuilt
if the Dockerfile has changed.
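
One common way to implement such a "rebuild only when the Dockerfile has changed" rule is to store a hash of the Dockerfile next to the image and compare it on each run. The sketch below assumes that approach; how FrameSense actually detects changes is not documented here, and the function names and stamp file are hypothetical.

```python
import hashlib
from pathlib import Path


def dockerfile_digest(dockerfile: Path) -> str:
    """Return the SHA-256 hex digest of the Dockerfile contents."""
    return hashlib.sha256(Path(dockerfile).read_bytes()).hexdigest()


def needs_rebuild(dockerfile: Path, stamp_file: Path) -> bool:
    """True if no build was recorded yet or the Dockerfile changed since."""
    stamp_file = Path(stamp_file)
    if not stamp_file.is_file():
        return True  # first execution: the image has never been built
    recorded = stamp_file.read_text(encoding="utf-8").strip()
    return recorded != dockerfile_digest(dockerfile)


def record_build(dockerfile: Path, stamp_file: Path) -> None:
    """Store the current digest after a successful image build."""
    Path(stamp_file).write_text(dockerfile_digest(dockerfile), encoding="utf-8")
```
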
If the operator folder contains an `./app` directory,
it will be mounted to `/app` within the container.
#### Singularity
When using Singularity (instead of Docker),
the Dockerfile is converted to a .def file
before the container image is built.
All Singularity definition and image files
are generated and stored under the `./singularity` folder.
It is possible to build the images on one machine
and copy the `singularity` folder across to another
so they are ready to be used.
The build process will be offloaded to
the remote Singularity endpoint
you are logged into.
If you are not logged into a remote endpoint
the build process uses the `--fakeroot` method.
But this method requires privileges
only available on a personal computer.
In an HPC environment with Singularity
you can either log into a remote endpoint
or copy the `singularity` folder from another machine.
If you are not logged in and `--fakeroot` is not allowed,
you may encounter this error:
`FATAL: fakeroot requires to set 'allow setuid = yes' in /etc/singularity/singularity.conf`
## Features and Bugs
Please use the [GitHub issue tracker](https://github.com/kingsdigitallab/framesense/issues) to report bugs or request new features.
This tool is currently being developed primarily to serve the needs of the [ISSA research project](https://github.com/kingsdigitallab/issa).
Tickets related to ISSA will therefore take priority.
Unrelated tickets are welcome but we can't guarantee that they will be addressed promptly or at all
until FrameSense receives more dedicated support (external contributors or additional funding).
## Performance and HPC
We are aiming to support both low-end (laptop) and high-end (HPC) compute environments.
However, at the moment support is limited and the operators are very slow.
## Testing
See [tests/README.md](tests/README.md) for details.
## Environment variables
Variables affecting how FrameSense works can be set
in your session environment
or the `.env` file located in the same folder as `framesense.py`.
You can point to a different `.env` file
by setting the absolute or relative (to `framesense.py`) path
in the `FRAMESENSE_DOTENV_PATH` environment variable.
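
The lookup described above can be sketched as follows. The function name is hypothetical and this is only one plausible reading of the rule: an unset variable falls back to `.env` beside `framesense.py`, and a relative value is resolved against that same folder.

```python
from pathlib import Path
from typing import Mapping


def resolve_dotenv_path(script_dir: Path, environ: Mapping[str, str]) -> Path:
    """Return the path of the .env file to load.

    script_dir is the folder that contains framesense.py.
    """
    script_dir = Path(script_dir)
    configured = environ.get("FRAMESENSE_DOTENV_PATH")
    if not configured:
        return script_dir / ".env"  # default location
    path = Path(configured)
    # Absolute paths are used as given; relative ones start at script_dir.
    return path if path.is_absolute() else script_dir / path
```

In practice `environ` would be `os.environ`; passing it as a parameter keeps the function easy to test.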
[`./docs/.env`](./docs/.env) is a template
that contains a list of all available variables and their default values.
If `CUDA_VISIBLE_DEVICES=''`,
the containers will not be able to access the host's GPU.
Any operator parameter can be passed as an environment variable.
The name is `OPERATOR_PARAM`,
where `OPERATOR` is the uppercase name of the operator
and `PARAM` is the uppercase name of the parameter.
For instance, `ANSWER_FRAMES_VLM_MODEL="qwen3-vl:2b"`
sets the `model` param of `answer_frames_vlm` to `"qwen3-vl:2b"`.
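
The naming convention can be expressed in a couple of lines. The helper names below are hypothetical and shown only to illustrate how the variable name is derived from the operator and parameter names.

```python
import os
from typing import Mapping, Optional


def param_env_name(operator: str, param: str) -> str:
    """Build the environment variable name for an operator parameter."""
    return f"{operator}_{param}".upper()


def param_from_env(
    operator: str, param: str, environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the value set in the environment, or None if it is not set."""
    environ = os.environ if environ is None else environ
    return environ.get(param_env_name(operator, param))
```
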
## Operator parameters
The value of any operator parameter originates from these sources,
in the following order of precedence:
* environment variable (see above)
* variable in `.env` file
* value under `meta.params.OPERATOR.PARAM` in the `collections.json` file (note that `OPERATOR` and `PARAM` should be in lowercase, e.g. `answer_frames_vlm` and `model`)
* the default value set in the operator's `params.json` file
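
The precedence order above amounts to a chain of lookups that stops at the first source providing a value. The sketch below is illustrative: the function name is hypothetical, the `.env` content is assumed to be already parsed into a dictionary, and `params.json` is assumed to map parameter names directly to default values, which may not match the real file layout.

```python
from typing import Any, Mapping

_MISSING = object()  # sentinel, so that falsy values such as 0 or "" still count


def resolve_param(
    operator: str,
    param: str,
    environ: Mapping[str, str],
    dotenv: Mapping[str, str],
    collections_data: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Any:
    """Return the parameter value from the highest-precedence source."""
    env_name = f"{operator}_{param}".upper()
    # 1. Session environment variable.
    if env_name in environ:
        return environ[env_name]
    # 2. Variable in the .env file.
    if env_name in dotenv:
        return dotenv[env_name]
    # 3. meta.params.OPERATOR.PARAM in the collections file (lowercase keys).
    value = (
        collections_data.get("meta", {})
        .get("params", {})
        .get(operator.lower(), {})
        .get(param.lower(), _MISSING)
    )
    if value is not _MISSING:
        return value
    # 4. Default from the operator's params.json (KeyError if undefined).
    return defaults[param]
```
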
## GPU access
Docker containers can access the host GPU
if the [NVIDIA Container Toolkit](https://docs.nvidia.com/ai-enterprise/deployment/vmware/latest/docker.html) is installed.
To check support run this command:
`docker run --rm --gpus all alpine`
If it returns an error message that mentions the GPU,
the toolkit might be missing.