{"id":48470302,"url":"https://github.com/abhiksark/python-h264-scratch","last_synced_at":"2026-04-07T06:03:17.337Z","repository":{"id":344391630,"uuid":"1175741559","full_name":"abhiksark/python-h264-scratch","owner":"abhiksark","description":"Pure Python H.264 video decoder — pixel-perfect, from scratch, no C extensions. Decodes real MP4 files.","archived":false,"fork":false,"pushed_at":"2026-03-27T05:02:27.000Z","size":5349,"stargazers_count":19,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-27T08:43:26.245Z","etag":null,"topics":["cabac","cavlc","codec","educational","from-scratch","h264","mp4","numpy","pixel-perfect","python","video-decoder","video-processing"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abhiksark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-08T05:17:32.000Z","updated_at":"2026-03-27T05:02:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/abhiksark/python-h264-scratch","commit_stats":null,"previous_names":["abhiksark/python-h264-scratch"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/abhiksark/python-h264-scratch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhiksark%2Fpython-h264-scratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abh
iksark%2Fpython-h264-scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhiksark%2Fpython-h264-scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhiksark%2Fpython-h264-scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abhiksark","download_url":"https://codeload.github.com/abhiksark/python-h264-scratch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhiksark%2Fpython-h264-scratch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31501903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cabac","cavlc","codec","educational","from-scratch","h264","mp4","numpy","pixel-perfect","python","video-decoder","video-processing"],"created_at":"2026-04-07T06:03:17.061Z","updated_at":"2026-04-07T06:03:17.322Z","avatar_url":"https://github.com/abhiksark.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
h264-decoder\n\n[![Python](https://img.shields.io/badge/Python-3.9+-blue.svg)](https://www.python.org/)\n[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)\n[![Tests](https://img.shields.io/badge/Tests-1850_passing-brightgreen.svg)](#running-tests)\n[![Profile](https://img.shields.io/badge/H.264-Baseline_|_Main_|_High-orange.svg)](#supported-features)\n\nA pixel-perfect H.264 video decoder written from scratch in pure Python and NumPy.\n\nDecodes real MP4 files downloaded from the internet — no C extensions, no FFI, no dependencies on existing codec libraries. Built to understand how video compression actually works.\n\n![Showcase — ffmpeg reference vs our Python decoder, side by side](docs/linkedin_showcase.png)\n\n\u003cdetails\u003e\n\u003csummary\u003eWith pixel diff visualization (click to expand)\u003c/summary\u003e\n\n![Showcase with diff column — all black = zero error](docs/showcase.png)\n\n\u003c/details\u003e\n\n## What it does\n\n```python\nfrom decoder.decoder import H264Decoder\n\ndecoder = H264Decoder()\n\n# Decode an MP4 from the internet\nfor frame in decoder.decode_file(\"big_buck_bunny.mp4\"):\n    print(f\"{frame.width}x{frame.height}\")\n    y, cb, cr = frame.luma, frame.cb, frame.cr  # YUV 4:2:0\n    rgb = frame.to_rgb()                         # or RGB\n```\n\n## Pixel-perfect accuracy\n\nVerified against ffmpeg on real internet videos — zero pixel difference across Y, Cb, and Cr channels:\n\n| Video | Resolution | Max pixel diff |\n|-------|-----------|---------------|\n| Big Buck Bunny | 640x360 | **0** |\n| Big Buck Bunny | 1280x720 | **0** |\n| Jellyfish | 640x360 | **0** |\n| Sintel | 640x360 | **0** |\n\n## Supported features\n\n| Feature | Status |\n|---------|--------|\n| **Profiles** | Baseline, Main, High |\n| **Entropy coding** | CAVLC, CABAC |\n| **Frame types** | I, P, B |\n| **Intra prediction** | 4x4 (9 modes), 8x8 (9 modes), 16x16 (4 modes) |\n| **Inter prediction** | All partition sizes, 
sub-pixel interpolation, weighted prediction |\n| **Transforms** | 4x4 IDCT, 8x8 IDCT, Hadamard |\n| **Deblocking filter** | Full implementation |\n| **Container** | Raw H.264 (Annex B), MP4 |\n| **Reference management** | DPB, MMCO, reference list reordering |\n\n## Decoding pipeline\n\nEach module maps to a stage of the H.264 spec. The full pipeline, from\ncontainer bytes to RGB output:\n\n```mermaid\nflowchart LR\n    A[\"MP4 / Annex B\n    container/\"] --\u003e B[\"NAL Parser\n    bitstream/\"]\n    B --\u003e C[\"SPS / PPS\n    parameters/\"]\n    C --\u003e D[\"Slice Header\n    slice/\"]\n    D --\u003e E[\"Entropy Decode\n    entropy/\"]\n    E --\u003e F[\"Dequantize\n    dequant/\"]\n    F --\u003e G[\"Inverse Transform\n    transform/\"]\n    G --\u003e H[\"Prediction\n    intra/ inter/\"]\n    H --\u003e I[\"Reconstruct\n    reconstruct/\"]\n    I --\u003e J[\"Deblock\n    deblock/\"]\n    J --\u003e K[\"Output\n    color/\"]\n\n    style A fill:#2d2d3d,stroke:#555,color:#ccc\n    style B fill:#2d2d3d,stroke:#555,color:#ccc\n    style C fill:#2d2d3d,stroke:#555,color:#ccc\n    style D fill:#2d2d3d,stroke:#555,color:#ccc\n    style E fill:#2d2d3d,stroke:#555,color:#ccc\n    style F fill:#2d2d3d,stroke:#555,color:#ccc\n    style G fill:#2d2d3d,stroke:#555,color:#ccc\n    style H fill:#2d2d3d,stroke:#555,color:#ccc\n    style I fill:#2d2d3d,stroke:#555,color:#ccc\n    style J fill:#2d2d3d,stroke:#555,color:#ccc\n    style K fill:#2d2d3d,stroke:#555,color:#ccc\n```\n\n### How a frame is decoded\n\nThe decoder processes one NAL unit at a time. SPS/PPS NALs are stored. Slice\nNALs trigger the full macroblock-by-macroblock decode loop. 
After all MBs in a\npicture are decoded, the deblocking filter runs across the entire frame, and\nthe result is either stored in the decoded picture buffer (for reference frames)\nor output in display order.\n\n```mermaid\nflowchart TD\n    NAL[\"NAL unit stream\"] --\u003e DISP{NAL type?}\n    DISP --\u003e|SPS/PPS| STORE[\"Store parameter sets\"]\n    DISP --\u003e|Slice| SHDR[\"Parse slice header\"]\n    STORE --\u003e NAL\n\n    SHDR --\u003e ENT{Entropy mode?}\n    ENT --\u003e|CAVLC| CAVLC[\"CAVLC MB loop\"]\n    ENT --\u003e|CABAC| CABAC[\"Init 460 contexts + CABAC MB loop\"]\n\n    CAVLC --\u003e STYPE{Slice type?}\n    CABAC --\u003e STYPE\n    STYPE --\u003e|I-slice| IMB[\"Intra prediction + residual\"]\n    STYPE --\u003e|P-slice| PMB[\"Motion compensation from L0 + residual\"]\n    STYPE --\u003e|B-slice| BMB[\"Bi-prediction from L0/L1 + residual\"]\n\n    IMB --\u003e FRAME[\"Frame buffer\"]\n    PMB --\u003e FRAME\n    BMB --\u003e FRAME\n    FRAME --\u003e DBLK[\"Deblocking filter\"]\n    DBLK --\u003e DPB{\"Reference\\nframe?\"}\n    DPB --\u003e|Yes| STORE_DPB[\"Add to DPB\"]\n    DPB --\u003e REORDER[\"Reorder by POC\"]\n    STORE_DPB --\u003e REORDER\n    REORDER --\u003e OUTPUT[\"Output DecodedFrame\"]\n\n    style SHDR fill:#1a3a1a,stroke:#4a4,color:#cfc\n    style STYPE fill:#3a2a1a,stroke:#a84,color:#fda\n    style OUTPUT fill:#1a2a3a,stroke:#48a,color:#adf\n```\n\n### How a macroblock is decoded\n\n```mermaid\nflowchart TD\n    A[Read MB type from bitstream] --\u003e B{Intra or Inter?}\n    B --\u003e|Intra| C[Predict from neighbors]\n    B --\u003e|Inter| D[Motion compensate from reference frames]\n    C --\u003e E[Decode residual coefficients]\n    D --\u003e E\n    E --\u003e F[Inverse quantize]\n    F --\u003e G[Inverse transform 4x4 or 8x8]\n    G --\u003e H[Add residual to prediction]\n    H --\u003e I[Deblocking filter]\n    I --\u003e J[Store in frame buffer]\n\n    style A fill:#1a3a1a,stroke:#4a4,color:#cfc\n    style B 
fill:#3a2a1a,stroke:#a84,color:#fda\n    style J fill:#1a2a3a,stroke:#48a,color:#adf\n```\n\n### CABAC binary arithmetic decoding\n\n```mermaid\nflowchart LR\n    A[Coded bits] --\u003e B[Arithmetic decoder\n    codIRange / codIOffset]\n    B --\u003e C{Context model\n    pStateIdx, valMPS}\n    C --\u003e|MPS| D[Most probable symbol]\n    C --\u003e|LPS| E[Least probable symbol]\n    D --\u003e F[Update context: increase MPS probability]\n    E --\u003e G[Update context: decrease MPS probability]\n    F --\u003e H[Renormalize range]\n    G --\u003e H\n    H --\u003e I[Decoded bin]\n\n    style B fill:#2d2d3d,stroke:#555,color:#ccc\n    style C fill:#3a2a1a,stroke:#a84,color:#fda\n```\n\n## Project structure\n\n```\nh264-decoder/\n├── bitstream/       # NAL unit parsing, bit-level I/O\n├── parameters/      # SPS/PPS parsing\n├── slice/           # Slice header, weight tables\n├── entropy/         # CAVLC and CABAC decoding\n├── dequant/         # Inverse quantization, scaling lists\n├── transform/       # 4x4 and 8x8 IDCT, Hadamard\n├── intra/           # Intra prediction (4x4, 8x8, 16x16)\n├── inter/           # Inter prediction, motion compensation\n├── deblock/         # Deblocking filter\n├── reconstruct/     # Macroblock reconstruction\n├── color/           # YCbCr to RGB conversion\n├── container/       # MP4 demuxer\n├── decoder/         # Main decoder orchestration\n├── test_data/       # Test streams (not tracked, see below)\n└── docs/            # Architecture docs, spec mapping\n```\n\n## Module overview\n\nEach module corresponds to a stage in the H.264 spec. 
They are listed here in\npipeline order, from input to output.\n\n| Module | Spec Section | What it does |\n|--------|-------------|--------------|\n| [`container/`](container/) | ISO 14496-12/15 | MP4 demuxer: parses box hierarchy, extracts NALs from `avcC`, converts AVCC length-prefixed format to Annex B start codes |\n| [`bitstream/`](bitstream/) | Annex B, Sec 7.2 | NAL unit framing (start code detection, emulation prevention byte removal) and `BitReader` for exp-Golomb / fixed-width reads |\n| [`parameters/`](parameters/) | Sec 7.3.2 | SPS and PPS parsing: profile/level, picture dimensions, reference frame limits, scaling lists, VUI |\n| [`slice/`](slice/) | Sec 7.3.3 | Slice header parsing: slice type, QP, reference list modification, weighted prediction tables, MMCO commands |\n| [`entropy/`](entropy/) | Sec 9 | CAVLC (run-level VLC tables) and CABAC (binary arithmetic decoder with 460 context models, binarization, context derivation) |\n| [`dequant/`](dequant/) | Sec 8.5.12 | Inverse quantization with position-dependent scaling matrices, 4x4 and 8x8 scaling list support for High Profile |\n| [`transform/`](transform/) | Sec 8.5.12 | Integer 4x4 and 8x8 inverse DCT (butterfly), 4x4 Hadamard (luma DC, chroma DC) |\n| [`intra/`](intra/) | Sec 8.3.1-8.3.3 | Intra prediction: 4x4 (9 modes), 8x8 (9 modes with reference sample filtering), 16x16 (4 modes), chroma (4 modes) |\n| [`inter/`](inter/) | Sec 8.4 | Motion vector prediction (median), motion compensation (6-tap quarter-pel interpolation), B-frame bi-prediction, weighted prediction, direct mode |\n| [`reconstruct/`](reconstruct/) | Sec 8.5 | Macroblock assembly: prediction + dequant + transform + clip, for all MB types |\n| [`deblock/`](deblock/) | Sec 8.7 | In-loop deblocking filter: boundary strength calculation (bS 0-4), adaptive 4-tap / 3-tap filtering on block edges |\n| [`color/`](color/) | Annex E | YCbCr to RGB conversion: BT.601 and BT.709 matrices, 4:2:0/4:2:2/4:4:4 chroma upsampling |\n| 
[`decoder/`](decoder/) | Sec 7-8 | Top-level orchestration: NAL dispatch, I/P/B slice loops, DPB management, MMCO, POC calculation, frame reordering, error concealment |\n\n## Setup\n\n```bash\ngit clone https://github.com/abhiksark/python-h264-scratch.git\ncd python-h264-scratch\npip install -r requirements.txt\n```\n\n## Running tests\n\n```bash\n# Full test suite (1850 tests)\npytest -v\n\n# Specific module\npytest decoder/tests/ -v\npytest entropy/tests/ -v\n\n# High Profile pixel-perfect tests (requires test videos)\npytest decoder/tests/test_high_profile.py -v\n```\n\n### Test data\n\nBinary test files (`.264`, `.yuv`, `.mp4`) are not tracked in git. To run the full test suite including pixel-perfect comparisons:\n\n```bash\n# Download test videos\nwget \"https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/360/Big_Buck_Bunny_360_10s_1MB.mp4\" \\\n  -O test_data/bbb_360_10s.mp4\nwget \"https://test-videos.co.uk/vids/jellyfish/mp4/h264/360/Jellyfish_360_10s_1MB.mp4\" \\\n  -O test_data/jellyfish_360_10s.mp4\n\n# Generate ffmpeg reference output\nffmpeg -skip_loop_filter all -i test_data/bbb_360_10s.mp4 \\\n  -vframes 1 -f rawvideo -pix_fmt yuv420p test_data/bbb_frame1_ref.yuv\nffmpeg -skip_loop_filter all -i test_data/jellyfish_360_10s.mp4 \\\n  -vframes 1 -f rawvideo -pix_fmt yuv420p test_data/jellyfish_360_10s_ref.yuv\n```\n\n## How it works\n\nThis decoder implements every stage of H.264 decoding from the spec (ITU-T\nH.264 / ISO/IEC 14496-10). Here is what each stage does and why it matters.\n\n**Entropy decoding.** The bitstream is entropy-coded to reduce size. This\ndecoder supports both CAVLC (context-adaptive variable-length codes, the only\nentropy coder in Baseline) and CABAC (context-adaptive binary arithmetic coding,\navailable in Main/High). 
CABAC maintains 460 context models that adapt based on previously\ndecoded symbols, achieving roughly 10% better compression than CAVLC.\n\n**Inverse quantization.** The encoder discards information by dividing\ntransform coefficients by a quantization step size. The decoder multiplies back\nby the step size (scaled by position-dependent weighting matrices). High Profile\nadds 8x8 scaling lists for finer quality control.\n\n**Inverse transform.** H.264 uses integer approximations of the DCT, not\nfloating-point. The 4x4 and 8x8 butterfly transforms here match the JM\nreference decoder exactly -- bit-identical output on every input. Hadamard\ntransforms handle DC coefficients for luma 16x16 and chroma.\n\n**Intra prediction.** I-macroblocks predict pixel values from already-decoded\nneighbors in the same frame. Nine directional modes for 4x4 and 8x8 blocks\n(vertical, horizontal, diagonal down-left, etc.) plus four 16x16 modes. 8x8\nmode adds lowpass reference sample filtering to reduce prediction noise.\n\n**Inter prediction.** P and B macroblocks predict from previously decoded\nreference frames stored in the DPB. Motion vectors are predicted from neighbors\n(median prediction) and refined per-block. Quarter-pixel interpolation uses a\n6-tap FIR filter. B-frames add bi-prediction (weighted average of L0 and L1\nreferences), weighted prediction, and direct mode (MV derived from co-located\nblocks in the reference).\n\n**Deblocking filter.** Block-based coding creates visible edges at block\nboundaries. The in-loop deblocking filter smooths these edges adaptively:\nboundary strength (bS) ranges from 0 (no filtering) to 4 (strong filtering for\nintra edges), and the filter strength adapts to local QP and pixel gradient.\n\n## Performance\n\nThis is an educational decoder — correctness over speed. 
Pure Python with NumPy, no SIMD, no threading.\n\n![Performance benchmarks](docs/benchmarks.png)\n\n| Input | Resolution | I-frame decode | Throughput |\n|-------|-----------|---------------|------------|\n| Big Buck Bunny | 640x360 | ~6s | 0.04 Mpx/s |\n| Jellyfish | 640x360 | ~2.5s | 0.09 Mpx/s |\n| Big Buck Bunny | 1280x720 | ~11s | 0.08 Mpx/s |\n\nMulti-frame decoding (P/B-frames): ~0.6 fps at 640x360.\n\nFor comparison, ffmpeg decodes the same content at ~1000x the speed. The goal here isn't performance — it's a readable, spec-compliant implementation you can step through with a debugger.\n\n## Dependencies\n\n- `numpy` — array operations\n- `pytest` — testing (dev only)\n- `pillow` — image output (optional)\n\n## Acknowledgements\n\nThis project would not have been possible without the **[JM Reference Software](https://github.com/shihuade/JM)** (Joint Model). JM was the ground truth at every stage of development — when our output didn't match, JM's source code told us exactly why. Every butterfly coefficient, every context index, every dequantization formula was verified by reading JM's C implementation and comparing intermediate values. If you want to understand H.264, read JM. 
It's the single best resource after the spec itself.\n\nThanks also to:\n- **[ffmpeg](https://ffmpeg.org/)** — used for generating pixel-perfect reference output and for the MP4 demuxing reference\n- **[x264](https://www.videolan.org/developers/x264.html)** — the encoder behind most of our test streams\n- The authors of **ITU-T H.264** — a remarkably well-designed spec that makes a pure Python implementation feasible\n- The **[test-videos.co.uk](https://test-videos.co.uk/)** project — for hosting freely downloadable H.264 test clips\n\n## References\n\n- [ITU-T H.264](https://www.itu.int/rec/T-REC-H.264) — the spec\n- [JM Reference Software](https://github.com/shihuade/JM) — the reference decoder we verified every function against\n- [ffmpeg](https://ffmpeg.org/) — used for reference YUV generation and validation\n\n## License\n\nApache License 2.0 — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhiksark%2Fpython-h264-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabhiksark%2Fpython-h264-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhiksark%2Fpython-h264-scratch/lists"}