{"id":20515366,"url":"https://github.com/guilt/neuralinkcompression","last_synced_at":"2025-04-14T00:19:07.605Z","repository":{"id":241988182,"uuid":"805211213","full_name":"guilt/NeuralinkCompression","owner":"guilt","description":"Neuralink Compression Submission","archived":false,"fork":false,"pushed_at":"2024-05-31T01:30:52.000Z","size":549,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-10T14:13:31.279Z","etag":null,"topics":["compression","data","neuralink"],"latest_commit_sha":null,"homepage":"https://content.neuralink.com/compression-challenge/README.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guilt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-24T05:36:57.000Z","updated_at":"2024-06-07T07:59:53.000Z","dependencies_parsed_at":"2024-05-31T02:49:45.790Z","dependency_job_id":null,"html_url":"https://github.com/guilt/NeuralinkCompression","commit_stats":null,"previous_names":["guilt/neuralinkcompression"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guilt%2FNeuralinkCompression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guilt%2FNeuralinkCompression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guilt%2FNeuralinkCompression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guilt%2FNeuralinkCompression/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guilt","download_url":"https://codeload.github.com/guilt/NeuralinkCompression/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248799984,"owners_count":21163404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","data","neuralink"],"created_at":"2024-11-15T21:21:13.940Z","updated_at":"2025-04-14T00:19:07.570Z","avatar_url":"https://github.com/guilt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Neuralink Compression Submission\n\n## Problem Statement\n\nSee [Problem](https://content.neuralink.com/compression-challenge/README.html).\n\n## Software Used\n\n7Zip:\n\n```shell\n$ 7z\n\n7-Zip 24.05 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-05-14\n```\n\nFFMpeg:\n\n```shell\n$ ffmpeg\nffmpeg version 6.0-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers\n  built with gcc 12.2.0 (Rev10, Built by MSYS2 project)\n  configuration: --enable-gpl --enable-version3 --enable-shared --disable-w32threads --disable-autodetect\n                 --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp\n                 --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt\n                 --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2\n                 --enable-libaribb24 --enable-libdav1d --enable-libdavs2 --enable-libuavs3d --enable-libzvbi \n                 
--enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265\n                 --enable-libxavs2 --enable-libxvid --enable-libaom --enable-libjxl --enable-libopenjpeg\n                 --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype\n                 --enable-libfribidi --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg\n                 --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc\n                 --enable-d3d11va --enable-dxva2 --enable-libvpl --enable-libshaderc --enable-vulkan --enable-libplacebo\n                 --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt\n                 --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame\n                 --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus\n                 --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa\n                 --enable-librubberband --enable-libsoxr --enable-chromaprint\n  libavutil      58.  2.100 / 58.  2.100\n  libavcodec     60.  3.100 / 60.  3.100\n  libavformat    60.  3.100 / 60.  3.100\n  libavdevice    60.  1.100 / 60.  1.100\n  libavfilter     9.  3.100 /  9.  3.100\n  libswscale      7.  1.100 /  7.  1.100\n  libswresample   4. 10.100 /  4. 10.100\n  libpostproc    57.  1.100 / 57.  1.100\n```\n\nPython:\n\n```shell\n$ python --version\nPython 3.12.3\n```\n\nSciPy/NumPy:\n\n```shell\n$ pip install -U scipy\nRequirement already satisfied: scipy in python312\\lib\\site-packages (1.13.1)\nRequirement already satisfied: numpy\u003c2.3,\u003e=1.22.4 in python312\\lib\\site-packages (from scipy) (1.26.4)\n```\n\nGoldWave:\n\n```shell\nv6.80\n```\n\n## Basic Data Analysis\n\nAn analysis of the file in `data.zip` reveals a ton of `.wav` files. 
First, we want\nto find out what these files are and how much metadata they carry.\n\nWe prepare the Data folder by simply unzipping:\n\n```shell\n$ 7z x data.zip\n```\n\nand then look at what the data is:\n\n```shell\n$ file data/00d4f842-fc92-45f5-8cae-3effdc2245f5.wav\ndata/00d4f842-fc92-45f5-8cae-3effdc2245f5.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 19531 Hz\n```\n\n## Audio File Entropy\n\nFirst, I converted manually to ALAC to see how it fared, and removed all metadata.\n\n```shell\n$ du -hs data/00d4f842-fc92-45f5-8cae-3effdc2245f5.wav\n196K    data/00d4f842-fc92-45f5-8cae-3effdc2245f5.wav\n\n$ ffmpeg -i data/00d4f842-fc92-45f5-8cae-3effdc2245f5.wav -map_metadata -1 -acodec alac data/00d4f842-fc92-45f5-8cae-3effdc2245f5.m4a\n\n$ du -hs data/00d4f842-fc92-45f5-8cae-3effdc2245f5.m4a\n132K    data/00d4f842-fc92-45f5-8cae-3effdc2245f5.m4a\n```\n\nI also tried compressing the file individually, and saw that `zip` and `7z` formats did just as badly. \nKeep in mind that a streaming format is optimized for lower latencies, and few good streaming\nlibraries exist for archival formats.\n\n```shell\n$ cd data\n$ 7z a -tzip -mx9 00d4f842-fc92-45f5-8cae-3effdc2245f5.zip 00d4f842-fc92-45f5-8cae-3effdc2245f5.wav\n$ 7z a -t7z -mx9 00d4f842-fc92-45f5-8cae-3effdc2245f5.7z 00d4f842-fc92-45f5-8cae-3effdc2245f5.wav\n\n$ du -hs 00d4f842-fc92-45f5-8cae-3effdc2245f5.zip\n75K    00d4f842-fc92-45f5-8cae-3effdc2245f5.zip\n\n$ du -hs 00d4f842-fc92-45f5-8cae-3effdc2245f5.7z\n75K    00d4f842-fc92-45f5-8cae-3effdc2245f5.7z\n$ cd ..\n```\n\nSo, 7z still seems to do better in compression ratio. However, there is a lot of unnecessary\ninformation stored (file, bytes etc.) 
in the Zip, and metadata information is redundant and significant,\nso what is the impact of eliminating all of that, assuming the textual data can be moved to a side-channel\nand encoded much better?\n\n```shell\n$ python Scripts/ConcatenateWav.py\n$ 7z a -tzip -mx9 Output.zip Output.wav\n$ 7z a -tzip -mx9 OutputSide.zip Output.txt\n\n$ du -hs Output.zip\n57M     Output.zip\n\n$ du -hs OutputSide.zip\n20K     OutputSide.zip\n```\n\nWe just saved 6MB by excluding the list of files and all the extraneous metadata alone.\n\nThe sidecar data is a mere 20KB. If we only emitted the frames information and the filename\nhad no relevance (seems to be a random UUID), the whole thing is **1374 bytes only** even\nwhen compressed as text.\n\nWhat about `7z`?\n\n```shell\n$ 7z a -t7z -mx9 Output.7z Output.wav\n$ 7z a -t7z -mx9 OutputSide.7z Output.txt\n\n$ du -hs Output.7z\n50M     Output.7z\n\n$ du -hs OutputSide.7z\n16K     OutputSide.7z\n```\n\nWe already improved the baseline by 12MB, a 20% saving. The compression is extremely time-consuming,\nbut it gets us very close to the entropy of this file. Sidecar sizes are similar.\n\nThis isn't great but it already confirms my suspicions about the file.\n\n## Audio/Perception based Entropy\n\nOne of the better approaches to compress this kind of data is based on Fourier transforms. If\nwe can visualize this file better, we can see if it's random noise or smooth stuff.\n\nIt's not random noise. I concatenated the whole damn thing and opened it in\n[GoldWave](https://www.goldwave.com/goldwave.php). What I saw was this:\n\n[![GoldWave Screenshot](Images/GoldWave-Full.png)](Images/GoldWave-Full.png)\n\nSo, it looks like it can be compressed lossily, and we can print the PSNR. 
This\nis outside the scope of the assignment, but I want it looked at carefully.\n\nThis is how MP3 fares:\n\n```shell\n$ ffmpeg -i Output.wav -c:a mp3 Output.mp3\n\n$ du -hs Output.mp3\n15M    Output.mp3\n```\n\nand 64 kbps Opus:\n\n```shell\n$ ffmpeg -i Output.wav -c:a libopus Output.opus\n\n$ du -hs Output.opus\n30M     Output.opus\n```\n\nand 32 kbps Opus:\n\n```shell\n$ ffmpeg -i Output.wav -b:a 32k -c:a libopus Output.opus\n\n$ du -hs Output.opus\n14M     Output.opus\n```\n\nand back:\n\n```shell\n$ ffmpeg -i Output.opus -ar 19531 Lossy.wav\n\n$ du -hs Lossy.wav\n140M    Lossy.wav\n```\n\nWe are already at a further 40-80% reduction in size. What is the perceptual\ndifference between these files?\n\nFor MP3:\n\n```shell\n$ python Scripts/SNR.py\nOutput.wav =\u003e 0.30912332921100283\nLossy.wav =\u003e 0.3086850807150957\n```\n\nFor 64 kbps Opus:\n\n```shell\n$ python Scripts/SNR.py\nOutput.wav =\u003e 0.30912332921100283\nLossy.wav =\u003e 0.0002217705736148185\n```\n\nFor 32 kbps Opus:\n\n```shell\n$ python Scripts/SNR.py\nOutput.wav =\u003e 0.30912332921100283\nLossy.wav =\u003e -2.963617756168254e-05\n```\n\nMP3 seems to be viable?!! The waveform image comparison\ndoes not appear to be bad at all! Of course it's not \nlossless, but it's already a quarter of the Zip file size to\nbegin with.\n\n[![GoldWave Lossy Screenshot](Images/GoldWave-Lossy.png)](Images/GoldWave-Lossy.png)\n\nIf Neuralink scientists are interested, they should\ngive this a try and see how it fares in lab tests.\n\n## Conclusion\n\nI first looked at the data and tried to separate the signal data and the metadata. That\ngave us a good way to get rid of extraneous attributes quickly.\n\nSome basic findings there:\n\n- Individual signal data entropy is not the same as individual file entropy.\n- Combined file entropy is not the same as packing individual data together in a Zip file.\n- Use the Wave Combiner then run compression on the Output.wav. 
Compress the Sidecar separately.\n- Per-file compression has a lower ratio than combined compression.\n- If Neuralink had shipped a zip file for each wav it would be much larger.\n- If Neuralink needs all these files separately and yet wishes to compress better,\n   they need a sidecar to store segment information efficiently.\n- Individual file entropy is meaningless if a collection of wav files together\n  plus their sidecar has lower entropy.\n\nNext, I checked out the feasibility of the lossless audio algorithms available today. They\ndid not perform better than the textual algorithms.\n\nThere are a lot of Hutter Prize algorithms that compress data losslessly at high ratios, but they\ndo not meet the latency or compression ratio constraints today. Not even the best ones.\nEven if you wanted to develop a low-cost, low-power chip to run such an algorithm, none can\ncompress fast enough today to meet the latency challenges. Not at 10mW with a latency of \u003c1ms at 200x\n ratios, with humanly available technology in 2024. If there is amazing alien level technology\navailable today in a secret government lab, I'd say bring it on and solve for the world.\n\nIf there is a good signal generator (or an encoder) available already which can mimic the behavior\nof the real world, that would indeed be the ideal scenario: then all you need to do is send the\ncommand stream across and play it on the other side. Then you don't start with generic audio\nfiles like this, you just go digital already!\n\nAnother thing is how important each electrode's data is. This can only come from empirical\nexperimentation, and since I am not a Neuroscientist *yet* I cannot tell how to get rid of extraneous\nsignal entropy any further. I am also assuming this signal data will be noisy; the real world\nneeds to tell me if this assumption is correct or not.\n\nNext, I actually analyzed the signal data for what we can do with it today. 
What I found was that\nlossy compression is actually reasonably better than LZ-based file compression. We can get roughly\n15MB of audio data for an hour at most, with an excellent SNR similar to the original. If we \ndo not need higher fidelity, we do not need to overengineer for it. Adequate is what I am aiming\nfor.\n\nI think lossy compression might have a bright future if a good SNR value can be determined. In\nall seriousness, Engineering is about making things happen today with what's available today\nto shape a better tomorrow. Innovation is great, yet there needs to be more talk about what is\ngood enough for a variety of use cases, so that it helps us embrace and design for real-world\nconstraints.\n\n## Feedback\n\nAll feedback welcome!\n\n* Author: Karthik Kumar Viswanathan\n* Web   : http://karthikkumar.org\n* Email : me@karthikkumar.org\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguilt%2Fneuralinkcompression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguilt%2Fneuralinkcompression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguilt%2Fneuralinkcompression/lists"}