Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ashvardanian/tenpack
Fast Tensors Packaging library for text, image, video, and audio data compatible with PyTorch, TensorFlow, & NumPy 🖼️🎵🎥 ➡️ 🧠
- Host: GitHub
- URL: https://github.com/ashvardanian/tenpack
- Owner: ashvardanian
- Created: 2022-07-24T12:22:43.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-06T06:31:12.000Z (8 months ago)
- Last Synced: 2024-10-15T01:25:07.050Z (2 months ago)
- Topics: clip, laion, multi-modal, numpy, parser, pytorch, simd, tensor, tensorflow, transformer
- Language: C++
- Size: 108 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
# TenPack
This library does three simple things for you:
1. Guess the media type from raw bytes,
2. Parse its dimensions, sizes, lengths, etc.,
3. Unpack data into regular preallocated Tensors.

## Where do we use it?
To connect the data storage layer of [UKV](https://github.com/unum-cloud/ukv) to high-performance computing libraries like [TensorFlow](https://www.tensorflow.org) and [PyTorch](https://pytorch.org).

## How does it work?
Most common file formats embed "signatures" or "magic numbers", often as a prefix of the byte stream.

* [List of file signatures](https://en.wikipedia.org/wiki/List_of_file_signatures)
* [Magic numbers in programming](https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files)

Libraries implementing the first step already exist for other languages:
* [filetype](https://github.com/h2non/filetype) for GoLang
* [filetype.py](https://github.com/h2non/filetype.py) for Python
* [FileType](https://github.com/rzane/file_type) for Elixir
* [FileSignatures](https://github.com/neilharvey/FileSignatures) for C#

## Alternatives for Tensor Exports
* [Pillow](https://pillow.readthedocs.io/en/stable/) and [Pillow-SIMD](https://github.com/uploadcare/pillow-simd) for [image formats](https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html).
* [FFmpeg](https://ffmpeg.org/), for video formats.
* [Nyquist](https://github.com/ddiakopoulos/libnyquist), for audio formats.

In fact, TenPack is just a CMake-friendly generalization of those libraries with a C interface and a focus on memory reuse.
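The first two steps from the list at the top (guessing the media type from raw bytes and parsing its dimensions) can be illustrated with a minimal C++17 sketch. It is not TenPack's actual API: the `media_kind` enum and the `detect` and `png_shape` functions are hypothetical names, only three signatures are checked, and only PNG dimensions are parsed. PNG stores its width and height as big-endian 32-bit integers in the `IHDR` chunk that immediately follows the 8-byte signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>

// Hypothetical media kinds for this sketch, not TenPack's enumeration.
enum class media_kind { unknown, png, jpeg, gif };

struct signature {
    media_kind kind;
    std::size_t offset;     // where the magic bytes start in the stream
    std::string_view magic; // the magic bytes themselves
};

// A few well-known signatures, all located at the start of the stream.
constexpr std::array<signature, 3> signatures{{
    {media_kind::png, 0, std::string_view("\x89PNG\r\n\x1a\n", 8)},
    {media_kind::jpeg, 0, std::string_view("\xFF\xD8\xFF", 3)},
    {media_kind::gif, 0, std::string_view("GIF8", 4)}, // GIF87a and GIF89a
}};

// Step 1: guess the media type by comparing raw bytes against known magic numbers.
media_kind detect(const std::uint8_t* data, std::size_t size) {
    for (const signature& sig : signatures) {
        if (size >= sig.offset + sig.magic.size() &&
            std::memcmp(data + sig.offset, sig.magic.data(), sig.magic.size()) == 0)
            return sig.kind;
    }
    return media_kind::unknown;
}

struct image_shape {
    std::uint32_t width;
    std::uint32_t height;
};

// Reads a big-endian 32-bit unsigned integer, as used throughout PNG.
std::uint32_t read_be32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Step 2: parse the dimensions. Layout: 8-byte signature, 4-byte chunk length,
// "IHDR" tag at bytes 12..15, width at bytes 16..19, height at bytes 20..23.
std::optional<image_shape> png_shape(const std::uint8_t* data, std::size_t size) {
    if (size < 24 || detect(data, size) != media_kind::png ||
        std::memcmp(data + 12, "IHDR", 4) != 0)
        return std::nullopt;
    return image_shape{read_be32(data + 16), read_be32(data + 20)};
}
```

Once the shape is known, a caller can size (or reuse) a tensor buffer before decoding; the decoding itself, the third step, is omitted here and would be delegated to format-specific libraries like those listed above.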