https://github.com/mitiko/incan74re
A global dynamic dictionary optimizer. Incantare (from Latin: to cast ~spells~) casts a "weather forecast" and is used as a dictionary preprocessor for the weath3rb0i compressor.
- Host: GitHub
- URL: https://github.com/mitiko/incan74re
- Owner: mitiko
- License: gpl-3.0
- Created: 2021-11-10T20:47:33.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-06-12T12:17:34.000Z (over 2 years ago)
- Last Synced: 2024-10-14T02:41:36.598Z (3 months ago)
- Topics: compression, compression-algorithm, dictionary-learning, rust-language
- Language: Rust
- Size: 127 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Incan74re
A global dynamic dictionary optimizer.
Unlike CM coders, this compression utility can see future data and make better global decisions - it casts not spells but weather forecasts :wink:.

How it works:
1. Build the SA, LCP array and an additional offsets structure for faster word counting
2. Generate all matches
3. Rank matches
4. Choose the best word at each iteration
5. Split the data at the locations of the word
6. Discard matches with rank < 0
7. Repeat until no more matches are left

Copyright (c) 2021 Dimitar Rusev
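
The first four steps listed above can be sketched in a few lines of Rust. This is a toy illustration, not the project's code: the suffix array is built with a naive sort (the project depends on libsais, presumably for this step), the additional offsets structure is left out, and the `rank` function is a hypothetical byte-savings score invented for the example, since the real ranking is not described here. All function names are assumptions as well.

```rust
/// Naive suffix array: sort all suffix start positions lexicographically.
/// Fine for a demo, far too slow for real inputs.
fn suffix_array(data: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..data.len()).collect();
    sa.sort_by(|&a, &b| data[a..].cmp(&data[b..]));
    sa
}

/// Kasai's algorithm: lcp[i] is the length of the longest common prefix
/// of the suffixes at sa[i - 1] and sa[i]; lcp[0] is 0.
fn lcp_array(data: &[u8], sa: &[usize]) -> Vec<usize> {
    let n = data.len();
    let mut pos = vec![0usize; n]; // pos[s] = index of suffix s in sa
    for (i, &s) in sa.iter().enumerate() {
        pos[s] = i;
    }
    let mut lcp = vec![0usize; n];
    let mut h = 0usize;
    for i in 0..n {
        if pos[i] == 0 {
            h = 0;
            continue;
        }
        let j = sa[pos[i] - 1];
        while i + h < n && j + h < n && data[i + h] == data[j + h] {
            h += 1;
        }
        lcp[pos[i]] = h;
        h = h.saturating_sub(1);
    }
    lcp
}

/// Count occurrences of `word` (overlaps included) with two binary
/// searches over the suffix array.
fn count_occurrences(data: &[u8], sa: &[usize], word: &[u8]) -> usize {
    let prefix = |s: usize| &data[s..(s + word.len()).min(data.len())];
    let lo = sa.partition_point(|&s| prefix(s) < word);
    let hi = sa.partition_point(|&s| prefix(s) <= word);
    hi - lo
}

/// HYPOTHETICAL rank: bytes saved if every occurrence is replaced by a
/// one-byte index, minus the cost of storing the word plus a terminator.
fn rank(len: usize, count: usize) -> i64 {
    (count * (len - 1)) as i64 - (len as i64 + 1)
}

/// One iteration: enumerate repeated substrings of length >= 2 from the
/// LCP array, discard those with rank < 0, and return the best one.
fn best_word(data: &[u8]) -> Option<(&[u8], i64)> {
    let sa = suffix_array(data);
    let lcp = lcp_array(data, &sa);
    let mut best: Option<(&[u8], i64)> = None;
    for i in 1..sa.len() {
        for len in 2..=lcp[i] {
            let word = &data[sa[i]..sa[i] + len];
            let r = rank(len, count_occurrences(data, &sa, word));
            if r >= 0 && best.map_or(true, |(_, b)| r > b) {
                best = Some((word, r));
            }
        }
    }
    best
}
```

Calling `best_word(b"abracadabra")` picks `abra` under this toy score. Splitting the data at the chosen word and repeating until no candidates remain (steps 5 to 7) is omitted from the sketch.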
## License
The incan74re (*incantare*) project is released under the GPL-3.0 License.
A build requirement is the libsais library by Ilya Grebnov, licensed under the Apache License 2.0.

## Notes
This is a port from my [BWDPerf project](https://github.com/Mitiko/BWDPerf).
The project started as an LZW competitor written in Python, with genetic data in mind.
Then I rewrote it in C# (BWD) for better speed and clarity (also I had made some logic mistakes the first time).
Next I introduced a better data structure for a match finder. This went through multiple changes and optimizations.
Finally, I found out about the suffix array and the FM-index, which led to a final C# rewrite.
It turned out I was starting to get throttled by the GC, and the project was growing into more of a modular compressor than a single dictionary transform. A more functional language would also have benefited my use case, so I took up Rust (quite quickly and with little effort, actually) and rewrote it in Rust for a ~90x speed improvement.

Later, I tried adding multithreaded (MT) code, but the bottleneck is still memory latency and log2 computations, which I have to work on first.