{"id":13936688,"url":"https://github.com/pirate/gzint","last_synced_at":"2025-06-20T16:35:55.444Z","repository":{"id":62568866,"uuid":"74190576","full_name":"pirate/gzint","owner":"pirate","description":":scissors: A python3 library for efficiently storing massive integers (stands for gzipped-integer).","archived":false,"fork":false,"pushed_at":"2021-01-01T21:00:18.000Z","size":1282,"stargazers_count":41,"open_issues_count":0,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-09-14T22:16:24.088Z","etag":null,"topics":["compression","gzip","large-numbers","math","python","repeating-patterns"],"latest_commit_sha":null,"homepage":"https://pypi.python.org/pypi/gzint/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pirate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-11-19T06:06:21.000Z","updated_at":"2024-01-04T16:09:10.000Z","dependencies_parsed_at":"2022-11-03T17:01:02.833Z","dependency_job_id":null,"html_url":"https://github.com/pirate/gzint","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pirate%2Fgzint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pirate%2Fgzint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pirate%2Fgzint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pirate%2Fgzint/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pirate","download_url":"https://codeload.github.com/pirate/gzint/tar.gz/refs/heads/ma
ster","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221925715,"owners_count":16902750,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","gzip","large-numbers","math","python","repeating-patterns"],"created_at":"2024-08-07T23:02:54.780Z","updated_at":"2024-10-28T20:18:42.945Z","avatar_url":"https://github.com/pirate.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# gzint: A library for storing huge integers efficiently [![PyPI](https://img.shields.io/pypi/v/gzint.svg?style=flat-square)](https://pypi.python.org/pypi/gzint/) [![PyPI](https://img.shields.io/pypi/pyversions/gzint.svg?style=flat-square)](https://pypi.python.org/pypi/gzint/) [![Twitter URL](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/thesquashSH)\n\n\nThis python library helps store massive integers by using a gzipped-string representation in memory.\nIt makes storing and comparing huge integers fast and lightweight, while gracefully falling back to normal\ninteger operations when math is needed.  
It works as a drop-in replacement for `int`.\n\n## Quickstart:\n\n```bash\npip3 install gzint\n```\n\n```python\n\u003e\u003e\u003e from gzint import HugeInt\n\n\u003e\u003e\u003e normal_int = 10**1000000        # huge, but compressible (lots of 0's)\n\u003e\u003e\u003e huge_int = HugeInt(normal_int)\n\n# HugeInts are useful when needing to store lots of large numbers without running out of memory\n# Notice how the memory footprint of a normal int is much larger than the equivalent HugeInt\n\u003e\u003e\u003e normal_int.__sizeof__()\n442948                      # almost 0.5mb!!\n\u003e\u003e\u003e huge_int._value.__sizeof__()\n1025                        # only 1kb\n\n# HugeInts and normal ints are interchangeably comparable, and have the same hashes\n\u003e\u003e\u003e HugeInt(5) == 5\nTrue\n\u003e\u003e\u003e HugeInt(5) + 5\n10\n\u003e\u003e\u003e HugeInt(5) + HugeInt(5)\n10\n\u003e\u003e\u003e 5 in {HugeInt(5), 6, 7}   # uses python's hashes of the original int for identity\nTrue\n\n# Of course, this is all silly if you know beforehand that you're only storing 10**1000000: you can just store the string '10**10^6' (57 bytes), and compute it later.\n# This applies to almost all compressible data: if you know beforehand what you're storing, picking the perfect compression method is easy.\n# The tricky part is applying general compression methods, because compression is expensive and it's not worth the CPU cost of trying methods sequentially until you find the right one.\n# gzip is a fairly simple compression algorithm for catching repeating data; I'm also planning on testing JPEG-style fft compression.\n```\n\n## Theory:\n\nThis library is not magic; I have not somehow figured out how to break the [pigeon-hole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle).\nIt simply exploits the fact that most large numbers we work with in real life are not 100% random, and\neither contain repeating patterns (like lots of 0's) or can be represented compactly by 
using notations like\nscientific notation, factorial notation, [Knuth's up-arrow notation](https://en.wikipedia.org/wiki/Knuth%27s_up-arrow_notation), etc.\n\nDo not bother trying to use this library if you're actually reading random data;\nit will only make your `int`s bigger.\n\nThe alpha implementation works by compressing repeating patterns in the base-10 representation of `int`s,\nwhich works very well for large numbers with lots of repeating digits (in base-10).  I'm working on\nadding other compression schemes, and automatically picking the one with the most memory savings (which may\nrequire adding threading to compress the int in several different ways concurrently).\n\nAnother possible option is to try and compress all the `int`s used across an entire program, by storing some state\nevery time a HugeInt is created, and seeing if patterns exist globally that can be compressed together.\n\n## Docs:\n\n`HugeInt` is a type which aids in storing very large, but **compressible numbers** in memory in python \u003e= 3.5.\nIt sacrifices CPU time during initialization and math operations, for fast comparisons and at-rest memory efficiency.\n\n`HugeInt` implements the `int` interface, so you can almost always treat it like a normal python `int`.\nIt will fall back to creating the full `int` in memory if an operation is not supported on the compressed form (e.g. 
multiplication).\n\n`HugeInt` provides these methods on top of `int`:\n\n```python\n - HugeInt.__init__:   Initialize a HugeInt from an `int` or str representation\n - HugeInt.__eq__:     Efficiently compare a `HugeInt` with another `HugeInt` or `int`\n - HugeInt.__str__:    Get the full `str` representation of the `HugeInt`\n - HugeInt.__repr__:   Get a short representation of the `HugeInt` suitable for console display\n - HugeInt.__hash__:   Get the `__hash__` of the uncompressed `int`\n - HugeInt.to_int:     Get the `int` representation of the `HugeInt`\n```\n\nBecause `HugeInt` stores a compressed representation of the number, fast, direct math operations are not possible.\nFor the following operations, the number gets de-compressed, the operation performed using the `int`\nequivalent method, and then the result is re-compressed and returned as a `HugeInt` (which can be very slow).\n\n`__abs__`, `__add__`, `__and__`, `__ceil__`, `__floor__`, `__floordiv__`, `__int__`, `__invert__`, `__le__`, `__lshift__`, `__lt__`, `__mod__`, `__mul__`, `__neg__`, `__or__`, `__pos__`, `__pow__`, `__radd__`, `__rand__`, `__rfloordiv__`, `__rlshift__`, `__rmod__`, `__rmul__`, `__ror__`, `__round__`, `__rpow__`, `__rrshift__`, `__rshift__`, `__rsub__`, `__rtruediv__`, `__rxor__`, `__sub__`, `__truediv__`, `__trunc__`, `__xor__`\n\n**Example Use Case:**\n\nRead a file full of huge numbers, and check to see which ones occur more than once (in `O(n)` time).\n\n```python\nnumbers_seen = set()\n\nfor line in open('big_data.txt', 'r'):\n    compressed_int = HugeInt(line)\n    if compressed_int in numbers_seen:\n        print('Found a familiar number:', repr(compressed_int))\n    numbers_seen.add(compressed_int)\n\ndel line\n\nif 1000 in numbers_seen:\n    print('Saw 1000')\n\nif HugeInt(10**1000000) in numbers_seen:\n    print('Saw 10^1,000,000')\n```\n\n**Why `HugeInt` is slow to init:**\n\nYou may notice that initializing big `HugeInt`s takes some time.  
This is because `HugeInt` uses\nthe gzip \"deflate\" algorithm, and must perform an O(n) pass over the number, where n is the number of digits in base-10.\nDue to this initial cost, it's recommended to avoid using `HugeInt`s for applications where you will need to re-initialize\nmany `HugeInt`s, or perform many math operations on `HugeInt`s in memory.\n\nRight now, only `__eq__` (`==`) and `__hash__` (`in`) are optimized to work directly on the compressed number;\nother operations will fall back to decompressing back to an `int` and using the slower `int` math methods,\nthen recompressing the returned value.\n\n## Development:\n\n```bash\ngit clone https://github.com/pirate/gzint.git       # python3.5 is the only dependency (brew install python3)\ncd gzint\npython3.5 setup.py test                             # optional, check that tests are passing\npython3.5 setup.py install\n# all code is inside gzint/main.py\n```\n\n**TODOs:**\n\n 1. Implement more compression methods and allow users to manually choose which one, with a way to find the optimal one for a given number:\n    - gzipped hex, binary, octal, or other base representations of the number\n    - base + exponents\n    - scientific notation\n    - Knuth's up-arrow notation\n    - factorial notation\n    - prime factor notation\n    - other polynomial representations\n    - python [rational number support](https://docs.python.org/3.6/library/numbers.html#numbers.Rational)\n 2. Fall back to storing the int uncompressed if compression ends up making it bigger\n 3. Speed up/parallelize the compression \u0026 decompression\n 4. See if more math operations can be performed directly on compressed `HugeInt`s without decompressing first, depending on compression method\n 5. 
Use a cached_property to prevent decompressing the same HugeInt repeatedly during `int` operations (allow expiry eventually with timeout to get GC benefits...?)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpirate%2Fgzint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpirate%2Fgzint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpirate%2Fgzint/lists"}