{"id":20703249,"url":"https://github.com/ahmedfgad/arithmeticencodingpython","last_synced_at":"2025-06-29T17:34:37.927Z","repository":{"id":53870703,"uuid":"310644357","full_name":"ahmedfgad/ArithmeticEncodingPython","owner":"ahmedfgad","description":"Data Compression using Arithmetic Encoding in Python","archived":false,"fork":false,"pushed_at":"2024-02-01T21:35:26.000Z","size":34,"stargazers_count":80,"open_issues_count":1,"forks_count":16,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-23T00:30:03.341Z","etag":null,"topics":["arithmetic-coding","data-compression","data-science","entropy-coding","lossless-compression-algorithm","python"],"latest_commit_sha":null,"homepage":"https://www.linkedin.com/in/ahmedfgad","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ahmedfgad.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":null,"open_collective":"pygad","ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":["https://donate.stripe.com/eVa5kO866elKgM0144","http://paypal.me/ahmedfgad"]}},"created_at":"2020-11-06T16:09:02.000Z","updated_at":"2025-02-13T17:48:06.000Z","dependencies_parsed_at":"2025-04-23T00:37:56.390Z","dependency_job_id":null,"html_url":"https://github.com/ahmedfgad/ArithmeticEncodingPython","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ahmedfgad/ArithmeticEncodingPython","repository_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/ahmedfgad%2FArithmeticEncodingPython","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfgad%2FArithmeticEncodingPython/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfgad%2FArithmeticEncodingPython/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfgad%2FArithmeticEncodingPython/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ahmedfgad","download_url":"https://codeload.github.com/ahmedfgad/ArithmeticEncodingPython/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ahmedfgad%2FArithmeticEncodingPython/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262638197,"owners_count":23341248,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arithmetic-coding","data-compression","data-science","entropy-coding","lossless-compression-algorithm","python"],"created_at":"2024-11-17T01:06:54.413Z","updated_at":"2025-06-29T17:34:37.897Z","avatar_url":"https://github.com/ahmedfgad.png","language":"Python","readme":"# ArithmeticEncodingPython\r\n\r\nThis project implements the lossless data compression technique called **arithmetic encoding (AE)**. 
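\r\n\r\nThe core idea of AE fits in a few lines. The sketch below is not part of `pyae`. Its functions `cumulative_ranges()`, `encode()`, and `decode()` are hypothetical helpers, and it uses exact `fractions.Fraction` arithmetic in place of the `decimal` module that the project uses. Each symbol narrows the current interval `[low, high)` to the sub-interval that matches its probability. Any number inside the final interval, together with the message length, then identifies the whole message. With the frequency table used throughout this README (`a`: 2, `b`: 7, `c`: 1), this sketch narrows the interval for `abc` to `[0.166, 0.18)`. The midpoint of that interval, `0.173`, agrees with the encoded value printed in the Example section.\r\n\r\n```python\r\nfrom fractions import Fraction\r\n\r\n\r\n# Hypothetical helper (not part of pyae): map each symbol to its cumulative\r\n# probability range [start, end), in the dictionary's insertion order.\r\ndef cumulative_ranges(frequency_table):\r\n    total = sum(frequency_table.values())\r\n    ranges, start = {}, Fraction(0)\r\n    for symbol, freq in frequency_table.items():\r\n        end = start + Fraction(freq, total)\r\n        ranges[symbol] = (start, end)\r\n        start = end\r\n    return ranges\r\n\r\n\r\ndef encode(msg, frequency_table):\r\n    # Narrow [low, high) once per symbol; Fraction keeps every step exact.\r\n    ranges = cumulative_ranges(frequency_table)\r\n    low, high = Fraction(0), Fraction(1)\r\n    for symbol in msg:\r\n        width = high - low\r\n        start, end = ranges[symbol]\r\n        low, high = low + width * start, low + width * end\r\n    return low, high\r\n\r\n\r\ndef decode(value, msg_length, frequency_table):\r\n    # Replay the same narrowing, picking the sub-interval that contains value.\r\n    ranges = cumulative_ranges(frequency_table)\r\n    low, high = Fraction(0), Fraction(1)\r\n    symbols = []\r\n    for _ in range(msg_length):\r\n        width = high - low\r\n        for symbol, (start, end) in ranges.items():\r\n            sub_low, sub_high = low + width * start, low + width * end\r\n            if sub_low \u003c= value \u003c sub_high:\r\n                symbols.append(symbol)\r\n                low, high = sub_low, sub_high\r\n                break\r\n        else:\r\n            raise ValueError('value lies outside every sub-interval')\r\n    return ''.join(symbols)\r\n\r\n\r\nfrequency_table = {'a': 2, 'b': 7, 'c': 1}\r\nlow, high = encode('abc', frequency_table)\r\nvalue = (low + high) / 2  # any number in [low, high) works; take the midpoint\r\nprint(float(low), float(high), float(value))  # 0.166 0.18 0.173\r\nprint(decode(value, 3, frequency_table))  # abc\r\n```\r\n\r\nBecause `Fraction` arithmetic is exact, this sketch never loses precision, but its numerators and denominators grow with the message length. `pyae` instead works with a fixed number of `Decimal` digits. For that reason, its precision has to be chosen to fit the message length.\r\n\r\n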
The project is simple and offers only basic features.\r\n\r\nThe project can represent the encoded input both as a floating-point value and as a binary code.\r\n\r\nThe main module, `pyae.py`, contains the `ArithmeticEncoding` class, which encodes and decodes messages.\r\n\r\n# Usage Steps\r\n\r\nTo use the project, follow these steps:\r\n\r\n1. Import `pyae`\r\n2. Instantiate the `ArithmeticEncoding` Class\r\n3. Prepare a Message\r\n4. Encode the Message\r\n5. Get the Binary Code of the Encoded Message\r\n6. Decode the Message\r\n\r\n## Import `pyae`\r\n\r\nThe first step is to import the `pyae` module.\r\n\r\n```python\r\nimport pyae\r\n```\r\n\r\n## Instantiate the `ArithmeticEncoding` Class\r\n\r\nCreate an instance of the `ArithmeticEncoding` class. Its constructor accepts 2 arguments:\r\n\r\n1. `frequency_table`: The frequency table, as a dictionary whose keys are the symbols and whose values are their frequencies.\r\n2. `save_stages`: If `True`, the intervals of each stage are saved in a list. Note that setting `save_stages=True` can use a lot of memory if the message is large.\r\n\r\nWith the following frequency table, the messages to be encoded or decoded may contain only the 3 characters **a**, **b**, and **c**.\r\n\r\n```python\r\nfrequency_table = {\"a\": 2,\r\n                   \"b\": 7,\r\n                   \"c\": 1}\r\n\r\nAE = pyae.ArithmeticEncoding(frequency_table=frequency_table,\r\n                            save_stages=True)\r\n```\r\n\r\n## Prepare a Message\r\n\r\nPrepare the message to be compressed. Every character in this message must exist in the frequency table.\r\n\r\n```python\r\noriginal_msg = \"abc\"\r\n```\r\n\r\n## Encode the Message\r\n\r\nEncode the message using the `encode()` method. It accepts the message to be encoded and the probability table. 
It returns four values: the encoded message (a single `Decimal` value), the encoder stages, and the minimum and maximum values of the final interval.\r\n\r\n```python\r\nencoded_msg, encoder, interval_min_value, interval_max_value = AE.encode(msg=original_msg,\r\n                                                                         probability_table=AE.probability_table)\r\n```\r\n\r\n## Get the Binary Code of the Encoded Message\r\n\r\nConvert the final interval returned by the `AE.encode()` method into a binary code using the `AE.encode_binary()` method, which accepts the minimum and maximum values of that interval.\r\n\r\n```python\r\nbinary_code, encoder_binary = AE.encode_binary(float_interval_min=interval_min_value,\r\n                                               float_interval_max=interval_max_value)\r\n```\r\n\r\n## Decode the Message\r\n\r\nDecode the message using the `decode()` method. It accepts the encoded message, the message length, and the probability table. It returns the decoded message and the decoder stages.\r\n\r\n```python\r\ndecoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,\r\n                                 msg_length=len(original_msg),\r\n                                 probability_table=AE.probability_table)\r\n```\r\n\r\nNote that the decoded symbols are returned as a `list`. If the original message is a string, convert the list back into a string with `str.join()`:\r\n\r\n```python\r\ndecoded_msg = \"\".join(decoded_msg)\r\n```\r\n\r\n# \u003cu\u003eIMPORTANT\u003c/u\u003e: `decimal` Module\r\n\r\nPython's built-in `float` type is a double-precision (64-bit) floating-point number, so it carries only about 15 to 17 significant decimal digits; any further digits are lost. This is why the project uses the `Decimal` data type offered by the [`decimal` module](https://docs.python.org/2/library/decimal.html).\r\n\r\nThe `decimal` module has a class named `Decimal` whose precision is configurable. The precision is set through the `prec` attribute of the current context, as follows:\r\n\r\n```python\r\nfrom decimal import getcontext\r\n\r\ngetcontext().prec = 50\r\n```\r\n\r\nThe precision defaults to 28 significant digits. 
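\r\n\r\nThe following quick interpreter check shows the default context in action, using only the standard `decimal` module. Here, `localcontext()` is one standard-library way to raise the precision temporarily; it is only an illustration, and `pyae` does not require it. The sketch also hints at where the long digits in the encoder stages printed in the Example section come from. `Decimal(0.7)` stores the exact value of the binary float `0.7`, and rounding it to 28 significant digits yields `0.6999999999999999555910790150`, the same digits shown there. That match suggests, but does not prove, that some interval bounds start out as Python floats.\r\n\r\n```python\r\nfrom decimal import Decimal, getcontext, localcontext\r\n\r\n# The default context keeps 28 significant digits.\r\nprint(getcontext().prec)  # 28\r\n\r\n# Constructing a Decimal from a float stores the float's exact binary value,\r\n# which has far more than 28 digits.\r\nexact = Decimal(0.7)\r\nprint(exact)\r\n\r\n# Unary plus is an arithmetic operation, so it rounds to the context precision.\r\nrounded = +exact\r\nprint(rounded)  # 0.6999999999999999555910790150\r\n\r\n# Division is rounded to 28 significant digits as well.\r\nseventh_28 = Decimal(1) / Decimal(7)\r\nprint(seventh_28)  # 0.1428571428571428571428571429\r\n\r\n# localcontext() raises the precision only inside the with-block.\r\nwith localcontext() as ctx:\r\n    ctx.prec = 50\r\n    seventh_50 = Decimal(1) / Decimal(7)\r\nprint(seventh_50)  # 50 significant digits\r\nprint(getcontext().prec)  # still 28\r\n\r\n# Building a Decimal from a string avoids the float error entirely.\r\nprint(Decimal('0.7'))  # 0.7\r\n```\r\n\r\n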
It is up to the user to choose a precision that suits the application. Note that the precision affects only arithmetic operations, not the construction of `Decimal` values.\r\n\r\nFor more information, see the [`decimal` module documentation](https://docs.python.org/2/library/decimal.html).\r\n\r\n# Example\r\n\r\nThe [`example.py`](/example.py) script compresses the message `abc` using arithmetic encoding. The precision of the `Decimal` type is left at its default of 28, which is enough to encode `abc` without losing any information.\r\n\r\n```python\r\nimport pyae\r\n\r\n# Example for encoding a simple text message using the PyAE module.\r\n# This example prints the floating-point value that encodes the message, along with its binary code.\r\n\r\nfrequency_table = {\"a\": 2,\r\n                   \"b\": 7,\r\n                   \"c\": 1}\r\n\r\nAE = pyae.ArithmeticEncoding(frequency_table=frequency_table,\r\n                            save_stages=True)\r\n\r\noriginal_msg = \"abc\"\r\nprint(\"Original Message: {msg}\".format(msg=original_msg))\r\n\r\n# Encode the message\r\nencoded_msg, encoder, interval_min_value, interval_max_value = AE.encode(msg=original_msg,\r\n                                                                         probability_table=AE.probability_table)\r\nprint(\"Encoded Message: {msg}\".format(msg=encoded_msg))\r\n\r\n# Get the binary code out of the floating-point value\r\nbinary_code, encoder_binary = AE.encode_binary(float_interval_min=interval_min_value,\r\n                                               float_interval_max=interval_max_value)\r\nprint(\"The binary code is: {binary_code}\".format(binary_code=binary_code))\r\n\r\n# Decode the message\r\ndecoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,\r\n                                 msg_length=len(original_msg),\r\n                                 
probability_table=AE.probability_table)\r\ndecoded_msg = \"\".join(decoded_msg)\r\nprint(\"Decoded Message: {msg}\".format(msg=decoded_msg))\r\nprint(\"Message Decoded Successfully? {result}\".format(result=original_msg == decoded_msg))\r\n```\r\n\r\nThe code prints the following:\r\n\r\n```\r\nOriginal Message: abc\r\nEncoded Message: 0.1729999999999999989175325511\r\nThe binary code is: 0.0010110\r\nDecoded Message: abc\r\nMessage Decoded Successfully? True\r\n```\r\n\r\nSo the message `abc` is encoded as a single number, approximately `0.173`.\r\n\r\nTo inspect the stages of the encoding process, print the encoder. It is a list of dictionaries, one per stage.\r\n\r\n```python\r\nprint(encoder)\r\n```\r\n\r\n```python\r\n[{'a': [Decimal('0'), Decimal('0.6999999999999999555910790150')],\r\n  'b': [Decimal('0.6999999999999999555910790150'),\r\n   Decimal('0.7999999999999999611421941381')],\r\n  'c': [Decimal('0.7999999999999999611421941381'),\r\n   Decimal('0.9999999999999999722444243844')]},\r\n {'a': [Decimal('0'), Decimal('0.4899999999999999378275106210')],\r\n  'b': [Decimal('0.4899999999999999378275106210'),\r\n   Decimal('0.5599999999999999372723991087')],\r\n  'c': [Decimal('0.5599999999999999372723991087'),\r\n   Decimal('0.6999999999999999361621760841')]},\r\n {'a': [Decimal('0.4899999999999999378275106210'),\r\n   Decimal('0.5389999999999999343303080934')],\r\n  'b': [Decimal('0.5389999999999999343303080934'),\r\n   Decimal('0.5459999999999999346633750008')],\r\n  'c': [Decimal('0.5459999999999999346633750008'),\r\n   Decimal('0.5599999999999999353295088156')]},\r\n {'a': [Decimal('0.5459999999999999346633750008'),\r\n   Decimal('0.5557999999999999345079437774')],\r\n  'b': [Decimal('0.5557999999999999345079437774'),\r\n   Decimal('0.5571999999999999346522727706')],\r\n  'c': [Decimal('0.5571999999999999346522727706'),\r\n   Decimal('0.5599999999999999349409307570')]}]\r\n```\r\n\r\nHere 
is the binary encoder:\r\n\r\n```python\r\nprint(encoder_binary)\r\n```\r\n\r\n```python\r\n[{0: ['0.0', '0.1'], 1: ['0.1', '1.0']},\r\n {0: ['0.00', '0.01'], 1: ['0.01', '0.1']},\r\n {0: ['0.000', '0.001'], 1: ['0.001', '0.01']},\r\n {0: ['0.0010', '0.0011'], 1: ['0.0011', '0.01']},\r\n {0: ['0.00100', '0.00101'], 1: ['0.00101', '0.0011']},\r\n {0: ['0.001010', '0.001011'], 1: ['0.001011', '0.0011']},\r\n {0: ['0.0010110', '0.0010111'], 1: ['0.0010111', '0.0011']}]\r\n```\r\n\r\n## Low Precision\r\n\r\nAssume the message to be encoded is `\"abc\"*20` (i.e., `abc` repeated 20 times) and the default precision of 28 is kept. The message length is 60.\r\n\r\n```python\r\noriginal_msg = \"abc\"*20\r\n```\r\n\r\nHere is the code for this new message:\r\n\r\n```python\r\nimport pyae\r\n\r\nfrequency_table = {\"a\": 2,\r\n                   \"b\": 7,\r\n                   \"c\": 1}\r\n\r\nAE = pyae.ArithmeticEncoding(frequency_table=frequency_table,\r\n                            save_stages=True)\r\n\r\noriginal_msg = \"abc\"*20\r\nprint(\"Original Message: {msg}\".format(msg=original_msg))\r\n\r\nencoded_msg, encoder, interval_min_value, interval_max_value = AE.encode(msg=original_msg,\r\n                                                                         probability_table=AE.probability_table)\r\nprint(\"Encoded Message: {msg}\".format(msg=encoded_msg))\r\n\r\ndecoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,\r\n                                 msg_length=len(original_msg),\r\n                                 probability_table=AE.probability_table)\r\ndecoded_msg = \"\".join(decoded_msg)\r\nprint(\"Decoded Message: {msg}\".format(msg=decoded_msg))\r\nprint(\"Message Decoded Successfully? {result}\".format(result=original_msg == decoded_msg))\r\n```\r\n\r\nRunning this code prints the following. The decoded message differs from the original message. 
The reason is that a precision of 28 digits is not sufficient to encode a message of length 60.\r\n\r\n```\r\nOriginal Message: abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc\r\nEncoded Message: 0.1683569979716024329522342419\r\nDecoded Message: abcabcabcabcabcabcabcabcabcabcabcabcabcabcabbcbbbbbbbbbbbbbb\r\nMessage Decoded Successfully? False\r\n```\r\n\r\nThe probabilities of **a**, **b**, and **c** are their frequencies divided by 10, i.e. 0.2, 0.7, and 0.1, so each occurrence of `abc` multiplies the width of the interval by `0.2 * 0.7 * 0.1 = 0.014`. After 20 repetitions, the final interval is only about `8.4e-38` wide (`0.014**20`), so telling its two bounds apart takes roughly 38 significant digits, well beyond the default of 28.\r\n\r\nIn this case, the precision must be increased. For example, to set it to 45:\r\n\r\n```python\r\nfrom decimal import getcontext\r\n\r\ngetcontext().prec = 45\r\n```\r\n\r\nHere is the full code after increasing the precision of the `Decimal` type:\r\n\r\n```python\r\nimport pyae\r\nfrom decimal import getcontext\r\n\r\ngetcontext().prec = 45\r\n\r\nfrequency_table = {\"a\": 2,\r\n                   \"b\": 7,\r\n                   \"c\": 1}\r\n\r\nAE = pyae.ArithmeticEncoding(frequency_table=frequency_table,\r\n                            save_stages=True)\r\n\r\noriginal_msg = \"abc\"*20\r\nprint(\"Original Message: {msg}\".format(msg=original_msg))\r\n\r\nencoded_msg, encoder, interval_min_value, interval_max_value = AE.encode(msg=original_msg,\r\n                                                                         probability_table=AE.probability_table)\r\nprint(\"Encoded Message: {msg}\".format(msg=encoded_msg))\r\n\r\ndecoded_msg, decoder = AE.decode(encoded_msg=encoded_msg,\r\n                                 msg_length=len(original_msg),\r\n                                 probability_table=AE.probability_table)\r\ndecoded_msg = \"\".join(decoded_msg)\r\nprint(\"Decoded Message: {msg}\".format(msg=decoded_msg))\r\nprint(\"Message Decoded Successfully? 
{result}\".format(result=original_msg == decoded_msg))\r\n```\r\n\r\nWith the higher precision, the original message is restored successfully:\r\n\r\n```\r\nOriginal Message: abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc\r\nEncoded Message: 0.168356997971602432952234241597600194030293262\r\nDecoded Message: abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc\r\nMessage Decoded Successfully? True\r\n```\r\n\r\n# Contact Us\r\n\r\n- E-mail: [ahmed.f.gad@gmail.com](mailto:ahmed.f.gad@gmail.com)\r\n- [LinkedIn](https://www.linkedin.com/in/ahmedfgad)\r\n- [Amazon Author Page](https://amazon.com/author/ahmedgad)\r\n- [Heartbeat](https://heartbeat.fritz.ai/@ahmedfgad)\r\n- [Paperspace](https://blog.paperspace.com/author/ahmed)\r\n- [KDnuggets](https://kdnuggets.com/author/ahmed-gad)\r\n- [TowardsDataScience](https://towardsdatascience.com/@ahmedfgad)\r\n- [GitHub](https://github.com/ahmedfgad)\r\n","funding_links":["https://opencollective.com/pygad","https://donate.stripe.com/eVa5kO866elKgM0144","http://paypal.me/ahmedfgad"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmedfgad%2Farithmeticencodingpython","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fahmedfgad%2Farithmeticencodingpython","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fahmedfgad%2Farithmeticencodingpython/lists"}