{"id":13624254,"url":"https://github.com/Dragon2fly/ZipUnicode","last_synced_at":"2025-04-15T20:33:45.544Z","repository":{"id":57478396,"uuid":"266528494","full_name":"Dragon2fly/ZipUnicode","owner":"Dragon2fly","description":"Extract zip file with correct encoding. Auto detect encoding for filename that was used to archive files. Fix zip file to use UTF-8 as filename encoding.","archived":false,"fork":false,"pushed_at":"2023-03-25T11:33:46.000Z","size":223,"stargazers_count":46,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-09T15:55:24.420Z","etag":null,"topics":["encoding","extract-files","zip"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dragon2fly.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-05-24T11:39:49.000Z","updated_at":"2025-04-08T02:33:35.000Z","dependencies_parsed_at":"2024-01-14T08:02:11.551Z","dependency_job_id":"f65bfc09-6fcf-4905-9261-717804f28c3c","html_url":"https://github.com/Dragon2fly/ZipUnicode","commit_stats":{"total_commits":5,"total_committers":1,"mean_commits":5.0,"dds":0.0,"last_synced_commit":"261a4fdf3c6e7b94b9190d8bd92e7749f8382fc2"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dragon2fly%2FZipUnicode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dragon2fly%2FZipUnicode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dragon2fly%2FZipUnicode/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dragon2fly%2FZipUnicode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Dragon2fly","download_url":"https://codeload.github.com/Dragon2fly/ZipUnicode/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249148473,"owners_count":21220542,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["encoding","extract-files","zip"],"created_at":"2024-08-01T21:01:40.628Z","updated_at":"2025-04-15T20:33:45.114Z","avatar_url":"https://github.com/Dragon2fly.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# ZipUnicode\nMake extracted unreadable filename problem gone away. \n\n[![Downloads](https://pepy.tech/badge/zipunicode)](https://pepy.tech/project/zipunicode)\n[![PyPI version](https://badge.fury.io/py/zipunicode.svg)](https://pypi.org/project/zipunicode/)\n[![GitHub license](https://img.shields.io/github/license/Dragon2fly/zipunicode)](https://github.com/Dragon2fly/zipunicode/blob/master/LICENSE)\n\n## Install:\nUsing pip: `pip install ZipUnicode`\n\nBeside installing `zip_unicode` package, \nthis will also create an executable file `zipu` in the syspath \nfor you to work with `zip` file directly from the console. \n\n## Filename encoding inside a zip file\nEveryone agrees what a zip file is and how to make one. \nThat is the way to turn a collection of files into a sequence of bytes \nand put a `.zip` at the end of the name of a newly created file. 
\nBut no one specified how filenames should be encoded. \nSo it is up to the extracting program to interpret that sequence of bytes as a filename.  \n\nMost operating systems use UTF-8 for filename encoding and set a flag bit in the zip file to indicate that.\nHowever, Windows is an exception. For different languages, Windows uses different `code page`s\nto encode filenames. So, if you create a zip file containing a file named `ê.txt` on Linux and \nextract it on Windows, you may get something like `├¬.txt` or `ﾃｪ.txt`. \n\nThe exact filename depends on the `code page` or `language` that Windows is using. \nThe same thing happens when a zip file that contains non-ASCII filenames is created on Windows\nand then extracted on Linux, or on a Windows machine that uses a different `code page`.\n\nIn short, if a filename wasn't encoded with `UTF-8`,\nthen there is no easy way to know which `encoding (or code page)` was used when extracting the file. \n\n## Overview\nYou will use `zipu` to interact with zip files.\n\n```bash\n$ zipu -h\n```\n\n```bash\nusage: zipu [-h] [--extract] [--fix] [--encoding ENCODING]\n            [--password PASSWORD]\n            zipfile [destination]\n\nFix filename encoding error inside a zip file.\n\npositional arguments:\n  zipfile               path to zip file\n  destination           folder path to extract zip file\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --extract, -x         extract the zipfile to specified destination\n  --fix, -f             create a new zip file with UTF-8 file names\n  --encoding ENCODING, -enc ENCODING\n                        zip file used encoding: shift-jis, cp932...\n  --password PASSWORD, -pwd PASSWORD\n                        password to extract zip file\n```\n\nExtracting a zip file is as simple as `zipu -x file.zip`. \nFiles are extracted into a folder that has the same name as `file.zip` without `.zip`\nand sits in the same directory as `file.zip`. 
Filename `encoding` is handled automatically.\n\nYou can also make sure your zip file opens correctly on all computers with `zipu -f file.zip`.\nThis will create a new `file_fixed.zip` in which all file names are encoded with `UTF-8`.\n\n## Usage:\n1. View content of the zip file:\n   \n   Simply point `zipu` to your zip file's path as follows:\n   \n    ```bash\n    zipu path/to/file.zip\n    ```\n   \n   This makes `zipu` do the following:\n    * automatically guess the encoding that was used to encode file names\n    * check if the file is password encrypted \n    * give you a default extract destination if you don't provide one\n    \n   Then, it will show a summary of the contents of that zip file, \n   something similar to the following:\n   \n       D:\\tmp\u003ezipu 20200524_ドラゴンフライト.zip\n   \n   ```bash\n     * Detected encoding  :  SHIFT_JIS | Language:Japanese | Confidence:99%\n     * Default destination:  D:\\tmp\n     * Password protected :  False\n    --------------------------- try encoding: SHIFT_JIS ---------------------------\n    20200524_ドラゴンフライト/\n    20200524_ドラゴンフライト/テストレポート＿リナックスノード.txt\n    20200524_ドラゴンフライト/太陽バッテリーver5.txt\n    20200524_ドラゴンフライト/経営報告_桜ちゃん.txt\n    -------------------------------------------------------------------------------\n    Add '-enc ENCODING' to see filename shown in encoding ENCODING (mbcs, cp932, shift-jis,...)\n    Add '-x' flag to extract all files to default destination\n   ```\n   \n   If there is a root folder inside and it has the same name as the zip file, as in the above example, \n   `default destination` will be the parent folder of the zip file.\n   Otherwise, `default destination` will point to a subdirectory \n   that has the name of the zip file, as in the following case:\n   \n       D:\\tmp\u003ezipu 20200524_ドラゴンボール.zip\n   \n   ```bash\n     * Detected encoding  :  SHIFT_JIS | Language:Japanese | Confidence:99%\n     * Default destination:  D:\\tmp\\20200524_ドラゴンボール\n     * Password protected :  
False\n    --------------------------- try encoding: SHIFT_JIS ---------------------------\n    テストレポート＿リナックスノード.txt\n    太陽バッテリーver5.txt\n    経営報告_桜ちゃん.txt\n    -------------------------------------------------------------------------------\n    Add '-enc ENCODING' to see filename shown in encoding ENCODING (mbcs, cp932, shift-jis,...)\n    Add '-x' flag to extract all files to default destination\n   ```\n\n2. View content with a specific encoding:\n\n   Encoding auto-detection is not always correct. When the sample is too small\n   and some byte sequences of encoding `A` are also valid in encoding `B`, `B` may be wrongly detected\n   instead of `A`. In such cases, you can specify the encoding that you believe\n   is the correct one with the `-enc ENCODING` switch.\n   \n       D:\\tmp\u003ezipu 20200524_ドラゴンボール.zip -enc cp932\n   \n   ```bash\n     * Default destination:  D:\\tmp\\20200524_ドラゴンボール\n     * Password protected :  False\n    --------------------------- try encoding: cp932 ---------------------------\n    テストレポート＿リナックスノード.txt\n    太陽バッテリーver5.txt\n    経営報告_桜ちゃん.txt\n    ---------------------------------------------------------------------------\n    Add '-enc ENCODING' to see filename shown in encoding ENCODING (mbcs, cp932, shift-jis,...)\n    Add '-x' flag to extract all files to default destination\n   ```\n   \n   If your specified `ENCODING` is wrong and cannot decode some bytes,\n   these unknown bytes will be replaced by `�`.\n   \n       D:\\tmp\u003ezipu 20200524_ドラゴンボール.zip -enc ascii\n   \n   ```bash\n     * Default destination:  D:\\tmp\\20200524_ドラゴンボール\n     * Password protected :  False\n    --------------------------- try encoding: ascii ---------------------------\n    �e�X�g���|�[�g�Q���i�b�N�X�m�[�h.txt\n    ���z�o�b�e���[ver5.txt\n    �o�c��_�������.txt\n    ---------------------------------------------------------------------------\n    Add '-enc ENCODING' to see filename shown in encoding ENCODING (mbcs, cp932, shift-jis,...)\n    
Add '-x' flag to extract all files to default destination\n   ```\n   \n   Or those bytes are mapped into completely different characters:\n   \n       D:\\tmp\u003ezipu 20200524_ドラゴンボール.zip -enc utf16\n   \n   ```bash\n     * Default destination:  D:\\tmp\\20200524_ドラゴンボール\n     * Password protected :  False\n    --------------------------- try encoding: utf16 ---------------------------\n    斃境枃貃粃宁枃冁誃榃抃亃境涃宁梃琮瑸\n    뺑窗澃抃斃誃宁敶㕲琮瑸\n    澌掉邍赟苷芿苡⻱硴�\n    ---------------------------------------------------------------------------\n    Add '-enc ENCODING' to see filename shown in encoding ENCODING (mbcs, cp932, shift-jis,...)\n    Add '-x' flag to extract all files to default destination\n   ```\n   \n   Only when auto-detection fails do you need to decide which `ENCODING` is the correct one.\n   \n   **Warning**: If your console font does not cover all Unicode characters, as is often the case on Windows,\n   some characters are shown as a dot `・`. \n   This is not the result of a wrong encoding but of characters that the font does not support.\n   \n3. Extract the zip file:\n\n    Usually, encoding auto-detection works just fine, so you can jump right to extraction with \u003cbr\u003e\n    `zipu -x path/to/file.zip`. 
The `-x` argument can be placed either **before or after** the path to the zip file.\n    \n        D:\\tmp\u003ezipu 20200524_ドラゴンフライト.zip -x\n    \n    ```bash\n     * Detected encoding  :  SHIFT_JIS | Language:Japanese | Confidence:99%\n    Extracting: 20200524_ドラゴンフライト/テストレポート＿リナックスノード.txt\n    Extracting: 20200524_ドラゴンフライト/太陽バッテリーver5.txt\n    Extracting: 20200524_ドラゴンフライト/経営報告_桜ちゃん.txt\n    Finished\n   ``` \n   \n   As mentioned before, when no `destination` is specified, the zip file is extracted to\n   a directory that sits next to the zip file and has the same name as it.\u003cbr\u003e \n   In the above example, that would be `D:\\tmp\\20200524_ドラゴンフライト`.\n   \n   To specify the extract `destination`, add it right after the zip file's path:\n   \n       zipu -x path/to/file.zip path/to/extract \n   \n   If the output file names are unreadable, \n   you have to find the right `ENCODING` with the `-enc` switch as described in **2. View content with a specific encoding**.\n   Then you can use that `ENCODING` to extract the zip file:\n   \n       zipu path/to/file.zip -x -enc ENCODING\n   \n4. A password-protected zip file:\n\n    If a zip file is encrypted, ` * Password protected :  True` will show up when viewing its content. \n    When extracting the zip file, you will be asked for the `password` if you haven't provided one.\n    You can also specify the password directly in the command as follows:\n    \n        zipu path/to/file.zip -x -pwd PASSWORD  \n\n5. Mixed contents:\n\n   Some zip files are very tricky. They contain file names in different encodings. 
Some are `UTF-8`, some are not.\n   For files marked as `UTF-8`, `zipu` leaves the names as is while applying the detected or specified `ENCODING` to the other files.\n   A `UTF-8` encoded filename is prefixed with `(UTF-8) ` in the content view:\n   \n       D:\\tmp\u003ezipu ミックス.zip\n   \n   ```bash\n    * Detected encoding  :  SHIFT_JIS | Language:Japanese | Confidence:63%\n    * Default destination:  D:\\tmp\\ミックス\n    * Password protected :  False\n   --------------------------- try encoding: SHIFT_JIS ---------------------------\n   (UTF-8) Vùng Trời Bình Yên.txt\n   бореиская.txt\n   テストレポート＿リナックスノード.txt\n   太陽バッテリーver5.txt\n   経営報告_桜ちゃん.txt\n   -------------------------------------------------------------------------------\n   Add '-enc ENCODING' to see filename shown in encoding ENCODING (mbcs, cp932, shift-jis,...)\n   Add '-x' flag to extract all files to default destination\n   ```\n   \n   When extracting, `UTF-8` encoded filenames are not decoded with the detected `ENCODING`, \n   so they stay readable. \n   \n   **Warning**: `zipu` cannot handle a zip file that contains three or more encodings, or two encodings\n   neither of which is `UTF-8`. In such cases, you have to extract the zip file once for each encoding.\n\n6. Fixing a zip file:\n\n   If you make a zip file containing file names that are in neither `UTF-8` nor `ASCII` encoding, \n   you can ensure that colleagues whose computers use a different language can \n   open the zip just fine as follows:\n   \n   ```bash\n   zipu -f path/to/file.zip\n   ```\n   \n   This first extracts your zip file (converting all file names to `UTF-8`). \n   Then it compresses the extracted contents and adds a `_fixed` suffix to the zip filename.\n   The fixed zip file is placed in the same directory as the original one.\n   \n   **Warning**: `zipu` cannot create a password-encrypted zip file. 
\n   With these files, you have to first extract them with `zipu` and then re-zip them \n   with your usual tool.\n\n## Changelog\n### 1.1.0\n   * Handle malformed zip files: some zip files contain folders that are registered as file entries.\n  These file entries have a size of zero bytes and were extracted as zero-byte files.\n  Since the OS doesn't allow creating a file and a folder of the same name \n  within the same directory, `zipu` could not go on to create the folder and extract the files inside.\n  Now `zipu` checks for those malformed entries and skips them.\n   * Fixing a zip file from the command line with `zipu -f` now works normally.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDragon2fly%2FZipUnicode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDragon2fly%2FZipUnicode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDragon2fly%2FZipUnicode/lists"}