{"id":23921573,"url":"https://github.com/workofstan/xlsx-compare","last_synced_at":"2026-06-09T20:31:45.784Z","repository":{"id":270846405,"uuid":"911618965","full_name":"WorkOfStan/xlsx-compare","owner":"WorkOfStan","description":"Python script to compare two Excel files","archived":false,"fork":false,"pushed_at":"2025-12-01T09:07:15.000Z","size":40,"stargazers_count":0,"open_issues_count":4,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-03T20:44:53.565Z","etag":null,"topics":["excel","python","xlsx"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WorkOfStan.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-03T12:52:55.000Z","updated_at":"2025-06-10T11:49:38.000Z","dependencies_parsed_at":"2026-04-15T17:02:56.748Z","dependency_job_id":null,"html_url":"https://github.com/WorkOfStan/xlsx-compare","commit_stats":null,"previous_names":["workofstan/xlsx-compare"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/WorkOfStan/xlsx-compare","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorkOfStan%2Fxlsx-compare","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorkOfStan%2Fxlsx-compare/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorkOfStan%2Fxlsx-compare/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/WorkOfStan%2Fxlsx-compare/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WorkOfStan","download_url":"https://codeload.github.com/WorkOfStan/xlsx-compare/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WorkOfStan%2Fxlsx-compare/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34125332,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["excel","python","xlsx"],"created_at":"2025-01-05T16:18:58.619Z","updated_at":"2026-06-09T20:31:45.779Z","avatar_url":"https://github.com/WorkOfStan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# xlsx_compare\n\n`XLSX Compare` is a Python script that compares two Excel files, sheet by sheet, and identifies differences. This tool is particularly useful for quickly comparing large Excel files and generating an organized output of the differences.\n\n---\n\n## Features\n\n1. **Sheet Comparison**:\n\n   - Identifies sheets that exist only in one file.\n   - Compares the contents of sheets present in both files.\n\n2. 
**Handles Differences**:\n\n   - Highlights cell-by-cell differences for sheets present in both files.\n   - Writes differences into a separate sheet named `df-\u003csheet_name\u003e`.\n   - CLI output contains difference count (in red).\n\n3. **Generates a Summary Sheet**:\n\n   - Creates a `COMPARISON` sheet summarizing:\n     - Sheets that exist only in `file1` or `file2`.\n     - Sheets with no differences.\n     - Sheets with differences.\n\n4. **Organized Output**:\n\n   - Sheets that exist in only one file are noted in the `COMPARISON` sheet.\n   - Sheets with differences are written with only the changed cells in a new sheet.\n\n5. **Performance Optimization**:\n\n   - Uses `read_only=True` mode with `openpyxl` for processing large files efficiently.\n   - Handles different sheet sizes by padding smaller sheets to match dimensions.\n\n6. **User-Friendly CLI**:\n\n   - Accepts file paths as parameters.\n   - Optionally, allows specifying the output filename.\n\n7. **Option to select sheets to compare**:\n\n   - Comma-separated list of sheet names to compare (default: all shared and unique sheets)\n\n---\n\n## Installation\n\n### Prerequisites\n\n- Python 3.7 or later.\n- Install the required dependencies:\n\n```bash\npip install pandas openpyxl\n```\n\n---\n\n## Usage\n\n### Command-Line Interface\n\n```bash\npython xlsx_compare.py \u003cfile1.xlsx\u003e \u003cfile2.xlsx\u003e [output.xlsx] [--sheets Sheet1,Sheet2]\n```\n\n- `\u003cfile1.xlsx\u003e`: Path to the first Excel file.\n- `\u003cfile2.xlsx\u003e`: Path to the second Excel file.\n- `[output.xlsx]` (optional): Name of the output file. 
Defaults to `comparison_output.xlsx`.\n- `[--sheets Sheet1,Sheet2]` (optional): Comma-separated list of sheet names to compare.\n\n---\n\n### Example\n\n```bash\npython xlsx_compare.py example/a.xlsx example/b.xlsx example/comparison.xlsx\n# see the example folder for what the result looks like\n```\n\n## Example Output\n\n### COMPARISON Sheet\n\n| **Sheet Name** | **Status**        |\n| -------------- | ----------------- |\n| `File 1`       | file1.xlsx        |\n| `File 2`       | file2.xlsx        |\n|                |                   |\n| `Sheet1`       | No differences    |\n| `Sheet2`       | Only in file1     |\n| `Sheet3`       | Only in file2     |\n| `Sheet4`       | Differences found |\n\n### Sheet with Differences (e.g., `df-Sheet4`)\n\n| **Cell A1**        | **Cell A2** |\n| ------------------ | ----------- |\n| `Value1 -\u003e Value2` | `...`       |\n\n---\n\n## Debugging\n\n- **Row Count Issue**:\n  - By default, `pandas.read_excel` treats the first row as the header. To compare all rows, the script uses `header=None`.\n- **Performance**:\n  - The script uses `read_only=True` for large files to reduce memory usage.\n- **Shapes Mismatch**:\n  - Pads the smaller DataFrame with empty values to match the dimensions of the larger DataFrame.\n\n---\n\n## Color-Coded Logs\n\nThe script provides color-coded logs for better readability:\n\n- **Cyan**: Processing progress.\n- **Yellow**: Sheets only in one file. (TODO)\n- **Green**: Sheets being compared. (TODO)\n- **Blue**: Sheets with no differences. (TODO)\n- **Red**: Sheets with differences.\n\n---\n\n## Known Issues\n\n1. **DataFrames with Different Shapes**:\n\n   - Handled by padding smaller DataFrames with empty values.\n\n2. **Hidden Characters in Data**:\n   - Automatically strips leading/trailing whitespace.\n\n---\n\n## Future Enhancements\n\n1. Improve comparison speed by leveraging optimized pandas operations. E.g. 
developing this idea:\n\n```python\ndef compare_dataframes_cell_by_cell(df_lft, df_rgt, sheet_handle: str) -\u003e pd.DataFrame:\n    \"\"\"\n    Compares two dataframes cell-by-cell and returns a dataframe with differences.\n    \"\"\"\n    # TODO potential to speed up the comparison\n    # if df_lft.shape == df_rgt.shape:\n    #    print(f\"{Colors.YELLOW}Comparing same shape{Colors.RESET}\")\n    #    comparison = df_lft.compare(df_rgt, keep_shape=True, keep_equal=False)\n    #    return comparison.dropna(how='all')\n    # BUT THE RETURNED VALUE SHOULD BE PROCESSED LIKE THIS:\n    #    if differences.empty:\n    #        # If there's no difference, just add an info\n    #        pd.DataFrame({\"Info\": [\"No differences found\"]})\n    #            .to_excel(output_writer, sheet_name=f\"eq-{sheet_handle[:28]}\", index=False)\n    #    else:\n    #        # Save the differences\n    #        differences.to_excel(output_writer, sheet_name=f\"df-{sheet_handle[:28]}\", index=True)\n\n    # Normalize data for consistent comparison\n```\n\n---\n\n## Contributing\n\nPull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.\n\n---\n\n## License\n\nThis project is licensed under the MIT License. See the `LICENSE` file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworkofstan%2Fxlsx-compare","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fworkofstan%2Fxlsx-compare","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworkofstan%2Fxlsx-compare/lists"}