{"id":19797389,"url":"https://github.com/pyexcel/pyexcel-xlsxr","last_synced_at":"2025-05-01T03:31:41.678Z","repository":{"id":26935160,"uuid":"111249741","full_name":"pyexcel/pyexcel-xlsxr","owner":"pyexcel","description":"Read big xlsx files that openpyxl, xlrd could not do efficiently","archived":false,"fork":false,"pushed_at":"2024-11-11T08:59:22.000Z","size":129,"stargazers_count":4,"open_issues_count":2,"forks_count":4,"subscribers_count":2,"default_branch":"dev","last_synced_at":"2024-11-11T09:32:16.596Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pyexcel.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"chfw","patreon":"chfw"}},"created_at":"2017-11-19T00:00:33.000Z","updated_at":"2024-11-11T08:53:53.000Z","dependencies_parsed_at":"2022-08-07T12:01:19.589Z","dependency_job_id":null,"html_url":"https://github.com/pyexcel/pyexcel-xlsxr","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-xlsxr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-xlsxr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-xlsxr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-xlsxr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pyexcel","download_url":"https://codeload.github.com/pyexcel/pyexcel-xlsxr/tar.gz/r
efs/heads/dev","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224233800,"owners_count":17277843,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T07:25:07.015Z","updated_at":"2025-05-01T03:31:41.667Z","avatar_url":"https://github.com/pyexcel.png","language":"Python","readme":"================================================================================\npyexcel-xlsxr - Let you focus on data, instead of xlsx format\n================================================================================\n\n.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png\n   :target: https://www.patreon.com/chfw\n\n.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel-mobans/master/images/awesome-badge.svg\n   :target: https://awesome-python.com/#specific-formats-processing\n\n.. image:: https://codecov.io/gh/pyexcel/pyexcel-xlsxr/branch/master/graph/badge.svg\n   :target: https://codecov.io/gh/pyexcel/pyexcel-xlsxr\n\n.. image:: https://badge.fury.io/py/pyexcel-xlsxr.svg\n   :target: https://pypi.org/project/pyexcel-xlsxr\n\n\n\n.. image:: https://pepy.tech/badge/pyexcel-xlsxr/month\n   :target: https://pepy.tech/project/pyexcel-xlsxr\n\n\n.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg\n   :target: https://gitter.im/pyexcel/Lobby\n\n.. 
image:: https://img.shields.io/static/v1?label=continuous%20templating\u0026message=%E6%A8%A1%E7%89%88%E6%9B%B4%E6%96%B0\u0026color=blue\u0026style=flat-square\n    :target: https://moban.readthedocs.io/en/latest/#at-scale-continous-templating-for-open-source-projects\n\n.. image:: https://img.shields.io/static/v1?label=coding%20style\u0026message=black\u0026color=black\u0026style=flat-square\n    :target: https://github.com/psf/black\n\n**pyexcel-xlsxr** is a specialized xlsx reader using lxml. It does partial reading, meaning\nit won't load all content into memory.\n\n\nlxml installation\n=================\n\nThis library depends on lxml. Because of lxml's limited availability, the use of this library is restricted.\n\nFor PyPy, lxml==3.4.4 is tested to work well, but versions above 3.4.4 are difficult to install.\n\nFor Python 3.7, please use lxml==4.1.1.\n\nOtherwise, this library works OK with lxml 3.4.4 or above.\n\n\n\nSupport the project\n================================================================================\n\nIf your company uses pyexcel and its components in a revenue-generating product,\nplease consider supporting the project on GitHub or\n`Patreon \u003chttps://www.patreon.com/bePatron?u=5537627\u003e`_. Your financial\nsupport will enable me to dedicate more time to coding, improving documentation,\nand creating engaging content.\n\n\nKnown constraints\n==================\n\nFonts, colors and charts are not supported.\n\nReading password-protected xls, xlsx and ods files is not supported either.\n\nInstallation\n================================================================================\n\n\nYou can install pyexcel-xlsxr via pip:\n\n.. 
code-block:: bash\n\n    $ git clone https://github.com/pyexcel/pyexcel-xlsxr.git\n    $ cd pyexcel-xlsxr\n    $ python setup.py install\n\nUsage\n================================================================================\n\nAs a standalone library\n--------------------------------------------------------------------------------\n\n.. testcode::\n   :hide:\n\n    \u003e\u003e\u003e import os\n    \u003e\u003e\u003e import sys\n    \u003e\u003e\u003e from io import BytesIO\n    \u003e\u003e\u003e from collections import OrderedDict\n\n\n.. testcode::\n   :hide:\n\n    \u003e\u003e\u003e from pyexcel_xlsxw import save_data\n    \u003e\u003e\u003e data = OrderedDict() # from collections import OrderedDict\n    \u003e\u003e\u003e data.update({\"Sheet 1\": [[1, 2, 3], [4, 5, 6]]})\n    \u003e\u003e\u003e data.update({\"Sheet 2\": [[\"row 1\", \"row 2\", \"row 3\"]]})\n    \u003e\u003e\u003e save_data(\"your_file.xlsx\", data)\n\n\nRead from an xlsx file\n********************************************************************************\n\nHere's the sample code:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from pyexcel_xlsxr import get_data\n    \u003e\u003e\u003e data = get_data(\"your_file.xlsx\")\n    \u003e\u003e\u003e import json\n    \u003e\u003e\u003e print(json.dumps(data))\n    {\"Sheet 1\": [[1, 2, 3], [4, 5, 6]], \"Sheet 2\": [[\"row 1\", \"row 2\", \"row 3\"]]}\n\n\n\n.. 
testcode::\n   :hide:\n\n    \u003e\u003e\u003e data = OrderedDict()\n    \u003e\u003e\u003e data.update({\"Sheet 1\": [[1, 2, 3], [4, 5, 6]]})\n    \u003e\u003e\u003e data.update({\"Sheet 2\": [[7, 8, 9], [10, 11, 12]]})\n    \u003e\u003e\u003e io = BytesIO()\n    \u003e\u003e\u003e save_data(io, data)\n    \u003e\u003e\u003e unused = io.seek(0)\n    \u003e\u003e\u003e # do something with the io\n    \u003e\u003e\u003e # In reality, you might give it to your http response\n    \u003e\u003e\u003e # object for downloading\n\n\n\n\nRead an xlsx file from memory\n********************************************************************************\n\nContinuing from the previous example:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e # This is just an illustration\n    \u003e\u003e\u003e # In reality, you might deal with xlsx file upload\n    \u003e\u003e\u003e # where you will read from requests.FILES['YOUR_XLSX_FILE']\n    \u003e\u003e\u003e data = get_data(io)\n    \u003e\u003e\u003e print(json.dumps(data))\n    {\"Sheet 1\": [[1, 2, 3], [4, 5, 6]], \"Sheet 2\": [[7, 8, 9], [10, 11, 12]]}\n\n\nPagination feature\n********************************************************************************\n\n\n\nLet's assume the following file is a huge xlsx file:\n\n.. code-block:: python\n\n   \u003e\u003e\u003e huge_data = [\n   ...     [1, 21, 31],\n   ...     [2, 22, 32],\n   ...     [3, 23, 33],\n   ...     [4, 24, 34],\n   ...     [5, 25, 35],\n   ...     [6, 26, 36]\n   ... ]\n   \u003e\u003e\u003e sheetx = {\n   ...     \"huge\": huge_data\n   ... }\n   \u003e\u003e\u003e save_data(\"huge_file.xlsx\", sheetx)\n\nNow read only part of the data:\n\n.. code-block:: python\n\n   \u003e\u003e\u003e partial_data = get_data(\"huge_file.xlsx\", start_row=2, row_limit=3)\n   \u003e\u003e\u003e print(json.dumps(partial_data))\n   {\"huge\": [[3, 23, 33], [4, 24, 34], [5, 25, 35]]}\n\nYou can do the same for columns:\n\n.. 
code-block:: python\n\n   \u003e\u003e\u003e partial_data = get_data(\"huge_file.xlsx\", start_column=1, column_limit=2)\n   \u003e\u003e\u003e print(json.dumps(partial_data))\n   {\"huge\": [[21, 31], [22, 32], [23, 33], [24, 34], [25, 35], [26, 36]]}\n\nObviously, you could do both at the same time:\n\n.. code-block:: python\n\n   \u003e\u003e\u003e partial_data = get_data(\"huge_file.xlsx\",\n   ...     start_row=2, row_limit=3,\n   ...     start_column=1, column_limit=2)\n   \u003e\u003e\u003e print(json.dumps(partial_data))\n   {\"huge\": [[23, 33], [24, 34], [25, 35]]}\n\n.. testcode::\n   :hide:\n\n   \u003e\u003e\u003e os.unlink(\"huge_file.xlsx\")\n\n\nAs a pyexcel plugin\n--------------------------------------------------------------------------------\n\nExplicit import is no longer needed since pyexcel version 0.2.2. Instead,\nthis library is auto-loaded. So if you want to read data in xlsx format,\ninstalling it is enough.\n\n\nReading from an xlsx file\n********************************************************************************\n\nHere is the sample code:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e import pyexcel as pe\n    \u003e\u003e\u003e sheet = pe.get_book(file_name=\"your_file.xlsx\")\n    \u003e\u003e\u003e sheet\n    Sheet 1:\n    +---+---+---+\n    | 1 | 2 | 3 |\n    +---+---+---+\n    | 4 | 5 | 6 |\n    +---+---+---+\n    Sheet 2:\n    +-------+-------+-------+\n    | row 1 | row 2 | row 3 |\n    +-------+-------+-------+\n\n\n\n.. testcode::\n   :hide:\n\n    \u003e\u003e\u003e sheet.save_as(\"another_file.xlsx\")\n\n\n\nReading from an IO instance\n********************************************************************************\n\nYou have to wrap the binary content with a stream to get xlsx working:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e # This is just an illustration\n    \u003e\u003e\u003e # In reality, you might deal with xlsx file upload\n    \u003e\u003e\u003e # where you will read from requests.FILES['YOUR_XLSX_FILE']\n    \u003e\u003e\u003e xlsxfile = \"another_file.xlsx\"\n    \u003e\u003e\u003e with open(xlsxfile, \"rb\") as f:\n    ...     content = f.read()\n    ...     r = pe.get_book(file_type=\"xlsx\", file_content=content)\n    ...     print(r)\n    ...\n    Sheet 1:\n    +---+---+---+\n    | 1 | 2 | 3 |\n    +---+---+---+\n    | 4 | 5 | 6 |\n    +---+---+---+\n    Sheet 2:\n    +-------+-------+-------+\n    | row 1 | row 2 | row 3 |\n    +-------+-------+-------+\n\n\n\n\nLicense\n================================================================================\n\nNew BSD License\n\nDeveloper guide\n==================\n\nDevelopment steps for code changes\n\n#. git clone https://github.com/pyexcel/pyexcel-xlsxr.git\n#. cd pyexcel-xlsxr\n\nUpgrade setuptools and pip. They are needed for development and testing only:\n\n#. pip install --upgrade setuptools pip\n\nThen install relevant development requirements:\n\n#. pip install -r rnd_requirements.txt # if such a file exists\n#. pip install -r requirements.txt\n#. pip install -r tests/requirements.txt\n\nOnce you have finished your changes, please provide test case(s) and relevant documentation,\nand update changelog.yml.\n\n.. note::\n\n    rnd_requirements.txt is usually created when a dependent\n    library has not been released yet. Once that dependency is\n    released, the future version of the dependency listed in\n    requirements.txt will be valid.\n\n\nHow to test your contribution\n--------------------------------------------------------------------------------\n\nAlthough `nose` and `doctest` are both used in code testing, it is advisable\nthat unit tests are put in tests. 
`doctest` is incorporated only to make sure\nthe code examples in documentation remain valid across different development\nreleases.\n\nOn Linux/Unix systems, please launch your tests like this::\n\n    $ make\n\nOn Windows, please issue this command::\n\n    \u003e test.bat\n\n\nBefore you commit\n------------------------------\n\nPlease run::\n\n    $ make format\n\nto format your code; otherwise, your build may fail the unit tests.\n\n\n\n.. testcode::\n   :hide:\n\n   \u003e\u003e\u003e import os\n   \u003e\u003e\u003e os.unlink(\"your_file.xlsx\")\n   \u003e\u003e\u003e os.unlink(\"another_file.xlsx\")\n","funding_links":["https://github.com/sponsors/chfw","https://patreon.com/chfw","https://www.patreon.com/chfw","https://www.patreon.com/bePatron?u=5537627"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyexcel%2Fpyexcel-xlsxr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpyexcel%2Fpyexcel-xlsxr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyexcel%2Fpyexcel-xlsxr/lists"}