{"id":19797386,"url":"https://github.com/pyexcel/pyexcel-htmlr","last_synced_at":"2025-05-01T03:31:37.633Z","repository":{"id":62580256,"uuid":"97990651","full_name":"pyexcel/pyexcel-htmlr","owner":"pyexcel","description":"Read tables in html page as excel data","archived":false,"fork":false,"pushed_at":"2025-04-18T07:53:01.000Z","size":81,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"dev","last_synced_at":"2025-04-30T12:23:34.556Z","etag":null,"topics":["html"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pyexcel.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.rst","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":"chfw","patreon":"chfw"}},"created_at":"2017-07-21T22:20:09.000Z","updated_at":"2025-04-18T07:53:05.000Z","dependencies_parsed_at":"2022-11-03T21:02:19.340Z","dependency_job_id":null,"html_url":"https://github.com/pyexcel/pyexcel-htmlr","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-htmlr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-htmlr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-htmlr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pyexcel%2Fpyexcel-htmlr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pyexcel","download_url":"https://codeload.github.com/pyexcel/pyexcel-htmlr/tar.gz/refs/heads/dev","host":
{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251818231,"owners_count":21648858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html"],"created_at":"2024-11-12T07:25:05.200Z","updated_at":"2025-05-01T03:31:37.354Z","avatar_url":"https://github.com/pyexcel.png","language":"Python","readme":"================================================================================\npyexcel-htmlr - Let you focus on data, instead of html format\n================================================================================\n\n.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png\n   :target: https://www.patreon.com/chfw\n\n.. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg\n   :target: https://awesome-python.com/#specific-formats-processing\n\n.. image:: https://travis-ci.org/pyexcel/pyexcel-htmlr.svg?branch=master\n   :target: http://travis-ci.org/pyexcel/pyexcel-htmlr\n\n.. image:: https://codecov.io/gh/pyexcel/pyexcel-htmlr/branch/master/graph/badge.svg\n   :target: https://codecov.io/gh/pyexcel/pyexcel-htmlr\n\n.. image:: https://badge.fury.io/py/pyexcel-htmlr.svg\n   :target: https://pypi.org/project/pyexcel-htmlr\n\n\n.. image:: https://pepy.tech/badge/pyexcel-htmlr/month\n   :target: https://pepy.tech/project/pyexcel-htmlr/month\n\n\n.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg\n   :target: https://gitter.im/pyexcel/Lobby\n\n.. 
image:: https://readthedocs.org/projects/pyexcel-htmlr/badge/?version=latest\n   :target: http://pyexcel-htmlr.readthedocs.org/en/latest/\n\n\nSupport the project\n================================================================================\n\nIf your company has embedded pyexcel and its components into a revenue generating\nproduct, please support me on github, `patreon \u003chttps://www.patreon.com/bePatron?u=5537627\u003e`_\nor `bounty source \u003chttps://salt.bountysource.com/teams/chfw-pyexcel\u003e`_ to maintain\nthe project and develop it further.\n\nIf you are an individual, you are welcome to support me too, for as long as\nyou like. As my backer, you will receive\n`early access to pyexcel related contents \u003chttps://www.patreon.com/pyexcel/posts\u003e`_.\n\nYour issues will also get prioritized if you become my patron as a `pyexcel pro user`.\n\nWith your financial support, I will be able to invest\na little bit more time in coding, documentation and writing interesting posts.\n\n\nKnown constraints\n==================\n\nFonts, colors and charts are not supported.\n\nInstallation\n================================================================================\n\n\nYou can install pyexcel-htmlr via pip:\n\n.. code-block:: bash\n\n    $ pip install pyexcel-htmlr\n\n\nor clone it and install it:\n\n.. code-block:: bash\n\n    $ git clone https://github.com/pyexcel/pyexcel-htmlr.git\n    $ cd pyexcel-htmlr\n    $ python setup.py install\n\nUsage\n================================================================================\n\nAs a standalone library\n--------------------------------------------------------------------------------\n\n.. testcode::\n   :hide:\n\n    \u003e\u003e\u003e import os\n    \u003e\u003e\u003e import sys\n    \u003e\u003e\u003e if sys.version_info[0] \u003c 3:\n    ...     from StringIO import StringIO\n    ... else:\n    ...     
from io import BytesIO as StringIO\n    \u003e\u003e\u003e PY2 = sys.version_info[0] == 2\n    \u003e\u003e\u003e if PY2 and sys.version_info[1] \u003c 7:\n    ...      from ordereddict import OrderedDict\n    ... else:\n    ...     from collections import OrderedDict\n    \u003e\u003e\u003e import pyexcel as pe\n    \u003e\u003e\u003e book_data = {\"Sheet 1\": [[1, 2, 3], [4, 5, 6]], \"Sheet 2\": [[\"row 1\", \"row 2\", \"row 3\"]]}\n    \u003e\u003e\u003e pe.save_book_as(bookdict=book_data, dest_file_name=\"your_file.html\")\n\n\nRead from an html file\n********************************************************************************\n\nHere's the sample code:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from pyexcel_htmlr import get_data\n    \u003e\u003e\u003e data = get_data(\"your_file.html\")\n    \u003e\u003e\u003e import json\n    \u003e\u003e\u003e print(json.dumps(data))\n    {\"Table 1\": [[1, 2, 3], [4, 5, 6]], \"Table 2\": [[\"row 1\", \"row 2\", \"row 3\"]]}\n\n\n\n\nRead html from memory\n********************************************************************************\n\nContinuing from the previous example:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e # This is just an illustration\n    \u003e\u003e\u003e # In reality, you might deal with an html file upload,\n    \u003e\u003e\u003e # where you would read from request.FILES['YOUR_HTML_FILE']\n    \u003e\u003e\u003e with open('your_file.html', 'r') as html_file:\n    ...    io = StringIO(html_file.read().encode())\n    ...    data = get_data(io)\n    \u003e\u003e\u003e print(json.dumps(data))\n    {\"Table 1\": [[1, 2, 3], [4, 5, 6]], \"Table 2\": [[\"row 1\", \"row 2\", \"row 3\"]]}\n\nPagination feature\n********************************************************************************\n\n\n\nLet's assume the following data makes up a huge html file:\n\n.. code-block:: python\n\n   \u003e\u003e\u003e huge_data = [\n   ...     [1, 21, 31],\n   ...     [2, 22, 32],\n   ...     
[3, 23, 33],\n   ...     [4, 24, 34],\n   ...     [5, 25, 35],\n   ...     [6, 26, 36]\n   ... ]\n   \u003e\u003e\u003e sheetx = {\n   ...     \"Table 1\": huge_data\n   ... }\n   \u003e\u003e\u003e pe.save_book_as(dest_file_name=\"huge_file.html\", bookdict=sheetx)\n\nNow let's read only part of the data. Row indices start at 0, so `start_row=2, row_limit=3` returns the third, fourth and fifth rows:\n\n.. code-block:: python\n\n   \u003e\u003e\u003e partial_data = get_data(\"huge_file.html\", start_row=2, row_limit=3)\n   \u003e\u003e\u003e print(json.dumps(partial_data))\n   {\"Table 1\": [[3, 23, 33], [4, 24, 34], [5, 25, 35]]}\n\nYou can do the same for columns:\n\n.. code-block:: python\n\n   \u003e\u003e\u003e partial_data = get_data(\"huge_file.html\", start_column=1, column_limit=2)\n   \u003e\u003e\u003e print(json.dumps(partial_data))\n   {\"Table 1\": [[21, 31], [22, 32], [23, 33], [24, 34], [25, 35], [26, 36]]}\n\nNaturally, you can do both at the same time:\n\n.. code-block:: python\n\n   \u003e\u003e\u003e partial_data = get_data(\"huge_file.html\",\n   ...     start_row=2, row_limit=3,\n   ...     start_column=1, column_limit=2)\n   \u003e\u003e\u003e print(json.dumps(partial_data))\n   {\"Table 1\": [[23, 33], [24, 34], [25, 35]]}\n\n.. testcode::\n   :hide:\n\n   \u003e\u003e\u003e os.unlink(\"huge_file.html\")\n\n\nAs a pyexcel plugin\n--------------------------------------------------------------------------------\n\nSince pyexcel version 0.2.2, no explicit import is needed: this library is\nauto-loaded, so installing it is enough to read data in html format.\n\n\nReading from an html file\n********************************************************************************\n\nHere is the sample code:\n\n.. 
code-block:: python\n\n    \u003e\u003e\u003e import pyexcel as pe\n    \u003e\u003e\u003e book = pe.get_book(file_name=\"your_file.html\")\n    \u003e\u003e\u003e book\n    Table 1:\n    +---+---+---+\n    | 1 | 2 | 3 |\n    +---+---+---+\n    | 4 | 5 | 6 |\n    +---+---+---+\n    Table 2:\n    +-------+-------+-------+\n    | row 1 | row 2 | row 3 |\n    +-------+-------+-------+\n\n\n\n\nReading from an IO instance\n********************************************************************************\n\nPass the binary content via `file_content` and declare its format with `file_type`:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e # This is just an illustration\n    \u003e\u003e\u003e # In reality, you might deal with an html file upload,\n    \u003e\u003e\u003e # where you would read from request.FILES['YOUR_HTML_FILE']\n    \u003e\u003e\u003e htmlfile = \"your_file.html\"\n    \u003e\u003e\u003e with open(htmlfile, \"rb\") as f:\n    ...     content = f.read()\n    ...     r = pe.get_book(file_type=\"html\", file_content=content)\n    ...     print(r)\n    ...\n    Table 1:\n    +---+---+---+\n    | 1 | 2 | 3 |\n    +---+---+---+\n    | 4 | 5 | 6 |\n    +---+---+---+\n    Table 2:\n    +-------+-------+-------+\n    | row 1 | row 2 | row 3 |\n    +-------+-------+-------+\n\n\n\n\nLicense\n================================================================================\n\nNew BSD License\n\nDeveloper guide\n==================\n\nDevelopment steps for code changes\n\n#. git clone https://github.com/pyexcel/pyexcel-htmlr.git\n#. cd pyexcel-htmlr\n\nUpgrade setuptools and pip. They are needed for development and testing only:\n\n#. pip install --upgrade setuptools pip\n\nThen install the relevant development requirements:\n\n#. pip install -r rnd_requirements.txt # if such a file exists\n#. pip install -r requirements.txt\n#. 
pip install -r tests/requirements.txt\n\nOnce you have finished your changes, please provide test case(s) and relevant documentation,\nand update CHANGELOG.rst.\n\n.. note::\n\n    As to rnd_requirements.txt, it is usually created when a dependent\n    library has not been released yet. Once the dependency is released,\n    the future version of the dependency in requirements.txt will be valid.\n\n\nHow to test your contribution\n------------------------------\n\nAlthough `nose` and `doctest` are both used in code testing, it is advisable to put unit tests in the tests directory. `doctest` is incorporated only to make sure the code examples in the documentation remain valid across different development releases.\n\nOn Linux/Unix systems, please launch your tests like this::\n\n    $ make\n\nOn Windows systems, please issue this command::\n\n    \u003e test.bat\n\n\nBefore you commit\n------------------------------\n\nPlease run::\n\n    $ make format\n\nto format your code; otherwise travis-ci may fail your unit tests.\n\n\nAnd make sure you have run the moban command\n---------------------------------------------------------\n\nAdditional steps are required:\n\n#. pip install moban\n#. make your changes in the `.moban.d` directory\n#. run the `moban` command\n\nOtherwise, travis-ci may also fail your unit tests.\n\nWhat is .moban.d\n---------------------------------\n\n`.moban.d` stores the specific metadata for the library.\n\n\n.. 
testcode::\n   :hide:\n\n   \u003e\u003e\u003e import os\n   \u003e\u003e\u003e os.unlink(\"your_file.html\")\n","funding_links":["https://github.com/sponsors/chfw","https://patreon.com/chfw","https://www.patreon.com/chfw","https://www.patreon.com/bePatron?u=5537627","https://www.patreon.com/pyexcel/posts"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyexcel%2Fpyexcel-htmlr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpyexcel%2Fpyexcel-htmlr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpyexcel%2Fpyexcel-htmlr/lists"}