{"id":15017688,"url":"https://github.com/thombashi/pytablereader","last_synced_at":"2025-04-07T17:09:11.772Z","repository":{"id":62583868,"uuid":"71712658","full_name":"thombashi/pytablereader","owner":"thombashi","description":"A Python library to load structured table data from files/strings/URL with various data format: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.","archived":false,"fork":false,"pushed_at":"2023-06-25T04:15:35.000Z","size":981,"stargazers_count":105,"open_issues_count":0,"forks_count":10,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-05-17T00:01:48.830Z","etag":null,"topics":["csv","excel","google-sheets","html","json","ltsv","markdown","mediawiki","pandas","pandas-dataframe","python-library","reader","sqlite","table","tsv"],"latest_commit_sha":null,"homepage":"https://pytablereader.rtfd.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thombashi.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":"thombashi","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2016-10-23T15:45:24.000Z","updated_at":"2024-05-17T00:01:48.831Z","dependencies_parsed_at":"2022-11-03T21:58:01.406Z","dependency_job_id":null,"html_url":"https://github.com/thombashi/pytablereader","commit_stats":{"total_commits":1078,"total_committers":4,"mean_commits":269.5,"dds":"0.39053803339517623","last_synced_commit":"b2a6a3db3ef52f5db942340ae75a6905df64a960"},"previous_names":[],"tags_count":110,"temp
late":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thombashi%2Fpytablereader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thombashi%2Fpytablereader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thombashi%2Fpytablereader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thombashi%2Fpytablereader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thombashi","download_url":"https://codeload.github.com/thombashi/pytablereader/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247365299,"owners_count":20927301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","excel","google-sheets","html","json","ltsv","markdown","mediawiki","pandas","pandas-dataframe","python-library","reader","sqlite","table","tsv"],"created_at":"2024-09-24T19:50:51.619Z","updated_at":"2025-04-07T17:09:11.749Z","avatar_url":"https://github.com/thombashi.png","language":"Python","readme":".. contents:: **pytablereader**\n   :backlinks: top\n   :depth: 2\n\nSummary\n=========\n`pytablereader \u003chttps://github.com/thombashi/pytablereader\u003e`__ is a Python library to load structured table data from files/strings/URL with various data format: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.\n\n.. 
image:: https://badge.fury.io/py/pytablereader.svg\n    :target: https://badge.fury.io/py/pytablereader\n    :alt: PyPI package version\n\n.. image:: https://img.shields.io/pypi/pyversions/pytablereader.svg\n    :target: https://pypi.org/project/pytablereader\n    :alt: Supported Python versions\n\n.. image:: https://img.shields.io/pypi/implementation/pytablereader.svg\n    :target: https://pypi.org/project/pytablereader\n    :alt: Supported Python implementations\n\n.. image:: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml/badge.svg\n    :target: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml\n    :alt: CI status of Linux/macOS/Windows\n\n.. image:: https://coveralls.io/repos/github/thombashi/pytablereader/badge.svg?branch=master\n    :target: https://coveralls.io/github/thombashi/pytablereader?branch=master\n    :alt: Test coverage\n\n.. image:: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql/badge.svg\n    :target: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql\n    :alt: CodeQL\n\nFeatures\n--------\n- Extract structured tabular data from various data format:\n    - CSV / Tab separated values (TSV) / Space separated values (SSV)\n    - Microsoft Excel :superscript:`TM` file\n    - `Google Sheets \u003chttps://www.google.com/intl/en_us/sheets/about/\u003e`_\n    - HTML (``table`` tags)\n    - JSON\n    - `Labeled Tab-separated Values (LTSV) \u003chttp://ltsv.org/\u003e`__\n    - `Line-delimited JSON(LDJSON) \u003chttps://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON\u003e`__ / NDJSON / JSON Lines\n    - Markdown\n    - MediaWiki\n    - SQLite database file\n- Supported data sources are:\n    - Files on a local file system\n    - Accessible URLs\n    - ``str`` instances\n- Loaded table data can be used as:\n    - `pandas.DataFrame 
\u003chttps://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html\u003e`__ instance\n    - ``dict`` instance\n\nExamples\n==========\nLoad a CSV table\n------------------\n:Sample Code:\n    .. code-block:: python\n\n        import pytablereader as ptr\n        import pytablewriter as ptw\n\n\n        # prepare data ---\n        file_path = \"sample_data.csv\"\n        csv_text = \"\\n\".join([\n            '\"attr_a\",\"attr_b\",\"attr_c\"',\n            '1,4,\"a\"',\n            '2,2.1,\"bb\"',\n            '3,120.9,\"ccc\"',\n        ])\n\n        with open(file_path, \"w\") as f:\n            f.write(csv_text)\n\n        # load from a csv file ---\n        loader = ptr.CsvTableFileLoader(file_path)\n        for table_data in loader.load():\n            print(\"\\n\".join([\n                \"load from file\",\n                \"==============\",\n                \"{:s}\".format(ptw.dumps_tabledata(table_data)),\n            ]))\n\n        # load from a csv text ---\n        loader = ptr.CsvTableTextLoader(csv_text)\n        for table_data in loader.load():\n            print(\"\\n\".join([\n                \"load from text\",\n                \"==============\",\n                \"{:s}\".format(ptw.dumps_tabledata(table_data)),\n            ]))\n\n\n:Output:\n    .. code-block::\n\n        load from file\n        ==============\n        .. table:: sample_data\n\n            ======  ======  ======\n            attr_a  attr_b  attr_c\n            ======  ======  ======\n                 1     4.0  a\n                 2     2.1  bb\n                 3   120.9  ccc\n            ======  ======  ======\n\n        load from text\n        ==============\n        .. 
table:: csv2\n\n            ======  ======  ======\n            attr_a  attr_b  attr_c\n            ======  ======  ======\n                 1     4.0  a\n                 2     2.1  bb\n                 3   120.9  ccc\n            ======  ======  ======\n\nGet loaded table data as pandas.DataFrame instance\n----------------------------------------------------\n\n:Sample Code:\n    .. code-block:: python\n\n        import pytablereader as ptr\n\n        loader = ptr.CsvTableTextLoader(\n            \"\\n\".join([\n                \"a,b\",\n                \"1,2\",\n                \"3.3,4.4\",\n            ]))\n        for table_data in loader.load():\n            print(table_data.as_dataframe())\n\n:Output:\n    .. code-block::\n\n             a    b\n        0    1    2\n        1  3.3  4.4\n\nFor more information\n----------------------\nMore examples are available at \nhttps://pytablereader.rtfd.io/en/latest/pages/examples/index.html\n\nInstallation\n============\n\nInstall from PyPI\n------------------------------\n::\n\n    pip install pytablereader\n\nSome of the formats require additional dependency packages, which you can install as follows:\n\n- Excel\n    - ``pip install pytablereader[excel]``\n- Google Sheets\n    - ``pip install pytablereader[gs]``\n- Markdown\n    - ``pip install pytablereader[md]``\n- MediaWiki\n    - ``pip install pytablereader[mediawiki]``\n- SQLite\n    - ``pip install pytablereader[sqlite]``\n- Load from URLs\n    - ``pip install pytablereader[url]``\n- All of the extra dependencies\n    - ``pip install pytablereader[all]``\n\nInstall from PPA (for Ubuntu)\n------------------------------\n::\n\n    sudo add-apt-repository ppa:thombashi/ppa\n    sudo apt update\n    sudo apt install python3-pytablereader\n\n\nDependencies\n============\n- Python 3.7+\n- `Python package dependencies (automatically installed) \u003chttps://github.com/thombashi/pytablereader/network/dependencies\u003e`__\n\n\nOptional Python 
packages\n------------------------------------------------\n- ``logging`` extras\n    - `loguru \u003chttps://github.com/Delgan/loguru\u003e`__: used for logging if the package is installed\n- ``excel`` extras\n    - `excelrd \u003chttps://github.com/thombashi/excelrd\u003e`__\n- ``md`` extras\n    - `Markdown \u003chttps://github.com/Python-Markdown/markdown\u003e`__\n- ``mediawiki`` extras\n    - `pypandoc \u003chttps://github.com/bebraw/pypandoc\u003e`__\n- ``sqlite`` extras\n    - `SimpleSQLite \u003chttps://github.com/thombashi/SimpleSQLite\u003e`__\n- ``url`` extras\n    - `retryrequests \u003chttps://github.com/thombashi/retryrequests\u003e`__\n- `pandas \u003chttps://pandas.pydata.org/\u003e`__\n    - required to get table data as a pandas data frame\n- `lxml \u003chttps://lxml.de/installation.html\u003e`__\n\nOptional packages (other than Python packages)\n------------------------------------------------\n- ``libxml2`` (faster HTML conversion)\n- `pandoc \u003chttps://pandoc.org/\u003e`__ (required when loading a MediaWiki file)\n\nDocumentation\n===============\nhttps://pytablereader.rtfd.io/\n\nRelated Project\n=================\n- `pytablewriter \u003chttps://github.com/thombashi/pytablewriter\u003e`__\n    - Tabular data loaded by ``pytablereader`` can be written to another tabular data format with ``pytablewriter``.\n\nSponsors\n====================================\n.. image:: https://avatars.githubusercontent.com/u/44389260?s=48\u0026u=6da7176e51ae2654bcfd22564772ef8a3bb22318\u0026v=4\n   :target: https://github.com/chasbecker\n   :alt: Charles Becker (chasbecker)\n.. image:: https://avatars.githubusercontent.com/u/46711571?s=48\u0026u=57687c0e02d5d6e8eeaf9177f7b7af4c9f275eb5\u0026v=4\n   :target: https://github.com/Arturi0\n   :alt: onetime: Arturi0\n.. 
image:: https://avatars.githubusercontent.com/u/3658062?s=48\u0026v=4\n   :target: https://github.com/b4tman\n   :alt: onetime: Dmitry Belyaev (b4tman)\n\n`Become a sponsor \u003chttps://github.com/sponsors/thombashi\u003e`__\n\n","funding_links":["https://github.com/sponsors/thombashi"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthombashi%2Fpytablereader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthombashi%2Fpytablereader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthombashi%2Fpytablereader/lists"}