https://github.com/thombashi/pytablereader
A Python library to load structured table data from files/strings/URLs in various data formats: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.
- Host: GitHub
- URL: https://github.com/thombashi/pytablereader
- Owner: thombashi
- License: mit
- Created: 2016-10-23T15:45:24.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-06-25T04:15:35.000Z (almost 2 years ago)
- Last Synced: 2024-05-17T00:01:48.830Z (11 months ago)
- Topics: csv, excel, google-sheets, html, json, ltsv, markdown, mediawiki, pandas, pandas-dataframe, python-library, reader, sqlite, table, tsv
- Language: Python
- Homepage: https://pytablereader.rtfd.io/
- Size: 958 KB
- Stars: 105
- Watchers: 9
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- Funding: .github/FUNDING.yml
- License: LICENSE
README
.. contents:: **pytablereader**
    :backlinks: top
    :depth: 2

Summary
=========
`pytablereader <https://github.com/thombashi/pytablereader>`__ is a Python library to load structured table data from files/strings/URLs in various data formats: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.

.. image:: https://badge.fury.io/py/pytablereader.svg
    :target: https://badge.fury.io/py/pytablereader
    :alt: PyPI package version

.. image:: https://img.shields.io/pypi/pyversions/pytablereader.svg
    :target: https://pypi.org/project/pytablereader
    :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/implementation/pytablereader.svg
    :target: https://pypi.org/project/pytablereader
    :alt: Supported Python implementations

.. image:: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml/badge.svg
    :target: https://github.com/thombashi/pytablereader/actions/workflows/lint_and_test.yml
    :alt: CI status of Linux/macOS/Windows

.. image:: https://coveralls.io/repos/github/thombashi/pytablereader/badge.svg?branch=master
    :target: https://coveralls.io/github/thombashi/pytablereader?branch=master
    :alt: Test coverage

.. image:: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql/badge.svg
    :target: https://github.com/thombashi/pytablereader/actions/workflows/github-code-scanning/codeql
    :alt: CodeQL

Features
--------
- Extract structured tabular data from various data formats:
    - CSV / Tab separated values (TSV) / Space separated values (SSV)
    - Microsoft Excel :superscript:`TM` file
    - Google Sheets
    - HTML (``table`` tags)
    - JSON
    - Labeled Tab-separated Values (LTSV)
    - Line-delimited JSON (LDJSON) / NDJSON / JSON Lines
    - Markdown
    - MediaWiki
    - SQLite database file
- Supported data sources are:
    - Files on a local file system
    - Accessible URLs
    - ``str`` instances
- Loaded table data can be used as:
    - ``pandas.DataFrame`` instance
    - ``dict`` instance

Examples
==========
Load a CSV table
------------------
:Sample Code:
    .. code-block:: python

        import pytablereader as ptr
        import pytablewriter as ptw


        # prepare data ---
        file_path = "sample_data.csv"
        csv_text = "\n".join([
            '"attr_a","attr_b","attr_c"',
            '1,4,"a"',
            '2,2.1,"bb"',
            '3,120.9,"ccc"',
        ])

        with open(file_path, "w") as f:
            f.write(csv_text)

        # load from a csv file ---
        loader = ptr.CsvTableFileLoader(file_path)
        for table_data in loader.load():
            print("\n".join([
                "load from file",
                "==============",
                "{:s}".format(ptw.dumps_tabledata(table_data)),
            ]))

        # load from a csv text ---
        loader = ptr.CsvTableTextLoader(csv_text)
        for table_data in loader.load():
            print("\n".join([
                "load from text",
                "==============",
                "{:s}".format(ptw.dumps_tabledata(table_data)),
            ]))

:Output:
    .. code-block::

        load from file
        ==============
        .. table:: sample_data

            ======  ======  ======
            attr_a  attr_b  attr_c
            ======  ======  ======
                 1     4.0  a
                 2     2.1  bb
                 3   120.9  ccc
            ======  ======  ======

        load from text
        ==============
        .. table:: csv2

            ======  ======  ======
            attr_a  attr_b  attr_c
            ======  ======  ======
                 1     4.0  a
                 2     2.1  bb
                 3   120.9  ccc
            ======  ======  ======

Get loaded table data as pandas.DataFrame instance
----------------------------------------------------

:Sample Code:
    .. code-block:: python

        import pytablereader as ptr

        loader = ptr.CsvTableTextLoader(
            "\n".join([
                "a,b",
                "1,2",
                "3.3,4.4",
            ]))
        for table_data in loader.load():
            print(table_data.as_dataframe())

:Output:
    .. code-block::

             a    b
        0    1    2
        1  3.3  4.4

For more information
----------------------
More examples are available at
https://pytablereader.rtfd.io/en/latest/pages/examples/index.html

Installation
============

Install from PyPI
------------------------------
::

    pip install pytablereader
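
After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; ``installed_version`` is a hypothetical helper name, not part of pytablereader, and ``importlib.metadata`` requires Python 3.8 or later:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pytablereader")
    if version is None:
        print("pytablereader is not installed in this environment")
    else:
        print(f"pytablereader {version} is installed")
```

Running the script with the same interpreter that ``pip`` installed into avoids confusion when several Python environments coexist.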

Some of the formats require additional dependency packages. You can install them as follows:

- Excel
    - ``pip install pytablereader[excel]``
- Google Sheets
    - ``pip install pytablereader[gs]``
- Markdown
    - ``pip install pytablereader[md]``
- Mediawiki
    - ``pip install pytablereader[mediawiki]``
- SQLite
    - ``pip install pytablereader[sqlite]``
- Load from URLs
    - ``pip install pytablereader[url]``
- All of the extra dependencies
    - ``pip install pytablereader[all]``

Install from PPA (for Ubuntu)
------------------------------
::

    sudo add-apt-repository ppa:thombashi/ppa
    sudo apt update
    sudo apt install python3-pytablereader

Dependencies
============
- Python 3.7+
- Python package dependencies (automatically installed)

Optional Python packages
------------------------------------------------
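Because the packages in this section are optional, code that relies on a particular format may want to check for them up front. The following is a rough sketch using only the standard library. The mapping from extras names to importable module names (``OPTIONAL_MODULES``) and the helper ``available_extras`` are assumptions made for illustration and are not part of pytablereader:

```python
from importlib.util import find_spec
from typing import Dict

# Assumed mapping: extras name -> importable top-level module name.
OPTIONAL_MODULES = {
    "logging": "loguru",
    "excel": "excelrd",
    "md": "markdown",
    "mediawiki": "pypandoc",
    "sqlite": "simplesqlite",
    "url": "retryrequests",
}


def available_extras() -> Dict[str, bool]:
    """Report, per extras name, whether its module can be found without importing it."""
    return {
        extra: find_spec(module_name) is not None
        for extra, module_name in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for extra, present in available_extras().items():
        status = "available" if present else "missing"
        print(f"{extra}: {status}")
```

``find_spec`` only locates a module and does not import it, so the check stays cheap even when many optional packages are installed.
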
- ``logging`` extras
    - loguru: used for logging if the package is installed
- ``excel`` extras
    - excelrd
- ``md`` extras
    - Markdown
- ``mediawiki`` extras
    - pypandoc
- ``sqlite`` extras
    - SimpleSQLite
- ``url`` extras
    - retryrequests
- pandas
    - required to get table data as a pandas data frame
- lxml

Optional packages (other than Python packages)
------------------------------------------------
- ``libxml2`` (faster HTML conversion)
- pandoc (required when loading MediaWiki files)

Documentation
===============
https://pytablereader.rtfd.io/

Related Project
=================
- pytablewriter
    - Tabular data loaded by ``pytablereader`` can be written in another tabular data format with ``pytablewriter``.

Sponsors
====================================
.. image:: https://avatars.githubusercontent.com/u/44389260?s=48&u=6da7176e51ae2654bcfd22564772ef8a3bb22318&v=4
    :target: https://github.com/chasbecker
    :alt: Charles Becker (chasbecker)

.. image:: https://avatars.githubusercontent.com/u/46711571?s=48&u=57687c0e02d5d6e8eeaf9177f7b7af4c9f275eb5&v=4
    :target: https://github.com/Arturi0
    :alt: onetime: Arturi0

.. image:: https://avatars.githubusercontent.com/u/3658062?s=48&v=4
    :target: https://github.com/b4tman
    :alt: onetime: Dmitry Belyaev (b4tman)

Become a sponsor