https://github.com/danieljdufour/table-extractor
Extract normalized tables from CSVs, Excel Spreadsheets, Word Docs, and Web Pages
csv doc docx excel tsv word-documents
- Host: GitHub
- URL: https://github.com/danieljdufour/table-extractor
- Owner: DanielJDufour
- License: apache-2.0
- Created: 2017-03-05T17:01:06.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-02-17T18:06:00.000Z (almost 7 years ago)
- Last Synced: 2024-12-06T07:35:59.033Z (2 months ago)
- Topics: csv, doc, docx, excel, tsv, word-documents
- Language: Python
- Homepage:
- Size: 31.3 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[![Build Status](https://travis-ci.org/DanielJDufour/table-extractor.svg?branch=master)](https://travis-ci.org/DanielJDufour/table-extractor)
# table-extractor
Extract normalized tables from CSVs, Excel Spreadsheets, Word Docs, and Web Pages

A table is basically a list of rows, and a row is basically a list of values.
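
As a rough illustration of that shape, the sketch below builds a table by hand as nested Python lists. The aliases `Row` and `Table` are assumptions made for illustration and are not names exported by the package; the sample values are borrowed from the usage example in this README.

```python
from typing import Any, List

# Hypothetical aliases, used here only to name the shapes described above.
Row = List[Any]      # a row is a list of values
Table = List[Row]    # a table is a list of rows

table: Table = [
    ["Name", "Rating"],
    ["The Shawshank Redemption", 9.2],
    ["The Godfather", 9.2],
]

header = table[0]           # first row
first_data_row = table[1]   # second row
n_rows = len(table)         # counts the header row too
n_cols = len(header)
```

Because everything is a plain list, ordinary indexing and slicing work without any special table type.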
# Installation
```
pip install table-extractor
```
# Use
```
from table_extractor import extract_tables
tables = extract_tables("/tmp/top_5_movies.docx")
# [[["Name", "Rating"], ["The Shawshank Redemption", 9.2], ["The Godfather", 9.2], ["The Godfather: Part II", 9.2], ["The Dark Knight", 8.9], ["12 Angry Men", 8.9]]]
```
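
The sample output is a list of tables, so the first table is `tables[0]`. The sketch below assumes the first row of a table is a header row, as in that output, and pairs it with each remaining row. The helper `rows_to_records` is hypothetical and not part of `table_extractor`; the `tables` literal is copied from the sample output so the snippet runs without the package or the input file.

```python
def rows_to_records(table):
    """Pair each data row with the header row (assumed to be table[0])."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


# Copied from the sample output above; normally this would come from extract_tables().
tables = [[
    ["Name", "Rating"],
    ["The Shawshank Redemption", 9.2],
    ["The Godfather", 9.2],
    ["The Godfather: Part II", 9.2],
    ["The Dark Knight", 8.9],
    ["12 Angry Men", 8.9],
]]

records = rows_to_records(tables[0])
top_rated = [r["Name"] for r in records if r["Rating"] >= 9.0]
```

Whether a header row is present depends on the source document, so treat the header assumption as something to check per file.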
# Testing
To test the package, run:
```
python3 -m unittest table_extractor.tests.test
```