Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sensiblecodeio/xypath
Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+
https://github.com/sensiblecodeio/xypath
csv excel python spreadsheet xls xlsx
Last synced: about 2 months ago
JSON representation
Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+
- Host: GitHub
- URL: https://github.com/sensiblecodeio/xypath
- Owner: sensiblecodeio
- License: bsd-2-clause
- Created: 2013-05-23T15:35:12.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2023-02-01T14:11:16.000Z (almost 2 years ago)
- Last Synced: 2024-08-06T03:03:23.576Z (5 months ago)
- Topics: csv, excel, python, spreadsheet, xls, xlsx
- Language: Python
- Homepage: https://sensiblecode.io
- Size: 3.5 MB
- Stars: 46
- Watchers: 11
- Forks: 7
- Open Issues: 8
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - sensiblecodeio/xypath - Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+ (Python)
README
## xypath
Select different parts of a spreadsheet via functions.
```
----------------------------------------------------------------------
| animal | habitat | max age | population 2012 | Population 2013 |
| zebra | grasslands | 40 | 20M | 21M |
| baboon | jungle | 45 | 35M | 40M |
| narwhal | ocean | 115 | 1M | 500,000 |
----------------------------------------------------------------------
| AnimalSheet |
* not strictly accurate
```## Example
Question: **What is the change in population for each type of animal between 2012 and 2013?**
First, we need to load the spreadsheet. Supposing we've got it on disk already.
```python
import xypath
table = xypath.Table.from_filename("animals.xls", table_name='AnimalSheet')
```Next we need to find the *row* and *column* headers. In this case the row headers are "zebra", "baboon" and "narwhal" and the column headers are "population 2012" and "Population 2013" (note the change in case!)
One way of getting the row headers would be to find "animal" and *fill down* to get the actual animals:
```python
animals = table.filter('animal').assert_one().fill(xypath.DOWN)
```Let's break that down. First we use the table ``filter`` method which finds one or more cells matching the given argument. If you simply provide a string like ``animal`` it will search for exact matches. ``filter`` returns a ``Bag`` which is a cell container with some useful properties.
``assert_one`` blows up if the Bag doesn't contain exactly one cell.
We then do ``fill(xypath.DOWN)`` which gets all the cells below and *excluding* the "animal" cell.
Next we need to find the column containing the year. For this we will use a regular expression.
```python
years = table.filter(re.compile('[Pp]opulation \d{4}'))
```The ``filter`` function recognises that it's been passed a regular expression and uses it to find matching cells in the table.
Now we've got two bags:
1. ``animal`` contains "zebra", "baboon" and "narwhal"
2. ``years`` contains "population 2012" and "Population 2013"Next we want to find the cells which line up with each of those animal and year cells.
```python
for (animal, year, population) = animals.junction(years):
print("{0} : {1} : {2}".format(
animal.value, year.value, population.value))
```This would print:
```
zebra : population in 2012 : 20M
baboon : population in 2012 : 35M
narwhal : population in 2012 : 1M
zebra : Population in 2013 : 21M
baboon : Population in 2013 : 40M
narwhal : Population in 2013 : 500,000
```You can ``junction`` two Bags of header cells together the intersection between each pair of cells. Junction always yields a three-long tuple, in this case containing ``(animal, year, population)``.
It's left up to the reader to extract the year from the population string and convert the millions to reasonable numbers!
## Loading tables
```python
table = xypath.Table.from_filename("mysheet.xls", table_name="Sheet 1")
```By using a file-object:
```python
with open('spreadsheet.xls') as f:
table = xypath.Table.from_file_object(f, table_index=0)
```Note that you need to specify which table in the spreadsheet ``table_name`` (relating to ie Excel sheet name) or ``table_index`` (where 0 is the first sheet found)
From a URL (via a StringIO file-like-object):
```python
import requests
from cStringIO import StringIOresponse = requests.get("http://example.io/mysheet.xls")
f = StringIO(response.content)
table = xypath.Table.from_file_object(f, table_name='Sheet1')
```Or using the underlying messytables library directly:
```python
xypath_tables = []
with open('spreadsheet.xls', 'rb') as f:
for messy_table in messytables.excel.XLSTableSet(f).tables:
xypath_table = xypath.Table.from_messy(messy_table)
```## Filtering
You can use the ``filter(..)`` method to get ``Bags`` of cells which match certain criteria.
```python
bag.filter("kitten") # cell.value is exactly 'kitten'
table.filter(re.compile(".a.*e")) # regular expression of cell.value
table.filter(hamcrest.ends_with("c")) # any pyhamcrest matcher on cell.value
```All the above matchers act exclusively on the *value* of the cell.
It's also possible to pass any callable to ``filter``. In this case, filter passes the ``cell`` to the callable
so it's possible to access properties such as ``cell.x`` and ``cell.value``:```python
def is_header_cell(cell):
return cell.x == 0 and cell.value.lower().startswith('population ')table.filter(is_header_cell)
table.filter(lambda cell: cell.x == 2) # explicit lambda function on each cell
```## Other Selection Methods
Select different cells in the table based on those currently in the bag
```python
def has_same_text(table_cell, bag_cell):
return table_cell.value == bag_cell.valuebag.select(has_same_text) # cells with same value
bag.shift(x=-2) # cells two to the left of the current cells
bag_b = bag_a.fill(xypath.LEFT) # cells to the LEFT (RIGHT, UP, DOWN, UP_RIGHT ...) excluding bag_a
```## Method Chaining
Chain methods together; everything returns a bag!
```python
dollars = table.filter("Amount").assert_one().fill(xypath.DOWN).filter(re.search("$"))
```## Singleton Bags
When a Bag contains only one cell (a *singleton* Bag), you can call ``bag.value`` and get the value of the single cell inside the bag. This is also true for ``cell.x``, ``cell.y`` etc.
Get the value (of a bag containing only one cell) or
```python
lonely_cell.value
```Get cells which are at the intersection of two other cells:
```
triplets = row_header_bag.junction(column_header_bag)
```------
## Running the tests
Set up a virtual environment, and install `requirements.txt`:
```shell
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
```Then run the tests using `nosetests`:
```shell
nosetests # runs all tests
nosetests test/test_bag.py # runs a single test
```