{"id":13857237,"url":"https://github.com/cantabular/xypath","last_synced_at":"2026-01-21T13:30:51.269Z","repository":{"id":8607705,"uuid":"10247036","full_name":"sensiblecodeio/xypath","owner":"sensiblecodeio","description":"Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+","archived":false,"fork":false,"pushed_at":"2023-02-01T14:11:16.000Z","size":3672,"stargazers_count":46,"open_issues_count":8,"forks_count":7,"subscribers_count":11,"default_branch":"master","last_synced_at":"2024-08-06T03:03:23.576Z","etag":null,"topics":["csv","excel","python","spreadsheet","xls","xlsx"],"latest_commit_sha":null,"homepage":"https://sensiblecode.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sensiblecodeio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-05-23T15:35:12.000Z","updated_at":"2024-01-29T23:54:04.000Z","dependencies_parsed_at":"2023-02-17T06:15:56.220Z","dependency_job_id":null,"html_url":"https://github.com/sensiblecodeio/xypath","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sensiblecodeio%2Fxypath","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sensiblecodeio%2Fxypath/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sensiblecodeio%2Fxypath/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sensiblecodeio%2Fxypath/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sensiblecodeio","download_url":"https://codeload.git
hub.com/sensiblecodeio/xypath/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225920238,"owners_count":17545450,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","excel","python","spreadsheet","xls","xlsx"],"created_at":"2024-08-05T03:01:31.058Z","updated_at":"2025-07-13T21:32:07.363Z","avatar_url":"https://github.com/sensiblecodeio.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"## xypath\n\nSelect different parts of a spreadsheet via functions.\n\n```\n----------------------------------------------------------------------\n| animal  | habitat    | max age | population 2012 | Population 2013 |\n| zebra   | grasslands | 40      | 20M             | 21M             |\n| baboon  | jungle     | 45      | 35M             | 40M             |\n| narwhal | ocean      | 115     | 1M              | 500,000         |\n----------------------------------------------------------------------\n| AnimalSheet |\n                                               * not strictly accurate\n```\n\n## Example\n\nQuestion: **What is the change in population for each type of animal between 2012 and 2013?**\n\nFirst, we need to load the spreadsheet. Supposing we've got it on disk already.\n\n```python\nimport xypath\ntable = xypath.Table.from_filename(\"animals.xls\", table_name='AnimalSheet')\n```\n\nNext we need to find the *row* and *column* headers. 
In this case the row headers are \"zebra\", \"baboon\" and \"narwhal\" and the column headers are \"population 2012\" and \"Population 2013\" (note the change in case!)\n\nOne way of getting the row headers would be to find \"animal\" and *fill down* to get the actual animals:\n\n```python\nanimals = table.filter('animal').assert_one().fill(xypath.DOWN)\n```\n\nLet's break that down. First we use the table's ``filter`` method, which finds the cells matching the given argument. If you simply provide a string like ``animal`` it will search for exact matches. ``filter`` returns a ``Bag``, which is a cell container with some useful properties.\n\n``assert_one`` blows up if the Bag doesn't contain exactly one cell.\n\nWe then do ``fill(xypath.DOWN)``, which gets all the cells below the \"animal\" cell, *excluding* the \"animal\" cell itself.\n\nNext we need to find the column headers containing the year. For this we will use a regular expression.\n```python\nimport re\n\nyears = table.filter(re.compile(r'[Pp]opulation \\d{4}'))\n```\n\nThe ``filter`` function recognises that it's been passed a regular expression and uses it to find matching cells in the table.\n\nNow we've got two bags:\n\n1. ``animals`` contains \"zebra\", \"baboon\" and \"narwhal\"\n2. ``years`` contains \"population 2012\" and \"Population 2013\"\n\nNext we want to find the cells which line up with each of those animal and year cells.\n\n```python\nfor (animal, year, population) in animals.junction(years):\n    print(\"{0} : {1} : {2}\".format(\n        animal.value, year.value, population.value))\n```\n\nThis would print:\n```\nzebra : population 2012 : 20M\nbaboon : population 2012 : 35M\nnarwhal : population 2012 : 1M\nzebra : Population 2013 : 21M\nbaboon : Population 2013 : 40M\nnarwhal : Population 2013 : 500,000\n```\n\nYou can ``junction`` two Bags of header cells together to get the intersection between each pair of cells. 
Junction always yields a three-element tuple, in this case containing ``(animal, year, population)``.\n\nIt's left up to the reader to extract the year from the population string and convert the millions to reasonable numbers!\n\n## Loading tables\n\nFrom a file on disk:\n\n```python\ntable = xypath.Table.from_filename(\"mysheet.xls\", table_name=\"Sheet 1\")\n```\n\nBy using a file-object:\n\n```python\n# open in binary mode: .xls files are not text\nwith open('spreadsheet.xls', 'rb') as f:\n    table = xypath.Table.from_file_object(f, table_index=0)\n```\n\nNote that you need to specify which table in the spreadsheet to load, using either ``table_name`` (e.g. the Excel sheet name) or ``table_index`` (where 0 is the first sheet found).\n\n\nFrom a URL (via a BytesIO file-like object):\n\n```python\nimport requests\nfrom io import BytesIO\n\nresponse = requests.get(\"http://example.io/mysheet.xls\")\nf = BytesIO(response.content)  # response.content is bytes\ntable = xypath.Table.from_file_object(f, table_name='Sheet1')\n```\n\n\nOr using the underlying messytables library directly:\n\n```python\nimport messytables\n\nxypath_tables = []\nwith open('spreadsheet.xls', 'rb') as f:\n    for messy_table in messytables.excel.XLSTableSet(f).tables:\n        xypath_table = xypath.Table.from_messy(messy_table)\n        xypath_tables.append(xypath_table)  # collect one xypath table per sheet\n```\n\n## Filtering\n\nYou can use the ``filter(..)`` method to get ``Bags`` of cells which match certain criteria.\n\n```python\nbag.filter(\"kitten\")                     # cell.value is exactly 'kitten'\ntable.filter(re.compile(\".a.*e\"))        # regular expression of cell.value\ntable.filter(hamcrest.ends_with(\"c\"))    # any pyhamcrest matcher on cell.value\n```\n\nAll the above matchers act exclusively on the *value* of the cell.\n\nIt's also possible to pass any callable to ``filter``. 
In this case, ``filter`` passes the ``cell`` to the callable\nso it's possible to access properties such as ``cell.x`` and ``cell.value``:\n\n```python\ndef is_header_cell(cell):\n    return cell.x == 0 and cell.value.lower().startswith('population ')\n\ntable.filter(is_header_cell)\n\ntable.filter(lambda cell: cell.x == 2)   # explicit lambda function on each cell\n```\n\n## Other Selection Methods\n\nSelect different cells in the table based on those currently in the bag:\n```python\ndef has_same_text(table_cell, bag_cell):\n    return table_cell.value == bag_cell.value\n\nbag.select(has_same_text)  # cells with same value\nbag.shift(x=-2)            # cells two to the left of the current cells\nbag_b = bag_a.fill(xypath.LEFT)      # cells to the LEFT (RIGHT, UP, DOWN, UP_RIGHT ...) excluding bag_a\n```\n\n## Method Chaining\n\nChain methods together; everything returns a bag!\n```python\ndollars = table.filter(\"Amount\").assert_one().fill(xypath.DOWN).filter(re.compile(r\"\\$\"))\n```\n\n## Singleton Bags\n\nWhen a Bag contains only one cell (a *singleton* Bag), you can call ``bag.value`` to get the value of the single cell inside the bag. The same works for ``bag.x``, ``bag.y`` etc.\n\nGet the value of a bag containing only one cell:\n```python\nlonely_cell.value\n```\n\nGet cells which are at the intersection of two bags of header cells:\n```python\ntriplets = row_header_bag.junction(column_header_bag)\n```\n\n\n------\n\n## Running the tests\n\nSet up a virtual environment, and install `requirements.txt`:\n\n```shell\nvirtualenv venv\n. 
venv/bin/activate\npip install -r requirements.txt\n```\n\nThen run the tests using `nosetests`:\n\n```shell\nnosetests # runs all tests\nnosetests test/test_bag.py # runs a single test\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcantabular%2Fxypath","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcantabular%2Fxypath","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcantabular%2Fxypath/lists"}