https://github.com/frictionlessdata/tableschema-pandas-py
Generate Pandas frames, load and extract data, based on JSON Table Schema descriptors.
- Host: GitHub
- URL: https://github.com/frictionlessdata/tableschema-pandas-py
- Owner: frictionlessdata
- License: lgpl-3.0
- Created: 2016-05-06T06:50:34.000Z (about 10 years ago)
- Default Branch: main
- Last Pushed: 2021-06-01T12:56:17.000Z (about 5 years ago)
- Last Synced: 2024-05-03T06:22:28.923Z (about 2 years ago)
- Language: Python
- Homepage:
- Size: 67.4 KB
- Stars: 51
- Watchers: 9
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# tableschema-pandas-py
[Travis](https://travis-ci.org/frictionlessdata/tableschema-pandas-py)
[Coveralls](https://coveralls.io/r/frictionlessdata/tableschema-pandas-py?branch=master)
[PyPi](https://pypi.python.org/pypi/tableschema-pandas)
[Github](https://github.com/frictionlessdata/tableschema-pandas-py)
[Gitter](https://gitter.im/frictionlessdata/chat)
Generate and load Pandas data frames based on [Table Schema](http://specs.frictionlessdata.io/table-schema/) descriptors.
## Features
- implements the `tableschema.Storage` interface
## Contents
- [Getting Started](#getting-started)
  - [Installation](#installation)
- [Documentation](#documentation)
- [API Reference](#api-reference)
  - [`Storage`](#storage)
- [Contributing](#contributing)
- [Changelog](#changelog)
## Getting Started
### Installation
The package uses semantic versioning, which means that major versions may include breaking changes. It is highly recommended to specify a `package` version range in your `setup/requirements` file, e.g. `package>=1.0,<2.0`.
```bash
$ pip install tableschema-pandas
```
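The range `package>=1.0,<2.0` recommended above can be read as a simple interval check. The sketch below is only an illustration: `in_supported_range` is a hypothetical helper, not part of the package, and it ignores the pre-release and post-release tags that pip handles under PEP 440.

```python
def in_supported_range(version, lower=(1, 0), upper=(2, 0)):
    """Return True if `version` satisfies lower <= version < upper.

    Hypothetical helper: compares only the major and minor numbers.
    """
    numbers = tuple(int(part) for part in version.split('.')[:2])
    numbers = (numbers + (0, 0))[:2]  # pad a bare "1" to (1, 0)
    return lower <= numbers < upper


print(in_supported_range('1.1.0'))  # True
print(in_supported_range('2.0.0'))  # False
```

Any `1.x` release falls inside the range, while the next major version, which may contain breaking changes, is excluded.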
## Documentation
```python
# pip install datapackage tableschema-pandas
from datapackage import Package

# Save to Pandas
package = Package('http://data.okfn.org/data/core/country-list/datapackage.json')
storage = package.save(storage='pandas')

print(type(storage['data']))
# <class 'pandas.core.frame.DataFrame'>

print(storage['data'].head())
#              Name Code
# 0     Afghanistan   AF
# 1   Åland Islands   AX
# 2         Albania   AL
# 3         Algeria   DZ
# 4  American Samoa   AS

# Load from Pandas
package = Package(storage=storage)
print(package.descriptor)
print(package.resources[0].read())
```
Storage works as a container for Pandas data frames. You can define a new data frame inside the storage using the `storage.create` method:
```python
>>> from tableschema_pandas import Storage
>>> storage = Storage()
```
```python
>>> storage.create('data', {
...     'primaryKey': 'id',
...     'fields': [
...         {'name': 'id', 'type': 'integer'},
...         {'name': 'comment', 'type': 'string'},
...     ]
... })
>>> storage.buckets
['data']
>>> storage['data'].shape
(0, 0)
```
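The descriptor passed to `storage.create` is a plain Table Schema dictionary, so it can be sanity-checked before a bucket is created. The following is a minimal sketch using only the standard library; `missing_key_fields` is a hypothetical helper, not part of `tableschema_pandas`, and it checks just one rule of the Table Schema specification: every name in `primaryKey` must be declared in `fields`.

```python
def missing_key_fields(descriptor):
    """Return primary-key names that are not declared in `fields`.

    Hypothetical helper for illustration only.
    """
    key = descriptor.get('primaryKey', [])
    # Table Schema allows either a single field name or a list of names.
    names = [key] if isinstance(key, str) else list(key)
    declared = {field['name'] for field in descriptor.get('fields', [])}
    return [name for name in names if name not in declared]


descriptor = {
    'primaryKey': 'id',
    'fields': [
        {'name': 'id', 'type': 'integer'},
        {'name': 'comment', 'type': 'string'},
    ],
}

print(missing_key_fields(descriptor))  # []
```

An empty list means every primary-key field is declared; a non-empty list names the fields that would need to be added to `fields`.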
Use `storage.write` to populate the data frame with data:
```python
>>> storage.write('data', [(1, 'a'), (2, 'b')])
>>> storage['data']
id comment
1        a
2        b
```
You can also use [tabulator](https://github.com/frictionlessdata/tabulator-py) to populate the data frame from an external data file. As you can see, subsequent writes simply append new data to the existing rows:
```python
>>> import tabulator
>>> with tabulator.Stream('data/comments.csv', headers=1) as stream:
...     storage.write('data', stream)
>>> storage['data']
id comment
1        a
2        b
1     good
```
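In plain pandas terms, the append behaviour shown above resembles concatenating a new frame below the existing one. The sketch below assumes this analogy and is not the library's actual implementation; `append_rows` is a hypothetical helper. Note that, as in the output above, a repeated `id` is kept rather than replaced.

```python
import pandas as pd


def append_rows(frame, rows, columns, key):
    """Return a new frame with `rows` appended below the existing ones.

    Hypothetical helper: builds a frame from the rows, indexes it by
    `key`, and concatenates it after `frame` without deduplicating.
    """
    new = pd.DataFrame(rows, columns=columns).set_index(key)
    return pd.concat([frame, new])


columns = ['id', 'comment']
frame = pd.DataFrame([(1, 'a'), (2, 'b')], columns=columns).set_index('id')
frame = append_rows(frame, [(1, 'good')], columns, 'id')

print(frame)
```

The resulting index is `[1, 2, 1]`: the second write adds a row and leaves the earlier row with `id` 1 untouched.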
## API Reference
### `Storage`
```python
Storage(self, dataframes=None)
```
Pandas storage.
The package implements the
[Tabular Storage](https://github.com/frictionlessdata/tableschema-py#storage)
interface (see the full documentation at the link):

> Only additional API is documented
__Arguments__
- __dataframes (object[])__: list of storage dataframes
## Contributing
> The project follows the [Open Knowledge International coding standards](https://github.com/okfn/coding-standards).
The recommended way to get started is to create and activate a project virtual environment.
To install the package and development dependencies into the active environment:
```bash
$ make install
```
To run tests with linting and coverage:
```bash
$ make test
```
## Changelog
Only breaking and the most important changes are described here. The full changelog and documentation for all released versions can be found in the nicely formatted [commit history](https://github.com/frictionlessdata/tableschema-pandas-py/commits/master).
#### v1.1
- Added support for composite primary keys (loading to pandas)
#### v1.0
- Initial driver implementation