Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aliyun/aliyun-odps-python-sdk
ODPS Python SDK and data analysis framework
- Host: GitHub
- URL: https://github.com/aliyun/aliyun-odps-python-sdk
- Owner: aliyun
- License: apache-2.0
- Created: 2015-10-30T07:07:59.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2024-03-27T11:12:51.000Z (8 months ago)
- Last Synced: 2024-04-14T07:49:31.444Z (7 months ago)
- Language: Python
- Homepage: http://pyodps.readthedocs.io
- Size: 6.2 MB
- Stars: 405
- Watchers: 52
- Forks: 91
- Open Issues: 14
Metadata Files:
- Readme: README.md
- License: License
# ODPS Python SDK and data analysis framework
[![PyPI version](https://img.shields.io/pypi/v/pyodps.svg?style=flat-square)](https://pypi.python.org/pypi/pyodps) [![Docs](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat-square)](http://pyodps.readthedocs.org/) [![License](https://img.shields.io/pypi/l/pyodps.svg?style=flat-square)](https://github.com/aliyun/aliyun-odps-python-sdk/blob/master/License) ![Implementation](https://img.shields.io/pypi/implementation/pyodps.svg?style=flat-square)

-----------------

An elegant way to access the ODPS API. [Documentation](http://pyodps.readthedocs.org/)

## Installation
The quick way:
```
pip install pyodps[full]
```

If you don't need Jupyter, just run

```
pip install pyodps
```

The dependencies will be installed automatically.

Or install from source (not recommended for production use):

```shell
$ virtualenv pyodps_env
$ source pyodps_env/bin/activate
$ pip install git+https://github.com/aliyun/aliyun-odps-python-sdk.git
```

## Dependencies

* Python (>=2.7), including Python 3+ and PyPy; Python 3.7 is recommended
* setuptools (>=3.0)

## Run Tests

- Install pytest.
- Copy `conf/test.conf.template` to `odps/tests/test.conf` and fill in your account details.
- Run `pytest odps`.

## Usage

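The session in this section reads the Access Key pair from environment variables. A minimal sketch of a helper that fails early when either variable is missing is given here. `credentials_from_env` and `connect` are hypothetical names, not part of PyODPS; only the `ODPS(...)` call mirrors the usage shown in this README.

```python
import os


def credentials_from_env(environ=None):
    """Return (access_id, access_secret) or raise if either is unset.

    Hypothetical helper, not part of PyODPS.
    """
    environ = os.environ if environ is None else environ
    names = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return environ[names[0]], environ[names[1]]


def connect(project, endpoint, environ=None):
    """Build an ODPS entry object the same way the README session does."""
    # Imported lazily so credentials_from_env works without PyODPS installed.
    from odps import ODPS

    access_id, access_secret = credentials_from_env(environ)
    return ODPS(access_id, access_secret, project=project, endpoint=endpoint)
```

Checking the variables up front gives a clear error message instead of a failed request later on.
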
```python
>>> import os
>>> from odps import ODPS
>>> # Make sure the environment variable ALIBABA_CLOUD_ACCESS_KEY_ID is set to the user's Access Key ID
>>> # and ALIBABA_CLOUD_ACCESS_KEY_SECRET is set to the user's Access Key Secret.
>>> # Hardcoding the Access Key ID or Access Key Secret in your code is not recommended.
>>> o = ODPS(
...     os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
...     os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
...     project='**your-project**',
...     endpoint='**your-endpoint**',
... )
>>> dual = o.get_table('dual')
>>> dual.name
'dual'
>>> dual.table_schema
odps.Schema {
  c_int_a                 bigint
  c_int_b                 bigint
  c_double_a              double
  c_double_b              double
  c_string_a              string
  c_string_b              string
  c_bool_a                boolean
  c_bool_b                boolean
  c_datetime_a            datetime
  c_datetime_b            datetime
}
>>> dual.creation_time
datetime.datetime(2014, 6, 6, 13, 28, 24)
>>> dual.is_virtual_view
False
>>> dual.size
448
>>> dual.table_schema.columns
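[<column c_int_a, type bigint>,
 <column c_int_b, type bigint>,
 <column c_double_a, type double>,
 <column c_double_b, type double>,
 <column c_string_a, type string>,
 <column c_string_b, type string>,
 <column c_bool_a, type boolean>,
 <column c_bool_b, type boolean>,
 <column c_datetime_a, type datetime>,
 <column c_datetime_b, type datetime>]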
```

## DataFrame API

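For readers new to the expression syntax, the filter-then-project query used later in this section, `df[df.sepalwidth > 3]['name', 'sepalwidth']`, corresponds roughly to the plain-Python sketch here. It does not use PyODPS; the sample `rows` and the `filter_and_project` helper are illustrative assumptions.

```python
# Illustrative sample rows; the values are copied from the iris output in this README.
rows = [
    {"name": "Iris-setosa", "sepalwidth": 3.5, "sepallength": 5.1},
    {"name": "Iris-setosa", "sepalwidth": 3.0, "sepallength": 4.9},
    {"name": "Iris-setosa", "sepalwidth": 3.2, "sepallength": 4.7},
]


def filter_and_project(rows, min_width, columns):
    """Keep rows whose sepalwidth exceeds min_width, then keep only `columns`."""
    return [
        {column: row[column] for column in columns}
        for row in rows
        if row["sepalwidth"] > min_width
    ]


result = filter_and_project(rows, 3, ["name", "sepalwidth"])
```

Unlike this eager list comprehension, the PyODPS expression is evaluated only when a result is requested, for example by `head(5)`.
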
```python
>>> from odps.df import DataFrame
>>> df = DataFrame(o.get_table('pyodps_iris'))
>>> df.dtypes
odps.Schema {
  sepallength           float64
  sepalwidth            float64
  petallength           float64
  petalwidth            float64
  name                  string
}
>>> df.head(5)
|==========================================| 1 / 1 (100.00%) 0s
   sepallength  sepalwidth  petallength  petalwidth         name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
>>> df[df.sepalwidth > 3]['name', 'sepalwidth'].head(5)
|==========================================| 1 / 1 (100.00%) 12s
          name  sepalwidth
0  Iris-setosa         3.5
1  Iris-setosa         3.2
2  Iris-setosa         3.1
3  Iris-setosa         3.6
4  Iris-setosa         3.9
```

## Command-line and IPython enhancement

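Outside IPython, a query like the `%sql` call shown in this section can also be run through the entry object. The sketch here assumes `o` is an `ODPS` object created as in the Usage section. `fetch_records` is a hypothetical helper name, while `execute_sql` and `open_reader` are PyODPS methods.

```python
def fetch_records(o, sql):
    """Run `sql` synchronously and return all result records as a list.

    Hypothetical helper; `o` is assumed to be an `odps.ODPS` entry object.
    """
    instance = o.execute_sql(sql)  # blocks until the SQL instance finishes
    with instance.open_reader() as reader:
        return list(reader)


# Example call, assuming `o` was created as in the Usage section:
# records = fetch_records(o, 'select * from pyodps_iris limit 5')
```
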
```
In [1]: %load_ext odps

In [2]: %enter
Out[2]:

In [3]: %sql select * from pyodps_iris limit 5
|==========================================| 1 / 1 (100.00%) 2s
Out[3]:
   sepallength  sepalwidth  petallength  petalwidth         name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
```

## Python UDF Debugging Tool

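Judging from the example in this section, `pyou` feeds each comma-separated input line to the UDF's `evaluate` method and prints the result. As a rough stand-in for checking UDF logic without PyODPS installed, the sketch here does the same in plain Python for a `bigint,bigint->bigint` UDF. `run_udf` is a hypothetical helper and the `annotate` decorator is left out; this is not how `pyou` itself is implemented.

```python
class Plus(object):
    """Same logic as the README's Plus UDF, without the PyODPS decorator."""

    def evaluate(self, a, b):
        return a + b


def run_udf(udf_class, lines):
    """Parse each 'x,y' line as integers and return udf.evaluate(x, y) per line."""
    udf = udf_class()
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        args = [int(field) for field in line.split(",")]
        results.append(udf.evaluate(*args))
    return results


for value in run_udf(Plus, ["1,1", "3,2"]):
    print(value)  # prints 2, then 5
```
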
```python
# file: plus.py
from odps.udf import annotate


@annotate('bigint,bigint->bigint')
class Plus(object):
    def evaluate(self, a, b):
        return a + b
```

```
$ cat plus.input
1,1
3,2
$ pyou plus.Plus < plus.input
2
5
```

## Contributing

For a development install, clone the repository and then install from source:
```
git clone https://github.com/aliyun/aliyun-odps-python-sdk.git
cd aliyun-odps-python-sdk
pip install -r requirements.txt -e .
```

If you need to modify the frontend code, install [nodejs/npm](https://www.npmjs.com/). To build and install your frontend code, use

```
python setup.py build_js
python setup.py install_js
```

## License

Licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)