https://github.com/apache/paimon-python

Apache Paimon Python The Python implementation of Apache Paimon.
https://github.com/apache/paimon-python

big-data data-ingestion flink paimon real-time-analytics spark streaming-datalake table-store

Last synced: 5 months ago
JSON representation

Apache Paimon Python The Python implementation of Apache Paimon.

Host: GitHub
URL: https://github.com/apache/paimon-python
Owner: apache
License: apache-2.0
Created: 2024-08-07T15:00:09.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-12-11T10:15:14.000Z (7 months ago)
Last Synced: 2025-01-30T10:34:35.147Z (5 months ago)
Topics: big-data, data-ingestion, flink, paimon, real-time-analytics, spark, streaming-datalake, table-store
Language: Python
Homepage: https://paimon.apache.org/
Size: 73.1 MB
Stars: 6
Watchers: 8
Forks: 6
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        ![Paimon](https://github.com/apache/paimon/blob/master/docs/static/paimon-simple.png)

[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)

This repo is for Apache Paimon Python SDK.

# Development Notice

## Checkstyle

[Flake8](https://pypi.org/project/flake8/) is used to enforce some coding guidelines.

1. Install flake8 for your Python interpreter using `pip install flake8`.

2. In PyCharm go to "Settings" → "Tools" → "External Tools".

3. Select the "+" button to add a new external tool.

4. Set "Name" to "flake8".

5. Set "Description" to "Code Style Check".

6. Set "Program" to the path of your Python interpreter, e.g. `/usr/bin/python`.

7. Set "Arguments" to `-m flake8 --config=tox.ini`.

8. Set "Working Directory" to `$ProjectFileDir$`.

You can verify the setup by right-clicking on any file or folder in the flink-python project

and running "External Tools" → "flake8".

# Usage

## Java-Based Implementation

We can use `py4j` to leverage Java code to read Paimon data. This section describes how to use this implementation.

### Set Environment Variables

`py4j` need to access a JVM, so we should set JVM arguments (optional) and Java classpath. A convenient way is using

`os` packages to set environment variables which only affect current process.

```python

import os

os.environ['PYPAIMON_JAVA_CLASSPATH'] = '/path/to/dependent_jars/*'

os.environ['_PYPAIMON_JVM_ARGS'] = 'jvm_arg1 jvm_arg2 ...'

```

NOTE: the package has set paimon core and hadoop dependencies. If you just test in local or run code in hadoop, you doesn't

need to set classpath. If you need other dependencies such as OSS/S3 filesystem jars, or special catalog which isn't implemented 

in paimon core, please download jars and set classpath.

# API Reference

TODO

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/apache/paimon-python

Awesome Lists containing this project

README