https://github.com/hengfengli/kcpy
Python library for consuming Kinesis Data Stream.
- Host: GitHub
- URL: https://github.com/hengfengli/kcpy
- Owner: hengfengli
- License: mit
- Created: 2017-11-26T11:16:41.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T10:54:56.000Z (almost 3 years ago)
- Last Synced: 2025-03-24T09:47:07.229Z (7 months ago)
- Topics: kinesis, kinesis-consumer, python, stream
- Language: Python
- Size: 199 KB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# Kinesis Consumer in Python
[![alt text][build_status]][build_status_url]
[![alt text][mit_license]][mit_license_url]
[![alt text][wheel]][wheel_url]
[![alt text][pyversion]][pyversion_url]
[![alt text][pyimp]][pyimp_url]

kcpy is a Kinesis consumer written purely in Python. It is a lightweight wrapper
on top of the AWS Python library [boto3](https://github.com/boto/boto3). You can also
consume records from Kinesis Data Streams (KDS) via:

* Lambda function: I have a demo, [kinesis-lambda-sqs-demo](https://github.com/HengfengLi/kinesis-lambda-sqs-demo),
showing how to consume records in a serverless, real-time way.
* [Kinesis Firehose](https://aws.amazon.com/kinesis/firehose/): an AWS-managed service that can easily save records
into different sinks, such as S3, Elasticsearch, and Redshift.

## Installation
Install the package via `pip`:
```bash
pip install kcpy
```

## Getting started
```python
from kcpy import StreamConsumer

consumer = StreamConsumer('my_stream_name')
# Iterate over the records read from the stream's shards.
for record in consumer:
    print(record)
```

The output would look like:
```python
{
'ApproximateArrivalTimestamp': datetime.datetime(2018, 11, 13, 11, 57, 55, 117807),
'Data': b'Jessica Walter',
'PartitionKey': 'Jessica Walter',
'SequenceNumber': '1'
}
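```

Or, you can consume stream data with checkpointing:
```python
from kcpy import StreamConsumer

# consumer_name identifies this consumer in the checkpoint storage;
# checkpoint=True turns on saving progress for each shard.
consumer = StreamConsumer('my_stream_name', consumer_name='my_consumer', checkpoint=True)
for record in consumer:
    print(record)
```

## Checkpointing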
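With `checkpoint=True`, kcpy saves how far each consumer has read in every shard; the 0.1.5 changelog entry describes this as a simple SQLite storage solution. The following is a minimal sketch of such a store using Python's built-in `sqlite3` module and the columns from the schema diagram in this section. The names `SqliteCheckpointStore`, `save`, `load`, and `shard_iterator_kwargs` are hypothetical and are not kcpy's actual API. The helper shows one way a stored sequence number could be turned into arguments for boto3's Kinesis `get_shard_iterator` call; treating `seq_no` as the last processed record and starting an un-checkpointed shard at `TRIM_HORIZON` are assumptions, not kcpy's documented behavior.

```python
import sqlite3
from typing import Any, Dict, Optional


class SqliteCheckpointStore:
    """Hypothetical checkpoint store: one row per (consumer, stream, shard)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # Same columns as the checkpoint table in the schema diagram.
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS checkpoints (
                consumer_name TEXT NOT NULL,
                stream_name   TEXT NOT NULL,
                shard_id      TEXT NOT NULL,
                seq_no        TEXT NOT NULL,
                PRIMARY KEY (consumer_name, stream_name, shard_id)
            )
            """
        )
        self.conn.commit()

    def save(self, consumer: str, stream: str, shard: str, seq_no: str) -> None:
        # Upsert: the primary key makes this overwrite the previous
        # checkpoint for the same consumer, stream, and shard.
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?, ?)",
            (consumer, stream, shard, seq_no),
        )
        self.conn.commit()

    def load(self, consumer: str, stream: str, shard: str) -> Optional[str]:
        # Returns None when this consumer has no checkpoint for the shard.
        row = self.conn.execute(
            "SELECT seq_no FROM checkpoints "
            "WHERE consumer_name = ? AND stream_name = ? AND shard_id = ?",
            (consumer, stream, shard),
        ).fetchone()
        return row[0] if row else None


def shard_iterator_kwargs(
    store: SqliteCheckpointStore, consumer: str, stream: str, shard: str
) -> Dict[str, Any]:
    """Build keyword arguments for a boto3 Kinesis get_shard_iterator call."""
    kwargs: Dict[str, Any] = {"StreamName": stream, "ShardId": shard}
    seq_no = store.load(consumer, stream, shard)
    if seq_no is None:
        # Assumption: with no checkpoint, start from the oldest available record.
        kwargs["ShardIteratorType"] = "TRIM_HORIZON"
    else:
        # Assumption: seq_no is the last processed record, so resume after it.
        kwargs["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        kwargs["StartingSequenceNumber"] = seq_no
    return kwargs


if __name__ == "__main__":
    store = SqliteCheckpointStore()  # in-memory database for the demo
    print(shard_iterator_kwargs(store, "consumer_1", "stream_1", "shard_1"))
    store.save("consumer_1", "stream_1", "shard_1", "5")
    print(shard_iterator_kwargs(store, "consumer_1", "stream_1", "shard_1"))
```

Sequence numbers are stored as `TEXT` because Kinesis returns them as strings. With a real boto3 Kinesis client, the returned dict could be unpacked as `client.get_shard_iterator(**kwargs)`, and a consumer would call `store.save(...)` with each record's `SequenceNumber` after processing it.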
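The diagram below shows the checkpointing schema: for every consumer, stream, and shard, a row stores the sequence number (`seq_no`) that the consumer has reached in that shard.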
```
producer
[stream_1] |
+---------------+---+---+---+---+---+---+---+---+ |
| shard_1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |...| <-------------------+
+---------------+---+---+---+---+---+---+---+---+ |
| shard_2 | 1 | 2 | 3 | 4 | 5 |...| <---------------------------+
+---------------+---+---+---+---+---+---+---+---+---+ |
| shard_3 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |...| <---------------+
+---------------+---+---+---+---+---+---+---+---+---+
^ ^
| |
consumer_1 consumer_2
| |
| +---------+
| |
+------------------+ |
| |
v |
+---------------+-------------+----------+--------+ |
| consumer_name | stream_name | shard_id | seq_no | |
+---------------+-------------+----------+--------+ |
| consumer_1 | stream_1 | shard_1 | 5 | |
| consumer_1 | stream_1 | shard_2 | 15 | |
| consumer_1 | stream_1 | ... | 15 | |
| consumer_1 | stream_1 | shard_N | XX | |
| consumer_2 | stream_1 | shard_1 | 6 | <---+
+---------------+-------------+----------+--------+
```

## Features
* Read records from a stream with multiple shards
* Save a checkpoint for each shard consumer of a stream

## Todo
* ~~Add type checking with mypy~~
* ~~Add tox for automating multiple testing environments~~
* ~~Add the config for Travis CI~~
* Support other storage solutions (MySQL, DynamoDB, Redis, etc.) for checkpointing
* Rebalance when the number of shards changes
* Allow kcpy to run on multiple machines

## Changelog
### 0.1.7
* Add Travis CI config and remove Python 3.5.
### 0.1.6
* Fix some issues in `setup.py`.
### 0.1.5
* Add consumer checkpointing with a simple sqlite storage solution.
### 0.1.4
* Pass AWS configurations directly into the boto3 client.
### 0.1.3
* Update the README.
### 0.1.2
* Add markdown support for long description.
### 0.1.1
* Add a long description.
### 0.1.0
* First version of kcpy.
## License
Copyright (c) 2018 Hengfeng Li. It is free software, and may
be redistributed under the terms specified in the [LICENSE] file.

[LICENSE]: /LICENSE
[build_status]: https://secure.travis-ci.org/hengfengli/kcpy.png?branch=master "Build status"
[build_status_url]: https://travis-ci.org/hengfengli/kcpy
[mit_license]: https://img.shields.io/pypi/l/kcpy.svg "MIT License"
[mit_license_url]: https://opensource.org/licenses/MIT
[wheel]: https://img.shields.io/pypi/wheel/kcpy.svg "kcpy can be installed via wheel"
[wheel_url]: http://pypi.org/project/kcpy/
[pyversion]: https://img.shields.io/pypi/pyversions/kcpy.svg "Supported Python versions."
[pyversion_url]: http://pypi.org/project/kcpy/
[pyimp]: https://img.shields.io/pypi/implementation/kcpy.svg "Supported Python implementations."
[pyimp_url]: http://pypi.org/project/kcpy/