https://github.com/manesioz/rilly
Distributed change data capture (CDC) framework for Google BigQuery
change-data-capture distributed-systems google-bigquery kafka pubsub python3
- Host: GitHub
- URL: https://github.com/manesioz/rilly
- Owner: manesioz
- License: MIT
- Created: 2019-12-12T14:21:32.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-01-15T03:32:03.000Z (almost 5 years ago)
- Last Synced: 2024-09-29T15:11:57.932Z (about 1 month ago)
- Topics: change-data-capture, distributed-systems, google-bigquery, kafka, pubsub, python3
- Language: Python
- Homepage:
- Size: 28.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Distributed change data capture (CDC) platform for Google BigQuery
### What is Change Data Capture?
Change data capture (CDC) is a set of software design patterns used to determine (and track) which data has changed so that action can be taken on the changed data. Instead of continuously polling a database for changes (which is costly if you do it often and inaccurate if you don't), `rilly` uses the log-based approach, as do [`debezium`](https://debezium.io) and other major CDC frameworks.

### Why `rilly`?
There is currently no CDC plug-in for BigQuery that I am aware of, and certainly none for Python. The goal of this package is to be as simple and non-opinionated as possible, so that developers have full control over how they stream and parse their change events.

### Installation
```shell
pip install rilly
```

### Authentication
This library uses Google's PubSub and Stackdriver APIs, so follow the authentication process [here](https://cloud.google.com/pubsub/docs/reference/libraries#setting_up_authentication).

### Usage
Say you want to track all update/delete/insert events in your BigQuery dataset. After authenticating the Google Python Client APIs:

```python
from rilly import logging, stream

# create a PubSub topic to send your change events to
stream.create_pubsub_topic('my-project-id', 'pubsub-topic')

# create a sink to send logs to the PubSub topic
logging.create_sink('sink-id', 'my-project-id', 'my-dataset-id', pubsub_topic='pubsub-topic')

# custom callback function to perform some action on each event
def custom_callback(message: str) -> str:
    print('Received message data: {}'.format(message))
    return message

# create a subscription to the PubSub topic and apply custom_callback() to each streamed log
stream.subscribe('my-project-id', 'pubsub-topic', 'cdc-subscription', 30, custom_callback)
```
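A callback will usually do more than print the raw message. The sketch below shows one way a callback might pull a few fields out of each event. It rests on assumptions this README does not confirm: that the callback receives the exported log entry as a JSON string, and that the entry follows the Cloud Logging `LogEntry` layout with an audit-log `protoPayload`. The helper name `summarize_change_event` and the sample entry are made up for illustration and are not part of `rilly`.

```python
import json
from typing import Any, Dict


def summarize_change_event(message: str) -> Dict[str, Any]:
    """Extract a few fields from a JSON-encoded log entry (assumed format)."""
    entry = json.loads(message)
    # Assumption: audit-log details live under "protoPayload".
    payload = entry.get("protoPayload", {})
    return {
        "timestamp": entry.get("timestamp"),
        "method": payload.get("methodName"),
        "resource": payload.get("resourceName"),
    }


def custom_callback(message: str) -> str:
    # Same shape as the README's callback: act on the event, return the message.
    summary = summarize_change_event(message)
    print('Change event: {}'.format(summary))
    return message


# Hypothetical sample entry, used only to exercise the callback locally.
sample = json.dumps({
    "timestamp": "2020-01-15T03:32:03Z",
    "protoPayload": {
        "methodName": "jobservice.insert",
        "resourceName": "projects/my-project-id/jobs/example-job",
    },
})

custom_callback(sample)
```

Because missing keys come back as `None` rather than raising, the callback keeps running if an entry lacks one of these fields. Whether that is the right behaviour depends on how strictly you want to validate events.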