Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hirosystems/stacks-event-replay
JSON representation
- Host: GitHub
- URL: https://github.com/hirosystems/stacks-event-replay
- Owner: hirosystems
- License: gpl-3.0
- Created: 2023-06-08T19:15:05.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-03-20T12:01:09.000Z (8 months ago)
- Last Synced: 2024-10-10T14:48:30.142Z (about 1 month ago)
- Language: Python
- Size: 120 KB
- Stars: 2
- Watchers: 7
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Stacks Event Replay
## Problem
The Stacks blockchain is only able to emit events live, as they happen. This poses a problem when the Stacks API needs to be upgraded but its database cannot be migrated to a new schema, for example, when there are breaking changes in the Stacks API's SQL schema, such as adding a new column, which requires events to be replayed.
## Solution
One way to handle this upgrade is to wipe both the Stacks API's database and the Stacks node's working directory, but that approach requires a re-sync from scratch (genesis).
Alternatively, an event replay is possible: the Stacks API keeps the HTTP POST requests from the Stacks node event emitter stored, then streams these events back (replays them) to itself, essentially simulating a wipe and full re-sync, but much more quickly.
The Stacks Event Replay is composed of two components: the Parquet generator and the events ingestor.
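For intuition, here is a minimal sketch of the replay idea in Python; it is not the project's implementation. It assumes events are stored as tab-separated rows whose last two columns are an HTTP path and a JSON payload, and the endpoint address and file name used here are hypothetical.

```python
# Illustrative sketch of event replay, not the project's implementation.
# Assumptions: each TSV row ends with an event path (e.g. "/new_block")
# and a JSON payload; the API accepts event POSTs on a local port.
import csv

import requests  # third-party: pip install requests

API_EVENT_URL = "http://127.0.0.1:3700"  # hypothetical event-observer address

def replay_events(tsv_path: str) -> None:
    csv.field_size_limit(10**8)  # event payloads can be very large
    with open(tsv_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            event_path, payload = row[-2], row[-1]  # assumed column layout
            # Stream the stored event back to the API, as the node once did.
            resp = requests.post(
                API_EVENT_URL + event_path,
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            resp.raise_for_status()

replay_events("stacks-node-events.tsv")  # hypothetical file name
```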
## Installation
The Stacks Event Replay tooling is based on Python, so make sure that you have Python installed on your system before following the instructions below.
### Installing dependencies
```shell
$ make init
```

## Usage
### Running the Parquet Generator
1. Download a TSV file from the Hiro archive.
```shell
$ curl -L https://archive.hiro.so/mainnet/stacks-blockchain-api/mainnet-stacks-blockchain-api-latest.gz -o ./mainnet-stacks-blockchain-api-latest.gz
```

2. Run the Parquet generator using the TSV file as input:
```shell
$ python3 -m event_replay --tsv-file mainnet-stacks-blockchain-api-latest.gz
```

An `events` folder is generated, containing a dataset of subfolders with partitioned Parquet files for each event type present in the TSV file.
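Before moving on to ingestion, it can help to sanity-check the generated dataset. The sketch below is an addition here, not part of the tool; it uses pyarrow to print the row count of each event-type subfolder:

```python
# Sketch: inspect the generated `events` dataset, assuming one
# subfolder per event type containing partitioned Parquet files.
from pathlib import Path

import pyarrow.dataset as ds  # pip install pyarrow

events_dir = Path("events")
for sub in sorted(p for p in events_dir.iterdir() if p.is_dir()):
    dataset = ds.dataset(sub, format="parquet")
    print(f"{sub.name}: {dataset.count_rows()} rows")
```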
### Running the Events Ingestor
1. Run the events ingestion inside the [stacks-blockchain-api](https://github.com/hirosystems/stacks-blockchain-api) root folder.
```shell
$ STACKS_EVENTS_DIR="<path-to-events-folder>" NODE_OPTIONS="--max-old-space-size=8192" STACKS_CHAIN_ID=<chain-id> node ./lib/index.js from-parquet-events --workers=<parallel-workers>
```

where:

- `STACKS_EVENTS_DIR` is the path to the `events` folder.
- `STACKS_CHAIN_ID` is the chain ID of the target network.
- `--workers` is the number of workers that will run in parallel. Tests were done with values between 4 and 8; spawning too many workers can exhaust computational resources (see the sketch after this list for a conservative default).
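As a rough starting point (an assumption here, not official guidance from the tool), one could derive a conservative worker count from the machine's CPU count, capped at the tested range:

```python
# Sketch: choose a conservative --workers value, capped at the
# tested range of 4-8 so the machine is not exhausted.
import os

workers = max(1, min(8, (os.cpu_count() or 4) // 2))
print(f"--workers={workers}")
```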
## Remarks

**WARNING:** Running the event replay will **wipe out** the stacks-blockchain-api Postgres database. The event replay is a process that deals with a huge amount of data, so it needs a database in a pristine state.