{"id":22591615,"url":"https://github.com/stratafoundation/strata-data-pipelines","last_synced_at":"2025-04-10T23:22:36.414Z","repository":{"id":44738042,"uuid":"394772095","full_name":"StrataFoundation/strata-data-pipelines","owner":"StrataFoundation","description":"wumbo-data-pipelines","archived":false,"fork":false,"pushed_at":"2022-01-27T21:55:43.000Z","size":581,"stargazers_count":15,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-24T20:11:18.289Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/StrataFoundation.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-08-10T20:20:33.000Z","updated_at":"2022-08-07T14:55:24.000Z","dependencies_parsed_at":"2022-08-12T11:21:44.606Z","dependency_job_id":null,"html_url":"https://github.com/StrataFoundation/strata-data-pipelines","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StrataFoundation%2Fstrata-data-pipelines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StrataFoundation%2Fstrata-data-pipelines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StrataFoundation%2Fstrata-data-pipelines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/StrataFoundation%2Fstrata-data-pipelines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/StrataFoundation","download_url":"https://codeload.
github.com/StrataFoundation/strata-data-pipelines/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248313183,"owners_count":21082815,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-08T09:13:18.146Z","updated_at":"2025-04-10T23:22:36.389Z","avatar_url":"https://github.com/StrataFoundation.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"Strata Data Pipelines\n====================\n\n## Build\n\n```\ndocker build . -t data-pipelines:latest\n```\n\n## Run Anchor Localnet\n\nClone your anchor repo (if capturing events for anchor). Otherwise, follow similar steps for your Solana setup.\n\nRun\n\n```\nanchor localnet\n```\n\nThen upload your idl(s) with\n\n```\nanchor idl init \u003cprogram-id\u003e --filepath \u003cidl.json\u003e --provider.cluster localnet\n```\n\n## Run Data Pipelines\n\nFirst, update `ACCOUNTS` and `ANCHOR_IDLS` in docker-compose.yml to the programs you would like to capture, and the anchor programs you would like to parse.\n\nRun\n\n```\ndocker-compose up\n```\n\nIn this repo. You can also run a subset, for example only run up to the event transformer:\n\n```\ndocker-compose up event-transformer\n```\n\n## Run Strata\n\nIf you're doing local dev for strata, you'll want our leaderboards. \n\nFirst, clone strata api and build:\n\n```\ncd strata-api \u0026\u0026 docker build . 
-t strata-api:latest\n```\n\n```\ncd strata-compose \u0026\u0026 docker-compose up\n```\n\n## Setup kSQL\n\n\n# Components\n\nSee (and render) architecture.puml for a bird's-eye view of the system.\n\n## Kafka S3 Slot Identifier\n\nIdentifies contiguous Solana slots and pushes them to a heavily partitioned Kafka topic.\n\n## Kafka S3 Block Uploader\n\nThis utility pulls blocks for each contiguous Solana slot (as identified by the slot identifier) and inserts them into S3. \n\nIt then sends an event pointing to that S3 location to Kafka. We avoid sending the full block to Kafka because it may be too large for a single message.\n\nNote that because slot identifier slots are partitioned, we can horizontally scale this uploader as many times as there are partitions. We found we needed 3-4 to keep up with mainnet.\n\n## Event Transformer\n\nReads the events from Kafka S3 Block Uploader, pulls the blocks from S3, and transforms the transaction data into usable JSON events. Each event has common fields like `type`, `blockTime`, `slot`.\n\nThis gives us a fat topic of all events occurring on the blockchain.\n\n## ksqlDB\n\nLooking at `ksql/`, you can see all of our [ksqlDB](https://docs.ksqldb.io/en/latest) queries. These queries turn the firehose of the `json.solana.events` topic into useful tables and streams.\n\nThe main use case right now for these streams is to create leaderboards, both for holders of individual accounts and for top tokens.\n\n## Leaderboard Redis Inserters\n\nThese read from streams generated by ksqlDB and insert them into Redis sorted sets so that we can power a fast GraphQL API.\n\n# Deploying\n\nYou should use the `strata-terraform` repo to deploy the full pipeline. We use app.terraform.io to provision and launch terraform objects on AWS.\n\n# Local Development\n\nBoot up docker compose, excluding the services you don't need. 
You can do this by passing args\n\n```bash\ndocker-compose up minio kafka redis kowl\n```\n\nNow, you can launch whatever utility you want using vscode tasks that exist for this purpose. \n\nYou can use kowl at localhost:8080 to see what's going into the topics.\n\n# Trophies\n\nTo test trophy sending, you can run \n\n```\njq -rc . tests/resources/trophy.json | kafka-console-producer.sh --topic json.solana.trophies --bootstrap-server localhost:29092\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstratafoundation%2Fstrata-data-pipelines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstratafoundation%2Fstrata-data-pipelines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstratafoundation%2Fstrata-data-pipelines/lists"}