{"id":15065890,"url":"https://github.com/vegaprotocol/data-node","last_synced_at":"2025-04-10T13:34:44.630Z","repository":{"id":37919372,"uuid":"384078522","full_name":"vegaprotocol/data-node","owner":"vegaprotocol","description":"A rich API server for Vega Protocol","archived":false,"fork":false,"pushed_at":"2022-07-28T14:21:37.000Z","size":33361,"stargazers_count":3,"open_issues_count":2,"forks_count":1,"subscribers_count":14,"default_branch":"develop","last_synced_at":"2025-03-24T12:13:42.821Z","etag":null,"topics":["api-server","graphql-server","grpc","postgres","vega-protocol"],"latest_commit_sha":null,"homepage":"https://vega.xyz","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vegaprotocol.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null}},"created_at":"2021-07-08T09:55:13.000Z","updated_at":"2024-09-22T02:36:04.000Z","dependencies_parsed_at":"2022-07-10T15:33:32.762Z","dependency_job_id":null,"html_url":"https://github.com/vegaprotocol/data-node","commit_stats":null,"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vegaprotocol%2Fdata-node","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vegaprotocol%2Fdata-node/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vegaprotocol%2Fdata-node/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vegaprotocol%2Fdata-node/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vegaprotocol","download_url":"https://codeload.github.com/vegaprotocol/data-node/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248225869,"owners_count":21068078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-server","graphql-server","grpc","postgres","vega-protocol"],"created_at":"2024-09-25T00:56:58.088Z","updated_at":"2025-04-10T13:34:44.591Z","avatar_url":"https://github.com/vegaprotocol.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data node\n\nVersion 0.53.0\n\nA service exposing read-only APIs built on top of the [Vega](https://github.com/vegaprotocol/vega) platform.\n\n**Data node** provides the following core features:\n\n- Consumes all events from Vega core\n- Aggregates received events and stores the aggregated data\n- Serves stored data via [APIs](#apis)\n- Allows advanced configuration (see [Configuration](#configuration))\n\n## Links\n\n- For **new developers**, see [Getting Started](GETTING_STARTED.md).\n- For **updates**, see the [Change log](CHANGELOG.md) for major changes.\n- For **architecture**, please read the [documentation](docs/index.md) to learn about the design and architecture of the system.\n- Please [open an issue](https://github.com/vegaprotocol/data-node/issues/new) if anything is missing or unclear in this documentation.\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cstrong\u003eTable of Contents\u003c/strong\u003e (click to expand)\u003c/summary\u003e\n\n\u003c!-- toc --\u003e\n\n- [Installation](#installation)\n- [Configuration](#configuration)\n- [PostgreSQL](#postgresql)\n- [Vega core streaming](#vega-core-streaming)\n- [APIs](#apis)\n- [Troubleshooting \u0026 debugging](#troubleshooting--debugging)\n\n\u003c!-- tocstop --\u003e\n\n\u003c/details\u003e\n\n## Installation\n\nFor installation instructions, see [Getting Started](GETTING_STARTED.md).\n\n## Configuration\n\nData node is initialised with a default configuration by running `data-node init`. To override any of the defaults, edit your `config.toml`, typically found in the `~/.data-node` directory. For example:\n\n```toml\n[Matching]\n  Level = 0\n  ProRataMode = false\n  LogPriceLevelsDebug = false\n  LogRemovedOrdersDebug = false\n```\n\n## PostgreSQL\n\nAs of version 0.53, data node uses [PostgreSQL](https://www.postgresql.org) as its storage back end instead of the previous mix of in-memory and BadgerDB file stores. We also make use of a Postgres extension called [TimescaleDB](https://www.timescale.com), which adds a number of time-series-specific features.\n\nPostgres is not an embedded database but a separate server application that must be running before data node starts. As a result of this transition, data node operators now need to do a little setup.\n\nBy default, data node will attempt to connect to a database called `vega` on `localhost:5432`, using the username `vega` and password `vega`. 
All of this is configurable in data node’s `config.toml` file.\n\nWe develop against `PostgreSQL 14.2` and `Timescale 2.7.1` and _strongly recommend_ that you use the same versions.\n\nThe relevant section of `config.toml` looks like this:\n\n```toml\n[SQLStore]\n  UseEmbedded = false\n  [SQLStore.ConnectionConfig]\n    Host = \"localhost\"\n    Port = 5432\n    Username = \"vega\"\n    Password = \"vega\"\n    Database = \"vega\"\n    UseTransactions = true\n```\n\n### Persistence\n\nCurrently, if the database already exists, it is destroyed and recreated at data node start-up. We expect this to change once the schema has settled and we add support for stopping and restarting data nodes without replaying the entire chain.\n\nThere are several ways to get PostgreSQL and TimescaleDB up and running.\n\n### Using Docker\n\nThis is probably the most straightforward and reliable way to get up and running.\n\nTimescale supplies a Docker image, so assuming you [already have](https://www.docker.com/get-started/) Docker installed, run:\n\n```sh\ndocker run --rm \\\n           -d \\\n           -e POSTGRES_USER=vega \\\n           -e POSTGRES_PASSWORD=vega \\\n           -e POSTGRES_DB=vega \\\n           -p 5432:5432 \\\n           timescale/timescaledb:2.7.1-pg14\n```\n\n### Using your operating system's native packages\n\nTimescale [provide instructions](https://docs.timescale.com/install/latest/self-hosted/) for installing PostgreSQL and TimescaleDB from `.deb` or `.rpm` packages they build. Once PostgreSQL is running as a system service, you will need to create a database, user, and password for the data node to use. 
For example:\n\n```\n$ sudo -u postgres psql\npsql (14.3 (Ubuntu 14.3-0ubuntu0.22.04.1))\nType \"help\" for help.\n\npostgres=# create database vega;\nCREATE DATABASE\n\npostgres=# create user vega with password 'vega';\nCREATE ROLE\n\npostgres=# grant all privileges on database vega to vega;\nGRANT\n```\n\n### Using 'embedded' PostgreSQL\n\nAs mentioned above, PostgreSQL is not an embedded database. However, the good folks over at [embedded-postgres-go](https://github.com/fergusstrange/embedded-postgres) didn't let that stop them from trying.\n\nThis Go package allows data node to start a PostgreSQL server itself. It does this by:\n\n- Examining your system to determine its platform and architecture\n- Downloading an appropriate PostgreSQL binary installation\n- Unpacking it to a temporary location\n- Configuring and launching Postgres as a child process of data-node\n\nembedded-postgres-go does not support TimescaleDB, so we forked it and built our own binaries for a limited set of platforms, which we [host on GitHub](https://github.com/vegaprotocol/embedded-postgres-binaries/releases/).\n\nWe use it for running integration tests and it works quite well. However, we haven't tested it on a wide range of platforms, and we have run into a few odd issues, usually related to linking against system libraries or, occasionally, not shutting down cleanly.\n\nYou can launch PostgreSQL in this way either with the command\n\n```sh\ndata-node postgres run\n```\n\nwhich launches embedded PostgreSQL in its own process, or by setting\n\n```toml\n[SQLStore]\n  UseEmbedded = true\n```\n\nwhich causes data-node to launch PostgreSQL as it starts up and stop it when it exits. While convenient, if data-node is forcefully killed and doesn't have a chance to shut down, PostgreSQL may keep running. 
The leftover Postgres process then needs to be killed manually to prevent 'unable to bind to port' errors on the next start.\n\nIn both cases, the database files are stored in your 'state' directory, e.g. `~/.local/state/vega/data-node/` on Linux.\n\n### Building from source\n\nIf this is your preferred option, you probably already know how to do it; it is quite straightforward. Instructions are available on the Timescale website.\n\n### Using a cloud database provider\n\nWe have not tested this yet, but we plan to investigate it. Feel free to give it a try; our main concern is that connection latency may prevent data-node from processing blocks as fast as they are produced.\n\nTimescale provides a hosted service, and `AWS` may also offer a suitable one.\n\n## Vega core streaming\n\nData node requires a running Vega core node to function. Please see [Vega Getting Started](https://github.com/vegaprotocol/vega/blob/develop/GETTING_STARTED.md).\nBy default, the data node listens on port `3002` for incoming connections from the Vega core node.\n\n## APIs\n\nData node exposes a set of APIs through which clients can read data.\n\nThere are currently three protocols for communicating with the data node APIs:\n\n### gRPC\n\ngRPC is an open-source remote procedure call (RPC) system initially developed at Google. In data node, the gRPC API supports streaming of events in addition to standard procedure calls.\n\nThe default (configurable) port for the gRPC API is `3007`, and the API matches the [gRPC protobuf definition](https://github.com/vegaprotocol/protos).\n\n### GraphQL\n\n[GraphQL](https://graphql.org/) is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data, originally developed at Facebook. 
The [Console](https://github.com/vegaprotocol/console) uses the GraphQL API to retrieve data, including streamed events.\n\nThe GraphQL API is defined by a [schema](gateway/graphql/schema.graphql). External clients use this schema to communicate with Vega.\n\nQueries can be tested using the GraphQL playground app, which is bundled with the node. The default (configurable) port for the playground app is `3008`; opening it in a web browser shows a web app for testing custom queries, mutations and subscriptions.\n\n#### GraphQL SSL\n\n**GraphQL subscriptions do not work properly unless HTTPS is enabled**.\n\nTo enable TLS on the GraphQL port, set\n\n```toml\n  [Gateway.GraphQL]\n    HTTPSEnabled = true\n```\n\nYour data node will need to be reachable over the internet with a proper fully qualified domain name and a matching certificate. If you already have a certificate and corresponding private key file, you can specify them as follows:\n\n```toml\n  [Gateway.GraphQL]\n    CertificateFile = \"/path/to/certificate/file\"\n    KeyFile = \"/path/to/key/file\"\n```\n\nIf you prefer, the data node can manage this for you by automatically generating a certificate and having `LetsEncrypt` sign it.\n\n```toml\n  [Gateway.GraphQL]\n    HTTPSEnabled = true\n    AutoCertDomain = \"my.lovely.domain.com\"\n```\n\nHowever, the `LetsEncrypt` validation process requires the server answering its challenge to run on the standard HTTPS port (443). This means you must either:\n\n- Forward port 443 on your machine to the GraphQL port (3008 by default) using `iptables` or similar, or\n- Directly use port 443 for the GraphQL server in data-node by specifying\n```toml\n  [Gateway.GraphQL]\n    Port = 443\n```\n\nNote that Linux systems generally require processes listening on ports under 1024 to either\n  - run as root, or\n  - be specifically granted permission, e.g. 
by granting the `data-node` binary the `cap_net_bind_service` capability before running it:\n  ```\n  sudo setcap cap_net_bind_service=ep /path/to/data-node\n  ```\n\n### REST\n\nREST is a widely used architectural style for communication between systems on the web, and is arguably simpler to work with than gRPC and GraphQL. In Vega, the REST API is a reverse proxy to the gRPC API; however, it does not support streaming.\n\nThe default (configurable) port for the REST API is `3009`.\n\n## Troubleshooting \u0026 debugging\n\nThe application has structured logging. The first port of call for a crash is probably the Vega and Tendermint logs, which are available on the console when running locally, or via the journal and syslog when running on test networks. Default location for log files:\n\n* `/var/log/vega.log`\n\nEach internal Go package has a logging level that can be set at runtime through configuration. Setting the logging `Level` to `-1` for a package enables all debug messages for that package, which can be useful when analysing a crash or issue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvegaprotocol%2Fdata-node","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvegaprotocol%2Fdata-node","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvegaprotocol%2Fdata-node/lists"}