{"id":22959561,"url":"https://github.com/probe-lab/parsec","last_synced_at":"2025-08-13T05:32:09.774Z","repository":{"id":96817962,"uuid":"605028341","full_name":"probe-lab/parsec","owner":"probe-lab","description":"፨ Parsec is a DHT performance measurement tool","archived":false,"fork":false,"pushed_at":"2024-11-26T16:05:18.000Z","size":2811,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-26T17:20:56.775Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/probe-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-22T09:40:58.000Z","updated_at":"2024-11-26T16:05:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"d75b198e-afa9-43f4-8c22-e312bfd6c9e8","html_url":"https://github.com/probe-lab/parsec","commit_stats":null,"previous_names":["dennis-tra/parsec","probe-lab/parsec","plprobelab/parsec"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/probe-lab%2Fparsec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/probe-lab%2Fparsec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/probe-lab%2Fparsec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/probe-lab%2Fparsec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/probe-lab","download_url":"https://codelo
ad.github.com/probe-lab/parsec/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229737918,"owners_count":18116536,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-14T18:19:16.554Z","updated_at":"2024-12-14T18:20:23.841Z","avatar_url":"https://github.com/probe-lab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `parsec`\n\n`parsec` is a DHT lookup performance measurement tool. It specifically measures the `PUT` and `GET` performance of the\nIPFS public DHT but could also be configured to measure\nother [libp2p-kad-dht](https://github.com/libp2p/specs/blob/master/kad-dht/README.md) networks.\nThe setup is split into two components: a scheduler and a server.\n\nThe server is just a normal libp2p peer that supports and participates in the public IPFS DHT and exposes a [lean HTTP\nAPI](./server.yaml) that allows the scheduler to issue publication and retrieval operations. Currently, in [ProbeLab's](https://probelab.io/tools/parsec/)\ndeployment, the scheduler goes around all seven server nodes, instructs one to publish provider records for a random data\nblob and asks the other six to look them up. All seven servers take timing measurements about the publication or retrieval latencies and\nreport back the results to the scheduler. 
The scheduler then tracks this information in a database for later analysis.\n\n## Table of Contents\n\n- [`parsec`](#parsec)\n  - [Table of Contents](#table-of-contents)\n  - [Concepts](#concepts)\n  - [Running](#running)\n  - [Implementing a Server](#implementing-a-server)\n    - [Expose the HTTP API](#expose-the-http-api)\n    - [Node Information](#node-information)\n    - [Heartbeat](#heartbeat)\n    - [Optional: Prometheus Metrics](#optional-prometheus-metrics)\n    - [`ECS_CONTAINER_METADATA_URI_V4` response:](#ecs_container_metadata_uri_v4-response)\n  - [Maintainers](#maintainers)\n  - [Contributing](#contributing)\n  - [License](#license)\n\n\n## Concepts\n\nBesides servers and schedulers, there's the concept of a `fleet`. A fleet is a set of server nodes that\nshare a common configuration. For example, we are running three different fleets with seven nodes each (in different regions): 1) `default` 2) `optprov` 3) `fullrt`.\nEach of these three fleets is configured differently. The `default` fleet uses the default configuration in the `go-libp2p-kad-dht` repository, the `optprov` fleet uses the optimistic provide configuration to publish data into the DHT, and the `fullrt` fleet uses the accelerated DHT client.\n\nSchedulers are then configured to interface with any combination of fleets. Right now, we have one scheduler for each fleet. As described above, each scheduler asks one node to publish content, then instructs the others to find the provider records, and then repeats the process with the next peer. 
However,\nwe could configure a scheduler that does the same thing but with nodes from multiple fleets, e.g., `default`+`fullrt`, to check whether content published with one implementation is reachable with another.\n\n## Running\n\nYou can run\n\n```shell\ndocker compose up\n```\n\nto start two servers and one scheduler and see them interact.\n\n## Implementing a Server\n\nRight now, the server component is implemented in Go and uses the [go-libp2p-kad-dht](https://github.com/libp2p/go-libp2p-kad-dht) implementation.\nIt consequently measures the Go implementation's performance. However, other implementations exist that support the DHT protocol. These\nimplementations can be easily integrated with this measurement infrastructure. They just need to behave as a parsec server.\nThe existing schedulers can then be reused.\n\nHere are the things that a new implementation would need to do:\n\n1. Expose an HTTP interface with three endpoints.\n2. Upon startup, write general information about the node configuration to a Postgres database.\n3. Regularly refresh the heartbeat field in the database.\n\n### Expose the HTTP API\n\nYou can find the OpenAPI specification in [`./server.yaml`](./server.yaml).\n\n### Node Information\n\nThe new server would need to initialize a Postgres client. The default environment variables to configure the client are as follows:\n\n```env\nPARSEC_DATABASE_HOST\nPARSEC_DATABASE_PORT\nPARSEC_DATABASE_NAME\nPARSEC_DATABASE_PASSWORD\nPARSEC_DATABASE_USER\nPARSEC_DATABASE_SSL_MODE\n```\n\nThen upon startup, the server needs to write a row into the `nodes_ecs` table. 
The definition looks like this:\n\n```sql\nCREATE TABLE nodes_ecs\n(\n    -- auto generated, doesn't need to be set manually\n    id             INT GENERATED ALWAYS AS IDENTITY,\n    -- available CPUs\n    cpu            INT         NOT NULL,\n    -- available memory rounded to the nearest MB\n    memory         INT         NOT NULL,\n    -- the peer ID of the libp2p host\n    peer_id        TEXT        NOT NULL,\n    -- in which region does this server/node run? Given via the AWS_REGION environment var\n    region         TEXT        NOT NULL,\n    -- os.Args - with which arguments was this server run?\n    cmd            TEXT        NOT NULL,\n    -- a fleet identifier (see section `Concepts` above)\n    fleet          TEXT        NOT NULL,\n    -- a JSON document with no enforced schema. Could be anything really. It's intended\n    -- to give information about the exact dependencies, and especially kad-dht implementation\n    -- that the server uses. \n    dependencies   JSONB       NOT NULL,\n    -- the private IP address of the server. The scheduler will query this table and use this\n    -- ip address to contact the HTTP API. In the ECS context it's provided via an environment\n    -- variable called `ECS_CONTAINER_METADATA_URI_V4`. 
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html\n    -- see below for the JSON structure\n    ip_address     INET        NOT NULL,\n    -- under which port is the HTTP server reachable\n    server_port    SMALLINT    NOT NULL,\n    -- which port does the libp2p host use\n    peer_port      SMALLINT    NOT NULL,\n    -- a timestamp of the last heartbeat\n    last_heartbeat TIMESTAMPTZ,\n    -- a timestamp since when the node is offline (set by the scheduler if the node is unreachable, e.g., crashed)\n    -- but should also be set by the server when it shuts down gracefully.\n    offline_since  TIMESTAMPTZ,\n    -- when was this node row created.\n    created_at     TIMESTAMPTZ NOT NULL,\n\n    PRIMARY KEY (id)\n);\n```\n\nThis row serves two purposes: 1) it tracks the exact configuration and dependencies of the server and 2) it is used for service discovery.\nThe scheduler will query this table for all nodes that match the scheduler's `fleet`, where the `offline_since` field is null and\nthe `last_heartbeat` is not null. 
In AWS, we're using [VPC peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) between different AWS regions, so we can use private IP addresses for connectivity between the scheduler, servers, and database.\n\n### Heartbeat\n\nThe server must refresh the `last_heartbeat` field of its `nodes_ecs` row every minute to indicate it's still alive and happy to accept requests.\n\n### Optional: Prometheus Metrics\n\nTo provide real-time metrics about the publication and retrieval performance, the Go server exposes a few Prometheus metrics:\n\n```\nmetric: parsec_durations\ntype: summary\nbuckets: 50th, 90th, 95th percentile\nmaxAge: 24h\nlabels:\n  type: retrieval_ttfpr | provide_duration\n  success: true | false\n  scheduler: default | optprov | fullrt\n```\n\n```\nmetric: parsec_http_requests_total\ntype: summary\nbuckets: 50th, 90th, 95th percentile\nmaxAge: 24h\nlabels:\n  method: GET | POST | ...\n  path: /retrieve | /provide \n  scheduler: default | optprov | fullrt\n```\n\n### `ECS_CONTAINER_METADATA_URI_V4` response:\n\nThe server can extract the available CPU and Memory from `Limits.CPU` and `Limits.Memory`. Further,\nits own private IP address is in the `Networks` array. 
Look for the first entry with `NetworkMode == awsvpc`\nand then just take the first entry of `IPv4Addresses` - that's a good enough heuristic so far.\n\n```json\n{\n    \"DockerId\": \"ea32192c8553fbff06c9340478a2ff089b2bb5646fb718b4ee206641c9086d66\",\n    \"Name\": \"curl\",\n    \"DockerName\": \"ecs-curltest-24-curl-cca48e8dcadd97805600\",\n    \"Image\": \"111122223333.dkr.ecr.us-west-2.amazonaws.com/curltest:latest\",\n    \"ImageID\": \"sha256:d691691e9652791a60114e67b365688d20d19940dde7c4736ea30e660d8d3553\",\n    \"Labels\": {\n        \"com.amazonaws.ecs.cluster\": \"default\",\n        \"com.amazonaws.ecs.container-name\": \"curl\",\n        \"com.amazonaws.ecs.task-arn\": \"arn:aws:ecs:us-west-2:111122223333:task/default/8f03e41243824aea923aca126495f665\",\n        \"com.amazonaws.ecs.task-definition-family\": \"curltest\",\n        \"com.amazonaws.ecs.task-definition-version\": \"24\"\n    },\n    \"DesiredStatus\": \"RUNNING\",\n    \"KnownStatus\": \"RUNNING\",\n    \"Limits\": {\n        \"CPU\": 10,\n        \"Memory\": 128\n    },\n    \"CreatedAt\": \"2020-10-02T00:15:07.620912337Z\",\n    \"StartedAt\": \"2020-10-02T00:15:08.062559351Z\",\n    \"Type\": \"NORMAL\",\n    \"LogDriver\": \"awslogs\",\n    \"LogOptions\": {\n        \"awslogs-create-group\": \"true\",\n        \"awslogs-group\": \"/ecs/metadata\",\n        \"awslogs-region\": \"us-west-2\",\n        \"awslogs-stream\": \"ecs/curl/8f03e41243824aea923aca126495f665\"\n    },\n    \"ContainerARN\": \"arn:aws:ecs:us-west-2:111122223333:container/0206b271-b33f-47ab-86c6-a0ba208a70a9\",\n    \"Networks\": [\n        {\n            \"NetworkMode\": \"awsvpc\",\n            \"IPv4Addresses\": [\n                \"10.0.2.100\"\n            ],\n            \"AttachmentIndex\": 0,\n            \"MACAddress\": \"0e:9e:32:c7:48:85\",\n            \"IPv4SubnetCIDRBlock\": \"10.0.2.0/24\",\n            \"PrivateDNSName\": \"ip-10-0-2-100.us-west-2.compute.internal\",\n            
\"SubnetGatewayIpv4Address\": \"10.0.2.1/24\"\n        }\n    ]\n}\n```\n\n## Maintainers\n\n[@dennis-tra](https://github.com/dennis-tra).\n\n## Contributing\n\nFeel free to dive in! [Open an issue](https://github.com/probe-lab/parsec/issues/new) or submit PRs.\n\n## License\n\n[MIT](LICENSE) © Dennis Trautwein\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprobe-lab%2Fparsec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprobe-lab%2Fparsec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprobe-lab%2Fparsec/lists"}