{"id":37403528,"url":"https://github.com/oslokommune/dataplatform-status-api","last_synced_at":"2026-01-16T05:48:26.767Z","repository":{"id":37962566,"uuid":"358528003","full_name":"oslokommune/dataplatform-status-api","owner":"oslokommune","description":"CRUD API for status på opplastede filer ","archived":false,"fork":false,"pushed_at":"2025-10-29T08:07:49.000Z","size":985,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-10-29T10:17:03.372Z","etag":null,"topics":["dataplatform"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oslokommune.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-04-16T08:23:06.000Z","updated_at":"2025-10-29T08:07:52.000Z","dependencies_parsed_at":"2023-02-19T05:16:03.167Z","dependency_job_id":"c1e439ee-581b-4915-96d0-0a2e67537f53","html_url":"https://github.com/oslokommune/dataplatform-status-api","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/oslokommune/dataplatform-status-api","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oslokommune%2Fdataplatform-status-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oslokommune%2Fdataplatform-status-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oslokommune%2Fdatapl
atform-status-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oslokommune%2Fdataplatform-status-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oslokommune","download_url":"https://codeload.github.com/oslokommune/dataplatform-status-api/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oslokommune%2Fdataplatform-status-api/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28477420,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T03:13:13.607Z","status":"ssl_error","status_checked_at":"2026-01-16T03:11:47.863Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataplatform"],"created_at":"2026-01-16T05:48:26.618Z","updated_at":"2026-01-16T05:48:26.702Z","avatar_url":"https://github.com/oslokommune.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# status-api\nAPI for tracking status through an asynchronous system.\n\n## Deploy\n\nDeploy to both dev and prod is automatic via GitHub Actions on push to\n`main`. You can alternatively deploy from local machine with: `make deploy` or\n`make deploy-prod`.\n\n## Install/Setup\n1. Install [Serverless Framework](https://serverless.com/framework/docs/getting-started/)\n2. 
Set up a venv\n```sh\npython3 -m venv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\n```\n3. Install Serverless plugins: `make init`\n4. Install Python toolchain: `python3 -m pip install (--user) tox black pip-tools`\n   - If running with the `--user` flag, add `$HOME/.local/bin` to `$PATH`\n\n# Status architecture\nThe status API defines a set of API endpoints to track events through a system\nfrom start to end. Initially created to track the execution state of an\nasynchronous state machine in AWS, it can be used for any task that needs to\ntrack the status from start to end: both synchronous and asynchronous\nexecutions.\n\n**Example**: *As a user I want to upload a file to a dataset I own. To ensure\nthat the data has been processed, I need to know if the pipeline has executed\nsuccessfully. Since the pipeline is asynchronous, I don't get immediate feedback\nafter uploading the file, so I want to be able to query the system for the\ndata-processing status*.\n\nThe pseudo-code for the use-case is:\n\n* **user -\u003e upload file**\n  * `trace_id` = return value from upload process\n* **user -\u003e query status API by trace ID**\n  * Repeat query until state (`trace_status`) is `FINISHED`\n* **user -\u003e check if successful**\n  * If `trace_event_status` is `FAILED` -\u003e get a list of what has been done\n    within the pipeline and the reason for failing\n  * If `trace_event_status` is `OK` -\u003e continue with processing other data that\n    relies on data processed in the above steps\n\nEach Lambda function (see below) or step in the execution stage is responsible\nfor setting the correct status for each step.\n\n## Data lineage\nThe status API provides the status of an execution throughout a system, but it\nalso has the added benefit that data lineage can be traced through the system at\nthe same time. 
When using the `status_wrapper` from\n[okdata-aws](https://github.com/oslokommune/okdata-aws) (or setting them\nmanually using the `Status` class), you can generate trace events and include\nrelevant data, e.g. which files were processed in each step, and what was the\nresult of the data going out of each step in the function.\n\n## Database structure\nThe database is set up in\n[dataplatform-config](https://github.com/oslokommune/dataplatform-config/tree/main/devops/modules/services/status-api).\n\nThe main fields in the database:\n\n| Field              | Type           | Description                                                                                           | Example                                          |\n| -----------        | -------------- | -----------                                                                                           | -----------                                      |\n| trace_id           | string         | The ID used to trace connected events throughout the system (N-entries). ***Primary partition key***. | `my-dataset-uu-ii-dd`                            |\n| trace_event_id     | uuid           | Unique ID per event (many `trace_event_id` per `trace_id`).                                           | `uu-ii-dd`                                       |\n| domain             | string         | The domain that this status pertains to, e.g. `dataset` for publishing data or events to a dataset    | `dataset`                                        |\n| domain_id          | string         | A domain specific ID to be able to look up the owner or source. Includes version number.              | `dataset.name/version`                           |\n| start_time         | time           | Start of execution. ***Primary sort key***.                                                           
| `2020-03-02T12:34:23.042400`                     |\n| end_time           | time           | End of execution                                                                                      | `2020-03-02T12:34:24.042400`                     |\n| trace_status       | string         | Overall status for the trace ID.                                                                      | `CONTINUE`, `FINISHED`                           |\n| trace_event_status | string         | Status for the `trace_event_id`.                                                                      | `OK`, `FAILED`                                   |\n| user               | string         | The user that is used in `handler` to execute the event                                               | `service-user-s3-writer`, `ooo123456`            |\n| component          | string         | The component that is the source of the event.                                                        | `data-uploader`, `s3-writer`                     |\n| operation          | string         | The operation (e.g. method) performed by the component, e.g. Lambda function name                     | `copy`                                           |\n| status_body        | object         | Namespace where the component can add data relevant for the execution.                                | `{\"files_incoming\": [], \"files_outgoing\": []}`   |\n| meta               | object         | Metadata about the execution, such as Git revision of the component.                                  | `{\"git_rev\": ...}`                               |\n| s3_path            | string         | Path of the uploaded file.                                                                            | `raw/yellow/my-dataset/version/edition/file.xls` |\n| duration           | number         | Duration of execution in milliseconds.                                                                
| `123`                                            |\n| exception          | object         | Details of exception that has occurred.                                                               | `ZeroDivisionError: division by zero`            |\n| errors             | object         | Error messages to be read by the end user.                                                            | `[{\"message\": {\"nb\": \"\", \"en\": \"\"}, ...]`        |\n\n**Note**: While the `trace_event_id` is unique to each event (row), the\n`trace_id` is only unique to a *group of connected events* (\"a trace\").\n\n## okdata-aws\nThe master of event data is defined in the\n[okdata-aws](https://github.com/oslokommune/okdata-aws) library:\n\n### okdata.aws.status\nExposes a\n[decorator](https://github.com/oslokommune/okdata-aws/blob/master/okdata/aws/status/wrapper.py)\nto use in Lambda functions. See\n[s3-writer](https://github.com/oslokommune/okdata-pipeline/blob/master/okdata/pipeline/writers/s3/handlers.py)\nfor an example of using `@status_wrapper`. This will send a status to the status\nAPI after execution of the handler is done. The minimum that should be done is\nto set `domain` and `domain_id` on the status object using\n`status_add(domain=\"dataset\", ...)`.\n\nSetting the `status_body` with `status_add(status_body={\"files_incoming\": [],\n\"files_outgoing\": [], \"other\": \"relevant_information\"})`, the user can retrieve\ninformation on what has happened in each step going through the system.\nGenerally it is best practice to log as much as you can to the `status_body` field\nin order for the end-user to be able to trace what has happened to the data.\n\nIf the Lambda handler fails (e.g. throws an unhandled exception), the wrapper\nautomatically updates the event status to `FAILED` and trace status to\n`FINISHED`.\n\n## Data uploader\nThe [data-uploader](https://github.com/oslokommune/okdata-data-uploader) creates\na trace ID whenever a file is uploaded via the API. 
The trace ID is returned to\nthe user when the file is uploaded. This trace ID is the one the pipeline-router (see below)\nwill pick up and set as execution name.\n\nTo extract the trace ID after uploading a file to a dataset via the okdata CLI:\n```sh\nokdata datasets cp /tmp/hello.xlsx ds:my-dataset --format=json | jq -r \".trace_id\"\n```\n\n## Pipeline router\nThe [pipeline-router](https://github.com/oslokommune/okdata-pipeline-router) is\nintegrated with the status API and will use the API to retrieve the trace ID for\nthe current execution.\n\nThe pipeline-router is executed based on an S3-event in AWS. The S3-path in the\nevent is sent to `http://{API}/status-from-path/{s3_path}` to retrieve the\ncorresponding trace ID. Once retrieved the trace ID is set as the execution name\non the state machine.\n\nEach execution step in the state machine will now have access to the trace ID\nvia the `execution_name`. When using the `@status_wrapper` this value will be\nextracted and populated for you.\n\nA `state-machine-event` Lambda function is hooked up to CloudWatch (see\n[dataplatform-config](https://github.com/oslokommune/dataplatform-config/tree/main/devops/modules/observability/cloudwatch-state-machine-events)\nfor wiring) that will pick up the end status for the state machine and post this\nto the status API. There is no need to set the end status from within the Lambda\nfunction, only the state of each execution. 
This means you can get the end\nstatus of a file-upload without adding anything to the state machine functions.\n\n## SDK \u0026 CLI\nThe status API is implemented in the SDK and exposed via the CLI:\n\n### Get the total status of an ID\n```sh\n$ okdata status eide-origo-ng-85c9e5de-ac38-4b37-af8a-a86f08ce2bbb\nStatus for: eide-origo-ng-85c9e5de-ac38-4b37-af8a-a86f08ce2bbb\n+------+----------------------------------------------------+--------------+--------------------+\n| Done |                     Trace ID                       | Trace status | Trace event status |\n+------+----------------------------------------------------+--------------+--------------------+\n| True | eide-origo-ng-85c9e5de-ac38-4b37-af8a-a86f08ce2bbb | FINISHED     | OK                 |\n+------+----------------------------------------------------+--------------+--------------------+\n```\n### Get the total status of an ID as pure JSON\n```sh\n$ okdata status eide-origo-ng-85c9e5de-ac38-4b37-af8a-a86f08ce2bbb --format=json\n{\n  \"done\": true,\n  \"trace_id\": \"eide-origo-ng-85c9e5de-ac38-4b37-af8a-a86f08ce2bbb\",\n  \"trace_status\": \"FINISHED\",\n  \"trace_event_status\": \"OK\"\n}\n```\n### Get just the done status\n```sh\n$ okdata status eide-origo-ng-85c9e5de-ac38-4b37-af8a-a86f08ce2bbb --format=json | jq -r \".done\"\ntrue\n```\n### Get the full history of an ID\n```sh\n$ okdata status eide-origo-ng-85c9e5de-ac38-4b37-af8a-a86f08ce2bbb --history\n```\nWill print a table with information on each step through the system. Depends on\n`@status_wrapper` being used in each Lambda function.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foslokommune%2Fdataplatform-status-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foslokommune%2Fdataplatform-status-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foslokommune%2Fdataplatform-status-api/lists"}