{"id":19130377,"url":"https://github.com/pedrodeoliveira/unbabel-bec","last_synced_at":"2026-04-17T08:03:09.814Z","repository":{"id":211126929,"uuid":"230910833","full_name":"pedrodeoliveira/unbabel-bec","owner":"pedrodeoliveira","description":"A streaming version of the Unbabel's BEC using GCP Pub/Sub and Apache Beam.","archived":false,"fork":false,"pushed_at":"2020-01-07T22:57:46.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-03T11:12:03.690Z","etag":null,"topics":["apache-beam","dataflow","docker","gitlab-ci","gke","python","streaming"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pedrodeoliveira.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-12-30T12:19:52.000Z","updated_at":"2020-09-18T15:14:56.000Z","dependencies_parsed_at":"2023-12-06T16:44:50.012Z","dependency_job_id":"27a52946-66b8-4b04-9db2-bb956b9cc347","html_url":"https://github.com/pedrodeoliveira/unbabel-bec","commit_stats":null,"previous_names":["pedrodeoliveira/unbabel-bec"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pedrodeoliveira%2Funbabel-bec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pedrodeoliveira%2Funbabel-bec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pedrodeoliveira%2Funbabel-bec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pedrodeoliveira%2Funbabel-bec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pedrodeoliveira","download_url":"https://codeload.github.com/pedrodeoliveira/unbabel-bec/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240206981,"owners_count":19765039,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","dataflow","docker","gitlab-ci","gke","python","streaming"],"created_at":"2024-11-09T06:10:29.658Z","updated_at":"2025-11-13T08:01:57.333Z","avatar_url":"https://github.com/pedrodeoliveira.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Backend Engineering Challenge 2.0\n\n[![pipeline status](https://gitlab.com/psoliveira/unbabel-bec/badges/master/pipeline.svg)](https://gitlab.com/psoliveira/unbabel-bec/commits/master)\n\nThis project contains a solution to Unbabel's [Backend Engineering Challenge](https://github.com/Unbabel/backend-engineering-challenge/blob/master/README.md), reformulated for a more realistic context in which translation events arrive in real time.\n\n## Idea\n\nThe goal of this project is to solve the original challenge under the assumption that translation events occur in real time. In this reformulated challenge, we also assume that the metrics must be computed per client.\n\nThe Apache Beam framework was chosen because it can solve this problem in both batch and streaming modes. 
We can also leverage GCP's Dataflow runner to run the pipeline as a fully managed, scalable service.\n\nIn summary, each folder contains:\n\n- **batch**: the solution to the batch problem, where we process an input JSON file and write the results to an output JSON file.\n- **streaming**: the solution to the streaming problem, consisting of multiple publishers (`publisher.py`) that simulate the clients' translation events (Unbabel's translation API); the streaming pipeline (`streaming_pipeline.py`) that performs the processing; and a subscriber (`subscriber.py`) that reads the pipeline's output and prints it.\n\n**Note:** Cloud Pub/Sub is used for messaging in the streaming solution.\n\nMore details will be provided here in the following days.\n\n## To Do List\n\n- [x] Formulate the problem to solve, choose the technologies to use and define the architecture.\n- [x] Solution to the batch pipeline problem (`batch_pipeline.py`).\n- [x] Solution to the streaming pipeline problem (`streaming_pipeline.py`).\n- [x] Containerize the project code ([psoliveira/unbabel-batch](https://hub.docker.com/repository/docker/psoliveira/unbabel-batch/) and [psoliveira/unbabel-streaming](https://hub.docker.com/repository/docker/psoliveira/unbabel-streaming/)).\n- [x] Set up CI using GitLab CI ([Pipelines](https://gitlab.com/psoliveira/unbabel-bec/pipelines)).\n- [ ] Write **README.md**.\n- [ ] Add tests.\n- [ ] Add a test stage to the CI pipeline.\n- [x] Add `.yaml` files (`streaming/k8s`) for deploying the architecture in a Kubernetes cluster.\n- [ ] Finalize **README.md**.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpedrodeoliveira%2Funbabel-bec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpedrodeoliveira%2Funbabel-bec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpedrodeoliveira%2Funbabel-bec/lists"}