{"id":24087968,"url":"https://github.com/tinybirdco/tinybird-beam","last_synced_at":"2026-06-18T14:31:19.335Z","repository":{"id":113973348,"uuid":"331979840","full_name":"tinybirdco/tinybird-beam","owner":"tinybirdco","description":"A Tinybird Apache Beam connector to ingest from a pipeline in DataFlow, Flink or Spark to a Tinybird Data Source","archived":false,"fork":false,"pushed_at":"2021-02-25T12:13:11.000Z","size":1451,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-27T05:25:03.881Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinybirdco.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.rst","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-22T15:06:29.000Z","updated_at":"2023-01-04T22:36:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"50e6c96c-3a5f-4a51-96a6-3758da785922","html_url":"https://github.com/tinybirdco/tinybird-beam","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tinybirdco/tinybird-beam","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Ftinybird-beam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Ftinybird-beam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Ftinybird-beam/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Ftiny
bird-beam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tinybirdco","download_url":"https://codeload.github.com/tinybirdco/tinybird-beam/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Ftinybird-beam/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34495377,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T03:56:41.859Z","updated_at":"2026-06-18T14:31:19.327Z","avatar_url":"https://github.com/tinybirdco.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# tinybird-beam\n\nA Tinybird Apache Beam connector to ingest from an Apache Beam pipeline running in DataFlow, Flink or Spark to a [Tinybird](https://www.tinybird.co/) Data Source\n\nDirectories:\n\n- `tinybird`: Contains the source code for the Apache Beam connector\n- `dataflow`: Pipeline definition to stream data from Google PubSub to Tinybird using the Apache Beam connector\n\n## Installation\n\nInstall the `tinybird-beam` [PyPI package](https://pypi.org/project/tinybird-beam/) in your Python project. 
See the `dataflow` directory for a usage example.\n\n```sh\npip install tinybird-beam\n```\n\n## How to use it\n\n```python\n# Import the sink class\nfrom tinybird.beam import WriteToTinybird\n\n# pipe a PCollection with elements of type `Iterable[Dict[str, Any]]` to the sink class\n# you should batch your elements beforehand using a `beam.WindowInto` or `beam.transforms.util.GroupIntoBatches`\nhost = 'https://api.tinybird.co'\ntoken = '' # get it from https://ui.tinybird.co/tokens\ndatasource = '' # name of the Data Source\ncolumns = '' # comma separated list with the name of the columns of the Data Source\nout | \"Stream to Tinybird\" \u003e\u003e WriteToTinybird(host, token, datasource, columns)\n```\n\n## Development\n\n```sh\npython3 -m venv env\nsource env/bin/activate\npip install -e .\n```\n\n## DataFlow example\n\nThe `dataflow` directory contains some sample code to deploy to DataFlow an Apache Beam pipeline that gets data from a PubSub topic and ingests to a BigQuery table and a Tinybird Data Source.\n\nFollow these steps to run the example:\n\n### Prepare the environment\n\n- Push this Data Source to your Tinybird account and name it `pubsub__invoices`:\n\n```\nSCHEMA \u003e\n    `id` UInt32,\n    `agent_id` UInt8,\n    `recipient_code` UInt32,\n    `client_id` UInt32,\n    `amount` Float32,\n    `currency` LowCardinality(String),\n    `created_at` DateTime,\n    `added_payments` String\n```\n\n- Create the Python environment:\n\n```sh\ncd dataflow\npython3 -m venv env\nsource env/bin/activate\npip install -r requirements.txt\n# Update the variables in the `sample.env` file and source it:\nsource sample.env\n```\n\n- Create the PubSub topic:\n\n```sh\ngcloud pubsub topics create demo-topic\n```\n\n- Import the `dataflow/pubsub/invoices_sample.json` file to BigQuery. 
We use as `dataset.table_name` -\u003e `tinybird.pubsub__invoices`.\n\n### Run the example\n\n- Push the Apache Beam pipeline to DataFlow:\n\n```sh\npython dataflow.py \\\n  --project=$PROJECT_NAME \\\n  --region=$REGION \\\n  --runner=DataflowRunner \\\n  --temp_location=$TMP_LOCATION \\\n  --input_topic=projects/$PROJECT_NAME/topics/$TOPIC \\\n  --bq_table=tinybird.pubsub__invoices \\\n  --batch_size=10000 \\\n  --batch_seconds=30 \\\n  --batch_key= \\\n  --tb_host=https://api.tinybird.co \\\n  --tb_token=$TB_TOKEN \\\n  --tb_datasource=pubsub__invoices \\\n  --tb_columns=\"id,agent_id,recipient_code,client_id,amount,currency,created_at,added_payments\"\n```\n\nThis pipeline will batch 10000 elements or a window of 30 seconds to the Tinybird Data Source.\n\nOnce running you'll see this log (it might take several minutes to deploy):\n\n```sh\nINFO:apache_beam.runners.dataflow.dataflow_runner:Job 2021-01-22_08_13_37-6419261550197294468 is in state JOB_STATE_RUNNING\nINFO:apache_beam.runners.dataflow.dataflow_runner:2021-01-22T16:13:48.981Z: JOB_MESSAGE_DETAILED: Pub/Sub resources set up for topic 'projects/---/topics/demo-topic'.\nINFO:apache_beam.runners.dataflow.dataflow_runner:2021-01-22T16:13:50.023Z: JOB_MESSAGE_DEBUG: Starting worker pool setup.\nINFO:apache_beam.runners.dataflow.dataflow_runner:2021-01-22T16:13:50.034Z: JOB_MESSAGE_BASIC: Starting 1 workers in europe-west3-b...\nINFO:apache_beam.runners.dataflow.dataflow_runner:2021-01-22T16:13:50.048Z: JOB_MESSAGE_DEBUG: Starting worker pool setup.\nINFO:apache_beam.runners.dataflow.dataflow_runner:2021-01-22T16:14:17.581Z: JOB_MESSAGE_DETAILED: Autoscaling: Raised the number of workers to 1 so that the pipeline can catch up with its backlog and keep up with its input rate.\nINFO:apache_beam.runners.dataflow.dataflow_runner:2021-01-22T16:15:02.264Z: JOB_MESSAGE_DETAILED: Workers have started successfully.\nINFO:apache_beam.runners.dataflow.dataflow_runner:2021-01-22T16:15:02.278Z: JOB_MESSAGE_DETAILED: 
Workers have started successfully.\n```\n\n- Start the PubSub publisher:\n\n```sh\ncd pubsub\npython pubsub.py\n```\n\nAfter the pipeline is deployed, [the job](https://console.cloud.google.com/dataflow/jobs) should look like this:\n\n![](dataflow/dataflow.png)\n\nOnce the pipeline starts receiving data, you can check the rows in both your BigQuery table and your Tinybird Data Source.\n\n### Clean resources\n\nStop the DataFlow pipeline from the [Google Cloud console](https://console.cloud.google.com/dataflow/jobs).\n\nThen delete the PubSub topic:\n\n```sh\ngcloud pubsub topics delete demo-topic\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Ftinybird-beam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinybirdco%2Ftinybird-beam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Ftinybird-beam/lists"}