{"id":24685272,"url":"https://github.com/guillim/embulk_microservice","last_synced_at":"2026-05-15T20:32:23.386Z","repository":{"id":128462977,"uuid":"247102013","full_name":"guillim/embulk_microservice","owner":"guillim","description":"Use Embulk to remotely connect to your databases through SSH tunneling, and do your transformations from one database to another databse (on different servers). used in DGM","archived":false,"fork":false,"pushed_at":"2024-05-27T13:59:04.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-08T08:48:54.335Z","etag":null,"topics":["embulk","microservice","ssh-tunneling"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guillim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-13T15:21:40.000Z","updated_at":"2024-05-27T13:59:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"52825897-e43a-4ce2-94a4-f16e6b7a45fd","html_url":"https://github.com/guillim/embulk_microservice","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/guillim/embulk_microservice","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillim%2Fembulk_microservice","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillim%2Fembulk_microservice/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillim%2Fembulk_microservice/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillim%2Fembulk_microservice/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guillim","download_url":"https://codeload.github.com/guillim/embulk_microservice/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guillim%2Fembulk_microservice/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33078899,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-15T20:25:35.270Z","status":"ssl_error","status_checked_at":"2026-05-15T20:25:34.732Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embulk","microservice","ssh-tunneling"],"created_at":"2025-01-26T15:18:25.952Z","updated_at":"2026-05-15T20:32:23.358Z","avatar_url":"https://github.com/guillim.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Embulk as a Micro-service\n\nThis project aims to facilitate deploying Embulk as a micro-service through SSH tunneling.\n\n### What it does\n\n1. Connect to your database 1\n2. Run a job, such as converting from Db1 to Db2 (as specified in the configuration_example.yml file)  \n3. 
Connect to your database 2 and write the Embulk output\n\nEvery connection is made through an SSH tunnel.\n\n\nExample, with Mongo (database 1) and Postgres (database 2):  \n![example](https://ibin.co/5FnkVGGw3Jej.png)\n\n### Can it be hosted on a PaaS?\n\nYes, you can host it on Heroku for instance, or on your own server.\n\n### How can I install it?\n\nPrerequisite: you need Docker installed on your machine.\n\nThen, you have to:\n- put your SSH key (the private part) in the .ssh folder as _keyexample_ or _default_env_SSHKEY_, matching the environment variables you will define in the next step =\u003e this key lets this machine connect to the remote databases, so you also need to make sure the remote machines accept connections with the matching public key\n- customize the environment variables in the environment_variables.txt file according to the IP addresses of your servers, etc.\n- modify configuration_example.yml according to your needs (see the [Embulk website](https://www.embulk.org/docs/) for more details)\n- run `docker build --build-arg CONFIGURATION_FILE=configuration_example.yml --build-arg DIFF_FILE=diff.yml --tag embulk_container .` to launch the build process of your Docker image\n- run `docker run --env-file=environment_variables.txt -it embulk_container bash` later, whenever you want to start the process again. If you change environment_variables.txt or your configuration_example.yml, you will need to run the `docker build` command above again in order to rebuild the Docker image\n\n\n#### Notes:\n\n- For convenience, I suggest renaming configuration_example.yml to configuration.yml; since it is gitignored, you can leave it in the repo. Another example is available in configuration_example_2.yml\n- for incremental updates, we need to keep \"diff.yml\" (see the [Embulk doc](https://www.embulk.org/docs/recipe/scheduled-csv-load-to-elasticsearch-kibana5.html#scheduling-loading-by-cron)) from one run to another. 
In order to do so, we set up a Docker volume to keep it persistent. This is done by adding `-v $PWD:/work` to the `docker run` command. So here is the command:  \n`docker run --env-file=environment_variables.txt -v $PWD:/work -it embulk_container bash`\n- If, for some unknown reason, you cannot _merge_ the first time, try to _insert_ instead, and manually specify the primary key on your output database\n- you may encounter the database error _Sort operation used more than the maximum XXXXXX bytes of RAM_ when using incremental_field if your database has no index on this field\n- using the java:8 Docker image was triggering an out-of-memory problem. We switched to the image `fabric8/java-jboss-openjdk8-jdk:1.4.0` in order to be able to limit Java RAM usage: `docker run -m 600m -e JAVA_OPTIONS='-Xmx300m' [...]`. This issue was inherent to Java, which was unable to honor cgroup memory limits: whatever the container RAM limit was, the Java process used all of the machine's resources, causing serious errors.\n- when running in production, don't forget to remove `-it`, because no TTY will be available if you trigger it from a cron job, for instance\n\n### Troubleshooting\n- If you are still prompted for a password, you have an issue with your SSH authentication. It may be that your key's permissions are too open. Try:\n```bash\nchmod 600 .ssh/keyexample\n```\n\n### Examples:\n\n##### From Mongo to Postgres\nSee this [example](configuration_example.yml)\n\n##### From Mongo to Postgres, with transformation\nSee this [example](configuration_example_2.yml)\n\n##### From Postgres to BigQuery\nSee this [example](configuration_example_3.yml)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguillim%2Fembulk_microservice","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguillim%2Fembulk_microservice","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguillim%2Fembulk_microservice/lists"}