{"id":15984664,"url":"https://github.com/aappddeevv/loader","last_synced_at":"2025-04-04T20:46:05.612Z","repository":{"id":97609835,"uuid":"73432965","full_name":"aappddeevv/loader","owner":"aappddeevv","description":"ETL data into a database with an easy to use DSL.","archived":false,"fork":false,"pushed_at":"2016-11-28T22:01:57.000Z","size":76,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-10T05:25:58.255Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aappddeevv.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-11T00:41:09.000Z","updated_at":"2016-11-11T00:44:59.000Z","dependencies_parsed_at":"2023-06-26T00:32:37.981Z","dependency_job_id":null,"html_url":"https://github.com/aappddeevv/loader","commit_stats":{"total_commits":11,"total_committers":1,"mean_commits":11.0,"dds":0.0,"last_synced_commit":"d428021348fa1d8e8650760c1f8e141263c1dd03"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappddeevv%2Floader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappddeevv%2Floader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappddeevv%2Floader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aappddeevv%2Floader/manifests","owner_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/owners/aappddeevv","download_url":"https://codeload.github.com/aappddeevv/loader/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247249602,"owners_count":20908211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-08T02:10:01.703Z","updated_at":"2025-04-04T20:46:05.594Z","avatar_url":"https://github.com/aappddeevv.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"##Purpose\nAn application that loads data into an RDBMS\nusing parallel loads and a simple DSL inspired by ETL tools\nto specify the attribute mappings.\n\nThe DSL for schema definition and mapping transforms is \ngeneric and can be used in many environments, including\nSpark.  You can define your schema using the DSL then\ndefine your transformation rules using the rules DSL and\napply them to a dataframe. The rules are automatically\ntranslated into Spark-friendly code.\n\nUsing the DSL requires some knowledge of Scala, but not much.\n\n##History\n\nThe project started out many years ago as a Java program\nhosted on SourceForge but I moved it over to GitHub and\nupdated it to use Scala about a year ago. I received\na large number of private updates from various versions\nin between and recently was prompted privately to publish\nthem to GitHub.\n\nIt's been tested\nin production environments to be \"good enough\" to work on\nlarge loads. 
Use your ETL tool or bulk loaders\nspecific to your RDBMS first, but otherwise you may find this\nsimple application useful.\n\n##Mappings Development\nCreate a new sbt project, then include this project\nas a dependency. \n\nYou must first publish this project locally using\n```sh\nsbt publishLocal\n```\nthen add the published file as a dependency to your\nproject.\n```scala\nlibraryDependencies ++= Seq(\n\"org.im.loader\" %% \"csv\" % \"latest.version\"\n)\n```\nThis automatically pulls in `org.im.loader.core`.\n\nOnce you have specified this project as a dependency\nyou need to:\n* Create your main program.\n* Create your command line options. There are some\noptions available to you using the `program.parser` value.\n* Develop your mappings (see below).\n* Call the `program.runloader(..)` function providing\nyour command line parser (built from the command line options above), the\ndefault configuration derived from `org.im.loader.Config`\nwith your list of mappings and the command arguments\nfrom your main class.\n\nThat's it!\n\nTip: To create a command line parser from the one\nprovided in the `program` object just do:\n```scala\nval yourparser = new scopt.OptionParser[Config](\"loader\") { \n   options ++= program.parser.stdargs // don't retype them\n   ...\n   \u003cmore of your options here\u003e\n}\n```\n\n##Mapping Development\nTo create your mappings, derive from the mappings\nobject in `org.im.loader` and specify mappings using\nthe DSL.\n```scala\nobject table1mappings extends mappings(\"table1\", \"table1\", Some(\"theschema\")) {\n    import sourcefirst._\n    import org.im.loader.Implicits._\n    import com.lucidchart.open.relate.interp.Parameter._ \n\n    string(\"cola\").directMove\n    long(\"colb\").to(\"colbtarget\")\n    ...\n    to[Long](\"colc\").rule(0){ ctx =\u003e\n        ctx.success(ctx.input.get(\"funkycolcsource\"))\n    }\n}\n```\nYou can also define the schema in the mappings to help\nwith type conversions before your rules receive your data.\nSubclassing the 
mappings object allows you to add your\nconvenience combinator methods to the mappings object.\nFor example, you could add a `lookup` combinator or\na `.directMoveButOnlyUnderCertainConditions` combinator.\n\n\"Source first\" mappings are mappings that start with the\nsource such as `string(\"cola\")`. This says that the mapping\ntakes its source value from the attribute `cola`\nin the input record. \n\nIt's better to specify a \"target first\" mapping\nsuch as `to[..](..)` and then specify processing rules. Rules\nhave a priority and are run in priority order. See the\ndsltests.scala file in the test directory for examples\nof mappings and how to specify the rules.\n\n\n##Mapping Testing\nThe typical development model is to leave your project open\nin your editor, edit your mappings, then run the load from\nthe sbt command line for unit tests. Once the mappings\nare complete, bundle up \"your\" project and deploy it. Since\nthis library is not deployed to Maven, download it,\nthen create your IDE's configuration using\n```sh\nsbt eclipse with-source=true\n```\nDevelop and test your mappings. Then deploy the entire\napplication via a zip file.\n\nCheck out the `dsltest.scala` test file for examples of how\nto specify your mappings.\n\nYou will want to drop your favorite JDBC lib into the lib directory\nor include it in the dependencies inside `build.sbt`.\n\n\n##Deploying\n\nThe application can be packaged by typing\n```sh\nsbt universal:packageBin \n```\nto obtain a zip file that can be installed. You will want\nto have the same plugins specified in this library\nin your own project's `project/plugins.sbt` to make this work.\n\n\n##Spark Support\nSpark support is in the mix and the code will be refactored\nso that the ETL-style approach expressed in the DSL\nworks well with Spark dataframes. 
This includes schema definition.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faappddeevv%2Floader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faappddeevv%2Floader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faappddeevv%2Floader/lists"}