{"id":26233976,"url":"https://github.com/teragrep/pth_10","last_synced_at":"2026-02-09T12:14:33.812Z","repository":{"id":173631636,"uuid":"650541015","full_name":"teragrep/pth_10","owner":"teragrep","description":"Data Processing Language (DPL) translator for Apache Spark","archived":false,"fork":false,"pushed_at":"2026-02-06T07:22:44.000Z","size":2079,"stargazers_count":1,"open_issues_count":427,"forks_count":9,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-02-06T15:35:31.643Z","etag":null,"topics":["apache-spark","data-processing-language","dpl","programming-language-translator","teragrep"],"latest_commit_sha":null,"homepage":"https://teragrep.com","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/teragrep.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-06-07T09:35:02.000Z","updated_at":"2026-02-06T06:36:48.000Z","dependencies_parsed_at":"2025-12-04T13:16:48.203Z","dependency_job_id":null,"html_url":"https://github.com/teragrep/pth_10","commit_stats":null,"previous_names":["teragrep/pth_10"],"tags_count":47,"template":false,"template_full_name":null,"purl":"pkg:github/teragrep/pth_10","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fpth_10","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fpth_10/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fpth_10/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fpth_10/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/teragrep","download_url":"https://codeload.github.com/teragrep/pth_10/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teragrep%2Fpth_10/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29264333,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-09T04:11:57.159Z","status":"ssl_error","status_checked_at":"2026-02-09T04:11:56.117Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","data-processing-language","dpl","programming-language-translator","teragrep"],"created_at":"2025-03-13T01:18:21.746Z","updated_at":"2026-02-09T12:14:33.772Z","avatar_url":"https://github.com/teragrep.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"= PTH_10: DPL to Apache Spark Translator\n\nTranslates Data Processing Language (DPL) commands to Apache Spark actions and transformations.\nUses ANTLR visitors to generate a list of step objects, which contain the actual implementations of the commands\nusing the Apache Spark API.\n\n== Features\n\n- Translates a string-based DPL command using the parse tree generated by the 
https://github.com/teragrep/pth_03[PTH_03]\nANTLR-based parser to Apache Spark actions and transformations.\n- Fetches data from a datasource provider (by default, the https://github.com/teragrep/pth_06[PTH_06] datasource provider) and\nfilters the data with the filters specified in the DPL command.\n- Applies various transformations and actions to the data with simple, easy-to-understand commands.\n- Supports parallel and sequential modes, depending on which kinds of commands are used. If a command requires batch-based\nprocessing, sequential mode is used. Otherwise, processing remains in parallel mode, allowing stream processing.\n- Spark API implementations are enclosed in so-called Step objects, which take a Dataset as input and return the\ntransformed Dataset, making these objects easy to reuse.\n- ANTLR-based visitor functions only gather the parameters these objects need and contain\nno implementation logic of the commands themselves.\n\n== Documentation\n\nSee the official documentation on https://docs.teragrep.com[docs.teragrep.com].\n\n== Limitations\n\nNot all commands in the Data Processing Language are implemented yet.\n\n== How to\n\nUse:\n\n- Create a new DPLParserCatalystContext. It requires a `SparkSession` object and a `com.typesafe.config.Config`. The\nconfig is usually provided by the Zeppelin component.\n[,java]\n----\nDPLParserCatalystContext catCtx = new DPLParserCatalystContext(sparkSession, config);\n----\n- Create a new DPLParserCatalystVisitor and pass it the DPLParserCatalystContext.\n[,java]\n----\nDPLParserCatalystVisitor visitor = new DPLParserCatalystVisitor(catCtx);\n----\n- Visit the parse tree generated by PTH_03 with the DPLParserCatalystVisitor.visit() function.\n[,java]\n----\nCatalystNode n = (CatalystNode) visitor.visit(tree);\n----\n- The result of that function is a CatalystNode. 
It contains a DataStreamWriter, which can be started to begin execution.\n[,java]\n----\nn.getDataStreamWriter();\n----\n- Set the visitor's Consumer to a function of your liking to view or move the resulting Dataset to the desired component.\n[,java]\n----\nvisitor.setConsumer((ds, id) -\u003e {\n    ds.show();\n});\n----\n\nFor a more concrete example, check out the https://github.com/teragrep/pth_07[PTH_07] Zeppelin DPL Interpreter project.\n\nCompile:\n\n[,sh]\n----\nmvn clean install -Pbuild\n----\n\n== Contributing\n\nYou can involve yourself with our project by https://github.com/teragrep/pth_10/issues/new/choose[opening an issue]\nor submitting a pull request.\n\nContribution requirements:\n\n. *All changes must be accompanied by a new or changed test.* If you think testing is not required in your pull request, include a sufficient explanation as to why you think so.\n. Security checks must pass.\n. Pull requests must align with the principles and http://www.extremeprogramming.org/values.html[values] of extreme programming.\n. Pull requests must follow the principles of Object Thinking and Elegant Objects (EO).\n\nRead more in our https://github.com/teragrep/teragrep/blob/main/contributing.adoc[Contributing Guideline].\n\n=== Contributor License Agreement\n\nContributors must sign the https://github.com/teragrep/teragrep/blob/main/cla.adoc[Teragrep Contributor License Agreement] before a pull request is accepted into the organization's repositories.\n\nYou need to submit the CLA only once. After submitting the CLA, you can contribute to all Teragrep repositories.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteragrep%2Fpth_10","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteragrep%2Fpth_10","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteragrep%2Fpth_10/lists"}