{"id":18414921,"url":"https://github.com/xerial/td-spark-example","last_synced_at":"2025-04-13T00:59:01.518Z","repository":{"id":141494003,"uuid":"256075625","full_name":"xerial/td-spark-example","owner":"xerial","description":"An example td-spark application","archived":false,"fork":false,"pushed_at":"2020-04-16T01:30:20.000Z","size":22320,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-13T00:58:55.714Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://treasure-data.github.io/td-spark/","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xerial.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-16T01:13:14.000Z","updated_at":"2020-04-16T01:30:23.000Z","dependencies_parsed_at":"2023-07-03T15:31:28.276Z","dependency_job_id":null,"html_url":"https://github.com/xerial/td-spark-example","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Ftd-spark-example","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Ftd-spark-example/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Ftd-spark-example/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xerial%2Ftd-spark-example/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xerial","download_url":"https://codeload.github.c
om/xerial/td-spark-example/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248650435,"owners_count":21139672,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T03:52:41.265Z","updated_at":"2025-04-13T00:59:01.495Z","avatar_url":"https://github.com/xerial.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"td-spark-example\n===\n\nAn example td-spark application.\n\n- Java 8 is required because Spark 2.4.x does not support JDK 11\n\ntd-spark documentation: https://treasure-data.github.io/td-spark/\n\n## Project Structure\n\n```\nlib        # Put td-spark-assembly jar file here\nsrc        # Example source code (TDSparkExample)\nbuild.sbt  # Build definition\n```\n\n## Usage\n\n```\n# Set your TD API key as an environment variable\n$ export TD_API_KEY=(Your TD API key)\n\n$ ./sbt\n\n# Run the example program\nsbt:td-spark-example\u003e run\n[info] running example.TDSparkExample\nUsing Spark's default log4j profile: org/apache/spark/log4j-defaults.properties\n20/04/15 18:00:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable\n2020-04-15 18:00:10.793-0700 debug [spark] Loading com.treasuredata.spark package - (package.scala:23)\n2020-04-15 18:00:10.802-0700  info [spark] td-spark version:20.4.0, revision:f6bdc8e, build_time:2020-04-10T07:03:29.264+0000 - (package.scala:24)\n2020-04-15 18:00:11.025-0700  info [TDServiceConfig] td-spark site: us - (TDServiceConfig.scala:36)\n20/04/15 18:00:11 INFO log: Logging initialized @275255ms to org.eclipse.jetty.util.log.Slf4jLog\n20/04/15 18:00:11 INFO TDClient: td-client version: unknown\n2020-04-15 18:00:11.852-0700  info [TDSparkExample] Reading a TD Table - (TDSparkExample.scala:49)\n2020-04-15 18:00:14.274-0700  info [TDRelation] Fetching the partition list of sample_datasets.www_access within time range:[2014-10-03 07:20:45Z,2014-10-03 07:23:20Z) - (TDRelation.scala:172)\n2020-04-15 18:00:15.260-0700  info [TDRelation] Retrieved 1 partition entries - (TDRelation.scala:179)\n+----------+---------------+--------------------+----+----+\n|      time|           host|                path|code|size|\n+----------+---------------+--------------------+----+----+\n|1412320978|   200.72.21.63|   /category/finance| 200|  59|\n|1412320962| 136.27.214.160|   /item/office/4216| 200|  54|\n|1412320945|104.159.186.145|    /search/?c=Games| 200|  79|\n|1412320911| 100.192.40.170|    /item/books/4494| 200|  93|\n|1412320878| 108.126.158.84|/category/electro...| 200|  75|\n|1412320845|200.129.205.208|/category/electro...| 200|  62|\n+----------+---------------+--------------------+----+----+\n\n2020-04-15 18:00:16.496-0700  info [TDSparkExample] Submitting a Presto query and reading the result - (TDSparkExample.scala:59)\n2020-04-15 18:00:18.831-0700  info [TDPrestoJDBCRDD]  - (TDPrestoRelation.scala:106)\nSubmit Presto query:\nselect time, host, path, code, size\nfrom sample_datasets.www_access\nwhere td_time_range(time, 1412320845, 1412321000)\nand size \u003e 50 and size \u003c 
100\n+----------+---------------+--------------------+----+----+\n|      time|           host|                path|code|size|\n+----------+---------------+--------------------+----+----+\n|1412320978|   200.72.21.63|   /category/finance| 200|  59|\n|1412320911| 100.192.40.170|    /item/books/4494| 200|  93|\n|1412320878| 108.126.158.84|/category/electro...| 200|  75|\n|1412320845|200.129.205.208|/category/electro...| 200|  62|\n|1412320962| 136.27.214.160|   /item/office/4216| 200|  54|\n|1412320945|104.159.186.145|    /search/?c=Games| 200|  79|\n+----------+---------------+--------------------+----+----+\n\n[success] Total time: 14 s, completed Apr 15, 2020 6:00:20 PM\n```\n\n## Create a standalone Spark application\n\n```\n# Create a portable executable package into target/pack folder\n$ ./sbt pack\n\n\n# target/pack/bin folder contains a script to launch a local Spark cluster and your application:\n$ ./target/pack/bin/td-spark-example\n...\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxerial%2Ftd-spark-example","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxerial%2Ftd-spark-example","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxerial%2Ftd-spark-example/lists"}