{"id":14068720,"url":"https://github.com/DyfanJones/RAthena","last_synced_at":"2025-07-30T04:31:53.206Z","repository":{"id":35160104,"uuid":"203782469","full_name":"DyfanJones/RAthena","owner":"DyfanJones","description":"Connect R to Athena using Boto3 SDK (DBI Interface)","archived":false,"fork":false,"pushed_at":"2024-02-08T13:26:44.000Z","size":1991,"stargazers_count":37,"open_issues_count":12,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-07-03T02:07:21.661Z","etag":null,"topics":["athena","aws","boto3","database","r"],"latest_commit_sha":null,"homepage":"https://dyfanjones.github.io/RAthena/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DyfanJones.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-22T11:37:18.000Z","updated_at":"2024-12-30T17:57:03.000Z","dependencies_parsed_at":"2024-02-08T14:49:35.423Z","dependency_job_id":null,"html_url":"https://github.com/DyfanJones/RAthena","commit_stats":{"total_commits":1229,"total_committers":6,"mean_commits":"204.83333333333334","dds":0.1358828315703824,"last_synced_commit":"32317f3611deb882ced01e0084beaaa08fd31ee5"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"purl":"pkg:github/DyfanJones/RAthena","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2FRAthena","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2FRAthena/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2FRA
thena/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2FRAthena/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DyfanJones","download_url":"https://codeload.github.com/DyfanJones/RAthena/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DyfanJones%2FRAthena/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267809512,"owners_count":24147480,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["athena","aws","boto3","database","r"],"created_at":"2024-08-13T07:06:21.917Z","updated_at":"2025-07-30T04:31:52.841Z","avatar_url":"https://github.com/DyfanJones.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"# RAthena\n\n[![Project Status: Active – The project has reached a stable, usable\nstate and is being actively\ndeveloped.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)\n[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version-ago/RAthena)](https://CRAN.R-project.org/package=RAthena)\n![downloads](https://cranlogs.r-pkg.org/badges/RAthena)\n[![Codecov test 
coverage](https://codecov.io/gh/DyfanJones/rathena/branch/master/graph/badge.svg)](https://app.codecov.io/gh/DyfanJones/rathena?branch=master)\n[![R-CMD-check](https://github.com/DyfanJones/RAthena/workflows/R-CMD-check/badge.svg)](https://github.com/DyfanJones/RAthena/actions)\n[![RAthena status badge](https://dyfanjones.r-universe.dev/badges/RAthena)](https://dyfanjones.r-universe.dev)\n\nThe goal of the `RAthena` package is to provide a DBI-compliant interface\nto Amazon’s Athena (\u003chttps://aws.amazon.com/athena/\u003e) using the `Boto3` SDK.\nThis allows for an efficient, easy-to-set-up connection to Athena using the\n`Boto3` SDK as a driver.\n\n**NOTE:** *Before using `RAthena` you must have an AWS account or have\naccess to an AWS account with permissions allowing you to use Athena.*\n\n## Installation:\n\nBefore installing `RAthena` ensure that `Python 3+` is installed on your\nmachine: \u003chttps://www.python.org/downloads/\u003e. To install `Boto3`, either\nuse the pip command or the `RAthena` installation function:\n\n```\npip install boto3\n```\nRAthena Method (after `RAthena` has been installed this method can be used)\n``` r\nRAthena::install_boto()\n```\n\nTo install `RAthena` you can get it from CRAN with:\n``` r\ninstall.packages(\"RAthena\")\n```\n\nOr to get the development version from GitHub with:\n```r\nremotes::install_github(\"dyfanjones/rathena\")\n```\n\n## Connection Methods\n\n### Hard Coding\n\nThe most basic way to connect to AWS Athena is to hard-code your access key \nand secret access key. 
However this method is **not** recommended as your \ncredentials are hard-coded.\n```r\nlibrary(DBI)\n\ncon \u003c- dbConnect(RAthena::athena(),\n                aws_access_key_id='YOUR_ACCESS_KEY_ID',\n                aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',\n                s3_staging_dir='s3://path/to/query/bucket/',\n                region_name='eu-west-1')\n```\n\n### AWS Profile Name\n\nThe next method is to use profile names set up by AWS CLI or created manually \nin the `~/.aws` directory. To create the profile names manually please refer \nto: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html.\n\n##### Setting up AWS CLI\n\nRAthena is compatible with AWS CLI. This allows your aws credentials to\nbe stored and not be hard coded in your connection.\n\nTo install AWS CLI please refer to:\n\u003chttps://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html\u003e,\nto configure AWS CLI please refer to:\n\u003chttps://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html\u003e\n\nOnce AWS CLI has been set up you will be able to connect to Athena by\nonly putting the `s3_staging_dir`.\n\nUsing default profile name:\n``` r\nlibrary(DBI)\ncon \u003c- dbConnect(RAthena::athena(),\n                 s3_staging_dir = 's3://path/to/query/bucket/')\n```\nConnecting to Athena using profile name other than `default`.\n``` r\nlibrary(DBI)\ncon \u003c- dbConnect(RAthena::athena(),\n                 profile_name = \"your_profile\",\n                 s3_staging_dir = 's3://path/to/query/bucket/')\n```\n\n### Temporary Credentials with MFA Account:\n\n```r\nlibrary(RAthena)\nget_session_token(\"YOUR_PROFILE_NAME\",\n                  serial_number='arn:aws:iam::123456789012:mfa/user',\n                  token_code = \"531602\",\n                  set_env = TRUE)\n\n# Connect to Athena using temporary credentials\ncon \u003c- dbConnect(athena(),\n                s3_staging_dir = 's3://path/to/query/bucket/')\n```\n\n## 
Assuming ARN Role for connection\n\nAnother method of connecting to Athena is to assume an Amazon Resource Name (ARN) role.\n\nSetting credentials in environment variables:\n```r\nlibrary(RAthena)\nassume_role(profile_name = \"YOUR_PROFILE_NAME\",\n            role_arn = \"arn:aws:sts::123456789012:assumed-role/role_name/role_session_name\",\n            set_env = TRUE)\n\n# Connect to Athena using temporary credentials\ncon \u003c- dbConnect(athena(),\n                s3_staging_dir = 's3://path/to/query/bucket/')\n```\nConnecting to Athena directly using ARN role:\n\n```r\nlibrary(DBI)\ncon \u003c- dbConnect(RAthena::athena(),\n                 profile_name = \"YOUR_PROFILE_NAME\",\n                 role_arn = \"arn:aws:sts::123456789012:assumed-role/role_name/role_session_name\",\n                 s3_staging_dir = 's3://path/to/query/bucket/')\n```\nTo change the duration of the ARN role session, set the parameter `duration_seconds`. \nBy default `duration_seconds` is set to 3600 seconds (1 hour).\n\n## Usage\n\n### Basic Usage\n\nConnect to Athena, send a query, and return the results to R.\n\n``` r\nlibrary(DBI)\n\ncon \u003c- dbConnect(RAthena::athena(),\n                aws_access_key_id='YOUR_ACCESS_KEY_ID',\n                aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',\n                s3_staging_dir='s3://path/to/query/bucket/',\n                region_name='eu-west-1')\n\nres \u003c- dbExecute(con, \"SELECT * FROM one_row\")\ndbFetch(res)\ndbClearResult(res)\n```\n\nTo retrieve the query results in one step:\n\n``` r\ndbGetQuery(con, \"SELECT * FROM one_row\")\n```\n\n### Intermediate Usage\n\nTo create tables in Athena, `dbExecute` will send the query to Athena\nand wait until the query has been executed. 
This makes it an ideal method to\ncreate tables within Athena.\n\n``` r\nquery \u003c- \n  \"CREATE EXTERNAL TABLE impressions (\n      requestBeginTime string,\n      adId string,\n      impressionId string,\n      referrer string,\n      userAgent string,\n      userCookie string,\n      ip string,\n      number string,\n      processId string,\n      browserCookie string,\n      requestEndTime string,\n      timers struct\u003cmodelLookup:string, requestTime:string\u003e,\n      threadId string,\n      hostname string,\n      sessionId string)\n  PARTITIONED BY (dt string)\n  ROW FORMAT  serde 'org.apache.hive.hcatalog.data.JsonSerDe'\n      with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' )\n  LOCATION 's3://elasticmapreduce/samples/hive-ads/tables/impressions/' ;\"\n  \ndbExecute(con, query)\n```\n\nRAthena has 2 extra functions to return extra information about Athena\ntables: `dbGetPartition` and `dbShow`.\n\n`dbGetPartition` will return all the partitions (returns data.frame):\n\n``` r\nRAthena::dbGetPartition(con, \"impressions\")\n```\n\n`dbShow` will return the table’s DDL, so you will be able to see how the\ntable was constructed in Athena (returns SQL character):\n\n``` r\nRAthena::dbShow(con, \"impressions\")\n```\n\n### Advanced Usage\n\n``` r\nlibrary(DBI)\ncon \u003c- dbConnect(RAthena::athena(),\n                 s3_staging_dir = 's3://path/to/query/bucket/')\n```\n\n#### Sending data to Athena\n\nRAthena provides a method to send a data.frame from R to Athena.\n\n``` r\n# Check existing tables\ndbListTables(con)\n# Upload iris to Athena\ndbWriteTable(con, \"iris\", iris, \n             partition=c(\"TIMESTAMP\" = format(Sys.Date(), \"%Y%m%d\")))\n\n# Read in iris from Athena\ndbReadTable(con, \"iris\")\n\n# Check new existing tables in Athena\ndbListTables(con)\n\n# Check if iris exists in Athena\ndbExistsTable(con, \"iris\")\n```\n\nPlease check out the `RAthena` method for 
[`dbWriteTable`](https://dyfanjones.github.io/RAthena/reference/AthenaWriteTables.html) for more information on how to upload data to AWS Athena and AWS S3.\n\nFor more information on how to get the most out of AWS Athena when uploading data please check out: [Top 10 Performance Tuning Tips for Amazon Athena](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/)\n\n### Tidyverse Usage\n\nCreate a connection to Athena and query the already existing table\n`iris` that was created in the previous example.\n\n``` r\nlibrary(DBI)\nlibrary(dplyr)\n\ncon \u003c- dbConnect(RAthena::athena(),\n                aws_access_key_id='YOUR_ACCESS_KEY_ID',\n                aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',\n                s3_staging_dir='s3://path/to/query/bucket/',\n                region_name='eu-west-1')\ntbl(con, sql(\"SELECT * FROM iris\"))\n```\n\n    # Source:   SQL [?? x 5]\n    # Database: Athena 1.9.210 [eu-west-1/default]\n       sepal_length sepal_width petal_length petal_width species\n              \u003cdbl\u003e       \u003cdbl\u003e        \u003cdbl\u003e       \u003cdbl\u003e \u003cchr\u003e  \n     1          5.1         3.5          1.4         0.2 setosa \n     2          4.9         3            1.4         0.2 setosa \n     3          4.7         3.2          1.3         0.2 setosa \n     4          4.6         3.1          1.5         0.2 setosa \n     5          5           3.6          1.4         0.2 setosa \n     6          5.4         3.9          1.7         0.4 setosa \n     7          4.6         3.4          1.4         0.3 setosa \n     8          5           3.4          1.5         0.2 setosa \n     9          4.4         2.9          1.4         0.2 setosa \n    10          4.9         3.1          1.5         0.1 setosa \n    # … with more rows\n\ndplyr provides lazy querying, which allows `tbl(con, sql(\"SELECT * FROM iris\"))` to be shortened\nto `tbl(con, \"iris\")`. 
For more information please look at \u003chttps://solutions.posit.co/connections/db/r-packages/dplyr/\u003e.\n\n``` r\ntbl(con, \"iris\")\n```\n\n    # Source:   table\u003ciris\u003e [?? x 5]\n    # Database: Athena 1.9.210 [eu-west-1/default]\n       sepal_length sepal_width petal_length petal_width species\n              \u003cdbl\u003e       \u003cdbl\u003e        \u003cdbl\u003e       \u003cdbl\u003e \u003cchr\u003e  \n     1          5.1         3.5          1.4         0.2 setosa \n     2          4.9         3            1.4         0.2 setosa \n     3          4.7         3.2          1.3         0.2 setosa \n     4          4.6         3.1          1.5         0.2 setosa \n     5          5           3.6          1.4         0.2 setosa \n     6          5.4         3.9          1.7         0.4 setosa \n     7          4.6         3.4          1.4         0.3 setosa \n     8          5           3.4          1.5         0.2 setosa \n     9          4.4         2.9          1.4         0.2 setosa \n    10          4.9         3.1          1.5         0.1 setosa \n    # … with more rows\n\nQuerying Athena with `profile_name` instead of hard coding\n`aws_access_key_id` and `aws_secret_access_key`. By using `profile_name`,\nextra metadata is returned in the query to give users extra\ninformation.\n\n``` r\ncon \u003c- dbConnect(RAthena::athena(),\n                profile_name = \"your_profile\",\n                s3_staging_dir='s3://path/to/query/bucket/')\ntbl(con, \"iris\") %\u003e% \n  filter(petal_length \u003c 1.3)\n```\n\n    # Source:   lazy query [?? 
x 5]\n    # Database: Athena 1.9.210 [your_profile@eu-west-1/default]\n       sepal_length sepal_width petal_length petal_width species\n              \u003cdbl\u003e       \u003cdbl\u003e        \u003cdbl\u003e       \u003cdbl\u003e \u003cchr\u003e  \n     1          4.7         3.2          1.3         0.2 setosa \n     2          4.3         3            1.1         0.1 setosa \n     3          5.8         4            1.2         0.2 setosa \n     4          5.4         3.9          1.3         0.4 setosa \n     5          4.6         3.6          1           0.2 setosa \n     6          5           3.2          1.2         0.2 setosa \n     7          5.5         3.5          1.3         0.2 setosa \n     8          4.4         3            1.3         0.2 setosa \n     9          5           3.5          1.3         0.3 setosa \n    10          4.5         2.3          1.3         0.3 setosa \n    # … with more rows\n\n``` r\ntbl(con, \"iris\") %\u003e% \n  select(contains(\"sepal\"), contains(\"petal\"))\n```\n\n    # Source:   lazy query [?? 
x 4]\n    # Database: Athena 1.9.210 [your_profile@eu-west-1/default]\n       sepal_length sepal_width petal_length petal_width\n              \u003cdbl\u003e       \u003cdbl\u003e        \u003cdbl\u003e       \u003cdbl\u003e\n     1          5.1         3.5          1.4         0.2\n     2          4.9         3            1.4         0.2\n     3          4.7         3.2          1.3         0.2\n     4          4.6         3.1          1.5         0.2\n     5          5           3.6          1.4         0.2\n     6          5.4         3.9          1.7         0.4\n     7          4.6         3.4          1.4         0.3\n     8          5           3.4          1.5         0.2\n     9          4.4         2.9          1.4         0.2\n    10          4.9         3.1          1.5         0.1\n    # … with more rows\n\nUpload data using the `dplyr` functions `copy_to` and `compute`.\n\n``` r\nlibrary(DBI)\nlibrary(dplyr)\n\ncon \u003c- dbConnect(RAthena::athena(),\n                profile_name = \"your_profile\",\n                s3_staging_dir='s3://path/to/query/bucket/')\n```\n\nWrite data.frame to Athena table\n```r\ncopy_to(con, mtcars,\n        s3_location = \"s3://mybucket/data/\")\n```\n\nWrite Athena table from tbl_sql\n```r\nathena_mtcars \u003c- tbl(con, \"mtcars\")\nmtcars_filter \u003c- athena_mtcars %\u003e% filter(gear \u003e= 4)\n```\n\nCreate an Athena table with a unique table name\n```r\nmtcars_filter %\u003e% compute()\n```\n\nCreate an Athena table with a specified name and S3 location\n```r\nmtcars_filter %\u003e% \n  compute(\"mtcars_filter\",\n          s3_location = \"s3://mybucket/mtcars_filter/\")\n\n# Disconnect from Athena\ndbDisconnect(con)\n```\n\n## Work Groups\n\nCreating work group:\n\n``` r\nlibrary(RAthena)\nlibrary(DBI)\n\ncon \u003c- dbConnect(RAthena::athena(),\n                profile_name = \"your_profile\",\n                encryption_option = \"SSE_S3\",\n                
s3_staging_dir='s3://path/to/query/bucket/')\n\ncreate_work_group(con, \"demo_work_group\", description = \"This is a demo work group\",\n                  tags = tag_options(key= \"demo_work_group\", value = \"demo_01\"))\n```\n\nList work groups:\n\n``` r\nlist_work_groups(con)\n```\n\n    [[1]]\n    [[1]]$Name\n    [1] \"demo_work_group\"\n    \n    [[1]]$State\n    [1] \"ENABLED\"\n    \n    [[1]]$Description\n    [1] \"This is a demo work group\"\n    \n    [[1]]$CreationTime\n    2019-09-06 18:51:28.902000+01:00\n    \n    \n    [[2]]\n    [[2]]$Name\n    [1] \"primary\"\n    \n    [[2]]$State\n    [1] \"ENABLED\"\n    \n    [[2]]$Description\n    [1] \"\"\n    \n    [[2]]$CreationTime\n    2019-08-22 16:14:47.902000+01:00\n\nUpdate work group:\n\n``` r\nupdate_work_group(con, \"demo_work_group\", description = \"This is a demo work group update\")\n```\n\nReturn work group meta data:\n\n``` r\nget_work_group(con, \"demo_work_group\")\n```\n\n    $Name\n    [1] \"demo_work_group\"\n    \n    $State\n    [1] \"ENABLED\"\n    \n    $Configuration\n    $Configuration$ResultConfiguration\n    $Configuration$ResultConfiguration$OutputLocation\n    [1] \"s3://path/to/query/bucket/\"\n    \n    $Configuration$ResultConfiguration$EncryptionConfiguration\n    $Configuration$ResultConfiguration$EncryptionConfiguration$EncryptionOption\n    [1] \"SSE_S3\"\n    \n    \n    \n    $Configuration$EnforceWorkGroupConfiguration\n    [1] FALSE\n    \n    $Configuration$PublishCloudWatchMetricsEnabled\n    [1] FALSE\n    \n    $Configuration$BytesScannedCutoffPerQuery\n    [1] 10000000\n    \n    $Configuration$RequesterPaysEnabled\n    [1] FALSE\n    \n    \n    $Description\n    [1] \"This is a demo work group update\"\n    \n    $CreationTime\n    2019-09-06 18:51:28.902000+01:00\n\nConnect to Athena using work group:\n\n``` r\ncon \u003c- dbConnect(RAthena::athena(),\n                profile_name = \"your_profile\",\n                work_group = 
\"demo_work_group\")\n```\n\nDelete work group:\n\n``` r\ndelete_work_group(con, \"demo_work_group\")\n```\n\n# Similar Projects\n\n## Python:\n\n  - `pyAthena` - A python wrapper of the python package `Boto3` using\n    the sqlAlchemy framework:\n    \u003chttps://github.com/laughingman7743/PyAthena\u003e\n  - `pyAthenaJDBC` - A python interface into AWS Athena’s JDBC drivers:\n    \u003chttps://github.com/laughingman7743/PyAthenaJDBC\u003e\n\n## R:\n\n  - `AWR.Athena` - A R wrapper of RJDBC for the AWS Athena’s JDBC\n    drivers: \u003chttps://github.com/nfultz/AWR.Athena\u003e\n  - `noctua` - A R wrapper of the R AWS SDK [`paws`](https://github.com/paws-r/paws) to develop a DBI interface \u003chttps://github.com/DyfanJones/noctua\u003e\n  - `awsathena` - rJava Interface to AWS Athena SDK \u003chttps://github.com/hrbrmstr/awsathena\u003e\n  - `metis` - Helpers for Accessing and Querying Amazon Athena using R, Including a lightweight RJDBC shim \u003chttps://github.com/hrbrmstr/metis\u003e\n  - `metisjars` - JARs for `metis` \u003chttps://github.com/hrbrmstr/metis-jars\u003e\n  - `metis.tidy` - Access and Query Amazon Athena via the Tidyverse \u003chttps://github.com/hrbrmstr/metis-tidy\u003e\n  \n`awsathena` and `metis` family of packages are currently used in production every day to analyze petabytes of internet scan and honeypot data.\n\n## Comparison:\n\nThe reason why `RAthena` stands slightly apart from `AWR.Athena` is that\n`AWR.Athena` uses the Athena JDBC drivers and `RAthena` uses the Python\nAWS SDK `Boto3`. The ultimate goal is to provide an extra method for R\nusers to interface with AWS Athena. 
As `pyAthena` is the most similar\nproject, this project has used an appropriate name to reflect this …\n`RAthena`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDyfanJones%2FRAthena","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDyfanJones%2FRAthena","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDyfanJones%2FRAthena/lists"}