{"id":21268984,"url":"https://github.com/qubole/s3-sqs-connector","last_synced_at":"2025-07-11T05:30:43.713Z","repository":{"id":44461848,"uuid":"191509568","full_name":"qubole/s3-sqs-connector","owner":"qubole","description":"A library for reading data from Amzon S3 with optimised listing using Amazon SQS using Spark SQL Streaming ( or Structured streaming).","archived":false,"fork":false,"pushed_at":"2021-04-29T13:43:27.000Z","size":42,"stargazers_count":16,"open_issues_count":8,"forks_count":11,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-04-17T22:49:43.905Z","etag":null,"topics":["s3","scala","spark","spark-streaming","sqs","streaming","structured-streaming"],"latest_commit_sha":null,"homepage":"http://www.qubole.com","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/qubole.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-12T06:18:07.000Z","updated_at":"2024-04-17T22:49:43.906Z","dependencies_parsed_at":"2022-09-17T07:11:19.866Z","dependency_job_id":null,"html_url":"https://github.com/qubole/s3-sqs-connector","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fs3-sqs-connector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fs3-sqs-connector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fs3-sqs-connector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/qubole%2Fs3-sqs-connector/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/qubole","download_url":"https://codeload.github.com/qubole/s3-sqs-connector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225693821,"owners_count":17509227,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["s3","scala","spark","spark-streaming","sqs","streaming","structured-streaming"],"created_at":"2024-11-21T08:06:58.922Z","updated_at":"2024-11-21T08:06:59.566Z","avatar_url":"https://github.com/qubole.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# S3-SQS Connector\n\n[![Build Status](https://travis-ci.org/qubole/s3-sqs-connector.svg?branch=master)](https://travis-ci.org/qubole/s3-sqs-connector)\n\nA library for reading data from Amazon S3 with Spark SQL Streaming (Structured Streaming), using Amazon SQS to optimise file listing. 
\n\n## Linking\n\nUsing SBT:\n\n    libraryDependencies += \"com.qubole\" % \"spark-sql-streaming-sqs_{{site.SCALA_BINARY_VERSION}}\" % \"{{site.PROJECT_VERSION}}\"\n\nUsing Maven:\n\n    \u003cdependency\u003e\n        \u003cgroupId\u003ecom.qubole\u003c/groupId\u003e\n        \u003cartifactId\u003espark-sql-streaming-sqs_{{site.SCALA_BINARY_VERSION}}\u003c/artifactId\u003e\n        \u003cversion\u003e{{site.PROJECT_VERSION}}\u003c/version\u003e\n    \u003c/dependency\u003e\n\nThis library can also be added to Spark jobs launched through `spark-shell` or `spark-submit` by using the `--packages` command line option.\nFor example, to include it when starting the Spark shell:\n\n    $ bin/spark-shell --packages com.qubole:spark-sql-streaming-sqs_{{site.SCALA_BINARY_VERSION}}:{{site.PROJECT_VERSION}}\n\nUnlike `--jars`, `--packages` ensures that this library and its dependencies are added to the classpath.\nThe `--packages` argument can also be used with `bin/spark-submit`.\n\nThis library is compiled for Scala 2.11 only, and is intended to support Spark 2.4.0 onwards.\n\n## Building S3-SQS Connector\n\nS3-SQS Connector is built using [Apache Maven](http://maven.apache.org/).\n\nTo build S3-SQS Connector, clone this repository and run:\n```\nmvn -DskipTests clean package\n```\n\nThis creates the `target/spark-sql-streaming-sqs_2.11-0.5.1.jar` file, which contains the S3-SQS Connector code and its dependencies. Make sure the Scala and Java versions correspond to those required by your Spark cluster. 
We have tested it with Java 7/8, Scala 2.11 and Spark version 2.4.0.\n\n\n## Configuration options\nThe connector is configured through the following parameters.\n\nName |Default | Meaning\n--- |:---:| ---\nsqsUrl|required, no default value|SQS queue URL, like 'https://sqs.us-east-1.amazonaws.com/330183209093/TestQueue'\nregion|required, no default value|AWS region where the queue is created\nfileFormat|required, no default value|Format of the files stored on Amazon S3\nschema|required, no default value|Schema of the data being read\nsqsFetchIntervalSeconds|10|Time interval (in seconds) after which to fetch messages from the Amazon SQS queue\nsqsLongPollingWaitTimeSeconds|20|Wait time (in seconds) for long polling on the Amazon SQS queue\nsqsMaxConnections|1|Number of parallel threads used to connect to the Amazon SQS queue\nsqsMaxRetries|10|Maximum number of consecutive retries in case of a connection failure to SQS before giving up\nignoreFileDeletion|false|Whether to ignore any file-deleted message in the SQS queue\nfileNameOnly|false|Whether to check for new files based only on the filename instead of the full path\nshouldSortFiles|true|Whether to sort files by timestamp while listing them from SQS\nuseInstanceProfileCredentials|false|Whether to use EC2 instance profile credentials for connecting to Amazon SQS\nmaxFilesPerTrigger|no default value|Maximum number of files to process in a microbatch\nmaxFileAge|7d|Maximum age of a file that can be found in this directory\n\n## Example\n\nAn example that creates a SQL stream which uses Amazon SQS to list files on S3:\n\n        val inputDf = sparkSession\n                          .readStream\n                          .format(\"s3-sqs\")\n                          .schema(schema)\n                          .option(\"sqsUrl\", queueUrl)\n                          .option(\"region\", awsRegion)\n                          .option(\"fileFormat\", \"json\")\n                          .option(\"sqsFetchIntervalSeconds\", \"2\")\n                  
        .option(\"useInstanceProfileCredentials\", \"true\")\n                          .option(\"sqsLongPollingWaitTimeSeconds\", \"5\")\n                          .load()\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqubole%2Fs3-sqs-connector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqubole%2Fs3-sqs-connector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqubole%2Fs3-sqs-connector/lists"}