{"id":23789817,"url":"https://github.com/officiallysingh/spring-boot-batch-cloud-task","last_synced_at":"2025-04-13T03:08:19.597Z","repository":{"id":214146320,"uuid":"732267183","full_name":"officiallysingh/spring-boot-batch-cloud-task","owner":"officiallysingh","description":"Spring batch job as Spring cloud task","archived":false,"fork":false,"pushed_at":"2025-03-18T05:14:06.000Z","size":601,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T03:08:13.788Z","etag":null,"topics":["fault-tolerance","job","partitioning","scalability","spring-batch","spring-batch-example","spring-batch-jobs","spring-cloud-task"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/officiallysingh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-16T05:22:15.000Z","updated_at":"2025-03-18T05:14:10.000Z","dependencies_parsed_at":"2023-12-26T07:58:59.929Z","dependency_job_id":"7801aca2-3893-448b-a629-f6ced78ce563","html_url":"https://github.com/officiallysingh/spring-boot-batch-cloud-task","commit_stats":null,"previous_names":["officiallysingh/spring-boot-batch-cloud-task"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/officiallysingh%2Fspring-boot-batch-cloud-task","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/officiallysingh%2Fspring-boot-batch-cloud-task/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/officiallysingh%2Fspring-boot-batch-cloud-task/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/officiallysingh%2Fspring-boot-batch-cloud-task/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/officiallysingh","download_url":"https://codeload.github.com/officiallysingh/spring-boot-batch-cloud-task/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248657918,"owners_count":21140846,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fault-tolerance","job","partitioning","scalability","spring-batch","spring-batch-example","spring-batch-jobs","spring-cloud-task"],"created_at":"2025-01-01T17:16:52.591Z","updated_at":"2025-04-13T03:08:19.575Z","avatar_url":"https://github.com/officiallysingh.png","language":"Java","readme":"# Spring Batch Job implementation as Spring Cloud Task\n\n[**Spring Cloud Task**](https://docs.spring.io/spring-cloud-task/docs/current/reference/html/) is a framework for creating and orchestrating short-lived microservices.\nThis makes it a good fit for Spring Batch jobs, as the JVM persists only until the job completes and then exits, freeing up resources.\n\n![Spring Batch Architecture](https://github.com/officiallysingh/spring-boot-batch-cloud-task/blob/main/Spring_Batch_Cloud_Task.jpg)\n\n## Introduction\nThis project is a simple example of how to implement a Spring Batch Job as a Spring Cloud Task.\nIt implements a hypothetical use case to generate Credit card statements \ncontaining aggregate daily 
transaction amounts date-wise for a particular month.\n* Reads Credit card accounts from a MongoDB collection `accounts` in database `account_db` and partitions on these account numbers for high performance.\n* Reads transactions from MongoDB collection `transactions` in database `transaction_db` using pagination. Aggregates transaction amounts per day.\n* Processes the date-wise transaction amounts and writes the output to MongoDB collection `statements` in database `statement_db`.\n* It is fault-tolerant, i.e. it tries to recover from transient failures and skips bad records. \n* It supports restartability from the last failure point.\n\n## Installation\nClone this repository and import it into your favourite IDE as either a Maven or a Gradle project.\nRequires Java 21, Spring Boot 3.2.0+ and Spring Batch 5.1.0+.\n\n### Docker compose\nThe application is bundled with [**`Spring boot Docker compose`**](https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#features.docker-compose).\n* If you have Docker installed, simply run the application in the `docker` profile by passing `spring.profiles.active=docker`\n  as a program argument from your IDE.\n* Depending on your current working directory in the IDE, you may need to change `spring.docker.compose.file=spring-boot-mongodb-auditing/compose.yml`\n  to `spring.docker.compose.file=compose.yml` in [**`application-docker.yml`**](src/main/resources/config/application-docker.yml)\n* Make sure the host ports mapped in the [**`Docker compose file`**](compose.yml) are available, or change the ports and\n  make the corresponding changes in the database configurations in [**`application-docker.yml`**](src/main/resources/config/application-docker.yml)\n\n### Explicit MongoDB and Postgres installation\nSet your Postgres and MongoDB connection details in [**`application.yml`**](src/main/resources/config/application.yml) as follows.\n```yaml\nspring:\n  datasource:\n    url: \u003cYour Postgres Database URL\u003e/\u003cYour Database name\u003e\n    username: \u003cYour Database 
username\u003e\n    password: \u003cYour Database password\u003e\n  data:\n    mongodb:\n      uri: \u003cYour MongoDB URI\u003e\n```\n\u003e [!IMPORTANT]\nMake sure **Flyway** is enabled, as Spring Batch and Spring Cloud Task need their [`schema`](src/main/resources/db/migration/V1.1__scdf_schema.sql) to be created.\nIt is used internally by the frameworks to persist and retrieve metadata about the jobs and tasks.\n\n### Sample Data\nOn first run, the application creates the schema and populates sample data for the past three months into MongoDB collections. \nFor details refer to [`DataPopulator`](src/main/java/com/ksoot/batch/DataPopulator.java).\nDepending on the dataset size to be created, the application may take a while to start the first time. In subsequent runs, it starts quickly.\nYou can change the number of accounts to be created as follows.\n```java\n// Total number of Credit card accounts to be created\n// For each account up to 10 transactions are created for each day of the last 3 months\nprivate static final int ACCOUNTS_COUNT = 1000;\n// Number of records to be created in a batch\nprivate static final int BATCH_SIZE = 1000;\n```\n\n### Job Parameters\nThe job takes the following optional parameters; defaults are used if not specified. \nRefer to [`StatementJobTask`](src/main/java/com/ksoot/batch/StatementJobTask.java) for more details.\n* `cardNumbers` - Comma-separated list of Credit card numbers to process. If not specified, all accounts are processed.\nExample: `cardNumbers=5038-1972-4899-4180,5752-0862-5835-3760`\n* `month` - Month (IST) in ISO format yyyy-MM, for which the statement is to be generated. If not specified, last month is taken.\nExample: `month=2023-11`\n* `forceRestart` - If set to `true`, the job is restarted even if its last execution with the same parameters was successful. 
\nIf not specified, `false` is taken as the default; in that case, if the last execution with the same parameters was successful, the job does not execute again.\nExample: `forceRestart=true`\n\n\u003e [!IMPORTANT]\nThese parameters can be passed as program arguments from your IDE as follows.\n```shell\n--cardNumbers=5038-1972-4899-4180,5752-0862-5835-3760 --month=2023-11 --forceRestart=true\n```\n\n![IntelliJ Run Configuration](https://github.com/officiallysingh/spring-boot-batch-cloud-task/blob/main/IntelliJ_Run_Configuration.png)\n\n## Implementation\nThe application uses [**`spring-batch-commons`**](https://github.com/officiallysingh/spring-batch-commons) to get common Spring Batch components out of the box.\nMaven:\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.github.officiallysingh\u003c/groupId\u003e\n    \u003cartifactId\u003espring-batch-commons\u003c/artifactId\u003e\n    \u003cversion\u003e1.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\nOr Gradle:\n```groovy\nimplementation 'io.github.officiallysingh:spring-batch-commons:1.0'\n```\n\n### Job Configuration\nIt defines a partitioned job with a single step as follows. \nFor details, refer to [`StatementJobConfiguration`](src/main/java/com/ksoot/batch/job/StatementJobConfiguration.java).\nThe Reader and Writer are self-explanatory. 
The Processor should contain all business logic. Multiple processors can be chained together using \n[`CompositeItemProcessor`](https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/item/support/CompositeItemProcessor.html).\n[`BeanValidatingItemProcessor`](https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/item/validator/BeanValidatingItemProcessor.html) is used to validate the input data.\n```java\n@Configuration\n@AutoConfigureAfter(value = {BatchConfiguration.class})\nclass StatementJobConfiguration extends JobConfigurationSupport\u003cDailyTransaction, Statement\u003e {\n\n  @Bean\n  Job statementJob(\n      @Qualifier(\"statementJobPartitioner\") final AccountsPartitioner statementJobPartitioner,\n      final ItemReader\u003cDailyTransaction\u003e transactionReader,\n      final ItemProcessor\u003cDailyTransaction, Statement\u003e statementProcessor,\n      final ItemWriter\u003cStatement\u003e statementWriter)\n      throws Exception {\n    return newPartitionedJob(\n        AppConstants.STATEMENT_JOB_NAME,\n        statementJobPartitioner,\n        transactionReader,\n        statementProcessor,\n        statementWriter);\n  }\n\n  @Bean\n  @StepScope\n  AccountsPartitioner statementJobPartitioner(\n      @Qualifier(\"accountMongoTemplate\") final MongoTemplate accountMongoTemplate,\n      @Value(\"#{jobParameters['\" + AppConstants.JOB_PARAM_NAME_CARD_NUMBERS + \"']}\")\n          final List\u003cString\u003e cardNumbers) {\n    return new AccountsPartitioner(accountMongoTemplate, this.batchProperties, cardNumbers);\n  }\n\n  @Bean\n  @StepScope\n  MongoAggregationPagingItemReader\u003cDailyTransaction\u003e transactionReader(\n      @Qualifier(\"transactionMongoTemplate\") final MongoTemplate transactionMongoTemplate,\n      @Value(\"#{jobParameters['\" + AppConstants.JOB_PARAM_NAME_STATEMENT_MONTH + \"']}\")\n          final String month,\n      @Value(\"#{stepExecutionContext['\" + 
AppConstants.CARD_NUMBERS_KEY + \"']}\")\n          final String cardNumbers) {\n\n    final YearMonth statementMonth = YearMonth.parse(month);\n    List\u003cString\u003e cardNumbersList =\n        StringUtils.isNotBlank(cardNumbers)\n            ? Arrays.asList(cardNumbers.split(PARTITION_DATA_VALUE_SEPARATOR))\n            : Collections.emptyList();\n\n    OffsetDateTime fromDateTime =\n        statementMonth.atDay(1).atStartOfDay().atOffset(DateTimeUtils.ZONE_OFFSET_IST);\n    OffsetDateTime tillDateTime =\n        statementMonth\n            .atEndOfMonth()\n            .plusDays(1)\n            .atStartOfDay()\n            .atOffset(DateTimeUtils.ZONE_OFFSET_IST);\n    Criteria condition = null;\n    if (CollectionUtils.isNotEmpty(cardNumbersList)) {\n      condition =\n          Criteria.where(\"card_number\")\n              .in(cardNumbersList)\n              .and(\"datetime\")\n              .gte(fromDateTime)\n              .lt(tillDateTime);\n    } else {\n      condition = Criteria.where(\"datetime\").gte(fromDateTime).lt(tillDateTime);\n    }\n\n    final AggregationOperation[] aggregationOperations =\n        new AggregationOperation[] {\n          match(condition),\n          project(\"card_number\", \"amount\", \"datetime\")\n              .andExpression(\"{$toDate: '$datetime'}\")\n              .as(\"date\"),\n          group(\"card_number\", \"date\").sum(\"amount\").as(\"amount\"),\n          project(\"card_number\", \"date\", \"amount\").andExclude(\"_id\"),\n          sort(Sort.Direction.ASC, \"card_number\", \"date\")\n        };\n\n    MongoAggregationPagingItemReader\u003cDailyTransaction\u003e itemReader =\n        new MongoAggregationPagingItemReader\u003c\u003e();\n    itemReader.setName(\"transactionsReader\");\n    itemReader.setTemplate(transactionMongoTemplate);\n    itemReader.setCollection(\"transactions\");\n    itemReader.setTargetType(DailyTransaction.class);\n    itemReader.setAggregationOperation(aggregationOperations);\n    
itemReader.setPageSize(this.batchProperties.getPageSize());\n    return itemReader;\n  }\n\n  @Bean\n  CompositeItemProcessor\u003cDailyTransaction, Statement\u003e statementProcessor(\n      final BeanValidatingItemProcessor\u003cDailyTransaction\u003e beanValidatingDailyTransactionProcessor) {\n    final CompositeItemProcessor\u003cDailyTransaction, Statement\u003e compositeProcessor =\n        new CompositeItemProcessor\u003c\u003e();\n    compositeProcessor.setDelegates(\n        Arrays.asList(beanValidatingDailyTransactionProcessor, new StatementProcessor()));\n    return compositeProcessor;\n  }\n\n  @Bean\n  BeanValidatingItemProcessor\u003cDailyTransaction\u003e beanValidatingDailyTransactionProcessor(\n      final LocalValidatorFactoryBean validatorFactory) {\n    return new BeanValidatingItemProcessor\u003c\u003e(validatorFactory);\n  }\n\n  // Idempotent upsert\n  @Bean\n  MongoItemWriter\u003cStatement\u003e statementWriter(\n      @Qualifier(\"mongoTemplate\") final MongoTemplate statementMongoTemplate) {\n    return MongoItemWriters.\u003cStatement\u003etemplate(statementMongoTemplate)\n        .collection(\"statements\")\n        .idGenerator(\n            (Statement item) -\u003e\n                MongoIdGenerator.compositeIdGenerator(item.cardNumber(), item.transactionDate()))\n        .build();\n  }\n}\n```\n\n\u003e [!IMPORTANT]\nAny component needing access to `stepExecutionContext` must be defined as a `@StepScope` bean,\nand any component needing access to `jobParameters` or `jobExecutionContext` must be defined as a `@JobScope` bean.\n\n### Job Partitioning\nIf specific `cardNumbers` are passed as job parameters, then the job is partitioned on these account numbers only. 
\nOtherwise, all accounts are processed in parallel by partitioning on account numbers.\nFor details refer to [`AccountsPartitioner`](src/main/java/com/ksoot/batch/job/AccountsPartitioner.java).\n```java\n@Slf4j\npublic class AccountsPartitioner extends AbstractPartitioner {\n\n  private final MongoTemplate accountMongoTemplate;\n\n  private final List\u003cString\u003e cardNumbers;\n\n  AccountsPartitioner(\n      @Qualifier(\"accountMongoTemplate\") final MongoTemplate accountMongoTemplate,\n      final BatchProperties batchProperties,\n      final List\u003cString\u003e cardNumbers) {\n    super(batchProperties, AppConstants.CARD_NUMBERS_KEY);\n    this.accountMongoTemplate = accountMongoTemplate;\n    this.cardNumbers = cardNumbers;\n  }\n\n  @Override\n  public List\u003cString\u003e partitioningList() {\n    final Bson condition =\n        CollectionUtils.isNotEmpty(this.cardNumbers)\n            ? in(\"card_number\", this.cardNumbers)\n            : Filters.empty();\n    return this.accountMongoTemplate\n        .getCollection(\"accounts\")\n        .find(condition)\n        .projection(fields(excludeId(), include(\"card_number\")))\n        .sort(ascending(\"card_number\"))\n        .map(doc -\u003e doc.getString(\"card_number\"))\n        .into(new ArrayList\u003c\u003e());\n  }\n}\n```\n\n### Data Source configurations\nDifferent databases can be configured for `statement_db`, `account_db` and `transaction_db`, or all can be set to the same database URI, as follows.\n**Converters** and **Codecs** are registered to support `OffsetDateTime` and `ZonedDateTime` types in `MongoTemplate`.\nRefer to [`MongoDBConfig`](src/main/java/com/ksoot/batch/config/MongoDBConfig.java) for details.\n```yaml\nspring:\n  data:\n    mongodb:\n      uri: \u003cStatement DB URI\u003e\n      database: statement_db\n      account:\n        uri: \u003cAccount DB URI\u003e\n        database: account_db\n      transaction:\n        uri: \u003cTransaction DB URI\u003e\n        database: 
transaction_db\n```\n\n## Configurations\nThe following configuration properties customize the default Spring Batch behaviour.\n```yaml\nbatch:\n  chunk-size: 100\n  skip-limit: 10\n  max-retries: 3\n  backoff-initial-delay: PT3S\n  backoff-multiplier: 2\n  page-size: 300\n  partition-size: 16\n  trigger-partitioning-threshold: 100\n#  task-executor: applicationTaskExecutor\n#  run-id-sequence: run_id_sequence\n```\n\n* **`batch.chunk-size`** : Number of items that are processed in a single transaction by a chunk-oriented step, Default: 100.\n* **`batch.skip-limit`** : Maximum number of items to skip as per the configured Skip policy, exceeding which fails the job, Default: 10.\n* **`batch.max-retries`** : Maximum number of retry attempts as per the configured Retry policy, exceeding which fails the job, Default: 3.\n* **`batch.backoff-initial-delay`** : Time duration (in `java.time.Duration` format) to wait before the first retry attempt is made after a failure.\n* **`batch.backoff-multiplier`** : Factor by which the delay between consecutive retries is multiplied, Default: 3.\n* **`batch.page-size`** : Number of records to be read in each page by paging item readers, Default: 100.\n* **`batch.partition-size`** : Number of partitions that will be used to process the data concurrently.\n  Should be optimized as per available machine resources, Default: 8.\n* **`batch.trigger-partitioning-threshold`** : Minimum number of records required to trigger partitioning; below this threshold\n  partitioning could be counterproductive, Default: 100.\n* **`batch.task-executor`** : Bean name of the Task Executor to be used for executing the jobs. By default `SyncTaskExecutor` is used.\n  Set to `applicationTaskExecutor` to use the `SimpleAsyncTaskExecutor` provided by Spring.\n  Or use any other custom `TaskExecutor` and set the bean name here. 
Don't set this property in Spring Cloud Task applications, only in Spring REST applications.\n* **`batch.run-id-sequence`** : Run Id database sequence name, Default: `run_id_sequence`.\n\n\u003e [!IMPORTANT]\nIt is recommended not to set `batch.task-executor` to an `AsyncTaskExecutor`, as the application may then not exit.\nA Spring Cloud Task should be executed synchronously.\n\n## Author\n[**Rajveer Singh**](https://www.linkedin.com/in/rajveer-singh-589b3950/). In case you find any issues or need any support, please email me at raj14.1984@gmail.com\n\n## References\n* For common Spring Batch components and utilities, refer to [**`spring-batch-commons`**](https://github.com/officiallysingh/spring-batch-commons).\n* For a Spring Batch job implemented as a Spring REST application, refer to [**`spring-boot-batch-web`**](https://github.com/officiallysingh/spring-boot-batch-web).\n* For exception handling refer to [**`spring-boot-problem-handler`**](https://github.com/officiallysingh/spring-boot-problem-handler).\n* For Spring Data MongoDB Auditing refer to [**`spring-boot-mongodb-auditing`**](https://github.com/officiallysingh/spring-boot-mongodb-auditing).\n* For more details on Spring Batch refer to [**`Spring Batch Reference`**](https://docs.spring.io/spring-batch/reference/index.html).\n* To deploy on Spring Cloud Data Flow refer to [**`Spring Cloud Data Flow Reference`**](https://spring.io/projects/spring-cloud-dataflow/).","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fofficiallysingh%2Fspring-boot-batch-cloud-task","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fofficiallysingh%2Fspring-boot-batch-cloud-task","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fofficiallysingh%2Fspring-boot-batch-cloud-task/lists"}