{"id":23122802,"url":"https://github.com/folio-org/mod-data-export-worker","last_synced_at":"2025-08-17T03:30:55.054Z","repository":{"id":37092957,"uuid":"333974380","full_name":"folio-org/mod-data-export-worker","owner":"folio-org","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-14T14:12:23.000Z","size":3060,"stargazers_count":3,"open_issues_count":3,"forks_count":8,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-08-14T14:21:21.127Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/folio-org.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-01-28T22:36:31.000Z","updated_at":"2025-08-14T14:12:27.000Z","dependencies_parsed_at":"2024-04-24T10:48:59.077Z","dependency_job_id":"cc3c46e1-1371-4e8f-897a-91806abd7190","html_url":"https://github.com/folio-org/mod-data-export-worker","commit_stats":null,"previous_names":[],"tags_count":92,"template":false,"template_full_name":null,"purl":"pkg:github/folio-org/mod-data-export-worker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fmod-data-export-worker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fmod-data-export-worker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fmod-data-export-worker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fmod-data-export-
worker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/folio-org","download_url":"https://codeload.github.com/folio-org/mod-data-export-worker/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/folio-org%2Fmod-data-export-worker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270802794,"owners_count":24648646,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-17T02:00:09.016Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-17T07:30:31.815Z","updated_at":"2025-08-17T03:30:55.044Z","avatar_url":"https://github.com/folio-org.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mod-data-export-worker\n\nCopyright (C) 2021-2023 The Open Library Foundation\n\nThis software is distributed under the terms of the Apache License,\nVersion 2.0. 
See the file [LICENSE](LICENSE) for more information.\n\n## Introduction\nAPI for Data Export Worker module.\n\n## Additional information\nMore detail can be found on the Data Export Worker wiki page: [WIKI Data Export Worker](https://wiki.folio.org/pages/viewpage.action?pageId=52134948).\n\n### Issue tracker\nSee project [MODEXPW](https://issues.folio.org/browse/MODEXPW)\nat the [FOLIO issue tracker](https://dev.folio.org/guidelines/issue-tracker).\n\n### Other documentation\nOther [modules](https://dev.folio.org/source-code/#server-side) are described,\nwith further FOLIO Developer documentation at\n[dev.folio.org](https://dev.folio.org/)\n\n### Memory configuration\nFor stable operation, mod-data-export-worker requires the following configuration: Java args -XX:MetaspaceSize=384m -XX:MaxMetaspaceSize=512m -Xmx2048m,\nAWS container: memory - 3072, memory (soft limit) - 2600, cpu - 1024.\n\n### Environment variables\nAny S3-compatible storage (AWS S3, MinIO Server) supported by the MinIO Client can be used as file storage. Thus, in addition to the \nS3 configuration (S3_URL, S3_REGION, S3_BUCKET, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY) of the permanent storage, \none needs to configure the environment settings for the S3 subpaths (S3_SUB_PATH, S3_LOCAL_SUB_PATH). \nTypically, these options must specify separate paths.\nIt is also necessary to set the variable S3_IS_AWS to indicate whether AWS S3 is used as file storage. 
By default this variable is `false`, which means that a MinIO server is used as file storage.\nSet this value to `true` if AWS S3 is used as storage.\n\n| Name                                              | Default value                 | Description                                                                                                                                                                                           |\n|:--------------------------------------------------|:------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| KAFKA_HOST                                        | localhost                     | Kafka broker hostname                                                                                                                                                                                 |\n| KAFKA_PORT                                        | 9092                          | Kafka broker port                                                                                                                                                                                     |\n| KAFKA_CONSUMER_POLL_INTERVAL                      | 3600000                       | Max interval before next poll. If long record processing is in place and interval exceeded then consumer will be kicked out of the group and another consumer will start processing the same message. 
|\n| ENV                                               | folio                         | Environment name                                                                                                                                                                                      |\n| S3_URL                                            | http://127.0.0.1:9000/        | AWS url                                                                                                                                                                                               |\n| S3_REGION                                         | -                             | AWS region                                                                                                                                                                                            |\n| S3_BUCKET                                         | -                             | AWS bucket                                                                                                                                                                                            |\n| S3_ACCESS_KEY_ID                                  | -                             | AWS access key                                                                                                                                                                                        |\n| S3_SECRET_ACCESS_KEY                              | -                             | AWS secret key                                                                                                                                                                                        |\n| S3_SUB_PATH                                       | mod-data-export-worker/remote | S3 subpath for files storage                                                                                                                                                                 
         |\n| S3_LOCAL_SUB_PATH                                 | mod-data-export-worker/local  | S3 subpath for local files storage                                                                                                                                                                    |\n| S3_IS_AWS                                         | false                         | Specify if AWS S3 is used as files storage                                                                                                                                                            |\n| URL_EXPIRATION_TIME                               | 604800                        | Presigned url expiration time (in seconds)                                                                                                                                                            |\n| DATA_EXPORT_JOB_UPDATE_TOPIC_PARTITIONS           | 50                            | Number of partitions for topic                                                                                                                                                                        |\n| KAFKA_CONCURRENCY_LEVEL                           | 30                            | Concurrency level of kafka listener                                                                                                                                                                   |\n| E_HOLDINGS_BATCH_JOB_CHUNK_SIZE                   | 100                           | Specify chunk size for eHoldings export job which will be used to query data from kb-ebsco, write to database, read from database and write to file                                                   |\n| E_HOLDINGS_BATCH_KB_EBSCO_CHUNK_SIZE              | 100                           | Amount to retrieve per request to mod-kb-ebsco-java (100 is max acceptable value)                                                                                                   
                  |\n| AUTHORITY_CONTROL_BATCH_JOB_CHUNK_SIZE            | 100                           | Specify chunk size for authority control export job which will be used to query data from entities-links, and write to file                                                                           |\n| AUTHORITY_CONTROL_BATCH_ENTITIES_LINKS_CHUNK_SIZE | 100                           | Amount to retrieve per request to mod-entities-links                                                                                                                                                  |\n| MAX_UPLOADED_FILE_SIZE                            | 40MB                          | Specifies multipart upload file size                                                                                                                                                                  |\n| PLATFORM                                          | okapi                         | Specifies if okapi or eureka platform                                                                                                                                                                 |\n| CHUNKS                                            | 100                           | Number of items being passed to write at once                                                                                                                                                         |\n| CORE_POOL_SIZE                                    | 10                            | Maximum number of threads being created for each task before the queue is utilized                                                                                                                    |\n| MAX_POOL_SIZE                                     | 10                            | Maximum number of threads that can be created after the queue is full and before rejecting the new tasks                                                                   
                           |\n| BUCKET_SIZE                                       | 50                            | Size of the bucket used in partitioning parameters                                                                                                                                                    |","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffolio-org%2Fmod-data-export-worker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffolio-org%2Fmod-data-export-worker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffolio-org%2Fmod-data-export-worker/lists"}