https://github.com/expediagroup/apiary-metastore-docker

Docker image for Apiary Data Lake metastore

# Overview

For more information, please refer to the main [Apiary](https://github.com/ExpediaGroup/apiary) project page.

## Environment Variables
| Environment Variable | Required | Description |
|------------------------------------------------|----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| APIARY_S3_INVENTORY_PREFIX | No | Prefix used by S3 Inventory when creating data in the inventory bucket. Default is `EntireBucketDaily`. |
| APIARY_S3_INVENTORY_TABLE_FORMAT | No | Format of S3 inventory data. Valid options are `ORC`, `Parquet`, or `CSV`. Default is `ORC`. |
| APIARY_SYSTEM_SCHEMA | No | Name for internal system database. Default is `apiary_system`. |
| AWS_REGION | Yes | AWS region to configure various AWS clients. |
| AWS_WEB_IDENTITY_TOKEN_FILE | No | Path of the AWS Web Identity Token File for IRSA/OIDC AWS authentication. |
| DATANUCLEUS_CONNECTION_POOLING_TYPE | No | Type of connection pooling. Valid options are `BoneCP`, `DBCP`, `DBCP2`, `C3P0`, `HikariCP`. |
| DATANUCLEUS_CONNECTION_POOL_MAX_POOLSIZE | No | Maximum pool size for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_MIN_POOLSIZE | No | Minimum pool size for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_INITIAL_POOLSIZE | No | Initial pool size for the connection pool (C3P0 only). |
| DATANUCLEUS_CONNECTION_POOL_MAX_IDLE | No | Maximum idle connections for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_MIN_IDLE | No | Minimum idle connections for the connection pool. |
| DATANUCLEUS_CONNECTION_POOL_MIN_ACTIVE | No | Maximum active connections for the connection pool (DBCP/DBCP2 only). |
| DATANUCLEUS_CONNECTION_POOL_MAX_WAIT | No | Maximum wait time for the connection pool (DBCP/DBCP2 only). |
| DATANUCLEUS_CONNECTION_POOL_VALIDATION_TIMEOUT | No | Validation timeout for the connection pool (DBCP/DBCP2/HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_LEAK_DETECTION_THRESHOLD | No | Leak detection threshold for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_LEAK_MAX_LIFETIME | No | Maximum lifetime for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_AUTO_COMMIT | No | Auto commit for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_IDLE_TIMEOUT | No | Idle timeout for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_CONNECTION_WAIT_TIMEOUT | No | Connection wait timeout for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_READ_ONLY | No | Read only mode for the connection pool (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_NAME | No | Connection pool name (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_CATALOG | No | Connection pool catalog (HikariCP only). |
| DATANUCLEUS_CONNECTION_POOL_REGISTER_MBEANS | No | Register MBeans for the connection pool (HikariCP only). |
| DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES | No | `true`/`false` value for `hive.metastore.disallow.incompatible.col.type.changes`. Default is `true`. |
| ENABLE_GLUESYNC | No | Option to turn on GlueSync Hive Metastore listener. |
| ENABLE_HIVE_LOCK_HOUSE_KEEPER | No | Option to turn on Hive Metastore Hive Lock House Keeper. |
| ENABLE_METRICS | No | Option to enable sending Hive Metastore and JMX metrics to Prometheus. |
| ENABLE_S3_INVENTORY | No | Option to create Hive tables on top of S3 inventory data if enabled in `apiary-data-lake`. Enabled if value is not null/empty. |
| ENABLE_S3_LOGS | No | Option to create Hive tables on top of S3 access logs data if enabled in `apiary-data-lake`. Enabled if value is not null/empty. |
| EXTERNAL_DATABASE | No | Option to enable external database mode. When specified, it disables management of the Hive Metastore MySQL database schema. |
| GLUE_PREFIX | No | Prefix added to Glue databases to handle database name collisions when synchronizing multiple Hive Metastores to the Glue catalog. |
| HADOOP_HEAPSIZE | No | Hive Metastore Java process heapsize. Default is `1024`. |
| HMS_AUTOGATHER_STATS | No | Whether or not to create basic statistics on table/partition creation. Valid values are `true` or `false`. Default is `true`. |
| LIMIT_PARTITION_REQUEST_NUMBER | No | To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value `-1` means no limit. The limit on partitions does not affect metadata-only queries. |
| HIVE_METASTORE_ACCESS_MODE | No | Hive Metastore access mode. Valid values are `readwrite` and `readonly`. |
| HIVE_DB_NAMES | No | Comma-separated list of Hive database names. When specified, the Hive databases are created and mapped to the corresponding S3 buckets. |
| HIVE_METASTORE_LOG_LEVEL | No | Hive Metastore service Log4j log level. Default is `INFO`. |
| HMS_MIN_THREADS | No | Minimum size of the Hive metastore thread pool. Default is `200`. |
| HMS_MAX_THREADS | No | Maximum size of the Hive metastore thread pool. Default is `1000`. |
| INSTANCE_NAME | Yes | Apiary instance name. It is used as a prefix on most AWS resources so that multiple Apiary instances can be deployed. |
| KAFKA_BOOTSTRAP_SERVERS | No | Kafka Bootstrap Servers to enable Kafka Metastore listener and send Metastore events to Kafka. |
| KAFKA_CLIENT_ID | No | Kafka label you define that names the Kafka producer. |
| KAFKA_COMPRESSION_TYPE | No | Kafka compression type. If none is specified, compression is not enabled. Available values are `gzip`, `lz4`, and `snappy`. |
| KAFKA_MAX_REQUEST_SIZE | No | The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum uncompressed record batch size. Default is `1048576`. |
| LDAP_BASE | No | LDAP base DN used to search for user groups. |
| LDAP_CA_CERT | No | Base64 encoded Certificate Authority Bundle to validate LDAP SSL connection. |
| LDAP_SECRET_ARN | No | LDAP bind DN SecretsManager secret ARN. |
| LDAP_URL | No | Active Directory URL to enable group mapping in metastore. |
| MYSQL_CONNECTION_DRIVER_NAME | No | Hive Metastore MySQL database JDBC connection Driver Name. Default is `com.mysql.jdbc.Driver`. |
| MYSQL_CONNECTION_POOL_SIZE | No | MySQL Connection pool size for Hive Metastore. Default is `10`. See [here](https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1181) for more info. |
| MYSQL_DB_HOST | Yes | Hive Metastore MySQL database hostname. |
| MYSQL_DB_NAME | Yes | Hive Metastore MySQL database name. |
| MYSQL_SECRET_ARN | Yes | Hive Metastore MySQL SecretsManager secret ARN. |
| MYSQL_SECRET_USERNAME_KEY | No | Hive Metastore MySQL SecretsManager secret username key. Default is `username`. |
| MYSQL_TYPE | No | Hive Metastore MySQL database Type (mariadb, mysql). Default is `mysql`. |
| MYSQL_DRIVER_JAR | No | Hive Metastore MySQL connector JAR location. Default is `/usr/share/java/mysql-connector-java.jar`. |
| RANGER_AUDIT_DB_URL | No | Ranger audit database JDBC URL. |
| RANGER_AUDIT_SECRET_ARN | No | Ranger audit database secret ARN. |
| RANGER_AUDIT_SOLR_URL | No | Ranger Solr audit URL. |
| RANGER_POLICY_MANAGER_URL | No | Ranger admin URL from where policies will be downloaded. |
| RANGER_SERVICE_NAME | No | Ranger service name used to configure RangerAuth plugin. |
| SNS_ARN | No | The SNS topic ARN to which metadata updates will be sent. |
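
As an illustration of how the table above might be used, the following is a minimal sketch of a pre-flight check that verifies the variables marked `Yes` in the Required column and then assembles a `docker run` command. The image reference `example/apiary-metastore:latest`, the helper names `missing_required` and `docker_run_command`, and all values in `settings` are placeholders assumed for this example; they are not taken from this project.

```python
import shlex

# Variables marked "Yes" in the Required column of the table above.
REQUIRED_VARS = (
    "AWS_REGION",
    "INSTANCE_NAME",
    "MYSQL_DB_HOST",
    "MYSQL_DB_NAME",
    "MYSQL_SECRET_ARN",
)

# Hypothetical image reference: the real image name and tag are not stated here.
IMAGE = "example/apiary-metastore:latest"


def missing_required(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def docker_run_command(env, image=IMAGE):
    """Build a `docker run` argument list passing every entry of env via -e."""
    missing = missing_required(env)
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    cmd = ["docker", "run", "--rm"]
    for name, value in sorted(env.items()):
        cmd += ["-e", f"{name}={value}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Placeholder values for illustration only.
    settings = {
        "AWS_REGION": "us-west-2",
        "INSTANCE_NAME": "apiary-test",
        "MYSQL_DB_HOST": "mysql.example.internal",
        "MYSQL_DB_NAME": "apiary",
        "MYSQL_SECRET_ARN": "arn:aws:secretsmanager:us-west-2:123456789012:secret:example",
        "HADOOP_HEAPSIZE": "2048",  # optional variable from the table
    }
    print(shlex.join(docker_run_command(settings)))
```

Checking the required variables before starting the container is only one possible approach; it reports a missing setting early instead of leaving it to be discovered when the container starts.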

# Contact

## Mailing List
If you would like to ask questions about Apiary or discuss it, please join our mailing list at

[https://groups.google.com/forum/#!forum/apiary-user](https://groups.google.com/forum/#!forum/apiary-user)

# Legal
This project is available under the [Apache 2.0 License](http://www.apache.org/licenses/LICENSE-2.0.html).

Copyright 2018-2019 Expedia, Inc.