https://github.com/exasol/virtual-schema-common-document-files
This Virtual Schema allows you to access documents stored in files like any regular Exasol table.
- Host: GitHub
- URL: https://github.com/exasol/virtual-schema-common-document-files
- Owner: exasol
- License: mit
- Created: 2020-09-14T08:48:08.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-17T14:55:16.000Z (about 1 month ago)
- Last Synced: 2024-10-19T21:33:32.920Z (about 1 month ago)
- Topics: bucketfs, exasol, exasol-integration, json, virtual-schema
- Language: Java
- Homepage:
- Size: 634 KB
- Stars: 2
- Watchers: 7
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Virtual Schema for Files
[![Build Status](https://github.com/exasol/virtual-schema-common-document-files/actions/workflows/ci-build.yml/badge.svg)](https://github.com/exasol/virtual-schema-common-document-files/actions/workflows/ci-build.yml)
[![Maven Central – Virtual Schema for document data in files](https://img.shields.io/maven-central/v/com.exasol/virtual-schema-common-document-files)](https://search.maven.org/artifact/com.exasol/virtual-schema-common-document-files)[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=alert_status)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)
[![Security Rating](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=security_rating)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)
[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=reliability_rating)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=sqale_rating)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)
[![Technical Debt](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=sqale_index)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)[![Code Smells](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=code_smells)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=coverage)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)
[![Duplicated Lines (%)](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=duplicated_lines_density)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)
[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Avirtual-schema-common-document-files&metric=ncloc)](https://sonarcloud.io/dashboard?id=com.exasol%3Avirtual-schema-common-document-files)

Virtual Schema Common Document Files allows you to query data stored in a document file in the same way as if the data was stored in a regular Exasol database table.
This module is part of a larger project called [Virtual Schemas](https://github.com/exasol/virtual-schemas), which covers document-based as well as JDBC-based dialects. See the complete list of [dialects](https://github.com/exasol/virtual-schemas/blob/main/doc/user-guide/dialects.md).
Document-based virtual schemas are characterized by
* a *storage*, a container that hosts the document files and defines the access control and the type of account needed to access them, and
* a *document type* that defines the format of the documents containing the data.

## Storage Variants
You cannot use this adapter directly. Please use one of the dialects for the specific storage variants below.
* [AWS S3](https://github.com/exasol/s3-document-files-virtual-schema/)
* [Exasol BucketFS](https://github.com/exasol/bucketfs-document-files-virtual-schema/)
* [Google Cloud Storage](https://github.com/exasol/google-cloud-storage-document-files-virtual-schema)
* [Azure Blob Storage](https://github.com/exasol/azure-blob-storage-document-files-virtual-schema)
* [Azure Data Lake Storage Gen2](https://github.com/exasol/azure-data-lake-storage-gen2-document-files-virtual-schema)

If this list does not contain your file source you can [implement your own file source](doc/user_guide/dialect_development_guide.md).
## Document Types
Each storage variant can contain documents using any of the following supported document types:
* [JSON](https://www.json.org/json-en.html)
* [JSON-Lines](https://jsonlines.org/) (one JSON document per line)
* [Parquet](https://parquet.apache.org/)
* [CSV](https://en.wikipedia.org/wiki/Comma-separated_values)

You can also [add support for other document types](doc/user_guide/document_type_plugin_development_guide.md).
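
To illustrate the difference between the JSON and JSON-Lines document types listed above, here is a minimal sketch of how a JSON-Lines file breaks down into one document per line. It only uses the Java standard library and is not how this adapter reads files. The class `JsonLinesSketch` and the method `splitDocuments` are hypothetical names chosen for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JsonLinesSketch {

    /**
     * Splits JSON-Lines content into one raw JSON string per non-blank line.
     * Hypothetical helper for illustration; it does not parse or validate the JSON.
     */
    static List<String> splitDocuments(final String jsonLines) {
        return jsonLines.lines()                 // String.lines() requires Java 11 or later
                .map(String::trim)
                .filter(line -> !line.isEmpty()) // ignore blank lines, e.g. a trailing newline
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        // Two documents in one file, one JSON document per line
        final String content = "{\"id\": 1, \"name\": \"a\"}\n{\"id\": 2, \"name\": \"b\"}\n";
        final List<String> documents = splitDocuments(content);
        System.out.println(documents.size() + " documents");
        documents.forEach(System.out::println);
    }
}
```

A plain JSON file, in contrast, holds a single document that may span many lines, so it cannot be split by line breaks like this.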
## Integration Tests
This project (VSDF) builds and publishes a `test-jar` with common integration tests for document-oriented virtual schemas. Any derived virtual schema can use them: it only needs to extend the class `com.exasol.adapter.document.files.AbstractDocumentFilesAdapterIT` to inherit all common integration tests.
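
The inheritance pattern described above can be sketched in plain Java. This is only an illustration of the idea, under loud assumptions: `AbstractFilesITSketch` is a hypothetical stand-in and not the real `AbstractDocumentFilesAdapterIT`, the hooks `uploadFile` and `readFile` are invented for this example, and the in-memory map replaces a real storage. The actual abstract methods of the real class are defined in the `test-jar` and are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

public class InheritedTestsSketch {

    /** Hypothetical stand-in for a shared abstract test base; NOT the real AbstractDocumentFilesAdapterIT. */
    abstract static class AbstractFilesITSketch {

        /** Hypothetical hook: the derived class stores a file in its own storage. */
        protected abstract void uploadFile(String name, String content);

        /** Hypothetical hook: the derived class reads a file back from its own storage. */
        protected abstract String readFile(String name);

        /** Shared test logic, written once in the base class and inherited by every derived class. */
        public boolean testRoundTrip() {
            final String content = "{\"id\": 1}";
            uploadFile("test.json", content);
            return content.equals(readFile("test.json"));
        }
    }

    /** Hypothetical derived test class; an in-memory map stands in for a real storage variant. */
    static class InMemoryFilesIT extends AbstractFilesITSketch {
        private final Map<String, String> files = new HashMap<>();

        @Override
        protected void uploadFile(final String name, final String content) {
            this.files.put(name, content);
        }

        @Override
        protected String readFile(final String name) {
            return this.files.get(name);
        }
    }

    public static void main(final String[] args) {
        // The derived class defines no test of its own; it inherits testRoundTrip()
        System.out.println("round trip passed: " + new InMemoryFilesIT().testRoundTrip());
    }
}
```

The derived class only supplies the storage-specific parts, while the test logic lives in the base class. This is why a single shared test suite can cover every storage variant.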
### Performance Regression Tests
`AbstractDocumentFilesAdapterIT` also contains performance regression tests tagged with `regression`.
#### Changes in Performance Regression Tests
The following changes to the performance regression tests might influence comparability of test results:
* Version 7.3.1
  * CSV tests now use all six data types (string, boolean, integer, double, date and timestamp) instead of only string. The column count is unchanged.
  * Test names in the test report changed. They now use suffix `()` instead of `(TestInfo)`.

## Additional Information
* [User Guide](doc/user_guide/user_guide.md)
* [Changelog](doc/changes/changelog.md)
* [Dependencies](dependencies.md)