https://github.com/terascope/file-assets
Teraslice processors for working with data stored in files on disk, S3 or HDFS.
- Host: GitHub
- URL: https://github.com/terascope/file-assets
- Owner: terascope
- License: mit
- Created: 2018-08-01T16:23:24.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-05-21T22:18:55.000Z (8 months ago)
- Last Synced: 2024-05-22T16:51:36.208Z (8 months ago)
- Language: TypeScript
- Homepage:
- Size: 5.78 MB
- Stars: 1
- Watchers: 7
- Forks: 2
- Open Issues: 9
Metadata Files:
- Readme: README.md
- License: LICENSE
# file-assets
> A set of Teraslice processors for working with data stored in files on disk. The readers utilize the `chunked-file-reader` module (migrated into this bundle from the Teraslice monorepo) to break data into records.

Since all the readers in this asset bundle use DataEntities, the slice's file path can be retrieved from each record with a call such as `record.getMetadata('path')`. More information about DataEntities can be found [here](https://terascope.github.io/teraslice/docs/packages/utils/api/classes/dataentity).
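As a rough illustration of the metadata lookup described above, the sketch below groups records by the file they came from. The `RecordWithMetadata` interface, the `groupByPath` helper, and the `fakeRecord` factory are hypothetical names introduced here for illustration; the only call taken from this README is `getMetadata('path')`, and the stand-in records are plain objects, not real DataEntities.

```typescript
// Minimal structural type: anything exposing getMetadata(key), as the README
// says records from these readers do. Hypothetical; not exported by this bundle.
interface RecordWithMetadata {
    getMetadata(key: string): unknown;
}

// Hypothetical helper: bucket records by the 'path' metadata value.
function groupByPath<T extends RecordWithMetadata>(records: T[]): Record<string, T[]> {
    const groups: Record<string, T[]> = {};
    for (const record of records) {
        const path = record.getMetadata('path');
        // Fall back to a placeholder key when a record carries no string path.
        const key = typeof path === 'string' ? path : '(unknown)';
        if (groups[key] === undefined) {
            groups[key] = [];
        }
        groups[key].push(record);
    }
    return groups;
}

// Stand-in factory (assumption): lets the helper be exercised outside Teraslice.
function fakeRecord(
    data: Record<string, unknown>,
    path?: string
): RecordWithMetadata & { data: Record<string, unknown> } {
    return {
        data,
        getMetadata: (key: string) => (key === 'path' ? path : undefined),
    };
}
```

Inside a real processor, the records handed to it by one of the readers in this bundle could presumably be passed to a helper like this directly, since it only relies on `getMetadata`.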
This bundle includes the following processors:
- [`file_exporter`](./docs/file_exporter.md)
- [`file_reader`](./docs/file_reader.md)
- [`s3_exporter`](./docs/s3_exporter.md)
- [`s3_reader`](./docs/s3_reader.md)
- [`file_sender_api`](./docs/file_sender_api.md)
- [`file_reader_api`](./docs/file_reader_api.md)
- [`s3_sender_api`](./docs/s3_sender_api.md)
- [`s3_reader_api`](./docs/s3_reader_api.md)

## Releases
You can find a list of releases, changes, and pre-built asset bundles [here](https://github.com/terascope/file-assets/releases).
## Getting Started
This asset bundle requires a running Teraslice cluster. You can find the Teraslice documentation [here](https://github.com/terascope/teraslice/blob/master/README.md).
```bash
# Step 1: make sure you have teraslice-cli installed
yarn global add teraslice-cli

# Step 2: deploy the asset bundle to your cluster
teraslice-cli assets deploy clusterAlias terascope/file-assets
```

## Connectors

### S3 Connector

**Configuration:**
The S3 connector configuration, in your Teraslice configuration file, includes the following parameters:
| Configuration | Description | Type | Notes |
| --------- | -------- | ------ | ------ |
| endpoint | Target S3 HTTP endpoint, must be a URL | String | optional, defaults to `http://127.0.0.1:80` |
| accessKeyId | S3 access key ID | String | required |
| secretAccessKey | S3 secret access key | String | required |
| region | AWS Region where bucket is located | String | optional, defaults to `us-east-1` |
| maxRetries | Maximum retry attempts | Number | optional, defaults to `3` |
| sslEnabled | Flag to enable/disable SSL communication | Boolean | optional, defaults to `true` |
| caCertificate | A string containing one or more CA certificates | String | optional, defaults to ' ' |
| certLocation | DEPRECATED - use caCertificate. Location of SSL cert | String | optional, defaults to ' ' |
| forcePathStyle | Whether to force path style URLs for S3 objects | Boolean | optional, defaults to `false` |
| bucketEndpoint | Whether to use the bucket name as the endpoint for this request | Boolean | optional, defaults to `false` |

**Terafoundation S3 configuration example:**
```yaml
terafoundation:
    connectors:
        s3:
            default:
                endpoint: "http://localhost:9000"
                accessKeyId: "yourId"
                secretAccessKey: "yourPassword"
                forcePathStyle: true
                sslEnabled: true
                caCertificate: |
                    -----BEGIN CERTIFICATE-----
                    MIICGTCCAZ+gAwIBAgIQCeCTZaz32ci5PhwLBCou8zAKBggqhkjOPQQDAzBOMQs
                    ...
                    DXZDjC5Ty3zfDBeWUA==
                    -----END CERTIFICATE-----
```

## Development
### Tests
Run the file-assets tests.
**Requirements:**
- `minio` - A running instance of minio. See this [Quickstart Guide](https://hub.docker.com/r/minio/minio).
```bash
yarn test
```

### Build
Build a compiled asset bundle to deploy to a teraslice cluster.
**Install Teraslice CLI**
```bash
yarn global add teraslice-cli
```

```bash
teraslice-cli assets build
```

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
## License
[MIT](./LICENSE) licensed.