https://github.com/conduitio/conduit-connector-file
Conduit connector for local files
- Host: GitHub
- URL: https://github.com/conduitio/conduit-connector-file
- Owner: ConduitIO
- License: apache-2.0
- Created: 2022-02-22T12:18:32.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-05-06T15:15:29.000Z (24 days ago)
- Last Synced: 2025-05-06T16:45:22.967Z (24 days ago)
- Topics: conduit, file, go, golang
- Language: Go
- Homepage:
- Size: 432 KB
- Stars: 3
- Watchers: 6
- Forks: 3
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Codeowners: .github/CODEOWNERS
README
# Conduit Connector File
The File plugin is one of [Conduit](https://github.com/ConduitIO/conduit)'s built-in plugins.
It provides both a source and a destination File connector, allowing a file to serve as either
the source or the destination in a Conduit pipeline.

## How it works

The Source connector listens for changes appended to the source file and
sends records with the changes.
The Destination connector receives records and writes them to a file.

### Source

The Source connector only requires a valid path. Even if the file
doesn't exist, the connector will run and wait until a file with the
configured name appears, then start listening for changes and sending records.

### Destination

The Destination connector creates the file if it doesn't exist and appends
incoming records to it as they are received.
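As a quick illustration of how the two connectors fit together, here is a minimal file-to-file pipeline sketch. The pipeline and connector IDs and file paths are placeholders, and the connector `type` field follows the general Conduit pipeline configuration layout rather than the parameter listings below.

```yaml
version: 2.2
pipelines:
  - id: file-to-file
    status: running
    connectors:
      - id: in
        type: source
        plugin: "file"
        settings:
          # Hypothetical input file; the source waits for it to exist,
          # then emits a record for every change appended to it.
          path: "/tmp/file-source.txt"
      - id: out
        type: destination
        plugin: "file"
        settings:
          # Hypothetical output file; created if it doesn't exist,
          # records are appended as they arrive.
          path: "/tmp/file-destination.txt"
```

With this sketch running, appending a line to `/tmp/file-source.txt` should show up as a new line appended to `/tmp/file-destination.txt`.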
## Source Configuration Parameters

```yaml
version: 2.2
pipelines:
  - id: example
    status: running
    connectors:
      - id: example
        plugin: "file"
        settings:
          # Path is the file path used by the connector to read/write records.
          # Type: string
          # Required: yes
          path: ""
          # Maximum delay before an incomplete batch is read from the source.
          # Type: duration
          # Required: no
          sdk.batch.delay: "0"
          # Maximum size of batch before it gets read from the source.
          # Type: int
          # Required: no
          sdk.batch.size: "0"
          # Specifies whether to use a schema context name. If set to false, no
          # schema context name will be used, and schemas will be saved with the
          # subject name specified in the connector (not safe because of name
          # conflicts).
          # Type: bool
          # Required: no
          sdk.schema.context.enabled: "true"
          # Schema context name to be used. Used as a prefix for all schema
          # subject names. If empty, defaults to the connector ID.
          # Type: string
          # Required: no
          sdk.schema.context.name: ""
          # Whether to extract and encode the record key with a schema.
          # Type: bool
          # Required: no
          sdk.schema.extract.key.enabled: "false"
          # The subject of the key schema. If the record metadata contains the
          # field "opencdc.collection" it is prepended to the subject name and
          # separated with a dot.
          # Type: string
          # Required: no
          sdk.schema.extract.key.subject: "key"
          # Whether to extract and encode the record payload with a schema.
          # Type: bool
          # Required: no
          sdk.schema.extract.payload.enabled: "false"
          # The subject of the payload schema. If the record metadata contains
          # the field "opencdc.collection" it is prepended to the subject name
          # and separated with a dot.
          # Type: string
          # Required: no
          sdk.schema.extract.payload.subject: "payload"
          # The type of the payload schema.
          # Type: string
          # Required: no
          sdk.schema.extract.type: "avro"
```

## Destination Configuration Parameters

```yaml
version: 2.2
pipelines:
  - id: example
    status: running
    connectors:
      - id: example
        plugin: "file"
        settings:
          # Path is the file path used by the connector to read/write records.
          # Type: string
          # Required: yes
          path: ""
          # Maximum delay before an incomplete batch is written to the
          # destination.
          # Type: duration
          # Required: no
          sdk.batch.delay: "0"
          # Maximum size of batch before it gets written to the destination.
          # Type: int
          # Required: no
          sdk.batch.size: "0"
          # Allow bursts of at most X records (0 or less means that bursts are
          # not limited). Only takes effect if a rate limit per second is set.
          # Note that if `sdk.batch.size` is bigger than `sdk.rate.burst`, the
          # effective batch size will be equal to `sdk.rate.burst`.
          # Type: int
          # Required: no
          sdk.rate.burst: "0"
          # Maximum number of records written per second (0 means no rate
          # limit).
          # Type: float
          # Required: no
          sdk.rate.perSecond: "0"
          # The format of the output record. See the Conduit documentation for a
          # full list of supported formats
          # (https://conduit.io/docs/using/connectors/configuration-parameters/output-format).
          # Type: string
          # Required: no
          sdk.record.format: "opencdc/json"
          # Options to configure the chosen output record format. Options are
          # normally key=value pairs separated with a comma (e.g.
          # opt1=val1,opt2=val2), except for the `template` record format, where
          # options are a Go template.
          # Type: string
          # Required: no
          sdk.record.format.options: ""
          # Whether to extract and decode the record key with a schema.
          # Type: bool
          # Required: no
          sdk.schema.extract.key.enabled: "true"
          # Whether to extract and decode the record payload with a schema.
          # Type: bool
          # Required: no
          sdk.schema.extract.payload.enabled: "true"
```
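The `sdk.record.format.options` parameter above notes that, for the `template` output format, the options string is a Go template rather than key=value pairs. A sketch of a destination settings fragment using it might look like the following; the exact template fields available (here `.Payload.After`) come from Conduit's general output-format documentation and are an assumption, not something specified in this repository.

```yaml
settings:
  path: "/tmp/file-destination.txt"   # hypothetical output file
  # Render each record through a Go template instead of opencdc/json;
  # this assumes the template is executed against an OpenCDC record.
  sdk.record.format: "template"
  sdk.record.format.options: '{{ printf "%s" .Payload.After }}'
```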
## How to build it

Run `make`.
## Testing
Run `make test` to run all the tests.
## Limitations
* The Source connector only detects changes appended to the file; it
  doesn't detect deletions or edits.
* The connectors can only access local files on the machine where Conduit
  is running, so running Conduit on a server means it can't access a file
  on your local machine.
* The connectors currently work reliably only with text files (they may
  work with non-text files, but this isn't guaranteed).