https://github.com/streamthoughts/kafka-connect-file-pulse
🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka
- Host: GitHub
- URL: https://github.com/streamthoughts/kafka-connect-file-pulse
- Owner: streamthoughts
- License: apache-2.0
- Created: 2019-02-15T15:11:40.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2025-03-18T22:20:41.000Z (about 1 year ago)
- Last Synced: 2025-03-18T22:24:43.913Z (about 1 year ago)
- Topics: amazon-s3, avro, azure-storage, csv, etl, file-streaming, google-cloud, grok-filters, kafka, kafka-connect, kafka-connector, kafka-producer, xml
- Language: Java
- Homepage: https://streamthoughts.github.io/kafka-connect-file-pulse/
- Size: 16.5 MB
- Stars: 325
- Watchers: 6
- Forks: 69
- Open Issues: 34
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- awesome-kafka - kafka-connect-file-pulse - A polyvalent, scalable and reliable, Kafka Connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka. (Libraries / Kafka Connect)
- awesome-java - Kafka Connect File Pulse
- awesome-kafka-connect - streamthoughts/kafka-connect-file-pulse - Multipurpose connector for CSV, JSON, XML, Avro files from local, S3, GCS, Azure (File Systems / FilePulse)
- awesome-kafka - Streaming files from a local filesystem
README
# Kafka Connect File Pulse
**Connect FilePulse** is a multipurpose, scalable and reliable
[Kafka Connector](http://kafka.apache.org/documentation.html#connect) that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka™.
It can read files from the **local filesystem**, **Amazon S3**, **Azure Storage** and **Google Cloud Storage**.
## Motivation
In organizations, data is frequently exported, shared and integrated from legacy systems through
files in a wide variety of formats (e.g. CSV, XML, JSON, Avro, etc.). Dealing with all of these formats can
quickly become a real challenge for enterprises, which usually end up with a complex and
hard-to-maintain data integration mess.
A modern approach consists of building a scalable data streaming platform as a central nervous
system to decouple applications from each other. **Apache Kafka™** is one of the most widely
used technologies to build such a system. The Apache Kafka project ships with Kafka Connect, a distributed,
fault-tolerant and scalable framework for connecting Kafka with external systems.
The **Connect File Pulse** project aims to provide an easy-to-use solution, based on Kafka Connect,
for streaming any type of data file with the Apache Kafka™ platform.
Some of the features of Connect File Pulse are inspired by the ingestion capabilities of **Elasticsearch** and **Logstash**.
## 🚀 Key Features Overview
Connect FilePulse provides a set of built-in features for streaming files from multiple filesystems into Kafka. This includes, among other things:
* Support for recursive scanning of local directories.
* Support for reading files from Amazon S3, Azure Storage and Google Cloud Storage.
* Support for multiple input file formats (e.g. CSV, JSON, Avro, XML).
* Support for Grok expressions.
* Parsing and transforming data using built-in or custom processing filters.
* Error handler definition.
* Monitoring of files while they are being written into Kafka.
* Support for pluggable strategies to clean up completed files.
* Etc.
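
To make the first item of the list above more concrete, here is a minimal sketch of what recursively scanning a local directory for supported file formats can look like. It is not the connector's implementation and does not use its API: the class `RecursiveScanSketch`, the helper `listFiles` and the extension list are hypothetical names chosen for illustration, and the code relies only on the JDK (11+).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: a plain-JDK sketch of recursive directory scanning,
// NOT the code used by Connect FilePulse.
public class RecursiveScanSketch {

    // Hypothetical helper: returns every regular file under `root` (at any depth)
    // whose lower-cased name ends with one of the given extensions, sorted by path.
    public static List<Path> listFiles(Path root, List<String> extensions) throws IOException {
        // Files.walk traverses the tree lazily; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> {
                    String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                    return extensions.stream().anyMatch(name::endsWith);
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Scan the directory given as first argument, or the current directory.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        for (Path file : listFiles(root, List.of(".csv", ".json", ".avro", ".xml"))) {
            System.out.println(file);
        }
    }
}
```

In the connector itself, scanning, filtering and reading are driven by configuration rather than hand-written code; see the documentation for the actual options.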
## 🙏 Show your support
Do you think this project can help you or your team ingest data into Kafka?
Please 🌟 this repository to support us!
## 🏁 How to get started?
The best way to learn Kafka Connect File Pulse is to follow the step-by-step [Getting Started](https://streamthoughts.github.io/kafka-connect-file-pulse/docs/getting-started/) guide.
If you want to read more about using Connect File Pulse, the full documentation can be found [here](https://streamthoughts.github.io/kafka-connect-file-pulse/).
**File Pulse** is also available on [Docker Hub](https://hub.docker.com/r/streamthoughts/kafka-connect-file-pulse) 🐳
```bash
# Pull the latest image from Docker Hub
docker pull streamthoughts/kafka-connect-file-pulse:latest
```
## 💡 Contributions
Any feedback, bug reports and PRs are greatly appreciated! See our [contribution guidelines](./CONTRIBUTING.md).
* Source Code: [https://github.com/streamthoughts/kafka-connect-file-pulse](https://github.com/streamthoughts/kafka-connect-file-pulse)
* Issue Tracker: [https://github.com/streamthoughts/kafka-connect-file-pulse/issues](https://github.com/streamthoughts/kafka-connect-file-pulse/issues)
* Documentation: [https://streamthoughts.github.io/kafka-connect-file-pulse/](https://streamthoughts.github.io/kafka-connect-file-pulse/)
* Releases: [https://github.com/streamthoughts/kafka-connect-file-pulse/releases](https://github.com/streamthoughts/kafka-connect-file-pulse/releases)
## License
This code base is available under the Apache License, Version 2.0.