Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/streamthoughts/kafka-connect-file-pulse
🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka
- Host: GitHub
- URL: https://github.com/streamthoughts/kafka-connect-file-pulse
- Owner: streamthoughts
- License: apache-2.0
- Created: 2019-02-15T15:11:40.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-05-07T09:01:56.000Z (about 2 months ago)
- Last Synced: 2024-05-07T09:56:02.819Z (about 2 months ago)
- Topics: amazon-s3, avro, azure-storage, csv, etl, file-streaming, google-cloud, grok-filters, kafka, kafka-connect, kafka-connector, kafka-producer, xml
- Language: Java
- Homepage: https://streamthoughts.github.io/kafka-connect-file-pulse/
- Size: 16.1 MB
- Stars: 305
- Watchers: 8
- Forks: 60
- Open Issues: 24
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Lists
- awesome-kafka - kafka-connect-file-pulse - A polyvalent, scalable and reliable, Kafka Connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka. (Libraries / Kafka Connect)
- awesome-kafka - Streaming files from a local filesystem
README
# Kafka Connect File Pulse
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/streamthoughts/kafka-connect-file-pulse/blob/master/LICENSE)
[![Stars](https://img.shields.io/github/stars/streamthoughts/kafka-connect-file-pulse)](https://img.shields.io/github/stars/streamthoughts/kafka-connect-file-pulse)
[![Forks](https://img.shields.io/github/forks/streamthoughts/kafka-connect-file-pulse)](https://img.shields.io/github/forks/streamthoughts/kafka-connect-file-pulse)
[![DockerPull](https://img.shields.io/docker/pulls/streamthoughts/kafka-connect-file-pulse)](https://img.shields.io/docker/pulls/streamthoughts/kafka-connect-file-pulse)
[![Issues](https://img.shields.io/github/issues/streamthoughts/kafka-connect-file-pulse)](https://img.shields.io/github/issues/streamthoughts/kafka-connect-file-pulse)
![Main Build](https://github.com/streamthoughts/kafka-connect-file-pulse/actions/workflows/main.yml/badge.svg)[![Reliability Rating](https://sonarcloud.io/api/project_badges/measure?project=streamthoughts_kafka-connect-file-pulse&metric=reliability_rating)](https://sonarcloud.io/summary/new_code?id=streamthoughts_kafka-connect-file-pulse)
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=streamthoughts_kafka-connect-file-pulse&metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=streamthoughts_kafka-connect-file-pulse)
[![Vulnerabilities](https://sonarcloud.io/api/project_badges/measure?project=streamthoughts_kafka-connect-file-pulse&metric=vulnerabilities)](https://sonarcloud.io/summary/new_code?id=streamthoughts_kafka-connect-file-pulse)
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=streamthoughts_kafka-connect-file-pulse&metric=coverage)](https://sonarcloud.io/summary/new_code?id=streamthoughts_kafka-connect-file-pulse)
**Connect FilePulse** is a multipurpose, scalable and reliable,
[Kafka Connector](http://kafka.apache.org/documentation.html#connect) that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka™.
It provides capabilities for reading files from: **local-filesystem**, **Amazon S3**, **Azure Storage** and **Google Cloud Storage**.

## Motivation
In organizations, data is frequently exported, shared and integrated from legacy systems through the use of
files in a wide variety of formats (e.g. CSV, XML, JSON, Avro, etc.). Dealing with all of these formats can
quickly become a real challenge for enterprises, which usually end up with a complex and hard-to-maintain
data integration mess.

A modern approach consists of building a scalable data streaming platform as a central nervous
system to decouple applications from each other. **Apache Kafka™** is one of the most widely
used technologies to build such a system. The Apache Kafka project ships with Kafka Connect, a distributed,
fault-tolerant and scalable framework for connecting Kafka with external systems.

The **Connect File Pulse** project aims to provide an easy-to-use solution, based on Kafka Connect,
for streaming any type of data file with the Apache Kafka™ platform.

Some of the features of Connect File Pulse are inspired by the ingestion capabilities of **Elasticsearch** and **Logstash**.
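
One Logstash idea that shows up in the feature list is the Grok expression: a regular expression whose captured pieces are given field names, so that an unstructured line becomes a structured record. The sketch below illustrates only that concept with plain `java.util.regex` named groups. It is not the connector's implementation and uses none of its APIs; the class name `GrokSketch`, the log line layout, and the field names `timestamp`, `level` and `message` are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokSketch {

    // Hypothetical layout: "2024-05-07 09:01:56 ERROR Disk is full".
    // Each named group plays the role of a named Grok field.
    private static final Pattern LOG_LINE = Pattern.compile(
        "^(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) "
            + "(?<level>[A-Z]+) "
            + "(?<message>.*)$");

    // Field names to extract, in output order (assumed for this example).
    private static final String[] FIELDS = {"timestamp", "level", "message"};

    /** Returns the named fields of a matching line, or an empty map if the line does not match. */
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = LOG_LINE.matcher(line);
        if (matcher.matches()) {
            for (String name : FIELDS) {
                fields.put(name, matcher.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints the structured record extracted from one raw line.
        System.out.println(parse("2024-05-07 09:01:56 ERROR Disk is full"));
    }
}
```

In the connector itself this kind of extraction is configured rather than coded; refer to the project documentation for the actual filter names and options.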
## 🚀 Key Features Overview
Connect FilePulse provides a set of built-in features for streaming files from multiple filesystems into Kafka. This includes, among other things:
* Support for recursive scanning of local directories.
* Support for reading files from Amazon S3, Azure Storage and Google Cloud Storage.
* Support for multiple input file formats (e.g. CSV, JSON, Avro, XML).
* Support for Grok expressions.
* Parsing and transforming data using built-in or custom processing filters.
* Error handler definition.
* Monitoring of files while they are being written into Kafka.
* Support for pluggable strategies to clean up completed files.
* Etc.

## 🙏 Show your support
Do you think this project can help you or your team ingest data into Kafka?
Please 🌟 this repository to support us!

## 🏁 How to get started?
The best way to learn Kafka Connect File Pulse is to follow the step-by-step [Getting Started](https://streamthoughts.github.io/kafka-connect-file-pulse/docs/getting-started/) guide.
If you want to read about using Connect File Pulse, the full documentation can be found [here](https://streamthoughts.github.io/kafka-connect-file-pulse/).
**File Pulse** is also available on [Docker Hub](https://hub.docker.com/r/streamthoughts/kafka-connect-file-pulse) 🐳
```bash
# Pull the latest image from Docker Hub
docker pull streamthoughts/kafka-connect-file-pulse:latest
```

## 💡 Contributions
Any feedback, bug reports and PRs are greatly appreciated! See our [guidelines](./CONTRIBUTING.md).
* Source Code: [https://github.com/streamthoughts/kafka-connect-file-pulse](https://github.com/streamthoughts/kafka-connect-file-pulse)
* Issue Tracker: [https://github.com/streamthoughts/kafka-connect-file-pulse/issues](https://github.com/streamthoughts/kafka-connect-file-pulse/issues)
* Documentation: [https://streamthoughts.github.io/kafka-connect-file-pulse/](https://streamthoughts.github.io/kafka-connect-file-pulse/)
* Releases: [https://github.com/streamthoughts/kafka-connect-file-pulse/releases](https://github.com/streamthoughts/kafka-connect-file-pulse/releases)

## Licence
This code base is available under the Apache License, version 2.0.