https://github.com/nahumsa/streaming-pipeline-clickhouse

Host: GitHub
URL: https://github.com/nahumsa/streaming-pipeline-clickhouse
Owner: nahumsa
License: apache-2.0
Created: 2024-07-26T22:13:49.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-08-01T22:52:32.000Z (10 months ago)
Last Synced: 2025-02-16T10:13:35.426Z (4 months ago)
Language: Go
Size: 28.3 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Streaming data pipeline from producer to ClickHouse

The goal of this project is to build a streaming data pipeline that receives HTTP requests and writes their data to
[ClickHouse](https://clickhouse.com/). The first step is to write directly to ClickHouse and see whether it can support
the load. The second step is to add Kafka as a queue in front of ClickHouse, so that the pipeline can handle more load
than it could by writing directly to the server.
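
The two steps can be sketched in Go with a small abstraction. This is a minimal illustration, not the project's code: the `Event` type, the `Sink` interface, `memorySink`, and `queuedSink` are all hypothetical names, and a buffered channel stands in for Kafka. The HTTP side only enqueues events, while a single worker drains the queue and writes batches to the store.

```go
package main

import "sync"

// Event is a hypothetical payload; the real schema is not described here.
type Event struct {
	ID    string
	Value float64
}

// Sink is a hypothetical abstraction over the destination of events.
// Write must not keep a reference to batch after it returns.
type Sink interface {
	Write(batch []Event) error
}

// memorySink stands in for ClickHouse and records the batches it receives.
type memorySink struct {
	mu      sync.Mutex
	batches [][]Event
}

func (m *memorySink) Write(batch []Event) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	// Copy the batch because the caller reuses its slice.
	m.batches = append(m.batches, append([]Event(nil), batch...))
	return nil
}

// queuedSink buffers events in a channel (an in-process stand-in for Kafka)
// and forwards them to the next sink in batches from one worker goroutine.
type queuedSink struct {
	events chan Event
	done   chan struct{}
}

func newQueuedSink(next Sink, buffer, batchSize int) *queuedSink {
	q := &queuedSink{
		events: make(chan Event, buffer),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(q.done)
		batch := make([]Event, 0, batchSize)
		for e := range q.events {
			batch = append(batch, e)
			if len(batch) == batchSize {
				_ = next.Write(batch) // error handling omitted in this sketch
				batch = batch[:0]
			}
		}
		// Flush whatever is left once the queue has been closed.
		if len(batch) > 0 {
			_ = next.Write(batch)
		}
	}()
	return q
}

// Enqueue blocks only when the buffer is full.
func (q *queuedSink) Enqueue(e Event) { q.events <- e }

// Close stops accepting events and waits until the worker has flushed them.
func (q *queuedSink) Close() {
	close(q.events)
	<-q.done
}
```

Writing directly corresponds to calling `Write` on the store for every request. Putting `queuedSink` in front lets bursts of requests accumulate in the buffer and reach the store as fewer, larger writes.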

The reason for creating the API is twofold:

1. It gives us a clear interface between the data producer and our data pipeline. Data quality checks can run at that interface when the data is ingested, which leads to higher data quality.
2. Once the API exists, the data producer only needs to care about the interface, not about what happens inside the API or how the data is handled.

Both points improve the scalability and reliability of the data received by data consumers.
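
As an illustration of a data quality check at the API boundary, here is a minimal Go sketch using only the standard library. The payload shape (`id`, `value`), the function names, and the status codes are assumptions made for the example; the project's actual request format is not described in this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// Event is a hypothetical payload used only for this sketch.
type Event struct {
	ID    string  `json:"id"`
	Value float64 `json:"value"`
}

// parseEvent decodes and validates one JSON payload.
// Unknown fields and a missing id are rejected before anything is stored.
func parseEvent(raw []byte) (Event, error) {
	var e Event
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&e); err != nil {
		return Event{}, fmt.Errorf("invalid payload: %w", err)
	}
	if e.ID == "" {
		return Event{}, errors.New("invalid payload: id is required")
	}
	return e, nil
}

// ingestHandler validates each request and passes valid events to write,
// which stands in for whatever stores the event (ClickHouse or a queue).
func ingestHandler(write func(Event) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		// Cap the body at 1 MiB so oversized requests fail early.
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "cannot read request body", http.StatusBadRequest)
			return
		}
		e, err := parseEvent(body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := write(e); err != nil {
			http.Error(w, "storage unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	}
}
```

Because the producer only sees the HTTP contract, the `write` function can change from a direct insert to a queue without any change on the producer side.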

## Environment Variables

To run this application, you need to set up the following environment variables:

| Variable Name        | Description                                         |
|----------------------|-----------------------------------------------------|
| `CLICKHOUSEHOST`     | The hostname or IP address of the ClickHouse server |
| `CLICKHOUSEDB`       | The database name                                   |
| `CLICKHOUSEUSERNAME` | Username for the ClickHouse database                |
| `CLICKHOUSEUSERPASS` | Password for the ClickHouse database                |
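
A minimal Go sketch of reading these variables at startup follows. It is an assumption about how such loading could look, not the project's actual code: the `Config` type and the function names are hypothetical. The password is read from `CLICKHOUSEUSERPASS`, with a fallback to `CLICKHOUSEPASS` because that is the name the shell examples in this README export, and it is treated as optional because it may be empty.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Config is a hypothetical container for the connection settings.
type Config struct {
	Host     string
	Database string
	Username string
	Password string
}

// loadConfig reads the settings through getenv so that it can be tested
// without touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Host:     getenv("CLICKHOUSEHOST"),
		Database: getenv("CLICKHOUSEDB"),
		Username: getenv("CLICKHOUSEUSERNAME"),
		Password: getenv("CLICKHOUSEUSERPASS"),
	}
	// Assumption: accept the name used in the shell examples as a fallback.
	if cfg.Password == "" {
		cfg.Password = getenv("CLICKHOUSEPASS")
	}

	// Collect every missing required variable so the error lists them all.
	var missing []string
	if cfg.Host == "" {
		missing = append(missing, "CLICKHOUSEHOST")
	}
	if cfg.Database == "" {
		missing = append(missing, "CLICKHOUSEDB")
	}
	if cfg.Username == "" {
		missing = append(missing, "CLICKHOUSEUSERNAME")
	}
	if len(missing) > 0 {
		return Config{}, fmt.Errorf("missing environment variables: %s",
			strings.Join(missing, ", "))
	}
	return cfg, nil
}

// loadConfigFromEnv reads the settings from the process environment.
func loadConfigFromEnv() (Config, error) {
	return loadConfig(os.Getenv)
}
```

Failing at startup with the full list of missing variables is usually easier to debug than a connection error later on.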

## Running the Application

To run the application, follow these steps:

1. **Clone the repository and start ClickHouse**:

```bash
git clone https://github.com/nahumsa/streaming-pipeline-clickhouse.git
cd streaming-pipeline-clickhouse
# start the ClickHouse database locally (add -d to run it in the background)
docker compose up
```

2. **Set up environment variables**:
Set the variables directly in your shell, as shown here, or put them in a `.env` file in the root directory of the project:

```bash
export CLICKHOUSEHOST="localhost:9000"
export CLICKHOUSEDB="default"
export CLICKHOUSEUSERNAME="default"
export CLICKHOUSEPASS=" "
```

Or create a `.env` file:

```dotenv
CLICKHOUSEHOST="localhost:9000"
CLICKHOUSEDB="default"
CLICKHOUSEUSERNAME="default"
CLICKHOUSEPASS=" "
```

and load it into your current shell:

```bash
export $(cat .env | xargs)
```

3. **Run the application**:

```bash
go run main.go
```

The application should now be running and ready to receive HTTP requests, process the data, and send it to ClickHouse.
