https://github.com/acrlabs/prom2parquet
Remote write target for Prometheus that saves metrics to parquet files
https://github.com/acrlabs/prom2parquet
Last synced: 5 months ago
JSON representation
Remote write target for Prometheus that saves metrics to parquet files
- Host: GitHub
- URL: https://github.com/acrlabs/prom2parquet
- Owner: acrlabs
- License: mit
- Created: 2024-02-16T19:15:27.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2025-11-20T02:15:40.000Z (7 months ago)
- Last Synced: 2026-01-15T15:38:12.446Z (5 months ago)
- Language: Go
- Size: 737 KB
- Stars: 9
- Watchers: 0
- Forks: 1
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README

# prom2parquet
Remote write target for Prometheus that saves metrics to parquet files
**⚠️ This should be considered an alpha project. ⚠️**
In particular, the schema for the saved Parquet files is likely to change in the future.
## Overview
The prom2parquet remote write endpoint for Prometheus listens for incoming datapoints from Prometheus and saves them to
[parquet files](https://parquet.apache.org) in a user-configurable location. This can either (currently) be pod-local
storage or to an AWS S3 bucket. Metrics are saved in the following directory structure:
```
/data///2024022021.parquet
```
Each file for a particular metric will have the same schema, but different metrics may have different schemas. At a
minimum, each file has a `timestamp` and a `value` column, and a variety of other extracted columns corresponding to the
labels on the Prometheus timeseries. They also have a "catch-all" `labels` column to contain other unextracted columns.
## Usage
```
Usage:
prom2parquet [flags]
Flags:
--backend backend supported remote backends for saving parquet files
(valid options: none, s3/aws) (default local)
--backend-root string root path/location for the specified backend (e.g. bucket name for AWS S3)
(default "/data")
-h, --help help for prom2parquet
--prefix string directory prefix for saving parquet files
-p, --server-port int port for the remote write endpoint to listen on (default 1234)
-v, --verbosity verbosity log level (valid options: debug, error, fatal, info, panic, trace, warning/warn)
(default info)
```
Here is a brief overview of the options:
### backend
Where to store the Parquet files;; currently supports pod-local storage and AWS S3.
### backend-root
"Root" location for the backend storage. For pod-local storage this is the base directory, for AWS S3 this is the
bucket name.
### prefix
This option provides a prefix that can be used to differentiate between metrics collections.
### server-port
What port prom2parquet should listen on for timeseries data from Prometheus.
## Configuring Prometheus
Prometheus needs to know where to send timeseries data. You can include this block in your Prometheus's `config.yml`:
```yaml
remote_write:
- url: http://prom2parquet-svc.monitoring:1234/receive
remote_timeout: 30s
```
Alternately, if you're using the [Prometheus operator](https://prometheus-operator.dev), you can add this configuration
to your Prometheus custom resource:
```yaml
spec:
remoteWrite:
- url: http://prom2parquet-svc.monitoring:1234/receive
```
## Contributing
We welcome any and all contributions to prom2parquet project! Please open a pull request.
### Development
To set up your development environment, run `git submodule init && git submodule update` and `make setup`. To build
`prom2parquet`, run `make build`.
This project uses [🔥Config](https://github.com/acrlabs/fireconfig) to generate Kubernetes manifests from definitions
located in `./k8s/`. If you want to use this mechanism for deploying prom2parquet, you can just type `make` to build
the executable, create and push the Docker images, and deploy to the configured Kubernetes cluster.
All build artifacts are placed in the `.build/` subdirectory. You can remove this directory or run `make clean` to
clean up.
### Testing
Run `make test` to run all the unit/integration tests. If you want to test using pod-local storage, and you want to
flush the Parquet files to disk without terminating the pod (e.g., so you can copy them elsewhere), you can send the
process a SIGUSR1:
```
> kubectl exec prom2parquet-pod -- kill -s SIGUSR1
> kubectl cp prom2parquet-pod:/path/to/files ./
```
### Code of Conduct
Applied Computing Research Labs has a strict code of conduct we expect all contributors to adhere to. Please read the
[full text](https://github.com/acrlabs/simkube/blob/master/CODE_OF_CONDUCT.md) so that you understand the expectations
upon you as a contributor.
### Copyright and Licensing
SimKube is licensed under the [MIT License](https://github.com/acrlabs/simkube/blob/master/LICENSE). Contributors to
this project agree that they own the copyrights to all contributed material, and agree to license your contributions
under the same terms. This is "inbound=outbound", and is the [GitHub
default](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service#6-contributions-under-repository-license).
> [!WARNING]
> Due to the uncertain nature of copyright and IP law, this repository does not accept contributions that have been all
> or partially generated with GitHub Copilot or other LLM-based code generation tools. Please disable any such tools
> before authoring changes to this project.