https://github.com/doziestar/datavinci
DataVinci enables you to visualize data from various sources, generate insights, analyze data with AI models, and receive real-time updates on anomalies
data golang logs pipeline
- Host: GitHub
- URL: https://github.com/doziestar/datavinci
- Owner: doziestar
- License: mit
- Created: 2024-06-30T09:52:25.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-07T14:48:40.000Z (almost 2 years ago)
- Last Synced: 2024-07-07T16:06:49.086Z (almost 2 years ago)
- Topics: data, golang, logs, pipeline
- Language: Go
- Homepage: https://datavinci.so
- Size: 11 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
# DataVinci

DataVinci is a comprehensive data management and visualization tool designed for the developer community. It enables users to visualize data from various sources, generate insights, analyze data with AI models, and receive real-time updates on anomalies.
[Code coverage report (Codecov)](https://codecov.io/github/doziestar/datavinci)
## Table of Contents
- [Features](#features)
- [Architecture](#architecture)
- [Getting Started](#getting-started)
- [Development](#development)
- [Deployment](#deployment)
- [Contributing](#contributing)
- [Pre-commit Hooks](#pre-commit-hooks)
- [License](#license)
## Features
- Multi-source data integration (PostgreSQL, MongoDB, Cassandra, Elasticsearch, various logs)
- Interactive data visualization with customizable dashboards
- AI-powered data analysis and anomaly detection
- Real-time data processing and alerts
- Cloud resource management and visualization (e.g., Amazon S3)
- Report generation and scheduling
- Collaboration features with version control
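To make the anomaly detection feature more concrete, here is a minimal sketch of one common statistical approach, a z-score check. It is not taken from the DataVinci codebase: the function name `zScoreAnomalies`, the threshold, and the sample data are all assumptions chosen for illustration, and the project's AI models may work quite differently.

```go
package main

import (
	"fmt"
	"math"
)

// zScoreAnomalies returns the indices of values that lie more than
// `threshold` standard deviations away from the mean of the series.
// Hypothetical helper for illustration only.
func zScoreAnomalies(values []float64, threshold float64) []int {
	if len(values) < 2 {
		return nil
	}

	var sum float64
	for _, v := range values {
		sum += v
	}
	mean := sum / float64(len(values))

	var sqDiff float64
	for _, v := range values {
		sqDiff += (v - mean) * (v - mean)
	}
	stdDev := math.Sqrt(sqDiff / float64(len(values)))
	if stdDev == 0 {
		return nil // all values are identical, so nothing stands out
	}

	var anomalies []int
	for i, v := range values {
		if math.Abs(v-mean)/stdDev > threshold {
			anomalies = append(anomalies, i)
		}
	}
	return anomalies
}

func main() {
	// Made-up request latencies in milliseconds with one obvious spike.
	latenciesMs := []float64{12, 11, 13, 12, 14, 11, 95, 12, 13}
	fmt.Println(zScoreAnomalies(latenciesMs, 2.0)) // prints [6]
}
```

A global z-score is easy to reason about but is sensitive to the outliers it is trying to find; a rolling window or a median-based score is usually a better fit for streaming data.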
## Architecture
DataVinci follows a microservices architecture for scalability and maintainability. Here's a high-level overview of the system:
```mermaid
graph TB
A[Web UI] --> B[API Gateway]
B --> C[Authentication Service]
B --> D[Data Source Service]
B --> E[Visualization Service]
B --> F[Report Service]
B --> G[AI Analysis Service]
B --> H[Real-time Processing Service]
D --> I[Data Connectors]
I --> J[(Various Data Sources)]
E & F & G & H --> K[Data Processing Engine]
K --> L[(Data Lake/Warehouse)]
M[Background Jobs] --> K
```
## Getting Started
### Prerequisites
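- Go 1.18+ (Go workspaces and the `go.work` file require Go 1.18 or later)
- Node.js 14+ and Yarn
- Rust toolchain with the Tauri CLI (`cargo tauri`), for the desktop frontend
- Docker and Docker Compose
- Kubernetes cluster (for production deployment)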
### Installation
1. Clone the repository:
```bash
git clone https://github.com/doziestar/datavinci.git
cd datavinci
```
2. Install the required dependencies:
```bash
go mod download
cd web && yarn install && cd ..
```
3. Set up the environment variables:
```bash
cp .env.example .env
# Edit .env with your configuration
```
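4. Start the backing services, the backend, and the frontend. The last two commands run in the foreground, so use a separate terminal for each:
```bash
# Start the backing services in the background
docker-compose up -d
# Terminal 1: run the backend
go run cmd/datavinci/main.go
# Terminal 2: run the Tauri frontend in development mode
cd web/src-tauri && cargo tauri dev
```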
5. Access the web UI at `http://localhost:3000`.
## Development
DataVinci uses a monorepo structure with Go workspaces (go.work) for backend services and Next.js with Tauri for the frontend.
### Folder Structure
```text
datavinci/
├── cmd/
│   └── datavinci/
│       └── main.go
├── internal/
│   ├── auth/
│   ├── datasource/
│   ├── visualization/
│   ├── report/
│   ├── ai/
│   └── realtime/
├── pkg/
│   ├── common/
│   └── models/
├── web/
│   ├── components/
│   ├── pages/
│   └── public/
├── deployments/
│   ├── docker/
│   └── k8s/
├── scripts/
├── tests/
├── go.work
├── go.mod
├── go.sum
├── package.json
├── docker-compose.yml
├── Dockerfile
└── README.md
```
### Service Communication
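The backend services communicate with each other using gRPC. The API Gateway acts as a reverse proxy for the frontend and forwards each request to the appropriate service. In addition, a message broker, fed by an event store, delivers events to the Data Source, Visualization, Report, AI Analysis, and Real-time Processing services.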
```mermaid
graph TB
Client[Client] --> APIGateway[API Gateway]
subgraph "Service Mesh"
APIGateway --> Auth[Authentication Service]
APIGateway --> DataSource[Data Source Service]
APIGateway --> Visualization[Visualization Service]
APIGateway --> Report[Report Service]
APIGateway --> AI[AI Analysis Service]
APIGateway --> RealTime[Real-time Processing Service]
end
Auth -.->|gRPC| DataSource
DataSource -.->|gRPC| Visualization
Visualization -.->|gRPC| Report
DataSource -.->|gRPC| AI
DataSource -.->|gRPC| RealTime
MessageBroker[Message Broker] --> DataSource
MessageBroker --> Visualization
MessageBroker --> Report
MessageBroker --> AI
MessageBroker --> RealTime
EventStore[(Event Store)] --> MessageBroker
DataSource --> DB[(Data Sources)]
RealTime --> DB
```
### Testing
Run the tests with:
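```bash
# Run all Go tests and the frontend tests
go test ./...
cd web && yarn test && cd ..
# Or run a single package with the race detector and a coverage profile
go test -v -race -coverprofile=pkg/coverage.txt -covermode=atomic ./internal/auth/...
```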
To ensure that the code meets our standards, run the pre-commit hooks:
```bash
pre-commit run --all-files
```
### Linting
Lint the Go code with:
```bash
golangci-lint run
```
Lint the JavaScript code with:
```bash
cd web && yarn lint && cd ..
```
## Deployment
DataVinci can be deployed on any cloud provider or on-premises infrastructure. For production deployments, we recommend using Kubernetes with Helm charts.
### Docker
Build the Docker image with:
```bash
docker build -t datavinci:latest .
```
### Kubernetes
Deploy the application on a Kubernetes cluster with:
```bash
kubectl apply -f deployments/k8s
```
### Helm
Install the Helm chart with:
```bash
helm install datavinci deployments/helm
```
## Contributing
Contributions are welcome! Please read the [contributing guidelines](CONTRIBUTING.md) before submitting a pull request.
## Pre-commit Hooks
We use pre-commit hooks to ensure code quality and consistency. These hooks run automatically before each commit, checking your changes against our coding standards and running various linters.
### Setup
1. Install pre-commit:
```bash
pip install pre-commit
```
2. Install the git hook scripts:
```bash
pre-commit install
```
### Running pre-commit
The hooks will run automatically on `git commit`. If you want to run the hooks manually (for example, to test them or run them on all files), you can use:
```bash
pre-commit run --all-files
```
### Our pre-commit hooks
We use the following hooks:
- **For Go:**
  - `go-fmt`: Formats Go code
  - `go-vet`: Reports suspicious constructs
  - `go-imports`: Updates import lines
  - `go-cyclo`: Checks function complexity
  - `golangci-lint`: Runs multiple Go linters
  - `go-critic`: Provides extensive code analysis
  - `go-unit-tests`: Runs Go unit tests
  - `go-build`: Checks if the code builds
  - `go-mod-tidy`: Runs `go mod tidy`
- **Ensure that the tools behind the following hooks are installed:**
  - `golangci-lint`
  - `go-critic`
  - `go-cyclo`
  - `go-unit-tests`
  - `go-build`
  - `go-mod-tidy`
```bash
go install github.com/fzipp/gocyclo/cmd/gocyclo@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
go install github.com/go-critic/go-critic/cmd/gocritic@latest
go install github.com/hexdigest/gounit/cmd/gounit@latest
go install github.com/securego/gosec/v2/cmd/gosec@latest
```
- **For TypeScript/JavaScript:**
  - `prettier`: Formats code
  - `eslint`: Lints JavaScript and TypeScript code
- **General:**
  - `trailing-whitespace`: Trims trailing whitespace
  - `end-of-file-fixer`: Ensures files end with a newline
  - `check-yaml`: Checks that YAML files parse
  - `check-added-large-files`: Prevents large files from being committed
### Skipping hooks
If you need to bypass the pre-commit hooks (not recommended), you can use:
```bash
git commit -m "Your commit message" --no-verify
```
However, please use this sparingly and ensure your code still meets our standards.
### Updating hooks
To update the pre-commit hooks to the latest versions, run:
```bash
pre-commit autoupdate
```
Then commit the changes to `.pre-commit-config.yaml`.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.