An open API service indexing awesome lists of open source software.

https://github.com/hogwai/spring-boot-postgresql-copy

Several ways to leverage the COPY command of PostgreSQL (WIP)
https://github.com/hogwai/spring-boot-postgresql-copy

copy-command java-25 pgbulkinsert postgresql springboot sql

Last synced: 27 days ago
JSON representation

Several ways to leverage the COPY command of PostgreSQL (WIP)

Awesome Lists containing this project

README

          

# Using the PostgreSQL COPY command with Spring Boot

This repository demonstrates how to leverage the PostgreSQL `COPY` command for high-performance bulk inserts in a Spring
Boot application.

Two approaches are compared against standard JPA `saveAll()`:

| Strategy | 10 000 rows | Speedup |
|----------------------------|-------------|----------|
| JPA `saveAll()` | ~25s | baseline |
| **CopyManager** (custom) | ~1.1s | **x22** |
| **pgbulkinsert** (library) | ~1.6s | **x15** |

## Getting started

### Prerequisites

- Java 25+
- Docker

### Run

```bash
docker compose up -d
./gradlew bootRun
```

### API

```
GET /customers # list all customers
POST /customers/copy?customerNumber=10000 # insert via CopyManager
POST /customers/bulk?customerNumber=10000 # insert via pgbulkinsert
```

Both POST endpoints return an `InsertResult` with the row count and elapsed time:

```json
{
"rowCount": 10000,
"elapsedSeconds": 1.153
}
```

### Tests

```bash
docker compose up -d
./gradlew test
```

Unit tests run without a database. Integration benchmarks (`InsertBenchmarkIT`) require the PostgreSQL container.

---

## Approach 1 — Custom CopyManager implementation

### Architecture

The custom implementation is built around four components:

**`CopyMapper`** — defines the mapping contract for any entity:

```java
public interface CopyMapper {
String[] toCopyRow(T entity);

String tableName();

List columns();
}
```

**`CopyOptions`** — configurable COPY options (format, delimiter, null string, batch size):
```java
CopyOptions options = CopyOptions.builder()
.format(CopyOptions.Format.CSV)
.delimiter(",")
.batchSize(5000)
.build();
```

**`CopyInputStream`** — streams rows on-the-fly without buffering the entire dataset in memory. Only the current row
is held in memory at any time.

**`CopyUtils`** — orchestrates the COPY operation:
```java

@Component
public class CopyUtils {

public void insertWithCopy(CopyMapper mapper, List entities) { ...}

public void insertWithCopy(CopyMapper mapper, List entities, CopyOptions options) { ...}
}
```

### Key design decisions

- **Spring transaction participation** — connections are obtained via `DataSourceUtils.getConnection()` so the COPY
operation participates in the current `@Transactional` context
- **Streaming** — `CopyInputStream` feeds rows to the `CopyManager` one by one, avoiding loading the entire payload into
memory
- **Batching** — when `batchSize > 0`, entities are split into chunks and each chunk is sent in a separate COPY call
- **SQL injection prevention** — table and column names are quoted with `"` identifiers
- **Escaping** — `escapeTextValue()` handles `\`, `\t`, `\n`, `\r`, `\b`, `\f` and null; `escapeCsvValue()` handles CSV
double-quote escaping

### Usage example

```java
private static final CopyMapper CUSTOMER_MAPPER = new CopyMapper<>() {
@Override
public String[] toCopyRow(Customer customer) {
return new String[]{
String.valueOf(customer.getId()),
CopyUtils.escapeTextValue(customer.getFirstName()),
CopyUtils.escapeTextValue(customer.getLastName()),
customer.getCreationDate().toString()
};
}

@Override
public String tableName() {
return "customer";
}

@Override
public List columns() {
return List.of("id", "first_name", "last_name", "creation_date");
}
};

// Simple call with default options (TEXT format, tab delimiter)
copyUtils.

insertWithCopy(CUSTOMER_MAPPER, customers);

// Or with custom options
CopyOptions options = CopyOptions.builder()
.format(CopyOptions.Format.CSV)
.delimiter(",")
.batchSize(5000)
.build();
copyUtils.

insertWithCopy(CUSTOMER_MAPPER, customers, options);
```

---

## Approach 2 — [PgBulkInsert](https://github.com/PgBulkInsert/PgBulkInsert) library

A higher-level alternative using the `pgbulkinsert` library. Define a mapping:

```java
public class CustomerMapping extends AbstractMapping {
public CustomerMapping() {
super("public", "customer");
mapLong("id", Customer::getId);
mapText("first_name", Customer::getFirstName);
mapText("last_name", Customer::getLastName);
mapText("country", Customer::getCountry);
mapText("address", Customer::getAddress);
mapText("city", Customer::getCity);
mapTimeStamp("creation_date", Customer::getCreationDate);
mapTimeStamp("update_date", Customer::getUpdateDate);
}
}
```

Then use it:

```java
PgBulkInsert bulkInsert = new PgBulkInsert<>(new CustomerMapping());

try(
Connection conn = dataSource.getConnection()){
bulkInsert.

saveAll(PostgreSqlUtils.getPGConnection(conn),customers);
}
```

---

## Project structure

```
src/main/java/.../
├── controller/
│ └── CustomerController.java # REST endpoints
├── model/
│ ├── Customer.java # JPA entity
│ └── CustomerMapping.java # pgbulkinsert mapping
├── repository/
│ ├── CustomerRepository.java # JPA + custom repos
│ ├── CustomerCopyRepository.java # CopyManager interface
│ ├── PgbulkinsertCopyRepository.java # pgbulkinsert interface
│ ├── impl/
│ │ ├── CustomerCopyRepositoryImpl.java
│ │ └── PgbulkinsertCopyRepositoryImpl.java
│ └── util/
│ ├── CopyUtils.java # COPY orchestration
│ ├── CopyMapper.java # Entity-to-row contract
│ ├── CopyOptions.java # COPY configuration
│ ├── CopyInputStream.java # Streaming InputStream
│ └── CopyInsertException.java # Dedicated exception
├── service/
│ ├── CustomerService.java
│ ├── InsertResult.java # Response DTO
│ └── impl/
│ └── CustomerServiceImpl.java
└── util/
├── CustomerFactory.java # Test data generator
└── StringUtil.java
```