https://github.com/PgBulkInsert/PgBulkInsert
Java library for efficient Bulk Inserts to PostgreSQL using the Binary COPY Protocol.
https://github.com/PgBulkInsert/PgBulkInsert
bulk-inserts postgresql
Last synced: 7 months ago
JSON representation
Java library for efficient Bulk Inserts to PostgreSQL using the Binary COPY Protocol.
- Host: GitHub
- URL: https://github.com/PgBulkInsert/PgBulkInsert
- Owner: PgBulkInsert
- License: mit
- Created: 2016-02-02T20:05:59.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2025-05-25T23:40:19.000Z (10 months ago)
- Last Synced: 2025-08-17T10:42:12.681Z (7 months ago)
- Topics: bulk-inserts, postgresql
- Language: Java
- Homepage:
- Size: 2.08 MB
- Stars: 156
- Watchers: 6
- Forks: 35
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-java - PgBulkInsert
README
# PgBulkInsert #
[MIT License]: https://opensource.org/licenses/MIT
[COPY command]: http://www.postgresql.org/docs/current/static/sql-copy.html
[PgBulkInsert]: https://github.com/bytefish/PgBulkInsert
[Npgsql]: https://github.com/npgsql/npgsql

PgBulkInsert is a Java library for Bulk Inserts to PostgreSQL using the Binary COPY Protocol.
It provides a wrapper around the PostgreSQL [COPY command]:
> The [COPY command] is a PostgreSQL specific feature, which allows efficient bulk import or export of
> data to and from a table. This is a much faster way of getting data in and out of a table than using
> INSERT and SELECT.
This project wouldn't be possible without the great [Npgsql] library, which has a beautiful implementation of the Postgres protocol.
## Setup ##
[PgBulkInsert] is available in the Central Maven Repository.
You can add the following dependencies to your pom.xml to include [PgBulkInsert] in your project.
```xml
de.bytefish
pgbulkinsert
8.1.4
```
## Supported PostgreSQL Types ##
* [Numeric Types](http://www.postgresql.org/docs/current/static/datatype-numeric.html)
* smallint
* integer
* bigint
* real
* double precision
* numeric
* [Date/Time Types](http://www.postgresql.org/docs/current/static/datatype-datetime.html)
* timestamp
* timestamptz
* date
* time
* interval
* [Character Types](http://www.postgresql.org/docs/current/static/datatype-character.html)
* text
* [JSON Types](https://www.postgresql.org/docs/current/static/datatype-json.html)
* jsonb
* [Boolean Type](http://www.postgresql.org/docs/current/static/datatype-boolean.html)
* boolean
* [Binary Data Types](http://www.postgresql.org/docs/current/static/datatype-binary.html)
* bytea
* [Network Address Types](http://www.postgresql.org/docs/current/static/datatype-net-types.html)
* inet (IPv4, IPv6)
* macaddr
* [UUID Type](http://www.postgresql.org/docs/current/static/datatype-uuid.html)
* uuid
* [Array Type](https://www.postgresql.org/docs/current/static/arrays.html)
* One-Dimensional Arrays
* [Range Type](https://www.postgresql.org/docs/current/rangetypes.html)
* int4range
* int8range
* numrange
* tsrange
* tstzrange
* daterange
* [hstore](https://www.postgresql.org/docs/current/static/hstore.html)
* hstore
* [Geometric Types](https://www.postgresql.org/docs/current/static/datatype-geometric.html)
* point
* line
* lseg
* box
* path
* polygon
* circle
## Usage ##
You can use the [PgBulkInsert] API in various ways. The first one is to use the ``SimpleRowWriter`` when you don't have
an explicit Java POJO, that matches a Table. The second way is to use an ``AbstractMapping`` to define a
mapping between a Java POJO and a PostgreSQL table.
Please also read the FAQ, which may answer some of your questions.
## Using the SimpleRowWriter ##
Using the ``SimpleRowWriter`` doesn't require you to define a separate mapping. It requires you to define the PostgreSQL table structure using
a ``SimpleRowWriter.Table``, that has a schema name (optional), table name and column names:
```java
// Schema of the Table:
String schemaName = "sample";
// Name of the Table:
String tableName = "row_writer_test";
// Define the Columns to be inserted:
String[] columnNames = new String[] {
"value_int",
"value_text"
};
// Create the Table Definition:
SimpleRowWriter.Table table = new SimpleRowWriter.Table(schemaName, tableName, columnNames);
```
Once created you create the ``SimpleRowWriter`` by using the ``Table`` and a ``PGConnection``.
Now to write a row to PostgreSQL you call the ``startRow`` method. It expects you to pass a
``Consumer`` into it, which defines what data to write to the row. The call to
``startRow`` is synchronized, so it is safe to be called from multiple threads.
```java
// Create the Writer:
try(SimpleRowWriter writer = new SimpleRowWriter(table, pgConnection)) {
// ... write your data rows:
for(int rowIdx = 0; rowIdx < 10000; rowIdx++) {
// ... using startRow and work with the row, see how the order doesn't matter:
writer.startRow((row) -> {
row.setText("value_text", "Hi");
row.setInteger("value_int", 1);
});
}
}
```
So the complete example looks like this:
```java
public class SimpleRowWriterTest extends TransactionalTestBase {
// ...
@Test
public void rowBasedWriterTest() throws SQLException {
// Get the underlying PGConnection:
PGConnection pgConnection = PostgreSqlUtils.getPGConnection(connection);
// Schema of the Table:
String schemaName = "sample";
// Name of the Table:
String tableName = "row_writer_test";
// Define the Columns to be inserted:
String[] columnNames = new String[] {
"value_int",
"value_text"
};
// Create the Table Definition:
SimpleRowWriter.Table table = new SimpleRowWriter.Table(schemaName, tableName, columnNames);
// Create the Writer:
try(SimpleRowWriter writer = new SimpleRowWriter(table, pgConnection)) {
// ... write your data rows:
for(int rowIdx = 0; rowIdx < 10000; rowIdx++) {
// ... using startRow and work with the row, see how the order doesn't matter:
writer.startRow((row) -> {
row.setText("value_text", "Hi");
row.setInteger("value_int", 1);
});
}
}
// Now assert, that we have written 10000 entities:
Assert.assertEquals(10000, getRowCount());
}
}
```
If you need to customize the Null Character Handling, then you can use the ``setNullCharacterHandler(Function nullCharacterHandler)`` function.
## Using the AbstractMapping ##
The ``AbstractMapping`` is the second possible way to map a POJO for usage in PgBulkInsert. Imagine we want to bulk insert a large amount of people
into a PostgreSQL database. Each ``Person`` has a first name, a last name and a birthdate.
### Database Table ###
The table in the PostgreSQL database might look like this:
```sql
CREATE TABLE sample.unit_test
(
first_name text,
last_name text,
birth_date date
);
```
### Domain Model ###
The domain model in the application might look like this:
```java
private class Person {
private String firstName;
private String lastName;
private LocalDate birthDate;
public Person() {}
public String getFirstName() {
return firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
public String getLastName() {
return lastName;
}
public void setLastName(String lastName) {
this.lastName = lastName;
}
public LocalDate getBirthDate() {
return birthDate;
}
public void setBirthDate(LocalDate birthDate) {
this.birthDate = birthDate;
}
}
```
### Bulk Inserter ###
Then you have to implement the ``AbstractMapping``, which defines the mapping between the table and the domain model.
```java
public class PersonMapping extends AbstractMapping
{
public PersonMapping() {
super("sample", "unit_test");
mapText("first_name", Person::getFirstName);
mapText("last_name", Person::getLastName);
mapDate("birth_date", Person::getBirthDate);
}
}
```
This mapping is used to create the ``PgBulkInsert``:
```java
PgBulkInsert bulkInsert = new PgBulkInsert(new PersonMapping());
```
### Using the Bulk Inserter ###
[IntegrationTest.java]: https://github.com/bytefish/PgBulkInsert/blob/master/PgBulkInsert/pgbulkinsert-core/src/test/java/de/bytefish/pgbulkinsert/integration/IntegrationTest.java
And finally we can write a Unit Test to insert ``100000`` people into the database. You can find the entire Unit Test on GitHub as [IntegrationTest.java].
```java
@Test
public void bulkInsertPersonDataTest() throws SQLException {
// Create a large list of People:
List personList = getPersonList(100000);
// Create the BulkInserter:
PgBulkInsert bulkInsert = new PgBulkInsert(new PersonMapping(schema));
// Now save all entities of a given stream:
bulkInsert.saveAll(PostgreSqlUtils.getPGConnection(connection), personList.stream());
// And assert all have been written to the database:
Assert.assertEquals(100000, getRowCount());
}
private List getPersonList(int num) {
List personList = new ArrayList<>();
for (int pos = 0; pos < num; pos++) {
Person p = new Person();
p.setFirstName("Philipp");
p.setLastName("Wagner");
p.setBirthDate(LocalDate.of(1986, 5, 12));
personList.add(p);
}
return personList;
}
```
## FAQ ##
### How can I write Primitive Types (``boolean``, ``float``, ``double``)? ###
By default methods like ``mapBoolean`` map the boxed type ``Boolean``, ``Integer``, ``Long``. This might be problematic
if you need to squeeze out the last seconds when doing bulk inserts, see Issue:
* [https://github.com/PgBulkInsert/PgBulkInsert/issues/93](https://github.com/PgBulkInsert/PgBulkInsert/issues/93)
So for every data type that also has a primitive type, you can add a "Primitive" suffix to the method name like:
* ```mapBooleanPrimitive``
This will use the primitive type and prevent boxing and unboxing of values.
### How can I write a ``java.sql.Timestamp``? ###
You probably have Java classes with a ``java.sql.Timestamp`` in your application. Now if you use the ``AbstractMapping`` or a ``SimpleRowWriter`` it expects a ``LocalDateTime``. Here is how to map a ``java.sql.Timestamp``.
Imagine you have an ``EMail`` class with a property ``emailCreateTime``, that is using a ``java.sql.Timestamp`` to
represent the time. The column name in Postgres is ``email_create_time`` and you are using a ``timestamp`` data type.
To map the ``java.sql.Timestamp`` you would write the ``mapTimeStamp`` method like this:
```java
mapTimeStamp("email_create_time", x -> x.getEmailCreateTime() != null ? x.getEmailCreateTime().toLocalDateTime() : null);
```
And here is the complete example:
```java
public class EMail {
private Timestamp emailCreateTime;
public Timestamp getEmailCreateTime() {
return emailCreateTime;
}
}
public static class EMailMapping extends AbstractMapping
{
public EMailMapping(String schema) {
super(schema, "unit_test");
mapTimeStamp("email_create_time", x -> x.getEmailCreateTime() != null ? x.getEmailCreateTime().toLocalDateTime() : null);
}
}
```
### Handling Null Characters or... 'invalid byte sequence for encoding "UTF8": 0x00' ###
If you see the error message ``invalid byte sequence for encoding "UTF8": 0x00`` your data contains Null Characters. Although ``0x00`` is totally valid UTF-8... PostgreSQL does not support writing it, because it uses C-style string termination internally.
PgBulkInsert allows you to enable a Null Value handling, that removes all ``0x00`` occurences and replaces them with an empty string:
```java
// Create the Table Definition:
SimpleRowWriter.Table table = new SimpleRowWriter.Table(schema, tableName, columnNames);
// Create the Writer:
SimpleRowWriter writer = new SimpleRowWriter(table);
// Enable the Null Character Handler:
writer.enableNullCharacterHandler();
```
## Running the Tests ##
Running the Tests requires a PostgreSQL database.
You have to configure the test database connection in the module ``pgbulkinsert-core`` and file ``db.properties``:
```ini
db.url=jdbc:postgresql://127.0.0.1:5432/sampledb
db.user=philipp
db.password=test_pwd
db.schema=public
```
The tests are transactional, that means any test data will be rolled back once a test finishes. But it probably makes
sense to set up a separate ``db.schema`` for your tests, if you want to avoid polluting the ``public`` schema or have
different permissions.
## License ##
PgBulkInsert is released with under terms of the [MIT License]:
* [https://github.com/bytefish/PgBulkInsert](https://github.com/bytefish/PgBulkInsert)
## Resources ##
* [Npgsql](https://github.com/npgsql/npgsql)
* [Postgres on the wire - A look at the PostgreSQL wire protocol (PGCon 2014)](https://www.pgcon.org/2014/schedule/attachments/330_postgres-for-the-wire.pdf)