{"id":22707084,"url":"https://github.com/fdifrison/java-persistence","last_synced_at":"2025-03-29T20:44:43.932Z","repository":{"id":266666878,"uuid":"898982595","full_name":"fdifrison/java-persistence","owner":"fdifrison","description":null,"archived":false,"fork":false,"pushed_at":"2025-03-26T14:22:38.000Z","size":1588,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T15:37:04.872Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fdifrison.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-05T12:02:17.000Z","updated_at":"2025-03-26T14:22:42.000Z","dependencies_parsed_at":"2024-12-27T17:30:28.272Z","dependency_job_id":"38ae754a-1734-4485-83e9-147066652a9c","html_url":"https://github.com/fdifrison/java-persistence","commit_stats":null,"previous_names":["fdifrison/java-persistence"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-persistence","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-persistence/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-persistence/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdifrison%2Fjava-persistence/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fdifrison","download_url":"https
://codeload.github.com/fdifrison/java-persistence/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246243564,"owners_count":20746307,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-10T10:11:25.559Z","updated_at":"2025-03-29T20:44:43.924Z","avatar_url":"https://github.com/fdifrison.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1\u003eJava Persistence\u003c/h1\u003e\n\nBest practices in Java/Spring JPA persistence,\nfollowing Vlad Mihalcea's courses https://vladmihalcea.com/courses/ and blog posts https://vladmihalcea.com/blog/\n\n\u003ch2\u003eOrganization\u003c/h2\u003e\n\n- _**entity-relationship**_ contains tests on the best and worst practices for coding entity mappings.\n- _**inheritance**_ shows how and when to use polymorphism in the persistence layer (read: single-table-inheritance)\n\nN.B.\nSeparate databases and changelogs are used to provide a representative environment for each scenario. N.B. 
Be sure to activate\nthe correct Spring profile (it has to match the one requested in the context of each class).\n\n---\n\n# Theory\n\n\u003c!-- TOC --\u003e\n\n* [Theory](#theory)\n* [ACID](#acid)\n    * [Atomicity](#atomicity)\n    * [Consistency](#consistency)\n        * [CAP theorem](#cap-theorem)\n    * [Durability](#durability)\n    * [Isolation (added il SQL 92)](#isolation-added-il-sql-92)\n        * [Concurrency control](#concurrency-control)\n        * [MVCC](#mvcc)\n            * [Amdahl's law](#amdahls-law)\n    * [Phenomena](#phenomena)\n* [Locks](#locks)\n    * [Explicit locks](#explicit-locks)\n        * [Hibernate-specific LockOptions](#hibernate-specific-lockoptions)\n        * [SKIP_LOCKED](#skip_locked)\n        * [Advisory lock](#advisory-lock)\n* [Connections](#connections)\n* [Persistence Context in JPA and Hibernate](#persistence-context-in-jpa-and-hibernate)\n    * [Caching](#caching)\n    * [Entity state transitions](#entity-state-transitions)\n        * [JPA EntityManager](#jpa-entitymanager)\n        * [Hibernate Session](#hibernate-session)\n            * [JPA merge vs Hibernate update](#jpa-merge-vs-hibernate-update)\n    * [Dirty checking](#dirty-checking)\n        * [Bytecode enhancement](#bytecode-enhancement)\n    * [Hydration -\u003e read-by-name (Hibernate \u003c 6.0)](#hydration---read-by-name-hibernate--60)\n    * [Flushing](#flushing)\n        * [AUTO flushing mode](#auto-flushing-mode)\n        * [Flushing in batch processing](#flushing-in-batch-processing)\n    * [Events and event listener](#events-and-event-listener)\n* [SQL Statements: lifecycle, execution plan and caching](#sql-statements-lifecycle-execution-plan-and-caching)\n    * [Execution plan cache](#execution-plan-cache)\n    * [Prepared statement](#prepared-statement)\n    * [Client-Side vs. 
Server-Side statement caching:](#client-side-vs-server-side-statement-caching)\n* [Fetching](#fetching)\n    * [Fetching associations](#fetching-associations)\n        * [N+1 query problem](#n1-query-problem)\n        * [Fetching multiple collections](#fetching-multiple-collections)\n    * [Open Session in View](#open-session-in-view)\n* [Projections](#projections)\n    * [JPA projections](#jpa-projections)\n        * [Tuple](#tuple)\n        * [DTO](#dto)\n            * [mapping native SQL queries](#mapping-native-sql-queries)\n    * [Hibernate projections](#hibernate-projections)\n    * [Bets approach for projecting parent-child relationship](#bets-approach-for-projecting-parent-child-relationship)\n* [Batching in Hibernate](#batching-in-hibernate)\n    * [Bulking operations](#bulking-operations)\n    * [Batching in cascade](#batching-in-cascade)\n        * [DELETE cascade](#delete-cascade)\n        * [Batching on versioned entity](#batching-on-versioned-entity)\n    * [Default UPDATE behavior](#default-update-behavior)\n* [Primary Keys and JPA identifiers](#primary-keys-and-jpa-identifiers)\n    * [JPA identifiers](#jpa-identifiers)\n* [Entity Relationship](#entity-relationship)\n    * [`@ManyToOne`](#manytoone)\n        * [bidirectional](#bidirectional)\n    * [Unidirectional `@OneToMany`](#unidirectional-onetomany)\n        * [join table](#join-table)\n            * [List vs Set Collections](#list-vs-set-collections)\n        * [`@JoinColumn`](#joincolumn)\n    * [`@OneToOne`](#onetoone)\n        * [unidirectional](#unidirectional)\n        * [bidirectional](#bidirectional-1)\n    * [`@ManyToMany`](#manytomany)\n        * [Explicit mapping](#explicit-mapping)\n* [JPA inheritance](#jpa-inheritance)\n    * [Single table inheritance](#single-table-inheritance)\n        * [`@DiscriminatorColumn` and `@DiscriminatorValue`](#discriminatorcolumn-and-discriminatorvalue)\n    * [Joined inheritance](#joined-inheritance)\n    * [Table per class](#table-per-class)\n    * 
[`@MappedSuperclass`](#mappedsuperclass)\n* [EnumType](#enumtype)\n* [Spring Data, JPA, and Hibernate Annotations Reference](#spring-data-jpa-and-hibernate-annotations-reference)\n    * [Entity Annotations](#entity-annotations)\n    * [Relationship Annotations](#relationship-annotations)\n    * [Inheritance Annotations](#inheritance-annotations)\n    * [Query Annotations](#query-annotations)\n    * [Spring Data Repository Annotations](#spring-data-repository-annotations)\n    * [Transaction Annotations](#transaction-annotations)\n    * [Auditing Annotations](#auditing-annotations)\n    * [Hibernate-Specific Annotations](#hibernate-specific-annotations)\n    * [Validation Annotations](#validation-annotations)\n\n\u003c!-- TOC --\u003e\n\n---\n\n# ACID\n\nTo better understand the ACID acronym, we should first look at how a relational database works.\n\n![](./images/acid/db.png)\n\nAt the very bottom we have the physical disk, where the data is stored in tables and indexes. The storage unit of a\ndatabase is not the byte but the page (usually 8 or 16 kilobytes). Operating on the physical disk is very slow\ncompared to RAM. For this reason, all databases (NoSQL included) have an intermediate layer of\n`In-memory buffers`, which caches a subset of the on-disk data. Pages are modified in memory, and only at flush time\nis the new state copied to disk (much like Hibernate, which stores a loaded entity in its first-level\ncache).\n\nBecause of limited I/O bandwidth, the database cannot write every transaction back to disk, since it would mean moving\na potentially huge amount of data very often (imagine an application that handles hundreds of transactions per second).\nFor example, in PostgreSQL the in-memory buffer (`shared_buffers`) is commonly sized at about 25% of the RAM. By default, automatic checkpoints, which\nflush the dirty pages to disk, are spaced up to 5 minutes apart (`checkpoint_timeout`). So what happens if we have a crash? 
Would we lose all the non-synchronized data in the cache?\n\nOf course not, because there is another layer of insurance: the `Redo log`. Every time\nwe commit a write, the database appends our changes to the Redo log. If the changes have\nnot yet been flushed to disk when a crash occurs, the database checks the redo log on restart and replays\nthe operations left behind.\n\nSimilarly, the `Undo log` is required to be able to roll back to the previous consistent state.\n\nAn exception is PostgreSQL, which doesn't use an append-only undo log. Instead, it uses a `multi-version approach`, which\nconsists of keeping multiple versions of the same record in the table itself. This allows faster rollbacks, since no diff\nhas to be computed and no log search is performed: we simply switch from one version of the record to another. The\ndownside is that obsolete row versions take up space, which has to be reclaimed regularly by the `VACUUM`\noperation. Otherwise, we may run into very disruptive behaviour. Postgres assigns each transaction a 32-bit XID\n(newer transactions always get a greater XID). In a high-performance application with very short\nand frequent transactions, we might exhaust the roughly four billion IDs allowed by the 32-bit size. If the\nVACUUM process is disabled, the XID counter wraps around to zero, making past transactions look like future ones and\nhiding their rows, which can effectively destroy the database (to prevent this, Postgres stops assigning new XIDs as wraparound approaches).\n\n## Atomicity\n\nAtomicity is the property of grouping a set of operations and executing them as a single `unit of work`: either all the\noperations succeed or, if even one fails, the whole unit of work fails (rollback). The database has to move from one\nconsistent state to another at the end of the atomic operation. 
Rolling back is the action required to return to the\nprevious consistent state when one of the operations in the transaction fails.\n\n## Consistency\n\nConsistency is the property ensuring that a transaction state change leaves the database in a valid state, without\nviolating the constraints described by the schema (column types, nullability, PK and FK constraints, etc.).\nAgain, if even one validation fails, the whole transaction is rolled back and the database is restored to its state prior to the\ntransaction.\n\n### CAP theorem\n\nThe CAP theorem states that when a distributed system encounters a network partition, it needs to choose between\nConsistency and Availability, and it can't have both. However, in this context Consistency has a different meaning than in\nACID, since it refers to an isolation guarantee called `linearizability`, i.e., the ability to always read the latest\nstate of a variable. This is not guaranteed in a distributed system where we can also read from follower nodes\nsubject to replication lag.\n\n## Durability\n\nDurability ensures that all committed transaction changes become permanent, a guarantee provided by the\n`redo log`.\n\nIn Postgres, the equivalent is the `WAL` (Write-Ahead Log), which can optionally be flushed asynchronously (`synchronous_commit = off`).\nBy default, the log entries are buffered in memory and flushed at every transaction commit. The cached table and index pages are not flushed at commit,\nsince their state can be restored from the WAL, which optimizes I/O utilization.\n\n## Isolation (added il SQL 92)\n\nDatabases are not meant to be accessed by only one user at a time.\nThey need to sustain multiple concurrent connections, so there is a need to ensure `Serializability`, meaning that\neven in a concurrent environment we need an outcome equivalent to a serial execution. 
Therefore, rules are required to\norchestrate the concurrent reads and writes so that conflicts don't compromise data integrity. 
Note that\nserializability itself is not concerned with time. We can think of it as the property ensuring that the **reads\nand writes of a user `A` are not interleaved with reads or writes of a user `B`, although the operations of `A` and `B` can\nstill be reordered in time.**\nLinearizability, instead, is about time: it concerns the ability to read the latest state of a variable.\nThe combination of serializability and linearizability is the gold standard of isolation levels,\n`Strict Serializability`. However, it almost always comes at an unbearable cost in a real production environment.\n\nIn practice, we always have to settle for a compromise that ensures a satisfactory level of isolation while\nstill enabling sufficient concurrency (most of the time we run at the `read committed` isolation level). Alternatively,\nwe can use a database like `VoltDB`, which works only in-memory and single-threaded, thus guaranteeing\nSerializability.\n\n![](./images/acid/isolation.png)\n\n### Concurrency control\n\nThe strategies an RDBMS applies to avoid data conflicts fall under the name of `concurrency control`. We can choose to\navoid conflicts completely by controlling access to shared resources (e.g. two-phase locking). Or we can choose to detect\nconflicts (e.g. MVCC, multi-version concurrency control), which gains performance (better concurrency) but\nrelaxes the serializability constraint and accepts possible data anomalies.\n\nSome locks are required whichever path we choose, and two-phase locking simply uses more of them. Also,\nlocking can happen at different levels of the database hierarchy, from rows to pages. Depending on the use case, the database\ncould replace multiple low-level row locks with a single higher-level page lock, since locks\ntake up resources. 
Two-phase locking requires a lot of waiting because, to achieve strict serializability, read locks\nblock write operations and write locks block read operations.\nAlthough each database system has its own lock hierarchy, the most common lock types are:\n\n* `shared lock` (read lock), which allows concurrent reads but prevents writes: if I acquire a read lock, others\n  can acquire it too, but no one can write to the record\n* `exclusive lock` (write lock), which prevents both reads and writes: nobody else can read or write the record\n  until my modifications are committed\n\nNo matter which strategy we choose, the database has to acquire an exclusive lock when a user modifies\na record. Therefore, if multiple users perform operations that eventually impact the same records, a\n`deadlock` can happen: each user holds a lock on data the other one needs (think of a parent post entity and\na child comment entity). If this happens, the database needs to kill one of the two transactions. This triggers a rollback\nand releases its locks, so that at least one transaction can complete. Usually, the criterion is the\ncost in terms of resources for the database: the victim is the transaction that requires less effort to roll back\n(e.g. the transaction that holds fewer locks is more likely to be killed). Some databases instead use time as the\ncriterion: the transaction that started first is the one that survives.\n\nIn Java, `synchronized` provides an `exclusive lock`, while a `shared lock` can be implemented with `ReadWriteLock` from the\nconcurrency API.\n\n### MVCC\n\nMost database systems nowadays use `Multi-Version Concurrency Control` by default to overcome the response\ntime and scalability issues inherent in the two-phase locking mechanism. 
The premise of MVCC is that readers and\nwriters do not block each other. The only source of contention is concurrent writers, which would otherwise undermine atomicity.\nEssentially, to prevent blocking, a previous version of a record can be rebuilt so that an uncommitted change can be\nhidden from concurrent readers. The name multi-version comes from the fact that at a certain point in time\nwe can have multiple versions of the same record. For example, PostgreSQL stores two additional system columns in every table row,\n`xmin` and `xmax`, which are used to control the visibility of the various row versions. An UPDATE statement marks the old\nrow version as deleted and inserts a new one with the updated values. Both versions coexist, so concurrent readers keep seeing the old one until the update commits. Other\ndatabase vendors keep only one version of the record at a time and store the changes caused by the update\nin the undo log.\n\n#### Amdahl's law\n\n![](./images/acid/amdahl.png)\n\nThe more locks our application needs to acquire, the less parallelization we can achieve. This is shown by\n`Amdahl's law` with the correction of `Neil Gunther` (the Universal Scalability Law), which states that the achievable throughput `C(N)` has a\nsaturation point. Beyond that point, increasing the number of threads or connections `N` won't increase performance. On the\ncontrary, performance will start deteriorating due to the cost of synchronization (coherency cost).\nThis is one of the reasons why two-phase locking has been abandoned: modern applications require a number of\ntransactions per second that is orders of magnitude greater than when the approach was developed in 1981.\nNowadays, only `SQL Server` still uses two-phase locking as the default mechanism.\n\n## Phenomena\n\nAbandoning strict serializability for a lower isolation level opens the door to `phenomena`, hence\nthe possibility of data inconsistency. 
The lower the isolation level, the more phenomena are allowed. This means that\n**we are shifting the responsibility of data integrity from the database to the application level**.\n\nThe phenomena are:\n\n* `dirty read`: a user reads a not-yet-committed write from another user. If the write operation aborts and rolls back, the\n  first user has read a value that no database record holds, since it was only temporary. This can\n  sometimes be a feature, for example, to read the intermediate state of a batch operation involving multiple rows, or\n  to monitor the progress of a batch process. If a read lock is used, dirty reads are not possible, but neither is\n  reading the intermediate state (dirty reads are allowed by the `read uncommitted` isolation level, but\n  not all database systems support it)\n* `non-repeatable read`: two consecutive reads of the same record by the same user return different results because an update by\n  a second user was interleaved between them. This can be a problem because the first user might have applied some logic based on the first\n  read that no longer holds after the second read\n* `phantom read`: similar to non-repeatable reads but extended to a whole result set rather than a single record (e.g. a\n  find-all query on the comments of a post). To avoid this, we need a `predicate lock`.\n* `dirty write` (theoretical): a record modified by two separate transactions while the first transaction has not yet\n  committed (no exclusive lock is taken by the first transaction). Since it breaks atomicity (the database wouldn't know\n  which state to roll back to), it is prevented by all database vendors\n* `read skew`: reading data from different transaction states. Imagine a user reading a post in their transaction while,\n  at the same time, another user modifies the same post and its child entity post_detail, and commits. 
Now, if the first\n  user accesses the post_detail in the same transaction, they will see the changes made by the second user while still\n  seeing the previously fetched post as unmodified.\n* `write skew`: similar to read skew. Two users read the same record and its child entity and\n  then each modifies a different one of them in their own transaction. The writes get mixed, and the result may violate an invariant that each user checked in isolation. This cannot be prevented by MVCC row locks alone, because we can\n  lock only a single record and not its entire graph.\n* `lost update`: a user reads a record and, before they modify it, another user reads the same record and performs an\n  update. The first user's update is unaware of the changes performed by the second user, leading to an unexpected\n  outcome, since the first user's starting point differs from the current state of the record in the database. Hence,\n  the second user's update is lost\n\n![](./images/acid/levels.png)\n\nN.B. in PostgreSQL, `Serializable` is not achieved by two-phase locking but by a newer implementation called\n`serializable snapshot isolation`, which checks the schedule of concurrent transactions for dependency cycles.\nAlso, `repeatable read` sees the database as of the beginning of the transaction, while `read committed` sees the\ndatabase as of the beginning of each query.\n\nN.B. 
in PostgreSQL, `repeatable read` is implemented as `snapshot isolation`\n\n---\n\n# Locks\n\nLocks can fundamentally be divided into two types: `Physical` or `Pessimistic` locks, and `Logical` or `Optimistic` locks.\nPessimistic locks are divided into two subcategories:\n\n* Implicit -\u003e Isolation level (see [Isolation](#isolation-added-il-sql-92))\n* Explicit -\u003e SQL statements like `FOR UPDATE` or `FOR SHARE` in Postgres\n\n## Explicit locks\n\nFocusing on Postgres, we have `LockMode.PESSIMISTIC_READ`, which is equivalent to the `FOR SHARE` SQL clause\n(databases that don't have it, like Oracle, fall back to an exclusive lock). It allows others to read the same\nrecord, but not to acquire an exclusive lock on it (e.g. with a `FOR UPDATE` statement) or to modify it.\n\n`LockMode.PESSIMISTIC_WRITE` is equivalent to the `FOR UPDATE` SQL clause and corresponds to an exclusive lock.\nBasically, we emulate 2PL on top of MVCC: nobody else can update or delete the records that we've locked. In\nPostgres, another user can still insert a record matching the locked predicate (e.g. if the first user has\nlocked a post and all its comments for update, a second user can still insert a new comment). However, this is a\npeculiarity of Postgres. MySQL, for example, will block the insert until the lock is released. Blocking such inserts requires a\n`Predicate lock`, which Postgres doesn't offer for explicit locking. To get the same behaviour, we need to explicitly\nlock the whole table with:\n\n```sql\n-- must run inside a transaction block; the lock is held until commit or rollback\nLOCK TABLE post\n    IN SHARE ROW EXCLUSIVE MODE\n    NOWAIT;\n```\n\n### Hibernate-specific LockOptions\n\nHibernate has its own `LockOptions` class, which allows finer-grained control over the lock mode. For example, we can specify how\nlong we allow the transaction to wait for lock acquisition (equivalent to adding a SQL `FOR UPDATE WAIT` clause, in Oracle syntax). 
`NO_WAIT` (equivalent to `FOR UPDATE NOWAIT` in SQL) serves a similar purpose: it prevents the transaction from\nstalling while acquiring a lock. With NOWAIT, we are telling the database to check whether the record is already locked\nand, in that case, to fail immediately.\n\n### SKIP_LOCKED\n\n`LockOptions.SKIP_LOCKED` is the key to executing queue-based processing tasks on a relational database. Imagine\nhaving a queue of jobs that needs to be processed by multiple workers, where we need to ensure that each job is processed\nonly once. To avoid two workers taking the same job, we could use a `FOR UPDATE` lock, but this creates a\nbottleneck where all the workers fight for the same locks. By using `FOR UPDATE SKIP LOCKED` instead, we allow\nparallel processing of the jobs by multiple workers, since each worker skips the rows already locked by others and picks an unlocked job without waiting.\n\n### Advisory lock\n\nAdvisory locks are a Postgres feature: their meaning is defined by the application rather than being tied to a table row. They can be either session-level or transaction-level.\n\n---\n\n# Connections\n\nThe throughput X is defined as the number of transactions per second, and its reciprocal, T_avg, is the average\nresponse time.\n\n![throughput.png](./images/connections/throughput.png)\n\nThe response time is a combination of several factors:\n\n- acquiring a database connection\n- submitting the statement to the database engine\n- statement execution time\n- result set fetching\n- closing the transaction and releasing the connection\n\n![response-time.png](./images/connections/response-time.png)\n\nThe most demanding operation is connection acquisition. 
The JDBC driver manager acts as a factory of physical database\nconnections. When the application asks the driver for a new connection, a socket is opened and a TCP connection is\nestablished between the JDBC client and the database server (the DB will allocate a thread or a process to serve it).\n\n![connection-lifecycle.png](./images/connections/connection-lifecycle.png)\n\nThis is why we use connection pools like HikariCP, which keep the physical connections open while handing out pooled\nconnections that can be reused with little overhead. Even closing a pooled connection is not an expensive operation, since it simply returns the connection to the pool.\n\n![pooled-connection.png](./images/connections/pooled-connection.png)\n\nHibernate's `DatasourceConnectionProvider` is the best choice among the connection providers. It offers the best\ncontrol over the DataSource configuration, supports JTA transactions (for Java EE projects), allows chaining as many\nproxies as we want (like FlexyPool for monitoring), and also supports connection pools not supported natively by\nHibernate. What Hibernate sees is just a decorated DataSource.\n\n\u003cimg alt=\"datasource-provider.png\" height=\"200\" src=\"./images/connections/datasource-provider.png\" width=\"600\"/\u003e\n\n---\n\n# Persistence Context in JPA and Hibernate\n\nThe persistence context is responsible for managing entities once fetched from the database. We can think of it as a Map where the\nkey is the entity identifier and the value is the entity object reference. 
Its role is to synchronize entity\nstate changes with the database.\n\n![](./images/persistence-context/api.png)\n\nJPA offers the `EntityManager` interface to interact with the underlying persistence context, while Hibernate, which\npredates JPA, offers the `Session` interface with the same role.\n\n![](./images/persistence-context/entity_manager.png)\n\nSince Hibernate 5.2, the `Session` interface directly extends the JPA `EntityManager` interface. Therefore, its\nimplementation, `SessionImpl`, implements it as well. The persistence context they manage is commonly referred to as the `first-level cache`.\n\n## Caching\n\nOnce an entity is *managed* (i.e. loaded) by the persistence context, it is also cached, meaning that each subsequent\nlookup by its identifier will avoid a database roundtrip.\n\nThe standard caching mechanism offered by the persistence context is the so-called `write-behind` cache mechanism.\nBasically, the cache acts as a buffer: write operations are not executed when issued but are enqueued and scheduled for\nexecution. Only at flush time are all enqueued operations executed and the cache state synchronized with\nthe database. 
This allows write operations to be batched together, reducing the number of round-trips between\napplication and database.\n\n## Entity state transitions\n\nAside from caching entities, the persistence context manages entity state transitions. JPA and Hibernate define\nslightly different methods in their respective interfaces to handle state transitions.\n\n### JPA EntityManager\n\n![](./images/persistence-context/jpa_transitions.png)\n\n* A newly created entity is in the `New` (or `Transient`) state. Calling `persist` moves it to the\n  `Managed` state, and the INSERT statement is executed only at flush time.\n* By calling `find` (or any other retrieval method), an entity is loaded into the persistence context in the\n  `Managed` state.\n* By calling `remove` on a managed entity, its state changes to `Removed`, and at flush time this results in\n  a DELETE statement being fired to delete the associated table row.\n* If the persistence context is closed or the managed entity is evicted from it (e.g. via `detach` or `clear`), the entity state changes\n  to `Detached`, meaning that it is no longer synchronized with the database.\n* To reattach a detached entity, the `merge` method must be called. If there isn't already a managed entity with the same\n  identifier in the persistence context, it fetches the entity from the\n  database and copies onto it the state of the detached entity. `merge` returns this managed copy, while the instance passed in stays detached.\n* There is no method in the JPA EntityManager that results in an UPDATE SQL statement. This is because at flush time,\n  any entity in the `Managed` state will be synchronized with the database. 
If the persistence context determines that the\n  entity has changed since it was loaded (aka `dirty checking`), it triggers an UPDATE statement at flush\n  time.\n\n### Hibernate Session\n\nThe Hibernate `Session` adheres to the JPA standard but predates it. Therefore, even though the same methods are supported,\nthere are some differences as well:\n\n![](./images/persistence-context/Hibernate_transitions.png)\n\n* The `save` method is legacy and, unlike `persist`, it returns the entity identifier\n* Fetching can be done not only by entity identifier but also by `naturalId`\n* The `delete` method is also a legacy one. As a matter of fact, the JPA `remove` delegates to the Hibernate `delete`\n  method\n* To reattach a detached entity, there is also the `update` method in addition to the JPA `merge`. This changes the\n  entity state to `Managed` and schedules an UPDATE statement for the next flush operation\n\n#### JPA merge vs Hibernate update\n\nThere is a slight difference between the behavior of the JPA `merge` and Hibernate `update` methods, which is particularly important when\nusing batching. 
Both are used to reattach a detached entity to the persistence context and to eventually propagate the\nUPDATE statement. However, JPA `merge` executes a SELECT statement for each entity that we need to reattach, while\nHibernate `update` is more efficient, since it simply reattaches the detached entity without the need for N SELECT\nstatements.\n\n## Dirty checking\n\nDirty checking is the process of detecting entity modifications that happened in the persistence context. It greatly\nsimplifies the application-level code, since the developer can focus on the domain model state changes\nand leave the generation of the underlying SQL statements to the persistence context.\n\n![](./images/persistence-context/dirty_checking.png)\n\nWhen the persistence context is flushed, the Hibernate Session triggers a `FlushEvent`, handled by its default event\nlistener (`DefaultFlushEventListener`). For each managed entity, a `FlushEntityEvent` is triggered, handled by the\nassociated event listener (`DefaultFlushEntityEventListener`), which in turn calls the `findDirty` method on the\nassociated `EntityPersister`. The latter checks, for every entity attribute, whether the current value has changed since the\nentity was loaded into the persistence context. Finally, the dirty properties are sent back to the\n`FlushEntityEvent`, and the required UPDATE statements are scheduled.\n\nWe can conclude that the number of dirty checks is proportional to the number of entities loaded in the persistence\ncontext multiplied by their properties. Even if only one entity has changed, Hibernate will scan the entire\ncontext, which can have a significant impact on CPU resources, particularly if the number of managed entities is\nlarge.\n\nA related optimization is the Hibernate-specific annotation `@DynamicUpdate`, which limits the UPDATE statement to the\ncolumns that have actually changed since the entity was loaded into the persistence context. 
This, however, can
defeat batching even if a batch size is set. Updates that modify different sets of columns generate different SQL
statements, and those cannot be grouped into the same JDBC batch.

### Bytecode enhancement

It is possible to activate the Hibernate bytecode enhancer at build time through a Maven plugin. This allows
Hibernate to modify the bytecode of our Java classes for specific needs. In this case, we are interested in the
dirty tracking capability of the tool. Essentially, the enhanced entity class tracks all the changes to its properties
before flushing and marks them as dirty through instrumented getters and setters. In this
way, at flush time the persistence context doesn't need to perform the dirty-checking computation. Instead,
it simply asks the entity for its dirty properties, since the entity already holds the state of the changed
properties and their names/columns.

**N.B. the performance difference needs to be measured in context; in general, it will have a significant effect
only when the persistence context is large.**

```xml
<plugin>
    <groupId>org.hibernate.orm.tooling</groupId>
    <artifactId>hibernate-enhance-maven-plugin</artifactId>
    <version>${hibernate.version}</version>
    <executions>
        <execution>
            <configuration>
                <failOnError>true</failOnError>
                <enableDirtyTracking>true</enableDirtyTracking>
            </configuration>
            <goals>
                <goal>enhance</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

## Hydration -> read-by-name (Hibernate < 6.0)

When an entity is fetched from the database, the `EntityPersister` uses the JDBC ResultSet to generate a Java `Object[]`
to store all entity
property values; this operation is called `hydration`. Once the state is loaded, it's stored in the
persistence context along with the entity, which means that we need twice as much memory to manage an entity. In the
application layer, if we know that a specific entity is not going to be modified, we can save memory by fetching it in
read-only mode. This can be done at Session level (`session.setDefaultReadOnly(true)`) or at query level using hints
(`.setHint(QueryHints.HINT_READONLY, true)`).

**N.B. read-only queries optimize both memory (the hydrated state is not retained) and CPU (no dirty checking)
resources.**

Since Hibernate 6.0, reading values from the JDBC ResultSet has fundamentally changed from a `read-by-name` to a
[`read-by-position`](https://docs.jboss.org/Hibernate/orm/6.0/migration-guide/migration-guide.html#type) approach.

## Flushing

Flushing is the act of synchronizing the in-memory state held by the persistence context with the
underlying database. The persistence context can be flushed either manually or automatically. As a matter of fact, both
the JPA and the Hibernate interfaces define the `flush` method to synchronize the in-memory domain models with the
underlying database.
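
A toy model in plain Java can make the link between flushing and dirty checking concrete. It is only a sketch under simplifying assumptions: a single `id` key, no associations and no real JDBC. `ToyPersistenceContext` and `ManagedEntry` are hypothetical names, not Hibernate classes. Loading keeps a snapshot of the state next to the current values, which is the memory doubling mentioned for hydration. `flush()` then compares every property of every managed entity and emits an UPDATE only for the dirty ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy model of snapshot-based dirty checking at flush time; not the Hibernate API.
class ToyPersistenceContext {

    // A managed entity: the state captured at load time (the hydrated snapshot) plus the current state.
    static final class ManagedEntry {
        final String table;
        final String[] columns;
        final Object[] loadedState;
        final Object[] currentState;

        ManagedEntry(String table, String[] columns, Object[] state) {
            this.table = table;
            this.columns = columns;
            this.loadedState = state.clone();  // snapshot used for dirty checking
            this.currentState = state.clone(); // what the application modifies
        }
    }

    private final Map<Long, ManagedEntry> entries = new LinkedHashMap<>();

    // Loading an entity keeps its state twice: the snapshot and the current values.
    ManagedEntry load(long id, String table, String[] columns, Object[] state) {
        ManagedEntry entry = new ManagedEntry(table, columns, state);
        entries.put(id, entry);
        return entry;
    }

    // Every managed entity is inspected, property by property, even if only one of them changed.
    List<String> flush() {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<Long, ManagedEntry> e : entries.entrySet()) {
            ManagedEntry entry = e.getValue();
            boolean dirty = false;
            for (int i = 0; i < entry.columns.length && !dirty; i++) {
                dirty = !Objects.equals(entry.loadedState[i], entry.currentState[i]);
            }
            if (dirty) {
                // Default (non-dynamic) UPDATE: every column is bound, not only the dirty ones.
                statements.add("UPDATE " + entry.table + " SET " + String.join(" = ?, ", entry.columns)
                        + " = ? WHERE id = " + e.getKey());
                // Once flushed, the current state becomes the new snapshot.
                System.arraycopy(entry.currentState, 0, entry.loadedState, 0, entry.currentState.length);
            }
        }
        return statements;
    }
}
```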
Flushing is especially important before running a query or before a transaction commit, since
it guarantees that the in-memory changes are visible. This prevents
[`read-your-writes`](https://arpitbhayani.me/blogs/read-your-write-consistency/) consistency issues.

**JPA flushing modes**

![](./images/persistence-context/jpa_flush.png)

The `COMMIT` flush mode type is prone to inconsistency. Since it doesn't trigger a flush before every query, a query
may not capture the pending entity state changes.

**Hibernate flushing modes**

![](./images/persistence-context/Hibernate_flush.png)

As in JPA, the Hibernate `COMMIT` flush mode type is prone to inconsistency, since queries executed before the commit
may not capture the pending entity state changes.

### AUTO flushing mode

![](./images/persistence-context/auto_flush.png)

The JPA and Hibernate AUTO flush modes differ slightly. JPA requires a flush before each query and before the
transaction commit, while Hibernate uses a smarter approach (see `NativeHibernateSessionFactory`) and tries to identify
whether a flush is actually required before the query execution. To do so, Hibernate inspects the query space (the
tables) affected by the incoming query. It triggers a flush only if there is an entity with a pending state transition
in that same query space. The goal is to delay the first-level cache (aka persistence context) synchronization as
much as possible.

The problem with the Hibernate optimization is that it doesn't work out of the box with native queries. When a query is
tagged as native, Hibernate knows that it may contain syntax specific to the underlying database dialect, so it
won't parse it. For this reason, the JPA-compliant implementation of the Hibernate Session forces a flush whenever it
sees a native query, to be sure to maintain consistency. As a result, Hibernate is unable
to know the query space of the incoming query.
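
The AUTO flush decision described above can be summarized as a small predicate. The following is a hypothetical sketch of that reasoning, not Hibernate's actual code. `AutoFlushDecision`, `shouldFlush` and its parameters are illustrative names, and plain table names stand in for query spaces.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical toy model of the AUTO flush decision; not Hibernate's actual implementation.
final class AutoFlushDecision {

    private AutoFlushDecision() {
    }

    /**
     * @param pendingTables tables touched by entities with pending (unflushed) state transitions
     * @param querySpaces   tables read by the incoming query; empty when unknown
     * @param nativeQuery   true for native SQL, which Hibernate does not parse
     * @param jpaCompliant  true if the JPA-compliant behavior (flush before every native query) is enabled
     */
    static boolean shouldFlush(Set<String> pendingTables, Set<String> querySpaces,
                               boolean nativeQuery, boolean jpaCompliant) {
        if (pendingTables.isEmpty()) {
            return false; // nothing to synchronize
        }
        if (nativeQuery && jpaCompliant) {
            return true; // JPA-compliant mode: always flush before a native query
        }
        if (nativeQuery && querySpaces.isEmpty()) {
            return false; // unknown query space: the pending changes stay in memory (stale read risk)
        }
        // Flush only if a pending change touches a table that the query reads.
        return !Collections.disjoint(pendingTables, querySpaces);
    }
}
```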
It is the developer's job to declare, on the native query, the query spaces (tables) that
need to be synchronized before its execution
(see [Hibernate-query-space](https://thorben-janssen.com/Hibernate-query-spaces/)).

An alternative is to switch to `FlushMode.ALWAYS`, which behaves like the JPA `AUTO` mode, either at session
level or only for the specific query.

### Flushing in batch processing

For standard operations, JPA allows the persistence context to span over multiple database transactions, which avoids
long locking times and excessive database memory consumption. In batch processing, however, it is very important to
keep the persistence context within a reasonable size and to avoid a single huge transaction that
might fail at the end, roll back, and invalidate all the work done. To avoid this, it's not enough to periodically flush
and clear the persistence context. We also need to commit the currently running database transaction, so that the
work is split into several smaller transactions instead of a single one that either commits or fails and rolls back.

These steps are defined as `flush-clear-commit`:

```java
private void flush(EntityManager entityManager) {
    // Committing triggers a flush, so the batched SQL statements are executed
    entityManager.getTransaction().commit();
    // Start a new transaction for the next chunk of work
    entityManager.getTransaction().begin();
    // Detach all managed entities to keep the persistence context small
    entityManager.clear();
}
```

## Events and event listeners

Internally, Hibernate defines specific events for each entity state change (e.g., `PersistEvent`, `MergeEvent`, etc.).
Each is associated with a default event listener implementation such as `DefaultPersistEventListener`, and these can be
replaced by custom implementations. In turn, the event listener translates the state change into an internal
`EntityAction` that is queued in an `ActionQueue` and executed only at flush time.
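
A toy action queue can illustrate the queue-then-execute mechanism. In this sketch (hypothetical names, not Hibernate's `ActionQueue` API), listeners only enqueue actions. At flush time, the actions are executed grouped by type, following the order Hibernate uses as listed further down in this section, while the call order within each type is preserved. It also reproduces the unique-constraint scenario discussed below, where a DELETE issued before an INSERT ends up being executed after it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model of an action queue; the enum mirrors the execution order listed in this section.
final class ToyActionQueue {

    // Declaration order == execution order at flush time.
    enum ActionType {
        ORPHAN_REMOVAL, INSERT, UPDATE, COLLECTION_REMOVE, COLLECTION_UPDATE, COLLECTION_RECREATE, DELETE
    }

    record Action(ActionType type, String sql) {
    }

    private final List<Action> queued = new ArrayList<>();

    // Listeners only enqueue; nothing reaches the database yet.
    void enqueue(ActionType type, String sql) {
        queued.add(new Action(type, sql));
    }

    // At flush time, actions run grouped by type; List.sort is stable, so call order is kept within a type.
    List<String> flush() {
        List<Action> ordered = new ArrayList<>(queued);
        ordered.sort(Comparator.comparing(Action::type));
        queued.clear();
        List<String> executed = new ArrayList<>();
        for (Action action : ordered) {
            executed.add(action.sql());
        }
        return executed;
    }
}
```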
If an entity that is going to be removed has an
association marked with the `orphan removal strategy`, then at flush time the `EntityDeleteAction` can also generate
an `OrphanRemovalAction` if the child entity is no longer referenced. Both actions trigger an SQL DELETE statement.

![](./images/persistence-context/events.png)

Toward the end of the persistence context flush, Hibernate executes all the enqueued actions in a strict,
specific order:

* `OrphanRemovalAction`
* `EntityInsertAction` and `EntityIdentityInsertAction`
* `EntityUpdateAction`
* `CollectionRemoveAction`
* `CollectionUpdateAction`
* `CollectionRecreateAction`
* `EntityDeleteAction`

This implies that the order of operations defined at the application level is not the order Hibernate executes, unless
we force a flush. Suppose we remove an entity with a unique column and, in the same persistence context, create
a new one with the same value for that unique field. We will incur a `ConstraintViolationException`: as seen above, the
delete action is the last one executed by the Hibernate action queue, so Hibernate will try to insert the new
entity before deleting the old one. The solution is either to flush right after calling `remove` (not recommended) or
to make Hibernate fire an UPDATE statement by simply changing the existing entity instead of deleting and recreating it.

**N.B. by avoiding manual flushes, we delay the connection acquisition and consequently reduce the transaction response
time.**

---

# SQL Statements: lifecycle, execution plan and caching

SQL is a declarative language. It "only" describes what we, as clients, want, and not how the underlying database
engine will process the statement and produce the algorithms to retrieve the correct information.
In this way, the database can
test different execution strategies and estimate which data access plan is the most efficient for the client's needs.

![](./images/persistence-context/statements.png)

The main modules responsible for processing SQL statements are the `Parser`, the `Optimizer` and the `Executor`.
The `Parser` verifies that the SQL statement is both syntactically and semantically correct (i.e., that the
SQL grammar is correct and that the referenced tables and columns exist). The result of the parsing phase is the
`syntax tree` (also known as query tree), i.e., the database's internal logical representation of the query.

For a given syntax tree, the database must decide the most efficient data fetching algorithm. Finding the best
`action plans` is the job of the `Optimizer`, which evaluates multiple data traversal options such as the
access method (table scan or index scan), the join strategy (nested loops, hash join or merge join) and the join
order. The Optimizer evaluates a list of candidate access plans, and the chosen one is passed to the
Executor. The number of possible action plans can be very large, depending on the complexity of the query, and
evaluating them is a cost-intensive operation that can increase the transaction response time. Therefore, the
Optimizer has a fixed time budget for finding a reasonable action plan, usually relying on the most common algorithm:
the `Cost-Based Optimizer`. The cost of a specific plan is computed from the estimated CPU cycles and I/O operations it
requires.
Due to the cost of this
operation, most database vendors cache the chosen execution plan. Since the database structure can change over
time, they also need a separate process for validating the existing plans.

Once the best execution plan has been chosen (and cached), the `Executor` uses it, through the storage engine, to
retrieve the data and build the result set, while the `transaction engine` guarantees the data integrity of the
current transaction.

## Execution plan cache

Both statement parsing and execution plan generation are expensive operations. For this reason, the statement string is
used as the input of a hash function whose result becomes the key of the execution plan cache entry. As a
consequence, if the statement text changes, the database cannot reuse the cached execution plan. Dynamically generated
JDBC statements are a concrete example: when parameter values are inlined as literals, every distinct value produces a
distinct SQL string and hence a new cache entry.

## Prepared statement

![](./images/persistence-context/prepare.png)

Prepared statements, due to their static nature, allow the data access logic to reuse the same plan for multiple
executions, since only the bind parameters are supposed to vary at runtime. Because JDBC PreparedStatements take the
SQL query at creation time, the database can precompile (`prepare`) it into the syntax tree prior to executing it.
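
A toy plan cache can sketch the consequence of using the statement text as the cache key. This is a simplified, hypothetical model: `ToyPlanCache` is not a database API, and real engines also validate and evict plans. Executing the same lookup with inlined literals creates one plan per distinct value, while the parameterized text of a prepared statement is optimized once and then reused.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of an execution plan cache keyed by the statement text.
final class ToyPlanCache {

    private final Map<String, String> plans = new HashMap<>();
    private int optimizations;

    // Returns the cached plan for this exact SQL text, "optimizing" only on a cache miss.
    String planFor(String sql) {
        return plans.computeIfAbsent(sql, key -> {
            optimizations++; // stands in for the expensive parse + optimize phases
            return "plan#" + optimizations;
        });
    }

    int optimizations() {
        return optimizations;
    }

    int cachedPlans() {
        return plans.size();
    }
}
```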
During
the execution phase, the driver sends the bind parameter values, allowing the database to compile and run the
execution plan right away.

Since PostgreSQL 9.2, the `prepare` phase only parses and rewrites the statement, while the optimization and planning
phases are deferred until execution time. In this way, the plan can be optimized according to the actual values
of the bind parameters, leading to a better execution plan.

Theoretically, a prepared statement would require two database round trips, one for prepare and one for execute
(unlike a plain statement). However, the JDBC PreparedStatement is optimized to perform both actions in a single
database request.

## Client-Side vs. Server-Side statement caching

* Client-Side Prepared Statement: the JDBC driver simply performs parameter substitution and sends the query as a normal
  SQL command.
* Server-Side Prepared Statement: the query is sent to the PostgreSQL server as a prepared statement. This means the
  server parses and plans the query once and can then execute it repeatedly with different parameters more efficiently.

By default, when you use a prepared statement in Java, the PostgreSQL JDBC driver doesn’t immediately create a prepared
statement on the server. Instead, it first “simulates” a prepared statement on the client side and keeps a count of
how many times that statement is executed. Once the same prepared statement has been executed five times (this “5” is
the default value of the driver's `prepareThreshold` setting), the driver sends a command to the server to actually
create a server-side prepared statement.

---

# Fetching

## Fetching associations

By default, `@ManyToOne` and `@OneToOne` associations use `FetchType.EAGER` (using a LEFT JOIN), while `@OneToMany` and
`@ManyToMany` use `FetchType.LAZY`.
These defaults can be overridden either by changing the attribute in the mapping or, at
query time, by using an `entity graph`.

However, with an eager fetch strategy, the fetching behavior differs depending on whether we use direct fetching (e.g.,
`entityManager.find()`) or a JPQL query (e.g., `entityManager.createQuery()`). Direct fetching uses a LEFT JOIN, while
for the JPQL query an additional query is executed to fetch the eager association, even if the query itself doesn't
explicitly mention it.

**N.B. Lazy fetching is only a hint; the underlying persistence provider might choose to ignore it.**

The EAGER fetch strategy could also be used to retrieve collections, like in a `@OneToMany` relationship, but this is a
terrible idea.
Imagine a Post entity with a collection of comments and a collection of tags. The resulting query will
have a LEFT OUTER JOIN for both the comments and the tags, but tags and comments don't have any relationship between
them. The only way SQL can join them is therefore through a **cartesian product** that generates all the possible record
combinations between the two tables (50 tags and 100 comments generate 5000 rows). Using a JPQL query is slightly
better since, instead of the cartesian product, three queries are generated: one SELECT for the Post,
one for the Tags (with an inner join, assuming it's a `@ManyToMany` association) and one SELECT for the Comments
matching the Post id.

### N+1 query problem

When using `FetchType.LAZY`, Hibernate generates a proxy to represent the uninitialized association. Imagine fetching
three comments that have a lazy association with their Post parent entity. When first loaded, the Post property of each
comment is represented by a proxy object of type `Post` that holds only the identifier given by the foreign key
coming from the Comment entity.
At this point, if we try to access the Post, Hibernate will execute an additional
SELECT query to initialize the Post proxy. Now imagine fetching a collection of `N` Comments and later accessing their
Post proxies in a for loop or in a stream. `N` SELECT queries will be executed to initialize the
proxies, `+1` initial query to retrieve the Comments. Hence `N+1` queries have been executed, when an initial JOIN FETCH
would have solved the issue with just one query.

Be aware that an `N+1` situation can also occur when `FetchType.EAGER` is enabled. In fact, if we write a JPQL query
without a JOIN FETCH, Hibernate still complies with the fetching strategy. One SELECT will therefore be
executed for the JPQL statement, plus N queries to fetch all the associated entities.

### Fetching multiple collections

Fetching multiple `@ManyToOne` and/or `@OneToOne` associations only requires multiple JOIN FETCH clauses on the
child entity, without incurring a cartesian product. Fetching multiple collections at once, however, can be more
cumbersome.

Imagine we have a Post entity with a collection of Comments and a collection of Tags, in a `@OneToMany` and a
`@ManyToMany` relationship respectively. If we try to JOIN FETCH both collections in the same JPQL query, we end up in
one of two scenarios, both undesirable:

* If both collections are Lists (bags), we will incur a `MultipleBagFetchException`, a Hibernate exception
  telling us that a List has no built-in mechanism to avoid the duplicates that eventually occur due
  to the cartesian product
* The query is executed, but the result set contains a number of rows equal to the cartesian product
  `Post * Comments * Tags` which, depending on the average size of these collections, is not optimal

The solution is quite simple: fetch only one collection per query in the same persistence context.

## Open Session in View

**N.B.
always disable it (`spring.jpa.open-in-view=false`), since for legacy reasons it is enabled by default in Spring Boot**

The Open Session in View is an architectural pattern that aims to keep the persistence context open throughout the whole
web request. This allows the service to return entities without fetching their associations, leaving the UI the ability
to initialize the proxies if needed. The service layer is still responsible for managing the database transaction, but
the Session is no longer closed by the `HibernateTransactionManager`.

Viewed from a database perspective, there are a number of undesirable consequences:

* After the service transaction (`getPosts()`) ends, there is no active transaction. Every time the UI
  initializes an association, a database connection must therefore be acquired from the pool to execute a single fetch
  in auto-commit mode.
* Navigating uninitialized proxies can easily trigger an `N+1` query problem
* There is no separation of concerns, since not only the service layer but also the view layer can access the
  persistence layer

![](./images/fetching/open-in-view.png)

**N.B. as with Open Session in View, Hibernate has a custom property that allows a similar behavior (i.e., fetching an
association after the persistence context is closed). It can (BUT SHOULD NOT) be enabled through the
`hibernate.enable_lazy_load_no_trans` property.**

---

# Projections

A projection is the operation of fetching a subset of an entity's columns and storing them in a convenient POJO class.
Limiting the number of retrieved columns can be beneficial in terms of performance, since only the data required by the
business case are fetched.

## JPA projections

By default, in plain JPA, a projection is represented by a Java `Object[]` where the selected columns, retrieved from the
`ResultSet` for each row, are stored in the order of the SELECT clause.
This applies to any JPA query, be it JPQL,
Criteria API or native SQL query.

```java
// Each Object[] holds the selected columns in SELECT clause order: [0] = id, [1] = title
List<Object[]> tuples = entityManager.createQuery("""
                select
                    p.id,
                    p.title
                from Post p
                """)
        .getResultList();
```

### Tuple

Support for `Tuple` projections was added in JPA 2.0. A Tuple is essentially a map that uses the column alias as
key. One of the benefits is that we can access the values by column alias instead of column position, so if
the latter changes, there won't be any side effect on the application code. Like the Object array, Tuples can also be
used with any kind of query.

```java
List<Tuple> tuples = entityManager.createQuery("""
                select
                   p.id as id,
                   p.title as title
                from Post p
                """, Tuple.class)
        .getResultList();

for (Tuple tuple : tuples) {
    // Values are read by alias and returned as the requested type
    long id = tuple.get("id", Number.class).longValue();
    String title = tuple.get("title", String.class);
}
```

### DTO

The main disadvantage of `Object[]` and `Tuple` projections is that the returned values are not type-safe. To solve
this issue, we can use DTO projections, which are essentially Java POJO classes that map only the desired columns.
A DTO class needs
field types compatible with the returned SQL types and a constructor matching the SELECT clause of
the query that is going to be projected.

```java
public class PostDTO {

    private final Long id;

    private final String title;

    public PostDTO(Number id, String title) {
        this.id = id.longValue(); // some database engines might return a BigInteger instead of a Long
        this.title = title;
    }

    public Long getId() {
        return id;
    }

    public String getTitle() {
        return title;
    }
}

List<PostDTO> postDTOs = entityManager.createQuery("""
                select new com.vladmihalcea.hpjp.hibernate.forum.dto.PostDTO(
                    p.id,
                    p.title
                )
                from Post p
                """, PostDTO.class)
        .getResultList();
```

By default, we need to reference the DTO by its fully qualified class name in the query. To avoid this, improving
readability and allowing us to move the DTOs freely from one package to another, we can use the
`ClassImportIntegrator` (from the Hypersistence Utils library) to register our DTOs, supplying it through the
`hibernate.integrator_provider` configuration property.

```java
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.hibernate.jpa.boot.spi.IntegratorProvider;

public void additionalProperties(Properties properties) {
    properties.put(
            "hibernate.integrator_provider",
            (IntegratorProvider) () -> Collections.singletonList(
                    new ClassImportIntegrator(
                            List.of(
                                    PostDTO.class,
                                    PostRecord.class
                            )
                    ).excludePath("com.vladmihalcea.hpjp.hibernate") // in case of conflicting DTO names, specify the base package path to exclude
            )
    );
}
```

By doing so, we are able to use
the simple class name of the DTO in the JPQL query.

#### Mapping native SQL queries

However, DTO constructor projections work out of the box only with JPQL queries and not with native SQL queries. For
the latter, there is a quite verbose fallback that consists of declaring a specific `@SqlResultSetMapping` and a
`@NamedNativeQuery` on top of the entity we want to map.

```java
@NamedNativeQuery(
        name = "PostDTONativeQuery",
        query = """
                SELECT
                   p.id AS id,
                   p.title AS title
                FROM post p
                """,
        resultSetMapping = "PostDTOMapping"
)
@SqlResultSetMapping(
        name = "PostDTOMapping",
        classes = @ConstructorResult(
                targetClass = PostDTO.class,
                columns = {
                        @ColumnResult(name = "id"),
                        @ColumnResult(name = "title")
                }
        )
)
@Entity(name = "Post")
@Table(name = "post")
public class Post {
    // ...
}
```

The SQL query needs to use the same column aliases that are expected by the `@ConstructorResult` mapping. To
execute a named query, be it native or not, we then use the `EntityManager#createNamedQuery` method:

```java
// The name must match the one declared in @NamedNativeQuery
var postDTOs = entityManager.createNamedQuery("PostDTONativeQuery", PostDTO.class).getResultList();
```

**N.B. DTO projections are perfectly suited for using Java Records as POJOs.**

## Hibernate projections

Prior to version 6, Hibernate allowed defining a custom `ResultTransformer` that projects a result set into a DTO via
its canonical constructor or its JavaBeans setter methods.
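
Whatever API is used, projecting a parent-child join into DTOs boils down to grouping the flat rows by the parent identifier, since the parent columns are repeated for every child. The following plain-Java sketch shows that grouping step. It is not a Hibernate transformer, and the record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the grouping a parent-child result transformer performs; not a Hibernate API.
final class ParentChildGrouping {

    // One flat row of "post left join post_comment": parent columns repeated for each child.
    record PostCommentRow(long postId, String title, String comment) {
    }

    record PostWithComments(long id, String title, List<String> comments) {
    }

    static List<PostWithComments> group(List<PostCommentRow> rows) {
        // LinkedHashMap keeps the parents in result-set order.
        Map<Long, PostWithComments> byId = new LinkedHashMap<>();
        for (PostCommentRow row : rows) {
            PostWithComments post = byId.computeIfAbsent(
                    row.postId(), id -> new PostWithComments(id, row.title(), new ArrayList<>()));
            if (row.comment() != null) { // a LEFT JOIN yields a null comment for posts without comments
                post.comments().add(row.comment());
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```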
Now there are alternatives (as of today, version 6.2, still
incubating) like `TupleTransformer` and `ResultListTransformer` that can perform the same task but are quite messy. The
former converts each JDBC result set row into the specific type of the destination DTO. The latter filters
the result list, which may contain duplicates because a cartesian product is produced when a one-to-many collection is
present.

## Best approach for projecting a parent-child relationship

As of today (February 2025), the best approach to project condensed information from a parent-child relationship into
a DTO is to use interface projections (a Spring Data JPA feature). An interface is defined with the getter methods
related to the fields we need in the projection, and the result set is then mapped into a proxy object.

```java
interface PostWithCommentsProjection {
    Long getId();

    String getTitle();

    List<CommentProjection> getComments();

    interface CommentProjection {
        String getComment();
    }
}

interface PostRepository extends JpaRepository<Post, Long> {
    // Selecting the entity lets Spring Data back the projection proxy with it,
    // while the entity graph fetches the comments collection in the same query
    @EntityGraph(attributePaths = "comments")
    @Query("select p from Post p")
    List<PostWithCommentsProjection> findAllByProjecting();
}
```

The proxy can then be mapped to a concrete DTO and returned to the client.

___

# Batching in Hibernate

To enable batching in Hibernate, only a single property is required (while plain JDBC requires programmatic
configuration):

```yaml
hibernate.jdbc.batch_size: 5
```

This setting is configured at the `EntityManagerFactory` (or `SessionFactory`) level, so the same batch size applies to
all sessions.
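
The following hypothetical helper gives a rough sketch of what the batch size controls. It splits a sequence of identical statements into chunks, each standing for one `executeBatch()` round trip. It assumes all statements share the same `PreparedStatement`, with no interleaving of different statements (see the cascade section below). With `hibernate.jdbc.batch_size: 5`, 12 INSERTs need 3 round trips instead of 12.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: how a JDBC batch size splits identical statements into executeBatch() round trips.
final class BatchRoundTrips {

    private BatchRoundTrips() {
    }

    // Splits the statements into chunks of at most batchSize; each chunk is one executeBatch() call.
    static List<List<String>> split(List<String> statements, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < statements.size(); from += batchSize) {
            int to = Math.min(from + batchSize, statements.size());
            batches.add(List.copyOf(statements.subList(from, to)));
        }
        return batches;
    }
}
```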
Since Hibernate 5.2, we can also set the JDBC batch size on a per-Session basis, optimizing each
business case.

```java
// Setting the batch size back to null at the end of the method resets the configuration for the
// next usage of the extended entity manager

@PersistenceContext(type = PersistenceContextType.EXTENDED)
private EntityManager entityManager;

public void batchPerQuery() {
    // Override the global hibernate.jdbc.batch_size for the current Session
    entityManager.unwrap(Session.class).setJdbcBatchSize(10);
    //...
    // Fall back to the global setting
    entityManager.unwrap(Session.class).setJdbcBatchSize(null);
}
```

If the entity identifier uses `GenerationType.IDENTITY`, Hibernate disables batch inserts. The only way to
know the entity id, which is needed to build the first-level cache entry key, is to execute the actual INSERT statement.

**N.B. the restriction doesn't apply to UPDATE and DELETE statements, which can still benefit from batching even
with an identity primary key.**

## Bulk operations

Batching is not the only way to execute statements on multiple rows at once. SQL offers `bulk operations` to modify a
set of rows that satisfy a filtering criterion:

```SQL
-- examples
UPDATE post
SET version = version + 1;

DELETE
FROM post
WHERE version > 1;
```

**N.B. operating on too many entities at once, especially in a highly concurrent environment, can be a problem for both
batching and bulk operations. We are running a long transaction whose row locks block any other write operation
on the affected rows.**

## Batching in cascade

Imagine a parent entity with a `@OneToMany` mapping and `CascadeType.ALL` (e.g.,
post and post_comment). Even if we
enable batching and try to insert multiple posts with their associated post_comments, Hibernate will execute a separate
INSERT statement for each persisted entity. This happens because JDBC batching requires executing the same
`PreparedStatement` over and over. In this case, however, the insert of a post is followed by the insert of a
post_comment, so the batch needs to be flushed before switching to a different statement.

To solve this, we need to enable additional properties that tell Hibernate to sort the statements by type while
making sure that the parent-child integrity is preserved.

```yaml
hibernate.order_inserts: true
hibernate.order_updates: true
```

**N.B. the same applies to batch UPDATEs.**

### DELETE cascade

Unlike INSERT and UPDATE statements, there is no property to sort DELETE statements in batch operations when cascading
deletes apply. However, there are some workarounds:

* delete all the child entities and then flush the persistence context before removing the parent entities
* bulk delete the child entities (this implies changing the cascade type to only `PERSIST` and `MERGE`, which also
  speeds up the flush, since the persistence context doesn't need to propagate the delete
  to the child entities)
* (BEST APPROACH) delegate the DELETE of the child entities to the database engine by adding a database-level
  `ON DELETE CASCADE` directive on the foreign key
  ```SQL
  alter table post_comment
  add constraint fk_post_comment_post
  foreign key (post_id) references post on delete cascade
  ```

### Batching on versioned entities

Prior to Hibernate 5, or when using Oracle < 12c, it was not possible to perform batch operations on entities with a
`@Version` field. Due to some old JDBC driver logic, they would incur an `OptimisticLockException` or a
`StaleObjectStateException` caused by a mismatch in the entity update count.

To solve
this, since Hibernate 5 the `hibernate.jdbc.batch_versioned_data` property is set to **true** by default.

## Default UPDATE behavior

The default UPDATE behavior allows batching statements that modify different columns of the same entity, since all the
columns are sent over the network, even those which haven't been modified. This widens the batching opportunities,
but with some potential disadvantages:

* large columns (e.g., BLOBs) are always sent over the network as well
* all indexes are scanned
* replica nodes also receive all the columns, not just the modified ones
* triggers may be executed accidentally

However, at the cost of losing batching for updates that modify different columns of a given entity, we can mark it
with the Hibernate `@DynamicUpdate` annotation, which sends only the modified columns over the network. Batching is
lost because a different set of modified columns results in a different SQL string, hence a different prepared
statement.

---

# Primary Keys and JPA identifiers

A primary key uniquely identifies a row in a table.

We can choose a **natural ID**, i.e., a unique identifier naturally related to the entity we are storing, like a
social security number or a book ISBN. Natural identifiers are usually not the best choice due to their space overhead
(to be unique, they are generally long).

The most common option is a **surrogate key**, i.e., a generic unique identifier.
We can choose from:

- UUID (128 bits)
- Auto-increment column (from 8 up to 64 bits if we use a `long`)

The size of the primary key affects the efficiency of its associated index. B-trees are self-balancing tree data
structures at the core of relational databases, and they work better with sequential keys. A new key is then
always appended at the end of the clustered index, so the physical ordering matches the logical ordering. This results
in optimal key-set pagination (searching for a range of primary keys), since we get sequential reads. If the key
is generated randomly, we get fragmentation and page splits, leading to more I/O operations.

## JPA identifiers

In JPA and Hibernate, each entity requires an identifier for the primary key mapping. It can be manually assigned (using
only the `@Id` annotation; don’t do this) or generated by the provider with 3 different strategies:

- Identity → `GenerationType.IDENTITY`, using the physical database identity column. The identity generator can be
  applied only to a single column. An internal counter is incremented every time it is invoked, using a lightweight
  locking mechanism that is not transactional (i.e., **rollbacks can lead to gaps in the identity column values of two
  consecutive rows**) and can release the lock right after the increment.
  **DRAWBACKS:**
  The new value assigned by the counter is known only after executing the INSERT statement.
  Since the identifier is only available once the INSERT has been executed, Hibernate disables batch
  inserts and issues the INSERT statement during the `persist` method call, without waiting for the first-level
  cache (i.e., the Persistence Context) to flush and synchronize the entity state changes with the database.
- Sequence → `GenerationType.SEQUENCE`, using a sequence generator.
A sequence is a database object that generates a\n  number by incrementing an internal counter; the increment can be done in steps greater than one, allowing for\n  application-level optimization techniques (like a caching strategy that preallocates a set of values, reducing the\n  number of database round trips). The sequence call can be decoupled from the INSERT statement, allowing for batch\n  inserts. Like identity columns, sequences use lightweight locks (released right after the increment operation) to\n  prevent concurrent transactions from acquiring the same value, but since sequence increments are not transactional,\n  gaps can be found in the primary key values of consecutive rows (this is not a bug).\n- Table → `GenerationType.TABLE` (**DON’T USE IT**; better to use identity if sequences are not supported), which\n  emulates a database sequence using a separate table (for database vendors that do not support sequences) with\n  row-level locking that is transactional, so the lock is held until the whole insert transaction commits or rolls back. 
An\n  alternative is to have a separate transaction handling the value generation, but this requires a separate database\n  connection.\n\n---\n\n# Entity Relationship\n\n## `@ManyToOne`\n\nThe `@ManyToOne` mapping is the most natural way to map a foreign key relationship between a parent and a child entity.\nThe annotation is placed on the child entity, usually with LAZY fetching and optionally with a `@JoinColumn` (needed\nonly when the foreign key column name differs from the JPA default, which, for a `post` field referencing the `id`\ncolumn of `Post`, is `post_id`)\n\n```java\n\n@ManyToOne(fetch = FetchType.LAZY)\n@JoinColumn(name = \"post_id\")\nprivate Post post;\n```\n\n### bidirectional\n\nIf we need the relationship to be **bidirectional**, the parent entity must hold a collection of the child entity\nannotated with `@OneToMany`, specifying the child field that holds the reference (`mappedBy`), the cascade type, and the\norphan removal strategy (usually, if the entities are strongly tied, the cascade type is set to ALL and orphan removal\nto true). Moreover, even though the child side owns the association (it holds the foreign key), it is common to add\nutility methods on the parent to synchronize both sides of the relationship, for example when both are modified within a\n`@Transactional` method.\n\n```java\n\n@OneToMany(\n        mappedBy = Comment_.POST,\n        cascade = CascadeType.ALL,\n        orphanRemoval = true)\nprivate List\u003cComment\u003e comments = new ArrayList\u003c\u003e();\n\npublic void addComment(Comment comment) {\n    comments.add(comment);\n    comment.post(this);\n}\n\npublic void removeComment(Comment comment) {\n    comments.remove(comment);\n    comment.post(null);\n}\n```\n\nIn this way, removing a comment is efficient: only one DELETE statement is executed, and the reference to the parent is\nremoved from the Comment object so that it can be garbage collected.\n\n## Unidirectional `@OneToMany`\n\n(DON'T USE IT IF YOU CAN)\n\nEven if uncommon, we might opt to hold a unidirectional reference only on the parent side of the relationship.\nIts performance depends 
on the mapping and on the type of collection we use, but it is always worse (less\nperformant) than a bidirectional one.\n\n### join table\n\nOne solution is a join table that collects the parent-child relationships; however, this means having an extra table\nwith two foreign keys (and most probably two indexes) instead of a single foreign key on the child side.\n\nTo map the collection on the parent side, we can use either a `Set` or a `List`.\n\n#### List vs Set Collections\n\nWhile inserting elements requires the same effort for both collections, `Lists` are inefficient when removing an\nelement: Hibernate deletes all the join table rows associated with the id of the parent entity and then re-inserts all\nthe parent-child association rows except the one associated with the removed child entity. On top of this, the join\ntable indexes are rebalanced twice, first when the rows are removed and then when they are re-added.\n\nIf the ordering is meaningful, we can use a `List` together with the `@OrderColumn` annotation to reduce the cost of\nusing lists; in this way, removing the last element executes only two DELETE statements, one for the join table and\none for the child table. However, when removing an element other than the last one, Hibernate executes an UPDATE\nstatement for each row that is shifted.\n\n### `@JoinColumn`\n\nAn alternative, which stores the foreign key in the child table (without a reference in the child entity), is to\nannotate the parent-side collection with the `@JoinColumn` annotation. However, this approach is also inefficient, since\nfor each element persisted in the parent-side collection Hibernate has to issue an INSERT statement followed by an\nUPDATE statement. 
The update is required because the child entity is flushed before the parent-side collection, so\nHibernate does not know the foreign key value when inserting the child, and an UPDATE is needed to set it. If the option\n`nullable = false` is specified in the `@JoinColumn` annotation, Hibernate flushes the child entity with the foreign\nkey populated, but it still issues the UPDATE statement.\n\nSimilarly, deleting an element from the parent-side collection performs poorly if `nullable = false` is not set:\nHibernate first fires an UPDATE statement on the child row to set the foreign key to null, and only afterwards issues\nthe DELETE statement that removes that same child entity. With `nullable = false`, we save the first, useless UPDATE\nstatement.\n\n## `@OneToOne`\n\n### unidirectional\n\nThe one-to-one unidirectional relationship can be mapped in two ways: with a `@JoinColumn` or with a `@MapsId`\nannotation. In the first case, we need two indexes on the child table, one for its primary key and one for the\nforeign key pointing to the parent entity.\n\n`@MapsId` has several advantages: parent and child tables share the same primary key, so, even though the\nrelationship is unidirectional (from the child side), knowing one id is enough to fetch either the parent or the child;\nmoreover, the child table needs only one index.\n\n### bidirectional\n\nIf the child entity must also be accessible from the parent side, a `@OneToOne` annotation is required on the\nparent side as well. 
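\n\nA minimal sketch of both sides of this mapping, assuming Jakarta Persistence 3.x annotations on the classpath and a hypothetical `PostDetails` child entity that shares the `Post` primary key through `@MapsId` (the entity and field names are illustrative, not taken from this repository):\n\n```java\nimport jakarta.persistence.*;\n\n@Entity\nclass Post {\n\n    @Id\n    @GeneratedValue\n    private Long id;\n\n    private String title;\n\n    // Parent (inverse) side: mappedBy names the owning field in PostDetails.\n    // Without bytecode enhancement, Hibernate typically cannot honor LAZY here,\n    // because it must know whether to assign null or an object.\n    @OneToOne(mappedBy = \"post\", fetch = FetchType.LAZY, cascade = CascadeType.ALL)\n    private PostDetails details;\n}\n\n@Entity\nclass PostDetails {\n\n    // Holds the same value as the Post id: the primary key column doubles as the foreign key.\n    @Id\n    private Long id;\n\n    // Owning side: @MapsId derives this entity's primary key from the associated Post.\n    @OneToOne(fetch = FetchType.LAZY)\n    @MapsId\n    private Post post;\n\n    private String createdBy;\n}\n```\n\nIn such a setup, the parent-side `details` field is typically what causes the extra per-post query.\n\n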
However, this may lead to an N+1 query problem: since Hibernate needs to know whether to assign null or an object to\nthe one-to-one association, a SELECT query is executed for each retrieved post entity to check whether a child entity\nis associated with it (and to fetch it).\n\nTherefore, if a query like the following is executed, N+1 queries are issued: one for the posts plus one per post!\n\n```java\n\n@Query(value = \"\"\"\n        select * from post p\n        where p.title like :title\n        \"\"\", nativeQuery = true)\nList\u003cPost\u003e findPostsWhereTitleIn(@Param(\"title\") String title);\n```\n\n## `@ManyToMany`\n\nIn a many-to-many relationship, each side of the relationship acts as a parent, while the join table can be considered\nthe child. The association can be either unidirectional or bidirectional, depending on whether we need to access the\ncollection from both sides. However, in JPA terms, even in a bidirectional mapping only one side (the owning side) is\nresponsible for synchronizing the association. Due to the parent-to-parent relationship, cascading is limited to the\n`PERSIST` and `MERGE` operations, since neither side owns the other and, therefore, neither can decide on a cascade\ndelete, for example. Furthermore, in a bidirectional mapping, a cascade delete would have catastrophic consequences,\nsince the deletion would ping-pong between the two entities, resulting in the deletion of all the connected records.\n\nAs with a unidirectional `@OneToMany` association, the behavior depends on the underlying collection type: while Lists\nand Sets behave the same on insertion, on deletion a List first removes all the join table rows associated with the id\nof the entity owning the collection, and then re-inserts all the rows except the one we actually wanted to remove. 
Sets, instead, execute a single targeted DELETE on the matching join table row.\n\nUsually, the join table is not modeled explicitly as an entity but declared on one of the two parent sides with the\n`@JoinTable` annotation:\n\n```java\n\n@ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE})\n@JoinTable(\n        name = \"post_tag\",\n        joinColumns = @JoinColumn(name = \"post_id\"),\n        inverseJoinColumns = @JoinColumn(name = \"tag_id\"))\nprivate Set\u003cTag\u003e tags = new HashSet\u003c\u003e();\n```\n\n### Explicit mapping\n\nThe join table can also be explicitly mapped to an entity; in this case, the relationship between the two parent sides\nbecomes like a pair of bidirectional `@OneToMany` mappings, and cascade type ALL can be used, since the two parent\nentities don't reference each other directly but only through the join table entity, which now acts as a child of both\nends.\n\nThe join table can now contain more attributes than merely the two ids; the composite primary key, made of the ids of\nthe two parent entities, is defined as an `@Embeddable`. 
The join table entity therefore has an `@EmbeddedId` and at\nleast two `@ManyToOne` fields pointing to the parent entities, each annotated with `@MapsId`, which delegates the\nforeign key mapping to the corresponding attribute of the embeddable id.\n\n```java\n\n@Entity\n@Table(name = \"post_tag\")\nclass PostTag {\n\n    // JPA requires a no-arg constructor\n    protected PostTag() {\n    }\n\n    public PostTag(Post post, Tag tag) {\n        this.post = post;\n        this.tag = tag;\n        this.id = new PostTagId(post.id(), tag.id());\n    }\n\n    @EmbeddedId\n    private PostTagId id;\n\n    @ManyToOne\n    @MapsId(PostTagId_.POST_ID)\n    private Post post;\n\n    @ManyToOne\n    @MapsId(PostTagId_.TAG_ID)\n    private Tag tag;\n\n}\n\n// Composite primary key classes must be serializable according to the JPA specification\n@Embeddable\nrecord PostTagId(@Column(name = \"post_id\")\n                 Long postId,\n                 @Column(name = \"tag_id\")\n                 Long tagId) implements java.io.Serializable {\n}\n```\n\nFrom the parent side, we now have a collection of the new child entity, and we can use a `List` without incurring the\nHibernate bag behavior seen in the unidirectional `@OneToMany` mapping (i.e., a single DELETE statement is executed\ninstead of deleting all n rows where id = my_id and re-inserting n-1 of them). 
The synchronization methods are again useful on\nthe parent side, even if their implementation is a bit more cumbersome, since we need to keep both ends of the\nmany-to-many association in sync.\n\n```java\npublic void addTag(Tag tag) {\n    PostTag postTag = new PostTag(this, tag);\n    tags.add(postTag);\n    tag.posts().add(postTag);\n}\n\npublic Post removeTag(Tag tag) {\n    tags.stream().filter(t -\u003e t.post().equals(this) \u0026\u0026 t.tag().equals(tag))\n            .findFirst()\n            .ifPresent(t -\u003e {\n                tags.remove(t);\n                t.tag().posts().remove(t);\n                t.post(null);\n                t.tag(null);\n            });\n    return this;\n}\n```\n\n# JPA inheritance\n\nInheritance is a common OOP paradigm to vary the behavior of a superclass (interface) depending on its subclasses'\nimplementation. With JPA inheritance, we can implement behavioral design patterns such as the **Strategy Pattern** to\nvary the business logic independently of the underlying database table.\n\nIn general, in an RDBMS, inheritance can only be emulated through table relationships.\n\nMartin Fowler's design patterns for RDBMS inheritance:\n\n* **Single Table inheritance**: a single database table is used to represent all classes in a given inheritance\n  hierarchy\n* **Class Table inheritance**: the parent class and each subclass are mapped to separate tables, related by a foreign\n  key on the base class; the subclass tables contain only the fields that are not in the parent class table.\n* **Concrete Table inheritance**: each table in the hierarchy defines all the attributes\n\nJPA inheritance mapping models:\n\n* `InheritanceType.SINGLE_TABLE`\n* `InheritanceType.JOINED`\n* `InheritanceType.TABLE_PER_CLASS`\n* `@MappedSuperclass` (inheritance is available only in the Domain Model without being mirrored in the database)\n\n## Single table inheritance\n\n![](./images/inheritance/single-table.png)\n\nPros: query efficiency, since 
we have a single table to query\n\nCons: data integrity; we are not respecting the consistency principle of ACID, since non-nullability can be enforced at\nthe application level (on the entities) but not at the persistence layer (since a single table represents more than one\nentity, some columns are always null for one child entity but not for the other, hence nullability can't be\nconstrained at the persistence layer). The only alternative is to use database-specific constraints like CHECK\n(PostgreSQL) or TRIGGER (MySQL)\n\n### `@DiscriminatorColumn` and `@DiscriminatorValue`\n\nWhen using single table inheritance, JPA by default uses a discriminator column of type `String` called `DTYPE` to\ndifferentiate the child entities stored in the single table, with a discriminator value equal to the entity name.\nHowever, this is not the only option: we can use a custom string, or opt for a char or an int. We only need to\nannotate the parent entity with the `@DiscriminatorColumn` annotation, specifying the type and the name of the column,\nand then annotate all the child entities with `@DiscriminatorValue`.\n\n```java\n\n@Entity\n@Inheritance(strategy = InheritanceType.SINGLE_TABLE) // the default strategy, stated explicitly\n@DiscriminatorColumn(\n        discriminatorType = DiscriminatorType.STRING,\n        name = \"topic_type_id\",\n        columnDefinition = \"VARCHAR(3)\"\n)\n@DiscriminatorValue(\"TPC\")\npublic class Topic {\n    // ....\n}\n```\n\n## Joined inheritance\n\n![](./images/inheritance/join-table.png)\n\nPros: Explicit representation of the child entities and consistency in nullability\n\nCons: Expensive polymorphic queries due to the number of joins\n\nIn joined inheritance, each child entity has an explicit table containing its specific properties, while the\ncommon attributes are defined in the parent table; child and parent entities share the same id column.\nAs a direct consequence, inserting a child entity requires two INSERT statements, one for the parent table\nand one for the child table. 
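\n\nA minimal sketch of such a hierarchy, assuming Jakarta Persistence 3.x annotations and hypothetical `Topic`, `Post` and `Announcement` entities (names chosen for illustration only): persisting a `Post` is expected to produce one INSERT into the parent table and one into the `Post` table, both using the same id.\n\n```java\nimport jakarta.persistence.*;\nimport java.time.LocalDateTime;\n\n@Entity\n@Inheritance(strategy = InheritanceType.JOINED)\nclass Topic {\n\n    @Id\n    @GeneratedValue\n    private Long id;\n\n    // Attributes common to every subclass live in the parent table.\n    private String title;\n}\n\n// The child table's primary key is also a foreign key to the parent table.\n@Entity\nclass Post extends Topic {\n\n    // Subclass-specific column: NOT NULL can be declared in the child table.\n    @Column(nullable = false)\n    private String content;\n}\n\n@Entity\nclass Announcement extends Topic {\n\n    @Column(nullable = false)\n    private LocalDateTime validUntil;\n}\n```\n\nWith this mapping, a polymorphic query on `Topic` has to join the parent table with both subclass tables.\n\n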
While in single table inheritance we have a single index (a single primary key) shared by parent\nand child entities, the explicit child tables require additional indexes. On the other hand, joined\ninheritance preserves consistency, since nullability in subclasses can be enforced both at the application and at the\npersistence layer. Polymorphic queries are also more expensive, since Hibernate needs to resolve all the possible\nsubclasses of the parent entity, joining the parent table with all N subclass tables (N + 1 tables in total), which\nleads to a suboptimal execution plan.\n\n## Table per class\n\n![](./images/inheritance/table-per-class.png)\n\nN.B. The identity generation strategy is not allowed, since it can't guarantee unique identifiers across parent and\nchild tables, and this would generate conflicts in polymorphic queries, which need unique results.\n\nPros: Write operations are faster, since each entity is inserted only once, in its specific subclass table\n\nCons: Polymorphic queries use `UNION ALL` in inner queries and are therefore very inefficient; besides, not all\nHibernate dialects support UNION ALL, falling back to UNION, which adds a sorting phase to eliminate duplicates. This is\nredundant, since polymorphic queries cannot contain duplicates: the entity identifier and the discriminator column\nprovide unique results in the inheritance tree.\n\nIn table per class inheritance, each child table contains all the fields shared with the parent entity\nplus its specific ones. 
There is no foreign key between parent and child tables.\n\n## `@MappedSuperclass`\n\n![](./images/inheritance/mapped.png)\n\nPros: Efficient read and write operations\n\nCons: No polymorphic queries or associations, since there is no dedicated table for the parent class, which now lives\nonly at the application level, annotated with `@MappedSuperclass`\n\nWith the `@MappedSuperclass` inheritance strategy, the persistence layer contains only the tables of the concrete\nchild entities. The parent entity is modeled, for convenience, only at the application level as an abstract class\nowning the fields common to every member of the inheritance tree. Hence, no polymorphic queries are possible, since the\ninheritance hierarchy exists only at the application level.\n\n---\n\n# EnumType\n\nA Java enum can be mapped to a database column in 3 ways:\n\n- Using the JPA `@Enumerated` annotation:\n    - with `EnumType.STRING`, the enum is stored as a string. The string representation takes more space, but\n      it is human-readable\n\n  ![string-enum.png](./images/enumtype/string-enum.png)\n\n    - with `EnumType.ORDINAL`, the enum is stored as an int representing the constant's ordinal (its position in\n      the enum declaration). The ordinal representation saves bytes but, for a service consuming this data, it gives\n      no way to interpret the data without a decoding table. If we know the enum has fewer than 256 values, we can\n      use a tinyint. 
To map the\n      decoding table `post_status_info`, we need a `@ManyToOne` association on the entity containing the enum column,\n      marked as not insertable and not updatable (`insertable = false, updatable = false`), since we don't want two\n      owners of the same column\n\n  ![ordinal-enum-1.png](./images/enumtype/ordinal-enum-1.png)\n\n  ![ordinal-enum-2.png](./images/enumtype/ordinal-enum-2.png)\n\n- Creating a custom type (if the database vendor supports it), like a PostgreSQL enum type, which lets the database\n  store the string value of the enum while requiring less space than the varchar column used with\n  `EnumType.STRING`\n\n  ![psql-enum-create.png](./images/enumtype/psql-enum-create.png)\n\n  Since Hibernate is not aware of the custom enum type, we need to declare its definition programmatically\n\n  ![psql-enum.png](./images/enumtype/psql-enum.png)\n\n  Then, we create a custom class that extends the default Hibernate `EnumType`, overriding the `nullSafeSet` method,\n  which is responsible for binding the enum value as a JDBC prepared statement parameter\n\n  ![psql-custom-type.png](./images/enumtype/psql-custom-type.png)\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdifrison%2Fjava-persistence","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffdifrison%2Fjava-persistence","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdifrison%2Fjava-persistence/lists"}