In this article I describe a recent problem I had with PostgreSQL, where a deadlock caused my application to crash for unclear reasons. Naturally I also describe how to debug such problems and the reason for the deadlock.
We have a large transaction doing batch uploads which inserts rows of item_component. Every 100 entities inserted it will flush the session to the DB in order to save memory. That does not commit the transaction, it just sends the SQL commands to the database.
For each item inserted we create a separate transaction which:
Because our batch insertion transaction does not need to lock the person table, this should work fine.
Our code does not work: the first 100 items are created fine, the person objects are locked and incremented 100 times, but when I flush the first 100 inserted items to the DB (which works OK), the next person lock/increment hangs forever. Deadlock.
After ruling out anything that is triggered by our JPA code (disable Hibernate Search, entity listeners, caching), I have to conclude this could be a database mystery.
The first step is to find out the PostgreSQL transaction IDs of my JPA transactions. This can be done using the code shown below. Our batch insertion has TX ID 3/1501 while our blocking lock/increment TX ID is 2/4942. If we list all lock and transaction information using the query shown below (after filtering out locks on indexes, views and sequences and removing columns we do not care about) we get the following results:
| # | Type | TX ID | VTX ID | Mode | G | Name | K | SQL | W |
|---|---|---|---|---|---|---|---|---|---|
| TX = Transaction; VTX = Virtual TX; G = Granted; K = Kind; W = Waiting | |||||||||
| 1 | tuple | 2/4941 | ExclusiveLock | t | person | r | select person0_.pers...rson_id=$1 for update | t | |
| 2 | relation | 2/4941 | RowShareLock | t | person | r | select person0_.pers...rson_id=$1 for update | t | |
| 3 | virtualxid | 2/4941 | ExclusiveLock | t | select person0_.pers...rson_id=$1 for update | t | |||
| 4 | transactionid | 774990 | 2/4941 | ShareLock | f | select person0_.pers...rson_id=$1 for update | t | ||
| 5 | relation | 3/1501 | RowShareLock | t | person | r | <IDLE> in transaction | f | |
| 6 | relation | 3/1501 | RowExclusiveLock | t | item_component | r | <IDLE> in transaction | f | |
| 7 | relation | 3/1501 | AccessShareLock | t | person | r | <IDLE> in transaction | f | |
| 8 | virtualxid | 3/1501 | ExclusiveLock | t | <IDLE> in transaction | f | |||
| 9 | transactionid | 774990 | 3/1501 | ExclusiveLock | t | <IDLE> in transaction | f | ||
Let's first describe what each line is about:
My lock/increment transaction is waiting for the batch insertion transaction to release its lock on the person row it is trying to lock. We now have to find out why it acquired this lock since I did not do it manually.
After trying it myself in a PostgreSQL console, the following SQL causes a ROW SHARE lock to be obtained on the person row referenced by foreign key 2:
begin;
insert into item_component (item_component_id, creator_id) values (nextval('hibernate_sequence'), 2);
Of course this ROW SHARE lock is held until my batch insert transaction is committed, and prevents my lock/increment transactions from doing their job.
After a friendly chat on the PostgreSQL IRC channel I was able to get a confirmation that this behaviour is normal and a result of the RI (Referential Integrity) triggers. I have sent a mail to the documentation mailing list asking for better documentation of such implicit side-effect locks that will help others like me not waste days finding this out.
Now I need to find a better strategy for my code.
Use this code to find out your PostgreSQL virtual transaction ID:
public Object getTXID() { Query query = entityManager.createNativeQuery("select virtualtransaction from pg_locks where pid = pg_backend_pid()"); return query.getSingleResult(); }
Use this SQL to list all locks and their transactions:
select locktype, database, relation, page, tuple, transactionid, virtualtransaction, mode, granted, relname, relkind, datname, usename, case when length(current_query) > 40 then substring(current_query for 20) || '...' || substring(current_query from (length(current_query) - 19) for 20) else current_query end as query, waiting, query_start, client_addr, client_port from pg_locks left outer join pg_class on oid = relation left outer join pg_stat_activity on pid = procpid order by pid;
Stéphane "FroMage" Épardaud is a senior software developer at Lunatech Research. He enjoys good arguments, especially if he will win them or/and learn something, so don't hesitate to comment by dropping him a mail.
Please send comments on this article to editorial@lunatech.com.
Copyright © 2005-2012, Lunatech Research B.V. Tous droits réservés.