Odoo PostgreSQL Prepared Transaction Orphan Incident Runbook
A production-safe runbook to detect orphaned prepared transactions (2PC), unblock locks, and recover Odoo write throughput without corrupting in-flight business operations.
Prepared transactions (PREPARE TRANSACTION) are uncommon in default Odoo setups, but they appear in some real environments via integration middleware, custom connectors, or transaction managers.
When those coordinators fail mid-flow, orphaned prepared transactions can hold locks, block vacuum progress, and quietly degrade Odoo writes.
This runbook is for controlled recovery: identify the owning flow, release locks safely, and prevent recurrence.
Incident signals
Treat this as an active incident when one or more signals persist:
- Odoo writes stall intermittently while DB CPU is not fully saturated.
- Lock waits increase on hot tables (
account_move,stock_move,mail_message). pg_prepared_xactscontains old entries (minutes to hours) instead of short-lived 2PC windows.- You see
idle in transaction/lock chains but root sessions are no longer active. - Autovacuum progress on hot tables is poor despite workers running.
Step 0 — Stabilize before touching 2PC state
- Pause non-critical batch imports, heavy cron lanes, and replay jobs.
- Keep revenue/warehouse-critical flows online if possible.
- Assign one incident lead for business ownership decisions on commit vs rollback.
Do not blindly ROLLBACK PREPARED everything. A wrong decision can create external side-effect mismatches (for example payment captured externally but DB rollback locally).
Step 1 — Confirm and size the prepared-transaction blast radius
# How many prepared transactions exist and how old are they?
psql "$ODOO_DB_URI" -c "
select
gid,
prepared,
now() - prepared as age,
owner,
database
from pg_prepared_xacts
order by prepared asc;
"
# Track lock footprint from prepared xacts
psql "$ODOO_DB_URI" -c "
select
l.locktype,
l.mode,
l.granted,
l.relation::regclass as relation,
px.gid,
now() - px.prepared as prepared_age
from pg_locks l
join pg_prepared_xacts px on px.transaction = l.transactionid
order by prepared_age desc, relation nulls last;
"
# Who is blocked right now?
psql "$ODOO_DB_URI" -c "
select
a.pid,
a.usename,
a.application_name,
a.state,
now() - a.query_start as query_age,
a.wait_event_type,
a.wait_event,
left(a.query, 160) as query
from pg_stat_activity a
where a.datname = current_database()
and a.wait_event_type = 'Lock'
order by a.query_start asc;
"
Capture outputs in incident notes before remediation.
Step 2 — Triage checklist (commit vs rollback decision)
For each gid in pg_prepared_xacts, answer:
- What integration/service created this transaction? (
application_name, integration logs, coordinator ID) - Is the external side effect already committed (payment capture, shipping label, webhook ack)?
- Is there an authoritative transaction coordinator that can still resolve it safely?
- Is the prepared transaction age far beyond expected 2PC SLA (for example > 5 minutes)?
Decision rule:
- COMMIT PREPARED when external side effects are confirmed committed and DB must match reality.
- ROLLBACK PREPARED when side effects are confirmed not committed, or explicitly marked failed by coordinator.
- If uncertain, isolate the affected business lane and escalate to app owner before resolving.
Step 3 — Resolve oldest high-impact prepared transactions first
Start with the oldest gid holding the most blocking locks.
# Example: finalize a known-good prepared transaction
psql "$ODOO_DB_URI" -c "COMMIT PREPARED 'odoo_xa_20260321_104455_42';"
# Example: cancel a known-bad prepared transaction
psql "$ODOO_DB_URI" -c "ROLLBACK PREPARED 'odoo_xa_20260321_104901_77';"
Operational guardrails:
- Resolve one or a small batch at a time, then re-check blocked sessions.
- Never run mass rollback/commit scripts without ownership mapping.
- Keep a timeline:
gid, action taken, approver, reason.
Step 4 — Verify lock release and Odoo recovery trend
# Prepared xacts should trend toward zero (or normal tiny baseline)
psql "$ODOO_DB_URI" -c "
select count(*) as prepared_count,
min(now() - prepared) as youngest_age,
max(now() - prepared) as oldest_age
from pg_prepared_xacts;
"
# Blocking lock pressure should drop
psql "$ODOO_DB_URI" -c "
select wait_event_type, wait_event, count(*)
from pg_stat_activity
where datname = current_database()
group by 1,2
order by count(*) desc;
"
# Validate application-side symptom recovery
odoocli logs tail --service odoo --since 15m --grep "canceling statement|deadlock|timeout|could not obtain lock"
Close incident only after:
- Prepared transaction count returns to expected baseline.
- Lock-wait queue shrinks across consecutive checks.
- Odoo business writes complete at normal latency.
Rollback/safety actions if remediation worsens impact
- Stop additional
COMMIT/ROLLBACK PREPAREDoperations immediately. - Reconfirm business ownership mapping for unresolved
gidvalues. - Re-enable paused workloads gradually only after lock-wait trend improves.
- If data consistency is uncertain, switch affected flows to read-only/degraded mode and run reconciliation jobs.
Hardening and prevention checklist
- If you do not intentionally use 2PC, set
max_prepared_transactions = 0(or keep tightly bounded) and restart in a maintenance window. - Add alerting on:
count(*) from pg_prepared_xacts > 0beyond expected window- oldest prepared age above your SLO (for example 5m)
- Require integration owners to include a deterministic
gidformat with service + timestamp + correlation ID. - Log coordinator decisions (commit/rollback) to durable audit logs.
- Run quarterly failure drills: coordinator crash after
PREPARE TRANSACTION, then operator recovery using this runbook.
Core principle: prepared transactions are not inherently dangerous, but unresolved ones are silent lock bombs. Resolve with business-context evidence, not guesswork.