WORK IN PROGRESS

March 21, 2026

Odoo PostgreSQL Deadlock Storm Runbook

Production-safe incident runbook to detect, contain, and remediate repeated PostgreSQL deadlocks impacting Odoo write paths.

When Odoo users see intermittent failures like "deadlock detected" during order confirmation, invoice posting, or stock validation, you are not dealing with generic slowness. You are dealing with conflicting transaction ordering under write pressure.

This runbook focuses on fast containment without risky kill-everything behavior: verify deadlock pattern, reduce contention lanes, remove pathological transactions safely, and restore predictable write throughput.

Incident signals (page-worthy)

Treat as an incident when one or more signals persist for 5–10 minutes:

  • Odoo logs show repeated psycopg2.errors.DeadlockDetected or deadlock detected.
  • Retries in app workers rise, but user-facing writes still fail intermittently.
  • PostgreSQL log volume spikes with deadlock traces.
  • Lock wait count and long transaction age trend upward together.

Triage checklist (first 10 minutes)

  • Freeze deploys/module upgrades while triage is active.
  • Pause non-critical high-write cron lanes/import jobs.
  • Confirm deadlock frequency (not a single isolated event).
  • Capture blocker/waiter evidence before changing sessions.
  • Identify top 2–3 write paths involved (sales, inventory, accounting).

Step 1 — Confirm deadlock storm and blast radius

1.1 Count deadlock growth at database level

psql "$ODOO_DB_URI" -c "
select datname, deadlocks, stats_reset
from pg_stat_database
where datname = current_database();
"

Re-run after 5 minutes. If the deadlocks counter keeps increasing, treat this as an active storm.
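The delta check can be sketched as a small helper. The threshold here is an illustrative assumption, not a value defined by this runbook; tune it to your baseline deadlock rate:

```python
def is_deadlock_storm(deadlocks_before: int, deadlocks_after: int,
                      threshold: int = 5) -> bool:
    """Compare two samples of pg_stat_database.deadlocks taken a few
    minutes apart. Growth beyond `threshold` suggests an active storm
    rather than a one-off aborted transaction."""
    return (deadlocks_after - deadlocks_before) > threshold
```

Feed it the counter value from two consecutive psql runs of the query above.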

1.2 Inspect active lock pressure and oldest transactions

psql "$ODOO_DB_URI" -c "
select
  pid,
  usename,
  application_name,
  state,
  wait_event_type,
  wait_event,
  now() - xact_start as xact_age,
  now() - query_start as query_age,
  left(query, 180) as query
from pg_stat_activity
where datname = current_database()
  and xact_start is not null
order by xact_start asc
limit 40;
"

1.3 Map blocked ↔ blocking pairs

psql "$ODOO_DB_URI" -c "
select
  a.pid as blocked_pid,
  pg_blocking_pids(a.pid) as blocking_pids,
  a.usename,
  a.application_name,
  now() - a.query_start as blocked_for,
  left(a.query, 140) as blocked_query
from pg_stat_activity a
where a.datname = current_database()
  and cardinality(pg_blocking_pids(a.pid)) > 0
order by a.query_start asc;
"

This shows contention shape, even though deadlocks themselves are resolved by PostgreSQL aborting one transaction.
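To read that output faster, you can reduce it to root blockers: sessions that block others but are not themselves waiting. A minimal sketch, assuming rows shaped like the query output above, (blocked_pid, list of blocking pids):

```python
def root_blockers(rows):
    """rows: iterable of (blocked_pid, [blocking_pids]) tuples, as in
    the pg_blocking_pids query output.
    Returns pids that block others but are not blocked themselves;
    these are usually the sessions to investigate first."""
    blocked = {blocked_pid for blocked_pid, _ in rows}
    blocking = {pid for _, pids in rows for pid in pids}
    return sorted(blocking - blocked)
```

For example, if 101 waits on 202 and 202 waits on 303, the root blocker is 303.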

1.4 Pull Odoo-side deadlock evidence quickly

odoocli logs tail --service odoo --since 15m --grep "deadlock detected|DeadlockDetected|TransactionRollbackError"

Step 2 — Contain load before surgical remediation

  1. Pause only non-critical write amplifiers first (bulk sync/import/recompute cron).
  2. Keep customer-critical paths running where possible.
  3. Avoid broad PostgreSQL or Odoo restarts unless the system is unrecoverable.

Example operational controls:

odoocli doctor --env production
odoocli cron pause --tag heavy-write

Goal: reduce lock graph churn while preserving core business transactions.

Step 3 — Safe remediation order (cancel first, terminate second)

Deadlocks are often amplified by long or repeatedly retried transactions. Remove pathological sessions in the safest order.

3.1 Cancel oldest non-critical active statements

select pg_cancel_backend(<pid>);

3.2 Terminate only if cancel fails or session instantly re-blocks

select pg_terminate_backend(<pid>);

Apply in this order:

  1. BI/reporting/background sessions unrelated to live checkout/accounting flows.
  2. Stale Odoo workers repeatedly retrying the same failing write path.
  3. Last resort: high-impact business sessions after explicit incident commander sign-off.

Do not terminate replication, backup, or migration sessions without checking downstream impact.
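The ordering above can be sketched as a pure-Python filter over pg_stat_activity rows. The field names and the critical application-name list are illustrative assumptions; actual names vary per deployment:

```python
# Assumption: application names differ per deployment; populate from your own inventory.
CRITICAL_APPS = {"walreceiver", "pg_basebackup", "odoo-checkout"}

def cancel_candidates(sessions, min_xact_age_s=600):
    """sessions: list of dicts with 'pid', 'application_name', and
    'xact_age_s' (seconds spent in the current transaction).
    Returns non-critical pids, oldest transaction first, matching the
    cancel-first ordering of this step."""
    eligible = [
        s for s in sessions
        if s["application_name"] not in CRITICAL_APPS
        and s["xact_age_s"] >= min_xact_age_s
    ]
    return [s["pid"] for s in sorted(eligible, key=lambda s: -s["xact_age_s"])]
```

The output is a prioritized list to feed into pg_cancel_backend one pid at a time, not a batch kill.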

Step 4 — Reduce deadlock probability during incident window

Use temporary guardrails so one bad transaction cannot sit and collide indefinitely.

-- Scope these conservatively (role/database/session) under change control.
alter role odoo set lock_timeout = '5s';
alter role odoo set statement_timeout = '90s';

Note that role-level settings apply only to new sessions; existing Odoo workers keep their current values until they reconnect. If you apply temporary timeout changes, record them in incident notes and define explicit rollback timing.
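An alternative with a smaller blast radius is scoping the timeouts to a single transaction with SET LOCAL, which expires automatically at commit or rollback. A hypothetical helper that builds those statements (the function name and defaults are illustrative):

```python
def set_local_timeouts(lock_timeout="5s", statement_timeout="90s"):
    """Build SET LOCAL statements to run right after BEGIN, so the
    timeouts die with the transaction and leave no persistent
    role-level change to roll back."""
    return [
        f"SET LOCAL lock_timeout = '{lock_timeout}'",
        f"SET LOCAL statement_timeout = '{statement_timeout}'",
    ]
```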

Step 5 — Verification loop (every 5 minutes)

5.1 Deadlock counter should stop accelerating

watch -n 60 "psql \"$ODOO_DB_URI\" -Atc \"
select now(), deadlocks
from pg_stat_database
where datname = current_database();
\""

5.2 Lock waits and transaction age should trend down

psql "$ODOO_DB_URI" -c "
select
  count(*) filter (where wait_event_type = 'Lock') as lock_waiters,
  max(now() - xact_start) filter (where xact_start is not null) as oldest_xact_age
from pg_stat_activity
where datname = current_database();
"

5.3 Business-path verification

  • Sales order confirm succeeds.
  • Invoice post succeeds.
  • Stock move validation succeeds.
  • Error logs no longer show deadlock bursts.

Rollback / normalization plan

If temporary controls were applied:

  1. Keep paused write-heavy cron lanes paused until the deadlock rate is stable.
  2. Re-enable one lane at a time and monitor the deadlock count delta.
  3. Roll back temporary timeout overrides if they are too strict for normal workloads.

alter role odoo reset lock_timeout;
alter role odoo reset statement_timeout;

  4. Resume paused cron jobs in a controlled sequence.

odoocli cron resume --tag heavy-write

Hardening checklist (post-incident)

  • Standardize transaction ordering in custom modules touching the same tables (for example: parent record before child lines consistently).
  • Split large batch writes into smaller commits to reduce lock hold time.
  • Add retry with jitter/backoff for known deadlock-prone write paths (idempotent-safe operations only).
  • Alert on pg_stat_database.deadlocks rate-of-change, not just absolute count.
  • Alert on oldest transaction age and lock-waiter count.
  • Review high-contention indexes and query plans after module upgrades.
  • Load-test staging with concurrent writes for inventory/accounting hotspots before production rollout.
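The retry-with-jitter item above can be sketched as a small wrapper. DeadlockError here is a local stand-in for psycopg2.errors.DeadlockDetected, and the attempt/delay bounds are illustrative assumptions; only wrap operations that are idempotent:

```python
import random
import time


class DeadlockError(Exception):
    """Local stand-in for psycopg2.errors.DeadlockDetected."""


def retry_on_deadlock(fn, attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Retry fn() on deadlock with exponential backoff plus jitter.
    Safe only when fn() is idempotent and each attempt runs in its own
    transaction (the failed one was fully rolled back)."""
    for attempt in range(attempts):
        try:
            return fn()
        except DeadlockError:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter desynchronizes concurrent retriers so they stop
            # colliding on the same lock order at the same instant.
            sleep(delay + random.uniform(0, delay))
```

The injectable sleep parameter keeps the helper unit-testable without real waits.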

Operating principle

Operational rule: during deadlock storms, prioritize contention-shape control (load containment + transaction hygiene) over restart-first reactions.
