Odoo PostgreSQL DDL Lock Queue Incident Runbook
A production-safe runbook for AccessExclusive lock incidents during Odoo schema changes, with deterministic triage, containment, recovery, and prevention.
DDL lock incidents in Odoo usually show up during module upgrades, emergency hotfixes, or manual schema changes: one session requests an ACCESS EXCLUSIVE lock, traffic keeps writing, and suddenly user requests stall behind a lock queue.
This runbook is for real-time containment with minimal blast radius: capture evidence, stop lock amplification, unwind blockers safely, and complete or roll back the change cleanly.
Incident signals
Treat this as an active incident when multiple signals persist for more than a few minutes:
- Odoo write paths (confirm order, validate picking, post invoice) time out during/after deploy.
- pg_stat_activity shows many sessions waiting with wait_event_type = 'Lock'.
- One migration/DDL query is waiting on or holding a lock on hot tables (account_move, stock_move, sale_order, ir_attachment).
- PgBouncer active client count rises while DB throughput drops.
- Error logs show lock timeout or statement timeout spikes.
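A quick one-shot check can confirm these signals before declaring an incident. This sketch reuses the $ODOO_DB_URI convention from the commands below and reports both the waiter count and how long the oldest waiter has been stuck:

```shell
# One-shot signal check: number of lock waiters and age of the oldest one.
# Assumes $ODOO_DB_URI points at the Odoo production database.
psql "$ODOO_DB_URI" -c "
select count(*) as lock_waiters,
       max(now() - query_start) as oldest_wait
from pg_stat_activity
where datname = current_database()
  and wait_event_type = 'Lock';
"
```

A handful of waiters with an oldest_wait of a few seconds is normal churn; dozens of waiters with an oldest_wait measured in minutes matches the incident pattern above.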
Step 0 — Stabilize and prevent further lock amplification
Before terminating anything:
- Freeze new deploys and schema changes.
- Pause non-critical cron/import workloads that add write pressure.
- Keep one DB operator and one app operator paired during mitigation.
- Capture a snapshot of blockers/waiters before making changes.
# Example: pause heavy scheduled work first
odoocli cron pause --tag heavy-write
# Capture current lock/wait state to incident notes
psql "$ODOO_DB_URI" -c "
select now() as captured_at;
select pid, usename, application_name, state, wait_event_type, wait_event,
now() - query_start as runtime,
left(query, 220) as query
from pg_stat_activity
where datname = current_database()
order by query_start asc;
"
Step 1 — Confirm DDL lock queue and identify root blocker
# Show waiting vs granted locks with relation names
psql "$ODOO_DB_URI" -c "
select
a.pid,
a.usename,
a.application_name,
l.mode,
l.granted,
coalesce(c.relname::text, l.relation::text) as relation,
now() - a.query_start as query_age,
left(a.query, 200) as query
from pg_locks l
join pg_stat_activity a on a.pid = l.pid
left join pg_class c on c.oid = l.relation
where a.datname = current_database()
order by l.granted asc, query_age desc;
"
# Focus on AccessExclusive lock holders/waiters (common DDL choke point)
psql "$ODOO_DB_URI" -c "
select
a.pid,
a.application_name,
l.granted,
now() - a.query_start as age,
coalesce(c.relname::text, l.relation::text) as relation,
left(a.query, 220) as query
from pg_locks l
join pg_stat_activity a on a.pid = l.pid
left join pg_class c on c.oid = l.relation
where a.datname = current_database()
and l.mode = 'AccessExclusiveLock'
order by l.granted asc, age desc;
"
# Blocking graph: who is waiting on whom
psql "$ODOO_DB_URI" -c "
select
w.pid as waiting_pid,
w.application_name as waiting_app,
now() - w.query_start as waiting_for,
b.pid as blocking_pid,
b.application_name as blocking_app,
now() - b.query_start as blocking_for,
left(w.query, 120) as waiting_query,
left(b.query, 120) as blocking_query
from pg_stat_activity w
join pg_stat_activity b on b.pid = any(pg_blocking_pids(w.pid))
where w.datname = current_database()
order by waiting_for desc;
"
Triage checklist
- Is a migration/DDL statement requesting or holding AccessExclusiveLock on a hot table?
- Are blockers from Odoo workers/cron, or from the migration session itself?
- Is the lock queue growing (more waiters every sample) or draining?
- Is customer-facing write latency above SLO while queue depth rises?
If yes, move to controlled containment.
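To answer the growing-vs-draining question, sample the waiter count at a fixed interval rather than eyeballing a single snapshot. A minimal sketch, again assuming $ODOO_DB_URI:

```shell
# Sample the lock-waiter count every 5 seconds; a rising count means the
# queue is growing, a falling count means it is draining.
while true; do
  psql "$ODOO_DB_URI" -t -A -c "
    select now()::timestamptz(0) || '  waiters=' || count(*)
    from pg_stat_activity
    where datname = current_database()
      and wait_event_type = 'Lock';
  "
  sleep 5
done
```

Paste a few samples into the incident notes; they also make the post-incident timeline much easier to reconstruct.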
Step 2 — Contain safely (cancel first, terminate second)
Use this escalation order to reduce data risk:
- Cancel long-running blocker statements that are safe to retry.
- Re-sample blocking graph.
- Terminate only clearly stuck sessions that keep the queue frozen.
- Avoid bulk-killing all sessions.
# First choice: cancel statement, keep backend alive
psql "$ODOO_DB_URI" -c "select pg_cancel_backend(<blocking_pid>);"
# Escalate only when needed
psql "$ODOO_DB_URI" -c "select pg_terminate_backend(<blocking_pid>);"
Production-safe ordering:
- Clear non-critical app/cron blockers first.
- Re-check lock graph after each small batch (1-3 sessions).
- Touch the migration backend last, unless it is the confirmed root blocker and the blast radius is widening.
- Do not terminate replication/backup/admin sessions unless explicitly validated.
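The small-batch cadence above can be scripted so each batch is followed by a re-check. This is a sketch only: the PID list is a placeholder for the blocker PIDs identified in Step 1, substituted a few at a time.

```shell
# Sketch: cancel a small batch of confirmed blocker PIDs, then re-sample.
# 12345 and 12346 are placeholder PIDs; substitute the real blockers
# from the Step 1 lock-graph queries, 1-3 per batch.
for pid in 12345 12346; do
  psql "$ODOO_DB_URI" -c "select pg_cancel_backend($pid);"
done
sleep 5
# Re-check: did the queue start draining, or do we need to escalate?
psql "$ODOO_DB_URI" -c "
select count(*) as lock_waiters
from pg_stat_activity
where datname = current_database()
  and wait_event_type = 'Lock';
"
```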
Step 3 — Decide path: finish-forward or roll back the schema change
Path A: Finish-forward (preferred if queue is draining)
Use when DDL can complete quickly once blockers are removed.
-- Guardrail for migration session to fail fast instead of hanging forever
set lock_timeout = '5s';
set statement_timeout = '15min';
Re-run the migration in controlled mode (maintenance window, reduced app write load):
odoo-bin -c /etc/odoo/odoo.conf -d "$ODOO_DB" -u <module_name> --stop-after-init
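The session-level guardrails above only take effect if the migration process actually adopts them. One way, assuming the migration connects through libpq (psycopg2-based clients such as Odoo honor the PGOPTIONS environment variable), is to set them on the process environment; verify against your setup before relying on it:

```shell
# Sketch: pass fail-fast guardrails to the migration via libpq's PGOPTIONS.
# Honored by libpq-based clients (including psycopg2); confirm this works
# with your Odoo deployment before a real migration window.
PGOPTIONS="-c lock_timeout=5s -c statement_timeout=15min" \
  odoo-bin -c /etc/odoo/odoo.conf -d "$ODOO_DB" -u <module_name> --stop-after-init
```

A more durable alternative is setting the same values with ALTER ROLE on a dedicated migration role, so ad-hoc migration sessions cannot forget them.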
Path B: Roll back change attempt (preferred if impact keeps growing)
Use when retries keep recreating lock queues or customer impact is severe.
- Stop active migration attempt.
- Revert to previous release artifact/config.
- Resume app traffic only after lock queue is clear and core writes succeed.
- Schedule migration for dedicated window with stricter guardrails.
Step 4 — Verify recovery before incident closure
# Waiters should trend down toward zero
psql "$ODOO_DB_URI" -c "
select count(*) as lock_waiters
from pg_stat_activity
where datname = current_database()
and wait_event_type = 'Lock';
"
# No remaining AccessExclusive waits on hot relations
psql "$ODOO_DB_URI" -c "
select count(*) as access_exclusive_waiters
from pg_locks l
join pg_stat_activity a on a.pid = l.pid
where a.datname = current_database()
and l.mode = 'AccessExclusiveLock'
and not l.granted;
"
# Validate module state is not stuck mid-upgrade
psql "$ODOO_DB_URI" -c "
select name, state
from ir_module_module
where state in ('to install', 'to upgrade', 'to remove')
order by name;
"
# App-side symptom check
odoocli logs tail --service odoo --since 15m --grep "lock timeout|could not obtain lock|statement timeout"
Close only when all are true:
- Lock waiters stay near zero across repeated checks.
- No persistent AccessExclusiveLock queue on hot tables.
- Odoo write flows succeed end-to-end.
- No fresh lock-timeout burst after resuming paused workloads.
Rollback / safety path if remediation increases risk
If cancellation/termination worsens instability:
- Stop further backend termination.
- Keep non-critical write producers paused.
- Revert migration attempt and return to known-good app release.
- Resume traffic in phases and monitor lock waiters.
If you applied aggressive session-level guardrails for emergency handling, revert them after stabilization:
reset lock_timeout;
reset statement_timeout;
Hardening and prevention checklist
- Require migration windows for DDL touching high-traffic Odoo tables.
- Set an explicit lock_timeout in migration jobs so lock waits fail fast.
- Prefer CREATE INDEX CONCURRENTLY / REINDEX CONCURRENTLY where applicable.
- Break large schema changes into small, reversible steps.
- Pause heavy cron/import jobs during upgrade windows by default.
- Alert on lock-wait count, oldest waiting query age, and AccessExclusiveLock waiters.
- Add preflight "hot table lock risk" checks to release runbooks.
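As a concrete instance of the CONCURRENTLY guidance, an index on a hot table can be built without an AccessExclusive lock, with a lock_timeout guardrail so even its brief metadata locks fail fast. Table and index names here are illustrative:

```shell
# Sketch: build an index on a hot table without blocking writes.
# CREATE INDEX CONCURRENTLY takes only a ShareUpdateExclusive lock and
# cannot run inside a transaction block, so each statement is sent in
# its own -c (same session, so the SET persists). Names are illustrative.
psql "$ODOO_DB_URI" \
  -c "set lock_timeout = '5s';" \
  -c "create index concurrently if not exists account_move_date_idx
      on account_move (date);"
```

If the concurrent build fails partway, it leaves an invalid index behind; drop it (DROP INDEX CONCURRENTLY) and retry rather than querying against it.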
The core principle: never let DDL compete blindly with peak write traffic. Control write pressure first, then run schema change with explicit lock guardrails.