WORK IN PROGRESS

March 21, 2026

Odoo PostgreSQL VACUUM FULL AccessExclusive Lockout Runbook

A production-safe runbook for incidents where VACUUM FULL blocks Odoo traffic by taking AccessExclusive locks on hot tables.

Running VACUUM FULL on a hot Odoo table during peak hours can trigger a lockout incident: VACUUM FULL takes an ACCESS EXCLUSIVE lock for the duration of the table rewrite, so read/write requests queue behind it, workers time out, and response latency spikes across modules.

This runbook is for active production response: identify the blocking VACUUM FULL session, stop lock amplification safely, restore service, and move to lower-risk bloat remediation.

Incident signals

Treat this as active when multiple signals appear together:

  • Odoo logs show sudden "canceling statement due to statement timeout" or "canceling statement due to lock timeout" errors, or generic request timeouts.
  • Many sessions are waiting on wait_event_type = 'Lock' in PostgreSQL.
  • pg_stat_activity shows VACUUM FULL running on a high-traffic table (stock_move, account_move_line, mail_message, etc.).
  • Queue depth and worker latency rise quickly even though CPU may look moderate.
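
A quick way to quantify the first signal is to count timeout/lock errors in the recent Odoo server log. This is a minimal sketch: the helper name and the log path in the usage comment are assumptions for a typical deployment, and the patterns match PostgreSQL's standard timeout and lock messages as they surface in Odoo logs.

```shell
# count_lock_errors FILE -> number of lock/timeout error lines in an Odoo log.
# The patterns are PostgreSQL's standard messages for lock timeout,
# statement timeout, and NOWAIT lock failures.
count_lock_errors() {
  grep -cE "canceling statement due to (lock|statement) timeout|could not obtain lock" "$1"
}

# Typical triage use (log path is an assumption -- adjust to your deployment):
# count_lock_errors /var/log/odoo/odoo-server.log
```

A rising count over successive runs, combined with lock waiters in pg_stat_activity, is a strong confirmation signal.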

Step 0 — Stabilize first

Before touching database backends:

  1. Pause non-critical Odoo cron and bulk import jobs.
  2. Freeze manual admin operations that create extra writes (mass updates, re-computes, large exports).
  3. Keep one DB operator and one app operator assigned (single decision path).
  4. Announce “write degradation in progress” in your incident channel.

The goal is to stop adding pressure while you unwind lock queues.
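
When the Odoo UI itself is timing out, crons can be paused at the database layer instead. This is a sketch, not a definitive procedure: ir_cron is Odoo's real cron table, but the backup-table name is hypothetical, and in a real incident you should scope the WHERE clause to your known non-critical jobs rather than pausing everything.

```sql
-- Record which cron jobs are currently active, then pause them.
-- pause_cron_backup is an illustrative name; restore from it afterwards.
create table if not exists pause_cron_backup as
  select id from ir_cron where active;

update ir_cron set active = false
where id in (select id from pause_cron_backup);

-- After the incident:
-- update ir_cron set active = true where id in (select id from pause_cron_backup);
-- drop table pause_cron_backup;
```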

Step 1 — Confirm lockout pattern and identify blocker

# Confirm VACUUM FULL and lock waits in current DB
psql "$ODOO_DB_URI" -c "
select
  pid,
  usename,
  application_name,
  state,
  wait_event_type,
  wait_event,
  now() - xact_start as xact_age,
  left(query, 220) as query
from pg_stat_activity
where datname = current_database()
  and (
    query ilike 'vacuum full%'
    or wait_event_type = 'Lock'
  )
order by xact_start nulls last;
"
# Show blocking tree (who blocks whom)
psql "$ODOO_DB_URI" -c "
with blocked as (
  select
    a.pid as blocked_pid,
    pg_blocking_pids(a.pid) as blocking_pids,
    a.usename,
    a.application_name,
    now() - a.query_start as blocked_for,
    left(a.query, 140) as blocked_query
  from pg_stat_activity a
  where a.datname = current_database()
)
select
  b.blocked_pid,
  b.blocking_pids,
  b.usename,
  b.application_name,
  b.blocked_for,
  b.blocked_query
from blocked b
where cardinality(b.blocking_pids) > 0
order by b.blocked_for desc;
"
# Optional: confirm lock mode on the hot relation
psql "$ODOO_DB_URI" -c "
select
  l.pid,
  l.mode,
  l.granted,
  c.relname,
  left(a.query, 120) as query
from pg_locks l
join pg_class c on c.oid = l.relation
join pg_stat_activity a on a.pid = l.pid
where c.relname in ('stock_move', 'account_move_line', 'mail_message')
order by c.relname, l.granted, l.mode;
"

Triage checklist

  • Is a VACUUM FULL session currently active on a high-traffic table?
  • Are application sessions piling up behind the same blocker PID?
  • Did this start immediately after a manual “bloat cleanup” command?
  • Can non-critical jobs stay paused for 10–20 minutes while queue drains?

If yes, continue with controlled remediation.

Step 2 — Controlled remediation (safe order)

2.1 Stop new maintenance actions

Immediately stop any additional manual maintenance commands (VACUUM FULL, table rewrites, broad REINDEX without CONCURRENTLY) from being started.

2.2 Cancel the blocking VACUUM FULL first

Prefer graceful cancel before termination.

select pg_cancel_backend(<vacuum_full_pid>);

If the session does not stop promptly and lock waiters continue to rise:

select pg_terminate_backend(<vacuum_full_pid>);

Use terminate only with incident owner approval and a clear blocker mapping.
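
If the exact PID is hard to pin down under pressure, a pattern-based cancel can target any running VACUUM FULL in the current database. This is a sketch: run the SELECT without the cancel first to confirm it matches only the session you intend.

```sql
-- Gracefully cancel every backend currently running VACUUM FULL here.
-- Swap in pg_terminate_backend(pid) only with incident-owner approval.
select pid, pg_cancel_backend(pid)
from pg_stat_activity
where datname = current_database()
  and query ilike 'vacuum full%'
  and pid <> pg_backend_pid();
```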

2.3 Drain queued lock waiters safely

# Re-check lock waiters every 30-60s until queue trends down
psql "$ODOO_DB_URI" -c "
select
  count(*) filter (where wait_event_type = 'Lock') as lock_waiters,
  count(*) filter (where state = 'active') as active_sessions,
  count(*) as total_sessions
from pg_stat_activity
where datname = current_database();
"

If specific stale sessions remain blocked for a long duration after blocker removal, cancel them selectively (do not mass-terminate all clients).
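
A review-then-cancel pattern keeps this selective. The 5-minute threshold below is illustrative, not a recommendation; pick a value that matches your traffic profile, and cancel only PIDs you have inspected.

```sql
-- Review first: sessions still waiting on locks for more than 5 minutes.
select pid, usename,
       now() - query_start as waiting_for,
       left(query, 120) as query
from pg_stat_activity
where datname = current_database()
  and wait_event_type = 'Lock'
  and query_start < now() - interval '5 minutes';

-- Then cancel only specific, reviewed PIDs:
-- select pg_cancel_backend(<stale_pid>);
```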

2.4 Recycle Odoo workers only if needed

If app workers are still error-looping after lock queue clears:

# Systemd
sudo systemctl restart odoo

# Container
docker restart <odoo_container>

Restart only affected pools first when possible.

Step 3 — Rollback / safety path

If service quality worsens during remediation:

  1. Re-pause all non-critical jobs.
  2. Stop backend termination actions.
  3. Keep only critical business flows online.
  4. Revert temporary worker/concurrency changes made during incident.
  5. Move heavy maintenance to planned off-peak window with explicit lock budget.

Step 4 — Verification before incident close

# Verify blocker command is gone
psql "$ODOO_DB_URI" -c "
select pid, state, wait_event_type, left(query, 160) as query
from pg_stat_activity
where datname = current_database()
  and query ilike 'vacuum full%';
"
# Verify lock waits returned to baseline
psql "$ODOO_DB_URI" -c "
select
  count(*) filter (where wait_event_type = 'Lock') as lock_waiters,
  max(now() - query_start) filter (where wait_event_type = 'Lock') as worst_wait
from pg_stat_activity
where datname = current_database();
"
# App-level check for timeout/lock errors
odoocli logs tail --service odoo --since 20m --grep "lock timeout|statement timeout|could not obtain lock|Traceback"
# Basic write-path smoke test
odoocli shell -c "env['res.partner'].create({'name': 'VACUUM FULL incident smoke'})"

Close incident only when:

  • No active VACUUM FULL remains on hot tables.
  • Lock waiter count is stable near baseline for at least one normal traffic cycle.
  • Critical Odoo create/update flows succeed without timeout spikes.

Step 5 — Safer follow-up maintenance plan (post-incident)

After stabilization, handle bloat with lower-lock alternatives:

  • Prefer VACUUM (ANALYZE) during business hours.
  • Use REINDEX INDEX CONCURRENTLY for index bloat where applicable.
  • For table rewrite-level recovery, schedule maintenance window and test runtime on staging first.
  • Consider pg_repack (where approved) to reduce blocking risk versus ad-hoc VACUUM FULL on live traffic.
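
When a rewrite genuinely is scheduled, a session-level lock budget keeps it from silently queuing behind (and then ahead of) application traffic: if the ACCESS EXCLUSIVE lock is not acquired within the budget, the command fails fast instead of blocking everyone. Timeout values here are illustrative; set them from your own lock budget.

```sql
-- Fail fast if the table lock cannot be acquired within 5 seconds,
-- and cap total runtime, instead of queuing behind live traffic.
set lock_timeout = '5s';
set statement_timeout = '30min';
vacuum full stock_move;  -- maintenance window only
```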

Hardening and prevention checklist

  • Restrict who can run VACUUM FULL in production (role and runbook gate).
  • Require preflight checks: table size, expected runtime, lock impact, rollback plan.
  • Add an operational policy: no rewrite-heavy maintenance during peak business windows.
  • Alert on sustained lock waiters and oldest blocked query age.
  • Keep a “hot Odoo tables” list that always requires maintenance-window approval.
  • Tune autovacuum per hot tables to reduce pressure for emergency manual rewrites.
  • Document approved bloat-remediation patterns and unsafe commands in on-call docs.
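
Per-table autovacuum tuning from the list above can look like the sketch below. The values are illustrative starting points only; tune them against your own table sizes and bloat metrics rather than copying them.

```sql
-- Vacuum stock_move more aggressively than the global default so manual
-- rewrites are rarely needed. These are per-table storage parameter overrides.
alter table stock_move set (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);
```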

Practical rule: in an outage, cancel the blocking rewrite first, let lock queues drain, then do bloat cleanup later with a scheduled, lower-risk plan.
