WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS WORK IN PROGRESS

Sat Mar 21 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Odoo PostgreSQL MultiXact Wraparound Emergency Runbook

A production-safe runbook to detect PostgreSQL MultiXact age emergencies in Odoo workloads, recover safely, and harden autovacuum before forced shutdown conditions.

When Odoo workloads generate heavy row-lock churn (inventory reservation, cron bursts, queue workers), PostgreSQL MultiXact IDs can age toward wraparound.

If not controlled, PostgreSQL will escalate from warnings to protective behavior that blocks writes to avoid data loss.

This runbook focuses on practical triage and safe remediation under pressure.

Incident signals

Treat this as active when one or more of these persist:

  • PostgreSQL logs warn about MultiXact age approaching wraparound thresholds.
  • Errors appear like database is not accepting commands that assign new MultiXactIds.
  • Odoo symptoms: failing writes, stuck confirmations/transfers, queue/cron retries climbing.
  • mxid_age(datminmxid) is dangerously high relative to autovacuum_multixact_freeze_max_age.

Step 0 — Stabilize and reduce lock churn

  1. Freeze deploys, migrations, and DDL.
  2. Pause non-critical Odoo cron lanes and queue consumers.
  3. Pause bulk imports, mass mail, and backfills.
  4. Keep one incident lead responsible for kill/pause decisions.
# Example: inspect Odoo errors for write failures / lock churn patterns
odoocli logs tail --service odoo --since 20m --grep "MultiXact|wraparound|could not serialize|lock|deadlock"

Step 1 — Confirm MultiXact pressure and blast radius

-- Database-level age snapshot
select
  datname,
  mxid_age(datminmxid) as mxid_age,
  age(datfrozenxid) as xid_age
from pg_database
order by mxid_age desc;
-- Current safety limit (defaults vary by Postgres version/config)
show autovacuum_multixact_freeze_max_age;
-- Percentage to limit by database
with cfg as (
  select current_setting('autovacuum_multixact_freeze_max_age')::numeric as mxid_max
)
select
  d.datname,
  mxid_age(d.datminmxid) as mxid_age,
  round(100.0 * mxid_age(d.datminmxid) / cfg.mxid_max, 2) as pct_of_limit
from pg_database d
cross join cfg
order by pct_of_limit desc;

Escalate to emergency response when a production database is nearing/exceeding your internal danger threshold (commonly 85–90%+ of configured max).

Step 2 — Find highest-risk tables and blockers

-- Table-level hotspots by MultiXact age
select
  n.nspname,
  c.relname,
  c.relkind,
  mxid_age(c.relminmxid) as mxid_age,
  age(c.relfrozenxid) as xid_age,
  pg_size_pretty(pg_total_relation_size(c.oid)) as total_size
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind in ('r','m','t')
  and n.nspname not in ('pg_catalog', 'information_schema')
order by mxid_age desc
limit 30;
-- Long transactions can hold old horizons and slow freeze progress
select
  pid,
  usename,
  application_name,
  client_addr,
  state,
  now() - xact_start as xact_age,
  wait_event_type,
  wait_event,
  left(query, 180) as query
from pg_stat_activity
where xact_start is not null
order by xact_start asc
limit 20;
-- Blockers/waiters to identify lock chains
select
  a.pid as waiting_pid,
  a.application_name as waiting_app,
  a.wait_event_type,
  a.wait_event,
  pg_blocking_pids(a.pid) as blocking_pids,
  left(a.query, 120) as waiting_query
from pg_stat_activity a
where cardinality(pg_blocking_pids(a.pid)) > 0;

Step 3 — Safe emergency remediation

3.1 Stop creating more pressure

  • Keep non-critical worker pools paused.
  • Reduce concurrent Odoo write traffic if possible (maintenance window or traffic shaping).
  • Avoid wide table rewrites and schema changes.

3.2 Remove known blockers (carefully)

Terminate only clearly non-critical or stale long transactions first.

-- REVIEW rows first; terminate with incident-lead approval
select pg_terminate_backend(pid)
from pg_stat_activity
where pid <> pg_backend_pid()
  and xact_start is not null
  and now() - xact_start > interval '15 minutes'
  and application_name in ('odoo', 'odoo-cron', 'odoo-queue');

If you cannot safely target by application_name, generate a reviewed PID list manually before termination.

3.3 Run targeted emergency vacuum freeze on hottest tables

Run table-by-table from highest mxid_age downward.

vacuum (verbose, freeze, index_cleanup off, analyze) public.stock_move;
vacuum (verbose, freeze, index_cleanup off, analyze) public.stock_quant;
vacuum (verbose, freeze, index_cleanup off, analyze) public.mail_message;

Notes:

  • INDEX_CLEANUP OFF can reduce extra I/O overhead during emergency freeze runs.
  • Start with highest-risk lock-heavy tables, then continue down the hotspot list.
  • Keep write load throttled while these vacuums run.

3.4 Verify age is moving in the right direction

select
  n.nspname,
  c.relname,
  mxid_age(c.relminmxid) as mxid_age_after
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname = 'public'
order by mxid_age_after desc
limit 20;
select datname, mxid_age(datminmxid) as db_mxid_age_after
from pg_database
order by db_mxid_age_after desc;

Step 4 — Controlled recovery of Odoo services

  1. Re-enable paused workloads in stages (core user traffic first, then cron, then queues).
  2. Watch lock waits and transaction age during each ramp step.
  3. Roll back to reduced concurrency immediately if age/lock pressure spikes again.
# After each ramp step, validate app-side error trend
odoocli logs tail --service odoo --since 10m --grep "ERROR|deadlock|timeout|MultiXact|wraparound"

Step 5 — Exit criteria

Close incident only when all are true:

  • mxid_age(datminmxid) is back below your alert threshold with stable trend.
  • No fresh PostgreSQL wraparound warnings for at least 30 minutes.
  • Odoo write paths (sales/orders/inventory/posting) complete at normal error rate.
  • Queues/cron drain without rebuilding backlog.

Rollback/safety actions

If remediation increases instability:

  • Pause additional workload classes again.
  • Stop broad vacuum attempts; return to a narrower, highest-risk table order.
  • Revert temporary worker concurrency increases.
  • If storage latency is the limiter, reduce parallelism and prioritize business-critical tables first.

Hardening and prevention checklist

  • Alert on database and table-level MultiXact age trend (not only XID age).
  • Tune autovacuum for Odoo hot tables with per-table settings where needed, for example:
alter table public.stock_move set (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_vacuum_threshold = 5000,
  autovacuum_analyze_scale_factor = 0.02
);
  • Ensure autovacuum workers are not starved during peak hours (autovacuum_max_workers, I/O budget).
  • Eliminate chronic long transactions in cron/queue code paths.
  • Add operational guardrails for bulk jobs (batch size, commit cadence, concurrency caps).
  • Review autovacuum_multixact_freeze_max_age in capacity planning; higher is not a substitute for healthy vacuum throughput.
  • Rehearse this runbook in staging with production-like write/lock patterns.

Core principle: MultiXact incidents are usually a throughput and lock-churn problem before they become a wraparound problem. Control churn fast, freeze the hottest tables safely, then harden autovacuum so you do not revisit the same page in a week.

Back to blog