WORK IN PROGRESS

March 21, 2026

Odoo PostgreSQL Subtransaction Overflow Incident Runbook

A production-safe runbook for Odoo incidents that trigger PostgreSQL "too many subtransactions in progress" errors from excessive SAVEPOINT usage.

When Odoo workloads (imports, custom cron, migration scripts, or ORM loops) create deep SAVEPOINT nesting inside long transactions, PostgreSQL can fail requests with:

  • ERROR: too many subtransactions in progress

This runbook gives a practical recovery path: stabilize write traffic, identify the offending worker/job, drain or stop the bad transaction pattern, recover service health, and harden code paths to prevent recurrence.

Incident signals

Treat this as active when multiple signals appear together:

  • Odoo logs show "too many subtransactions in progress".
  • Failed writes spike on specific flows (imports, stock moves, reconciliations, or custom batch jobs).
  • One or more DB sessions show long transaction age tied to worker/cron users.
  • Error rate drops temporarily after worker restart, then returns under the same workload.

Step 0 — Stabilize first (limit new write pressure)

Before touching DB backends:

  1. Pause non-critical cron jobs and bulk imports.
  2. Temporarily block new admin-initiated mass operations.
  3. Keep customer-facing read paths and critical financial flows prioritized.
  4. Assign one incident owner for app actions and one for DB actions.
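
If crons must be paused at the database level (for example, when the UI is unresponsive), the ir_cron table can be flipped directly. This is a hedged sketch: the selection filter is illustrative, column names vary slightly across Odoo versions, and updating via SQL bypasses the ORM, so record exactly which ids you change so they can be re-enabled later.

```sql
-- List active scheduled jobs first so the change can be reverted
-- (check your version's ir_cron columns before running).
select id, cron_name, active from ir_cron where active;

-- Pause the chosen non-critical jobs; keep the id list for re-enabling.
update ir_cron set active = false where id in (/* ids from the query above */);
```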

Step 1 — Confirm error pattern and locate likely offenders

# Odoo-side confirmation
odoocli logs tail --service odoo --since 30m --grep "too many subtransactions|SAVEPOINT|RELEASE SAVEPOINT|ROLLBACK TO SAVEPOINT"
# PostgreSQL logs (adapt to your platform)
odoocli logs tail --service postgres --since 30m --grep "too many subtransactions|statement:|duration:"
# Find long/open transactions in current DB
psql "$ODOO_DB_URI" -c "
select
  pid,
  usename,
  application_name,
  client_addr,
  state,
  now() - xact_start as xact_age,
  now() - query_start as query_age,
  wait_event_type,
  wait_event,
  left(query, 220) as query
from pg_stat_activity
where datname = current_database()
  and xact_start is not null
order by xact_start asc
limit 30;
"
# Sessions with unusually high lock footprint (often correlated with deep nested transactional work)
psql "$ODOO_DB_URI" -c "
select
  a.pid,
  a.usename,
  a.application_name,
  a.state,
  now() - a.xact_start as xact_age,
  count(l.*) as lock_count,
  left(max(a.query), 160) as sample_query
from pg_stat_activity a
join pg_locks l on l.pid = a.pid
where a.datname = current_database()
  and a.xact_start is not null
group by a.pid, a.usename, a.application_name, a.state, a.xact_start
order by lock_count desc, xact_age desc
limit 20;
"

Triage checklist

  • Is one cron/import/custom worker repeatedly tied to the error?
  • Is there a long-lived transaction repeatedly failing with SAVEPOINT-heavy logic?
  • Did a recent module deploy change batching, retry loops, or per-row exception handling?
  • Can you pause the specific workload without stopping all business-critical traffic?

If yes, proceed with controlled remediation.

Step 2 — Controlled remediation (safe order)

2.1 Stop the offending workload first

  • Pause the specific cron(s), queue consumer(s), or import process causing nested SAVEPOINT churn.
  • If from a custom module, disable that feature flag or route traffic away temporarily.

Do this before backend termination so the pattern does not immediately restart.

2.2 Gracefully cancel the worst session(s)

Prefer cancel before terminate.

-- Graceful cancel (first choice)
select pg_cancel_backend(<pid>);
-- Hard terminate only if cancel fails and incident owner approves
select pg_terminate_backend(<pid>);

Use termination only for clearly non-critical sessions tied to the failing workload.
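
The cancel-before-terminate order can be wrapped in a small helper so the escalation rule is explicit. This is a sketch under assumptions: run_sql is an injected callable (for example, a thin wrapper around a psycopg2 cursor) that executes one statement and returns its boolean result; it is not part of any Odoo or PostgreSQL API.

```python
def stop_backend(run_sql, pid, approve_terminate=False):
    """Try a graceful cancel first; escalate to terminate only with approval.

    run_sql(statement) is an assumed callable that runs the statement and
    returns its scalar boolean result.
    """
    # pg_cancel_backend returns true when the cancel signal was sent.
    if run_sql(f"select pg_cancel_backend({int(pid)})"):
        return "cancelled"
    # pg_terminate_backend is the hard stop; gate it behind explicit approval.
    if approve_terminate and run_sql(f"select pg_terminate_backend({int(pid)})"):
        return "terminated"
    return "left-running"
```

Injecting run_sql keeps the escalation logic testable without a live database, and int(pid) guards against interpolating anything but a number into the statement.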

2.3 Recycle impacted Odoo workers

# Systemd example
sudo systemctl restart odoo

# Container example
docker restart <odoo_container>

If you run separate worker pools, restart only the affected worker group first.

2.4 Reduce transaction scope before resuming jobs

Before re-enabling paused jobs:

  • Decrease batch size (for example, 10k rows -> 500 rows).
  • Commit between chunks (short transactions), not once at end.
  • Avoid per-row SAVEPOINT in large loops; use chunk-level retry boundaries.
  • Add backoff/retry caps to prevent repeated SAVEPOINT storms.
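
The points above can be sketched as one loop shape: commit per chunk, and make the chunk (not the row) the retry boundary so no per-row SAVEPOINTs accumulate inside one long transaction. A minimal sketch, assuming a psycopg2-style connection object and an illustrative process_row callback (both names are assumptions, not Odoo APIs):

```python
def process_in_chunks(conn, rows, process_row, chunk_size=500, max_retries=3):
    """Process rows in short transactions: one commit per chunk, with a
    bounded retry ceiling per chunk instead of per-row SAVEPOINTs."""
    processed = 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        for attempt in range(max_retries):
            try:
                for row in chunk:
                    process_row(row)
                conn.commit()      # short transaction: one per chunk
                processed += len(chunk)
                break
            except Exception:
                conn.rollback()    # retry boundary is the whole chunk
                if attempt == max_retries - 1:
                    raise          # retry ceiling reached: surface the error
    return processed
```

The trade-off is coarser error isolation: a bad row fails its whole chunk rather than just itself, which is exactly what keeps the subtransaction count flat.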

Step 3 — Rollback / safety path

If impact worsens during remediation:

  1. Re-pause all non-critical write jobs.
  2. Stop further backend termination.
  3. Revert temporary worker/concurrency changes.
  4. Keep only essential transactional flows active.
  5. Schedule controlled maintenance window if code rollback is required.

Step 4 — Verification before incident close

# Confirm no long problematic transactions remain
psql "$ODOO_DB_URI" -c "
select pid, usename, state, now() - xact_start as xact_age, left(query, 160) as query
from pg_stat_activity
where datname = current_database()
  and xact_start is not null
order by xact_start asc
limit 15;
"
# Confirm error signature is gone from app logs
odoocli logs tail --service odoo --since 20m --grep "too many subtransactions|SAVEPOINT|Traceback"
# Smoke test critical write path (example)
odoocli shell -c "env['res.partner'].create({'name': 'Subxid Incident Smoke Test'})"

Exit incident mode only when:

  • No new "too many subtransactions in progress" errors appear for at least one normal workload cycle.
  • Transaction age distribution returns to baseline.
  • Paused jobs are re-enabled gradually without error recurrence.

Hardening and prevention checklist

  • Refactor custom module loops to commit per chunk, not one giant transaction.
  • Replace per-record SAVEPOINT exception handling with bounded chunk-level retries.
  • Add job-level safeguards: max batch size, max runtime, and retry ceilings.
  • Alert on long transaction age (xact_age) for Odoo roles.
  • Set role-level statement_timeout, lock_timeout, and (where safe) idle_in_transaction_session_timeout.
  • Add canary runs for large imports after module upgrades.
  • Include SAVEPOINT-heavy flows in pre-production load tests.
  • Document safe batch sizes per job type (accounting, stock, ETL, mail sync).
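
The role-level timeouts from the checklist can be set like this. A sketch, not a prescription: the role name odoo and every value are assumptions to tune against your workload, and idle_in_transaction_session_timeout will kill sessions that sit idle inside an open transaction, so apply it only where that is safe.

```sql
-- Illustrative values; tune per workload and apply per role.
alter role odoo set statement_timeout = '120s';
alter role odoo set lock_timeout = '10s';
-- Only where safe: terminates sessions idle inside an open transaction.
alter role odoo set idle_in_transaction_session_timeout = '5min';
```

Role-level settings take effect on new sessions, so recycle workers after changing them.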

Practical rule: pause noisy writers first, cancel/terminate only the confirmed offenders, then restart with smaller transaction boundaries.
