Odoo Slow-Query Regression After Module Upgrade Runbook
A production-safe runbook to detect, contain, and remediate PostgreSQL slow-query regressions introduced by Odoo module upgrades.
A module upgrade can pass functional smoke tests and still trigger a query-plan or ORM access-pattern regression under real traffic.
This runbook is for operators who need to restore latency quickly without gambling on blind restarts.
Incident signals that justify immediate response
- P95/P99 API latency rises within minutes of a module upgrade.
- PostgreSQL CPU/IO climbs while request throughput drops.
- pg_stat_statements shows new or suddenly expensive query fingerprints.
- Odoo workers start hitting timeout guardrails (limit_time_real, proxy timeout).
- Lock waits and queue depth increase in business flows that were previously stable.
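To decide quickly whether the latency signal is real, a nearest-rank percentile over recent request latencies is enough for triage. This is a minimal sketch; pulling the latency samples from your proxy or APM logs is assumed, not shown.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile; coarse but fine for incident triage."""
    xs = sorted(values)
    k = max(0, math.ceil(pct / 100 * len(xs)) - 1)
    return xs[k]

# Illustrative sample: a handful of fast requests plus a post-upgrade tail.
lat_ms = [80, 90, 95, 100, 110, 120, 3000, 3200, 3500, 4000]
print(percentile(lat_ms, 95))  # the tail dominates P95
```

A P95 that jumps an order of magnitude over baseline within minutes of the upgrade window is a strong signal to start Step 0.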
Step 0 — Stabilize and preserve evidence
- Freeze additional deploys and follow-up module upgrades.
- Pause non-essential heavy jobs (bulk exports, mass mailers, large recomputes).
- Keep user-facing traffic flowing; avoid a full restart of every layer unless the service is hard down.
- Capture current top SQL before making changes.
# Optional if not already enabled (requires shared_preload_libraries + restart)
psql "$ODOO_DB_URI" -c "create extension if not exists pg_stat_statements;"
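Capturing top SQL "before making changes" pays off when you can diff snapshots later. The sketch below compares two pg_stat_statements exports keyed by queryid; the (queryid, calls, total_exec_time_ms) row format and the export mechanism (e.g. a psql \copy into CSV) are assumptions to adapt to your tooling.

```python
def diff_snapshots(before, after, min_delta_ms=1000.0):
    """Return (queryid, total_time_delta_ms, call_delta) for fingerprints
    whose cumulative execution time grew by at least min_delta_ms."""
    base = {qid: (calls, total) for qid, calls, total in before}
    regressed = []
    for qid, calls, total in after:
        prev_calls, prev_total = base.get(qid, (0, 0.0))
        delta = total - prev_total
        if delta >= min_delta_ms:
            regressed.append((qid, delta, calls - prev_calls))
    # Worst offenders first.
    return sorted(regressed, key=lambda r: r[1], reverse=True)

before = [(101, 500, 2000.0), (102, 10, 150.0)]
after = [(101, 520, 2500.0), (103, 40, 9000.0)]  # 103 appeared post-upgrade
print(diff_snapshots(before, after))  # [(103, 9000.0, 40)]
```

A brand-new queryid dominating the delta right after the upgrade window is the clearest fingerprint of an upgrade-introduced query shape.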
Step 1 — Confirm this is a post-upgrade SQL regression
1.1 Check top queries by total execution time
select
queryid,
calls,
round(total_exec_time::numeric, 2) as total_ms,
round(mean_exec_time::numeric, 2) as mean_ms,
round((100 * shared_blks_hit::numeric / nullif(shared_blks_hit + shared_blks_read, 0)), 2) as hit_pct,
left(query, 220) as query
from pg_stat_statements
order by total_exec_time desc
limit 20;
1.2 Find newly expensive queries with poor mean time
select
queryid,
calls,
round(mean_exec_time::numeric, 2) as mean_ms,
round(rows::numeric / nullif(calls, 0), 2) as rows_per_call,
left(query, 220) as query
from pg_stat_statements
where calls > 50
order by mean_exec_time desc
limit 20;
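mean_ms and rows_per_call together hint at what regressed: a slower plan returns the same rows more slowly, while an ORM access-pattern regression suddenly fetches far more rows per call. A rough classification sketch, with thresholds that are illustrative assumptions rather than PostgreSQL or Odoo guidance:

```python
def classify(mean_ms, rows_per_call, baseline_mean_ms, baseline_rows_per_call):
    """Rough triage of a regressed fingerprint against its pre-upgrade baseline."""
    mean_ratio = mean_ms / max(baseline_mean_ms, 0.01)
    rows_ratio = rows_per_call / max(baseline_rows_per_call, 0.01)
    if rows_ratio > 5 and mean_ratio > 2:
        return "access-pattern regression (ORM now fetching far more rows)"
    if mean_ratio > 2:
        return "plan regression (same rows, slower plan; check indexes)"
    return "within noise"

print(classify(mean_ms=120.0, rows_per_call=8.0,
               baseline_mean_ms=15.0, baseline_rows_per_call=7.5))
```

Plan regressions usually respond to Step 3.3 (indexes); access-pattern regressions usually need the code-level path in Step 3.4.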
1.3 Confirm live blockers/waits from Odoo sessions
select
pid,
now() - query_start as age,
wait_event_type,
wait_event,
state,
left(query, 180) as query
from pg_stat_activity
where datname = current_database()
and application_name ilike '%odoo%'
and state <> 'idle'
order by query_start asc
limit 30;
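When the pg_stat_activity output is long, grouping by wait event makes a lock pileup obvious at a glance. A small sketch; the row tuple format mirrors the query above and is an assumption about how you export it:

```python
from collections import Counter

def wait_summary(rows):
    """rows: iterable of (pid, age_s, wait_event_type, wait_event, state).
    NULL wait columns mean the backend is actually running on CPU."""
    return Counter(
        (wtype or "Running", wevent or "-")
        for _, _, wtype, wevent, _ in rows
    )

rows = [
    (4821, 45, "Lock", "transactionid", "active"),
    (4822, 40, "Lock", "transactionid", "active"),
    (4823, 2, None, None, "active"),
]
print(wait_summary(rows).most_common(1))  # lock waits dominate
```

Many sessions stacked on the same Lock wait event usually trace back to one blocking pid, which is the first candidate for Step 3.1.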
If the worst fingerprints appeared or spiked right after the upgrade window, treat this as an upgrade-related regression until disproven.
Step 2 — Map query regression to upgraded module path
2.1 Correlate upgrade logs and first error/latency spike
journalctl -u odoo -S -2h --no-pager | egrep -i "module|loading|upgrading|registry|ERROR|WARNING" | tail -n 300
2.2 Identify tables most impacted by slow fingerprints
select
schemaname,
relname,
seq_scan,
idx_scan,
n_live_tup,
n_dead_tup
from pg_stat_user_tables
order by seq_scan desc
limit 20;
High seq_scan growth on large business tables right after upgrade often means missing/ineffective indexes for new query shapes.
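The "high seq_scan on large tables" heuristic can be applied mechanically to the pg_stat_user_tables output. A sketch with illustrative thresholds (minimum table size and seq/idx scan ratio are assumptions to tune for your data):

```python
def seq_scan_suspects(tables, min_rows=100_000, ratio=10.0):
    """tables: iterable of (relname, seq_scan, idx_scan, n_live_tup).
    Flags large tables being sequentially scanned far more than index-scanned."""
    return [
        name for name, seq, idx, live in tables
        if live >= min_rows and seq > ratio * max(idx, 1)
    ]

tables = [
    ("sale_order", 50_000, 200, 2_000_000),   # suspicious: big table, seq-heavy
    ("res_partner", 30, 90_000, 500_000),      # healthy: index-driven access
]
print(seq_scan_suspects(tables))  # ['sale_order']
```

Ideally compare against a pre-upgrade snapshot of the same counters, since seq_scan is cumulative since the last stats reset.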
Step 3 — Contain impact with safest-first remediation
3.1 Cancel runaway statements before terminating
-- Prefer cancel first
select pg_cancel_backend(<pid>);
-- Escalate only if session is stuck and user impact continues
select pg_terminate_backend(<pid>);
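The safest-first escalation can be encoded as a simple policy over pg_stat_activity rows. The age thresholds below are illustrative assumptions; note that pg_cancel_backend only interrupts a running query, so an idle-in-transaction session holding locks needs pg_terminate_backend.

```python
def escalation_action(age_s, state, cancel_after_s=60, terminate_after_s=300):
    """Safest-first policy: cancel long-running queries, terminate only when
    cancellation cannot help or has had time to fail."""
    if state == "active" and age_s >= terminate_after_s:
        return "terminate"
    if state == "active" and age_s >= cancel_after_s:
        return "cancel"
    if state == "idle in transaction" and age_s >= cancel_after_s:
        # cancel has no effect: there is no running query to interrupt,
        # but the open transaction may still hold locks.
        return "terminate"
    return "leave"

print(escalation_action(90, "active"))  # cancel
```

Terminating a backend rolls back its transaction and drops the connection, so Odoo's worker sees a hard error; that is why cancel comes first.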
3.2 Add temporary guardrails to avoid worker pileups
alter role odoo set statement_timeout = '45s';
alter role odoo set lock_timeout = '8s';
psql "$ODOO_DB_URI" -c "show statement_timeout; show lock_timeout;"
Note: role-level settings apply only to new sessions; recycle Odoo workers (or wait for connection-pool turnover) before expecting them to take effect.
3.3 Apply missing indexes without blocking writes
Create indexes concurrently for the identified predicates/order columns.
-- Example pattern; adapt to your actual slow query filter/order columns
create index concurrently if not exists idx_sale_order_company_state_date_order
on sale_order (company_id, state, date_order desc);
After index creation, refresh planner stats:
analyze sale_order;
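Before creating an index, sanity-check that its column order can actually serve the slow query: leading index columns should cover the equality filters, with ORDER BY columns following. A simplified sketch of that leading-column rule (it deliberately ignores inequality filters, expression indexes, and mixed ASC/DESC, which PostgreSQL handles with more nuance):

```python
def index_serves(index_cols, eq_filters, order_by):
    """Check a candidate index column order against equality filters + ORDER BY."""
    remaining = set(eq_filters)
    i = 0
    # Leading index columns must cover the equality filters (in any order).
    while i < len(index_cols) and index_cols[i] in remaining:
        remaining.discard(index_cols[i])
        i += 1
    if remaining:
        return False  # some equality filter is not a leading index column
    # The next index columns must match ORDER BY left-to-right.
    return list(index_cols[i:i + len(order_by)]) == list(order_by)

# Mirrors the sale_order example index above.
print(index_serves(("company_id", "state", "date_order"),
                   eq_filters=("state", "company_id"),
                   order_by=("date_order",)))  # True
```

If the check fails, reorder the candidate columns rather than adding more indexes; confirm with EXPLAIN on the real query before and after.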
3.4 If regression is ORM/code-path level, use controlled rollback
If query shape is fundamentally bad after upgrade and cannot be fixed fast:
- Roll back to previous known-good release SHA.
- Keep schema-compatible emergency indexes (if safe) until proper patch lands.
- Re-run module upgrade in staging with production-like volume before retrying prod.
# Example app rollback flow (adapt to your deploy tooling)
git checkout <last_known_good_sha>
sudo systemctl restart odoo
Step 4 — Verification and rollback checks
Exit incident mode only when all are true for at least 20 minutes:
- P95/P99 latency returns near baseline.
- No sustained growth in queries that stay active for more than 30 seconds.
- Top regressed query mean/total execution time drops materially.
- Business-critical flows (checkout, invoice posting, stock validation) succeed.
watch -n 10 "psql \"$ODOO_DB_URI\" -Atc \"
select count(*)
from pg_stat_activity
where datname = current_database()
and state = 'active'
and now() - query_start > interval '30 seconds';\""
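The exit criteria above can be scripted so the decision to leave incident mode is mechanical rather than a judgment call under pressure. A sketch; the sample format and the 20% latency tolerance are assumptions to adapt:

```python
def incident_resolved(samples, baseline_p95_ms, tolerance=1.2):
    """samples: (p95_ms, long_active_count) pairs collected over >= 20 minutes,
    where long_active_count is the watch-loop query's output above.
    Every sample must be near baseline with no long-running active queries."""
    return all(
        p95 <= baseline_p95_ms * tolerance and long_active == 0
        for p95, long_active in samples
    )

samples = [(210.0, 0), (195.0, 0), (205.0, 0)]
print(incident_resolved(samples, baseline_p95_ms=180.0))  # True
```

Requiring every sample in the window to pass (not just the latest) guards against declaring victory during a temporary lull.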
Hardening checklist (post-incident)
- Keep pg_stat_statements enabled and retained across restarts.
- Capture top query baselines before every module upgrade window.
- Require staging load tests for upgraded modules with production-like data volume.
- Add migration review check for new domains/order clauses lacking matching indexes.
- Alert on latency + top query drift (mean/total execution time deltas).
- Keep a tested rollback path (release SHA + DB restore point) for every upgrade.
Practical references
- PostgreSQL pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html
- PostgreSQL monitoring stats (pg_stat_activity, table stats): https://www.postgresql.org/docs/current/monitoring-stats.html
- PostgreSQL CREATE INDEX CONCURRENTLY: https://www.postgresql.org/docs/current/sql-createindex.html
- PostgreSQL ANALYZE: https://www.postgresql.org/docs/current/sql-analyze.html
- Odoo upgrade/deployment operations docs: https://www.odoo.com/documentation/17.0/administration/on_premise/deploy.html
Core principle: treat post-upgrade slow-query incidents as a contain, measure, fix, verify loop. Fast evidence beats fast guessing.