Odoo PostgreSQL Accidental Data Change and PITR Recovery Runbook
Production-safe runbook for containing accidental UPDATE/DELETE incidents in Odoo, recovering lost rows via point-in-time restore to a sidecar, and validating clean rollback.
Accidental mass updates or deletes happen in real systems: a bad server action, unsafe manual SQL, broken integration sync, or a faulty migration script.
This runbook is optimized for minimum additional damage: contain writes fast, recover data from a point-in-time clone, repair surgically, and verify before resuming full traffic.
Scope: Odoo on PostgreSQL where unintended writes already hit production.
Incident signals
Treat this as active incident response when one or more are true:
- Sudden user reports of missing records or unexpectedly changed statuses.
- Row counts in critical tables drop/spike outside normal business patterns.
- Odoo logs show a burst of writes from a cron/integration user around one timestamp.
- PostgreSQL shows abnormal write volume and large transaction commits.
odoocli logs tail --service odoo --since 30m --grep "write\(|unlink\(|cron|server action|Traceback"
-- Quick row-count health check on common critical tables
select 'res_partner' as table_name, count(*) from res_partner
union all
select 'sale_order', count(*) from sale_order
union all
select 'account_move', count(*) from account_move
union all
select 'stock_move', count(*) from stock_move;
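The counts above become actionable once compared against a recent healthy baseline. A minimal Python sketch of that comparison — the baseline numbers and the 5% threshold are illustrative assumptions, not recommended values:

```python
# Hypothetical baseline counts captured from a recent healthy snapshot.
BASELINE = {"res_partner": 52_000, "sale_order": 18_400,
            "account_move": 91_250, "stock_move": 240_000}

def abnormal_deltas(current, baseline, threshold=0.05):
    """Return tables whose row count moved more than `threshold` vs baseline."""
    flagged = {}
    for table, base in baseline.items():
        now = current.get(table, 0)
        delta = abs(now - base) / base if base else 1.0
        if delta > threshold:
            flagged[table] = delta
    return flagged

# Example: sale_order lost ~8% of its rows, so it gets flagged.
current = {"res_partner": 52_010, "sale_order": 16_900,
           "account_move": 91_300, "stock_move": 240_100}
flagged = abnormal_deltas(current, BASELINE)
```

Feed it the output of the row-count query above; any flagged table is a candidate for the blast-radius analysis in Step 1.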
Step 0 - Immediate containment (stop the bleeding)
- Freeze deploys and schema changes.
- Pause non-essential writers (cron, queue workers, external integrations).
- Keep one operator on DB commands; one on incident timeline and decision log.
- Preserve evidence before cleanup.
# Example containment pattern (adapt service names to your stack)
odoocli scale --service odoo-cron --replicas 0
odoocli scale --service odoo-worker --replicas 1
If writes are still unsafe, move application traffic into a temporary read-only mode at the edge: serve a maintenance response for write endpoints while keeping reads available.
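One way to sketch edge read-only mode is a WSGI middleware that rejects write methods (Odoo serves HTTP via WSGI, but the wrapping point and response text here are assumptions, not a documented Odoo hook):

```python
READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}

def read_only_middleware(app):
    """Reject write requests with 503 while the incident is being contained."""
    def wrapped(environ, start_response):
        if environ.get("REQUEST_METHOD", "GET") not in READ_ONLY_METHODS:
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain"),
                            ("Retry-After", "600")])
            return [b"Maintenance: write operations are temporarily disabled."]
        return app(environ, start_response)
    return wrapped
```

In practice the same policy is often simpler to enforce at a reverse proxy in front of Odoo; the middleware just makes the rule explicit.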
Step 1 - Identify blast radius and incident window
1.1 Find suspicious active sessions and recent write statements
select pid,
usename,
application_name,
client_addr,
now() - query_start as runtime,
state,
left(query, 220) as query
from pg_stat_activity
where datname = current_database()
and state <> 'idle'
order by query_start asc;
If the bad session is still running:
-- Prefer cancel first
select pg_cancel_backend(<pid>);
-- Escalate only if cancellation fails and data damage continues
select pg_terminate_backend(<pid>);
1.2 Estimate when damage started
Use write_date on affected Odoo models to bound the incident window.
-- Example for sales records unexpectedly modified
select min(write_date) as earliest_changed,
max(write_date) as latest_changed,
count(*) as changed_rows
from sale_order
where write_date >= now() - interval '2 hours';
Capture a candidate recovery target time T_recover as the last known good minute before the suspicious write burst.
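Picking T_recover can be automated: bucket write_date per minute (e.g. with date_trunc in SQL), then take the last quiet minute before the burst. A minimal sketch on already-fetched buckets — the burst threshold is an assumption to tune per table:

```python
from datetime import datetime, timedelta

def last_good_minute(buckets, burst_threshold=50):
    """buckets: chronologically sorted (minute_start, write_count) pairs.
    Return the minute just before the first one exceeding the threshold."""
    for minute, count in buckets:
        if count > burst_threshold:
            return minute - timedelta(minutes=1)
    return None

# Illustrative per-minute write counts around the incident.
buckets = [
    (datetime(2026, 3, 21, 17, 40), 3),
    (datetime(2026, 3, 21, 17, 41), 5),
    (datetime(2026, 3, 21, 17, 43), 4100),  # burst starts here
    (datetime(2026, 3, 21, 17, 44), 3900),
]
t_recover = last_good_minute(buckets)  # 17:42, the last known good minute
```

The resulting timestamp is what goes into recovery_target_time on the sidecar in Step 3.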
Step 2 - Choose safe recovery strategy
Use this order of preference:
- Targeted logical repair from PITR sidecar clone (preferred for partial corruption).
- Full database PITR cutback (only when corruption is broad and RTO allows full rewind).
For most Odoo production incidents, do not rewind the primary immediately. Recover to a sidecar first, then patch surgically.
Step 3 - Build a PITR sidecar at T_recover
Create a separate restored PostgreSQL instance from base backup + WAL archive with:
recovery_target_time = '<T_recover>'
recovery_target_action = 'pause'
Example (conceptual PostgreSQL config on sidecar):
restore_command = 'cp /mnt/wal-archive/%f %p'
recovery_target_time = '2026-03-21 17:42:00+00'
recovery_target_action = 'pause'
After startup, verify sidecar recovery point:
select pg_is_in_recovery(), pg_last_xact_replay_timestamp();
Then expose sidecar read-only for diff and export.
Step 4 - Diff current primary vs recovered sidecar
Compare counts and sample IDs on affected tables.
-- Run on PRIMARY
select id from sale_order where state = 'cancel' order by id limit 200;
-- Run on SIDECAR (same query)
select id from sale_order where state = 'cancel' order by id limit 200;
For larger diffs, export key rows from sidecar and prepare replay SQL.
# Example: export rows from sidecar before incident
pg_dump "$SIDECAR_DB_URI" \
--data-only \
--table=public.sale_order \
--column-inserts \
--file=/tmp/sale_order_repair.sql
Prefer constrained repairs (WHERE id in (...)) over full-table reimports.
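For a first-pass diff on primary keys, comparing the exported ID sets in memory is often enough. A minimal sketch, assuming you have already fetched the IDs from each side:

```python
def diff_ids(primary_ids, sidecar_ids):
    """Classify rows: deleted on primary (recoverable from the sidecar),
    and created after T_recover (must not be clobbered by the repair)."""
    primary, sidecar = set(primary_ids), set(sidecar_ids)
    return {
        "missing_on_primary": sorted(sidecar - primary),
        "created_after_t_recover": sorted(primary - sidecar),
    }

result = diff_ids(primary_ids=[1, 2, 5, 9], sidecar_ids=[1, 2, 3, 4, 5])
```

The "missing_on_primary" list is what feeds the constrained WHERE id in (...) repair; the "created_after_t_recover" list marks rows the repair must leave alone.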
Step 5 - Execute surgical repair on primary
- Open explicit transaction.
- Apply smallest possible data fix.
- Validate row counts and business invariants before commit.
begin;
-- Example pattern: restore only known bad IDs
update sale_order
set state = src.state,
amount_total = src.amount_total,
write_date = now()
from repair_sale_order src
where sale_order.id = src.id;
-- Sanity checks before commit
select count(*) from sale_order where id in (select id from repair_sale_order);
commit;
If using a generated SQL file, run it in controlled chunks and verify row counts and invariants between chunks before continuing.
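Chunked execution can be as simple as batching the repair IDs and pausing for verification between batches. A sketch of the batching helper; the cursor call and verification hook in the comment are assumptions about your driver and checks:

```python
def chunks(ids, size=500):
    """Yield the repair ID list in fixed-size batches."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Usage sketch (hypothetical cursor and verification hook):
# for batch in chunks(repair_ids, 500):
#     cur.execute("update sale_order set ... where id = any(%s)", (batch,))
#     verify_batch(batch)  # checkpoint: row counts + invariants before next batch

batches = list(chunks(list(range(7)), size=3))
```

Small batch sizes keep each transaction short and make a partial backout cheap if a checkpoint fails.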
Step 6 - Verification gates before reopening traffic
6.1 Data correctness checks
-- Example invariant checks
select state, count(*)
from sale_order
group by state
order by count(*) desc;
select count(*)
from account_move
where state = 'posted' and name is null;
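These checks can be wrapped into a single pass/fail gate so reopening traffic is an explicit decision rather than a judgment made under pressure. A minimal sketch; the check names and expected values are illustrative:

```python
def gate(checks):
    """checks: dict of name -> (observed, expected). Return (ok, failures)."""
    failures = {name: obs for name, (obs, exp) in checks.items() if obs != exp}
    return (not failures, failures)

ok, failures = gate({
    "posted_moves_without_name": (0, 0),    # from the account_move query above
    "repaired_rows_present": (1840, 1840),  # expected count of restored rows
})
```

Record the gate output in the incident log; a failing gate routes straight to the rollback plan below.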
6.2 Application checks
odoocli logs tail --service odoo --since 15m --grep "Traceback|ERROR|constraint|serialization|deadlock"
Verify with business owners:
- Missing records restored for sample customers/orders/invoices.
- Critical workflows (confirm sale, validate delivery, post invoice, register payment) are green.
6.3 Controlled ramp-up
odoocli scale --service odoo-cron --replicas 1
odoocli scale --service odoo-worker --replicas 2
Increase gradually while monitoring errors and write throughput.
Rollback / backout plan
If surgical repair introduces new inconsistency:
- Stop further repair batches.
- Re-disable non-essential writers.
- Roll back open transaction (or restore from pre-repair snapshot if already committed).
- Recompute repair dataset from sidecar with narrower predicates.
Always keep:
- Original incident timestamp and T_recover rationale.
- SQL scripts used for repair.
- Before/after validation query outputs.
Hardening and prevention checklist
- Enforce least-privilege DB roles for integrations; remove broad write access.
- Add approval gates for server actions and bulk updates in production.
- Alert on abnormal row-count deltas for critical Odoo tables.
- Enable and retain PostgreSQL logs useful for forensic reconstruction.
- Rehearse sidecar PITR drills quarterly (not just full-cluster restore drills).
- Require dry-run + row-count preview for destructive maintenance scripts.
- Keep tested runbooks for both partial logical recovery and full rewind decisions.
References
- PostgreSQL continuous archiving and PITR: https://www.postgresql.org/docs/current/continuous-archiving.html
- PostgreSQL monitoring statistics (pg_stat_activity, table stats): https://www.postgresql.org/docs/current/monitoring-stats.html
- PostgreSQL server signaling functions (pg_cancel_backend, pg_terminate_backend): https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SIGNAL
- Odoo deployment/operations documentation: https://www.odoo.com/documentation/17.0/administration/on_premise/deploy.html
Core principle: never fix accidental data writes by guessing on primary; recover truth from a known-good point in time, patch surgically, and verify with explicit gates.