Odoo PostgreSQL Accidental Data Change and PITR Recovery Runbook
Production-safe runbook for containing accidental UPDATE/DELETE incidents in Odoo, recovering lost rows via point-in-time restore to a sidecar, and validating clean rollback.
Accidental mass updates or deletes happen in real systems: a bad server action, unsafe manual SQL, broken integration sync, or a faulty migration script.
This runbook is optimized for minimum additional damage: contain writes fast, recover data from a point-in-time clone, repair surgically, and verify before resuming full traffic.
Scope: Odoo on PostgreSQL where unintended writes already hit production.
Incident signals
Treat this as active incident response when one or more are true:
- Sudden user reports of missing records or unexpectedly changed statuses.
- Row counts in critical tables drop/spike outside normal business patterns.
- Odoo logs show a burst of writes from a cron/integration user around one timestamp.
- PostgreSQL shows abnormal write volume and large transaction commits.
odoocli logs tail --service odoo --since 30m --grep "write\(|unlink\(|cron|server action|Traceback"
-- Quick row-count health check on common critical tables
select 'res_partner' as table_name, count(*) from res_partner
union all
select 'sale_order', count(*) from sale_order
union all
select 'account_move', count(*) from account_move
union all
select 'stock_move', count(*) from stock_move;
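The counts above become actionable once compared against a recent healthy baseline. A minimal Python sketch of that comparison — the baseline numbers and the 5% threshold are illustrative assumptions, not recommended values:

```python
# Hypothetical baseline counts captured from a recent healthy snapshot.
BASELINE = {"res_partner": 52_000, "sale_order": 18_400,
            "account_move": 91_250, "stock_move": 240_000}

def abnormal_deltas(current, baseline, threshold=0.05):
    """Return tables whose row count moved more than `threshold` vs baseline."""
    flagged = {}
    for table, base in baseline.items():
        now = current.get(table, 0)
        delta = abs(now - base) / base if base else 1.0
        if delta > threshold:
            flagged[table] = delta
    return flagged

# Example: sale_order lost ~8% of its rows, so it gets flagged.
current = {"res_partner": 52_010, "sale_order": 16_900,
           "account_move": 91_300, "stock_move": 240_100}
flagged = abnormal_deltas(current, BASELINE)
```

Feed it the output of the row-count query above; any flagged table is a candidate for the blast-radius analysis in Step 1.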
Step 0 - Immediate containment (stop the bleeding)
- Freeze deploys and schema changes.
- Pause non-essential writers (cron, queue workers, external integrations).
- Keep one operator on DB commands; one on incident timeline and decision log.
- Preserve evidence before cleanup.
# Example containment pattern (adapt service names to your stack)
odoocli scale --service odoo-cron --replicas 0
odoocli scale --service odoo-worker --replicas 1
If writes are still unsafe, move application traffic into a temporary read-only mode at the edge: serve a maintenance response for write endpoints while keeping reads available.
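One way to sketch edge read-only mode is a WSGI middleware that rejects write methods (Odoo serves HTTP via WSGI, but the wrapping point and response text here are assumptions, not a documented Odoo hook):

```python
READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}

def read_only_middleware(app):
    """Reject write requests with 503 while the incident is being contained."""
    def wrapped(environ, start_response):
        if environ.get("REQUEST_METHOD", "GET") not in READ_ONLY_METHODS:
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain"),
                            ("Retry-After", "600")])
            return [b"Maintenance: write operations are temporarily disabled."]
        return app(environ, start_response)
    return wrapped
```

In practice the same policy is often simpler to enforce at a reverse proxy in front of Odoo; the middleware just makes the rule explicit.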
Step 1 - Identify blast radius and incident window
1.1 Find suspicious active sessions and recent write statements
select pid,
usename,
application_name,
client_addr,
now() - query_start as runtime,
state,
left(query, 220) as query
from pg_stat_activity
where datname = current_database()
and state <> 'idle'
order by query_start asc;
If the bad session is still running:
-- Prefer cancel first
select pg_cancel_backend(<pid>);
-- Escalate only if cancellation fails and data damage continues
select pg_terminate_backend(<pid>);
1.2 Estimate when damage started
Use write_date on affected Odoo models to bound the incident window.
-- Example for sales records unexpectedly modified
select min(write_date) as earliest_changed,
max(write_date) as latest_changed,
count(*) as changed_rows
from sale_order
where write_date >= now() - interval '2 hours';
Capture a candidate recovery target time T_recover as the last known good minute before the suspicious write burst.
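Picking T_recover can be automated: bucket write_date per minute (e.g. with date_trunc in SQL), then take the last quiet minute before the burst. A minimal sketch on already-fetched buckets — the burst threshold is an assumption to tune per table:

```python
from datetime import datetime, timedelta

def last_good_minute(buckets, burst_threshold=50):
    """buckets: chronologically sorted (minute_start, write_count) pairs.
    Return the minute just before the first one exceeding the threshold."""
    for minute, count in buckets:
        if count > burst_threshold:
            return minute - timedelta(minutes=1)
    return None

# Illustrative per-minute write counts around the incident.
buckets = [
    (datetime(2026, 3, 21, 17, 40), 3),
    (datetime(2026, 3, 21, 17, 41), 5),
    (datetime(2026, 3, 21, 17, 43), 4100),  # burst starts here
    (datetime(2026, 3, 21, 17, 44), 3900),
]
t_recover = last_good_minute(buckets)  # 17:42, the last known good minute
```

The resulting timestamp is what goes into recovery_target_time on the sidecar in Step 3.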
Step 2 - Choose safe recovery strategy
Use this order of preference:
- Targeted logical repair from PITR sidecar clone (preferred for partial corruption).
- Full database PITR cutback (only when corruption is broad and RTO allows full rewind).
For most Odoo production incidents, do not rewind the primary immediately. Recover to a sidecar first, then patch surgically.
Step 3 - Build a PITR sidecar at T_recover
Create a separate restored PostgreSQL instance from base backup + WAL archive with:
recovery_target_time = '<T_recover>'
recovery_target_action = 'pause'
Example (conceptual PostgreSQL config on sidecar):
restore_command = 'cp /mnt/wal-archive/%f %p'
recovery_target_time = '2026-03-21 17:42:00+00'
recovery_target_action = 'pause'
After startup, verify sidecar recovery point:
select pg_is_in_recovery(), pg_last_xact_replay_timestamp();
Then expose sidecar read-only for diff and export.
Step 4 - Diff current primary vs recovered sidecar
Compare counts and sample IDs on affected tables.
-- Run on PRIMARY
select id from sale_order where state = 'cancel' order by id limit 200;
-- Run on SIDECAR (same query)
select id from sale_order where state = 'cancel' order by id limit 200;
For larger diffs, export key rows from sidecar and prepare replay SQL.
# Example: export rows from sidecar before incident
pg_dump "$SIDECAR_DB_URI" \
--data-only \
--table=public.sale_order \
--column-inserts \
--file=/tmp/sale_order_repair.sql
Prefer constrained repairs (WHERE id in (...)) over full-table reimports.
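For a first-pass diff on primary keys, comparing the exported ID sets in memory is often enough. A minimal sketch, assuming you have already fetched the IDs from each side:

```python
def diff_ids(primary_ids, sidecar_ids):
    """Classify rows: deleted on primary (recoverable from the sidecar),
    and created after T_recover (must not be clobbered by the repair)."""
    primary, sidecar = set(primary_ids), set(sidecar_ids)
    return {
        "missing_on_primary": sorted(sidecar - primary),
        "created_after_t_recover": sorted(primary - sidecar),
    }

result = diff_ids(primary_ids=[1, 2, 5, 9], sidecar_ids=[1, 2, 3, 4, 5])
```

The "missing_on_primary" list is what feeds the constrained WHERE id in (...) repair; the "created_after_t_recover" list marks rows the repair must leave alone.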
Step 5 - Execute surgical repair on primary
- Open explicit transaction.
- Apply smallest possible data fix.
- Validate row counts and business invariants before commit.
begin;
-- Example pattern: restore only known bad IDs
update sale_order
set state = src.state,
amount_total = src.amount_total,
write_date = now()
from repair_sale_order src
where sale_order.id = src.id;
-- Sanity checks before commit
select count(*) from sale_order where id in (select id from repair_sale_order);
commit;
If using a generated SQL file, run it in controlled chunks and verify row counts and invariants between chunks before continuing.
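Chunked execution can be as simple as batching the repair IDs and pausing for verification between batches. A sketch of the batching helper; the cursor call and verification hook in the comment are assumptions about your driver and checks:

```python
def chunks(ids, size=500):
    """Yield the repair ID list in fixed-size batches."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Usage sketch (hypothetical cursor and verification hook):
# for batch in chunks(repair_ids, 500):
#     cur.execute("update sale_order set ... where id = any(%s)", (batch,))
#     verify_batch(batch)  # checkpoint: row counts + invariants before next batch

batches = list(chunks(list(range(7)), size=3))
```

Small batch sizes keep each transaction short and make a partial backout cheap if a checkpoint fails.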
Step 6 - Verification gates before reopening traffic
6.1 Data correctness checks
-- Example invariant checks
select state, count(*)
from sale_order
group by state
order by count(*) desc;
select count(*)
from account_move
where state = 'posted' and name is null;
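These checks can be wrapped into a single pass/fail gate so reopening traffic is an explicit decision rather than a judgment made under pressure. A minimal sketch; the check names and expected values are illustrative:

```python
def gate(checks):
    """checks: dict of name -> (observed, expected). Return (ok, failures)."""
    failures = {name: obs for name, (obs, exp) in checks.items() if obs != exp}
    return (not failures, failures)

ok, failures = gate({
    "posted_moves_without_name": (0, 0),    # from the account_move query above
    "repaired_rows_present": (1840, 1840),  # expected count of restored rows
})
```

Record the gate output in the incident log; a failing gate routes straight to the rollback plan below.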
6.2 Application checks
odoocli logs tail --service odoo --since 15m --grep "Traceback|ERROR|constraint|serialization|deadlock"
Verify with business owners:
- Missing records restored for sample customers/orders/invoices.
- Critical workflows (confirm sale, validate delivery, post invoice, register payment) are green.
6.3 Controlled ramp-up
odoocli scale --service odoo-cron --replicas 1
odoocli scale --service odoo-worker --replicas 2
Increase gradually while monitoring errors and write throughput.
Rollback / backout plan
If surgical repair introduces new inconsistency:
- Stop further repair batches.
- Re-disable non-essential writers.
- Roll back open transaction (or restore from pre-repair snapshot if already committed).
- Recompute repair dataset from sidecar with narrower predicates.
Always keep:
- Original incident timestamp and T_recover rationale.
- SQL scripts used for repair.
- Before/after validation query outputs.
Hardening and prevention checklist
- Enforce least-privilege DB roles for integrations; remove broad write access.
- Add approval gates for server actions and bulk updates in production.
- Alert on abnormal row-count deltas for critical Odoo tables.
- Enable and retain PostgreSQL logs useful for forensic reconstruction.
- Rehearse sidecar PITR drills quarterly (not just full-cluster restore drills).
- Require dry-run + row-count preview for destructive maintenance scripts.
- Keep tested runbooks for both partial logical recovery and full rewind decisions.
References
- PostgreSQL continuous archiving and PITR: https://www.postgresql.org/docs/current/continuous-archiving.html
- PostgreSQL monitoring statistics (pg_stat_activity, table stats): https://www.postgresql.org/docs/current/monitoring-stats.html
- PostgreSQL server signaling functions (pg_cancel_backend, pg_terminate_backend): https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-SIGNAL
- Odoo deployment/operations documentation: https://www.odoo.com/documentation/17.0/administration/on_premise/deploy.html
Core principle: never fix accidental data writes by guessing on primary; recover truth from a known-good point in time, patch surgically, and verify with explicit gates.