pgsync Tunnel Workflow Specification

Status: Draft v1 Purpose: Define the tunneled production-to-development Postgres sync workflow clearly enough that it can be rebuilt in another codebase, shell, or orchestration layer without reference to the current Bash script.


1. Problem Statement

Developers need to pull a subset of live production Postgres data into a local development database on demand. The production database is deployed on a private network (Hatchbox-hosted Postgres) and is not reachable directly from a developer laptop. A secure, reproducible, one-command workflow is required that:

  • establishes a temporary network path from the developer’s laptop to the private production database
  • performs a selective table-level sync (not a full dump/restore)
  • tears the network path down on exit, success or failure
  • keeps credentials out of version control
  • handles Postgres foreign key and concurrency pitfalls that arise when syncing a relational schema in parallel

The feature owns the full local experience of “refresh my dev DB from prod.” It does not own how production Postgres is provisioned, how credentials are issued, or how production backups are taken.


2. Goals and Non-Goals

2.1 Goals

  • G1. Provide a single developer-facing command that syncs any subset of production tables into the local dev DB.
  • G2. Route all production DB traffic through an SSH tunnel terminating on a jump host that has network access to the private Postgres instance.
  • G3. Fail fast and loudly if the tunnel cannot be established (no silent fallback to a direct, failed connection).
  • G4. Guarantee the tunnel is torn down whether the sync succeeds, fails, or is interrupted.
  • G5. Keep all secrets (DB URLs, credentials) out of the repository.
  • G6. Survive Postgres foreign-key-ordering errors and truncate deadlocks inherent to parallel whole-schema sync on a typical Rails schema.
  • G7. Make the tunnel topology legible to the operator each run (what is forwarded to where, and why).
  • G8. Allow the operator to override SSH target, local port, and remote host/port via environment variables without editing the script.

2.2 Non-Goals

  • NG1. Not a bi-directional sync. The destination (dev) is always overwritten from the source (prod).
  • NG2. Not a schema migration tool. The destination schema must already match the source schema.
  • NG3. Not a backup mechanism. No snapshot, archival, or restore point is produced.
  • NG4. Not a production-to-production or staging-to-staging sync. Destination is assumed to be a developer’s local database.
  • NG5. Not responsible for installing pgsync, ssh, or Postgres client libraries.
  • NG6. Not responsible for provisioning or rotating the SSH alias used as the jump host.
  • NG7. No support for data scrubbing or PII redaction in this version (see Open Questions).

3. System Overview

3.1 Main Components

  1. Tunnel launcher (wrapper script). Orchestrates environment loading, SSH tunnel creation, pgsync invocation, and cleanup. Concrete instance: bin/pgsync-tunnel.
  2. Sync configuration file. Declares tables to exclude and named table groups. Concrete instance: .pgsync.yml.
  3. Environment file. Supplies source and destination database URLs plus tunnel parameters. Concrete instance: .env.pgsync (gitignored); .env.pgsync.example serves as the committed template.
  4. Underlying sync engine. A third-party tool that performs the actual data copy. Concrete instance: pgsync (Ankane, v0.8.x).
  5. SSH jump host alias. An entry in the operator’s SSH configuration that resolves to the production app server and has credentials, hostkeys, and ProxyJump rules preconfigured. Referenced by alias name only (default: hatchbox).

3.2 Abstraction Levels

  • Network layer: SSH local port forwarding establishes a TCP bridge from localhost:<local_port> on the laptop to <remote_host>:<remote_port> reachable from the jump host.
  • Credential layer: The environment file supplies a source URL whose host is the laptop-local tunnel endpoint (localhost:<local_port>), not the real production host. This keeps the sync tool unaware of tunneling.
  • Sync layer: The sync engine operates purely on two Postgres URLs and a declarative include/exclude/groups config.
  • Orchestration layer: The launcher composes the three above and guarantees teardown via a shell trap.

This separation is deliberate: the sync engine must remain substitutable, and the tunnel must be invisible to it.

3.3 External Dependencies

  • An SSH client on the developer machine.
  • A preconfigured SSH alias that authenticates non-interactively (typically key-based) to a host with private network access to production Postgres.
  • The pgsync CLI installed and on PATH (confirmed by code with v0.8.1 at time of spec).
  • A running local Postgres instance with a database matching the destination URL.
  • Read access on the production database sufficient for SELECT across all non-excluded tables.
  • Write access on the local development database sufficient for TRUNCATE, INSERT, and deferred-FK manipulation.

4. Core Domain Model

4.1 Entities

The feature is data-manipulation only; it has no domain entities of its own. Its contract is defined over two external entities:

  • Source Database. A Postgres instance, reachable only through the tunnel. Read-only from this feature’s perspective. Identified by PGSYNC_FROM_URL.
  • Destination Database. A local Postgres instance. Fully read/write from this feature’s perspective. Identified by PGSYNC_TO_URL.
  • Table Group. A named list of table names, declared in the sync configuration under a groups: key. Invoked as group:<name> on the command line.
  • Exclude Pattern. A table name or glob (solid_queue_* style) declared in the sync configuration under exclude: and never synced.

4.2 State and Lifecycle

The launcher passes through a strict linear lifecycle on each invocation:

  READY  ─▶  ENV-LOADED  ─▶  TUNNEL-OPEN  ─▶  SYNCING  ─▶  TEARDOWN  ─▶  DONE
    │           │               │              │
    │           │               │              └──(on success or failure)──▶ TEARDOWN
    │           │               └──(on ssh failure)──▶ EXIT (non-zero, no teardown needed)
    │           └──(on missing env file)──▶ EXIT (non-zero)
    └──(process starts)
  • READY → ENV-LOADED: Environment file is sourced. Absence of the file is a fatal error with an instructional message.
  • ENV-LOADED → TUNNEL-OPEN: SSH is invoked in background-forward mode. The launcher MUST NOT proceed if the forward fails.
  • TUNNEL-OPEN → SYNCING: Sync engine is launched in the foreground.
  • Any → TEARDOWN: A shell trap kills the background SSH process on every exit path, including SIGINT.

5. Feature Contract

5.1 Entrypoints and Interfaces

5.1.1 Command Line

A single command:

<launcher> [pgsync_args...]

Accepted forms for pgsync_args:

  • (empty): sync all non-excluded tables.
  • <table_name> [<table_name>...]: sync only the named tables.
  • group:<group_name>: sync a named table group declared in the sync configuration.
  • Any flag accepted by the underlying sync engine (passed through verbatim).

Reserved by the launcher (MUST NOT be overridden by caller arguments):

  • --from
  • --to
  • --defer-constraints

5.1.2 Environment Variables

Required:

  • PGSYNC_FROM_URL — Postgres URL pointing at the tunnel endpoint. The host/port in this URL MUST match the local tunnel entrance, not the real production host.
  • PGSYNC_TO_URL — Postgres URL pointing at the local destination database.

Optional (with documented defaults):

  • PGSYNC_SSH_HOST — SSH alias or target. Default: hatchbox.
  • PGSYNC_LOCAL_PORT — Laptop-local port the tunnel listens on. Default: 5433.
  • PGSYNC_REMOTE_HOST — Host, as resolvable from the jump host, of the production Postgres server. Default: localhost.
  • PGSYNC_REMOTE_PORT — Port of the production Postgres server on the remote side. Default: 5432.

All variables must be resolvable at the point of tunnel construction; therefore the environment file MUST be sourced before defaults are computed.

5.2 Input and Output Semantics

Inputs:

  • Positional arguments: table names, group references, or sync-engine flags.
  • Environment variables (see 5.1.2).
  • Sync configuration file at a well-known path.

Outputs:

  • Human-readable diagnostic output on stdout describing the tunnel topology, the exact SSH command being run, the background PID, and the sync engine’s own per-table progress.
  • Side effect: the destination database’s non-excluded, in-scope tables are truncated and repopulated from the source database.
  • Exit code: 0 on successful sync; non-zero on environment-file-missing, tunnel-open failure, or any sync-engine failure.

Observable side effects:

  • A transient ssh -f -N -L ... process lives for the duration of the sync and is killed on exit.
  • A listening TCP socket on localhost:<local_port> exists for the tunnel’s lifetime.
  • Destination database contents are modified.

5.3 Core Behavioral Rules

  • B1. The launcher MUST NOT read credentials from any source other than the environment file (no prompts, no keychains, no hard-coded values).
  • B2. The launcher MUST load the environment file before computing any default that could be overridden by the file.
  • B3. The SSH tunnel MUST be opened with a directive equivalent to ExitOnForwardFailure=yes so that an unreachable remote port causes tunnel establishment to fail rather than appear to succeed.
  • B4. The launcher MUST invoke the sync engine with an equivalent of --defer-constraints to avoid FK violations and truncate deadlocks during parallel table sync.
  • B5. The launcher MUST register a teardown hook that kills the backgrounded SSH process on any exit path (success, failure, or signal).
  • B6. The source URL passed to the sync engine MUST resolve to the laptop-local tunnel endpoint. A source URL whose host is the real production address is out of contract.
  • B7. The sync configuration MUST be honored for excludes and groups. Tables matching an exclude pattern MUST NOT be synced even if named on the command line (confirmed by sync engine behavior).
  • B8. The launcher MUST print, before opening the tunnel, a topology description sufficient for the operator to understand what endpoints are involved and why.
  • B9. Missing environment file MUST produce an actionable, non-zero-exit error pointing the operator at the template file.

5.4 Permissions and Access Control

  • Access is gated by two credentials the operator must already hold:
    • SSH key authorized for the jump host alias.
    • Production Postgres username/password with sufficient read access.
  • The feature defines no additional authorization layer and does not validate that the operator “should” be pulling production data. This is assumed to be enforced upstream (SSH key provisioning, Postgres role grants).
  • The destination database is implicitly trusted to belong to the operator.

5.5 Failure Handling and Error Surface

Failure mode Required behavior
Environment file missing Exit non-zero with a message naming the expected path and the template file. No tunnel attempt.
SSH alias unresolvable or auth fails Exit non-zero via the SSH client. No sync attempt. Teardown trivially safe.
Remote port unreachable Tunnel establishment fails (ExitOnForwardFailure). Exit non-zero. No sync attempt.
Local port already in use Tunnel establishment fails. Exit non-zero with SSH’s error.
Sync engine returns non-zero Launcher propagates the non-zero exit code. Teardown MUST still run.
Operator interrupts (SIGINT) Teardown MUST still run. Background SSH MUST be killed.
Schema mismatch between source and destination Delegated to sync engine; surfaces as a per-table error. Other tables may still succeed.
FK violation during parallel sync Prevented by required use of deferred constraints.
Truncate deadlock during parallel sync Prevented by required use of deferred constraints.

6. Configuration and Runtime Assumptions

6.1 Configuration Inputs

Source Purpose Required Default
.env.pgsync Secrets and tunnel parameters Yes
.env.pgsync.example Committable template Yes (in repo)
.pgsync.yml Includes, excludes, groups Yes
SSH config (operator-maintained) Jump host alias definition Yes
CLI arguments Ad-hoc table or flag selection No sync all non-excluded

Precedence: Environment variables set by .env.pgsync override shell-inherited variables with the same names (the env file is sourced with set -a semantics). CLI arguments are appended after launcher-reserved flags; launcher-reserved flags take precedence on conflict (confirmed by code).

6.2 Runtime Assumptions

  • Operator machine is POSIX-like with a Bash-compatible shell capable of set -euo pipefail, trap, and pgrep.
  • ssh, pgsync, and psql-compatible Postgres client libraries are on PATH.
  • The operator’s SSH agent is unlocked or the key is passphraseless for the jump host alias.
  • The local destination database exists and its schema matches the source.
  • The local user has permission to bind the chosen local port.
  • Production is configured to accept Postgres connections from the jump host across the private network.

7. Observability and Operations

  • All diagnostics are emitted to stdout as plain text. There is no structured logging, metrics, or telemetry.
  • The topology banner printed before each run is the primary operational artifact: it discloses SSH target, local port, remote host, and remote port as resolved at runtime.
  • The backgrounded SSH process’s PID is printed so the operator can manually clean up if the launcher is killed ungracefully (e.g., kill -9 of the launcher itself, which bypasses traps).
  • No audit log of what was synced when is produced. (See Open Questions.)

8. Performance Characteristics

  • Throughput is bounded by the SSH tunnel’s effective bandwidth and the sync engine’s parallelism.
  • The sync engine runs tables in parallel by default; deferred constraints are the required mechanism for making that parallelism safe on an FK-dense schema.
  • Expect roughly single-digit seconds for a typical Rails schema of hundreds of rows; minutes for tables with millions of rows. No caching or incremental-diff strategy is in scope for this version.
  • The tunnel adds SSH framing and encryption overhead but no intermediate buffering.

9. Evidence Examined

  • bin/pgsync-tunnel — launcher script (normative for behavior B1–B9).
  • .pgsync.yml — excludes and group definitions.
  • .env.pgsync.example — committed template; confirms variable names and host-swapping convention.
  • .gitignore — confirms .env.pgsync is ignored and .env.pgsync.example is exempt.
  • CLAUDE.md — Hatchbox deployment context.
  • Live terminal output of bin/pgsync-tunnel users and bin/pgsync-tunnel --defer-constraints confirming behaviors B3, B4, B5, B8.
  • pgsync 0.8.1 CLI behavior: confirmed that --from/--to accept full URLs, --defer-constraints resolves FK and deadlock classes observed on this schema, ERB/$VAR expansion is NOT supported in the YAML file in this version.

10. Open Questions and Ambiguities

  • OQ1. Should the spec require PII/secret scrubbing before the data lands in dev? Currently none is specified. If regulated data exists in prod, this is a gap.
  • OQ2. Is there a minimum pgsync version required, or should the spec be loose? The FK deferral flag’s exact semantics differ across versions (--defer-constraints vs --defer-constraints-v1/v2).
  • OQ3. Should the launcher log each run (timestamp, operator, tables synced) to a persistent file for audit? Not currently done.
  • OQ4. Should the launcher refuse to run if the destination URL appears to point at anything other than localhost, as a guardrail against syncing prod-to-prod by misconfiguration?
  • OQ5. Windows support: the current launcher uses Bash and pgrep. Is WSL an acceptable baseline or must native Windows be supported?
  • OQ6. The cleanup uses pgrep to locate the backgrounded SSH PID by matching its command line. If two launcher runs overlap with the same local port, the PID match may be ambiguous. In practice the same port would already have caused one to fail; still, worth confirming.

11. Implementation Notes (Non-Normative)

These reflect the current Bash implementation and are offered to aid rebuilds, not to constrain them.

  • The script uses set -euo pipefail to make any unhandled failure fatal.
  • set -a; source "$ENV_FILE"; set +a is used to auto-export every variable from the env file into the process environment.
  • The SSH invocation uses -f -N -o ExitOnForwardFailure=yes -L <local_port>:<remote_host>:<remote_port> <ssh_host>. -f forks to background after authentication; -N requests no remote command; -L defines the local port forward.
  • PID resolution uses pgrep -f on the literal SSH command. A rebuild in another language could instead retain the PID returned by a non--f variant and manage backgrounding itself.
  • The script deliberately does not redirect sync engine output; the engine’s native per-table progress reporting is the operator’s primary progress signal.
  • In-repo files are named bin/pgsync-tunnel, .pgsync.yml, .env.pgsync, .env.pgsync.example. Nothing in the contract requires these exact names.

12. Out of Scope

  • Setting up the SSH alias (~/.ssh/config entry, key distribution, ProxyJump rules).
  • Installing pgsync, ssh, or Postgres.
  • Provisioning or schema-migrating the destination database.
  • Provisioning or securing the source (production) database.
  • Any UI, dashboard, or web-based wrapper around the sync.
  • Data masking, PII redaction, sampling, or row-level filtering.
  • Continuous/scheduled sync. This is an on-demand operator tool.
  • Sync direction other than prod → local dev.