Anthony Cardinale ebccdda936 Initial public release
A chezmoi-based fleet-dotfiles template for macOS workstations:

- Two-way auto-sync via launchd watcher + 5-min puller
- Mesh SSH via modify_authorized_keys driven by .chezmoidata/fleet.yaml
- age-encrypted secrets file
- Bundled Claude Code agentic team (11 agents) + /lite + /lite-sub commands
- Verify-before-claiming Stop hook
- Generic statusline + project-boundary validate-path hook
- Reference launchd plist for cross-fleet task-durations aggregation
  (companion repo: gitea.tojo.team/cardinale/task-durations)
- AGENTS.md walks an agent through the entire setup Q&A interactively
- docs/ covers architecture, security model, fleet onboarding
2026-05-02 17:26:32 -04:00

Fleet sync architecture

This template provides two-way dotfile sync across N macOS machines. A change made on any machine reaches the others within ~7 minutes. There's no central orchestrator: each machine is a peer that pushes to, and pulls from, a shared git forge.

Three moving parts

┌──────────────────────────────────────────────────────────────────────────┐
│  Machine A                          Machine B                            │
│                                                                          │
│  ┌──────────────┐                   ┌──────────────┐                     │
│  │   Watcher    │   on file change  │   Watcher    │                     │
│  │  (launchd)   │ ─────┐            │  (launchd)   │                     │
│  └──────────────┘      │            └──────────────┘                     │
│         │              │                   │                             │
│         │              ▼                   │                             │
│         │      chezmoi-auto-sync.sh        │                             │
│         │      • git pull --rebase         │                             │
│         │      • chezmoi add <managed>     │                             │
│         │      • git commit + push  ──────────► forge (gitea/github)     │
│         │                                  │              │              │
│         │                                  │              │              │
│         ▼                                  ▼              ▼              │
│  ┌──────────────┐                   ┌──────────────┐                     │
│  │    Puller    │  every 5 min:     │    Puller    │  every 5 min:       │
│  │  (launchd)   │  chezmoi update   │  (launchd)   │  chezmoi update     │
│  └──────────────┘                   └──────────────┘                     │
└──────────────────────────────────────────────────────────────────────────┘

Watcher (com.chezmoi.claude-watcher.plist)

Launchd's WatchPaths fires the watcher script (~/.local/bin/chezmoi-auto-sync.sh) within ~2 seconds of any change to a watched path. The script:

  1. Acquires a lockfile (prevents concurrent runs from racing).
  2. Sleeps 2 s to let batch saves settle.
  3. git pull --rebase against the forge to incorporate any commits other machines have pushed since the last run.
  4. chezmoi add for each path on the managed list (a hardcoded set of chezmoi add lines in the script); chezmoi's autoCommit setting turns each resulting source change into a commit.
  5. Anything autoCommit didn't pick up (e.g., direct edits inside ~/.local/share/chezmoi/docs/, which no chezmoi add covers) is caught by a git add -A; git commit; git push fallback.
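The five steps above can be sketched as a small bash function. This is a minimal illustration under stated assumptions, not the template's actual chezmoi-auto-sync.sh: the mkdir-based lock, the LOCK_DIR / SETTLE_SECS / DRY_RUN variables, the run helper, and the two-entry MANAGED list are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the watcher's five steps; the real chezmoi-auto-sync.sh may differ.
SRC="${SRC:-$HOME/.local/share/chezmoi}"      # chezmoi's default source directory
LOG="${LOG:-/dev/null}"
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/chezmoi-auto-sync.lock}"
SETTLE_SECS="${SETTLE_SECS:-2}"
# Hypothetical managed list; the real script hardcodes its own chezmoi add lines.
MANAGED=("$HOME/.claude/settings.json" "$HOME/.claude/agents")

# With DRY_RUN set, print each command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

# Parentheses make the body a subshell, so its EXIT trap fires when the run ends.
sync_once() (
  # 1. mkdir is atomic: if the lock directory already exists, another run owns it.
  mkdir "$LOCK_DIR" 2>/dev/null || { echo "lock held; skipping" >&2; exit 0; }
  trap 'rmdir "$LOCK_DIR"' EXIT
  # 2. Let batch saves settle.
  sleep "$SETTLE_SECS"
  # 3. Rebase onto whatever other machines have pushed.
  run git -C "$SRC" pull --rebase 2>>"$LOG" ||
    echo "WARNING: git pull --rebase failed" >>"$LOG"
  # 4. Copy each managed live path into the source (autoCommit commits it).
  for p in "${MANAGED[@]}"; do
    run chezmoi add "$p" 2>>"$LOG" || true
  done
  # 5. Fallback for edits made directly inside the source directory.
  run git -C "$SRC" add -A 2>>"$LOG"
  run git -C "$SRC" commit -q -m "auto-sync from ${HOSTNAME:-unknown}" 2>>"$LOG" || true
  run git -C "$SRC" push -q 2>>"$LOG" || true
)
```

With DRY_RUN=1 SETTLE_SECS=0, calling sync_once prints the command sequence instead of touching git or chezmoi. mkdir is used for the lock because it is atomic and needs nothing beyond the stock macOS userland, which does not ship flock(1).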

The watched paths are listed in the plist's WatchPaths array. To track a new path, update both the plist template's WatchPaths (unless an existing entry already covers it) and the chezmoi add block in the script. Both files are chezmoi-managed, so the change propagates fleet-wide.
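After the watcher plist changes, launchd keeps using the old WatchPaths until the agent is unloaded and reloaded; the "Why this shape" section credits run_onchange_after_reload-launchd-agents.sh.tmpl with doing that. A minimal sketch of what such a script could look like follows. The puller template path, AGENTS_DIR, LAUNCHCTL, and reload_agents are assumptions. chezmoi re-runs a run_onchange_ script only when its rendered contents change, and the include ... | sha256sum comment lines make those contents depend on each plist template.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of run_onchange_after_reload-launchd-agents.sh.tmpl.
# chezmoi re-runs a run_onchange_ script only when its rendered contents change;
# the two comment lines below render to sha256 hashes of the plist templates.
# watcher: {{ include "private_Library/LaunchAgents/com.chezmoi.claude-watcher.plist.tmpl" | sha256sum }}
# puller:  {{ include "private_Library/LaunchAgents/com.chezmoi.claude-puller.plist.tmpl" | sha256sum }}

AGENTS_DIR="${AGENTS_DIR:-$HOME/Library/LaunchAgents}"
LAUNCHCTL="${LAUNCHCTL:-launchctl}"   # overridable so the loop can be exercised off macOS

reload_agents() {
  local label plist
  for label in com.chezmoi.claude-watcher com.chezmoi.claude-puller; do
    plist="$AGENTS_DIR/$label.plist"
    [ -f "$plist" ] || continue                       # agent not installed here
    "$LAUNCHCTL" unload "$plist" 2>/dev/null || true  # fine if it wasn't loaded
    "$LAUNCHCTL" load "$plist"
  done
}

reload_agents
```

Because include returns the template source rather than its rendered output, an edit that only changes rendered values (for example, new data in .chezmoidata) would not change these hashes, so the reload would not fire for it.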

Puller (com.chezmoi.claude-puller.plist)

Runs chezmoi update --force every 5 minutes. update is pull + apply: it pulls new commits from the forge into the source repo, then materializes any new content at the live disk paths. --force skips the interactive prompts chezmoi would otherwise show before overwriting a target that changed locally. Conflicts should be rare, because the watcher's git pull --rebase keeps machines in lockstep; when one does happen, the puller wins and the source's version replaces the live file.
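The puller's single command can be split into its two halves, which helps when debugging which one failed. This sketch is based on chezmoi's documented update behavior (it runs git pull --autostash --rebase in the source directory unless update.command is configured, then applies); CHEZMOI and puller_tick are hypothetical names, and the exact flags may vary across chezmoi versions.

```bash
#!/usr/bin/env bash
# Sketch: the puller's `chezmoi update --force`, split into pull and apply.
CHEZMOI="${CHEZMOI:-chezmoi}"   # overridable for testing

puller_tick() {
  # pull: rebase the source repo onto the forge (arguments after -- go to git)
  "$CHEZMOI" git -- pull --autostash --rebase || return 1
  # apply: write the source state to the live paths; --force means no
  # "overwrite?" prompt, so a target edited locally is replaced by the source
  "$CHEZMOI" apply --force
}
```

If the pull half fails (forge offline), the apply half never runs, so live files stay at the last-applied state, matching the "Forge offline" row in the failure table.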

Pull-fleet (com.taskdurations.pull-fleet.plist)

Optional; only needed if you use the companion task-durations system. Runs pull-fleet.sh every 5 minutes, which rsyncs each peer's local.parquet over the SSH mesh into a Hive-partitioned tree so that estimate.sh --fleet can union data across the whole fleet. See task-durations' own architecture doc for details.
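A minimal sketch of what a Hive-partitioned mesh pull might look like. It is not the real pull-fleet.sh: FLEET_DIR, REMOTE_PARQUET, RSYNC, and pull_fleet are hypothetical, and peer names are simply passed as arguments here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a mesh pull into a Hive-partitioned tree; not the real
# pull-fleet.sh. Both locations below are assumptions.
FLEET_DIR="${FLEET_DIR:-$HOME/.local/share/task-durations/fleet}"
REMOTE_PARQUET="${REMOTE_PARQUET:-.local/share/task-durations/local.parquet}"  # relative to the peer's home
RSYNC="${RSYNC:-rsync}"   # overridable for testing

pull_fleet() {
  local peer
  for peer in "$@"; do
    # Hive layout: one key=value directory per partition, e.g. host=mac-b/
    mkdir -p "$FLEET_DIR/host=$peer"
    # BatchMode: fail instead of prompting (launchd has no TTY).
    # ConnectTimeout: don't hang on a sleeping peer; its last copy stays in place.
    "$RSYNC" -a -e "ssh -o BatchMode=yes -o ConnectTimeout=5" \
      "$peer:$REMOTE_PARQUET" "$FLEET_DIR/host=$peer/local.parquet" ||
      echo "WARN: could not pull from $peer" >&2
  done
}
```

For example, pull_fleet mac-b mac-c would produce fleet/host=mac-b/local.parquet and fleet/host=mac-c/local.parquet. The key=value directory names are what make the tree Hive-partitioned: a reader such as DuckDB's read_parquet with hive_partitioning enabled exposes host as a column across all the files.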

Why this shape

| Choice | Why |
|---|---|
| Two daemons (watcher + puller), not one | The watcher is event-driven (instant push); the puller is timer-driven (eventual pull). Different cadences, different jobs. |
| Forge in the middle, not direct mesh | One git server is dead-simple to reason about; conflicts resolve via git pull --rebase semantics; offline machines just lag without breaking the others. |
| chezmoi add per path (not chezmoi re-add on the whole tree) | Surgical: a watcher fire only commits the path that changed. |
| run_onchange to reload launchd | When a plist's rendered content changes, launchd needs an unload/load cycle. The hash-of-template trick in run_onchange_after_reload-launchd-agents.sh.tmpl re-runs the reload only when a plist actually changes. |
| chezmoi templates with {{ .chezmoi.homeDir }} and {{ .chezmoi.hostname }} | Lets the same source render correctly on machines with different usernames (e.g., /Users/alice on one, /Users/bob on another). |
| age encryption for secrets | Decoupled from chezmoi; one private key, copied to each machine; the secrets file is a single env-style flat file, encrypted at rest in the source repo and decrypted on apply. |
| modify_ script for authorized_keys | Preserves machine-local entries (e.g., GitHub keys) while ensuring fleet pubkeys are always present. Runs on every apply. |
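chezmoi runs a modify_ script with the target file's current contents on stdin and replaces the file with whatever the script prints, which is how machine-local entries survive. A minimal sketch of the merge logic, assuming one key per line; FLEET_KEYS and its example keys are placeholders for lines the real template would render from .chezmoidata/fleet.yaml, and merge_authorized_keys is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of the merge a modify_ script for ~/.ssh/authorized_keys could do.
# FLEET_KEYS is a placeholder: the real template would render these lines
# from .chezmoidata/fleet.yaml.
FLEET_KEYS='ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIExampleKeyA alice@mac-a
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIExampleKeyB bob@mac-b'

merge_authorized_keys() {
  local current key
  current="$(cat)"                          # existing file from stdin
  [ -n "$current" ] && printf '%s\n' "$current"   # keep machine-local lines as-is
  while IFS= read -r key; do
    [ -n "$key" ] || continue
    # append a fleet key only if that exact line isn't already present
    printf '%s\n' "$current" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done <<< "$FLEET_KEYS"
}
```

In an actual modify_ script the last line would call merge_authorized_keys. Matching whole lines keeps the sketch short and idempotent; comparing only the key-type and key fields would also avoid a duplicate when a fleet key is already present with a different comment.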

Why NOT alternatives

  • Dropbox / iCloud Drive / sync.com / Resilio: great for documents, terrible for ~/.claude/ and dotfiles. Path conflicts, lock files, partial syncs, no encryption boundary, no version history when something breaks.
  • One mega ~/dotfiles git repo with GNU stow: works for one user, but offers no per-machine templating (HOME path differences, hostname-keyed conditions) and no encrypted-secret support.
  • Ansible push from a central machine: reliable but heavyweight. Requires the orchestrator to be online; you can't iterate from a laptop while the orchestrator is asleep.
  • NixOS / nix-darwin: awesome but a much bigger commitment than chezmoi. Makes sense if you're already running Nix.
  • Tailscale Funnel + a central API: introduces a new dependency for something git-over-SSH already does.

Failure modes and what happens

| Failure | Effect |
|---|---|
| Forge offline | The watcher's push fails and the commit stays local. The puller's pull fails and live state stays at the last-applied version. Both retry on the next event or tick. |
| Two machines edit the same file simultaneously | Whichever pushes first wins; the second machine's git pull --rebase replays its commit on top. If git can't rebase automatically, the watcher logs WARNING: git pull --rebase failed; fix it by hand in $(chezmoi source-path). |
| chezmoi update --force overwrites an in-flight local edit | The watcher's debounce and lockfile make this rare, but possible. The "managed list" is the contract: anything on the list is sync-managed; anything outside it is local-only and won't be touched. |
| External skill repo (.chezmoiexternal.toml) is unreachable | Only that entry fails: chezmoi exits with status 1, but the other paths still apply. If it's an auth issue, switch the entry from HTTPS to SSH (or vice versa). |
| age private key compromised | Every encrypted file in the source repo is now decryptable by the holder. Regenerate: create a new keypair, decrypt the secrets and re-encrypt them with the new public key, distribute the new private key to the fleet over a secure channel, and force-rotate any tokens that were in the secrets file. |
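The key-compromise row lists the recovery steps. A sketch of the first two using the age CLI (age-keygen -o, age-keygen -y, age -d -i, age -r -o), assuming a single-recipient secrets file; rotate_age_secrets and its arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: re-key an age-encrypted secrets file after the private key leaks.
# The body is a subshell so pipefail stays scoped to it: a failed decrypt
# must abort rather than let the encrypt step write an empty file.
rotate_age_secrets() (
  set -o pipefail
  old_key="$1" new_key="$2" secrets="$3"
  [ ! -e "$new_key" ] || { echo "refusing to overwrite $new_key" >&2; exit 1; }
  age-keygen -o "$new_key" 2>/dev/null || exit 1    # new identity (private key)
  new_pub="$(age-keygen -y "$new_key")" || exit 1   # its public recipient
  # plaintext only flows through the pipe, never to disk
  age -d -i "$old_key" "$secrets" | age -r "$new_pub" -o "$secrets.new" ||
    { rm -f "$secrets.new"; exit 1; }
  mv "$secrets.new" "$secrets"
  echo "$new_pub"
)
```

It prints the new public key, which would likely also need to replace the recipient in chezmoi's [age] configuration so later encryptions use it. Distributing the new private key and rotating the leaked tokens remain manual.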

Adding a new tracked path

  1. Edit ~/.local/bin/chezmoi-auto-sync.sh: append a chezmoi add ~/path/to/new 2>> "$LOG" || true line in the chezmoi add block.
  2. If the path is outside the watcher's existing WatchPaths, edit the watcher plist template at private_Library/LaunchAgents/com.chezmoi.claude-watcher.plist.tmpl to add it.
  3. Run chezmoi add ~/path/to/new once manually to seed the chezmoi source with current content (otherwise the next puller cycle will overwrite your live file with whatever empty/stale content was in the source).
  4. The watcher script and plists are themselves chezmoi-managed, so the change propagates to the fleet within ~7 minutes.
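Step 1 can be scripted so that it is idempotent. A minimal sketch; add_tracked_path is a hypothetical helper, and it assumes the add block is a run of lines beginning with chezmoi add, as step 1 describes.

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 1: add one `chezmoi add` line to the sync script.
add_tracked_path() {
  local script="$1" path="$2" line last indent
  line="chezmoi add $path 2>> \"\$LOG\" || true"
  if grep -qF -- "$line" "$script"; then
    echo "already tracked: $path"
    return 0
  fi
  # insert after the last existing `chezmoi add` line, reusing its indentation
  last="$(grep -n '^[[:space:]]*chezmoi add ' "$script" | tail -n 1 | cut -d: -f1)"
  [ -n "$last" ] || { echo "no chezmoi add block in $script" >&2; return 1; }
  indent="$(sed -n "${last}s/^\([[:space:]]*\).*/\1/p" "$script")"
  awk -v n="$last" -v l="$indent$line" '{ print } NR == n { print l }' "$script" > "$script.tmp" &&
    cat "$script.tmp" > "$script" && rm -f "$script.tmp"  # cat keeps the script's +x mode
}
```

For example, add_tracked_path ~/.local/bin/chezmoi-auto-sync.sh '~/.config/new' inserts the line once, with the tilde left unexpanded so it resolves when the sync script runs; a second call only reports that the path is already tracked. Steps 2 and 3 still need doing by hand.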