# Fleet sync architecture
This template provides two-way dotfile sync across N macOS machines. A change made on any machine propagates to the others within ~7 minutes. There is no central orchestrator: every machine is a peer that pushes to and pulls from a shared git forge.
Underlying technologies: [`chezmoi`](https://www.chezmoi.io) ([reference](https://www.chezmoi.io/reference/)) for templating + applying dotfiles, [`age`](https://github.com/FiloSottile/age) for encrypting secrets at rest, and [launchd](https://www.launchd.info/) for the daemons that fire on file change and on a 5-minute timer.
## Three moving parts
```
┌──────────────────────────────────────────────────────────────────────────────┐
│  Machine A                                 Machine B                         │
│                                                                              │
│  ┌──────────────┐                          ┌──────────────┐                  │
│  │   Watcher    │  on file change          │   Watcher    │                  │
│  │  (launchd)   │ ─────┐                   │  (launchd)   │                  │
│  └──────────────┘      │                   └──────────────┘                  │
│         │              │                          │                          │
│         │              ▼                          │                          │
│         │     chezmoi-auto-sync.sh                │                          │
│         │       • git pull --rebase               │                          │
│         │       • chezmoi add <managed>           │                          │
│         │       • git commit + push ──────────► forge (gitea/github)         │
│         │              │                          │                          │
│         │              │                          │                          │
│         ▼              ▼                          ▼                          │
│  ┌──────────────┐                          ┌──────────────┐                  │
│  │    Puller    │  every 5 min:            │    Puller    │  every 5 min:    │
│  │  (launchd)   │  chezmoi update          │  (launchd)   │  chezmoi update  │
│  └──────────────┘                          └──────────────┘                  │
└──────────────────────────────────────────────────────────────────────────────┘
```
### Watcher (`com.chezmoi.claude-watcher.plist`)
Launchd's `WatchPaths` fires the watcher script (`~/.local/bin/chezmoi-auto-sync.sh`) within ~2 seconds of any change to a watched path. The script:
1. Acquires a lockfile (prevents concurrent runs from racing).
2. Sleeps 2 s to let batch saves settle.
3. `git pull --rebase` against the forge to incorporate any updates that landed since.
4. `chezmoi add` for each path on the managed list (a hardcoded set of `chezmoi add` lines in the script).
5. If chezmoi's autoCommit didn't pick up everything (e.g., direct edits inside `~/.local/share/chezmoi/docs/`), a `git add -A; git commit; git push` fallback catches them.
The watched paths are listed in the plist's `WatchPaths` array. To add a new tracked path, edit both the plist template and the `chezmoi add` block in the script; both are chezmoi-managed and propagate fleet-wide.
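
The five steps above can be sketched as a single shell function. This is a minimal sketch under stated assumptions, not the template's actual script: `LOCK_DIR`, `SETTLE_SECONDS`, `MANAGED_PATHS`, the `run` helper, and the `DRY_RUN` switch are illustrative names, and routing git through `chezmoi git --` is one possible way to run git inside the source directory.

```shell
#!/usr/bin/env bash
# Sketch of the watcher flow. LOCK_DIR, SETTLE_SECONDS, MANAGED_PATHS, run and
# DRY_RUN are illustrative names, not taken from the real chezmoi-auto-sync.sh.

LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/chezmoi-auto-sync.lock}"
SETTLE_SECONDS="${SETTLE_SECONDS:-2}"
# Placeholder list; space-separated, so paths containing spaces are not handled.
MANAGED_PATHS="${MANAGED_PATHS:-$HOME/.zshrc $HOME/.gitconfig}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

sync_once() {
  # Step 1: mkdir is atomic, so it can serve as a lock without extra tools.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "another sync is running; skipping"
    return 0
  fi

  # Step 2: let batch saves settle.
  sleep "$SETTLE_SECONDS"

  # Step 3: pick up commits that landed on the forge in the meantime.
  run chezmoi git -- pull --rebase || echo "WARNING: git pull --rebase failed"

  # Step 4: re-add every managed path; unchanged paths produce no diff.
  for target in $MANAGED_PATHS; do
    run chezmoi add "$target" || true
  done

  # Step 5: fallback for edits made directly inside the source directory.
  run chezmoi git -- add -A
  run chezmoi git -- commit -m "auto-sync: fallback commit" || true
  run chezmoi git -- push || echo "WARNING: git push failed; commit stays local"

  rmdir "$LOCK_DIR"
}
```

Running `DRY_RUN=1 sync_once` prints the commands without touching the repo. A crash between `mkdir` and `rmdir` would leave a stale lock directory behind, so a production script would also need some form of stale-lock cleanup.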
### Puller (`com.chezmoi.claude-puller.plist`)
Runs `chezmoi update --force` every 5 minutes. `update` is `pull` + `apply`: it pulls from the forge repo, then materializes any new content to the live disk paths. `--force` skips the interactive prompt chezmoi would otherwise show when a live file has changed since chezmoi last wrote it. The watcher's `git pull --rebase` is supposed to keep machines in lockstep, so such conflicts should be rare; when one does happen, the source repo's version overwrites the live file.
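
One puller tick could look like the wrapper below. It is a sketch only: the `LOG` path, the function name, and the `DRY_RUN` switch are assumptions for illustration, and the template's plist may well invoke `chezmoi` directly instead of going through a wrapper.

```shell
#!/usr/bin/env bash
# Sketch of one puller tick. LOG, puller_tick and DRY_RUN are hypothetical names.

LOG="${LOG:-${TMPDIR:-/tmp}/chezmoi-puller.log}"

puller_tick() {
  local status=0
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Show what would run without contacting the forge.
    echo "+ chezmoi update --force"
  else
    # Pull from the forge and apply; --force suppresses interactive prompts.
    chezmoi update --force >>"$LOG" 2>&1 || status=$?
  fi
  # Record a timestamped exit status so failed ticks are visible later.
  printf '%s puller exit=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" >>"$LOG"
  return "$status"
}
```

Logging the exit status on every tick makes it easy to tell a puller that is failing (for example, while the forge is offline) from one that is simply finding nothing new.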
### Pull-fleet (`com.taskdurations.pull-fleet.plist`)
Optional, if you use the bundled task-durations system. Runs `pull-fleet.sh` every 5 minutes, which mesh-rsyncs each peer's `local.parquet` into a Hive-partitioned tree, so `estimate.sh --fleet` can union across the whole fleet. See [task-durations' own architecture doc](https://gitea.tojo.team/cardinale/task-durations/src/branch/main/docs/fleet-architecture.md) for details.
## Why this shape
| Choice | Why |
|---|---|
| Two daemons (watcher + puller), not one | The watcher is event-driven (instant push); the puller is timer-driven (eventual pull). Different cadences, different jobs. |
| Forge in the middle, not direct mesh | One git server is dead-simple to reason about; conflicts resolve via `git pull --rebase` semantics; offline machines just lag without breaking the others. |
| `chezmoi add` per path (not `chezmoi re-add` on the whole tree) | Surgical — a watcher fire only commits the path that changed. |
| `run_onchange` to reload launchd | When a plist's rendered content changes, launchd needs an unload/load cycle. The hash-of-template trick in `run_onchange_after_reload-launchd-agents.sh.tmpl` re-runs the reload only when a plist actually changes. |
| chezmoi templates with `{{ .chezmoi.homeDir }}` and `{{ .chezmoi.hostname }}` | Lets the same source render correctly on machines with different usernames (e.g., `/Users/alice` on one, `/Users/bob` on another). |
| age encryption for secrets | Decoupled from chezmoi; one private key per machine; secrets file is a single env-style flat file that's encrypted-at-rest in the source repo and decrypted-on-apply at runtime. |
| `modify_` script for `authorized_keys` | Preserves machine-local entries (e.g., GitHub keys) while ensuring fleet pubkeys are always present. Runs on every apply. |
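
The `modify_` row can be illustrated with a small merge function. chezmoi passes the current file contents to a `modify_` script on stdin and writes the script's stdout to the target. Everything else here is an assumption: `FLEET_KEYS` holds made-up placeholder keys, and the real template's key source and merge logic are not shown in this document.

```shell
#!/usr/bin/env bash
# Sketch of a modify_ script body for authorized_keys.
# FLEET_KEYS and merge_authorized_keys are hypothetical; the keys are placeholders.

FLEET_KEYS="${FLEET_KEYS:-ssh-ed25519 AAAAexampleA machine-a
ssh-ed25519 AAAAexampleB machine-b}"

merge_authorized_keys() {
  local existing
  existing="$(cat)"   # current authorized_keys, as piped in on stdin

  # Keep every existing line, including machine-local entries.
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi

  # Append each fleet key only if that exact line is not already present.
  printf '%s\n' "$FLEET_KEYS" | while IFS= read -r key; do
    [ -z "$key" ] && continue
    printf '%s\n' "$existing" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
}
```

A real `modify_` script would end by calling `merge_authorized_keys` so that stdin flows through it. Because existing lines are echoed first and fleet keys are only appended when missing, running it repeatedly yields the same file, which matters for a script that runs on every apply.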
## Why NOT alternatives
- **Dropbox / iCloud Drive / sync.com / Resilio:** great for documents, terrible for `~/.claude/` and dotfiles. Path conflicts, lock files, partial syncs, no encryption boundary, no version history when something breaks.
- **One mega `~/dotfiles` git repo managed with GNU Stow:** works for one user, but offers no per-machine templating (HOME path differences, hostname-keyed conditions) and no encrypted-secret support.
- **Ansible push from a central machine:** reliable but heavyweight. Requires the orchestrator to be online; you can't iterate from a laptop while the orchestrator is asleep.
- **NixOS / nix-darwin:** awesome but a much bigger commitment than chezmoi. Makes sense if you're already running Nix.
- **Tailscale Funnel + a central API:** introduces a new dependency for something git-over-SSH already does.
## Failure modes and what happens
| Failure | Effect |
|---|---|
| Forge offline | Watcher's push fails; commit stays local. Puller's pull fails; live state stays at last-applied. Both retry on next event/tick. |
| Two machines edit the same file simultaneously | Whichever pushes first wins; the second's `git pull --rebase` rebases its commit on top. If git can't auto-rebase, the watcher logs `WARNING: git pull --rebase failed`. Manual fix in `$(chezmoi source-path)`. |
| `chezmoi update --force` overwrites an in-flight local edit | The watcher's debounce + lockfile makes this rare, but possible. The "managed list" is the contract: anything in the list is sync-managed; anything outside is local-only and won't be touched. |
| External skill repo (`.chezmoiexternal.toml`) is unreachable | Single-line failure; chezmoi reports `exit status 1` but other paths still apply. Switch the entry from HTTPS to SSH (or vice versa) if it's an auth issue. |
| Age private key compromised | All encrypted files in the source repo are now decryptable by the holder. **Regenerate**: new keypair, decrypt + re-encrypt secrets with the new public key, distribute new private key to fleet via secure channel, force-rotate any tokens that were inside the secrets file. |
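
For the simultaneous-edit row, a quick way to tell whether a failed `git pull --rebase` left the source repo mid-rebase is to look for git's rebase state directories. The sketch below assumes `.git` is a plain directory at the top of the path you pass in (not a worktree or submodule); both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect a rebase that stopped halfway in a git working tree.
# Git keeps in-progress rebase state in .git/rebase-merge or .git/rebase-apply.

rebase_in_progress() {
  local repo="$1"
  [ -d "$repo/.git/rebase-merge" ] || [ -d "$repo/.git/rebase-apply" ]
}

report_rebase_state() {
  local repo="$1"
  if rebase_in_progress "$repo"; then
    echo "rebase in progress in $repo: resolve conflicts, then 'git rebase --continue' (or 'git rebase --abort')"
  else
    echo "no rebase in progress in $repo"
  fi
}
```

A typical call would be `report_rebase_state "$(chezmoi source-path)"`, assuming the source path is also the top of the git working tree.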
## Adding a new tracked path
1. Edit `~/.local/bin/chezmoi-auto-sync.sh`: append a `chezmoi add ~/path/to/new 2>> "$LOG" || true` line in the `chezmoi add` block.
2. If the path is outside the watcher's existing `WatchPaths`, edit the watcher plist template at `private_Library/LaunchAgents/com.chezmoi.claude-watcher.plist.tmpl` to add it.
3. Run `chezmoi add ~/path/to/new` once manually to seed the chezmoi source with current content (otherwise the next puller cycle will overwrite your live file with whatever empty/stale content was in the source).
4. The watcher script and plists are themselves chezmoi-managed, so the change propagates to the fleet within ~7 minutes.
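
Steps 1 and 2 are easy to get half-done, so a pair of checks like the following may help before relying on the sync. This is a sketch: both function names are hypothetical, and the matching is plain text search rather than real plist parsing.

```shell
#!/usr/bin/env bash
# Sketch: verify steps 1 and 2 for a new path. Function names are hypothetical.

# Succeeds if SCRIPT contains a `chezmoi add <target> ...` line.
# TARGET is matched literally, so pass it the way it is written in the script
# (for example '~/path/to/new', quoted so the shell does not expand the tilde).
has_add_line() {
  local script="$1" target="$2"
  grep -qF -- "chezmoi add $target " "$script"
}

# Succeeds if the rendered PLIST lists WATCHED as a <string> entry.
# This does not verify that the entry sits inside the WatchPaths array.
has_watch_path() {
  local plist="$1" watched="$2"
  grep -qF -- "<string>$watched</string>" "$plist"
}
```

For example, `has_add_line ~/.local/bin/chezmoi-auto-sync.sh '~/path/to/new'` checks step 1, and `has_watch_path` can be pointed at the rendered plist under `~/Library/LaunchAgents/` with the absolute path to check step 2. For step 3, `chezmoi managed` lists the paths chezmoi currently tracks.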