screenshot-rename/SKILL.md
Anthony Cardinale 030a40aa4b add btime fallback, app-library exclusion, --year, --include-untagged
Behavior changes (all opt-in or safety-first):
- prep refuses to operate inside .photoslibrary, .lrlibrary, .aplibrary,
  .fcpbundle, .band, .logicx, .app, etc. unless --allow-app-libraries
- --year YYYY restricts to files whose embedded ts (or btime) starts with YYYY
- --include-untagged accepts hand-named image files (no CleanShot/Screenshot
  prefix) and dates them via stat btime → mtime fallback. Gated on the folder
  containing ≥10 tagged matches to prevent sweeping ~/Pictures or similar
- prep pre-pass auto-normalizes the missing-space typo
  ('foo barCleanShot 2026-...' → 'foo bar CleanShot 2026-...') by os.rename
- plan now iterates the desc-tsv contents instead of the full src dir, with
  alt-extension fallback for Haiku's occasional .jpg-instead-of-.png echo
- build_new_name supports app=None (untagged) — emits
  '<keywords> - <Description> - YYYY-MM-DD.ext'

SKILL.md: gotchas #14-17 documenting each new guard, run-order updated
with the new flags, common-mistakes table extended.

Verified by smoke test with seeded files: --year filter, --include-untagged
threshold gate, app-library refusal, and typo normalization all behave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:14:55 -04:00

---
name: screenshot-rename
description: Use when renaming a folder of screenshots, images, or short clips with AI-generated descriptive names — particularly CleanShot exports or any directory of images named only by timestamp. Triggers on requests like "rename these screenshots based on their content", "describe each of these images and rename it", or batch rename of files by visual content.
---
# Screenshot Rename
## Overview
Rename a directory of timestamp-named images (PNG / GIF / MP4 / PDF) to include AI-generated content descriptions, dispatched as parallel Haiku subagents from this Claude Code session. Each rename has the form:
```
<original prefix> - <Title Cased Description> - <original timestamp>.<ext>
```
The pipeline is **prep → batch → describe (parallel agents) → validate plan → execute renames** with hard data-loss guards at every stage.
**Core principle:** *Plan in memory, validate exhaustively, then mutate the filesystem in a single pass with `os.rename` and pre-existence checks.* Never let `mv` overwrite — that's how you lose files.
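The core principle can be sketched in a few lines of Python. This is a minimal illustration, not the contents of `pipeline.py`: the names `validate_plan` and `execute_plan` are hypothetical, and the plan is assumed to be a list of `(original_name, new_name)` pairs inside a single directory.

```python
import os


def validate_plan(plan, directory):
    """Return a list of problems; an empty list means the plan looks safe.

    `plan` is a list of (original_name, new_name) tuples (assumed shape).
    """
    problems = []
    existing = set(os.listdir(directory))
    seen_targets = set()
    for orig, new in plan:
        if orig not in existing:
            problems.append(f"missing source: {orig}")
        if not new:
            problems.append(f"empty target for: {orig}")
        if new in seen_targets:
            problems.append(f"duplicate target: {new}")
        if new in existing:
            problems.append(f"target already exists: {new}")
        seen_targets.add(new)
    return problems


def execute_plan(plan, directory):
    """Validate the whole plan first, then rename in a single pass."""
    problems = validate_plan(plan, directory)
    if problems:
        raise ValueError("plan rejected: " + "; ".join(problems))
    before = len(os.listdir(directory))
    renamed, failed = 0, []
    for orig, new in plan:
        src = os.path.join(directory, orig)
        dst = os.path.join(directory, new)
        # Re-check right before the syscall: os.rename can replace an
        # existing file on POSIX without any warning.
        if not os.path.exists(src) or os.path.exists(dst):
            failed.append(orig)
            continue
        os.rename(src, dst)
        renamed += 1
    after = len(os.listdir(directory))
    if after != before:
        raise RuntimeError(f"file count changed: {before} -> {after}")
    return renamed, failed
```

The sketch compares names exactly. On a case-insensitive volume, names that differ only in case would also need to be treated as collisions.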
## When to Use
- Renaming CleanShot / screenshot folders by content
- Any image batch where the source filenames are timestamps and the user wants them human-scannable
- ≥ ~10 files (otherwise just rename them inline)
- Files include PNG/GIF and optionally MP4 or PDF (pipeline handles all four)
- Both `CleanShot ...` and Apple `Screenshot ...` filename prefixes are recognized in the same pass
- Files with a leading user-typed keyword prefix (e.g. `jojo travel CleanShot 2026-...png`) are recognized; the keywords are preserved and merged into the new name
- Files already in the renamed form (`App - Description - timestamp.ext`) are detected and skipped — re-running the skill on a folder is safe and idempotent
- Hand-named files with no embedded timestamp (e.g. `flight to australia 1.png`) — pass `--include-untagged`. Date is taken from filesystem btime/mtime. Only allowed when the folder already contains ≥10 tagged screenshots, so we don't sweep up arbitrary photo libraries.
- Restrict to a single year with `--year YYYY` (matches embedded ts or btime).
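The `--include-untagged` gate described above could look roughly like the following. It is a sketch under assumptions: `TAGGED_RE` is a guess at the tagged-filename pattern (the real regex in `pipeline.py` is not shown here), `gate_untagged` and `IMAGE_EXTS` are hypothetical names, and already-renamed files are not treated specially.

```python
import os
import re

# Assumed pattern: optional keyword prefix ending in whitespace, then
# "CleanShot" or "Screenshot" followed by a YYYY-MM-DD date.
TAGGED_RE = re.compile(r"^(?:.*\s)?(?:CleanShot|Screenshot)\s+\d{4}-\d{2}-\d{2}\b")
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}


def gate_untagged(names, include_untagged=False, threshold=10):
    """Return (accepted_names, hint). Untagged images are accepted only when
    the flag is set AND the folder holds at least `threshold` tagged files."""
    tagged = [n for n in names if TAGGED_RE.match(n)]
    untagged = [
        n for n in names
        if not TAGGED_RE.match(n)
        and os.path.splitext(n)[1].lower() in IMAGE_EXTS
    ]
    if include_untagged and len(tagged) >= threshold:
        return tagged + untagged, None
    hint = None
    if untagged and not include_untagged:
        hint = f"{len(untagged)} untagged file(s) skipped; pass --include-untagged"
    elif untagged:
        hint = (f"{len(untagged)} untagged file(s) skipped; only "
                f"{len(tagged)} tagged file(s) found, need {threshold}")
    return tagged, hint
```

Returning a hint instead of raising keeps the default run non-destructive: without the flag, or below the threshold, only tagged files are ever accepted.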
**Don't use for:**
- Code or text files — vision isn't needed
- Files where the name pattern is already meaningful
- Single-file rename (just do it directly)
- App-managed image catalogs (Apple Photos `.photoslibrary`, Lightroom `.lrlibrary`, Aperture `.aplibrary`, Final Cut, etc.) — the pipeline refuses to run inside these by default. Override with `--allow-app-libraries` only if you know what you're doing.
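A minimal sketch of the app-library refusal, assuming the check is a case-insensitive suffix match on every path segment. The function name `app_library_segment` is hypothetical, and the suffix tuple covers only the extensions named in this document; the real list may be longer.

```python
from pathlib import Path

# Suffixes named in this document; the pipeline's actual list may include more.
APP_LIBRARY_SUFFIXES = (
    ".photoslibrary", ".lrlibrary", ".aplibrary",
    ".fcpbundle", ".band", ".logicx", ".app",
)


def app_library_segment(path):
    """Return the first path segment that looks like an app-managed bundle,
    or None when the path is safe to operate in."""
    for part in Path(path).parts:
        if part.lower().endswith(APP_LIBRARY_SUFFIXES):
            return part
    return None
```

A caller would refuse to run whenever this returns a segment, unless `--allow-app-libraries` was passed.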
## Workflow
```
1. Prep
   ├─ Extract first frame from each .mp4 (ffmpeg) and .pdf (sips) to /tmp/frames/<base>.jpg
   ├─ Resize every source image to max 1568px on long edge → /tmp/small/<base>.jpg
   └─ Build manifest TSV: <small_image_path>\t<original_filename>
2. Batch
   └─ Split manifest into N batches of ≤ 20 lines each (file: full-batch-NN)
3. Describe (parallel)
   └─ Dispatch N Haiku subagents (model: "haiku") in a single message
      Each agent: reads its batch manifest, uses Read on each image_path,
      writes desc-full-NN.tsv with: <original_filename>\t<6-8 word description>
4. Plan (Python)
   ├─ Aggregate all desc-*.tsv into desc-all.tsv
   ├─ Validate every line: 6+ words, alnum+space only, source exists, target doesn't,
   │  no plan-internal collisions
   ├─ Truncate descriptions to 8 words max, title-case
   └─ Write plan-full.tsv: <original>\t<new_name>
5. Execute (Python, NEVER bash)
   ├─ Read plan, for each line: pre-check src exists & dst doesn't, then os.rename
   ├─ Audit before/after file count — must be equal
   └─ Log failures, report ok/fail counts
```
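The description checks in step 4 can be illustrated as follows. This is a sketch, not the pipeline's code: `clean_description` and `build_name` are hypothetical names, and the title-casing here simply uppercases the first letter of each word.

```python
import re

# Only ASCII letters, digits, and spaces are allowed in a description.
DESC_RE = re.compile(r"^[A-Za-z0-9 ]+$")


def clean_description(raw, min_words=6, max_words=8):
    """Validate a subagent description, truncate it, and title-case it."""
    desc = " ".join(raw.split())  # collapse runs of whitespace
    if not DESC_RE.match(desc):
        raise ValueError(f"disallowed characters in description: {raw!r}")
    words = desc.split(" ")
    if len(words) < min_words:
        raise ValueError(f"only {len(words)} word(s), need {min_words}: {raw!r}")
    return " ".join(w[:1].upper() + w[1:] for w in words[:max_words])


def build_name(prefix, description, timestamp, ext):
    """Assemble '<prefix> - <Description> - <timestamp><ext>'."""
    return f"{prefix} - {clean_description(description)} - {timestamp}{ext}"
```

For example, `build_name("CleanShot", "safari page showing github pull request diff", "2026-01-02 at 10.11.12", ".png")` yields `CleanShot - Safari Page Showing Github Pull Request Diff - 2026-01-02 at 10.11.12.png`.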
## The Critical Gotchas (every one of these caused real pain)
1. **Read tool has an image-size cap.** Original Retina screenshots can exceed it. **Always downscale** to ≤ 1568px before handing to a subagent. Use `sips -Z 1568 -s format jpeg`.
2. **Vision API can't read .mp4 or multi-page .pdf directly.** Extract the first frame to a JPEG first (`ffmpeg -ss 1 -i in.mp4 -frames:v 1 out.jpg`, `sips -s format jpeg in.pdf --out out.jpg`).
3. **Bash regex with `[[ =~ ]]` + `BASH_REMATCH` does NOT work in zsh.** zsh uses `$match[1]` etc. instead. Pattern silently fails, target name becomes empty, multiple `mv`s collide on the same empty target, files vanish. **Use Python for any filename mutation.** No exceptions.
4. **`mv` silently overwrites.** A loop that constructs target names from a buggy parse will happily destroy your data. Use `mv -n` (no-clobber) in shell, or `os.rename` after `os.path.exists(dst)` check in Python. Never bare `mv`.
5. **Pre-flight the full plan in memory** before mutating the filesystem. Build a list of `(orig, new)` tuples; verify every `new` is unique within the plan, doesn't collide with anything in the destination directory, and that every `orig` exists. Only then start renaming.
6. **File-count audit.** Record `len(os.listdir(DEST))` before and after — must be equal. Any drop = data loss.
7. **iCloud-synced trees and Time Machine local snapshots:** files in the snapshot are *file-provider stubs*, not the bytes. `cat` / `cp` from a snapshot path inside an iCloud-synced folder returns "Operation timed out" with a 0-byte file. **External backups (Backblaze, Time Machine to a real disk) are the actual recovery source for iCloud data**, not local APFS snapshots.
8. **Bash background jobs in the Claude Code Bash tool can die silently.** A `while read` loop redirected from a file may exit immediately when run in the background. **Run renames foreground via Python** — it's the same code path locally and reliably runs to completion.
9. **Haiku occasionally returns the wrong filename extension** (the resized `.jpg` instead of the original `.png`). The plan-builder must accept that and try alternate extensions when the claimed source isn't found in the destination directory.
10. **Always preserve mp4/pdf source files** — the pipeline reads from the resized JPEG but renames the original mp4/pdf. Don't lose the source extension.
11. **macOS Screenshot files use U+202F (NARROW NO-BREAK SPACE) before AM/PM.** Apple's `Screenshot 2026-MM-DD at H.MM.SS PM.png` filenames have U+202F (not ASCII space) between the seconds and the meridiem marker. The Haiku subagent reliably normalizes it to ASCII space when echoing the filename into the desc TSV, so the desc-dictionary lookup fails silently: every Screenshot file is dropped from the plan with a misleading "NO_DESC" error and left untouched on disk. **Fix:** normalize both sides of the lookup with `s.replace("\u202f", " ")` AND emit ASCII space in the new filename so the renamed file is keyboard-typable. Detect the offender with `python3 -c "import sys; print(repr(sys.argv[1]))" "filename"`; `\u202f` will appear in the output.
12. **Re-running the skill on a folder is safe iff the parser skips already-renamed files.** Without an `^App - .+ - timestamp\.ext$` skip rule, the parser will pile a second AI description into the name on every run. The pipeline detects and excludes these.
13. **Leading keyword prefix is part of the source signal.** When the user has hand-prefixed a file (e.g. `jojo travel flight ... CleanShot 2026-...png`), those keywords are user knowledge the AI doesn't have. Title-case them and prepend them to the AI description before assembling the new name. Don't drop them.
14. **App library packages are off-limits by default.** Apple Photos (`.photoslibrary`), Lightroom (`.lrlibrary`), Aperture (`.aplibrary`), Final Cut (`.fcpbundle`), GarageBand (`.band`), Logic (`.logicx`) and any `.app` are all bundles whose internals are managed by the host app. Renaming files inside them silently corrupts the catalog. The pipeline checks every segment of the source path against a suffix list and refuses to run if any matches. `--allow-app-libraries` overrides for the rare legitimate case (e.g. a `.app` bundle that happens to contain user-curated screenshots).
15. **Untagged files need a "this is a screenshot dump" gate.** A naive run on `~/Pictures` would happily try to rename every JPEG in sight. The fix: require ≥10 files matching the existing CleanShot/Screenshot regex BEFORE accepting any untagged file as a rename candidate. Without that signal, fall back to a hint-only message ("N untagged file(s) skipped; pass --include-untagged"). The threshold is configurable via `--untagged-threshold`.
16. **Filename embeds a timestamp until it doesn't.** Hand-named files like `flight to australia 1.png` have no `2026-MM-DD at HH.MM.SS` to harvest. Use `stat -f %SB -t %F` for macOS btime when available; mtime if btime is absent or before 1990 (a sentinel for "filesystem doesn't track this"). Date precision drops from `YYYY-MM-DD at HH.MM.SS` to `YYYY-MM-DD` and the new filename uses ` - ` between the kept-stem and the AI description: `<stem> - <Description> - YYYY-MM-DD.ext`.
17. **The missing-space typo (`tabCleanShot 2026-...`) silently excludes files.** Some user-prefixed files lack the space between the user's keyword and `CleanShot`/`Screenshot`. The parser requires `\s+` and drops these. The fix is a pre-pass in `prep` that runs `os.rename` to insert the space (`tabCleanShot ...` → `tab CleanShot ...`) before parsing. Each rename is logged so the user sees what got normalized.
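Gotchas 9 and 11 both come down to matching a filename echoed by a subagent against what is really on disk. One possible shape for that lookup is sketched below; `resolve_source`, `normalize`, and the `ALT_EXTS` list are illustrative names and values, not taken from `pipeline.py`.

```python
import os

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used by macOS before AM/PM
ALT_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".mp4", ".pdf")  # assumed list


def normalize(name):
    """Replace U+202F with an ASCII space so both sides compare equal."""
    return name.replace(NNBSP, " ")


def resolve_source(claimed, on_disk):
    """Map a filename echoed by a subagent to the real on-disk name.

    Tries an exact match after normalization first, then the same stem with
    each alternate extension. Returns None if nothing, or more than one
    candidate, matches.
    """
    by_norm = {normalize(n): n for n in on_disk}
    key = normalize(claimed)
    if key in by_norm:
        return by_norm[key]
    stem = os.path.splitext(key)[0]
    matches = [by_norm[stem + ext] for ext in ALT_EXTS if stem + ext in by_norm]
    return matches[0] if len(matches) == 1 else None
```

Refusing to pick when several extensions match is a deliberate choice in this sketch: an ambiguous source is safer to report than to guess.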
## Quick Reference
| Step | Command |
|---|---|
| Extract mp4 frame | `ffmpeg -y -ss 1 -i "$f" -frames:v 1 -q:v 3 "$out"` |
| Convert pdf to jpg | `sips -s format jpeg "$f" --out "$out"` |
| Resize for vision | `sips -Z 1568 -s format jpeg "$f" --out "$out"` |
| Split TSV into batches of 20 | `awk -v w=DIR 'BEGIN{n=1;c=0} {print > sprintf("%s/batch-%02d", w, n); c++; if(c>=20){c=0;n++}}'` |
| Dispatch agent | Agent tool, `subagent_type=general-purpose`, `model="haiku"`, `run_in_background=true` |
| Execute renames | Python `os.rename` with pre-existence check (NEVER bash `mv` in a loop) |
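If the batching step is done in Python rather than `awk`, a sketch along these lines would produce the same `full-batch-NN` files. The function names are hypothetical, and the manifest is assumed to be a UTF-8 TSV with one entry per line.

```python
import os


def split_batches(lines, batch_size=20):
    """Split a list into consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


def write_batches(manifest_path, out_dir, batch_size=20):
    """Write full-batch-01, full-batch-02, ... into out_dir; return the paths."""
    with open(manifest_path, encoding="utf-8") as fh:
        # Iterate line by line so only "\n" counts as a line boundary.
        lines = [ln.rstrip("\n") for ln in fh if ln.strip()]
    paths = []
    for n, batch in enumerate(split_batches(lines, batch_size), start=1):
        path = os.path.join(out_dir, f"full-batch-{n:02d}")
        with open(path, "w", encoding="utf-8") as out:
            out.write("\n".join(batch) + "\n")
        paths.append(path)
    return paths
```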
## Reusable Pipeline
The prep, plan, and rename phases are in `pipeline.py`. The dispatch phase is performed by Claude Code itself (Agent tool calls) and cannot be scripted from inside Python; that is the trade-off of running the describe step as in-session subagents.
Run order:
```bash
# 1. Prep + batch
python3 ~/.claude/skills/screenshot-rename/pipeline.py prep \
  --src "/path/to/folder" --batch-size 19
# Optional flags on prep:
#   --year 2026               only files whose ts (or btime) starts with 2026
#   --include-untagged        also rename hand-named images using btime/mtime
#                             as the date (only if folder has ≥10 tagged files)
#   --untagged-threshold N    override the ≥10 default
#   --allow-app-libraries     bypass the .photoslibrary / .lrlibrary guard
#                             (DANGEROUS — only for the rare legitimate case)

# Now dispatch one Haiku Agent per /tmp/screenshot-rename/full-batch-NN file
# (Claude Code does this — see SKILL.md "Workflow" step 3)

# 2. After all desc-full-NN.tsv files exist:
python3 ~/.claude/skills/screenshot-rename/pipeline.py plan \
  --src "/path/to/folder"

# 3. Review the plan, then:
python3 ~/.claude/skills/screenshot-rename/pipeline.py execute \
  --src "/path/to/folder"
```
## Subagent Prompt Template
Use exactly this prompt for each batch (substitute the batch number):
```
Describe screenshots so they can be renamed.
Read the manifest at `/tmp/screenshot-rename/full-batch-NN`. Each line: `image_path<TAB>original_filename`.
For EACH line:
1. Use Read on `image_path` (first column) to view the image.
2. Generate a description of EXACTLY 6, 7, or 8 words describing "what app is shown and what the content is". Count your words. Be specific about app names when visible. Use only ASCII letters, numbers, and spaces — NO slashes, colons, dashes, quotes, special characters. Lowercase. 6-8 words.
Output: write `/tmp/screenshot-rename/desc-full-NN.tsv` via Write tool. Each line: `original_filename<TAB>description`. <count> lines total.
Then run `wc -l` on the output file to verify the line count.
Return only "DONE: <count> lines" or an error report.
```
Dispatch all batches **in a single message with multiple Agent tool calls** so they run in parallel. Use `run_in_background=true` so you can keep working.
## Common Mistakes
| Mistake | What goes wrong | Fix |
|---|---|---|
| `mv $f $newname` in a bash loop | One bug → silent overwrite → data loss | `os.rename` in Python with pre-existence check |
| Building target name with bash regex | zsh doesn't populate BASH_REMATCH; empty targets | Use Python `os.path.splitext` and string ops |
| Sending original Retina images to Read | "Image too large" error mid-batch, partial output | Resize to 1568px first |
| Sending .mp4 to vision | Read fails | Extract first frame to JPEG first |
| Skipping the file-count audit | Silent data loss goes unnoticed | `len(os.listdir(DEST))` before & after — must be equal |
| Trusting Haiku's filename column | 30%+ of entries may have wrong extension | Plan-builder tries alt extensions |
| Running rename loop in background `Bash run_in_background=true` | Background `while read` may exit immediately, 0 progress | Run via Python foreground (it's fast — `os.rename` is just a syscall) |
| Looking up Haiku's filename column verbatim | Apple Screenshot files contain U+202F (narrow no-break space); Haiku echoes it as ASCII space, lookup misses every Screenshot file | Normalize U+202F → ASCII space on both sides of the desc dict |
| Hardcoding a single `--prefix` (e.g. `CleanShot`) | Apple Screenshot files and user-prefixed files get silently excluded from the manifest | Parser accepts both `CleanShot` and `Screenshot` and an optional leading keyword phrase |
| Re-running the skill without an already-renamed skip rule | Each run prepends another description; names balloon | Detect `^App - .+ - timestamp\.ext$` and skip |
| Walking into `.photoslibrary` / `.lrlibrary` etc. on a parent dir scan | Renames inside an app-managed bundle silently corrupt the catalog | Refuse if any path segment ends with one of the package suffixes; require `--allow-app-libraries` to override |
| Sweeping arbitrary photos in a non-screenshot folder | A user invokes the skill on `~/Pictures` and the pipeline tries to rename every JPEG | Gate untagged-file inclusion on ≥10 CleanShot/Screenshot matches in the folder, AND require explicit `--include-untagged` |
| Treating filename as the only date source | Hand-named files (e.g. `flight to Australia 1.png`) have no embedded timestamp and get dropped | Fall back to filesystem btime (`stat -f %SB`), then mtime; emit `YYYY-MM-DD` (no time component) in the new filename |
| User keyword abutting `CleanShot` with no space | Files like `weird tabCleanShot 2026-...png` don't match the regex and get silently excluded | Pre-pass in `prep` runs `os.rename` to insert the missing space before parsing |
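The btime-then-mtime fallback for hand-named files might be sketched as below. Note the substitution: this uses Python's `os.stat` and its `st_birthtime` field (available on macOS and some BSDs) instead of shelling out to `stat -f %SB`. The names `pick_date`, `infer_date`, and `build_untagged_name` are hypothetical, and dates are taken in local time.

```python
import datetime
import os

MIN_VALID_YEAR = 1990  # earlier birth times are treated as "not tracked"


def pick_date(birthtime, mtime):
    """Prefer the birth time when present and plausible, else fall back to mtime.
    Both arguments are POSIX timestamps; birthtime may be None."""
    if birthtime is not None:
        born = datetime.date.fromtimestamp(birthtime)
        if born.year >= MIN_VALID_YEAR:
            return born.isoformat()
    return datetime.date.fromtimestamp(mtime).isoformat()


def infer_date(path):
    """Return YYYY-MM-DD for a file that has no timestamp in its name."""
    st = os.stat(path)
    # st_birthtime is missing on platforms that do not expose it (e.g. Linux).
    return pick_date(getattr(st, "st_birthtime", None), st.st_mtime)


def build_untagged_name(stem, description, date, ext):
    """Assemble '<stem> - <Description> - YYYY-MM-DD<ext>'."""
    return f"{stem} - {description} - {date}{ext}"
```

Keeping `pick_date` separate from the `os.stat` call makes the fallback rule easy to check without touching the filesystem.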
## Recovery — if something does go wrong
1. **Check `~/Library/Application Support/CleanShot/media/`** — CleanShot keeps a recent media history.
2. **Check external backups (Backblaze, Time Machine to physical disk)** — these contain real file bytes.
3. **Local APFS Time Machine snapshots are NOT useful for iCloud-synced files** — they store file-provider stubs that time out on read.
4. **Check icloud.com → Drive → Recently Deleted** — iCloud keeps deleted files for ~30 days, but `mv` overwrites are NOT "deletes" from iCloud's perspective and may not appear there.
5. **If a Screenshot rename appeared to fail silently**, check for U+202F in the source filename: `python3 -c "import os; [print(repr(n)) for n in os.listdir('.') if 'Screenshot' in n]"`. The `\u202f` shows up in the repr; the parser must normalize it.
## Real-World Impact
First run on 196 CleanShot files lost 4 of them due to the bash-regex-in-zsh gotcha (rule #3). After the rebuild with Python and `mv -n`, second run renamed 189 files cleanly with zero loss.
Third run (20 mixed CleanShot + Apple Screenshot + one user-prefixed file) hit the U+202F gotcha (rule #11) on first plan attempt — every Screenshot file was dropped from the plan with a NO_DESC error despite the description being present. Diagnosed via `repr()` of the live filename. After adding U+202F normalization, multi-prefix support, and keyword preservation, all 20 renamed in one pass.
Fourth run (43 files of mixed years in a Dropbox folder containing 2,260 total) needed a year filter and revealed that hand-named files (`flight to Australia 1.png`) silently fell through both the prefix gate and the year-substring filter. Subsequent skill update added `--year`, `--include-untagged` (gated on ≥10 tagged matches), btime/mtime fallback for date inference, automatic missing-space typo normalization, and a hard refusal to walk into Apple Photos / Lightroom / Aperture / Final Cut packages. The "screenshot dump" gate was added specifically to prevent the skill from sweeping `~/Pictures` on a future invocation.
This skill exists so those don't happen again.