sync: dual-prefix support, U+202F handling, keyword preservation

Brings repo HEAD up to current live skill state in ~/.claude/skills/screenshot-rename/. - recognize CleanShot AND Apple Screenshot filenames in one pass - normalize U+202F (NARROW NO-BREAK SPACE) before AM/PM in Apple Screenshot names - preserve user-typed keyword prefix and merge into description - skip files already in renamed form (idempotent re-run) - gotchas #11-13 added to SKILL.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:09:54 -04:00
parent 87654866f3
commit 3a9997e990
2 changed files with 197 additions and 89 deletions
@@ -23,6 +23,9 @@ The pipeline is **prep → batch → describe (parallel agents) → validate pla
 - Any image batch where the source filenames are timestamps and the user wants them human-scannable
 - ≥ ~10 files (otherwise just rename them inline)
 - Files include PNG/GIF and optionally MP4 or PDF (pipeline handles all four)
 - Both `CleanShot ...` and Apple `Screenshot ...` filename prefixes are recognized in the same pass
 - Files with a leading user-typed keyword prefix (e.g. `jojo travel CleanShot 2026-...png`) are recognized; the keywords are preserved and merged into the new name
 - Files already in the renamed form (`App - Description - timestamp.ext`) are detected and skipped — re-running the skill on a folder is safe and idempotent
 **Don't use for:**
 - Code or text files — vision isn't needed
@@ -80,6 +83,12 @@ The pipeline is **prep → batch → describe (parallel agents) → validate pla
 10. **Always preserve mp4/pdf source files** — the pipeline reads from the resized JPEG but renames the original mp4/pdf. Don't lose the source extension.
 11. **macOS Screenshot files use U+202F (NARROW NO-BREAK SPACE) before AM/PM.** Apple's `Screenshot 2026-MM-DD at H.MM.SS PM.png` filenames have U+202F (not ASCII space) between the seconds and the meridiem marker. The Haiku subagent reliably normalizes it to ASCII space when echoing the filename into the desc TSV, so the desc-dictionary lookup fails silently — every Screenshot file is dropped from the plan with a misleading "NO_DESC" error and untouched on disk. **Fix:** normalize both sides of the lookup with `s.replace(" ", " ")` AND emit ASCII space in the new filename so the renamed file is keyboard-typable. Detect the offender with `python3 -c "import sys; print(repr(sys.argv[1]))" "filename"` — ` ` will appear in the output.
 12. **Re-running the skill on a folder is safe iff the parser skips already-renamed files.** Without an `^App - .+ - timestamp\.ext$` skip rule, the parser will pile a second AI description into the name on every run. The pipeline detects and excludes these.
 13. **Leading keyword prefix is part of the source signal.** When the user has hand-prefixed a file (e.g. `jojo travel flight ... CleanShot 2026-...png`), those keywords are user knowledge the AI doesn't have. Title-case them and prepend them to the AI description before assembling the new name. Don't drop them.
 ## Quick Reference
 | Step | Command |
@@ -147,6 +156,9 @@ Dispatch all batches **in a single message with multiple Agent tool calls** so t
 | Skipping the file-count audit | Silent data loss goes unnoticed | `len(os.listdir(DEST))` before & after — must be equal |
 | Trusting Haiku's filename column | 30%+ of entries may have wrong extension | Plan-builder tries alt extensions |
 | Running rename loop in background `Bash run_in_background=true` | Background `while read` may exit immediately, 0 progress | Run via Python foreground (it's fast — `os.rename` is just a syscall) |
 | Looking up Haiku's filename column verbatim | Apple Screenshot files contain U+202F (narrow no-break space); Haiku echoes it as ASCII space, lookup misses every Screenshot file | Normalize U+202F → ASCII space on both sides of the desc dict |
 | Hardcoding a single `--prefix` (e.g. `CleanShot`) | Apple Screenshot files and user-prefixed files get silently excluded from the manifest | Parser accepts both `CleanShot` and `Screenshot` and an optional leading keyword phrase |
 | Re-running the skill without an already-renamed skip rule | Each run prepends another description; names balloon | Detect `^App - .+ - timestamp\.ext$` and skip |
 ## Recovery — if something does go wrong
@@ -154,7 +166,12 @@ Dispatch all batches **in a single message with multiple Agent tool calls** so t
 2. **Check external backups (Backblaze, Time Machine to physical disk)** — these contain real file bytes.
 3. **Local APFS Time Machine snapshots are NOT useful for iCloud-synced files** — they store file-provider stubs that time out on read.
 4. **Check icloud.com → Drive → Recently Deleted** — iCloud keeps deleted files for ~30 days, but `mv` overwrites are NOT "deletes" from iCloud's perspective and may not appear there.
 5. **If a Screenshot rename appeared to fail silently** — check for U+202F in the source filename: `python3 -c "import os; [print(repr(n)) for n in os.listdir('.') if 'Screenshot' in n]"`. The ` ` shows up in the repr; the parser must normalize it.
 ## Real-World Impact
-First run on 196 CleanShot files lost 4 of them due to the bash-regex-in-zsh gotcha (rule #3). After the rebuild with Python and `mv -n`, second run renamed 189 files cleanly with zero loss. This skill exists so that doesn't happen again.
+First run on 196 CleanShot files lost 4 of them due to the bash-regex-in-zsh gotcha (rule #3). After the rebuild with Python and `mv -n`, second run renamed 189 files cleanly with zero loss.
 Third run (20 mixed CleanShot + Apple Screenshot + one user-prefixed file) hit the U+202F gotcha (rule #11) on first plan attempt — every Screenshot file was dropped from the plan with a NO_DESC error despite the description being present. Diagnosed via `repr()` of the live filename. After adding U+202F normalization, multi-prefix support, and keyword preservation, all 20 renamed in one pass.
 This skill exists so those don't happen again.
@@ -8,12 +8,15 @@ Three subcommands:
 The Haiku-subagent dispatch step happens between `prep` and `plan` and is
 performed by Claude Code in-session, not by this script.
 Recognizes both `CleanShot ...` and Apple `Screenshot ...` filenames in one
 pass, preserves any leading user-typed keyword prefix, and skips files that
 are already in the renamed `App - Description - timestamp.ext` form.
 """
 import argparse
 import os
 import re
 import shutil
 import subprocess
 import sys
 from pathlib import Path
@@ -22,43 +25,147 @@ WORK = Path("/tmp/screenshot-rename")
 FRAMES = WORK / "frames"
 SMALL = WORK / "small"
 # Apple's Screenshot tool inserts U+202F (narrow no-break space) before AM/PM.
 # Haiku normalizes it to ASCII space when echoing the filename, so desc-dict
 # lookups fail silently. Normalize on both sides AND emit ASCII space.
 NNBSP = " "
 def norm_ws(s: str) -> str:
    return s.replace(NNBSP, " ")
 # Filename parser. Captures:
 #   keywords — optional leading user-typed prefix (e.g. "jojo travel flight")
 #   app      — CleanShot | Screenshot
 #   ts       — "2026-MM-DD at HH.MM.SS" optionally followed by " AM" or " PM"
 #   dup      — optional "(2)" or " 2" duplicate marker
 #   ext      — file extension
 #
 # Run norm_ws() on the filename BEFORE matching so U+202F doesn't break the
 # meridiem branch.
 APP_PATTERN = re.compile(
    r"^(?:(?P<keywords>.+?)\s+)?"
    r"(?P<app>CleanShot|Screenshot)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2}\s+at\s+\d{1,2}\.\d{2}\.\d{2}(?:\s*[AP]M)?)"
    r"(?P<dup>\(\d+\)|\s+\d+)?"
    r"\.(?P<ext>[^.]+)$"
 )
 # Already-renamed: "App - <description> - <timestamp>(<dup>)?.<ext>"
 ALREADY_RENAMED = re.compile(
    r"^(?:CleanShot|Screenshot)\s+-\s+.+?\s+-\s+"
    r"\d{4}-\d{2}-\d{2}\s+at\s+\d{1,2}\.\d{2}\.\d{2}(?:\s*[AP]M)?"
    r"(?:\(\d+\))?\.[^.]+$"
 )
 def title_case(s: str) -> str:
    s = re.sub(r"\s+", " ", s.strip())
    return " ".join(w[:1].upper() + w[1:] if w else w for w in s.split(" "))
 def parse_filename(name: str):
    """Return parts dict, or None if the file is not a rename target.
    None means: already renamed, or doesn't look like a screenshot. Caller
    should skip.
    """
    n = norm_ws(name)
    if ALREADY_RENAMED.match(n):
        return None
    m = APP_PATTERN.match(n)
    if not m:
        return None
    return {
        "keywords": (m.group("keywords") or "").strip(),
        "app": m.group("app"),
        "ts": m.group("ts"),
        "dup": m.group("dup") or "",
        "ext": m.group("ext"),
    }
 def build_new_name(parts: dict, ai_desc: str, max_words: int) -> str:
    words = ai_desc.split()[:max_words]
    cleaned = []
    for w in words:
        cw = "".join(c for c in w if c.isalnum())
        if cw:
            cleaned.append(cw)
    if len(cleaned) < 6:
        raise ValueError(f"<6 words after sanitize: {ai_desc!r}")
    titled = title_case(" ".join(cleaned[:max_words]))
    pieces = []
    if parts["keywords"]:
        pieces.append(title_case(parts["keywords"]))
    pieces.append(titled)
    full_desc = " ".join(pieces)
    dup = parts["dup"]
    if dup and not dup.startswith("("):
        dup = "(" + dup.strip() + ")"
    return f'{parts["app"]} - {full_desc} - {parts["ts"]}{dup}.{parts["ext"]}'
 def run(cmd, **kw):
    return subprocess.run(cmd, capture_output=True, text=True, **kw)
 def title_case(s: str) -> str:
    return " ".join(w.capitalize() for w in s.split())
 # ---------- prep ----------
-def prep(src: Path, batch_size: int, prefix: str) -> None:
+
 def prep(src: Path, batch_size: int) -> None:
    if not src.is_dir():
        sys.exit(f"source not a directory: {src}")
    WORK.mkdir(parents=True, exist_ok=True)
    FRAMES.mkdir(exist_ok=True)
    SMALL.mkdir(exist_ok=True)
-    pattern = re.compile(rf"^{re.escape(prefix)}\s+\d{{4}}-\d{{2}}-\d{{2}}.*$")
+    eligible = []
-    files = sorted(p for p in src.iterdir() if p.is_file() and pattern.match(p.name))
+    skipped_already = 0
-    if not files:
+    skipped_other = 0
-        sys.exit(f"no matching files (prefix='{prefix}') in {src}")
+    for p in sorted(src.iterdir()):
-    print(f"found {len(files)} source files")
+        if not p.is_file():
            continue
        parts = parse_filename(p.name)
        if parts is None:
            n = norm_ws(p.name)
            if ALREADY_RENAMED.match(n):
                skipped_already += 1
            else:
                skipped_other += 1
            continue
        eligible.append(p)
    if not eligible:
        sys.exit(
            f"no eligible files in {src} "
            f"(skipped: {skipped_already} already-renamed, {skipped_other} other)"
        )
    print(
        f"found {len(eligible)} eligible files "
        f"(skipped: {skipped_already} already-renamed, {skipped_other} other)"
    )
    manifest = WORK / "all.tsv"
    with manifest.open("w") as out:
-        for f in files:
+        for f in eligible:
            base = f.stem
            ext = f.suffix.lower()
            if ext in (".mp4", ".mov"):
                frame = FRAMES / f"{base}.jpg"
                if not frame.exists():
-                    r = run(["ffmpeg", "-y", "-ss", "1", "-i", str(f),
+                    run(
-                             "-frames:v", "1", "-q:v", "3", str(frame)])
+                        [
-                    if not frame.exists():
+                            "ffmpeg", "-y", "-ss", "1", "-i", str(f),
-                        print(f"WARN ffmpeg failed: {f.name}", file=sys.stderr)
+                            "-frames:v", "1", "-q:v", "3", str(frame),
-                        continue
+                        ]
                    )
                if not frame.exists():
                    print(f"WARN ffmpeg failed: {f.name}", file=sys.stderr)
                    continue
                vision_src = frame
            elif ext == ".pdf":
                frame = FRAMES / f"{base}.jpg"
@@ -76,20 +183,23 @@ def prep(src: Path, batch_size: int, prefix: str) -> None:
            small = SMALL / f"{base}.jpg"
            if not small.exists():
-                run(["sips", "-Z", "1568", "-s", "format", "jpeg",
+                run(
-                     str(vision_src), "--out", str(small)])
+                    [
                        "sips", "-Z", "1568", "-s", "format", "jpeg",
                        str(vision_src), "--out", str(small),
                    ]
                )
            if not small.exists():
                print(f"WARN resize failed: {f.name}", file=sys.stderr)
                continue
            out.write(f"{small}\t{f.name}\n")
    # split into batches
    for old in WORK.glob("full-batch-*"):
        old.unlink()
    lines = manifest.read_text().splitlines()
    n_batches = max(1, (len(lines) + batch_size - 1) // batch_size)
    for i in range(n_batches):
-        chunk = lines[i * batch_size:(i + 1) * batch_size]
+        chunk = lines[i * batch_size : (i + 1) * batch_size]
        (WORK / f"full-batch-{i+1:02d}").write_text("\n".join(chunk) + "\n")
    print(f"prepped {len(lines)} files into {n_batches} batches in {WORK}")
    print(f"\nDispatch {n_batches} Haiku subagents (one per batch).")
@@ -98,79 +208,60 @@ def prep(src: Path, batch_size: int, prefix: str) -> None:
 # ---------- plan ----------
-def plan(src: Path, prefix: str, max_words: int) -> None:
+
 def plan(src: Path, max_words: int) -> None:
    if not src.is_dir():
        sys.exit(f"source not a directory: {src}")
-    descs = sorted(WORK.glob("desc-full-*.tsv"))
+    descs_paths = sorted(WORK.glob("desc-full-*.tsv"))
-    if not descs:
+    if not descs_paths:
        sys.exit("no desc-full-*.tsv files found in /tmp/screenshot-rename")
-    all_lines = []
+
-    for p in descs:
+    # Map normalized-filename → AI description. Haiku may write the filename
-        all_lines.extend(p.read_text().splitlines())
+    # with or without U+202F; normalize on both sides.
-    print(f"aggregated {len(all_lines)} description lines from {len(descs)} batches")
+    descs = {}
    bad_split = []
    for p in descs_paths:
        for lineno, line in enumerate(p.read_text().splitlines(), 1):
            line = line.rstrip()
            if not line:
                continue
            cols = line.split("\t", 1)
            if len(cols) != 2:
                bad_split.append(f"{p.name}:L{lineno}: {line!r}")
                continue
            descs[norm_ws(cols[0])] = cols[1].strip()
    print(f"aggregated {len(descs)} description rows from {len(descs_paths)} batches")
    existing = set(os.listdir(src))
    plan_rows = []
-    errors = []
+    errors = list(bad_split)
    seen = {}
-    for lineno, line in enumerate(all_lines, 1):
+    for actual in sorted(existing):
-        line = line.rstrip()
+        parts = parse_filename(actual)
-        if not line:
+        if parts is None:
            continue
-        parts = line.split("\t", 1)
+        norm_name = norm_ws(actual)
-        if len(parts) != 2:
+        desc = descs.get(norm_name)
-            errors.append(f"L{lineno}: bad split: {line!r}")
+        if not desc:
            errors.append(f"no desc for: {actual!r}")
            continue
-        orig_claimed, desc = parts
+        try:
-
+            new = build_new_name(parts, desc, max_words)
-        if not orig_claimed.startswith(prefix + " "):
+        except ValueError as e:
-            errors.append(f"L{lineno}: prefix: {orig_claimed!r}")
+            errors.append(f"{actual!r}: {e}")
            continue
-
+        if new == actual:
-        # Find the actual file — Haiku occasionally returns .jpg instead of .png
+            errors.append(f"same: {actual!r}")
        orig = orig_claimed
        if orig not in existing:
            base = os.path.splitext(orig_claimed)[0]
            for ext in (".png", ".gif", ".mp4", ".pdf", ".jpg", ".jpeg", ".webp"):
                cand = base + ext
                if cand in existing:
                    orig = cand
                    break
            else:
                errors.append(f"L{lineno}: source not found: {orig_claimed!r}")
                continue
        words = desc.split()
        if len(words) < 6:
            errors.append(f"L{lineno}: <6 words: {orig!r} -> {desc!r}")
            continue
        words = words[:max_words]
        cleaned = []
        for w in words:
            cw = "".join(c for c in w if c.isalnum())
            if cw:
                cleaned.append(cw)
        if len(cleaned) < 6:
            errors.append(f"L{lineno}: <6 after sanitize: {desc!r}")
            continue
        cleaned = cleaned[:max_words]
        titled = title_case(" ".join(cleaned))
        rest = orig[len(prefix) + 1:]  # everything after "Prefix "
        new = f"{prefix} - {titled} - {rest}"
        if new == orig:
            errors.append(f"L{lineno}: same: {orig!r}")
            continue
        if new in existing:
-            errors.append(f"L{lineno}: target exists in DEST: {new!r}")
+            errors.append(f"target exists in DEST: {new!r}")
            continue
        if new in seen:
-            errors.append(f"L{lineno}: plan collision: {new!r} from {orig!r} and {seen[new]!r}")
+            errors.append(f"plan collision: {new!r} from {actual!r} and {seen[new]!r}")
            continue
-        seen[new] = orig
+        seen[new] = actual
-        plan_rows.append((orig, new))
+        plan_rows.append((actual, new))
    print(f"plan: {len(plan_rows)} renames, {len(errors)} errors")
    if errors:
@@ -185,16 +276,18 @@ def plan(src: Path, prefix: str, max_words: int) -> None:
        for orig, new in plan_rows:
            f.write(f"{orig}\t{new}\n")
    print(f"\nplan saved: {plan_path}")
-    print(f"sample (every {max(1, len(plan_rows)//6)}th row):")
+    if plan_rows:
-    step = max(1, len(plan_rows) // 6)
+        print(f"sample (every {max(1, len(plan_rows)//6)}th row):")
-    for i in range(0, len(plan_rows), step):
+        step = max(1, len(plan_rows) // 6)
-        orig, new = plan_rows[i]
+        for i in range(0, len(plan_rows), step):
-        print(f"  {orig}\n   → {new}\n")
+            orig, new = plan_rows[i]
            print(f"  {orig}\n   → {new}\n")
    print(f"if plan looks good: pipeline.py execute --src '{src}'")
 # ---------- execute ----------
 def execute(src: Path) -> None:
    if not src.is_dir():
        sys.exit(f"source not a directory: {src}")
@@ -249,6 +342,7 @@ def execute(src: Path) -> None:
 # ---------- main ----------
 def main() -> None:
    p = argparse.ArgumentParser(description=__doc__)
    sub = p.add_subparsers(dest="cmd", required=True)
@@ -256,12 +350,9 @@ def main() -> None:
    p_prep = sub.add_parser("prep", help="extract frames, resize, build batches")
    p_prep.add_argument("--src", type=Path, required=True)
    p_prep.add_argument("--batch-size", type=int, default=19)
    p_prep.add_argument("--prefix", default="CleanShot",
                        help="filename prefix to match (default CleanShot)")
    p_plan = sub.add_parser("plan", help="build & validate rename plan")
    p_plan.add_argument("--src", type=Path, required=True)
    p_plan.add_argument("--prefix", default="CleanShot")
    p_plan.add_argument("--max-words", type=int, default=8)
    p_exec = sub.add_parser("execute", help="apply rename plan with safety checks")
@@ -269,9 +360,9 @@ def main() -> None:
    args = p.parse_args()
    if args.cmd == "prep":
-        prep(args.src, args.batch_size, args.prefix)
+        prep(args.src, args.batch_size)
    elif args.cmd == "plan":
-        plan(args.src, args.prefix, args.max_words)
+        plan(args.src, args.max_words)
    elif args.cmd == "execute":
        execute(args.src)