Compare commits: 0728ae6592...main (1 commit)

| Author | SHA1 | Date |
|---|---|---|
| | 030a40aa4b | |
@@ -26,11 +26,14 @@ The pipeline is **prep → batch → describe (parallel agents) → validate pla
|
|||||||
- Both `CleanShot ...` and Apple `Screenshot ...` filename prefixes are recognized in the same pass
|
- Both `CleanShot ...` and Apple `Screenshot ...` filename prefixes are recognized in the same pass
|
||||||
- Files with a leading user-typed keyword prefix (e.g. `jojo travel CleanShot 2026-...png`) are recognized; the keywords are preserved and merged into the new name
|
- Files with a leading user-typed keyword prefix (e.g. `jojo travel CleanShot 2026-...png`) are recognized; the keywords are preserved and merged into the new name
|
||||||
- Files already in the renamed form (`App - Description - timestamp.ext`) are detected and skipped — re-running the skill on a folder is safe and idempotent
|
- Files already in the renamed form (`App - Description - timestamp.ext`) are detected and skipped — re-running the skill on a folder is safe and idempotent
|
||||||
|
- Hand-named files with no embedded timestamp (e.g. `flight to australia 1.png`) are included when you pass `--include-untagged`. The date is taken from filesystem btime, falling back to mtime. The flag only takes effect when the folder already contains ≥10 tagged screenshots (configurable via `--untagged-threshold`), so we don't sweep up arbitrary photo libraries.
|
||||||
|
- Restrict to a single year with `--year YYYY` (matched against the embedded timestamp, or against the btime/mtime date for untagged files).
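
The selection rules in the bullets above can be sketched roughly as follows. This is an illustration only, not the pipeline's code: `TAGGED` is a simplified stand-in for the real filename pattern, and `select_candidates` is a hypothetical helper name. The year filter here only looks at tagged names, since the sketch has no file dates to consult.

```python
import re

# Simplified stand-in for the pipeline's tagged-filename pattern (assumption):
# optional leading keywords, then CleanShot/Screenshot, then a YYYY-MM-DD date.
TAGGED = re.compile(r"^(?:.*\s)?(?:CleanShot|Screenshot)\s+(\d{4})-\d{2}-\d{2}")


def select_candidates(names, include_untagged=False, threshold=10, year=None):
    """Return the names that would be considered for renaming.

    Untagged names are admitted only when include_untagged is set AND the
    folder holds at least `threshold` tagged names (counted before the year
    filter). The year filter is applied to tagged names only in this sketch.
    """
    tagged = [n for n in names if TAGGED.match(n)]
    untagged = [n for n in names if not TAGGED.match(n)]
    chosen = [
        n for n in tagged
        if year is None or TAGGED.match(n).group(1) == year
    ]
    if include_untagged and len(tagged) >= threshold:
        chosen += untagged
    return chosen
```

With ten tagged names plus one hand-named file, the hand-named file is selected only when `include_untagged=True`; with five tagged names it is left out regardless of the flag.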
|
||||||
|
|
||||||
**Don't use for:**
|
**Don't use for:**
|
||||||
- Code or text files — vision isn't needed
|
- Code or text files — vision isn't needed
|
||||||
- Files where the name pattern is already meaningful
|
- Files where the name pattern is already meaningful
|
||||||
- Single-file rename (just do it directly)
|
- Single-file rename (just do it directly)
|
||||||
|
- App-managed image catalogs (Apple Photos `.photoslibrary`, Lightroom `.lrlibrary`, Aperture `.aplibrary`, Final Cut, etc.) — the pipeline refuses to run inside these by default. Override with `--allow-app-libraries` only if you know what you're doing.
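
A minimal sketch of the path-segment check behind that refusal, assuming a shortened suffix list. `PACKAGE_SUFFIXES` and `inside_app_library` are illustrative names, and unlike the pipeline this version does not resolve symlinks first.

```python
from pathlib import PurePosixPath

# Subset of the package suffixes named in this document; the real list is longer.
PACKAGE_SUFFIXES = (".photoslibrary", ".lrlibrary", ".aplibrary", ".fcpbundle", ".app")


def inside_app_library(path: str) -> bool:
    """True if any segment of `path` ends with a known package suffix."""
    return any(
        segment.endswith(PACKAGE_SUFFIXES)  # str.endswith accepts a tuple
        for segment in PurePosixPath(path).parts
    )
```

Checking every segment, not just the last one, is what catches a run started several levels deep inside a package.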
|
||||||
|
|
||||||
## Workflow
|
## Workflow
|
||||||
|
|
||||||
@@ -89,6 +92,14 @@ The pipeline is **prep → batch → describe (parallel agents) → validate pla
|
|||||||
|
|
||||||
13. **Leading keyword prefix is part of the source signal.** When the user has hand-prefixed a file (e.g. `jojo travel flight ... CleanShot 2026-...png`), those keywords are user knowledge the AI doesn't have. Title-case them and prepend them to the AI description before assembling the new name. Don't drop them.
|
13. **Leading keyword prefix is part of the source signal.** When the user has hand-prefixed a file (e.g. `jojo travel flight ... CleanShot 2026-...png`), those keywords are user knowledge the AI doesn't have. Title-case them and prepend them to the AI description before assembling the new name. Don't drop them.
|
||||||
|
|
||||||
|
14. **App library packages are off-limits by default.** Apple Photos (`.photoslibrary`), Lightroom (`.lrlibrary`), Aperture (`.aplibrary`), Final Cut (`.fcpbundle`), GarageBand (`.band`), Logic (`.logicx`) and any `.app` are all bundles whose internals are managed by the host app. Renaming files inside them silently corrupts the catalog. The pipeline checks every segment of the source path against a suffix list and refuses to run if any matches. `--allow-app-libraries` overrides for the rare legitimate case (e.g. a `.app` bundle that happens to contain user-curated screenshots).
|
||||||
|
|
||||||
|
15. **Untagged files need a "this is a screenshot dump" gate.** A naive run on `~/Pictures` would happily try to rename every JPEG in sight. The fix: require ≥10 files matching the existing CleanShot/Screenshot regex BEFORE accepting any untagged file as a rename candidate. If `--include-untagged` is passed but the folder is below the threshold, the flag is ignored and `prep` prints the tagged count and the required minimum. If the flag is omitted, `prep` only prints a hint reporting how many untagged files were skipped and pointing at `--include-untagged`. The threshold is configurable via `--untagged-threshold`.
|
||||||
|
|
||||||
|
16. **Filename embeds a timestamp until it doesn't.** Hand-named files like `flight to australia 1.png` have no `2026-MM-DD at HH.MM.SS` to harvest. Use `stat -f %SB -t %F` for macOS btime when available; mtime if btime is absent or before 1990 (a sentinel for "filesystem doesn't track this"). Date precision drops from `YYYY-MM-DD at HH.MM.SS` to `YYYY-MM-DD` and the new filename uses ` - ` between the kept-stem and the AI description: `<stem> - <Description> - YYYY-MM-DD.ext`.
|
||||||
|
|
||||||
|
17. **The missing-space typo (`tabCleanShot 2026-...`) silently excludes files.** Some user-prefixed files lack the space between the user's keyword and `CleanShot`/`Screenshot`. The parser requires `\s+` and drops these. The fix is a pre-pass in `prep` that runs `os.rename` to insert the space (`tabCleanShot ...` → `tab CleanShot ...`) before parsing. Logged so the user sees what got normalized.
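
The date fallback in rule 16 can be sketched as below. Note the swap: this version reads `st_birthtime` from `os.stat` instead of shelling out to `stat -f %SB`, so it is an alternative illustration and not the pipeline's implementation. `pick_date` and `file_date_sketch` are hypothetical names.

```python
import os
from datetime import datetime

CUTOFF = datetime(1990, 1, 1)  # earlier birth times are treated as "not tracked"


def pick_date(birth_ts, mtime_ts):
    """Return YYYY-MM-DD from birth time when present and sane, else from mtime."""
    if birth_ts is not None:
        born = datetime.fromtimestamp(birth_ts)
        if born >= CUTOFF:
            return born.strftime("%Y-%m-%d")
    return datetime.fromtimestamp(mtime_ts).strftime("%Y-%m-%d")


def file_date_sketch(path):
    st = os.stat(path)
    # st_birthtime exists on macOS/BSD; getattr keeps this runnable on
    # platforms where the attribute is missing (mtime is used there).
    return pick_date(getattr(st, "st_birthtime", None), st.st_mtime)
```

Splitting the decision (`pick_date`) from the filesystem call keeps the 1990 cutoff testable without touching real files.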
|
||||||
|
|
||||||
## Quick Reference
|
## Quick Reference
|
||||||
|
|
||||||
| Step | Command |
|
| Step | Command |
|
||||||
@@ -111,6 +122,14 @@ Run order:
|
|||||||
python3 ~/.claude/skills/screenshot-rename/pipeline.py prep \
|
python3 ~/.claude/skills/screenshot-rename/pipeline.py prep \
|
||||||
--src "/path/to/folder" --batch-size 19
|
--src "/path/to/folder" --batch-size 19
|
||||||
|
|
||||||
|
# Optional flags on prep:
#   --year 2026               only files whose ts (or btime) starts with 2026
#   --include-untagged        also rename hand-named images using btime/mtime
#                             as the date (only if folder has ≥10 tagged files)
#   --untagged-threshold N    override the ≥10 default
#   --allow-app-libraries     bypass the .photoslibrary / .lrlibrary guard
#                             (DANGEROUS: only for the rare legitimate case)
|
||||||
|
|
||||||
# Now dispatch one Haiku Agent per /tmp/screenshot-rename/full-batch-NN file
|
# Now dispatch one Haiku Agent per /tmp/screenshot-rename/full-batch-NN file
|
||||||
# (Claude Code does this — see SKILL.md "Workflow" step 3)
|
# (Claude Code does this — see SKILL.md "Workflow" step 3)
|
||||||
|
|
||||||
@@ -159,6 +178,10 @@ Dispatch all batches **in a single message with multiple Agent tool calls** so t
|
|||||||
| Looking up Haiku's filename column verbatim | Apple Screenshot files contain U+202F (narrow no-break space); Haiku echoes it as ASCII space, lookup misses every Screenshot file | Normalize U+202F → ASCII space on both sides of the desc dict |
|
| Looking up Haiku's filename column verbatim | Apple Screenshot files contain U+202F (narrow no-break space); Haiku echoes it as ASCII space, lookup misses every Screenshot file | Normalize U+202F → ASCII space on both sides of the desc dict |
|
||||||
| Hardcoding a single `--prefix` (e.g. `CleanShot`) | Apple Screenshot files and user-prefixed files get silently excluded from the manifest | Parser accepts both `CleanShot` and `Screenshot` and an optional leading keyword phrase |
|
| Hardcoding a single `--prefix` (e.g. `CleanShot`) | Apple Screenshot files and user-prefixed files get silently excluded from the manifest | Parser accepts both `CleanShot` and `Screenshot` and an optional leading keyword phrase |
|
||||||
| Re-running the skill without an already-renamed skip rule | Each run prepends another description; names balloon | Detect `^App - .+ - timestamp\.ext$` and skip |
|
| Re-running the skill without an already-renamed skip rule | Each run prepends another description; names balloon | Detect `^App - .+ - timestamp\.ext$` and skip |
|
||||||
|
| Walking into `.photoslibrary` / `.lrlibrary` etc. on a parent dir scan | Renames inside an app-managed bundle silently corrupt the catalog | Refuse if any path segment ends with one of the package suffixes; require `--allow-app-libraries` to override |
|
||||||
|
| Sweeping arbitrary photos in a non-screenshot folder | A user invokes the skill on `~/Pictures` and the pipeline tries to rename every JPEG | Gate untagged-file inclusion on ≥10 CleanShot/Screenshot matches in the folder, AND require explicit `--include-untagged` |
|
||||||
|
| Treating filename as the only date source | Hand-named files (e.g. `flight to Australia 1.png`) have no embedded timestamp and get dropped | Fall back to filesystem btime (`stat -f %SB`), then mtime; emit `YYYY-MM-DD` (no time component) in the new filename |
|
||||||
|
| User keyword abutting `CleanShot` with no space | Files like `weird tabCleanShot 2026-...png` don't match the regex and get silently excluded | Pre-pass in `prep` runs `os.rename` to insert the missing space before parsing |
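
As a sketch of the missing-space fix in the last row: the regex below is the one this change adds as `MISSING_SPACE_PATTERN`, while `insert_missing_space` and `plan_typo_fixes` are hypothetical helper names. This version only computes the renames (a dry run); it never touches the disk.

```python
import re

# Same pattern as MISSING_SPACE_PATTERN in this change: a non-space character
# directly followed by CleanShot/Screenshot and then a date.
MISSING_SPACE = re.compile(
    r"(?P<pre>\S)(?P<app>CleanShot|Screenshot)(?P<post>\s+\d{4}-)"
)


def insert_missing_space(name: str) -> str:
    """Insert the space between the keyword and the app name; no-op otherwise."""
    return MISSING_SPACE.sub(r"\g<pre> \g<app>\g<post>", name)


def plan_typo_fixes(names):
    """Return (old, new) pairs for names needing the fix, skipping collisions."""
    taken = set(names)
    fixes = []
    for old in names:
        new = insert_missing_space(old)
        if new != old and new not in taken:
            fixes.append((old, new))
            taken.add(new)
    return fixes
```

Skipping a fix when the corrected name is already taken mirrors the `exists()` guard in the pre-pass, so a typo rename never overwrites another file.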
|
||||||
|
|
||||||
## Recovery — if something does go wrong
|
## Recovery — if something does go wrong
|
||||||
|
|
||||||
@@ -174,4 +197,6 @@ First run on 196 CleanShot files lost 4 of them due to the bash-regex-in-zsh got
|
|||||||
|
|
||||||
Third run (20 mixed CleanShot + Apple Screenshot + one user-prefixed file) hit the U+202F gotcha (rule #11) on first plan attempt — every Screenshot file was dropped from the plan with a NO_DESC error despite the description being present. Diagnosed via `repr()` of the live filename. After adding U+202F normalization, multi-prefix support, and keyword preservation, all 20 renamed in one pass.
|
Third run (20 mixed CleanShot + Apple Screenshot + one user-prefixed file) hit the U+202F gotcha (rule #11) on first plan attempt — every Screenshot file was dropped from the plan with a NO_DESC error despite the description being present. Diagnosed via `repr()` of the live filename. After adding U+202F normalization, multi-prefix support, and keyword preservation, all 20 renamed in one pass.
|
||||||
|
|
||||||
|
Fourth run (43 files of mixed years in a Dropbox folder containing 2,260 total) needed a year filter and revealed that hand-named files (`flight to Australia 1.png`) silently fell through both the prefix gate and the year-substring filter. Subsequent skill update added `--year`, `--include-untagged` (gated on ≥10 tagged matches), btime/mtime fallback for date inference, automatic missing-space typo normalization, and a hard refusal to walk into Apple Photos / Lightroom / Aperture / Final Cut packages. The "screenshot dump" gate was added specifically to prevent the skill from sweeping `~/Pictures` on a future invocation.
|
||||||
|
|
||||||
This skill exists so those don't happen again.
|
This skill exists so those don't happen again.
|
||||||
|
|||||||
+298
-43
@@ -12,6 +12,20 @@ performed by Claude Code in-session, not by this script.
|
|||||||
Recognizes both `CleanShot ...` and Apple `Screenshot ...` filenames in one
|
Recognizes both `CleanShot ...` and Apple `Screenshot ...` filenames in one
|
||||||
pass, preserves any leading user-typed keyword prefix, and skips files that
|
pass, preserves any leading user-typed keyword prefix, and skips files that
|
||||||
are already in the renamed `App - Description - timestamp.ext` form.
|
are already in the renamed `App - Description - timestamp.ext` form.
|
||||||
|
|
||||||
|
Also handles, behind opt-in flags:
  --year YYYY          restrict to files whose embedded ts (or file btime)
                       starts with YYYY
  --include-untagged   include image files that lack any CleanShot/Screenshot
                       prefix, dating them from filesystem btime/mtime;
                       requires the folder to look like a screenshot dump
                       (≥10 tagged matches) so we don't sweep up arbitrary
                       photos.
|
||||||
|
|
||||||
|
Refuses to operate on paths inside known app library packages
|
||||||
|
(.photoslibrary, .aplibrary, .lrlibrary, etc.) unless --allow-app-libraries
|
||||||
|
is passed — guards against accidental runs over Apple Photos / Lightroom
|
||||||
|
catalogs when invoked on a parent dir.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
@@ -19,6 +33,7 @@ import os
|
|||||||
import re
|
import re
|
||||||
import subprocess
|
import subprocess
|
||||||
import sys
|
import sys
|
||||||
|
from datetime import datetime
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
WORK = Path("/tmp/screenshot-rename")
|
WORK = Path("/tmp/screenshot-rename")
|
||||||
@@ -59,6 +74,74 @@ ALREADY_RENAMED = re.compile(
|
|||||||
r"(?:\(\d+\))?\.[^.]+$"
|
r"(?:\(\d+\))?\.[^.]+$"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
# Untagged-already-renamed: "<keywords> - <description> - YYYY-MM-DD.<ext>"
# We use this to skip the result of a previous --include-untagged run.
UNTAGGED_RENAMED = re.compile(
    r"^.+?\s+-\s+.+?\s+-\s+\d{4}-\d{2}-\d{2}(?:\(\d+\))?\.[^.]+$"
)

# User keyword abutting CleanShot/Screenshot with no space.
# e.g. "weird hightlighted tabCleanShot 2026-..." → insert space.
MISSING_SPACE_PATTERN = re.compile(
    r"(?P<pre>\S)(?P<app>CleanShot|Screenshot)(?P<post>\s+\d{4}-)"
)
|
||||||
|
|
||||||
|
# Folder-name patterns we refuse to walk into. Apple Photos packages, Lightroom
|
||||||
|
# catalogs, Aperture, Final Cut, etc. — these contain images managed by other
|
||||||
|
# apps and should never be renamed by this skill.
|
||||||
|
APP_LIB_SUFFIXES = (
    ".photoslibrary",
    ".aplibrary",
    ".lrlibrary",
    ".lrcat",
    ".lrcat-data",
    ".tvlibrary",
    ".tvprojcache",
    ".fcpbundle",
    ".band",
    ".logicx",
    ".app",
)
APP_LIB_NAMES = ("Photo Booth Library", "Photos Library")

IMAGE_EXTS = (".png", ".gif", ".jpg", ".jpeg", ".webp", ".heic")
VIDEO_EXTS = (".mp4", ".mov")
PDF_EXTS = (".pdf",)
|
||||||
|
|
||||||
|
|
||||||
|
def is_in_app_library(p: Path) -> bool:
    """True if any segment of p is an app library package (or a known name)."""
    try:
        rp = p.resolve()
    except OSError:
        rp = p
    for seg in rp.parts:
        if any(seg.endswith(suf) for suf in APP_LIB_SUFFIXES):
            return True
        if seg in APP_LIB_NAMES:
            return True
    return False
|
||||||
|
|
||||||
|
|
||||||
|
def file_date(p: Path) -> str:
    """YYYY-MM-DD from stat btime when sane, else mtime.

    On macOS `stat -f %SB -t %F` returns the file's birth time. If unset or
    before 1990 (suggests fallback or broken metadata), use mtime instead.
    """
    try:
        r = subprocess.run(
            ["stat", "-f", "%SB", "-t", "%F", str(p)],
            capture_output=True, text=True, timeout=5,
        )
        if r.returncode == 0:
            s = r.stdout.strip()
            if s and s.startswith(("19", "20")) and s >= "1990-01-01":
                return s
    except (OSError, subprocess.SubprocessError):
        pass
    return datetime.fromtimestamp(p.stat().st_mtime).strftime("%Y-%m-%d")
|
||||||
|
|
||||||
|
|
||||||
def title_case(s: str) -> str:
|
def title_case(s: str) -> str:
|
||||||
s = re.sub(r"\s+", " ", s.strip())
|
s = re.sub(r"\s+", " ", s.strip())
|
||||||
@@ -66,11 +149,7 @@ def title_case(s: str) -> str:
|
|||||||
|
|
||||||
|
|
||||||
def parse_filename(name: str):
|
def parse_filename(name: str):
|
||||||
"""Return parts dict, or None if the file is not a rename target.
|
"""Parts dict for tagged filenames; None for already-renamed or non-match."""
|
||||||
|
|
||||||
None means: already renamed, or doesn't look like a screenshot. Caller
|
|
||||||
should skip.
|
|
||||||
"""
|
|
||||||
n = norm_ws(name)
|
n = norm_ws(name)
|
||||||
if ALREADY_RENAMED.match(n):
|
if ALREADY_RENAMED.match(n):
|
||||||
return None
|
return None
|
||||||
@@ -86,6 +165,38 @@ def parse_filename(name: str):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def synthesize_untagged_parts(p: Path):
    """Parts dict for an untagged file (no CleanShot/Screenshot prefix).

    Date is the file's btime/mtime since the filename has no embedded ts.
    Returns None if file doesn't exist or has no extension.
    """
    if not p.is_file():
        return None
    name = norm_ws(p.name)
    if UNTAGGED_RENAMED.match(name):
        return None
    stem, dotext = os.path.splitext(name)
    if not dotext:
        return None
    return {
        "keywords": stem,
        "app": None,
        "ts": file_date(p),
        "dup": "",
        "ext": dotext[1:],
    }
|
||||||
|
|
||||||
|
|
||||||
|
def normalize_typo_filename(name: str) -> str:
    """Insert space between user-keyword and CleanShot/Screenshot if abutting.

    'weird tabCleanShot 2026-...' → 'weird tab CleanShot 2026-...'
    No-op if the pattern doesn't match.
    """
    return MISSING_SPACE_PATTERN.sub(r"\g<pre> \g<app>\g<post>", name)
|
||||||
|
|
||||||
|
|
||||||
def build_new_name(parts: dict, ai_desc: str, max_words: int) -> str:
|
def build_new_name(parts: dict, ai_desc: str, max_words: int) -> str:
|
||||||
words = ai_desc.split()[:max_words]
|
words = ai_desc.split()[:max_words]
|
||||||
cleaned = []
|
cleaned = []
|
||||||
@@ -97,77 +208,168 @@ def build_new_name(parts: dict, ai_desc: str, max_words: int) -> str:
|
|||||||
raise ValueError(f"<6 words after sanitize: {ai_desc!r}")
|
raise ValueError(f"<6 words after sanitize: {ai_desc!r}")
|
||||||
titled = title_case(" ".join(cleaned[:max_words]))
|
titled = title_case(" ".join(cleaned[:max_words]))
|
||||||
|
|
||||||
|
dup = parts["dup"]
|
||||||
|
if dup and not dup.startswith("("):
|
||||||
|
dup = "(" + dup.strip() + ")"
|
||||||
|
|
||||||
|
if parts["app"]:
|
||||||
pieces = []
|
pieces = []
|
||||||
if parts["keywords"]:
|
if parts["keywords"]:
|
||||||
pieces.append(title_case(parts["keywords"]))
|
pieces.append(title_case(parts["keywords"]))
|
||||||
pieces.append(titled)
|
pieces.append(titled)
|
||||||
full_desc = " ".join(pieces)
|
full_desc = " ".join(pieces)
|
||||||
|
|
||||||
dup = parts["dup"]
|
|
||||||
if dup and not dup.startswith("("):
|
|
||||||
dup = "(" + dup.strip() + ")"
|
|
||||||
return f'{parts["app"]} - {full_desc} - {parts["ts"]}{dup}.{parts["ext"]}'
|
return f'{parts["app"]} - {full_desc} - {parts["ts"]}{dup}.{parts["ext"]}'
|
||||||
|
# Untagged: <keywords> - <ai-desc> - <date>.<ext> with explicit separator
|
||||||
|
kw = title_case(parts["keywords"]) if parts["keywords"] else ""
|
||||||
|
if kw:
|
||||||
|
return f"{kw} - {titled} - {parts['ts']}{dup}.{parts['ext']}"
|
||||||
|
return f"{titled} - {parts['ts']}{dup}.{parts['ext']}"
|
||||||
|
|
||||||
|
|
||||||
def run(cmd, **kw):
|
def run(cmd, **kw):
|
||||||
return subprocess.run(cmd, capture_output=True, text=True, **kw)
|
return subprocess.run(cmd, capture_output=True, text=True, **kw)
|
||||||
|
|
||||||
|
|
||||||
|
def parts_year(parts) -> str:
    """Extract YYYY from parts (tagged or untagged)."""
    m = re.match(r"(\d{4})", parts["ts"])
    return m.group(1) if m else ""
|
||||||
|
|
||||||
|
|
||||||
# ---------- prep ----------
|
# ---------- prep ----------
|
||||||
|
|
||||||
|
|
||||||
def prep(src: Path, batch_size: int) -> None:
|
def prep(
|
||||||
|
src: Path,
|
||||||
|
batch_size: int,
|
||||||
|
year: str | None = None,
|
||||||
|
include_untagged: bool = False,
|
||||||
|
allow_app_libraries: bool = False,
|
||||||
|
untagged_threshold: int = 10,
|
||||||
|
) -> None:
|
||||||
if not src.is_dir():
|
if not src.is_dir():
|
||||||
sys.exit(f"source not a directory: {src}")
|
sys.exit(f"source not a directory: {src}")
|
||||||
|
if is_in_app_library(src) and not allow_app_libraries:
|
||||||
|
sys.exit(
|
||||||
|
f"refusing to run inside an app library package: {src}\n"
|
||||||
|
f"if intentional, pass --allow-app-libraries"
|
||||||
|
)
|
||||||
WORK.mkdir(parents=True, exist_ok=True)
|
WORK.mkdir(parents=True, exist_ok=True)
|
||||||
FRAMES.mkdir(exist_ok=True)
|
FRAMES.mkdir(exist_ok=True)
|
||||||
SMALL.mkdir(exist_ok=True)
|
SMALL.mkdir(exist_ok=True)
|
||||||
|
|
||||||
eligible = []
|
# Pre-pass: normalize missing-space typos in source filenames.
|
||||||
skipped_already = 0
|
typo_renamed = 0
|
||||||
skipped_other = 0
|
|
||||||
for p in sorted(src.iterdir()):
|
for p in sorted(src.iterdir()):
|
||||||
if not p.is_file():
|
if not p.is_file():
|
||||||
continue
|
continue
|
||||||
parts = parse_filename(p.name)
|
|
||||||
if parts is None:
|
|
||||||
n = norm_ws(p.name)
|
n = norm_ws(p.name)
|
||||||
if ALREADY_RENAMED.match(n):
|
fixed = normalize_typo_filename(n)
|
||||||
|
if fixed != n:
|
||||||
|
new_path = src / fixed
|
||||||
|
if not new_path.exists():
|
||||||
|
os.rename(p, new_path)
|
||||||
|
typo_renamed += 1
|
||||||
|
print(f"normalized typo: {p.name!r} → {fixed!r}")
|
||||||
|
if typo_renamed:
|
||||||
|
print(f"pre-pass: normalized {typo_renamed} missing-space typo(s)\n")
|
||||||
|
|
||||||
|
# Main pass: classify each file.
|
||||||
|
tagged_count = 0
|
||||||
|
untagged_candidates = []
|
||||||
|
eligible = [] # list of (path, parts) tuples
|
||||||
|
skipped_already = 0
|
||||||
|
skipped_other = 0
|
||||||
|
skipped_year = 0
|
||||||
|
refused_lib = 0
|
||||||
|
for p in sorted(src.iterdir()):
|
||||||
|
if not p.is_file():
|
||||||
|
continue
|
||||||
|
if is_in_app_library(p) and not allow_app_libraries:
|
||||||
|
refused_lib += 1
|
||||||
|
continue
|
||||||
|
parts = parse_filename(p.name)
|
||||||
|
if parts is not None:
|
||||||
|
tagged_count += 1
|
||||||
|
if year and parts_year(parts) != year:
|
||||||
|
skipped_year += 1
|
||||||
|
continue
|
||||||
|
eligible.append((p, parts))
|
||||||
|
continue
|
||||||
|
n = norm_ws(p.name)
|
||||||
|
if ALREADY_RENAMED.match(n) or UNTAGGED_RENAMED.match(n):
|
||||||
skipped_already += 1
|
skipped_already += 1
|
||||||
|
continue
|
||||||
|
# Untagged candidate — defer until we know whether the folder qualifies
|
||||||
|
# as a screenshot dump.
|
||||||
|
if p.suffix.lower() in IMAGE_EXTS + VIDEO_EXTS + PDF_EXTS:
|
||||||
|
untagged_candidates.append(p)
|
||||||
else:
|
else:
|
||||||
skipped_other += 1
|
skipped_other += 1
|
||||||
|
|
||||||
|
if include_untagged:
|
||||||
|
if tagged_count >= untagged_threshold:
|
||||||
|
for p in untagged_candidates:
|
||||||
|
parts = synthesize_untagged_parts(p)
|
||||||
|
if parts is None:
|
||||||
|
skipped_other += 1
|
||||||
continue
|
continue
|
||||||
eligible.append(p)
|
if year and parts_year(parts) != year:
|
||||||
|
skipped_year += 1
|
||||||
|
continue
|
||||||
|
eligible.append((p, parts))
|
||||||
|
else:
|
||||||
|
print(
|
||||||
|
f"--include-untagged ignored: only {tagged_count} tagged file(s), "
|
||||||
|
f"need ≥{untagged_threshold} for the folder to qualify as a screenshot dump"
|
||||||
|
)
|
||||||
|
skipped_other += len(untagged_candidates)
|
||||||
|
else:
|
||||||
|
if untagged_candidates:
|
||||||
|
print(
|
||||||
|
f"hint: {len(untagged_candidates)} untagged image/video/PDF file(s) skipped; "
|
||||||
|
f"pass --include-untagged to include them (date from btime/mtime)"
|
||||||
|
)
|
||||||
|
skipped_other += len(untagged_candidates)
|
||||||
|
|
||||||
if not eligible:
|
if not eligible:
|
||||||
sys.exit(
|
sys.exit(
|
||||||
f"no eligible files in {src} "
|
f"no eligible files in {src} "
|
||||||
f"(skipped: {skipped_already} already-renamed, {skipped_other} other)"
|
f"(skipped: {skipped_already} already-renamed, "
|
||||||
|
f"{skipped_year} wrong-year, "
|
||||||
|
f"{skipped_other} other"
|
||||||
|
+ (f", {refused_lib} in app libraries" if refused_lib else "")
|
||||||
|
+ ")"
|
||||||
)
|
)
|
||||||
print(
|
summary = (
|
||||||
f"found {len(eligible)} eligible files "
|
f"found {len(eligible)} eligible files "
|
||||||
f"(skipped: {skipped_already} already-renamed, {skipped_other} other)"
|
f"(skipped: {skipped_already} already-renamed, "
|
||||||
|
f"{skipped_year} wrong-year, "
|
||||||
|
f"{skipped_other} other"
|
||||||
)
|
)
|
||||||
|
if refused_lib:
|
||||||
|
summary += f", {refused_lib} in app libraries"
|
||||||
|
summary += ")"
|
||||||
|
print(summary)
|
||||||
|
|
||||||
|
# Resize/extract for vision and write manifest.
|
||||||
manifest = WORK / "all.tsv"
|
manifest = WORK / "all.tsv"
|
||||||
with manifest.open("w") as out:
|
with manifest.open("w") as out:
|
||||||
for f in eligible:
|
for f, _parts in eligible:
|
||||||
base = f.stem
|
base = f.stem
|
||||||
ext = f.suffix.lower()
|
ext = f.suffix.lower()
|
||||||
if ext in (".mp4", ".mov"):
|
if ext in VIDEO_EXTS:
|
||||||
frame = FRAMES / f"{base}.jpg"
|
frame = FRAMES / f"{base}.jpg"
|
||||||
if not frame.exists():
|
if not frame.exists():
|
||||||
run(
|
run([
|
||||||
[
|
|
||||||
"ffmpeg", "-y", "-ss", "1", "-i", str(f),
|
"ffmpeg", "-y", "-ss", "1", "-i", str(f),
|
||||||
"-frames:v", "1", "-q:v", "3", str(frame),
|
"-frames:v", "1", "-q:v", "3", str(frame),
|
||||||
]
|
])
|
||||||
)
|
|
||||||
if not frame.exists():
|
if not frame.exists():
|
||||||
print(f"WARN ffmpeg failed: {f.name}", file=sys.stderr)
|
print(f"WARN ffmpeg failed: {f.name}", file=sys.stderr)
|
||||||
continue
|
continue
|
||||||
vision_src = frame
|
vision_src = frame
|
||||||
elif ext == ".pdf":
|
elif ext in PDF_EXTS:
|
||||||
frame = FRAMES / f"{base}.jpg"
|
frame = FRAMES / f"{base}.jpg"
|
||||||
if not frame.exists():
|
if not frame.exists():
|
||||||
run(["sips", "-s", "format", "jpeg", str(f), "--out", str(frame)])
|
run(["sips", "-s", "format", "jpeg", str(f), "--out", str(frame)])
|
||||||
@@ -175,7 +377,7 @@ def prep(src: Path, batch_size: int) -> None:
|
|||||||
print(f"WARN sips failed on pdf: {f.name}", file=sys.stderr)
|
print(f"WARN sips failed on pdf: {f.name}", file=sys.stderr)
|
||||||
continue
|
continue
|
||||||
vision_src = frame
|
vision_src = frame
|
||||||
elif ext in (".png", ".gif", ".jpg", ".jpeg", ".webp"):
|
elif ext in IMAGE_EXTS:
|
||||||
vision_src = f
|
vision_src = f
|
||||||
else:
|
else:
|
||||||
print(f"SKIP unknown ext: {f.name}", file=sys.stderr)
|
print(f"SKIP unknown ext: {f.name}", file=sys.stderr)
|
||||||
@@ -183,12 +385,10 @@ def prep(src: Path, batch_size: int) -> None:
|
|||||||
|
|
||||||
small = SMALL / f"{base}.jpg"
|
small = SMALL / f"{base}.jpg"
|
||||||
if not small.exists():
|
if not small.exists():
|
||||||
run(
|
run([
|
||||||
[
|
|
||||||
"sips", "-Z", "1568", "-s", "format", "jpeg",
|
"sips", "-Z", "1568", "-s", "format", "jpeg",
|
||||||
str(vision_src), "--out", str(small),
|
str(vision_src), "--out", str(small),
|
||||||
]
|
])
|
||||||
)
|
|
||||||
if not small.exists():
|
if not small.exists():
|
||||||
print(f"WARN resize failed: {f.name}", file=sys.stderr)
|
print(f"WARN resize failed: {f.name}", file=sys.stderr)
|
||||||
continue
|
continue
|
||||||
@@ -209,6 +409,19 @@ def prep(src: Path, batch_size: int) -> None:
|
|||||||
# ---------- plan ----------
|
# ---------- plan ----------
|
||||||
|
|
||||||
|
|
||||||
|
def _find_alt_extension(orig: str, existing: set[str]) -> str | None:
    """Haiku sometimes returns the resized .jpg extension instead of the
    real .png/.gif/.mp4. Try alt extensions of the same stem."""
    stem, dotext = os.path.splitext(orig)
    if not dotext:
        return None
    for alt in IMAGE_EXTS + VIDEO_EXTS + PDF_EXTS:
        cand = stem + alt
        if cand != orig and cand in existing:
            return cand
    return None
|
||||||
|
|
||||||
|
|
||||||
def plan(src: Path, max_words: int) -> None:
|
def plan(src: Path, max_words: int) -> None:
|
||||||
if not src.is_dir():
|
if not src.is_dir():
|
||||||
sys.exit(f"source not a directory: {src}")
|
sys.exit(f"source not a directory: {src}")
|
||||||
@@ -216,8 +429,6 @@ def plan(src: Path, max_words: int) -> None:
|
|||||||
if not descs_paths:
|
if not descs_paths:
|
||||||
sys.exit("no desc-full-*.tsv files found in /tmp/screenshot-rename")
|
sys.exit("no desc-full-*.tsv files found in /tmp/screenshot-rename")
|
||||||
|
|
||||||
# Map normalized-filename → AI description. Haiku may write the filename
|
|
||||||
# with or without U+202F; normalize on both sides.
|
|
||||||
descs = {}
|
descs = {}
|
||||||
bad_split = []
|
bad_split = []
|
||||||
for p in descs_paths:
|
for p in descs_paths:
|
||||||
@@ -237,15 +448,26 @@ def plan(src: Path, max_words: int) -> None:
|
|||||||
errors = list(bad_split)
|
errors = list(bad_split)
|
||||||
seen = {}
|
seen = {}
|
||||||
|
|
||||||
for actual in sorted(existing):
|
for orig in sorted(descs.keys()):
|
||||||
|
# Locate the actual file in src (may have an alt extension if Haiku
|
||||||
|
# echoed the resized .jpg).
|
||||||
|
if orig in existing:
|
||||||
|
actual = orig
|
||||||
|
else:
|
||||||
|
alt = _find_alt_extension(orig, existing)
|
||||||
|
if alt is None:
|
||||||
|
errors.append(f"src not found: {orig!r}")
|
||||||
|
continue
|
||||||
|
actual = alt
|
||||||
|
|
||||||
parts = parse_filename(actual)
|
parts = parse_filename(actual)
|
||||||
if parts is None:
|
if parts is None:
|
||||||
|
parts = synthesize_untagged_parts(src / actual)
|
||||||
|
if parts is None:
|
||||||
|
errors.append(f"can't parse: {actual!r}")
|
||||||
continue
|
continue
|
||||||
norm_name = norm_ws(actual)
|
|
||||||
desc = descs.get(norm_name)
|
desc = descs[orig]
|
||||||
if not desc:
|
|
||||||
errors.append(f"no desc for: {actual!r}")
|
|
||||||
continue
|
|
||||||
try:
|
try:
|
||||||
new = build_new_name(parts, desc, max_words)
|
new = build_new_name(parts, desc, max_words)
|
||||||
except ValueError as e:
|
except ValueError as e:
|
||||||
@@ -258,7 +480,9 @@ def plan(src: Path, max_words: int) -> None:
|
|||||||
errors.append(f"target exists in DEST: {new!r}")
|
errors.append(f"target exists in DEST: {new!r}")
|
||||||
continue
|
continue
|
||||||
if new in seen:
|
if new in seen:
|
||||||
errors.append(f"plan collision: {new!r} from {actual!r} and {seen[new]!r}")
|
errors.append(
|
||||||
|
f"plan collision: {new!r} from {actual!r} and {seen[new]!r}"
|
||||||
|
)
|
||||||
continue
|
continue
|
||||||
seen[new] = actual
|
seen[new] = actual
|
||||||
plan_rows.append((actual, new))
|
plan_rows.append((actual, new))
|
||||||
@@ -277,8 +501,8 @@ def plan(src: Path, max_words: int) -> None:
|
|||||||
f.write(f"{orig}\t{new}\n")
|
f.write(f"{orig}\t{new}\n")
|
||||||
print(f"\nplan saved: {plan_path}")
|
print(f"\nplan saved: {plan_path}")
|
||||||
if plan_rows:
|
if plan_rows:
|
||||||
print(f"sample (every {max(1, len(plan_rows)//6)}th row):")
|
|
||||||
step = max(1, len(plan_rows) // 6)
|
step = max(1, len(plan_rows) // 6)
|
||||||
|
print(f"sample (every {step}th row):")
|
||||||
for i in range(0, len(plan_rows), step):
|
for i in range(0, len(plan_rows), step):
|
||||||
orig, new = plan_rows[i]
|
orig, new = plan_rows[i]
|
||||||
print(f" {orig}\n → {new}\n")
|
print(f" {orig}\n → {new}\n")
|
||||||
@@ -350,6 +574,30 @@ def main() -> None:
|
|||||||
p_prep = sub.add_parser("prep", help="extract frames, resize, build batches")
|
p_prep = sub.add_parser("prep", help="extract frames, resize, build batches")
|
||||||
p_prep.add_argument("--src", type=Path, required=True)
|
p_prep.add_argument("--src", type=Path, required=True)
|
||||||
p_prep.add_argument("--batch-size", type=int, default=19)
|
p_prep.add_argument("--batch-size", type=int, default=19)
|
||||||
|
    p_prep.add_argument(
        "--year",
        type=str,
        default=None,
        help="restrict to YYYY (matches embedded ts or btime)",
    )
    p_prep.add_argument(
        "--include-untagged",
        action="store_true",
        help="include image files that lack a CleanShot/Screenshot prefix; "
             "requires the folder to have ≥10 tagged files (configurable)",
    )
    p_prep.add_argument(
        "--untagged-threshold",
        type=int,
        default=10,
        help="minimum tagged-file count for a folder to be treated as a "
             "screenshot dump (default 10)",
    )
    p_prep.add_argument(
        "--allow-app-libraries",
        action="store_true",
        help="bypass the .photoslibrary / .lrlibrary etc. guard (DANGEROUS)",
    )
|
||||||
|
|
||||||
p_plan = sub.add_parser("plan", help="build & validate rename plan")
|
p_plan = sub.add_parser("plan", help="build & validate rename plan")
|
||||||
p_plan.add_argument("--src", type=Path, required=True)
|
p_plan.add_argument("--src", type=Path, required=True)
|
||||||
@@ -360,7 +608,14 @@ def main() -> None:
|
|||||||
|
|
||||||
args = p.parse_args()
|
args = p.parse_args()
|
||||||
if args.cmd == "prep":
|
if args.cmd == "prep":
|
||||||
prep(args.src, args.batch_size)
|
prep(
|
||||||
|
args.src,
|
||||||
|
args.batch_size,
|
||||||
|
year=args.year,
|
||||||
|
include_untagged=args.include_untagged,
|
||||||
|
allow_app_libraries=args.allow_app_libraries,
|
||||||
|
untagged_threshold=args.untagged_threshold,
|
||||||
|
)
|
||||||
elif args.cmd == "plan":
|
elif args.cmd == "plan":
|
||||||
plan(args.src, args.max_words)
|
plan(args.src, args.max_words)
|
||||||
elif args.cmd == "execute":
|
elif args.cmd == "execute":
|
||||||
|
|||||||