chatgpt-thumbnail-spec_en.html
Filename convention note: this document uses the package standard of lowercase, hyphen-separated filenames with the language suffix _en.html.
This document defines the production standard for generating YouTube thumbnails from source images using ChatGPT or similar image-generation workflows. It exists to eliminate repeated manual post-processing, cropping, letterboxing, compositing failures, and instruction-text artifacts.
The governing rule is simple: source art is sacred. A thumbnail workflow succeeds only when it preserves all original visible content and expands to a production-ready 16:9 deliverable without requiring any further editing by the operator.
Each source image must produce one separate final thumbnail.
The default operation is horizontal outpainting, not cropping, not zooming, and not scene replacement.
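The arithmetic behind horizontal outpainting can be sketched as follows. This is a minimal illustration, not part of the standard itself: the function name `outpaint_plan`, the equal left/right split, and the choice to reject sources wider than 16:9 are all assumptions made for the example. It scales the source so its full height fills a 1080-pixel canvas, then reports how many pixels of new content are needed on each side to reach 1920.

```python
from fractions import Fraction

# Target canvas taken from the spec's output requirements.
TARGET_W, TARGET_H = 1920, 1080


def outpaint_plan(src_w: int, src_h: int) -> dict:
    """Hypothetical helper: plan a horizontal-only extension to 1920x1080.

    Assumes the extension is split evenly between left and right; the spec
    does not mandate an even split.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")

    # Scale uniformly so the entire source height fills the canvas height.
    # Nothing is cropped: the full source remains inside the canvas.
    scale = Fraction(TARGET_H, src_h)
    scaled_w = round(src_w * scale)

    # Assumption: a source wider than 16:9 cannot reach the target by
    # horizontal outpainting alone, so it is flagged instead of cropped.
    if scaled_w > TARGET_W:
        raise ValueError("source is wider than 16:9; needs operator review")

    extra = TARGET_W - scaled_w
    left = extra // 2
    right = extra - left  # right side absorbs any odd pixel
    return {"scaled_width": scaled_w, "left": left, "right": right}
```

For a square 1024x1024 source, this plan scales the art to 1080x1080 and calls for 420 pixels of new content on each side.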
Any result meeting one or more of the following conditions is an automatic failure:
Everything visible in the source image must remain fully visible in the final image. Important elements near the top or side edges require special protection.
Aggressive mode is permitted and often preferred when the operator wants more than neutral filler. In this mode, new content should meaningfully extend the narrative world on the left and right sides.
The following instruction block is the canonical production prompt:
You are performing production thumbnail outpainting.

Task:
Create ONE separate final image for EACH uploaded source image.

Output requirements:
- final size: 1920x1080
- aspect ratio: exact 16:9
- format: PNG
- deliver as separate files in a ZIP
- each output must be immediately usable as a YouTube thumbnail

Transformation rules:
- preserve 100% of the original uploaded image
- do not crop any content visible in the original
- do not combine source images
- do not create a composite, collage, triptych, comparison board, or mockup
- do not render instructions as visible text
- do not add borders or empty space
- do not shrink the original within a larger frame

Required method:
- use horizontal outpainting only
- keep the original composition intact
- extend the canvas left and right with new AI-generated content
- maintain the same art style, lighting, palette, perspective, density, mood, symbolism, and narrative logic
- all newly added content must feel like a natural continuation of the original scene

Composition protection:
- all original visible content must remain fully visible
- protect important top-edge and side-edge details
- do not clip symbols, faces, signs, crowns, pyramids, eyes, banners, or key props
- preserve the original vertical framing unless explicitly told otherwise

Mode:
- aggressive expansion
- add meaningful, scene-consistent information on the left and right sides
- avoid generic filler
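The deliverable described in the prompt (separate PNG files in a ZIP) can be inspected without opening an image editor. The sketch below is one possible approach using only the Python standard library; the function names `png_size` and `audit_zip` are hypothetical. It reads each archive member's PNG header and returns its pixel dimensions, raising an error for any member that is not a PNG.

```python
import io
import struct
import zipfile

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk at the start of a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def audit_zip(zip_bytes: bytes) -> dict[str, tuple[int, int]]:
    """Hypothetical helper: map each file in the ZIP to its PNG dimensions."""
    sizes = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries
            sizes[info.filename] = png_size(archive.read(info))
    return sizes
```

Comparing the number of entries returned by `audit_zip` with the number of uploaded sources gives a quick check that each source produced its own file.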
| Test | Pass condition |
|---|---|
| Separate file test | One final image per source. |
| Aspect ratio test | Exactly 16:9. |
| Resolution test | At least 1920×1080. |
| Full preservation test | Everything visible in the original remains visible. |
| No composite test | Only one artwork appears in each result. |
| No border test | No bars, margins, or decorative framing. |
| No text artifact test | No rendered prompt or instruction language. |
| Continuity test | New side content matches the original world convincingly. |
| Immediate upload test | The image can be uploaded directly to YouTube with no further editing. |
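Three of the tests above (separate file, aspect ratio, resolution) are purely numeric and could be automated. The following is a minimal sketch under that assumption; `mechanical_checks` is a hypothetical name, and the remaining tests (full preservation, no composite, no border, no text artifact, continuity, immediate upload) still depend on visual review and are not covered here.

```python
def mechanical_checks(n_sources: int, output_sizes: list[tuple[int, int]]) -> dict[str, bool]:
    """Hypothetical helper: run the numeric acceptance tests.

    n_sources    -- number of source images that were uploaded
    output_sizes -- (width, height) in pixels of each delivered thumbnail
    """
    return {
        # Separate file test: one final image per source.
        "separate_file": len(output_sizes) == n_sources,
        # Aspect ratio test: exactly 16:9, checked with integer arithmetic
        # to avoid floating-point rounding.
        "aspect_ratio": all(w * 9 == h * 16 for w, h in output_sizes),
        # Resolution test: at least 1920x1080.
        "resolution": all(w >= 1920 and h >= 1080 for w, h in output_sizes),
    }


if __name__ == "__main__":
    # Two sources, two outputs, both exact 16:9 at or above the minimum.
    print(mechanical_checks(2, [(1920, 1080), (3840, 2160)]))
```

A batch passes these numeric tests only when every value in the returned dictionary is `True`; any `False` entry identifies which row of the table failed.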
Never crop source art to fit 16:9. Preserve the original completely and outpaint horizontally until the image becomes a full production-ready thumbnail.
This rule should remain stable across future tools, sessions, automations, and human handoffs. Any workflow that requires routine manual re-processing has failed the production standard.