Dataset Preparation Guide
The quality of your training dataset directly determines the quality of your AI-generated creatives. This guide will help you prepare optimal datasets to teach the model about your brand, products, and creative style.
AI can produce impressive results, but only from good input. Time spent curating the dataset pays off directly in your generated creatives.
What Makes a Good Dataset
When selecting images for your dataset, apply this simple test: "Would I consider this a good image if it were generated by AI?" If the answer is yes, include it.
- High Resolution Images: Include images of at least 1000px per dimension (e.g., 1920×1080, 1000×1000, 2000×2000). Higher resolution provides more detail for the model to learn from.
- Sharp, Clear Images: Avoid blurry or pixelated images unless blur is intentionally part of your brand's style (e.g., motion blur for sports or action shots).
- Diverse Perspectives: Capture your product from various angles and in various environments and contexts:
  - Different viewing angles (front, side, three-quarter, etc.)
  - Various environments (indoor, outdoor, urban, nature)
  - Different lighting conditions (daylight, evening, studio lighting)
  - Range of colors if applicable
  - Various use cases (static display, in action, being used)
- Product Consistency: Ensure all images within a category represent the same product or a consistent style. For example, if training on "Ford Everest 2025," don't mix in other vehicle models or previous year versions.
- Sufficient Quantity: Aim for at least 100 images per category. More diverse images lead to better results and greater creative flexibility.
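The two numeric criteria above (at least 1000px per dimension, at least 100 images per category) are easy to check before uploading. The following is a minimal sketch, not part of any official tooling: the `ImageInfo` record and both function names are hypothetical, and the width and height values are assumed to have been read beforehand with whatever imaging library you already use.

```python
from collections import Counter
from dataclasses import dataclass

MIN_SIDE_PX = 1000             # guide: at least 1000px per dimension
MIN_IMAGES_PER_CATEGORY = 100  # guide: at least 100 images per category


@dataclass(frozen=True)
class ImageInfo:
    """Hypothetical record describing one candidate image."""
    path: str
    category: str
    width: int
    height: int


def low_resolution(images, min_side=MIN_SIDE_PX):
    """Return the images whose shorter side is below min_side."""
    return [img for img in images if min(img.width, img.height) < min_side]


def undersized_categories(images, min_count=MIN_IMAGES_PER_CATEGORY):
    """Return {category: count} for categories with fewer than min_count images."""
    counts = Counter(img.category for img in images)
    return {cat: n for cat, n in counts.items() if n < min_count}


images = [
    ImageInfo("everest_front.jpg", "Ford Everest", 1920, 1080),
    ImageInfo("everest_thumb.jpg", "Ford Everest", 640, 480),
]
print([img.path for img in low_resolution(images)])  # ['everest_thumb.jpg']
print(undersized_categories(images))                 # {'Ford Everest': 2}
```

Sharpness, diversity, and product consistency still need a human review; this sketch only covers the criteria that can be expressed as numbers.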
Using Categories Effectively
Create separate categories to organize different aspects of your brand:
- Product Categories: Separate different product lines or models (e.g., "Ford Everest," "Ford Raptor").
- Style Categories: Group images that represent a consistent visual style or campaign aesthetic.
- Environment Categories: Collect images showing specific settings or contexts.
This organization allows you to reference specific categories in your prompts, combining elements from different categories in your generated creatives.
General Category Strategy
Maintain a "General" category as a flexible container for:
- Brand identity elements
- Overall style references
- Mixed contexts that don't fit neatly elsewhere
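If you stage images locally before uploading, one folder per category keeps the structure described above easy to audit. The sketch below assumes a purely hypothetical layout (a root folder whose immediate subfolders are named after categories, such as "Ford Everest" or "General"); the folder convention, the suffix list, and the function name `images_by_category` are illustrative assumptions, not a documented upload format.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def images_by_category(root):
    """Map each immediate subfolder of root (one per category) to its image files."""
    root = Path(root)
    result = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        result[folder.name] = sorted(
            f for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return result
```

Calling `images_by_category("dataset")` on such a folder would return a dictionary keyed by category name, which makes it straightforward to count images per category or spot a category that is still empty.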
Examples of Effective Datasets
The examples below illustrate the quality and diversity we recommend. They are small samples only; training an effective model requires more images per category, as recommended above.
Ford AI Model Dataset
Category 1: Ford Everest
Category 2: Ford Raptor
Why this works: High-resolution images showing a consistent product in each category from multiple angles, in different environments and lighting conditions, with various colors represented.
Pepsi AI Model Dataset
Category: Pepsi Zero Sugar
Why this works: Consistent product representation with various usage contexts and compositions.
Examples of Problematic Datasets
Pepsi
Firefighters
Problems:
- Low-resolution images (less than 1000px in either dimension)
- Blurry or poorly focused shots
- Mixed product models in the same category
- Duplicate or nearly identical images
- Watermarks
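Of the problems listed above, exact duplicates are the simplest to catch automatically. The following is a minimal sketch using only the Python standard library; the function name `find_exact_duplicates` is hypothetical. It groups files with byte-identical content, so it will not catch near-duplicates (resized, re-encoded, or slightly cropped copies), blur, or watermarks, which still need a visual review or a dedicated image-comparison tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(paths):
    """Group files whose bytes are identical; return only groups of two or more."""
    by_digest = defaultdict(list)
    for path in paths:
        path = Path(path)
        # Hash the raw file contents; identical files produce identical digests.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_digest[digest].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

Each returned group lists the copies of one image in the order they were passed in, so you can keep the first and remove the rest.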