
Dataset Preparation Guide

The quality of your training dataset directly determines the quality of your AI-generated creatives. This guide will help you prepare optimal datasets to teach the model about your brand, products, and creative style.

AI can produce impressive results, but only from strong source material. Invest time in curating an excellent dataset, and the benefit will show in your generated creatives.

What Makes a Good Dataset

When selecting images for your dataset, apply this simple test: "Would I consider this a good image if it were generated by AI?" If the answer is yes, include it.

  • High Resolution Images: Include images of at least 1000px per dimension (e.g., 1920×1080, 1000×1000, 2000×2000). Higher resolution provides more detail for the model to learn from.

  • Sharp, Clear Images: Avoid blurry or pixelated images unless blur is intentionally part of your brand's style (e.g., motion blur for sports or action shots).

  • Diverse Perspectives: Capture your product from various angles and in different environments and contexts:

    • Different viewing angles (front, side, three-quarter, etc.)
    • Various environments (indoor, outdoor, urban, nature)
    • Different lighting conditions (daylight, evening, studio lighting)
    • Range of colors if applicable
    • Various use cases (static display, in action, being used)
  • Product Consistency: Ensure all images within a category represent the same product or a consistent style. For example, if training on "Ford Everest 2025," don't mix in other vehicle models or earlier model years.

  • Sufficient Quantity: Aim for at least 100 images per category. More diverse images lead to better results and greater creative flexibility.
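
The resolution and quantity guidelines above can be checked mechanically before you upload anything. The following is a minimal sketch, not part of any official tooling. It assumes you have already collected each image's pixel dimensions into a dictionary (for example with Pillow's `Image.open(path).size`). The names `MIN_SIDE_PX`, `MIN_IMAGES`, and `audit_category` are hypothetical.

```python
from typing import Dict, List, Tuple

MIN_SIDE_PX = 1000   # guideline: at least 1000px per dimension
MIN_IMAGES = 100     # guideline: at least 100 images per category


def audit_category(sizes: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return human-readable warnings for one category.

    `sizes` maps a file name to its (width, height) in pixels.
    """
    warnings = []
    # Flag every image whose shorter side falls below the guideline.
    for name, (width, height) in sorted(sizes.items()):
        if min(width, height) < MIN_SIDE_PX:
            warnings.append(f"{name}: {width}x{height} is below {MIN_SIDE_PX}px")
    # Flag the category as a whole if it has too few images.
    if len(sizes) < MIN_IMAGES:
        warnings.append(f"only {len(sizes)} images; aim for at least {MIN_IMAGES}")
    return warnings


if __name__ == "__main__":
    sample = {"front.jpg": (1920, 1080), "thumb.jpg": (640, 480)}
    for line in audit_category(sample):
        print(line)
```

Sharpness, diversity, and product consistency are harder to automate and still need a manual review.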

Using Categories Effectively

Create separate categories to organize different aspects of your brand:

  • Product Categories: Separate different product lines or models (e.g., "Ford Everest," "Ford Raptor").
  • Style Categories: Group images that represent a consistent visual style or campaign aesthetic.
  • Environment Categories: Collect images showing specific settings or contexts.

This organization allows you to reference specific categories in your prompts, combining elements from different categories in your generated creatives.

General Category Strategy

Maintain a "General" category as a flexible container for:

  • Brand identity elements
  • Overall style references
  • Mixed contexts that don't fit neatly elsewhere
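
If you stage your images locally before uploading, one folder per category makes it easy to see which categories are still thin. The sketch below assumes that layout (for example `dataset/Ford Everest/`, `dataset/General/`). The folder convention, the `IMAGE_SUFFIXES` set, and the function name `count_images_per_category` are illustrative assumptions, not requirements of the platform.

```python
from pathlib import Path
from typing import Dict

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}  # assumed file types


def count_images_per_category(dataset_root: Path) -> Dict[str, int]:
    """Map each immediate subfolder name to the number of image files in it."""
    counts = {}
    for folder in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        # Count only files with a recognized image extension (case-insensitive).
        counts[folder.name] = sum(
            1 for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    root = Path("dataset")  # hypothetical local staging folder
    if root.is_dir():
        for category, n in count_images_per_category(root).items():
            print(f"{category}: {n} images")
```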

Examples of Effective Datasets

The examples below illustrate the quality and diversity we recommend. They are only a small sample: an effective model needs many more images per category, as recommended above.

Ford AI Model Dataset

Category 1: Ford Everest

Ford Everest example 1
Ford Everest example 2
Ford Everest example 3
Ford Everest example 4
Ford Everest example 5
Ford Everest example 6
Ford Everest example 7
Ford Everest example 8

Category 2: Ford Raptor

Ford Raptor example 3
Ford Raptor example 5
Ford Raptor example 6
Ford Raptor example 7
Ford Raptor example 8
Ford Raptor example 9
Ford Raptor example 10
Ford Raptor example 11

Why this works: High-resolution images showing a consistent product in each category from multiple angles, in different environments and lighting conditions, with various colors represented.

Pepsi AI Model Dataset

Category: Pepsi Zero Sugar

Pepsi example 1
Pepsi example 2
Pepsi example 10
Pepsi example 11
Pepsi example 12
Pepsi example 13
Pepsi example 14

Why this works: Consistent product representation with various usage contexts and compositions.

Examples of Problematic Datasets

Pepsi

Pepsi bad example 3
Pepsi bad example 3
Pepsi bad example 3
Pepsi bad example 4

Firefighters

Firefighters bad example 1
Firefighters bad example 2
Firefighters bad example 3
Firefighters bad example 4

Problems:

  • Low-resolution images (under 1000px in either dimension)
  • Blurry or poorly focused shots
  • Mixed product models in the same category
  • Duplicate or nearly identical images
  • Watermarks
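
Of the problems listed, exact duplicates are the easiest to catch automatically. The sketch below groups files whose bytes are identical by comparing SHA-256 digests. It is a minimal illustration with a hypothetical function name, `find_exact_duplicates`. It will not catch nearly identical images (resized, re-encoded, or slightly cropped copies), which need perceptual comparison or a manual pass.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_exact_duplicates(folder: Path) -> List[List[str]]:
    """Group files in `folder` whose bytes are identical."""
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(folder.iterdir()):
        if path.is_file():
            # Identical content always produces the same digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    # Keep only digests shared by more than one file.
    return [names for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    category = Path("dataset/Pepsi Zero Sugar")  # hypothetical folder
    if category.is_dir():
        for group in find_exact_duplicates(category):
            print("duplicates:", ", ".join(group))
```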