Pidgeot: AI-Powered Ad Image Compliance Validation System
By Alara Dasdan, Software Engineering Intern, ZoomInfo Engineering, August 13, 2024

Problem Statement
In the fast-paced world of digital advertising, ensuring that visual assets align with publisher policies is critical for brand safety. Demand-Side Platforms (DSPs) and ad exchanges enforce strict guidelines, making it challenging to manage the vast volumes of images advertisers submit. ZoomInfo Marketing’s clients need to know immediately if their advertisements (ads) are flagged and rejected by ad exchanges. Currently, our process relies heavily on manual review, which creates a bottleneck as our customer base grows.
The primary issue is the need for a scalable solution to moderate visual assets. The current manual review process, while thorough, is labor-intensive and takes time, making it difficult to handle a high volume of images efficiently. Publishers will block ads that may threaten brand image, and the delay between uploading ad images and receiving feedback—especially if a potential ad is rejected—could significantly lower the quality of user experience. The solution must not only speed up the review process but also ensure consistency in evaluating images according to DSP and ad exchange policies. The functional requirements for this project are to create an automated, scalable solution, reduce the review time to a few seconds, and standardize ad image review results to improve the overall user experience.
Solution: Introducing “Pidgeot”
To address these challenges, we developed Pidgeot, a service that uses an AI vision model to automatically evaluate images for ad campaigns. The system ensures that assets comply with DSP policies by classifying images that violate guidelines into specific categories, allowing for quick filtering and feedback to customers.
The system architecture comprises the following primary components:
- Campaign Orchestration API: Provides instructions to the system, stores images uploaded by customers in Amazon S3, stores validation results in a Tower database, and creates the necessary ad campaigns.
- Validation Queue: Ads are placed in a queue for validation after campaign creation.
- Vision Model: Regularly evaluates images from the validation queue.
Several AI vision models were evaluated to determine the best fit for this task:
- AWS Rekognition
- Anthropic Vision
- YOLO V8
- MidJourney Describe
- GPT-4 Vision
Testing primarily focused on stress testing and prompt engineering, evaluating both the customizability of the input prompts and the models' accuracy in "gray areas" such as small or concealed flaggable items within images. AWS Rekognition and YOLO V8 showed lower accuracy and less prompt customizability for this task in our setting. Anthropic Vision was ruled out because the task requires evaluating NSFW content that violates DSP policy, and Anthropic's terms of use prohibit processing such content. GPT-4 Vision was ultimately chosen for its higher accuracy and flexible input prompt options.
We tested GPT-4 Vision with up to 100 ad images, producing benchmarks and iteratively improving the input prompt. Benchmark testing indicated an average evaluation time of 3-5 seconds per image. To enhance scalability and keep the UI responsive, we adopted an asynchronous approach: users continue working with a responsive interface while GPT-4 processes the images. This also removes dependencies between services, so the API continues handling other requests concurrently even if image moderation goes offline. For added resilience, the task runner retries failed batches, and images the model cannot process (or rejects) are flagged for manual review, providing an additional safeguard against false positives. The model also runs in "low res" mode, which sends a 512 x 512 pixel version of each image represented with a budget of 85 tokens; this returns faster responses and consumes fewer input tokens for use cases that do not need high detail.
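The retry-and-fallback behavior described above can be sketched as follows. The function names, batch shape, and backoff policy are illustrative stand-ins for the production task runner, not its actual API:

```python
import time

MAX_RETRIES = 3

def validate_batch(batch, evaluate, flag_for_manual_review, backoff_seconds=1.0):
    """Run a batch of images through the vision model, retrying on failure.

    `evaluate` is a callable that scores one image and may raise;
    `flag_for_manual_review` receives images the model could not process.
    Both are hypothetical stand-ins for the real service calls.
    """
    results = {}
    for image in batch:
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                results[image] = evaluate(image)
                break
            except Exception:
                if attempt == MAX_RETRIES:
                    # The model could not process the image: route it to a
                    # human reviewer instead of auto-rejecting, which guards
                    # against false positives.
                    flag_for_manual_review(image)
                else:
                    # Simple linear backoff before the next attempt.
                    time.sleep(backoff_seconds * attempt)
    return results
```

The key design point is that a hard model failure never silently drops or rejects an ad; it always lands in the manual-review path.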
Architecture
The moderation criteria that the OpenAI client is prompted to flag in an image include:
- Offensive, harmful, or violent content
- Politically sensitive material
- Religious or dogmatic content
- Controversial, polarizing content
- Hate speech and discriminating content
- Explicit or suggestive content
- Controlled substances
- Words featured in the flagged word list
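A system prompt enumerating these categories might be assembled like this. The exact wording is illustrative (the production prompt went through iterative prompt engineering), and `flagged_words` is a hypothetical parameter for the flagged word list:

```python
MODERATION_CATEGORIES = [
    "Offensive, harmful, or violent content",
    "Politically sensitive material",
    "Religious or dogmatic content",
    "Controversial, polarizing content",
    "Hate speech and discriminating content",
    "Explicit or suggestive content",
    "Controlled substances",
    "Words featured in the flagged word list",
]

def build_moderation_prompt(categories, flagged_words):
    """Build a numbered system prompt from the moderation categories."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(categories, start=1))
    return (
        "You are an ad-image compliance reviewer. Flag the image if it "
        "contains any of the following:\n"
        f"{numbered}\n"
        f"Flagged word list: {', '.join(flagged_words)}\n"
        "Respond with JSON containing: flagged, categories, "
        "categories numerical, image description, reasoning, "
        "violations, confidence."
    )
```

Numbering the categories in the prompt is what lets the model return the matching integer list in its output.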
The GPT-4 model processes an ad image provided by the Ad database and outputs the validation results as a JSON object that includes:
- Flagged: A boolean value indicating if the image is flagged as inappropriate.
- Categories: A plaintext list of the violated categories.
- Categories numerical: An integer list corresponding to the violated categories.
- Image description: A plaintext description of the image interpreted by the model.
- Reasoning: A plaintext description of why the image is flagged or not.
- Violations: A plaintext description of what exactly in the image violates the moderation criteria if the image has been flagged.
- Confidence: A float value confidence score (0-100%) of the model’s evaluation.
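On the service side, this output can be deserialized into a typed record before it is written to the validation database. The class below is a minimal sketch assuming the field names documented above; the snake_case mapping is our own convention, not part of the model output:

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    flagged: bool
    categories: list
    categories_numerical: list
    image_description: str
    reasoning: str
    violations: str
    confidence: float  # 0-100

    @classmethod
    def from_json(cls, payload: dict) -> "ValidationResult":
        # Output keys containing spaces are mapped to snake_case fields;
        # defaults cover fields the model may omit.
        return cls(
            flagged=payload["flagged"],
            categories=payload.get("categories", []),
            categories_numerical=payload.get("categories numerical", []),
            image_description=payload.get("image description", ""),
            reasoning=payload.get("reasoning", ""),
            violations=payload.get("violations", "None"),
            confidence=float(payload.get("confidence", 0.0)),
        )
```

Validating the payload at this boundary means a malformed model response fails loudly here rather than corrupting the downstream database.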
Unlike pre-existing validation methods, this model does not check for URL validity, dimensions, or image type, as ad images are sourced from a pre-validated database. The model’s validation runs every five minutes, processing batches of 50 images.
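The five-minute, 50-image cadence might look like the following sketch; the in-memory list standing in for the validation queue and the `process_batch` callable are illustrative assumptions:

```python
import time

BATCH_SIZE = 50
POLL_INTERVAL_SECONDS = 300  # five minutes

def take_batch(queue, batch_size=BATCH_SIZE):
    """Pop up to `batch_size` images from the front of the validation queue."""
    batch, queue[:] = queue[:batch_size], queue[batch_size:]
    return batch

def run_validation_loop(queue, process_batch):
    """Poll the queue on a fixed interval and hand each batch to the model."""
    while True:
        batch = take_batch(queue)
        if batch:
            process_batch(batch)
        time.sleep(POLL_INTERVAL_SECONDS)
```

A fixed batch size caps per-cycle API spend and keeps each run's latency predictable, at the cost of a worst-case queueing delay of one poll interval.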
The Ad Image Validation database is populated and updated with ads from the existing Ad database, carrying the validation attributes listed above. The category attributes are populated with a confidence score when the vision model associates the ad image with a given moderation criterion or approves it for use.
Here’s an example of the JSON output generated when GPT-4 Vision evaluates an ad for approval:
{
  "flagged": false,
  "categories": [],
  "categories numerical": [],
  "image description": "The image is a vertical banner advertisement for ZoomInfo, with text that reads 'Get a FREE Customized Prospecting List Now!' and a button labeled 'Start Here'. The background is dark with gradient shades of purple and blue, and features abstract circular shapes and lines.",
  "reasoning": "The image is a promotional banner advertising a service without containing any offensive or prohibited content.",
  "violations": "None",
  "confidence": 100.0
}
Summary and Recommendations for the Future
Implementing GPT-4 Vision for ad image moderation has significantly enhanced the speed and accuracy of image evaluations. This solution addresses the scalability challenges of manual review, enabling our clients to better protect their brand image while adhering to DSP and ad exchange guidelines.
For future updates, it is crucial to continuously test GPT-4 Vision's effectiveness and fine-tune the moderation parameters and manual review policy based on initial feedback and performance metrics. This may include more robust handling of false negatives and false positives. Additionally, the service can be expanded to support DHTML and video ads. By adopting these recommendations, we can further refine our ad image moderation strategies, enhancing brand safety and operational efficiency in the ever-evolving digital advertising landscape.