Adaptive prompt elicitation for text-to-image generation
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor | Aalto-yliopisto | fi |
| dc.contributor | Aalto University | en |
| dc.contributor.advisor | Hegemann, Lena | |
| dc.contributor.author | Wen, Xinyi | |
| dc.contributor.school | Perustieteiden korkeakoulu | fi |
| dc.contributor.school | School of Science | en |
| dc.contributor.supervisor | Oulasvirta, Antti | |
| dc.date.accessioned | 2025-10-21T17:02:24Z | |
| dc.date.available | 2025-10-21T17:02:24Z | |
| dc.date.issued | 2025-09-29 | |
| dc.description.abstract | Text-to-image generative models have rapidly advanced in producing high-quality and diverse visual content. However, aligning generated outputs with user intent remains challenging, particularly when users have a specific vision but struggle to articulate it precisely in natural language, leading to tedious trial-and-error refinement. This thesis presents Adaptive Prompt Elicitation, a principled method designed to bridge the gap between user intent and effective prompts for text-to-image generation. Building on an information-theoretic foundation, the approach contributes two key innovations: (1) a prompt elicitation framework that interactively elicits user intent through informative queries and automatically generates optimized prompts; (2) an efficient query selection strategy that maximizes expected alignment gain based on Bayesian experimental design. We evaluate the method on two established benchmarks for complex text-to-image generation—IDEA-Bench and DesignBench—comparing it against automatic prompt optimization, in-context query generation, and non-optimized baselines. Results demonstrate that Adaptive Prompt Elicitation achieves superior effectiveness and efficiency in aligning generated images with target intent while maintaining consistent performance across diverse scenarios. Overall, this work contributes a systematic methodology for interactive prompt optimization, advancing the usability, interpretability, and reliability of text-to-image generation systems. | en |
| dc.format.extent | 62 | |
| dc.format.mimetype | application/pdf | en |
| dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/140253 | |
| dc.identifier.urn | URN:NBN:fi:aalto-202510218421 | |
| dc.language.iso | en | en |
| dc.programme | Master's Programme in Computer, Communication and Information Sciences | en |
| dc.programme.major | Machine Learning, Data Science and Artificial Intelligence | en |
| dc.subject.keyword | text-to-image generation | en |
| dc.subject.keyword | human-AI alignment | en |
| dc.subject.keyword | Bayesian experimental design | en |
| dc.subject.keyword | preference elicitation | en |
| dc.subject.keyword | prompt optimization | en |
| dc.subject.keyword | interactive machine learning | en |
| dc.title | Adaptive prompt elicitation for text-to-image generation | en |
| dc.type | G2 Pro gradu, diplomityö | fi |
| dc.type.ontasot | Master's thesis | en |
| dc.type.ontasot | Diplomityö | fi |
| local.aalto.electroniconly | yes | |
| local.aalto.openaccess | no | |