Prompt-to-Prompt Image Editing with Cross-Attention Control

Text-based image synthesis models are appealing because users can describe their intent verbally. Editing with these models is challenging, however: an editing technique should by nature preserve most of the original image, yet in text-based models even a small modification of the text prompt often leads to a completely different outcome. One way to preserve the original is to provide a spatial mask that localizes the edit, but this ignores the original structure and content within the masked region.

The authors present a method for editing images that does not require a mask, and demonstrate how it can be used to edit an image by replacing or adding words in the text prompt.

