Apple made an AI image tool that lets you make edits by describing them
Apple unveils MGIE, a text-based image editing model that lets users describe desired changes in natural language, making photo manipulation intuitive without traditional editing software.
Apple researchers, in collaboration with the University of California, Santa Barbara, have developed a novel model named MGIE (MLLM-Guided Image Editing) for text-based image manipulation. This model eliminates the need for traditional photo editing software by allowing users to describe their desired changes in natural language.
MGIE uses multimodal large language models (MLLMs) to interpret user prompts and translate them into concrete editing instructions. This opens up a wide range of edits, from basic tasks like cropping and resizing to more involved modifications like reshaping objects or adjusting brightness levels.
The researchers describe two key functions of MGIE: prompt interpretation and edit visualization. First, the model deciphers the user's intended edit, resolving ambiguity in the prompt and translating it into an explicit, actionable instruction. It then uses that instruction to produce the anticipated visual result. For example, a request to "make the sky bluer" becomes an instruction to increase the saturation of the sky region in the image.
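As a rough illustration of that two-stage idea, here is a minimal Python sketch: a stand-in for the MLLM step expands a terse prompt into an explicit instruction, which then drives a concrete image operation. This is not Apple's implementation; the function names, the instruction format, and the hard-coded prompt mapping are all hypothetical, and the edit is applied globally rather than to a segmented sky region.

```python
from PIL import Image, ImageEnhance

def derive_instruction(prompt: str) -> dict:
    """Stand-in for the MLLM step: map a vague prompt to an explicit edit.

    A real MLLM would infer this; here one example mapping is hard-coded.
    """
    if "sky bluer" in prompt:
        return {"operation": "saturation", "region": "sky", "factor": 1.4}
    raise ValueError(f"No mapping for prompt: {prompt!r}")

def apply_edit(image: Image.Image, edit: dict) -> Image.Image:
    """Stand-in for the editing step: execute the explicit instruction.

    For simplicity this boosts saturation across the whole image instead
    of segmenting the named region, which the real model would handle.
    """
    if edit["operation"] == "saturation":
        return ImageEnhance.Color(image).enhance(edit["factor"])
    raise ValueError(f"Unsupported operation: {edit['operation']}")

img = Image.open("photo.jpg")
edit = derive_instruction("make the sky bluer")  # explicit instruction
result = apply_edit(img, edit)                   # edited image
result.save("photo_edited.jpg")
```

The point of the intermediate instruction is the same one the researchers emphasize: making the intended edit explicit before touching pixels yields more precise, targeted results than acting on the vague prompt directly.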
Users interact with MGIE simply by typing the modifications they want. The research paper includes illustrative examples, showing how text prompts can accomplish edits such as adding vegetable toppings to a photo of a pizza or increasing a photo's contrast to simulate brighter lighting.
According to the research team, "MGIE surpasses previous approaches by deriving explicit visual-aware intentions from user prompts, leading to more precise and targeted image editing. Extensive evaluations across various editing tasks demonstrate significant performance improvements with high efficiency. We believe this MLLM-guided framework paves the way for future advancements in vision-and-language research."
Apple has made MGIE available for download on GitHub and as a web demo on Hugging Face Spaces. While the company has not disclosed its plans for the model, the release aligns with CEO Tim Cook's statement that more AI features are coming to Apple devices in 2024. It also follows the release of MLX, an open-source machine learning framework designed to make training AI models on Apple Silicon chips easier.
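For context on what MLX looks like in practice, here is a minimal sketch of its NumPy-like Python API performing one gradient-descent step; it is a generic example of the framework, not code tied to MGIE, and the linear-regression setup is purely illustrative.

```python
import mlx.core as mx

def loss(w, x, y):
    # Mean squared error of a linear model x @ w against targets y.
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((64, 3))  # inputs
w = mx.zeros((3,))             # weights
y = mx.random.normal((64,))    # targets

grad_fn = mx.grad(loss)        # gradient w.r.t. the first argument, w
g = grad_fn(w, x, y)
w = w - 0.1 * g                # one gradient-descent step
print(w)                       # printing forces MLX's lazy evaluation
```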
This development positions Apple as a potential contender in a generative AI space so far dominated by players like Microsoft, Meta, and Google, and signals the company's interest in user-centric approaches to image editing.