MediaPipe On-Device Text-to-Image Generation Solution Now Available for Android Developers


Posted by Paul Ruiz – Senior Developer Relations Engineer, and Kris Tonthat – Technical Writer

Earlier this year, we previewed on-device text-to-image generation with diffusion models for Android via MediaPipe Solutions. Today we’re happy to announce that this is available as an early, experimental solution, Image Generator, for developers to try out on Android devices. It lets you generate images entirely on-device, in as little as ~15 seconds on higher-end devices. We can’t wait to see what you create!

There are three primary ways to use the new MediaPipe Image Generator task:

  1. Text-to-image generation based on text prompts using standard diffusion models.
  2. Controllable text-to-image generation based on text prompts and conditioning images using diffusion plugins.
  3. Customized text-to-image generation based on text prompts using Low-Rank Adaptation (LoRA) weights that allow you to create images of specific concepts that you pre-define for your unique use cases.


Before we get into all of the fun and exciting parts of this new MediaPipe task, it’s important to know that our Image Generation API supports any model that exactly matches the Stable Diffusion v1.5 architecture. You can use a pretrained model or your own fine-tuned model by converting it to a format supported by MediaPipe Image Generator using our conversion script.

You can also customize a foundation model via MediaPipe Diffusion LoRA fine-tuning on Vertex AI, injecting new concepts into a foundation model without having to fine-tune the whole model. You can find more information about this process in our official documentation.

If you want to try this task out today without any customization, we also provide links to a few verified working models in that same documentation.

Image Generation through Diffusion Models

The most straightforward way to try the Image Generator task is to give it a text prompt and then receive a result image produced by a diffusion model.

Like MediaPipe’s other tasks, you’ll start by creating an options object. In this case you’ll only need to define the path to your foundation model files on the device. Once you have that options object, you can create the ImageGenerator.

// Point the task at the directory that holds the converted foundation model files.
val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(MODEL_PATH)
    .build()
imageGenerator = ImageGenerator.createFromOptions(context, options)

After creating your new ImageGenerator, you can create a new image by passing in the prompt, the number of iterations the generator should run, and a seed value. This is a blocking operation, so you will want to run it on a background thread before returning your new Bitmap result object.

// generate() blocks until the image is ready, so call it off the main thread.
val result = imageGenerator.generate(prompt_string, iterations, seed)
val bitmap = BitmapExtractor.extract(result?.generatedImage())
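Because the generation call blocks, one way to keep it off the main thread is a single-threaded executor. The sketch below is a minimal illustration under stated assumptions: `BlockingGenerator` and `BackgroundGenerator` are hypothetical names that stand in for the real ImageGenerator call, so the threading pattern can be shown without Android dependencies, and a String stands in for the Bitmap result.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future

// Hypothetical stand-in for the blocking generate(prompt, iterations, seed) call.
// In an app, the implementation would wrap the ImageGenerator and Bitmap extraction.
fun interface BlockingGenerator {
    fun generate(prompt: String, iterations: Int, seed: Int): String
}

// Hypothetical helper: runs each blocking generation on one dedicated worker thread.
class BackgroundGenerator(private val generator: BlockingGenerator) {
    private val worker: ExecutorService = Executors.newSingleThreadExecutor()

    // Returns immediately; the caller reads the result later via Future.get().
    fun submit(prompt: String, iterations: Int, seed: Int): Future<String> =
        worker.submit(Callable { generator.generate(prompt, iterations, seed) })

    // Stops accepting new work; already submitted jobs still finish.
    fun shutdown() {
        worker.shutdown()
    }
}
```

A single worker thread keeps generations sequential, which is a reasonable default when each run is expensive. Kotlin coroutines with a background dispatcher would be an equally valid choice.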

In addition to this simple input in/result out format, we also support a way for you to step through each iteration manually through the execute() function, receiving the intermediate result images back at different stages to show the generative progress. While getting intermediate results back isn’t recommended for most apps due to performance and complexity, it is a nice way to demonstrate what’s happening under the hood. This is a little more of an in-depth process, but you can find this demo, as well as the other examples shown in this post, in our official example app on GitHub.
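The step-by-step flow can be sketched as a loop that advances one iteration at a time and only asks for an intermediate image every few steps, which limits the performance cost mentioned above. This is an illustration, not the library's API: `SteppedGenerator`, its method signatures, `runStepped`, and `FakeSteppedGenerator` are all assumptions made for this example, so check the official documentation for the real execute() signature.

```kotlin
// Hypothetical stand-in for a generator that advances one diffusion step per execute() call.
interface SteppedGenerator {
    fun setInputs(prompt: String, iterations: Int, seed: Int)

    // Returns an intermediate image only when showResult is true, otherwise null.
    fun execute(showResult: Boolean): String?
}

// Runs every iteration, requesting an intermediate image every `displayEvery`
// steps and always on the final step.
fun runStepped(
    generator: SteppedGenerator,
    prompt: String,
    iterations: Int,
    seed: Int,
    displayEvery: Int,
    onFrame: (step: Int, image: String) -> Unit,
) {
    require(iterations > 0) { "iterations must be positive" }
    require(displayEvery > 0) { "displayEvery must be positive" }
    generator.setInputs(prompt, iterations, seed)
    for (step in 1..iterations) {
        val show = step % displayEvery == 0 || step == iterations
        val image = generator.execute(show)
        if (image != null) onFrame(step, image)
    }
}

// Fake generator used only to exercise the loop without a real model.
class FakeSteppedGenerator : SteppedGenerator {
    private var step = 0

    override fun setInputs(prompt: String, iterations: Int, seed: Int) {
        step = 0
    }

    override fun execute(showResult: Boolean): String? {
        step += 1
        return if (showResult) "frame-$step" else null
    }
}
```

With 10 iterations and `displayEvery = 4`, the callback fires on steps 4, 8, and 10.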

Moving image of an image generating in MediaPipe from the following prompt: a colorful cartoon racoon wearing a floppy wide brimmed hat holding a stick walking through the forest, animated, three-quarter view, painting

Image Generation with Plugins

While being able to create new images from only a prompt on a device is already a huge step, we’ve taken it a little further by implementing a new plugin system that enables the diffusion model to accept a condition image along with a text prompt as its inputs.

We currently support three different ways that you can provide a foundation for your generations: facial structures, edge detection, and depth awareness. The plugins give you the ability to provide an image, extract specific structures from it, and then create new images using those structures.

Moving image of an image generating in MediaPipe from a provided image of a beige toy car, plus the following prompt: cool green race car

LoRA Weights

The third major feature we’re rolling out today is the ability to customize the Image Generator task with LoRA to teach a foundation model about a new concept, such as specific objects, people, or styles presented during training. With the new LoRA weights, the Image Generator becomes a specialized generator that is able to inject specific concepts into generated images.

LoRA weights are useful for cases where you may want every image to be in the style of an oil painting, or a particular teapot to appear in any created setting. You can find more information about LoRA weights on Vertex AI in the MediaPipe Stable Diffusion LoRA model card, and create them using this notebook. Once generated, you can deploy the LoRA weights on-device using the MediaPipe Tasks Image Generator API, or for optimized server inference through Vertex AI’s one-click deployment.

In the example below, we created LoRA weights using several images of a teapot from the Dreambooth teapot training image set. Then we use the weights to generate a new image of the teapot in different settings.
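For readers new to LoRA, the general idea is that a frozen weight matrix W gets a low-rank update, W' = W + scale · B·A, where B and A are small matrices. The sketch below shows that general formulation only. This post does not describe how MediaPipe applies the weights internally, and the names `Matrix`, `matmul`, `applyLora`, and `loraParamCount` are hypothetical helpers written for this illustration.

```kotlin
typealias Matrix = Array<DoubleArray>

// Plain matrix product of an (n x m) and an (m x p) matrix.
fun matmul(a: Matrix, b: Matrix): Matrix {
    val m = b.size
    val p = b[0].size
    require(a.all { it.size == m }) { "inner dimensions must match" }
    return Array(a.size) { i ->
        DoubleArray(p) { j -> (0 until m).sumOf { k -> a[i][k] * b[k][j] } }
    }
}

// General LoRA update: W' = W + scale * (B x A), with B of shape (d x r) and A of shape (r x k).
fun applyLora(w: Matrix, b: Matrix, a: Matrix, scale: Double): Matrix {
    val delta = matmul(b, a)
    require(delta.size == w.size && delta[0].size == w[0].size) { "update shape must match W" }
    return Array(w.size) { i ->
        DoubleArray(w[0].size) { j -> w[i][j] + scale * delta[i][j] }
    }
}

// Number of trainable values in B and A for a (d x k) weight at rank r.
fun loraParamCount(d: Int, k: Int, r: Int): Int = r * (d + k)
```

For a 768 x 768 weight at rank 4, `loraParamCount` gives 6,144 trainable values against 589,824 in the full matrix, which is why a new concept can be added without fine-tuning the whole model.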

A grid of four photos of teapots generated with the training prompt 'a photo of a monadikos teapot' on the left, and a moving image showing an image being generated in MediaPipe from the prompt 'a bright purple monadikos teapot sitting in top of a green table with orange teacups'

Image generation with the LoRA weights

Next Steps

This is just the beginning of what we plan to support with on-device image generation. We’re looking forward to seeing all of the great things the developer community builds, so be sure to post them on X (formerly Twitter) with the hashtag #MediaPipeImageGen and tag @GoogleDevs. You can check out the official sample on GitHub demonstrating everything you’ve just learned about, read through our official documentation for even more details, and keep an eye on the Google for Developers YouTube channel for updates and tutorials as they’re released by the MediaPipe team.


We’d like to thank all team members who contributed to this work: Lu Wang, Yi-Chun Kuo, Sebastian Schmidt, Kris Tonthat, Jiuqiang Tang, Khanh LeViet, Paul Ruiz, Qifei Wang, Yang Zhao, Yuqi Li, Lawrence Chan, Tingbo Hou, Joe Zou, Raman Sarokin, Juhyun Lee, Geng Yan, Ekaterina Ignasheva, Shanthal Vasanth, Glenn Cameron, Mark Sherwood, Andrei Kulik, Chuo-Ling Chang, and Matthias Grundmann from the Core ML team, as well as Changyu Zhu, Genquan Duan, Bo Wu, Ting Yu, and Shengyang Dai from Google Cloud.


