Google’s latest multimodal model, gemini-2.5-flash-image-preview, is now available in the Call LLM node's Image generation mode. The model is designed for fast image generation and editing, with strong character and style consistency across edits and a high degree of creative control.
⸻
Key Capabilities
- Text → Image: Generate high-quality visuals from natural language prompts.
- Prompt-based Editing: Upload an image and describe edits (e.g., “remove the stain and blur the background”).
- Multi-Image Fusion: Provide multiple images and ask the model to blend them (e.g., “place the product from A on the desk in B”).
- Character & Style Consistency: Maintain identity or artistic style across edits by supplying reference images.
- Interleaved Output: Return text and images interleaved in a single response (e.g., illustrated tutorials, recipes).
- Safe by Default: Every generated or edited image carries an invisible SynthID watermark that identifies it as AI-generated.
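The Call LLM node assembles and sends the request for you, so no code is required. For readers who want a mental model of what such a request might contain, here is a minimal Python sketch. It assumes Google's public Gemini `generateContent` REST format: a `contents` list of `parts`, where each part is either `text` or base64-encoded `inline_data`. We have not confirmed that HyperFlow builds its requests this way. `build_request` and `split_response` are hypothetical helpers. The sketch only builds and parses dictionaries; it makes no network call and handles no authentication.

```python
import base64
from typing import Dict, List, Optional, Tuple


def build_request(prompt: str,
                  images: Optional[List[bytes]] = None,
                  mime_type: str = "image/png") -> Dict:
    """Build a generateContent-style request body (assumed format).

    0 images -> text-to-image, 1 image -> prompt-based edit,
    2+ images -> multi-image fusion. Images follow the text part in order.
    """
    parts: List[Dict] = [{"text": prompt}]
    for img in images or []:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                # Raw image bytes are base64-encoded for JSON transport.
                "data": base64.b64encode(img).decode("ascii"),
            }
        })
    return {"contents": [{"parts": parts}]}


def split_response(response: Dict) -> Tuple[List[str], List[bytes]]:
    """Split interleaved output into text chunks and decoded image bytes."""
    texts: List[str] = []
    images: List[bytes] = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                texts.append(part["text"])
            # Accept both snake_case and camelCase keys (assumption: responses
            # may use either spelling).
            inline = part.get("inline_data") or part.get("inlineData")
            if inline:
                images.append(base64.b64decode(inline["data"]))
    return texts, images
```

For example, `build_request("Remove the logo and blur the background", [open("input.png", "rb").read()])` describes a single-image edit, and passing two images describes a multi-image fusion. `split_response` keeps text and images in separate lists, so an illustrated tutorial's steps and pictures can be handled independently.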
⸻
Usage Patterns in HyperFlow
- Text → Image
  - User prompt: “A cinematic 16:9 shot of a neon cityscape at night”
- Edit an Image
  - User prompt: “Remove the logo and blur the background”
  - Content: input.png
- Multi-Image Fusion
  - User prompt: “Place the product from image A on the desk in image B”
  - Content: product.png, office.png
- Character or Style Consistency