Nano Banana Pro (Gemini Image Gen)

Generate or edit images via Gemini 3 Pro Image. Supports text-to-image, image editing, and multi-image composition.

Nano Banana Pro is OpenClaw's skill for generating and editing images using Google's Gemini image generation models. Originally built around Gemini 2.0 Flash's native image output capability (internally codenamed 'Nano Banana'), it now supports the full lineage: Gemini 2.0 Flash, 2.5 Flash Image, and Gemini 3 Pro Image, each bringing better quality, faster generation, and finer creative control.

The architecture is elegant: instead of calling a separate image model, Gemini generates images natively within its multimodal output. You send a text prompt (or text plus reference images), and Gemini returns both text and generated images in a single response. This enables conversational image editing ('make the sky more dramatic', 'add a coffee cup to the table') without switching models or APIs.

The skill wraps Google's Generative AI Python SDK via `uv` (the fast Python package manager), so there's no heavy dependency installation. It supports text-to-image generation, image editing with natural language, multi-image composition, and style transfer. The Gemini API's free tier (60 requests/min, 1,000 requests/day with a personal Google account) makes this one of the most accessible AI image generation options available.

Compared to DALL-E 3 or Midjourney, Gemini's image generation excels at rendering text within images (signs, labels, UI mockups), following complex compositional prompts, and maintaining consistency across edits. The 2.5 Flash Image variant topped the LMArena image-editing leaderboard when released.

Best suited for: quick image generation without OpenAI API costs, image editing workflows where you iterate with natural language, creating presentation slides and mockups, and anyone wanting free-tier AI image generation.
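The single-response, mixed-modality design above can be sketched as follows. The part layout mirrors the Gemini API's response shape (candidates contain content parts that carry either `text` or `inline_data`), modeled here with plain dicts so the unpacking logic is shown without a live API call; the helper name `split_response` is illustrative, not part of the skill.

```python
# Sketch: a Gemini multimodal reply interleaves text and image parts
# in one response. Parts are modeled as dicts (no API call needed).

def split_response(parts):
    """Separate text parts from inline image parts."""
    texts, images = [], []
    for part in parts:
        if "text" in part:
            texts.append(part["text"])
        elif "inline_data" in part:
            # inline_data carries {"mime_type": ..., "data": raw bytes}
            images.append(part["inline_data"])
    return texts, images

# A response to "draw a coffee cup" might interleave both kinds:
parts = [
    {"text": "Here is your coffee cup:"},
    {"inline_data": {"mime_type": "image/png", "data": b"...png bytes..."}},
]
texts, images = split_response(parts)
print(len(texts), len(images))  # 1 1
```

Because text and images arrive together, a follow-up prompt can reference the just-generated image without re-uploading it.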

Tags: image-generation, ai, gemini, creative

Category: AI

Use Cases

  • Quick image generation for social media posts and blog headers
  • Iterative image editing with natural language
  • Creating presentation slide visuals and mockups
  • Generating product photos and marketing materials
  • Text-heavy image generation (infographics, signs, UI screenshots)
  • Style transfer: apply one image's style to another

Tips

  • Use Gemini's conversational nature: generate an image, then refine with 'make the colors warmer'
  • For text-in-images (signs, UI mockups), Gemini significantly outperforms DALL-E 3
  • Pass reference images alongside text prompts for style matching or editing
  • Use the free tier for prototyping, then switch to paid for production batch generation
  • Combine with the nano-pdf skill to generate and insert images into presentations
  • For best quality, use specific detailed prompts rather than vague descriptions
  • Multi-image composition works well for before/after comparisons and collages
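The conversational-refinement tip boils down to sending the full turn history back with each edit request. A minimal sketch of that pattern, assuming the Gemini chat format's role/parts shape; the image bytes are stand-ins and no API call is made:

```python
# Sketch: iterative image editing as a growing conversation history.
history = []

def add_turn(role, parts):
    """Append one chat turn; the whole list is resent on each request."""
    history.append({"role": role, "parts": parts})
    return history

# Turn 1: initial generation request and the model's image reply.
add_turn("user", [{"text": "A minimalist coffee shop logo"}])
add_turn("model", [{"inline_data": {"mime_type": "image/png", "data": b"v1"}}])

# Turn 2: refine conversationally. The history already contains the
# previous image, so 'make the colors warmer' needs no re-upload.
add_turn("user", [{"text": "Make the colors warmer"}])

print(len(history))  # 3
```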

Known Issues & Gotchas

  • Requires GEMINI_API_KEY env var — get one from ai.google.dev
  • Image generation is subject to Google's safety filters — some prompts will be blocked without clear explanation
  • Free tier has rate limits: 60 requests/min and 1,000 requests/day
  • Earlier model versions (Gemini 2.0 Flash, 3 Pro Preview) have been deprecated — check current model availability
  • Generated images may have artifacts with complex scenes — iterate with follow-up prompts
  • The skill uses `uv` to manage Python dependencies — install uv first (e.g., via Homebrew on macOS)
  • Image output quality varies significantly between model versions — 2.5 Flash Image is the sweet spot
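For batch jobs that would otherwise trip the 60 requests/min limit above, a client-side throttle helps. This is a hypothetical sliding-window sketch (not part of the skill); `now` is injectable so the logic can be exercised without real waiting:

```python
# Hypothetical throttle for the free tier's 60 requests/min limit:
# a sliding window of request timestamps decides whether a call may go.
import time
from collections import deque

class RateLimiter:
    def __init__(self, max_requests=60, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.sent = deque()  # timestamps of recent requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have left the sliding window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False

limiter = RateLimiter(max_requests=60, window_seconds=60.0)
allowed = sum(limiter.allow(now=0.0) for _ in range(61))
print(allowed)  # 60 — the 61st call in the window is refused
```

On a refused call, sleep until the oldest timestamp ages out of the window, then retry; the daily 1,000-request cap still has to be tracked separately.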

Alternatives

  • OpenAI Image Gen (DALL-E 3 / GPT Image)
  • Stable Diffusion (local)
  • Midjourney
  • Flux (Black Forest Labs)

Community Feedback

Google improves Gemini AI image editing with 'nano banana' model. Gemini 2.5 Flash Image is currently atop LMArena's image-editing leaderboard.

— Ars Technica

Google's new AI image model gives users finer control over editing photos, a step meant to catch up with OpenAI's GPT Image capabilities.

— TechCrunch

We have suspended all use of the API for the time being and would like more information about rate limits and content filtering on image generation.

— Reddit r/GeminiAI

Gemini 2.0 Flash is making waves with its groundbreaking native image generation. This 'workhorse' AI now crafts and edits visuals directly from text prompts in a single API call.

— Medium

Configuration Examples

Setup and basic generation

# Install uv
brew install uv

# Set API key
export GEMINI_API_KEY="your-key-from-ai.google.dev"

# Generate an image
uv run nano-banana-pro "A minimalist logo for a coffee shop in warm earth tones"

Edit an existing image

# Edit with reference image
uv run nano-banana-pro --image photo.jpg "Remove the background and replace with a gradient sunset"

Multi-image composition

# Combine multiple images
uv run nano-banana-pro --image logo.png --image bg.jpg "Place the logo centered on the background with a subtle drop shadow"

Installation

brew install uv

Homepage: https://ai.google.dev/

Source: bundled