SNAG Screenshot-to-Markdown

Hotkey a screen region → Gemini vision → instant Markdown in your clipboard. Quick capture tool for developers.

SNAG is a screenshot-to-text CLI tool that turns any screen region into clean Markdown using a vision model. You press a hotkey, drag to select a region, and the captured image is sent to Google Gemini (or OpenRouter or Z.AI) for transcription. The resulting Markdown, whether it is code, a diagram, a table, or plain text, lands in your clipboard, ready to paste into an LLM prompt, documentation, or chat.

The tool supports multi-monitor setups, runs on Linux (X11 and Wayland), Windows, and macOS, and handles a wide range of content, including code blocks, UI elements, charts, and handwritten notes. Installation is a single `uv tool install` command, and configuration is a one-time `snag --setup` to pick your vision provider and enter your API key.

For developers, SNAG fills a specific gap: getting visual content into text-based workflows without manual transcription. It is particularly useful for pasting error screenshots into bug reports, converting whiteboard diagrams into documentation, or feeding visual context into an LLM conversation. The tool is in active beta development and receives regular updates.
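To make the image-to-Markdown step concrete, here is a minimal Python sketch of how a captured PNG could be packaged for Gemini's public `generateContent` REST endpoint and how the text could be read back from the response. It is an illustration, not SNAG's actual code: the prompt text, the function names `build_request` and `extract_markdown`, and the default model name are all assumptions.

```python
import base64
import json

# Public Gemini REST endpoint pattern; the model name is filled in per request.
GEMINI_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"

# Hypothetical prompt; the prompt SNAG really sends is not documented here.
PROMPT = "Transcribe this screenshot into clean Markdown."


def build_request(png_bytes: bytes, model: str = "gemini-2.0-flash") -> tuple[str, str]:
    """Return (url, json_body) for a generateContent call carrying one inline PNG."""
    body = {
        "contents": [{
            "parts": [
                {"text": PROMPT},
                {"inline_data": {
                    "mime_type": "image/png",
                    # Image bytes travel as base64 text inside the JSON body.
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    return GEMINI_URL.format(model=model), json.dumps(body)


def extract_markdown(response: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```

In a real client, the body would be sent as a POST with the API key in the `x-goog-api-key` header, and the extracted string would then be copied to the clipboard. Screen capture and clipboard access are platform-specific, so they are left out of this sketch.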

Tags: devtools, screenshots, markdown, vision

Category: devtools

Tips

Install with `uv tool install git+https://github.com/am-will/snag.git` for the cleanest setup: uv puts the tool in its own isolated environment, which avoids dependency conflicts with your other Python packages
Bind SNAG to a global hotkey (e.g., Ctrl+Shift+S) for instant access without touching the terminal
Use Gemini Flash for speed on simple text captures, and Gemini Pro for complex diagrams or code with syntax highlighting
On macOS, grant Screen Recording permission on first run. The tool will prompt you, but the change requires a restart to take effect
Pair SNAG output with your OpenClaw agent by pasting captured Markdown directly into your chat for visual context

Community Feedback

Screenshot-to-text CLI tool powered by vision AI. Capture any region of your screen and instantly get a markdown description in your clipboard — ready to paste into an LLM.
— GitHub

SNAG is the kind of small utility that becomes indispensable once you start using it. Screenshot → Markdown in your clipboard in seconds.
— OpenClaw Community

Frequently Asked Questions

Which vision AI providers does SNAG support?

SNAG supports Google Gemini (recommended), OpenRouter (for access to multiple models), and Z.AI (GLM-4.6V). You choose your provider during the initial setup with `snag --setup`.

Does it work with multi-monitor setups?

Yes, SNAG has full multi-monitor support. You can select a region on any connected display, and it handles different DPI scaling across monitors.

Can SNAG handle code screenshots accurately?

Yes, it's particularly good with code. The vision model preserves syntax, indentation, and formatting. Output is wrapped in proper Markdown code blocks with language detection.
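As an illustration of what "wrapped in proper Markdown code blocks" means in practice, the sketch below wraps transcribed code in a fenced block with a language tag while leaving indentation untouched. In SNAG the vision model produces the Markdown itself, so this helper is purely hypothetical, as are the names `to_code_block` and `FENCE`.

```python
FENCE = "`" * 3  # built at runtime so this example nests cleanly inside Markdown


def to_code_block(code: str, language: str = "") -> str:
    """Wrap code in a fenced Markdown block, preserving per-line indentation."""
    # Trim blank lines at the edges only; leading spaces on each line are kept.
    body = code.strip("\n")
    # If the code itself contains a backtick run, lengthen the fence so the
    # block cannot be closed early.
    fence = FENCE
    while fence in body:
        fence += "`"
    return f"{fence}{language}\n{body}\n{fence}"


snippet = "def f():\n    return 1\n"
print(to_code_block(snippet, "python"))
```

The language tag after the opening fence is what lets Markdown renderers apply syntax highlighting when the block is pasted elsewhere.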

Is there a cost per screenshot?

Each capture consumes a small number of vision API tokens. With Gemini Flash, a typical screenshot costs a fraction of a cent, so even heavy users processing dozens of screenshots a day would still spend well under $1/month.
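To check the estimate against your own usage, a back-of-the-envelope calculation like the one below may help. The inputs shown are placeholders chosen only to demonstrate the arithmetic; they are not actual Gemini prices or measured token counts, so substitute the current rate from your provider's pricing page and the token usage you observe.

```python
def monthly_cost_usd(captures_per_day: int, tokens_per_capture: int,
                     usd_per_million_tokens: float, days: int = 30) -> float:
    """Estimate monthly spend from capture volume and a per-token price."""
    total_tokens = captures_per_day * tokens_per_capture * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder inputs for illustration only (not real pricing or token counts).
estimate = monthly_cost_usd(
    captures_per_day=50,
    tokens_per_capture=1_500,
    usd_per_million_tokens=0.10,
)
print(f"Estimated spend: ${estimate:.3f} per month")
```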