OpenRouter and Grok Models Research

VerifiedSafe

Research current OpenRouter API specifications and Grok model availability for SFUMATO. Focuses on the vision capability needed for the judge service to analyze generated images, and covers error handling, retry strategies, and integration patterns for the openrouter_tool.py implementation.

Sby Skills Guide Bot
DevelopmentIntermediate
1206/2/2026
Claude Code
#openrouter#grok#vision#api-integration#xai

Recommended for

Our review

This skill researches the OpenRouter API to integrate Grok models, including vision capabilities, for evaluating generated images in the SFUMATO project.

Strengths

  • Comprehensive coverage of OpenRouter APIs (authentication, endpoints, request formats)
  • Precise identification of Grok models supporting vision, critical for judge_service
  • Documentation of base64 image formats for vision calls
  • Error handling and retry strategy adapted to common HTTP status codes

Limitations

  • Depends on freshness of web search results (models or APIs may change)
  • Provides a blueprint, not a ready-to-use implementation
  • Does not cover alternative providers or models
When to use it

When implementing or debugging OpenRouter integration for text and vision calls with Grok.

When not to use it

If you are using a different API provider or if the Grok models do not require vision capabilities.

Security analysis

Safe
Quality score95/100

This skill instructs only research activities using WebSearch, WebFetch, and Read tools to gather API documentation. No destructive, exfiltrating, or obfuscated actions are present. No executable commands or dangerous operations are suggested.

No concerns found

Examples

OpenRouter research for Grok vision
Research the OpenRouter API to find which Grok models support vision/image input for evaluating generated images in our judge service.
OpenRouter error handling
Research OpenRouter error handling patterns: how to handle 429, 402, and 503 errors, and what retry strategy to use for async httpx calls.
OpenRouter image input format
Find the correct message format for sending base64 images to OpenRouter's vision API for Grok models in a chat completion request.

name: openrouter-research description: Research OpenRouter API docs, available Grok model IDs, vision capability for the judge service, and integration patterns. Use when implementing openrouter_tool.py, when checking which Grok model supports vision/image input for judge_service.py, when OpenRouter returns unexpected errors, or when verifying model availability and context limits. tools: WebSearch, WebFetch, Read

OpenRouter Research

Research current OpenRouter API specifications and Grok model availability for SFUMATO.

Context

All text LLM calls in SFUMATO go through app/tools/openrouter_tool.py:

  • prompt_service.generate_prompt() — Grok, text-only input
  • prompt_service.revise_prompt() — Grok, text-only input
  • judge_service.evaluate()Grok with vision (must analyze generated image)

ENV vars: OPENROUTER_API_KEY, OPENROUTER_BASE_URL=https://openrouter.ai/api/v1 Client: httpx.AsyncClient — OpenAI-compatible API

Read existing file first if it exists:

  • app/tools/openrouter_tool.py
  • config.py

Step 1: OpenRouter API Fundamentals

  1. Fetch: https://openrouter.ai/docs/api-reference/overview
  2. Fetch: https://openrouter.ai/docs/requests
  3. Search: OpenRouter API python httpx async chat completions 2025

Capture:

  • Base URL: https://openrouter.ai/api/v1
  • Auth header format (Authorization: Bearer vs API-Key)
  • Required request headers: HTTP-Referer, X-Title
  • Chat completions endpoint: POST /v1/chat/completions
  • Request body schema: model, messages, temperature, max_tokens, stream
  • Response schema: choices[0].message.content
  • Error response format and status codes (429, 402, 503, 500)

Step 2: Available Grok Models

  1. Fetch: https://openrouter.ai/models?q=grok
  2. Search: OpenRouter xAI Grok models list 2025
  3. Fetch: https://openrouter.ai/x-ai if available

For each available Grok model, capture:

  • Full model ID string (e.g. x-ai/grok-beta, x-ai/grok-vision-beta)
  • Context window size (tokens)
  • Supports image input (vision)? — critical for judge_service
  • Cost per 1M input/output tokens

Determine the best model for each use case:

  • prompt_gen: text-only, large context
  • prompt_revise: text-only
  • judge: MUST support vision/image input

Step 3: Image Input Format for Vision Models

Since judge_service.py sends a generated image to Grok for evaluation:

  1. Search: OpenRouter vision model image input base64 format 2025
  2. Fetch: https://openrouter.ai/docs/features/vision
  3. Search: OpenAI-compatible vision API image_url content type format

Capture:

  • Message content format for image:
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
    
    vs URL-based image reference
  • Max image size limits
  • Supported formats (JPEG, PNG, WebP)
  • Whether OpenRouter passes base64 to xAI directly or requires URL

Step 4: Error Handling and Retry Strategy

  1. Search: OpenRouter API error handling retry 429 503 2025
  2. Fetch: https://openrouter.ai/docs/api-reference/errors if available

Capture:

  • 429 rate limit: retry-after header? fixed backoff interval?
  • 402 payment/balance: raise immediately, no retry
  • 503 model unavailable: retry with backoff
  • Recommended timeout values for text calls vs vision calls
  • Max retries pattern

Output Format

Section 1: openrouter_tool.py Implementation Blueprint

# Async httpx client structure
BASE_URL = "https://openrouter.ai/api/v1"

# Required headers
headers = {
    "Authorization": f"Bearer {api_key}",
    "HTTP-Referer": "...",
    "X-Title": "SFUMATO",
    "Content-Type": "application/json",
}

# chat_completion(model, messages, temperature, max_tokens) -> str
# vision_completion(model, text_messages_plus_image_message) -> str
# Retry pattern for 429/503 (show how many retries, backoff)
# Error handling: which codes to retry vs raise immediately

Section 2: Model Selection Table

| Use Case | Recommended Model ID | Context | Vision | Notes | |----------|---------------------|---------|--------|-------| | prompt_gen | x-ai/grok-... | N | No | | | prompt_revise | x-ai/grok-... | N | No | | | judge | x-ai/grok-... | N | YES | Required |

Section 3: Image Message Format

Exact Python dict structure to use when sending image to judge:

{
    "role": "user",
    "content": [
        {"type": "text", "text": "...judge prompt..."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
    ]
}

Section 4: Recommended Constants for config.py

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OPENROUTER_MODEL_PROMPT_GEN = "x-ai/grok-..."
OPENROUTER_MODEL_PROMPT_REVISE = "x-ai/grok-..."
OPENROUTER_MODEL_JUDGE = "x-ai/grok-vision-..."  # must support vision
OPENROUTER_TIMEOUT_TEXT = 30
OPENROUTER_TIMEOUT_VISION = 45
OPENROUTER_MAX_RETRIES = 2

Notes

  • Never log API key, response.text for large vision calls, or base64 image data
  • Image for judge: read from data/sessions/<id>/iter_<n>.jpg, encode as base64
  • The judge call is the only one that uses vision; wrap it separately in vision_completion()
  • HTTP-Referer should be "http://localhost:5000" for local dev
Related skills