POST Request Monitor & OpenAI Proxy 📬

A Docker app that proxies Google Gemini API requests to OpenAI-compatible endpoints with full multimodal support.

What It Does

  1. Monitors POST requests - Real-time web UI at http://localhost:5005
  2. Gemini → OpenAI Conversion - Automatically converts API formats
  3. Multimodal Support - Handles text, images, and video
  4. Request Forwarding - Sends to your OpenAI-compatible endpoint (vLLM, Ollama, OpenAI, etc.)
  5. Response Conversion - Converts OpenAI responses back to Gemini format

Perfect for: using Gemini-format applications with local vision models (InternVL3, Qwen2-VL) or the OpenAI API.
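
In short, the request flow looks like this:

Your app ── Gemini request ──▶ proxy at http://localhost:5005/webhook/...
proxy ── converted OpenAI request ──▶ your OPENAI_ENDPOINT (vLLM, Ollama, OpenAI, ...)
OPENAI_ENDPOINT ── OpenAI response ──▶ proxy ── converted Gemini response ──▶ your app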

Multimodal Capabilities

  • Images: All formats (JPEG, PNG, WebP, etc.) - Universally supported
  • Video: MP4, WebM, etc. - Supported by video-capable models (InternVL3, Qwen2-VL, GPT-4o)
  • Multiple media: Send multiple images/videos in a single request
  • Mixed content: Text + images + video together

Quick Start

1. Create the Docker Network

If you don't already have the llm_internal network:

docker network create llm_internal
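
If you are not sure whether the network already exists, you can check first:

docker network ls --filter name=llm_internal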

2. Configure Environment Variables

Create a .env file (or copy from env.example):

cp env.example .env

Edit .env to set your OpenAI-compatible endpoint:

# Your OpenAI-compatible endpoint
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions

# API key if required
OPENAI_API_KEY=none

# Model name - IMPORTANT: Use the exact model name
# For InternVL3-8B-AWQ with vLLM/Ollama:
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ

# Video format - Use 'openai' for video-capable models
# (InternVL3, Qwen2-VL, GPT-4o, etc.)
VIDEO_FORMAT=openai

For InternVL3-8B-AWQ: This model DOES support video - use VIDEO_FORMAT=openai

Note: If your other services are on the same llm_internal Docker network, you can use their container names instead of host.docker.internal. For example: http://vllm-server:8000/v1/chat/completions

3. Start the Service

Using Docker Compose (Recommended)

docker-compose up --build

Using Docker

docker build -t post-monitor .
docker run -p 5005:5000 --network llm_internal post-monitor
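
Note that plain docker run does not load your .env file automatically; if you use this route, pass the variables explicitly (a sketch, assuming .env sits in the project root):

docker run -p 5005:5000 --env-file .env --network llm_internal post-monitor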

Usage

  1. View the Web UI: Open http://localhost:5005 in your browser

    • See all incoming requests
    • View converted OpenAI format
    • See responses from your endpoint
  2. Configure your application to send Gemini-format requests to:

    • http://localhost:5005/webhook/models/model:generateContent?key=none
    • Or any path under /webhook/
  3. The proxy will:

    • Extract text, images, and videos from Gemini format
    • Convert to OpenAI format
    • Forward to your OpenAI-compatible endpoint
    • Convert OpenAI response → Gemini format
    • Return to your application

The web interface will automatically refresh every 2 seconds to show new requests.

Example Requests

Text + Image + Video (Gemini Format)

# Text-only request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{
        "text": "Explain quantum computing in simple terms"
      }]
    }],
    "generationConfig": {
      "maxOutputTokens": 4096
    }
  }'

# With image (base64)
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Describe this image"},
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image.jpg)'"
          }
        }
      ]
    }]
  }'

# With video (base64) - Requires VIDEO_FORMAT=openai
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Describe this video"},
        {
          "inline_data": {
            "mime_type": "video/mp4",
            "data": "'$(base64 -w 0 video.mp4)'"
          }
        }
      ]
    }],
    "generationConfig": {
      "maxOutputTokens": 4096
    }
  }'

# Multiple images in one request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Compare these two images"},
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image1.jpg)'"
          }
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image2.jpg)'"
          }
        }
      ]
    }]
  }'

Features

  • Gemini → OpenAI Format Conversion: Automatically converts API formats
  • Image Support: Full support for images (JPEG, PNG, WebP, etc.)
  • Video Support: Configurable video handling for video-capable models
  • Request Forwarding: Proxies to your OpenAI-compatible endpoint
  • Response Conversion: Converts OpenAI responses back to Gemini format
  • Real-time Monitoring: Web UI shows all requests, conversions, and responses
  • Multiple Media: Handle multiple images/videos in a single request
  • Catch-all Routes: Accepts any path under /webhook/
  • Auto-refresh UI: Updates every 2 seconds
  • Error Handling: Shows detailed errors for debugging
  • Request History: Stores last 50 requests in memory
  • Docker Network: Uses llm_internal network for container communication

Format Conversion Details

Gemini → OpenAI

Text:

  • contents[].parts[].text → messages[].content (text type)

Images (all formats supported):

  • contents[].parts[].inline_data (image/jpeg, image/png, etc.) → messages[].content (image_url type)
    • Format: data:image/jpeg;base64,{base64_data}
  • ✅ Universally supported by all vision models

Videos (format depends on VIDEO_FORMAT):

  • contents[].parts[].inline_data (video/mp4, etc.)
    • When VIDEO_FORMAT=openai: → messages[].content (image_url type with video MIME)
    • When VIDEO_FORMAT=vllm: → messages[].content (video_url type)
    • When VIDEO_FORMAT=skip: → Replaced with text note
  • ⚠️ Only supported by video-capable models (InternVL3, Qwen2-VL, GPT-4o, etc.)
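
Concretely, with VIDEO_FORMAT=openai a video part becomes an image_url entry that carries a video data URL. A rough sketch of the resulting content item (the exact payload is built by app.py and may differ in detail):

{"type": "image_url", "image_url": {"url": "data:video/mp4;base64,<base64_data>"}}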

Generation Config:

  • generationConfig.maxOutputTokens → max_tokens
  • generationConfig.temperature → temperature
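
As an illustration, a Gemini request like this:

{
  "contents": [{
    "parts": [
      {"text": "Describe this image"},
      {"inline_data": {"mime_type": "image/jpeg", "data": "<base64_data>"}}
    ]
  }],
  "generationConfig": {"maxOutputTokens": 4096, "temperature": 0.7}
}

is forwarded roughly as the following OpenAI-format request (a sketch based on the mapping above; the exact payload is built by app.py):

{
  "model": "OpenGVLab/InternVL3-8B-AWQ",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<base64_data>"}}
    ]
  }],
  "max_tokens": 4096,
  "temperature": 0.7
}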

OpenAI → Gemini

Response:

  • choices[0].message.content → candidates[0].content.parts[0].text

Usage:

  • usage → usageMetadata
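
For example, an OpenAI-style response such as:

{
  "choices": [{"message": {"role": "assistant", "content": "Quantum computing uses qubits..."}}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 42, "total_tokens": 54}
}

is returned to your application roughly as (a sketch; the exact usage field names are produced by app.py):

{
  "candidates": [{"content": {"role": "model", "parts": [{"text": "Quantum computing uses qubits..."}]}}],
  "usageMetadata": {"promptTokenCount": 12, "candidatesTokenCount": 42, "totalTokenCount": 54}
}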

Compatible Endpoints

This proxy works with any OpenAI-compatible endpoint:

  • OpenAI API (api.openai.com)
  • Local LLMs (LM Studio, Ollama with OpenAI compatibility, Jan)
  • vLLM deployments
  • Text Generation WebUI (with OpenAI extension)
  • LocalAI
  • Any other OpenAI-compatible API

Endpoints

  • GET / - Web UI to view all requests, conversions, and responses
  • POST /webhook/* - Main proxy endpoint (converts Gemini → OpenAI → Gemini)
  • POST /clear - Clear all stored requests

Configuration

Environment Variables

  • OPENAI_ENDPOINT - Your OpenAI-compatible endpoint URL
  • OPENAI_API_KEY - API key if required (use 'none' if not needed)
  • OPENAI_MODEL - Model name to use (check your endpoint's available models)
    • For vLLM/Ollama with InternVL3: OpenGVLab/InternVL3-8B-AWQ
    • For OpenAI API: gpt-4o, gpt-4-turbo, etc.
    • For SGLang: Use the exact model name from your deployment
  • VIDEO_FORMAT - How to send video content (default: 'openai')
    • openai - Standard format (use for InternVL3, Qwen2-VL, GPT-4o, vLLM, Ollama)
    • vllm - Experimental vLLM-specific format (try if 'openai' fails)
    • skip - Don't send video (use for endpoints without video support)
    • error - Fail if video is present
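
For reference, a complete .env for the hosted OpenAI API might look like this (the key below is a placeholder; substitute your own):

OPENAI_ENDPOINT=https://api.openai.com/v1/chat/completions
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4o
VIDEO_FORMAT=openai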

Docker Networking

The proxy uses the llm_internal Docker network for communication with other services:

Communication between containers on same network:

# If your vLLM/Ollama server is also on llm_internal network
OPENAI_ENDPOINT=http://vllm-container-name:8000/v1/chat/completions

Communication with host services:

# For services running on your host machine (not in Docker)
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions

Port mapping:

  • Host: localhost:5005
  • Container: port 5000
  • Access web UI: http://localhost:5005
  • Send requests to: http://localhost:5005/webhook/*

Troubleshooting

SGLang Image Processing Errors

If you see errors like "cannot identify image file" with SGLang:

  1. Check the model name: SGLang requires the exact model name from your deployment

    # Check available models
    curl http://localhost:8000/v1/models
       
    # Set in .env
    OPENAI_MODEL=your-actual-model-name
    
  2. Verify base64 encoding: Check the console logs for "Base64 data appears valid"

    • If validation fails, the base64 data from your app might be corrupted
  3. Test with a simple image: Try a small test image first (see the example after this list)

    # Create test image and convert to base64
    echo "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" > test.b64
    
  4. Check SGLang logs: Look for more detailed errors in your SGLang server logs

  5. Model compatibility: Ensure your SGLang model supports vision/multimodal inputs

    • Not all models work with images
    • Check your model's documentation
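
Once the test file from step 3 exists, you can push it through the proxy with the same curl pattern as the earlier examples (a sketch):

# Send the tiny test image from step 3 through the proxy
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "What is in this image?"},
        {"inline_data": {"mime_type": "image/png", "data": "'$(cat test.b64)'"}}
      ]
    }]
  }'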

Connection Issues

If the proxy can't reach your endpoint:

  1. Check if your endpoint is running: curl http://localhost:8000/v1/models
  2. Verify the endpoint URL in .env
  3. Make sure you're using host.docker.internal for host services
  4. Check Docker logs: docker-compose logs -f
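
If your endpoint runs in another container on the llm_internal network, you can also test reachability from inside that network (curlimages/curl is just a convenient throwaway image; replace the hostname and port with your endpoint's):

docker run --rm --network llm_internal curlimages/curl http://vllm-container-name:8000/v1/models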

Supported Media Types

Images (✅ Always Supported)

All vision models support images. The proxy handles these formats:

  • image/jpeg
  • image/png
  • image/webp
  • image/gif
  • Any image MIME type

Console output when processing images:

🖼️  Adding image: image/jpeg
📊 Media summary:
   Images: 2 (image/jpeg, image/png)

Videos (⚠️ Model-Dependent)

Video support depends on your model and VIDEO_FORMAT setting:

Supported formats:

  • video/mp4
  • video/mpeg
  • video/webm
  • Any video MIME type

Console output when processing video:

# When VIDEO_FORMAT=openai (sending video)
📹 Adding video (OpenAI format): video/mp4
📊 Media summary:
   Videos: sent as openai (video/mp4)

# When VIDEO_FORMAT=skip (skipping video)
⏭️  Skipping video: video/mp4 (VIDEO_FORMAT=skip)
📊 Media summary:
   Videos: skipped (video/mp4)

Mixed Media Requests

You can send text + multiple images + video in a single request:

{
  "contents": [{
    "parts": [
      {"text": "Analyze these media files"},
      {"inline_data": {"mime_type": "image/jpeg", "data": "..."}},
      {"inline_data": {"mime_type": "image/png", "data": "..."}},
      {"inline_data": {"mime_type": "video/mp4", "data": "..."}}
    ]
  }]
}
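
This can be sent with the same curl pattern as the earlier examples, for instance (assuming the media files exist locally):

curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Analyze these media files"},
        {"inline_data": {"mime_type": "image/jpeg", "data": "'$(base64 -w 0 image1.jpg)'"}},
        {"inline_data": {"mime_type": "image/png", "data": "'$(base64 -w 0 image2.png)'"}},
        {"inline_data": {"mime_type": "video/mp4", "data": "'$(base64 -w 0 video.mp4)'"}}
      ]
    }]
  }'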

Video Support by Runner/Model

✅ VIDEO SUPPORTED (use VIDEO_FORMAT=openai):

  • vLLM with video-capable models:
    • InternVL3-8B-AWQ ✅
    • Qwen2-VL series ✅
    • LLaVA-Video ✅
  • Ollama with video models:
    • InternVL ✅
    • Qwen2-VL ✅
  • OpenAI API:
    • gpt-4o ✅
    • gpt-4-turbo ✅

❌ VIDEO NOT SUPPORTED (use VIDEO_FORMAT=skip):

  • SGLang - Does not support video
  • Most text-only models - Even if served via vLLM/Ollama
  • Image-only vision models - Can only process images, not video

InternVL3-8B-AWQ Configuration

If you're using InternVL3-8B-AWQ (which does support video), your .env should be:

OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ
VIDEO_FORMAT=openai

Troubleshooting InternVL3 video:

  1. Verify your vLLM/Ollama server is serving InternVL3: curl http://localhost:8000/v1/models
  2. Check the model name exactly matches what the server reports
  3. If you get "cannot identify image" errors:
    • Your runner might not have video support enabled
    • Try VIDEO_FORMAT=vllm as an alternative
    • Check your vLLM/Ollama version supports video

How to test if your setup supports video:

  1. Set VIDEO_FORMAT=openai
  2. Send a test request with video
  3. Check the logs - if you see errors about "cannot identify image", try VIDEO_FORMAT=skip

Why video might fail:

  1. Model doesn't support video (only images)
  2. Runner doesn't support video
  3. Wrong VIDEO_FORMAT for your runner

Notes

  • Requests and responses are stored in memory only (not persisted)
  • Maximum 50 requests are kept in memory
  • The app listens on port 5000 inside the container (mapped to 5005 on the host by default)
  • Base64 data is truncated in the web UI for readability but fully sent to the endpoint