A Docker app that proxies Google Gemini API requests to OpenAI-compatible endpoints with full multimodal support.
Web UI: http://localhost:5005

Perfect for: using Gemini-format applications with local vision models (InternVL3, Qwen2-VL) or the OpenAI API.
If you don't already have the llm_internal network:
docker network create llm_internal
Create a .env file (or copy from .env.example):
cp .env.example .env
Edit .env to set your OpenAI-compatible endpoint:
# Your OpenAI-compatible endpoint
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
# API key if required
OPENAI_API_KEY=none
# Model name - IMPORTANT: Use the exact model name
# For InternVL3-8B-AWQ with vLLM/Ollama:
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ
# Video format - Use 'openai' for video-capable models
# (InternVL3, Qwen2-VL, GPT-4o, etc.)
VIDEO_FORMAT=openai
For InternVL3-8B-AWQ: This model DOES support video - use VIDEO_FORMAT=openai
Note: If your other services are on the same llm_internal Docker network, you can use their container names instead of host.docker.internal. For example: http://vllm-server:8000/v1/chat/completions
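For reference, these variables are read from the environment at startup. The snippet below is only a sketch of how app.py might load them; the defaults shown are assumptions, apart from VIDEO_FORMAT, which defaults to 'openai'.

# Sketch only: the real configuration loading lives in app.py (not shown here).
import os

OPENAI_ENDPOINT = os.environ.get("OPENAI_ENDPOINT", "http://host.docker.internal:8000/v1/chat/completions")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "none")
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "OpenGVLab/InternVL3-8B-AWQ")
VIDEO_FORMAT = os.environ.get("VIDEO_FORMAT", "openai")  # openai | vllm | skip | error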
docker-compose up --build
docker build -t post-monitor .
docker run -p 5005:5000 --network llm_internal post-monitor
View the Web UI: Open http://localhost:5005 in your browser
Configure your application to send Gemini-format requests to:
http://localhost:5005/webhook/models/model:generateContent?key=none

Any path under /webhook/ is accepted. The proxy will convert the Gemini-format request to OpenAI format, forward it to your configured endpoint, convert the response back to Gemini format, and record the exchange in the web UI.
The web interface will automatically refresh every 2 seconds to show new requests.
# Text-only request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{
"text": "Explain quantum computing in simple terms"
}]
}],
"generationConfig": {
"maxOutputTokens": 4096
}
}'
# With image (base64)
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Describe this image"},
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "'$(base64 -w 0 image.jpg)'"
}
}
]
}]
}'
# With video (base64) - Requires VIDEO_FORMAT=openai
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Describe this video"},
{
"inline_data": {
"mime_type": "video/mp4",
"data": "'$(base64 -w 0 video.mp4)'"
}
}
]
}],
"generationConfig": {
"maxOutputTokens": 4096
}
}'
# Multiple images in one request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Compare these two images"},
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "'$(base64 -w 0 image1.jpg)'"
}
},
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "'$(base64 -w 0 image2.jpg)'"
}
}
]
}]
}'
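The same requests can be sent from Python. The sketch below mirrors the single-image curl example above; the requests library and the file path are client-side assumptions, not part of the proxy.

# Python equivalent of the single-image curl example above (illustrative sketch).
import base64
import requests

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "contents": [{
        "parts": [
            {"text": "Describe this image"},
            {"inline_data": {"mime_type": "image/jpeg", "data": image_b64}},
        ]
    }],
    "generationConfig": {"maxOutputTokens": 4096},
}

resp = requests.post(
    "http://localhost:5005/webhook/models/model:generateContent",
    params={"key": "none"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
# Response comes back in Gemini format (see the mapping below).
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])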
All requests under /webhook/ are proxied, and the llm_internal network is used for container communication. Gemini request fields map to OpenAI fields as follows:

Text:
contents[].parts[].text → messages[].content (text type)

Images (all formats supported):
contents[].parts[].inline_data (image/jpeg, image/png, etc.) → messages[].content (image_url type) as data:image/jpeg;base64,{base64_data}

Videos (format depends on VIDEO_FORMAT):
contents[].parts[].inline_data (video/mp4, etc.)
VIDEO_FORMAT=openai: → messages[].content (image_url type with video MIME)
VIDEO_FORMAT=vllm: → messages[].content (video_url type)
VIDEO_FORMAT=skip: → replaced with a text note

Generation Config:
generationConfig.maxOutputTokens → max_tokens
generationConfig.temperature → temperature

Response:
choices[0].message.content → candidates[0].content.parts[0].text

Usage:
usage → usageMetadata
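As a rough illustration of this mapping (not the proxy's actual code), a Gemini request body translates to an OpenAI chat-completion payload like this:

# Simplified illustration of the mapping above; not the proxy's actual code.
def gemini_to_openai(gemini_body: dict, model: str) -> dict:
    content = []
    for part in gemini_body["contents"][0]["parts"]:
        if "text" in part:
            content.append({"type": "text", "text": part["text"]})
        elif "inline_data" in part:
            blob = part["inline_data"]
            data_url = f"data:{blob['mime_type']};base64,{blob['data']}"
            # Images (and videos when VIDEO_FORMAT=openai) become image_url parts.
            content.append({"type": "image_url", "image_url": {"url": data_url}})

    gen = gemini_body.get("generationConfig", {})
    openai_body = {"model": model, "messages": [{"role": "user", "content": content}]}
    if "maxOutputTokens" in gen:
        openai_body["max_tokens"] = gen["maxOutputTokens"]
    if "temperature" in gen:
        openai_body["temperature"] = gen["temperature"]
    return openai_body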
This proxy works with any OpenAI-compatible endpoint (vLLM, Ollama, SGLang, OpenAI, etc.).

Endpoints:
GET / - Web UI to view all requests, conversions, and responses
POST /webhook/* - Main proxy endpoint (converts Gemini → OpenAI → Gemini)
POST /clear - Clear all stored requests

Environment variables:
OPENAI_ENDPOINT - Your OpenAI-compatible endpoint URL
OPENAI_API_KEY - API key if required (use 'none' if not needed)
OPENAI_MODEL - Model name to use (check your endpoint's available models), e.g. OpenGVLab/InternVL3-8B-AWQ, gpt-4o, gpt-4-turbo, etc.
VIDEO_FORMAT - How to send video content (default: 'openai')
  openai - Standard format (use for InternVL3, Qwen2-VL, GPT-4o, vLLM, Ollama)
  vllm - Experimental vLLM-specific format (try if 'openai' fails)
  skip - Don't send video (use for endpoints without video support)
  error - Fail if video is present

The proxy uses the llm_internal Docker network for communication with other services:
Communication between containers on same network:
# If your vLLM/Ollama server is also on llm_internal network
OPENAI_ENDPOINT=http://vllm-container-name:8000/v1/chat/completions
Communication with host services:
# For services running on your host machine (not in Docker)
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
Port mapping:
Host localhost:5005 maps to container port 5000
Web UI: http://localhost:5005
Webhook endpoint: http://localhost:5005/webhook/*

If you see errors like "cannot identify image file" with SGLang:
Check the model name: SGLang requires the exact model name from your deployment
# Check available models
curl http://localhost:8000/v1/models
# Set in .env
OPENAI_MODEL=your-actual-model-name
Verify base64 encoding: Check the console logs for "Base64 data appears valid" (a local decoding check is sketched after this list)
Test with simple image: Try with a small test image first
# Create test image and convert to base64
echo "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" > test.b64
Check SGLang logs: Look for more detailed errors in your SGLang server logs
Model compatibility: Ensure your SGLang model supports vision/multimodal inputs
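To rule out a corrupt payload on your side, a quick local check like the sketch below can confirm the data decodes to a real image before it ever reaches the proxy. It uses Pillow (the wording of the "cannot identify image file" error matches Pillow's Image.open), and the test.b64 file is the one created above.

# Local sanity check: confirm a base64 payload decodes to a readable image
# before blaming the endpoint. Requires Pillow (pip install pillow).
import base64
import io
from PIL import Image

def check_image_b64(b64_data: str) -> None:
    raw = base64.b64decode(b64_data, validate=True)  # raises binascii.Error on malformed base64
    img = Image.open(io.BytesIO(raw))                # raises "cannot identify image file" if not an image
    print(f"OK: {img.format}, {img.size[0]}x{img.size[1]} pixels")

with open("test.b64") as f:
    check_image_b64(f.read().strip())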
If the proxy can't reach your endpoint:
Test the endpoint directly: curl http://localhost:8000/v1/models
Double-check the OPENAI_ENDPOINT value in .env
Use host.docker.internal for host services
Check the proxy logs: docker-compose logs -f

All vision models support images. The proxy handles these formats:
image/jpeg, image/png, image/webp, image/gif

Console output when processing images:
🖼️ Adding image: image/jpeg
📊 Media summary:
Images: 2 (image/jpeg, image/png)
Video support depends on your model and VIDEO_FORMAT setting:
Supported formats:
video/mp4, video/mpeg, video/webm

Console output when processing video:
# When VIDEO_FORMAT=openai (sending video)
📹 Adding video (OpenAI format): video/mp4
📊 Media summary:
Videos: sent as openai (video/mp4)
# When VIDEO_FORMAT=skip (skipping video)
⏭️ Skipping video: video/mp4 (VIDEO_FORMAT=skip)
📊 Media summary:
Videos: skipped (video/mp4)
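The VIDEO_FORMAT options behave roughly as sketched below. This is an illustration of the behaviour described above, not the proxy's actual code, and the skip placeholder text is hypothetical.

# Illustration of the VIDEO_FORMAT options; not the proxy's actual code.
def convert_video_part(blob: dict, video_format: str) -> dict:
    data_url = f"data:{blob['mime_type']};base64,{blob['data']}"
    if video_format == "openai":
        return {"type": "image_url", "image_url": {"url": data_url}}  # image_url with video MIME
    if video_format == "vllm":
        return {"type": "video_url", "video_url": {"url": data_url}}  # experimental vLLM format
    if video_format == "skip":
        return {"type": "text", "text": "[video omitted]"}            # replaced with a text note
    raise ValueError(f"Video present but VIDEO_FORMAT={video_format}")  # 'error' (or unknown) setting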
You can send text + multiple images + video in a single request:
{
"contents": [{
"parts": [
{"text": "Analyze these media files"},
{"inline_data": {"mime_type": "image/jpeg", "data": "..."}},
{"inline_data": {"mime_type": "image/png", "data": "..."}},
{"inline_data": {"mime_type": "video/mp4", "data": "..."}}
]
}]
}
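The "..." placeholders above stand for base64 data. A helper like the sketch below can build such a parts list from local files; the file names are placeholders.

# Illustrative helper: build a mixed-media parts list from local files.
import base64

def inline_part(path: str, mime_type: str) -> dict:
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": data}}

payload = {
    "contents": [{
        "parts": [
            {"text": "Analyze these media files"},
            inline_part("image1.jpg", "image/jpeg"),
            inline_part("image2.png", "image/png"),
            inline_part("clip.mp4", "video/mp4"),
        ]
    }]
}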
✅ VIDEO SUPPORTED (use VIDEO_FORMAT=openai): InternVL3, Qwen2-VL, GPT-4o, and other video-capable models
❌ VIDEO NOT SUPPORTED (use VIDEO_FORMAT=skip): endpoints/models without video input support
If you're using InternVL3-8B-AWQ (which does support video), your .env should be:
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ
VIDEO_FORMAT=openai
Troubleshooting InternVL3 video:
Confirm the model is loaded: curl http://localhost:8000/v1/models
Try VIDEO_FORMAT=vllm as an alternative

How to test if your setup supports video:
Set VIDEO_FORMAT=openai and send the video curl example above; if the request errors out, fall back to VIDEO_FORMAT=skip.

Why video might fail:
The model behind your endpoint does not accept video input
OPENAI_MODEL does not match the model actually loaded on the endpoint
The endpoint expects a different video format (try VIDEO_FORMAT=vllm)