# POST Request Monitor & OpenAI Proxy 📬

A Docker app that proxies Google Gemini API requests to OpenAI-compatible endpoints with full multimodal support.

## What It Does

1. **Monitors POST requests** - Real-time web UI at `http://localhost:5005`
2. **Gemini → OpenAI Conversion** - Automatically converts API formats
3. **Multimodal Support** - Handles text, images, and video
4. **Request Forwarding** - Sends to your OpenAI-compatible endpoint (vLLM, Ollama, OpenAI, etc.)
5. **Response Conversion** - Converts OpenAI responses back to Gemini format

**Perfect for:** Using Gemini-format applications with local vision models (InternVL3, Qwen2-VL) or the OpenAI API.

## Multimodal Capabilities

- ✅ **Images**: All formats (JPEG, PNG, WebP, etc.) - universally supported
- ✅ **Video**: MP4, WebM, etc. - supported by video-capable models (InternVL3, Qwen2-VL, GPT-4o)
- ✅ **Multiple media**: Send multiple images/videos in a single request
- ✅ **Mixed content**: Text + images + video together

## Quick Start

### 1. Create the Docker Network

If you don't already have the `llm_internal` network:

```bash
docker network create llm_internal
```

### 2. Configure Environment Variables

Create a `.env` file (or copy from `.env.example`):

```bash
cp .env.example .env
```

Edit `.env` to set your OpenAI-compatible endpoint:

```bash
# Your OpenAI-compatible endpoint
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions

# API key if required
OPENAI_API_KEY=none

# Model name - IMPORTANT: Use the exact model name
# For InternVL3-8B-AWQ with vLLM/Ollama:
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ

# Video format - Use 'openai' for video-capable models
# (InternVL3, Qwen2-VL, GPT-4o, etc.)
VIDEO_FORMAT=openai
```

**For InternVL3-8B-AWQ**: This model **does support video** - use `VIDEO_FORMAT=openai`.

**Note**: If your other services are on the same `llm_internal` Docker network, you can use their container names instead of `host.docker.internal`. For example: `http://vllm-server:8000/v1/chat/completions`

### 3. Start the Service

#### Using Docker Compose (Recommended)

```bash
docker-compose up --build
```

#### Using Docker

```bash
docker build -t post-monitor .
docker run -p 5005:5000 --network llm_internal post-monitor
```

## Usage

1. **View the Web UI**: Open http://localhost:5005 in your browser
   - See all incoming requests
   - View the converted OpenAI format
   - See responses from your endpoint
2. **Configure your application** to send Gemini-format requests to:
   - `http://localhost:5005/webhook/models/model:generateContent?key=none`
   - Or any path under `/webhook/`
3. **The proxy will**:
   - Extract text, images, and videos from the Gemini format
   - Convert them to OpenAI format
   - Forward the request to your OpenAI-compatible endpoint
   - Convert the OpenAI response back to Gemini format
   - Return it to your application

The web interface automatically refreshes every 2 seconds to show new requests.
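If scripting is more convenient than curl, the same kind of request can be sent from Python. This is a minimal sketch, assuming the proxy is running with the defaults above and that your endpoint returns a normal completion; the only dependency is the `requests` library:

```python
import requests

# Gemini-format request body, exactly as a Gemini client would send it
payload = {
    "contents": [
        {"parts": [{"text": "Explain quantum computing in simple terms"}]}
    ],
    "generationConfig": {"maxOutputTokens": 4096},
}

# Any path under /webhook/ works; this mirrors the Gemini generateContent route
url = "http://localhost:5005/webhook/models/model:generateContent"
resp = requests.post(url, params={"key": "none"}, json=payload, timeout=120)
resp.raise_for_status()

# The proxy converts the OpenAI response back to Gemini format,
# so the answer comes back under candidates[0].content.parts[0].text
data = resp.json()
print(data["candidates"][0]["content"]["parts"][0]["text"])
```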
## Example Requests

### Text + Image + Video (Gemini Format)

```bash
# Text-only request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{ "text": "Explain quantum computing in simple terms" }]
    }],
    "generationConfig": { "maxOutputTokens": 4096 }
  }'

# With image (base64)
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Describe this image"},
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image.jpg)'"
          }
        }
      ]
    }]
  }'

# With video (base64) - requires VIDEO_FORMAT=openai
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Describe this video"},
        {
          "inline_data": {
            "mime_type": "video/mp4",
            "data": "'$(base64 -w 0 video.mp4)'"
          }
        }
      ]
    }],
    "generationConfig": { "maxOutputTokens": 4096 }
  }'

# Multiple images in one request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Compare these two images"},
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image1.jpg)'"
          }
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image2.jpg)'"
          }
        }
      ]
    }]
  }'
```

## Features

- ✅ **Gemini → OpenAI Format Conversion**: Automatically converts API formats
- ✅ **Image Support**: Full support for images (JPEG, PNG, WebP, etc.)
- ✅ **Video Support**: Configurable video handling for video-capable models
- ✅ **Request Forwarding**: Proxies to your OpenAI-compatible endpoint
- ✅ **Response Conversion**: Converts OpenAI responses back to Gemini format
- ✅ **Real-time Monitoring**: Web UI shows all requests, conversions, and responses
- ✅ **Multiple Media**: Handles multiple images/videos in a single request
- ✅ **Catch-all Routes**: Accepts any path under `/webhook/`
- ✅ **Auto-refresh UI**: Updates every 2 seconds
- ✅ **Error Handling**: Shows detailed errors for debugging
- ✅ **Request History**: Stores the last 50 requests in memory
- ✅ **Docker Network**: Uses the `llm_internal` network for container communication

## Format Conversion Details

### Gemini → OpenAI

**Text:**
- `contents[].parts[].text` → `messages[].content` (text type)

**Images (all formats supported):**
- `contents[].parts[].inline_data` (image/jpeg, image/png, etc.)
- → `messages[].content` (image_url type)
- Format: `data:image/jpeg;base64,{base64_data}`
- ✅ Universally supported by all vision models

**Videos (format depends on VIDEO_FORMAT):**
- `contents[].parts[].inline_data` (video/mp4, etc.)
- When `VIDEO_FORMAT=openai`: → `messages[].content` (image_url type with video MIME)
- When `VIDEO_FORMAT=vllm`: → `messages[].content` (video_url type)
- When `VIDEO_FORMAT=skip`: → replaced with a text note
- ⚠️ Only supported by video-capable models (InternVL3, Qwen2-VL, GPT-4o, etc.)

**Generation Config:**
- `generationConfig.maxOutputTokens` → `max_tokens`
- `generationConfig.temperature` → `temperature`
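To make the mapping above concrete, here is a minimal sketch of the conversion logic in Python. It illustrates the rules listed above; it is not the app's actual source, and the function names are made up for the example:

```python
def gemini_parts_to_openai_content(parts, video_format="openai"):
    """Convert one Gemini `parts` list into an OpenAI `content` list,
    following the mapping described above (illustrative only)."""
    content = []
    for part in parts:
        if "text" in part:
            # Text part -> OpenAI text item
            content.append({"type": "text", "text": part["text"]})
        elif "inline_data" in part:
            mime = part["inline_data"]["mime_type"]
            data_url = f"data:{mime};base64,{part['inline_data']['data']}"
            if mime.startswith("image/"):
                # Images always become image_url items
                content.append({"type": "image_url", "image_url": {"url": data_url}})
            elif mime.startswith("video/"):
                if video_format == "openai":
                    # Standard format: image_url item carrying a video MIME type
                    content.append({"type": "image_url", "image_url": {"url": data_url}})
                elif video_format == "vllm":
                    # Experimental vLLM-specific format
                    content.append({"type": "video_url", "video_url": {"url": data_url}})
                elif video_format == "skip":
                    # Don't send the video; leave a text note instead
                    content.append({"type": "text", "text": f"[video {mime} omitted]"})
                else:  # "error"
                    raise ValueError("video content present but VIDEO_FORMAT=error")
    return content


def gemini_to_openai_request(gemini_body, model, video_format="openai"):
    """Build an OpenAI chat-completions payload from a Gemini request body
    (first content block only, for brevity)."""
    parts = gemini_body["contents"][0]["parts"]
    gen = gemini_body.get("generationConfig", {})
    payload = {
        "model": model,
        "messages": [
            {"role": "user",
             "content": gemini_parts_to_openai_content(parts, video_format)}
        ],
    }
    if "maxOutputTokens" in gen:
        payload["max_tokens"] = gen["maxOutputTokens"]
    if "temperature" in gen:
        payload["temperature"] = gen["temperature"]
    return payload
```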
### OpenAI → Gemini

**Response:**
- `choices[0].message.content` → `candidates[0].content.parts[0].text`

**Usage:**
- `usage` → `usageMetadata`

## Compatible Endpoints

This proxy works with any OpenAI-compatible endpoint:

- **OpenAI API** (api.openai.com)
- **Local LLMs** (LM Studio, Ollama with OpenAI compatibility, Jan)
- **vLLM** deployments
- **Text Generation WebUI** (with OpenAI extension)
- **LocalAI**
- **Any other OpenAI-compatible API**

## Endpoints

- `GET /` - Web UI to view all requests, conversions, and responses
- `POST /webhook/*` - Main proxy endpoint (converts Gemini → OpenAI → Gemini)
- `POST /clear` - Clear all stored requests

## Configuration

### Environment Variables

- `OPENAI_ENDPOINT` - Your OpenAI-compatible endpoint URL
- `OPENAI_API_KEY` - API key if required (use `none` if not needed)
- `OPENAI_MODEL` - Model name to use (check your endpoint's available models)
  - For vLLM/Ollama with InternVL3: `OpenGVLab/InternVL3-8B-AWQ`
  - For the OpenAI API: `gpt-4o`, `gpt-4-turbo`, etc.
  - For SGLang: use the exact model name from your deployment
- `VIDEO_FORMAT` - How to send video content (default: `openai`)
  - `openai` - Standard format (use for InternVL3, Qwen2-VL, GPT-4o, vLLM, Ollama)
  - `vllm` - Experimental vLLM-specific format (try if `openai` fails)
  - `skip` - Don't send video (use for endpoints without video support)
  - `error` - Fail if video is present

### Docker Networking

The proxy uses the `llm_internal` Docker network to communicate with other services.

**Communication between containers on the same network:**

```bash
# If your vLLM/Ollama server is also on the llm_internal network
OPENAI_ENDPOINT=http://vllm-container-name:8000/v1/chat/completions
```

**Communication with host services:**

```bash
# For services running on your host machine (not in Docker)
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
```

**Port mapping:**

- Host: `localhost:5005` → Container: port `5000`
- Access the web UI: `http://localhost:5005`
- Send requests to: `http://localhost:5005/webhook/*`

## Troubleshooting

### SGLang Image Processing Errors

If you see errors like "cannot identify image file" with SGLang:

1. **Check the model name**: SGLang requires the exact model name from your deployment
   ```bash
   # Check available models
   curl http://localhost:8000/v1/models

   # Set in .env
   OPENAI_MODEL=your-actual-model-name
   ```
2. **Verify base64 encoding**: Check the console logs for "Base64 data appears valid"
   - If validation fails, the base64 data from your app might be corrupted
3. **Test with a simple image**: Try a small test image first
   ```bash
   # Save a tiny 1x1 PNG (already base64-encoded) to use as test data
   echo "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" > test.b64
   ```
4. **Check SGLang logs**: Look for more detailed errors in your SGLang server logs
5. **Model compatibility**: Ensure your SGLang model supports vision/multimodal inputs
   - Not all models work with images - check your model's documentation

### Connection Issues

If the proxy can't reach your endpoint:

1. Check that your endpoint is running: `curl http://localhost:8000/v1/models`
2. Verify the endpoint URL in `.env`
3. Make sure you're using `host.docker.internal` for host services
4. Check the Docker logs: `docker-compose logs -f`
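If you suspect the base64 payload itself (step 2 of the SGLang checklist above), a quick offline check can rule it out before digging into server logs. This is a hypothetical helper, not part of the app; it only verifies that the data decodes and starts with a known image signature:

```python
import base64

def looks_like_valid_image_b64(b64_string: str) -> str:
    """Decode a base64 payload and report the likely image type from its
    magic bytes. Raises binascii.Error if the string is not valid base64."""
    raw = base64.b64decode(b64_string, validate=True)
    if raw.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "image/webp"
    return "unknown (valid base64, but not an obvious JPEG/PNG/WebP)"

# Example: check the test file created in step 3 above
with open("test.b64") as f:
    print(looks_like_valid_image_b64(f.read().strip()))
```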
## Supported Media Types

### Images (✅ Always Supported)

All vision models support images. The proxy handles these formats:

- `image/jpeg`
- `image/png`
- `image/webp`
- `image/gif`
- Any image MIME type

**Console output when processing images:**

```
🖼️ Adding image: image/jpeg
📊 Media summary: Images: 2 (image/jpeg, image/png)
```

### Videos (⚠️ Model-Dependent)

Video support depends on your model and the `VIDEO_FORMAT` setting.

**Supported formats:**

- `video/mp4`
- `video/mpeg`
- `video/webm`
- Any video MIME type

**Console output when processing video:**

```
# When VIDEO_FORMAT=openai (sending video)
📹 Adding video (OpenAI format): video/mp4
📊 Media summary: Videos: sent as openai (video/mp4)

# When VIDEO_FORMAT=skip (skipping video)
⏭️ Skipping video: video/mp4 (VIDEO_FORMAT=skip)
📊 Media summary: Videos: skipped (video/mp4)
```

### Mixed Media Requests

You can send text + multiple images + video in a single request:

```json
{
  "contents": [{
    "parts": [
      {"text": "Analyze these media files"},
      {"inline_data": {"mime_type": "image/jpeg", "data": "..."}},
      {"inline_data": {"mime_type": "image/png", "data": "..."}},
      {"inline_data": {"mime_type": "video/mp4", "data": "..."}}
    ]
  }]
}
```

## Video Support by Runner/Model

**✅ VIDEO SUPPORTED** (use `VIDEO_FORMAT=openai`):

- **vLLM** with video-capable models:
  - InternVL3-8B-AWQ ✅
  - Qwen2-VL series ✅
  - LLaVA-Video ✅
- **Ollama** with video models:
  - InternVL ✅
  - Qwen2-VL ✅
- **OpenAI API**:
  - gpt-4o ✅
  - gpt-4-turbo ✅

**❌ VIDEO NOT SUPPORTED** (use `VIDEO_FORMAT=skip`):

- **SGLang** - Does not support video
- **Most text-only models** - Even if served via vLLM/Ollama
- **Image-only vision models** - Can only process images, not video

### InternVL3-8B-AWQ Configuration

If you're using InternVL3-8B-AWQ (which **does support video**), your `.env` should be:

```bash
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ
VIDEO_FORMAT=openai
```

**Troubleshooting InternVL3 video:**

1. Verify that your vLLM/Ollama server is serving InternVL3: `curl http://localhost:8000/v1/models`
2. Check that the model name exactly matches what the server reports
3. If you get "cannot identify image" errors:
   - Your runner might not have video support enabled
   - Try `VIDEO_FORMAT=vllm` as an alternative
   - Check that your vLLM/Ollama version supports video

**How to test whether your setup supports video:**

1. Set `VIDEO_FORMAT=openai`
2. Send a test request with video
3. Check the logs - if you see errors about "cannot identify image", try `VIDEO_FORMAT=skip`

**Why video might fail:**

1. The model doesn't support video (only images)
2. The runner doesn't support video
3. The wrong `VIDEO_FORMAT` is set for your runner

## Notes

- Requests and responses are stored in memory only (not persisted)
- A maximum of 50 requests is kept in memory
- The app runs on port 5000 inside the container by default (mapped to 5005 on the host)
- Base64 data is truncated in the web UI for readability but is sent to the endpoint in full
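Finally, encoding large media files inline with `$(base64 -w 0 ...)` can get unwieldy. This hypothetical Python helper (not part of the app; the file names are placeholders) builds the mixed-media request shown under "Mixed Media Requests" from local files and sends it through the proxy:

```python
import base64
import mimetypes
import requests

def file_part(path: str) -> dict:
    """Build a Gemini inline_data part from a local image or video file."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {"inline_data": {"mime_type": mime or "application/octet-stream",
                            "data": data}}

payload = {
    "contents": [{
        "parts": [
            {"text": "Analyze these media files"},
            file_part("photo1.jpg"),   # placeholder file names
            file_part("photo2.png"),
            file_part("clip.mp4"),     # requires VIDEO_FORMAT=openai (or vllm)
        ]
    }],
    "generationConfig": {"maxOutputTokens": 4096},
}

resp = requests.post(
    "http://localhost:5005/webhook/models/model:generateContent",
    params={"key": "none"},
    json=payload,
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```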