# POST Request Monitor & OpenAI Proxy 📬

A Docker app that proxies Google Gemini API requests to OpenAI-compatible endpoints with full multimodal support.

## What It Does

1. **Monitors POST requests** - Real-time web UI at `http://localhost:5005`
2. **Gemini → OpenAI Conversion** - Automatically converts API formats
3. **Multimodal Support** - Handles text, images, and video
4. **Request Forwarding** - Sends to your OpenAI-compatible endpoint (vLLM, Ollama, OpenAI, etc.)
5. **Response Conversion** - Converts OpenAI responses back to Gemini format

**Perfect for:** Using Gemini-format applications with local vision models (InternVL3, Qwen2-VL) or the OpenAI API.

## Multimodal Capabilities

- ✅ **Images**: All formats (JPEG, PNG, WebP, etc.) - universally supported
- ✅ **Video**: MP4, WebM, etc. - supported by video-capable models (InternVL3, Qwen2-VL, GPT-4o)
- ✅ **Multiple media**: Send multiple images/videos in a single request
- ✅ **Mixed content**: Text + images + video together

## Quick Start

### 1. Create the Docker Network

If you don't already have the `llm_internal` network:

```bash
docker network create llm_internal
```

### 2. Configure Environment Variables

Create a `.env` file (or copy from `.env.example`):

```bash
cp .env.example .env
```

Edit `.env` to set your OpenAI-compatible endpoint:

```bash
# Your OpenAI-compatible endpoint
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions

# API key if required
OPENAI_API_KEY=none

# Model name - IMPORTANT: Use the exact model name
# For InternVL3-8B-AWQ with vLLM/Ollama:
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ

# Video format - Use 'openai' for video-capable models
# (InternVL3, Qwen2-VL, GPT-4o, etc.)
VIDEO_FORMAT=openai
```

**For InternVL3-8B-AWQ**: This model **does support video** - use `VIDEO_FORMAT=openai`.

**Note**: If your other services are on the same `llm_internal` Docker network, you can use their container names instead of `host.docker.internal`. For example: `http://vllm-server:8000/v1/chat/completions`

### 3. Start the Service

#### Using Docker Compose (Recommended)

```bash
docker-compose up --build
```

#### Using Docker

```bash
docker build -t post-monitor .
docker run -p 5005:5000 --network llm_internal post-monitor
```

## Usage

1. **View the Web UI**: Open http://localhost:5005 in your browser
   - See all incoming requests
   - View the converted OpenAI format
   - See responses from your endpoint
2. **Configure your application** to send Gemini-format requests to:
   - `http://localhost:5005/webhook/models/model:generateContent?key=none`
   - Or any path under `/webhook/`
3. **The proxy will**:
   - Extract text, images, and videos from the Gemini format
   - Convert them to OpenAI format
   - Forward the request to your OpenAI-compatible endpoint
   - Convert the OpenAI response back to Gemini format
   - Return it to your application

The web interface automatically refreshes every 2 seconds to show new requests.
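If scripting is more convenient than curl, the same kind of request can be sent from Python. This is a minimal sketch, assuming the proxy is running with the defaults above and that your endpoint returns a normal completion; the only dependency is the `requests` library:

```python
import requests

# Gemini-format request body, exactly as a Gemini client would send it
payload = {
    "contents": [
        {"parts": [{"text": "Explain quantum computing in simple terms"}]}
    ],
    "generationConfig": {"maxOutputTokens": 4096},
}

# Any path under /webhook/ works; this mirrors the Gemini generateContent route
url = "http://localhost:5005/webhook/models/model:generateContent"
resp = requests.post(url, params={"key": "none"}, json=payload, timeout=120)
resp.raise_for_status()

# The proxy converts the OpenAI response back to Gemini format,
# so the answer comes back under candidates[0].content.parts[0].text
data = resp.json()
print(data["candidates"][0]["content"]["parts"][0]["text"])
```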
## Example Requests

### Text + Image + Video (Gemini Format)

```bash
# Text-only request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{ "text": "Explain quantum computing in simple terms" }]
    }],
    "generationConfig": { "maxOutputTokens": 4096 }
  }'

# With image (base64)
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Describe this image"},
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image.jpg)'"
          }
        }
      ]
    }]
  }'

# With video (base64) - requires VIDEO_FORMAT=openai
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Describe this video"},
        {
          "inline_data": {
            "mime_type": "video/mp4",
            "data": "'$(base64 -w 0 video.mp4)'"
          }
        }
      ]
    }],
    "generationConfig": { "maxOutputTokens": 4096 }
  }'

# Multiple images in one request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Compare these two images"},
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image1.jpg)'"
          }
        },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "'$(base64 -w 0 image2.jpg)'"
          }
        }
      ]
    }]
  }'
```

## Features

- ✅ **Gemini → OpenAI Format Conversion**: Automatically converts API formats
- ✅ **Image Support**: Full support for images (JPEG, PNG, WebP, etc.)
- ✅ **Video Support**: Configurable video handling for video-capable models
- ✅ **Request Forwarding**: Proxies to your OpenAI-compatible endpoint
- ✅ **Response Conversion**: Converts OpenAI responses back to Gemini format
- ✅ **Real-time Monitoring**: Web UI shows all requests, conversions, and responses
- ✅ **Multiple Media**: Handles multiple images/videos in a single request
- ✅ **Catch-all Routes**: Accepts any path under `/webhook/`
- ✅ **Auto-refresh UI**: Updates every 2 seconds
- ✅ **Error Handling**: Shows detailed errors for debugging
- ✅ **Request History**: Stores the last 50 requests in memory
- ✅ **Docker Network**: Uses the `llm_internal` network for container communication

## Format Conversion Details

### Gemini → OpenAI

**Text:**
- `contents[].parts[].text` → `messages[].content` (text type)

**Images (all formats supported):**
- `contents[].parts[].inline_data` (image/jpeg, image/png, etc.)
- → `messages[].content` (image_url type)
- Format: `data:image/jpeg;base64,{base64_data}`
- ✅ Universally supported by all vision models

**Videos (format depends on VIDEO_FORMAT):**
- `contents[].parts[].inline_data` (video/mp4, etc.)
- When `VIDEO_FORMAT=openai`: → `messages[].content` (image_url type with video MIME)
- When `VIDEO_FORMAT=vllm`: → `messages[].content` (video_url type)
- When `VIDEO_FORMAT=skip`: → replaced with a text note
- ⚠️ Only supported by video-capable models (InternVL3, Qwen2-VL, GPT-4o, etc.)

**Generation Config:**
- `generationConfig.maxOutputTokens` → `max_tokens`
- `generationConfig.temperature` → `temperature`
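To make the mapping above concrete, here is a minimal sketch of the conversion logic in Python. It illustrates the rules listed above; it is not the app's actual source, and the function names are made up for the example:

```python
def gemini_parts_to_openai_content(parts, video_format="openai"):
    """Convert one Gemini `parts` list into an OpenAI `content` list,
    following the mapping described above (illustrative only)."""
    content = []
    for part in parts:
        if "text" in part:
            # Text part -> OpenAI text item
            content.append({"type": "text", "text": part["text"]})
        elif "inline_data" in part:
            mime = part["inline_data"]["mime_type"]
            data_url = f"data:{mime};base64,{part['inline_data']['data']}"
            if mime.startswith("image/"):
                # Images always become image_url items
                content.append({"type": "image_url", "image_url": {"url": data_url}})
            elif mime.startswith("video/"):
                if video_format == "openai":
                    # Standard format: image_url item carrying a video MIME type
                    content.append({"type": "image_url", "image_url": {"url": data_url}})
                elif video_format == "vllm":
                    # Experimental vLLM-specific format
                    content.append({"type": "video_url", "video_url": {"url": data_url}})
                elif video_format == "skip":
                    # Don't send the video; leave a text note instead
                    content.append({"type": "text", "text": f"[video {mime} omitted]"})
                else:  # "error"
                    raise ValueError("video content present but VIDEO_FORMAT=error")
    return content


def gemini_to_openai_request(gemini_body, model, video_format="openai"):
    """Build an OpenAI chat-completions payload from a Gemini request body
    (first content block only, for brevity)."""
    parts = gemini_body["contents"][0]["parts"]
    gen = gemini_body.get("generationConfig", {})
    payload = {
        "model": model,
        "messages": [
            {"role": "user",
             "content": gemini_parts_to_openai_content(parts, video_format)}
        ],
    }
    if "maxOutputTokens" in gen:
        payload["max_tokens"] = gen["maxOutputTokens"]
    if "temperature" in gen:
        payload["temperature"] = gen["temperature"]
    return payload
```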
### OpenAI → Gemini

**Response:**
- `choices[0].message.content` → `candidates[0].content.parts[0].text`

**Usage:**
- `usage` → `usageMetadata`

## Compatible Endpoints

This proxy works with any OpenAI-compatible endpoint:

- **OpenAI API** (api.openai.com)
- **Local LLMs** (LM Studio, Ollama with OpenAI compatibility, Jan)
- **vLLM** deployments
- **Text Generation WebUI** (with OpenAI extension)
- **LocalAI**
- **Any other OpenAI-compatible API**

## Endpoints

- `GET /` - Web UI to view all requests, conversions, and responses
- `POST /webhook/*` - Main proxy endpoint (converts Gemini → OpenAI → Gemini)
- `POST /clear` - Clear all stored requests

## Configuration

### Environment Variables

- `OPENAI_ENDPOINT` - Your OpenAI-compatible endpoint URL
- `OPENAI_API_KEY` - API key if required (use `none` if not needed)
- `OPENAI_MODEL` - Model name to use (check your endpoint's available models)
  - For vLLM/Ollama with InternVL3: `OpenGVLab/InternVL3-8B-AWQ`
  - For the OpenAI API: `gpt-4o`, `gpt-4-turbo`, etc.
  - For SGLang: use the exact model name from your deployment
- `VIDEO_FORMAT` - How to send video content (default: `openai`)
  - `openai` - Standard format (use for InternVL3, Qwen2-VL, GPT-4o, vLLM, Ollama)
  - `vllm` - Experimental vLLM-specific format (try if `openai` fails)
  - `skip` - Don't send video (use for endpoints without video support)
  - `error` - Fail if video is present

### Docker Networking

The proxy uses the `llm_internal` Docker network to communicate with other services.

**Communication between containers on the same network:**

```bash
# If your vLLM/Ollama server is also on the llm_internal network
OPENAI_ENDPOINT=http://vllm-container-name:8000/v1/chat/completions
```

**Communication with host services:**

```bash
# For services running on your host machine (not in Docker)
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
```

**Port mapping:**

- Host: `localhost:5005` → Container: port `5000`
- Access the web UI: `http://localhost:5005`
- Send requests to: `http://localhost:5005/webhook/*`

## Troubleshooting

### SGLang Image Processing Errors

If you see errors like "cannot identify image file" with SGLang:

1. **Check the model name**: SGLang requires the exact model name from your deployment
   ```bash
   # Check available models
   curl http://localhost:8000/v1/models

   # Set in .env
   OPENAI_MODEL=your-actual-model-name
   ```
2. **Verify base64 encoding**: Check the console logs for "Base64 data appears valid"
   - If validation fails, the base64 data from your app might be corrupted
3. **Test with a simple image**: Try a small test image first
   ```bash
   # Save a tiny 1x1 PNG (already base64-encoded) to use as test data
   echo "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" > test.b64
   ```
4. **Check SGLang logs**: Look for more detailed errors in your SGLang server logs
5. **Model compatibility**: Ensure your SGLang model supports vision/multimodal inputs
   - Not all models work with images - check your model's documentation

### Connection Issues

If the proxy can't reach your endpoint:

1. Check that your endpoint is running: `curl http://localhost:8000/v1/models`
2. Verify the endpoint URL in `.env`
3. Make sure you're using `host.docker.internal` for host services
4. Check the Docker logs: `docker-compose logs -f`
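If you suspect the base64 payload itself (step 2 of the SGLang checklist above), a quick offline check can rule it out before digging into server logs. This is a hypothetical helper, not part of the app; it only verifies that the data decodes and starts with a known image signature:

```python
import base64

def looks_like_valid_image_b64(b64_string: str) -> str:
    """Decode a base64 payload and report the likely image type from its
    magic bytes. Raises binascii.Error if the string is not valid base64."""
    raw = base64.b64decode(b64_string, validate=True)
    if raw.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "image/webp"
    return "unknown (valid base64, but not an obvious JPEG/PNG/WebP)"

# Example: check the test file created in step 3 above
with open("test.b64") as f:
    print(looks_like_valid_image_b64(f.read().strip()))
```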
## Supported Media Types

### Images (✅ Always Supported)

All vision models support images. The proxy handles these formats:

- `image/jpeg`
- `image/png`
- `image/webp`
- `image/gif`
- Any image MIME type

**Console output when processing images:**

```
🖼️ Adding image: image/jpeg
📊 Media summary: Images: 2 (image/jpeg, image/png)
```

### Videos (⚠️ Model-Dependent)

Video support depends on your model and the `VIDEO_FORMAT` setting.

**Supported formats:**

- `video/mp4`
- `video/mpeg`
- `video/webm`
- Any video MIME type

**Console output when processing video:**

```
# When VIDEO_FORMAT=openai (sending video)
📹 Adding video (OpenAI format): video/mp4
📊 Media summary: Videos: sent as openai (video/mp4)

# When VIDEO_FORMAT=skip (skipping video)
⏭️ Skipping video: video/mp4 (VIDEO_FORMAT=skip)
📊 Media summary: Videos: skipped (video/mp4)
```

### Mixed Media Requests

You can send text + multiple images + video in a single request:

```json
{
  "contents": [{
    "parts": [
      {"text": "Analyze these media files"},
      {"inline_data": {"mime_type": "image/jpeg", "data": "..."}},
      {"inline_data": {"mime_type": "image/png", "data": "..."}},
      {"inline_data": {"mime_type": "video/mp4", "data": "..."}}
    ]
  }]
}
```

## Video Support by Runner/Model

**✅ VIDEO SUPPORTED** (use `VIDEO_FORMAT=openai`):

- **vLLM** with video-capable models:
  - InternVL3-8B-AWQ ✅
  - Qwen2-VL series ✅
  - LLaVA-Video ✅
- **Ollama** with video models:
  - InternVL ✅
  - Qwen2-VL ✅
- **OpenAI API**:
  - gpt-4o ✅
  - gpt-4-turbo ✅

**❌ VIDEO NOT SUPPORTED** (use `VIDEO_FORMAT=skip`):

- **SGLang** - Does not support video
- **Most text-only models** - Even if served via vLLM/Ollama
- **Image-only vision models** - Can only process images, not video

### InternVL3-8B-AWQ Configuration

If you're using InternVL3-8B-AWQ (which **does support video**), your `.env` should be:

```bash
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ
VIDEO_FORMAT=openai
```

**Troubleshooting InternVL3 video:**

1. Verify that your vLLM/Ollama server is serving InternVL3: `curl http://localhost:8000/v1/models`
2. Check that the model name exactly matches what the server reports
3. If you get "cannot identify image" errors:
   - Your runner might not have video support enabled
   - Try `VIDEO_FORMAT=vllm` as an alternative
   - Check that your vLLM/Ollama version supports video

**How to test whether your setup supports video:**

1. Set `VIDEO_FORMAT=openai`
2. Send a test request with video
3. Check the logs - if you see errors about "cannot identify image", try `VIDEO_FORMAT=skip`

**Why video might fail:**

1. The model doesn't support video (only images)
2. The runner doesn't support video
3. The wrong `VIDEO_FORMAT` is set for your runner

## Notes

- Requests and responses are stored in memory only (not persisted)
- A maximum of 50 requests is kept in memory
- The app runs on port 5000 inside the container by default (mapped to 5005 on the host)
- Base64 data is truncated in the web UI for readability but is sent to the endpoint in full
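Finally, encoding large media files inline with `$(base64 -w 0 ...)` can get unwieldy. This hypothetical Python helper (not part of the app; the file names are placeholders) builds the mixed-media request shown under "Mixed Media Requests" from local files and sends it through the proxy:

```python
import base64
import mimetypes
import requests

def file_part(path: str) -> dict:
    """Build a Gemini inline_data part from a local image or video file."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {"inline_data": {"mime_type": mime or "application/octet-stream",
                            "data": data}}

payload = {
    "contents": [{
        "parts": [
            {"text": "Analyze these media files"},
            file_part("photo1.jpg"),   # placeholder file names
            file_part("photo2.png"),
            file_part("clip.mp4"),     # requires VIDEO_FORMAT=openai (or vllm)
        ]
    }],
    "generationConfig": {"maxOutputTokens": 4096},
}

resp = requests.post(
    "http://localhost:5005/webhook/models/model:generateContent",
    params={"key": "none"},
    json=payload,
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
```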