A Docker app that proxies Google Gemini API requests to OpenAI-compatible endpoints with full multimodal support.
Web UI: http://localhost:5005

Perfect for: using Gemini-format applications with local vision models (InternVL3, Qwen2-VL) or the OpenAI API.
If you don't already have the llm_internal network:
docker network create llm_internal
Create a .env file (or copy from .env.example):
cp .env.example .env
Edit .env to set your OpenAI-compatible endpoint:
# Your OpenAI-compatible endpoint
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
# API key if required
OPENAI_API_KEY=none
# Model name - IMPORTANT: Use the exact model name
# For InternVL3-8B-AWQ with vLLM/Ollama:
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ
# Video format - Use 'openai' for video-capable models
# (InternVL3, Qwen2-VL, GPT-4o, etc.)
VIDEO_FORMAT=openai
For InternVL3-8B-AWQ: This model DOES support video - use VIDEO_FORMAT=openai
Note: If your other services are on the same llm_internal Docker network, you can use their container names instead of host.docker.internal. For example: http://vllm-server:8000/v1/chat/completions
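For reference, these variables are read from the environment at startup. The snippet below is only a sketch of how app.py might load them; the defaults shown are assumptions, apart from VIDEO_FORMAT, which defaults to 'openai'.

# Sketch only: the real configuration loading lives in app.py (not shown here).
import os

OPENAI_ENDPOINT = os.environ.get("OPENAI_ENDPOINT", "http://host.docker.internal:8000/v1/chat/completions")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "none")
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "OpenGVLab/InternVL3-8B-AWQ")
VIDEO_FORMAT = os.environ.get("VIDEO_FORMAT", "openai")  # openai | vllm | skip | error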
docker-compose up --build
docker build -t post-monitor .
docker run -p 5005:5000 --network llm_internal post-monitor
View the Web UI: Open http://localhost:5005 in your browser
Configure your application to send Gemini-format requests to:
http://localhost:5005/webhook/models/model:generateContent?key=none

Any path under /webhook/ is accepted. The proxy will convert the Gemini-format request to OpenAI format, forward it to your configured endpoint, convert the response back to Gemini format, and record the exchange in the web UI.
The web interface will automatically refresh every 2 seconds to show new requests.
# Text-only request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{
"text": "Explain quantum computing in simple terms"
}]
}],
"generationConfig": {
"maxOutputTokens": 4096
}
}'
# With image (base64)
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Describe this image"},
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "'$(base64 -w 0 image.jpg)'"
}
}
]
}]
}'
# With video (base64) - Requires VIDEO_FORMAT=openai
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Describe this video"},
{
"inline_data": {
"mime_type": "video/mp4",
"data": "'$(base64 -w 0 video.mp4)'"
}
}
]
}],
"generationConfig": {
"maxOutputTokens": 4096
}
}'
# Multiple images in one request
curl -X POST "http://localhost:5005/webhook/models/model:generateContent?key=none" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [
{"text": "Compare these two images"},
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "'$(base64 -w 0 image1.jpg)'"
}
},
{
"inline_data": {
"mime_type": "image/jpeg",
"data": "'$(base64 -w 0 image2.jpg)'"
}
}
]
}]
}'
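The same requests can be sent from Python. The sketch below mirrors the single-image curl example above; the requests library and the file path are client-side assumptions, not part of the proxy.

# Python equivalent of the single-image curl example above (illustrative sketch).
import base64
import requests

with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "contents": [{
        "parts": [
            {"text": "Describe this image"},
            {"inline_data": {"mime_type": "image/jpeg", "data": image_b64}},
        ]
    }],
    "generationConfig": {"maxOutputTokens": 4096},
}

resp = requests.post(
    "http://localhost:5005/webhook/models/model:generateContent",
    params={"key": "none"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
# Response comes back in Gemini format (see the mapping below).
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])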
All requests under /webhook/ are proxied, and the llm_internal network is used for container communication. Gemini request fields map to OpenAI fields as follows:

Text:
contents[].parts[].text → messages[].content (text type)

Images (all formats supported):
contents[].parts[].inline_data (image/jpeg, image/png, etc.) → messages[].content (image_url type) as data:image/jpeg;base64,{base64_data}

Videos (format depends on VIDEO_FORMAT):
contents[].parts[].inline_data (video/mp4, etc.)
VIDEO_FORMAT=openai: → messages[].content (image_url type with video MIME)
VIDEO_FORMAT=vllm: → messages[].content (video_url type)
VIDEO_FORMAT=skip: → replaced with a text note

Generation Config:
generationConfig.maxOutputTokens → max_tokens
generationConfig.temperature → temperature

Response:
choices[0].message.content → candidates[0].content.parts[0].text

Usage:
usage → usageMetadata
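As a rough illustration of this mapping (not the proxy's actual code), a Gemini request body translates to an OpenAI chat-completion payload like this:

# Simplified illustration of the mapping above; not the proxy's actual code.
def gemini_to_openai(gemini_body: dict, model: str) -> dict:
    content = []
    for part in gemini_body["contents"][0]["parts"]:
        if "text" in part:
            content.append({"type": "text", "text": part["text"]})
        elif "inline_data" in part:
            blob = part["inline_data"]
            data_url = f"data:{blob['mime_type']};base64,{blob['data']}"
            # Images (and videos when VIDEO_FORMAT=openai) become image_url parts.
            content.append({"type": "image_url", "image_url": {"url": data_url}})

    gen = gemini_body.get("generationConfig", {})
    openai_body = {"model": model, "messages": [{"role": "user", "content": content}]}
    if "maxOutputTokens" in gen:
        openai_body["max_tokens"] = gen["maxOutputTokens"]
    if "temperature" in gen:
        openai_body["temperature"] = gen["temperature"]
    return openai_body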
This proxy works with any OpenAI-compatible endpoint (vLLM, Ollama, SGLang, OpenAI, etc.).

Endpoints:
GET / - Web UI to view all requests, conversions, and responses
POST /webhook/* - Main proxy endpoint (converts Gemini → OpenAI → Gemini)
POST /clear - Clear all stored requests

Environment variables:
OPENAI_ENDPOINT - Your OpenAI-compatible endpoint URL
OPENAI_API_KEY - API key if required (use 'none' if not needed)
OPENAI_MODEL - Model name to use (check your endpoint's available models), e.g. OpenGVLab/InternVL3-8B-AWQ, gpt-4o, gpt-4-turbo, etc.
VIDEO_FORMAT - How to send video content (default: 'openai')
  openai - Standard format (use for InternVL3, Qwen2-VL, GPT-4o, vLLM, Ollama)
  vllm - Experimental vLLM-specific format (try if 'openai' fails)
  skip - Don't send video (use for endpoints without video support)
  error - Fail if video is present

The proxy uses the llm_internal Docker network for communication with other services:
Communication between containers on same network:
# If your vLLM/Ollama server is also on llm_internal network
OPENAI_ENDPOINT=http://vllm-container-name:8000/v1/chat/completions
Communication with host services:
# For services running on your host machine (not in Docker)
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
Port mapping:
Host localhost:5005 maps to container port 5000
Web UI: http://localhost:5005
Webhook endpoint: http://localhost:5005/webhook/*

If you see errors like "cannot identify image file" with SGLang:
Check the model name: SGLang requires the exact model name from your deployment
# Check available models
curl http://localhost:8000/v1/models
# Set in .env
OPENAI_MODEL=your-actual-model-name
Verify base64 encoding: Check the console logs for "Base64 data appears valid" (a local decoding check is sketched after this list)
Test with simple image: Try with a small test image first
# Create test image and convert to base64
echo "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==" > test.b64
Check SGLang logs: Look for more detailed errors in your SGLang server logs
Model compatibility: Ensure your SGLang model supports vision/multimodal inputs
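To rule out a corrupt payload on your side, a quick local check like the sketch below can confirm the data decodes to a real image before it ever reaches the proxy. It uses Pillow (the wording of the "cannot identify image file" error matches Pillow's Image.open), and the test.b64 file is the one created above.

# Local sanity check: confirm a base64 payload decodes to a readable image
# before blaming the endpoint. Requires Pillow (pip install pillow).
import base64
import io
from PIL import Image

def check_image_b64(b64_data: str) -> None:
    raw = base64.b64decode(b64_data, validate=True)  # raises binascii.Error on malformed base64
    img = Image.open(io.BytesIO(raw))                # raises "cannot identify image file" if not an image
    print(f"OK: {img.format}, {img.size[0]}x{img.size[1]} pixels")

with open("test.b64") as f:
    check_image_b64(f.read().strip())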
If the proxy can't reach your endpoint:
Test the endpoint directly: curl http://localhost:8000/v1/models
Double-check the OPENAI_ENDPOINT value in .env
Use host.docker.internal for host services
Check the proxy logs: docker-compose logs -f

All vision models support images. The proxy handles these formats:
image/jpeg, image/png, image/webp, image/gif

Console output when processing images:
🖼️ Adding image: image/jpeg
📊 Media summary:
Images: 2 (image/jpeg, image/png)
Video support depends on your model and VIDEO_FORMAT setting:
Supported formats:
video/mp4, video/mpeg, video/webm

Console output when processing video:
# When VIDEO_FORMAT=openai (sending video)
📹 Adding video (OpenAI format): video/mp4
📊 Media summary:
Videos: sent as openai (video/mp4)
# When VIDEO_FORMAT=skip (skipping video)
⏭️ Skipping video: video/mp4 (VIDEO_FORMAT=skip)
📊 Media summary:
Videos: skipped (video/mp4)
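The VIDEO_FORMAT options behave roughly as sketched below. This is an illustration of the behaviour described above, not the proxy's actual code, and the skip placeholder text is hypothetical.

# Illustration of the VIDEO_FORMAT options; not the proxy's actual code.
def convert_video_part(blob: dict, video_format: str) -> dict:
    data_url = f"data:{blob['mime_type']};base64,{blob['data']}"
    if video_format == "openai":
        return {"type": "image_url", "image_url": {"url": data_url}}  # image_url with video MIME
    if video_format == "vllm":
        return {"type": "video_url", "video_url": {"url": data_url}}  # experimental vLLM format
    if video_format == "skip":
        return {"type": "text", "text": "[video omitted]"}            # replaced with a text note
    raise ValueError(f"Video present but VIDEO_FORMAT={video_format}")  # 'error' (or unknown) setting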
You can send text + multiple images + video in a single request:
{
"contents": [{
"parts": [
{"text": "Analyze these media files"},
{"inline_data": {"mime_type": "image/jpeg", "data": "..."}},
{"inline_data": {"mime_type": "image/png", "data": "..."}},
{"inline_data": {"mime_type": "video/mp4", "data": "..."}}
]
}]
}
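The "..." placeholders above stand for base64 data. A helper like the sketch below can build such a parts list from local files; the file names are placeholders.

# Illustrative helper: build a mixed-media parts list from local files.
import base64

def inline_part(path: str, mime_type: str) -> dict:
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": data}}

payload = {
    "contents": [{
        "parts": [
            {"text": "Analyze these media files"},
            inline_part("image1.jpg", "image/jpeg"),
            inline_part("image2.png", "image/png"),
            inline_part("clip.mp4", "video/mp4"),
        ]
    }]
}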
✅ VIDEO SUPPORTED (use VIDEO_FORMAT=openai): InternVL3, Qwen2-VL, GPT-4o, and other video-capable models
❌ VIDEO NOT SUPPORTED (use VIDEO_FORMAT=skip): endpoints/models without video input support
If you're using InternVL3-8B-AWQ (which does support video), your .env should be:
OPENAI_ENDPOINT=http://host.docker.internal:8000/v1/chat/completions
OPENAI_MODEL=OpenGVLab/InternVL3-8B-AWQ
VIDEO_FORMAT=openai
Troubleshooting InternVL3 video:
Confirm the model is loaded: curl http://localhost:8000/v1/models
Try VIDEO_FORMAT=vllm as an alternative

How to test if your setup supports video:
Set VIDEO_FORMAT=openai and send the video curl example above; if the request errors out, fall back to VIDEO_FORMAT=skip.

Why video might fail:
The model behind your endpoint does not accept video input
OPENAI_MODEL does not match the model actually loaded on the endpoint
The endpoint expects a different video format (try VIDEO_FORMAT=vllm)