# Qwen 3 + Qwen 2.5 Pipeline - Complete Usage Guide

## Overview

This object-oriented pipeline processes Signal chat messages for legal discovery using:

- **Primary Model**: Qwen 3 235B (state-of-the-art, April 2025)
- **Secondary Model**: Qwen 2.5 72B (previously benchmarked at 24.85%)
- **Architecture**: Object-oriented with base classes and inheritance
- **Total Cost**: $515-968 (including attorney labeling)

## Installation

```bash
cd pipeline
pip install -r requirements.txt
```

## Step-by-Step Usage

### Step 1: Run Preprocessing

```bash
python main_pipeline.py /path/to/signal_messages.csv --step preprocess
```

This will:

1. Load and normalize 200K messages
2. Create 20-message chunks with 5-message overlap
3. Apply keyword filtering (~60% reduction)
4. Apply dual-model semantic filtering (~97% total reduction)
5. Select 20 random stratified samples
6. Generate the attorney labeling template
7. Prepare inference requests

**Output**: `pipeline_output/attorney_labeling_template.txt`

### Step 2: Attorney Completes Labeling

The attorney reviews and labels 15-20 sample messages in the template:

- Mark each as RESPONSIVE: YES or NO
- Provide REASONING for the decision
- Note which CRITERIA matched (1-7)

**Time**: 2-2.5 hours

**Cost**: $500-937 @ $250-375/hr

### Step 3: Deploy Models

```python
from pipeline.utils.deployment_helper import ModelDeployer

deployer = ModelDeployer()
deployer.print_deployment_instructions()
```

**On Vast.ai GPU 1 (4 × A100):**

```bash
pip install vllm transformers accelerate
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-235B-Instruct \
    --tensor-parallel-size 4 \
    --quantization awq \
    --port 8000 \
    --max-model-len 4096
```

**On Vast.ai GPU 2 (2 × A100):**

```bash
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-72B-Instruct \
    --tensor-parallel-size 2 \
    --port 8001 \
    --max-model-len 4096
```

**Cost**: $3.84/hr × 4-8 hours = $15.36-30.72

### Step 4: Run Inference

```bash
python utils/inference_runner.py \
    pipeline_output/dual_qwen_inference_requests.jsonl \
    --qwen3-url http://localhost:8000 \
    --qwen25-url http://localhost:8001
```

This runs inference on both models and saves the results to:

- `pipeline_output/qwen3_results.jsonl`
- `pipeline_output/qwen25_results.jsonl`

### Step 5: Merge Results

```bash
python main_pipeline.py /path/to/signal_messages.csv --step merge \
    --qwen3-results pipeline_output/qwen3_results.jsonl \
    --qwen25-results pipeline_output/qwen25_results.jsonl
```

This merges the two result sets with confidence scoring (a sketch of the logic follows below):

- **High confidence**: Both models agree
- **Medium confidence**: One model flags
- **Low confidence**: Disagreement

**Output**: `pipeline_output/merged_results.json`
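The actual scoring lives in the pipeline's merge step; the sketch below only illustrates how agreement-based tiering of this kind typically works. The JSONL schema (a `chunk_id` field and a three-valued `verdict`) is an assumption for illustration, not the pipeline's real format.

```python
import json

# A minimal sketch of agreement-based confidence tiering, assuming each
# results line looks like {"chunk_id": ..., "verdict": "yes"|"no"|"unsure"}.
# The pipeline's real field names may differ; adjust to match.

def load_verdicts(path: str) -> dict:
    """Load one model's JSONL results, keyed by chunk ID."""
    verdicts = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            verdicts[record["chunk_id"]] = record["verdict"]
    return verdicts

def confidence_tier(v1: str, v2: str) -> str:
    """Map a pair of model verdicts to a review tier."""
    if v1 == v2 and v1 in ("yes", "no"):
        return "high"    # both models agree outright
    if "yes" in (v1, v2) and "no" not in (v1, v2):
        return "medium"  # one model flags; the other is unsure
    return "low"         # disagreement (or both unsure): detailed review

def merge(qwen3_path: str, qwen25_path: str) -> list:
    """Join the two result sets on chunk ID and attach a tier to each."""
    qwen3 = load_verdicts(qwen3_path)
    qwen25 = load_verdicts(qwen25_path)
    return [
        {
            "chunk_id": cid,
            "qwen3": qwen3[cid],
            "qwen25": qwen25[cid],
            "confidence": confidence_tier(qwen3[cid], qwen25[cid]),
        }
        for cid in sorted(qwen3.keys() & qwen25.keys())
    ]
```

Keeping both raw verdicts alongside the tier makes it easy for reviewers to see why a chunk was routed to detailed review.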
## Individual Step Usage

Each step can be run independently:

```python
from pipeline.steps.step1_load_data import DataLoader

loader = DataLoader('signal_messages.csv')
df = loader.execute()
```

## Customization

Edit `pipeline/common_defs.py` to customize:

- Case-specific criteria
- Keyword lists
- Model configurations
- Semantic queries

## Expected Results

For the 200K-message corpus:

- **Recall**: 88-97% (finds most responsive messages)
- **Precision**: 65-85% (acceptable with attorney review)
- **High confidence**: 60-70% of chunks (minimal review)
- **Medium confidence**: 25-35% of chunks (standard review)
- **Low confidence**: 5-10% of chunks (detailed review)

## Troubleshooting

**Issue**: Model deployment fails

- Check GPU memory (Qwen 3 needs 4 × 80GB)
- Verify the vLLM installation
- Check quantization settings

**Issue**: Inference times out

- Increase the timeout in `inference_runner.py`
- Check the model health endpoints (see the sketch after this list)
- Verify network connectivity

**Issue**: Low agreement between models

- Review the few-shot examples
- Adjust semantic thresholds
- Check prompt formatting
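When inference times out or a model appears unhealthy, probe each vLLM server directly before re-running the full job. vLLM's OpenAI-compatible server exposes a `/health` liveness route and the standard `/v1/models` listing; the stdlib-only smoke test below (a sketch, with ports matching the Step 3 deployment commands) checks both servers:

```python
import json
import urllib.request

# Ports match the Step 3 deployment commands.
ENDPOINTS = {
    "qwen3": "http://localhost:8000",
    "qwen25": "http://localhost:8001",
}

for name, base in ENDPOINTS.items():
    # /health returns HTTP 200 once the model has finished loading.
    try:
        status = urllib.request.urlopen(f"{base}/health", timeout=5).status
        print(f"{name}: health={status}")
    except OSError as exc:
        print(f"{name}: unreachable ({exc})")
        continue

    # /v1/models confirms which model the server is actually serving.
    with urllib.request.urlopen(f"{base}/v1/models", timeout=5) as resp:
        served = [m["id"] for m in json.load(resp)["data"]]
        print(f"{name}: serving {served}")
```

A 200 from `/health` plus the expected model ID in `/v1/models` rules out deployment problems and points the investigation at timeouts or prompt size instead.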