
Qwen 3 + Qwen 2.5 Pipeline - Complete Usage Guide

Overview

This object-oriented pipeline processes Signal chat messages for legal discovery using:

  • Primary Model: Qwen 3 235B (state-of-the-art, April 2025)
  • Secondary Model: Qwen 2.5 72B (proven 24.85% benchmark score)
  • Architecture: Object-oriented with base classes and inheritance
  • Total Cost: $515-968 (including attorney labeling)

Installation

cd pipeline
pip install -r requirements.txt

Step-by-Step Usage

Step 1: Run Preprocessing

python main_pipeline.py /path/to/signal_messages.csv --step preprocess

This will:

  1. Load and normalize 200K messages
  2. Create 20-message chunks with 5-message overlap (sketched after this list)
  3. Apply keyword filtering (~60% reduction)
  4. Apply dual-model semantic filtering (~97% total reduction)
  5. Select 20 stratified random samples
  6. Generate attorney labeling template
  7. Prepare inference requests

Output: pipeline_output/attorney_labeling_template.txt
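
Sanity-checking the funnel: 200K messages at a stride of 15 (20-message chunks minus the 5-message overlap) yield roughly 13,300 chunks; the keyword pass keeps ~40% (~5,300 chunks) and the semantic pass brings the total kept down to ~3% (~400 chunks). Below is a minimal sketch of the chunking logic itself; the function name and signature are illustrative, not the pipeline's actual API:

def make_chunks(messages, chunk_size=20, overlap=5):
    # Consecutive chunks share `overlap` messages, so the window
    # advances chunk_size - overlap = 15 messages per step.
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(messages), stride):
        chunks.append(messages[start:start + chunk_size])
        if start + chunk_size >= len(messages):
            break
    return chunks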

Step 2: Attorney Completes Labeling

Attorney reviews and labels 15-20 sample messages in the template:

  • Mark each as RESPONSIVE: YES or NO
  • Provide REASONING for decision
  • Note which CRITERIA matched (1-7)

Time: 2-2.5 hours
Cost: $500-937 @ $250-375/hr
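
A completed entry might look like this (illustrative only; follow the layout of the generated template):

SAMPLE 7
RESPONSIVE: YES
REASONING: Directly discusses timing of the disputed payment.
CRITERIA: 2, 5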

Step 3: Deploy Models

from pipeline.utils.deployment_helper import ModelDeployer

deployer = ModelDeployer()
deployer.print_deployment_instructions()

On Vast.ai GPU 1 (4 × A100):

pip install vllm transformers accelerate

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-235B-Instruct \
    --tensor-parallel-size 4 \
    --quantization awq \
    --port 8000 \
    --max-model-len 4096

On Vast.ai GPU 2 (2 × A100):

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-72B-Instruct \
    --tensor-parallel-size 2 \
    --port 8001 \
    --max-model-len 4096

Cost: $3.84/hr × 4-8 hours = $15.36-30.72
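
Before moving on, confirm both servers respond. vLLM's OpenAI-compatible server exposes a /v1/models endpoint, so a quick Python check (URLs taken from the deployment commands above) is:

import requests

# Verify both vLLM servers are up before starting the inference run.
for name, url in [("Qwen 3", "http://localhost:8000"),
                  ("Qwen 2.5", "http://localhost:8001")]:
    resp = requests.get(f"{url}/v1/models", timeout=10)
    resp.raise_for_status()
    print(f"{name} is serving:", [m["id"] for m in resp.json()["data"]])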

Step 4: Run Inference

python utils/inference_runner.py \
    pipeline_output/dual_qwen_inference_requests.jsonl \
    --qwen3-url http://localhost:8000 \
    --qwen25-url http://localhost:8001

This runs inference on both models and saves results:

  • pipeline_output/qwen3_results.jsonl
  • pipeline_output/qwen25_results.jsonl
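
If a run misbehaves, sanity-check a single request by hand. vLLM speaks the OpenAI chat-completions protocol, so a hand-built request looks like the following (the prompt is a placeholder, not the pipeline's actual prompt):

import requests

# One hand-built request to the Qwen 3 endpoint to verify end-to-end behavior.
payload = {
    "model": "Qwen/Qwen3-235B-Instruct",
    "messages": [{"role": "user",
                  "content": "Placeholder: paste one chunk and the responsiveness prompt here."}],
    "max_tokens": 256,
    "temperature": 0.0,
}
resp = requests.post("http://localhost:8000/v1/chat/completions",
                     json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])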

Step 5: Merge Results

python main_pipeline.py /path/to/signal_messages.csv --step merge \
    --qwen3-results pipeline_output/qwen3_results.jsonl \
    --qwen25-results pipeline_output/qwen25_results.jsonl

This merges results with confidence scoring (sketched below):

  • High confidence: Both models agree
  • Medium confidence: One model flags
  • Low confidence: Disagreement

Output: pipeline_output/merged_results.json
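
A minimal sketch of how those tiers could be assigned, assuming each model emits a YES / NO / UNCERTAIN verdict per chunk (the verdict vocabulary and function name are assumptions, not the pipeline's actual code):

def confidence_tier(qwen3_verdict, qwen25_verdict):
    if qwen3_verdict == qwen25_verdict:
        return "high"    # both models agree
    if "UNCERTAIN" in (qwen3_verdict, qwen25_verdict):
        return "medium"  # one model flags, the other is unsure
    return "low"         # outright YES-vs-NO disagreement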

Individual Step Usage

Each step can be run independently:

from pipeline.steps.step1_load_data import DataLoader

# Run only the data-loading step and get the normalized DataFrame back
loader = DataLoader('signal_messages.csv')
df = loader.execute()

Customization

Edit pipeline/common_defs.py to customize (a sketch of typical entries follows this list):

  • Case-specific criteria
  • Keyword lists
  • Model configurations
  • Semantic queries
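
As a rough illustration of the kinds of entries involved (names and values are placeholders; check pipeline/common_defs.py for the real definitions):

# Placeholder values only; the actual constants in pipeline/common_defs.py
# may use different names and structures.
KEYWORDS = ["contract", "payment", "deadline"]             # keyword pre-filter
SEMANTIC_QUERIES = ["discussion of the disputed payment"]  # semantic filter queries
MODEL_CONFIG = {
    "primary":   {"name": "Qwen/Qwen3-235B-Instruct",  "url": "http://localhost:8000"},
    "secondary": {"name": "Qwen/Qwen2.5-72B-Instruct", "url": "http://localhost:8001"},
}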

Expected Results

For a 200K-message corpus:

  • Recall: 88-97% (finds most responsive messages)
  • Precision: 65-85% (acceptable with attorney review)
  • High confidence: 60-70% of chunks (minimal review)
  • Medium confidence: 25-35% of chunks (standard review)
  • Low confidence: 5-10% of chunks (detailed review)

Troubleshooting

Issue: Model deployment fails

  • Check GPU memory (need 4 × 80GB for Qwen 3)
  • Verify vLLM installation
  • Check quantization settings

Issue: Inference times out

  • Increase timeout in inference_runner.py
  • Check model health endpoints
  • Verify network connectivity

Issue: Low agreement between models

  • Review few-shot examples
  • Adjust semantic thresholds
  • Check prompt formatting
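
To quantify agreement before adjusting anything, compare the two result files directly. A rough sketch, assuming each JSONL line carries chunk_id and responsive fields (the field names are assumptions about the output format):

import json

def load_verdicts(path):
    # Assumed format: one JSON object per line with "chunk_id" and "responsive".
    with open(path) as f:
        return {r["chunk_id"]: r["responsive"] for r in map(json.loads, f)}

q3 = load_verdicts("pipeline_output/qwen3_results.jsonl")
q25 = load_verdicts("pipeline_output/qwen25_results.jsonl")
shared = q3.keys() & q25.keys()
agree = sum(q3[c] == q25[c] for c in shared)
print(f"Agreement: {agree}/{len(shared)} chunks ({agree / len(shared):.1%})")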