
Qwen 3 + Qwen 2.5 Pipeline - Complete Usage Guide

Overview

This object-oriented pipeline processes Signal chat messages for legal discovery using:

  • Primary Model: Qwen 3 235B (state-of-the-art, April 2025)
  • Secondary Model: Qwen 2.5 72B (proven 24.85% benchmark score)
  • Architecture: Object-oriented with base classes and inheritance
  • Total Cost: $515-968 (including attorney labeling)

Installation

cd pipeline
pip install -r requirements.txt

Step-by-Step Usage

Step 1: Run Preprocessing

python main_pipeline.py /path/to/signal_messages.csv --step preprocess

This will:

  1. Load and normalize 200K messages
  2. Create 20-message chunks with 5-message overlap (sketched after this list)
  3. Apply keyword filtering (~60% reduction)
  4. Apply dual-model semantic filtering (~97% total reduction)
  5. Select 20 stratified random samples
  6. Generate attorney labeling template
  7. Prepare inference requests

Output: pipeline_output/attorney_labeling_template.txt
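
Sanity-checking the funnel: 200K messages at a stride of 15 (20-message chunks minus the 5-message overlap) yield roughly 13,300 chunks; the keyword pass keeps ~40% (~5,300 chunks) and the semantic pass brings the total kept down to ~3% (~400 chunks). Below is a minimal sketch of the chunking logic itself; the function name and signature are illustrative, not the pipeline's actual API:

def make_chunks(messages, chunk_size=20, overlap=5):
    # Consecutive chunks share `overlap` messages, so the window
    # advances chunk_size - overlap = 15 messages per step.
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(messages), stride):
        chunks.append(messages[start:start + chunk_size])
        if start + chunk_size >= len(messages):
            break
    return chunks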

Step 2: Attorney Completes Labeling

Attorney reviews and labels 15-20 sample messages in the template:

  • Mark each as RESPONSIVE: YES or NO
  • Provide REASONING for decision
  • Note which CRITERIA matched (1-7)

Time: 2-2.5 hours
Cost: $500-937 @ $250-375/hr
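
A completed entry might look like this (illustrative only; follow the layout of the generated template):

SAMPLE 7
RESPONSIVE: YES
REASONING: Directly discusses timing of the disputed payment.
CRITERIA: 2, 5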

Step 3: Deploy Models

from pipeline.utils.deployment_helper import ModelDeployer

deployer = ModelDeployer()
deployer.print_deployment_instructions()

On Vast.ai GPU 1 (4 × A100):

pip install vllm transformers accelerate

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-235B-Instruct \
    --tensor-parallel-size 4 \
    --quantization awq \
    --port 8000 \
    --max-model-len 4096

On Vast.ai GPU 2 (2 × A100):

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-72B-Instruct \
    --tensor-parallel-size 2 \
    --port 8001 \
    --max-model-len 4096

Cost: $3.84/hr × 4-8 hours = $15.36-30.72
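
Before moving on, confirm both servers respond. vLLM's OpenAI-compatible server exposes a /v1/models endpoint, so a quick Python check (URLs taken from the deployment commands above) is:

import requests

# Verify both vLLM servers are up before starting the inference run.
for name, url in [("Qwen 3", "http://localhost:8000"),
                  ("Qwen 2.5", "http://localhost:8001")]:
    resp = requests.get(f"{url}/v1/models", timeout=10)
    resp.raise_for_status()
    print(f"{name} is serving:", [m["id"] for m in resp.json()["data"]])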

Step 4: Run Inference

python utils/inference_runner.py \
    pipeline_output/dual_qwen_inference_requests.jsonl \
    --qwen3-url http://localhost:8000 \
    --qwen25-url http://localhost:8001

This runs inference on both models and saves results:

  • pipeline_output/qwen3_results.jsonl
  • pipeline_output/qwen25_results.jsonl
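
If a run misbehaves, sanity-check a single request by hand. vLLM speaks the OpenAI chat-completions protocol, so a hand-built request looks like the following (the prompt is a placeholder, not the pipeline's actual prompt):

import requests

# One hand-built request to the Qwen 3 endpoint to verify end-to-end behavior.
payload = {
    "model": "Qwen/Qwen3-235B-Instruct",
    "messages": [{"role": "user",
                  "content": "Placeholder: paste one chunk and the responsiveness prompt here."}],
    "max_tokens": 256,
    "temperature": 0.0,
}
resp = requests.post("http://localhost:8000/v1/chat/completions",
                     json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])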

Step 5: Merge Results

python main_pipeline.py /path/to/signal_messages.csv --step merge \
    --qwen3-results pipeline_output/qwen3_results.jsonl \
    --qwen25-results pipeline_output/qwen25_results.jsonl

This merges results with confidence scoring (sketched below):

  • High confidence: Both models agree
  • Medium confidence: One model flags
  • Low confidence: Disagreement

Output: pipeline_output/merged_results.json
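
A minimal sketch of how those tiers could be assigned, assuming each model emits a YES / NO / UNCERTAIN verdict per chunk (the verdict vocabulary and function name are assumptions, not the pipeline's actual code):

def confidence_tier(qwen3_verdict, qwen25_verdict):
    if qwen3_verdict == qwen25_verdict:
        return "high"    # both models agree
    if "UNCERTAIN" in (qwen3_verdict, qwen25_verdict):
        return "medium"  # one model flags, the other is unsure
    return "low"         # outright YES-vs-NO disagreement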

Individual Step Usage

Each step can be run independently:

from pipeline.steps.step1_load_data import DataLoader

# Run only the data-loading step and get the normalized DataFrame back
loader = DataLoader('signal_messages.csv')
df = loader.execute()

Customization

Edit pipeline/common_defs.py to customize (a sketch of typical entries follows this list):

  • Case-specific criteria
  • Keyword lists
  • Model configurations
  • Semantic queries
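
As a rough illustration of the kinds of entries involved (names and values are placeholders; check pipeline/common_defs.py for the real definitions):

# Placeholder values only; the actual constants in pipeline/common_defs.py
# may use different names and structures.
KEYWORDS = ["contract", "payment", "deadline"]             # keyword pre-filter
SEMANTIC_QUERIES = ["discussion of the disputed payment"]  # semantic filter queries
MODEL_CONFIG = {
    "primary":   {"name": "Qwen/Qwen3-235B-Instruct",  "url": "http://localhost:8000"},
    "secondary": {"name": "Qwen/Qwen2.5-72B-Instruct", "url": "http://localhost:8001"},
}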

Expected Results

For a 200K-message corpus:

  • Recall: 88-97% (finds most responsive messages)
  • Precision: 65-85% (acceptable with attorney review)
  • High confidence: 60-70% of chunks (minimal review)
  • Medium confidence: 25-35% of chunks (standard review)
  • Low confidence: 5-10% of chunks (detailed review)

Troubleshooting

Issue: Model deployment fails

  • Check GPU memory (need 4 × 80GB for Qwen 3)
  • Verify vLLM installation
  • Check quantization settings

Issue: Inference times out

  • Increase timeout in inference_runner.py
  • Check model health endpoints
  • Verify network connectivity

Issue: Low agreement between models

  • Review few-shot examples
  • Adjust semantic thresholds
  • Check prompt formatting
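
To quantify agreement before adjusting anything, compare the two result files directly. A rough sketch, assuming each JSONL line carries chunk_id and responsive fields (the field names are assumptions about the output format):

import json

def load_verdicts(path):
    # Assumed format: one JSON object per line with "chunk_id" and "responsive".
    with open(path) as f:
        return {r["chunk_id"]: r["responsive"] for r in map(json.loads, f)}

q3 = load_verdicts("pipeline_output/qwen3_results.jsonl")
q25 = load_verdicts("pipeline_output/qwen25_results.jsonl")
shared = q3.keys() & q25.keys()
agree = sum(q3[c] == q25[c] for c in shared)
print(f"Agreement: {agree}/{len(shared)} chunks ({agree / len(shared):.1%})")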