This object-oriented pipeline processes Signal chat messages for legal discovery using two Qwen models run in parallel (Qwen3-235B and Qwen2.5-72B), an attorney-labeled message sample, and confidence-scored merging of the two models' results.
## Step 1: Preprocess

```bash
cd pipeline
pip install -r requirements.txt
python main_pipeline.py /path/to/signal_messages.csv --step preprocess
```
This will preprocess the messages and generate the template for attorney labeling.

Output: `pipeline_output/attorney_labeling_template.txt`
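For orientation, the template generation amounts to sampling a handful of messages and writing them out with empty label slots. A minimal sketch (the `body` column name, sample size, and template layout are assumptions for illustration, not the pipeline's actual definitions):

```python
import os
import pandas as pd

df = pd.read_csv('signal_messages.csv')          # exported Signal messages
sample = df.sample(n=20, random_state=42)        # ~20 messages for attorney review

os.makedirs('pipeline_output', exist_ok=True)
with open('pipeline_output/attorney_labeling_template.txt', 'w') as f:
    for i, row in enumerate(sample.itertuples(), 1):
        f.write(f'[{i}] {row.body}\n')           # message text, assumed "body" column
        f.write('LABEL:\n\n')                    # attorney fills this in
```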
## Step 2: Attorney Labeling

The attorney reviews and labels 15-20 sample messages in the template.

- Time: 2-2.5 hours
- Cost: $500-937 (@ $250-375/hr)
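Once the attorney returns the labeled file, the labels can be read back programmatically. This parser is a sketch that assumes the simple `[n] message / LABEL:` layout shown above, not the pipeline's actual reader:

```python
def parse_labeled_template(path: str) -> dict[int, str]:
    """Collect attorney labels keyed by sample number."""
    labels = {}
    current = None
    for raw in open(path):
        line = raw.strip()
        if line.startswith('['):                        # e.g. "[3] message text"
            current = int(line[1:line.index(']')])
        elif line.startswith('LABEL:') and current is not None:
            labels[current] = line[len('LABEL:'):].strip()
    return labels
```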
## Step 3: Deploy Models

```python
from pipeline.utils.deployment_helper import ModelDeployer

deployer = ModelDeployer()
deployer.print_deployment_instructions()
```
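When run, this prints ready-to-use launch commands for both GPU instances, along the lines of the vLLM invocations below.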
On Vast.ai GPU 1 (4 × A100):

```bash
pip install vllm transformers accelerate

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-235B-Instruct \
    --tensor-parallel-size 4 \
    --quantization awq \
    --port 8000 \
    --max-model-len 4096
```
On Vast.ai GPU 2 (2 × A100):

```bash
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-72B-Instruct \
    --tensor-parallel-size 2 \
    --port 8001 \
    --max-model-len 4096
```
Cost: $3.84/hr × 4-8 hours = $15.36-30.72
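Before running inference, it's worth confirming that both servers are up. vLLM's OpenAI-compatible API exposes a `/v1/models` endpoint, so a quick check from Python (a convenience snippet, not part of the pipeline) looks like:

```python
import requests

# Each vLLM server lists its served model at the OpenAI-compatible /v1/models endpoint
for name, url in [('Qwen3', 'http://localhost:8000'),
                  ('Qwen2.5', 'http://localhost:8001')]:
    resp = requests.get(f'{url}/v1/models', timeout=10)
    resp.raise_for_status()
    served = [m['id'] for m in resp.json()['data']]
    print(f'{name} at {url} is serving: {served}')
```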
## Step 4: Run Inference

```bash
python utils/inference_runner.py \
    pipeline_output/dual_qwen_inference_requests.jsonl \
    --qwen3-url http://localhost:8000 \
    --qwen25-url http://localhost:8001
```
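Internally, each request is just posted to both OpenAI-compatible endpoints. A minimal sketch of that loop for one endpoint (the `{"id": ..., "prompt": ...}` request schema and output field names are assumptions for illustration, not the JSONL file's actual format):

```python
import json
import requests

def run_inference(requests_path: str, url: str, model: str, out_path: str) -> None:
    """Post each JSONL request to one vLLM endpoint and save the completions."""
    with open(requests_path) as fin, open(out_path, 'w') as fout:
        for line in fin:
            req = json.loads(line)  # assumed schema: {"id": ..., "prompt": ...}
            resp = requests.post(f'{url}/v1/completions', json={
                'model': model,
                'prompt': req['prompt'],
                'max_tokens': 256,
            }, timeout=120)
            resp.raise_for_status()
            text = resp.json()['choices'][0]['text']
            fout.write(json.dumps({'id': req['id'], 'label': text}) + '\n')
```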
The runner queries both models and saves results to:

- `pipeline_output/qwen3_results.jsonl`
- `pipeline_output/qwen25_results.jsonl`

## Step 5: Merge Results

```bash
python main_pipeline.py /path/to/signal_messages.csv --step merge \
    --qwen3-results pipeline_output/qwen3_results.jsonl \
    --qwen25-results pipeline_output/qwen25_results.jsonl
```
This merges the two result sets with confidence scoring.

Output: `pipeline_output/merged_results.json`
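The merge logic boils down to comparing the two models' answers per message. One plausible scoring rule, sketched here as an illustration rather than the pipeline's actual implementation, treats agreement as high confidence and flags disagreements for review:

```python
def merge_results(qwen3: dict, qwen25: dict) -> dict:
    """Merge per-message labels from both models with a confidence score."""
    merged = {}
    for msg_id, label3 in qwen3.items():
        label25 = qwen25.get(msg_id)
        if label25 == label3:
            merged[msg_id] = {'label': label3, 'confidence': 'high'}
        else:
            # Disagreements keep both answers and are flagged for human review
            merged[msg_id] = {'label': label3, 'alt_label': label25,
                              'confidence': 'low'}
    return merged
```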
## Running Steps Independently

Each step can be run independently:

```python
from pipeline.steps.step1_load_data import DataLoader

loader = DataLoader('signal_messages.csv')
df = loader.execute()
```
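Because each step class exposes the same `execute()` entry point, a failed stage can be rerun in isolation without repeating the ones before it.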
## Customization

Edit `pipeline/common_defs.py` to customize pipeline settings.

For a 200K-message corpus:
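At the 4-8 GPU-hours estimated above, processing 200,000 messages implies a combined throughput of roughly 200,000 / (8 × 3600) ≈ 7 to 200,000 / (4 × 3600) ≈ 14 messages per second across both models. If your observed throughput is lower, budget proportionally more GPU time (and cost) at the $3.84/hr rate.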
## Troubleshooting

- **Issue: Model deployment fails**
- **Issue: Inference times out**
- **Issue: Low agreement between models** (a quick diagnostic sketch follows this list)
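For the low-agreement case, a useful first diagnostic is the raw agreement rate between the two results files. A minimal sketch, assuming each JSONL line carries `id` and `label` fields (as in the runner sketch above):

```python
import json

def agreement_rate(path_a: str, path_b: str) -> float:
    """Fraction of shared message ids on which the two models agree."""
    def load(path):
        return {r['id']: r['label'] for r in map(json.loads, open(path))}
    a, b = load(path_a), load(path_b)
    shared = a.keys() & b.keys()
    return sum(a[i] == b[i] for i in shared) / len(shared)

print(agreement_rate('pipeline_output/qwen3_results.jsonl',
                     'pipeline_output/qwen25_results.jsonl'))
```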