Complete object-oriented pipeline for legal discovery using Qwen 3 235B + Qwen 2.5 72B.
pipeline/
├── common_defs.py  # Common definitions and data classes
├── main_pipeline.py  # Main orchestrator
├── models/
│   └── base.py  # Base classes
├── utils/
│   ├── text_utils.py  # Text processing utilities
│   ├── deployment_helper.py  # Deployment helper
│   └── inference_runner.py  # Inference runner
└── steps/
    ├── step1_load_data.py  # Load and preprocess CSV
    ├── step2_create_chunks.py  # Create overlapping chunks
    ├── step3_keyword_filter.py  # Keyword filtering
    ├── step4_semantic_filter.py  # Semantic filtering
    ├── step5_random_sampling.py  # Random sampling
    ├── step6_labeling_template.py  # Generate template
    ├── step7_inference_prep.py  # Prepare inference
    └── step8_merge_results.py  # Merge results
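To make the chunking and keyword-filtering steps listed above more concrete, here is a minimal sketch of what such steps might look like. It is not the code in `step2_create_chunks.py` or `step3_keyword_filter.py`; the function names, the `chunk_size` and `overlap` parameters, and the choice to chunk by message count are all assumptions made for illustration.

```python
from typing import List


def create_overlapping_chunks(
    messages: List[str], chunk_size: int = 4, overlap: int = 1
) -> List[List[str]]:
    """Split messages into windows of `chunk_size` that share `overlap` items.

    Hypothetical helper: the real pipeline may chunk by tokens or time instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(messages), step):
        chunks.append(messages[start:start + chunk_size])
        # Stop once a window has reached the end of the message list.
        if start + chunk_size >= len(messages):
            break
    return chunks


def keyword_filter(chunks: List[List[str]], keywords: List[str]) -> List[List[str]]:
    """Keep only chunks whose text contains at least one keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    kept = []
    for chunk in chunks:
        text = " ".join(chunk).lower()
        if any(keyword in text for keyword in lowered):
            kept.append(chunk)
    return kept
```

The overlap means a message near a window boundary appears in two chunks, so context that spans the boundary is not lost before the later filtering steps see it.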
python pipeline/main_pipeline.py signal_messages.csv --step preprocess
Complete the template at: pipeline_output/attorney_labeling_template.txt
from pipeline.utils.deployment_helper import ModelDeployer
deployer = ModelDeployer()
deployer.print_deployment_instructions()
python pipeline/utils/inference_runner.py pipeline_output/dual_qwen_inference_requests.jsonl
python pipeline/main_pipeline.py signal_messages.csv --step merge \
    --qwen3-results pipeline_output/qwen3_results.jsonl \
    --qwen25-results pipeline_output/qwen25_results.jsonl
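The merge step combines the two models' result files. The README does not describe the JSONL schema, so the sketch below is only an illustration of the general idea: it assumes each line is a JSON object carrying a shared identifier, here given the hypothetical name `chunk_id`. The function names are also assumptions, not part of the pipeline's API.

```python
import json
from typing import Any, Dict, List


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def merge_model_results(
    qwen3_records: List[Dict[str, Any]],
    qwen25_records: List[Dict[str, Any]],
    key: str = "chunk_id",  # hypothetical field name; the real schema may differ
) -> List[Dict[str, Any]]:
    """Pair up records from both models by `key`, keeping unmatched ones too."""
    merged: Dict[Any, Dict[str, Any]] = {}
    for source, records in (("qwen3", qwen3_records), ("qwen25", qwen25_records)):
        for record in records:
            entry = merged.setdefault(
                record[key], {key: record[key], "qwen3": None, "qwen25": None}
            )
            entry[source] = record
    # Dicts preserve insertion order, so output follows first appearance.
    return list(merged.values())
```

Keeping unmatched records with a `None` slot, rather than dropping them, makes it easy to spot chunks that only one model returned a result for.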
Edit pipeline/common_defs.py to customize: