Automatically identify relevant keywords from your data:
```python
from pipeline.steps.step0a_keyword_identification import KeywordIdentifier

# df is your input DataFrame of records to analyze
identifier = KeywordIdentifier(min_frequency=5, max_keywords=100)
categories = identifier.execute(df)
```
Output: pipeline_output/keyword_analysis.json and keyword_analysis.txt
Categories:
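To inspect the keyword report programmatically, something like the sketch below should work. The exact JSON schema isn't documented here, so the generic key handling is an assumption:

```python
import json
from pathlib import Path

# Load the keyword analysis produced by step 0a
report = json.loads(Path("pipeline_output/keyword_analysis.json").read_text())

# Print a quick summary of whatever top-level entries are present
# (adjust once you know the actual structure of the report)
for key, value in report.items():
    size = len(value) if isinstance(value, (list, dict)) else ""
    print(key, size)
```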
Analyze text patterns and get normalization suggestions:
```python
from pipeline.steps.step0b_normalization_analysis import NormalizationAnalyzer

analyzer = NormalizationAnalyzer()
suggestions = analyzer.execute(df)
```
Output: pipeline_output/normalization_suggestions.json and normalization_suggestions.txt
Identifies:
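As with step 0a, the suggestions file can be reviewed before acting on it. A minimal sketch that treats the file as generic JSON (the real schema may differ):

```python
import json
from pathlib import Path

# Load the suggestions produced by step 0b
suggestions = json.loads(Path("pipeline_output/normalization_suggestions.json").read_text())

# Dump each suggestion for manual review before applying anything
entries = suggestions if isinstance(suggestions, list) else suggestions.items()
for entry in entries:
    print(entry)
```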
Process inference requests 3-4x faster with parallel workers:
```python
from pipeline.utils.parallel_inference_runner import ParallelInferenceRunner

runner = ParallelInferenceRunner(max_workers=4)
runner.run_inference('pipeline_output/dual_qwen_inference_requests.jsonl')
```
Benefits:
Performance:
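The speed-up comes from fanning requests out across a pool of workers rather than sending them one at a time. The sketch below illustrates that general pattern with `concurrent.futures`; it is not the runner's actual implementation, and `send_request` is a hypothetical placeholder for your model call:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def send_request(request: dict) -> dict:
    # Placeholder: replace with the actual model/API call.
    # Here we just echo the request back so the sketch runs end to end.
    return {"request": request, "response": None}

def run_parallel(jsonl_path: str, max_workers: int = 4) -> list[dict]:
    # Read one JSON request per line, then fan the requests out
    # across a pool of worker threads.
    lines = Path(jsonl_path).read_text().splitlines()
    requests = [json.loads(line) for line in lines if line.strip()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send_request, requests))
```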