oncall-guide-ai / evaluation

Commit History

Add comprehensive evaluation reports and execution time breakdown for Hospital Customization System
24f6a16

YanBoChen committed on

Update query file references for full evaluation and correct typo in pre_user_query_evaluate.txt for pre-test.
e84171b

YanBoChen committed on

Merge branch 'newbranchYB-newest' into Merged20250805
abbc1cd

YanBoChen committed on

Add adaptive relevance thresholds for query complexity in PrecisionMRRAnalyzer; fix typo in condition mapping for postpartum hemorrhage
7620d26

YanBoChen committed on

Update threshold values in latency evaluator and coverage chart generator; enhance precision and MRR analysis with corrected thresholds and new chart generator for detailed metrics visualization.
5d4792a

YanBoChen committed on

Refactor relevance calculation and update thresholds in latency evaluator; enhance precision and MRR analyzer with angular distance metrics; increase timeout for primary generation in fallback configuration.
b0f56ec

YanBoChen committed on

Enhance Direct LLM Evaluator and Judge Evaluator:
40d39ed

YanBoChen committed on

feat(evaluation): add visualization generators that produce PNG files
6ccdca1

VanKee committed on

feat(evaluation): add comprehensive hospital customization evaluation system
550df1b

VanKee committed on

Add multi-system evaluation support for clinical actionability and evidence quality metrics
16a2990

YanBoChen committed on

Before running the 1st evaluation: Add Precision & MRR Chart Generator and a sample test query
a2aaea2

YanBoChen committed on

feat: Add Extraction, LLM Judge, and Relevance Chart Generators
17613c8

YanBoChen committed on

Add extraction and relevance evaluators for condition extraction and retrieval relevance analysis
88e76fd

YanBoChen committed on

Add latency and relevance evaluators for medical query analysis (evaluation)
3e2ffcb

YanBoChen committed on

feat(evaluation): add seventh evaluation metric for multi-level fallback efficiency and early interception rate
9e4c1bc

YanBoChen committed on

fix(evaluation): improve evaluation instructions and add structured assessment phases
5f9dffa

YanBoChen committed on

fix(mild bug): enhance user query prompts (more robust handling of .txt or .json input) and add postpartum hemorrhage condition mapping
253609b

YanBoChen committed on

Add evaluation instructions and user query prompts for clinical model assessment
16ee1e5

YanBoChen committed on