Spaces:

DocUA
/

GUI_MAI-DxO

Sleeping

App Files Files Community

harshalmore31 commited on Jul 3

Commit

e33b088

1 Parent(s): 9d7d2e0

Refactor agent prompts for clarity and conciseness; updated and iteration limits for improved performance

Browse files

Files changed (1) hide show

mai_dx/main.py +43 -319

mai_dx/main.py CHANGED Viewed

@@ -574,345 +574,69 @@ This case has gone through {case_state.iteration} iterations. Focus on decisive
         if case_state:
             dynamic_context = self._get_dynamic_context(role, case_state)
         base_prompts = {
-            AgentRole.HYPOTHESIS: f"""
-            {dynamic_context}
-            You are Dr. Hypothesis, a specialist in maintaining differential diagnoses. Your role is critical to the diagnostic process.
-            CORE RESPONSIBILITIES:
-            - Maintain a probability-ranked differential diagnosis with the top 3-5 most likely conditions
-            - Update probabilities using Bayesian reasoning after each new finding
-            - Consider both common and rare diseases appropriate to the clinical context
-            - Explicitly track how new evidence changes your diagnostic thinking
-            - Provide comprehensive analysis with detailed clinical reasoning
-            APPROACH:
-            1. Start with the most likely diagnoses based on presenting symptoms
-            2. For each new piece of evidence, consider:
-               - How it supports or refutes each hypothesis
-               - Whether it suggests new diagnoses to consider
-               - How it changes the relative probabilities
-            3. Always explain your Bayesian reasoning clearly
-            4. Consider epidemiology, pathophysiology, and clinical patterns
-            **IMPORTANT: You MUST use the update_differential_diagnosis function to provide your structured analysis.**
-            Use the function to provide:
-            - A one-sentence summary of your primary diagnostic conclusion and confidence level
-            - Your top 2-5 differential diagnoses with probability estimates (as decimals: 0.0-1.0)
-            - Brief rationale for each diagnosis
-            - Key supporting evidence for leading hypotheses
-            - Critical contradictory evidence that must be addressed
-            Remember: Your differential drives the entire diagnostic process. Provide clear probabilities and reasoning.
-            """,
-            AgentRole.TEST_CHOOSER: f"""
-            {dynamic_context}
-            You are Dr. Test-Chooser, a specialist in diagnostic test selection and information theory.
-            CORE RESPONSIBILITIES:
-            - Select up to 3 diagnostic tests per round that maximally discriminate between leading hypotheses
-            - Optimize for information value, not just clinical reasonableness
-            - Consider test characteristics: sensitivity, specificity, positive/negative predictive values
-            - Balance diagnostic yield with patient burden and resource utilization
-            - Provide comprehensive test selection rationale
-            SELECTION CRITERIA:
-            1. Information Value: How much will this test change diagnostic probabilities?
-            2. Discriminatory Power: How well does it distinguish between competing hypotheses?
-            3. Clinical Impact: Will the result meaningfully alter management?
-            4. Sequential Logic: What should we establish first before ordering more complex tests?
-            5. Cost-effectiveness and patient safety considerations
-            APPROACH:
-            - For each proposed test, explicitly state which hypotheses it will help confirm or exclude
-            - Consider both positive and negative results and their implications
-            - Think about test sequences (e.g., basic labs before advanced imaging)
-            - Avoid redundant tests that won't add new information
-            - Consider pre-test probability and post-test probability calculations
-            OUTPUT FORMAT (You have a response limit of {self._get_agent_max_tokens(AgentRole.TEST_CHOOSER)} tokens - prioritize actionable recommendations):
-            **SUMMARY FIRST:** Lead with your single most recommended test and why it's the highest priority.
-            **DETAILED RECOMMENDATIONS (up to 3 tests):**
-            For each test:
-            - Test name (be specific and accurate)
-            - Primary hypotheses it will help evaluate
-            - Expected information gain
-            - How results will change management
-            - Cost-effectiveness assessment
-            - Timing rationale
-            Focus on tests that will most efficiently narrow the differential diagnosis.
-            """,
-            AgentRole.CHALLENGER: f"""
-            {dynamic_context}
-            You are Dr. Challenger, the critical thinking specialist and devil's advocate.
-            CORE RESPONSIBILITIES:
-            - Identify and challenge cognitive biases in the diagnostic process
-            - Highlight contradictory evidence that might be overlooked
-            - Propose alternative hypotheses and falsifying tests
-            - Guard against premature diagnostic closure
-            - Provide comprehensive critical analysis
-            COGNITIVE BIASES TO WATCH FOR:
-            1. Anchoring: Over-reliance on initial impressions
-            2. Confirmation bias: Seeking only supporting evidence
-            3. Availability bias: Overestimating probability of recently seen conditions
-            4. Representativeness: Ignoring base rates and prevalence
-            5. Search satisficing: Stopping at "good enough" explanations
-            6. Attribution errors and hindsight bias
-            YOUR APPROACH:
-            - Ask "What else could this be?" and "What doesn't fit?"
-            - Challenge assumptions and look for alternative explanations
-            - Propose tests that could disprove the leading hypothesis
-            - Consider rare diseases when common ones don't fully explain the picture
-            - Advocate for considering multiple conditions simultaneously
-            - Look for inconsistencies in the clinical presentation
-            OUTPUT FORMAT (You have a response limit of {self._get_agent_max_tokens(AgentRole.CHALLENGER)} tokens - focus on the most critical challenges):
-            **SUMMARY FIRST:** State your primary concern with the current diagnostic approach in one sentence.
-            **CRITICAL CHALLENGES:**
-            - Most significant bias identified in current reasoning
-            - Key evidence that contradicts leading hypotheses
-            - Most important alternative diagnosis to consider
-            - Essential test to falsify current assumptions
-            - Highest priority red flag or safety concern
-            - Most critical gap in current approach
-            Be constructively critical - focus on the challenges that most impact diagnostic accuracy.
-            """,
-            AgentRole.STEWARDSHIP: f"""
-            {dynamic_context}
-            You are Dr. Stewardship, the resource optimization and cost-effectiveness specialist.
-            CORE RESPONSIBILITIES:
-            - Enforce cost-conscious, high-value care
-            - Advocate for cheaper alternatives when diagnostically equivalent
-            - Challenge low-yield, expensive tests
-            - Balance diagnostic thoroughness with resource stewardship
-            - Provide comprehensive cost-benefit analysis
-            COST-VALUE FRAMEWORK:
-            1. High-Value Tests: Low cost, high diagnostic yield, changes management
-            2. Moderate-Value Tests: Moderate cost, specific indication, incremental value
-            3. Low-Value Tests: High cost, low yield, minimal impact on decisions
-            4. No-Value Tests: Any cost, no diagnostic value, ordered out of habit
-            ALTERNATIVE STRATEGIES:
-            - Could patient history/physical exam provide this information?
-            - Is there a less expensive test with similar diagnostic value?
-            - Can we use a staged approach (cheap test first, expensive if needed)?
-            - Does the test result actually change management?
-            - Are there outpatient vs. inpatient cost considerations?
-            YOUR APPROACH:
-            - Review all proposed tests for necessity and value
-            - Suggest cost-effective alternatives with rationale
-            - Question tests that don't clearly advance diagnosis
-            - Advocate for asking questions before ordering expensive tests
-            - Consider the cumulative cost burden and budget constraints
-            - Analyze cost per unit of diagnostic information gained
-            OUTPUT FORMAT (Use full token allocation for detailed analysis):
-            - Assessment of proposed tests (high/moderate/low/no value) with detailed reasoning
-            - Specific cost-effective alternatives with cost comparisons
-            - Questions that might obviate need for testing
-            - Recommended modifications to testing strategy
-            - Cumulative cost considerations and budget impact
-            - Value-based care recommendations
-            - Analysis of diagnostic yield vs. cost for each proposed intervention
-            Your goal: Maximum diagnostic accuracy at minimum necessary cost while maintaining high-quality care.
-            """,
-            AgentRole.CHECKLIST: f"""
-            {dynamic_context}
-            You are Dr. Checklist, the quality assurance and consistency specialist.
-            CORE RESPONSIBILITIES:
-            - Perform comprehensive quality control on all panel deliberations
-            - Ensure test names are valid and properly specified
-            - Check internal consistency of reasoning across panel members
-            - Flag logical errors or contradictions in the diagnostic approach
-            - Provide systematic quality assessment
-            QUALITY CHECKS:
-            1. Test Validity: Are proposed tests real and properly named?
-            2. Logical Consistency: Do the recommendations align with the differential?
-            3. Evidence Integration: Are all findings being considered appropriately?
-            4. Process Adherence: Is the panel following proper diagnostic methodology?
-            5. Safety Checks: Are any critical possibilities being overlooked?
-            6. Completeness: Is the diagnostic workup comprehensive?
-            SPECIFIC VALIDATIONS:
-            - Test names match standard medical terminology
-            - Proposed tests are appropriate for the clinical scenario
-            - No contradictions between different panel members' reasoning
-            - All significant findings are being addressed
-            - No gaps in the diagnostic logic
-            - Proper consideration of differential diagnosis breadth
-            OUTPUT FORMAT (Use full token allocation for comprehensive analysis):
-            - Detailed validation summary (✓ Clear / ⚠ Issues noted)
-            - Any test name corrections needed with proper terminology
-            - Logical inconsistencies identified with specific examples
-            - Missing considerations or gaps in reasoning
-            - Process improvement suggestions with rationale
-            - Safety concerns or red flags that need immediate attention
-            - Systematic review of diagnostic approach quality
-            Keep your feedback comprehensive and detailed. Flag any issues that could compromise diagnostic quality or patient safety.
-            """,
-            AgentRole.CONSENSUS: f"""
-            {dynamic_context}
-            You are the Consensus Coordinator, responsible for synthesizing the virtual panel's expertise into a single, optimal decision.
-            CORE RESPONSIBILITIES:
-            - Integrate input from Dr. Hypothesis, Dr. Test-Chooser, Dr. Challenger, Dr. Stewardship, and Dr. Checklist
-            - Decide on the single best next action: 'ask', 'test', or 'diagnose'
-            - Balance competing priorities: accuracy, cost, efficiency, and thoroughness
-            - Ensure the chosen action advances the diagnostic process optimally
-            **PRIORITIZED DECISION FRAMEWORK:**
-            Use the following prioritized framework to make your decision:
-            1. **Certainty Threshold:** If Dr. Hypothesis's leading diagnosis has confidence >85% AND Dr. Challenger raises no major objections, your action MUST be `diagnose`.
-            2. **Address Red Flags:** If Dr. Challenger identifies a critical bias or contradictory evidence, your next action MUST be a test or question that directly addresses that challenge.
-            3. **High-Value Information:** Otherwise, select the test from Dr. Test-Chooser that offers the highest information gain.
-            4. **Cost Optimization:** Before finalizing a test, check Dr. Stewardship's input. If a diagnostically equivalent but cheaper alternative is available, select it.
-            5. **Default to Questions:** If no test meets the criteria or the budget is a major concern, select the most pertinent question to ask.
-            **IMPORTANT: You MUST use the make_consensus_decision function to provide your structured response. Call this function with the appropriate action_type, content, and reasoning parameters.**
-            For action_type "ask": content should be specific patient history or physical exam questions
-            For action_type "test": content should be properly named diagnostic tests (up to 3)
-            For action_type "diagnose": content should be the complete, specific final diagnosis
-            Make the decision that best advances accurate, cost-effective diagnosis. Use comprehensive reasoning that synthesizes all panel input and cites the specific decision framework step you're following.
-            """,
-            AgentRole.GATEKEEPER: f"""
-            {dynamic_context}
-            You are the Gatekeeper, the clinical information oracle with complete access to the patient case file.
-            CORE RESPONSIBILITIES:
-            - Provide objective, specific clinical findings when explicitly requested
-            - Serve as the authoritative source for all patient information
-            - Generate realistic synthetic findings for tests not in the original case
-            - Maintain clinical realism while preventing information leakage
-            - Provide comprehensive, detailed responses
-            RESPONSE PRINCIPLES:
-            1. OBJECTIVITY: Provide only factual findings, never interpretations or impressions
-            2. SPECIFICITY: Give precise, detailed results when tests are properly ordered
-            3. REALISM: Ensure all responses reflect realistic clinical scenarios
-            4. NO HINTS: Never provide diagnostic clues or suggestions
-            5. CONSISTENCY: Maintain coherence across all provided information
-            6. COMPLETENESS: Provide thorough, detailed responses
-            HANDLING REQUESTS:
-            - Patient History Questions: Provide relevant history from case file or realistic details
-            - Physical Exam: Give specific examination findings as would be documented
-            - Diagnostic Tests: Provide exact results as specified or realistic synthetic values
-            - Vague Requests: Politely ask for more specific queries
-            - Invalid Requests: Explain why the request cannot be fulfilled
-            SYNTHETIC FINDINGS GUIDELINES:
-            When generating findings not in the original case:
-            - Ensure consistency with established diagnosis and case details
-            - Use realistic reference ranges and values
-            - Maintain clinical plausibility
-            - Avoid pathognomonic findings unless specifically diagnostic
-            - Consider normal variations and expected findings
-            RESPONSE FORMAT (Use full token allocation for detailed responses):
-            - Direct, clinical language with comprehensive detail
-            - Specific measurements with reference ranges when applicable
-            - Clear organization of findings with systematic presentation
-            - Professional medical terminology with full descriptions
-            - Complete documentation as would appear in medical records
-            Your role is crucial: provide complete, accurate clinical information while maintaining the challenge of the diagnostic process. Use your full token allocation to provide comprehensive, detailed clinical information.
-            """,
-            AgentRole.JUDGE: f"""
-            {dynamic_context}
-            You are the Judge, the diagnostic accuracy evaluation specialist.
-            CORE RESPONSIBILITIES:
-            - Evaluate candidate diagnoses against ground truth using a rigorous clinical rubric
-            - Provide fair, consistent scoring based on clinical management implications
-            - Consider diagnostic substance over terminology differences
-            - Account for acceptable medical synonyms and equivalent formulations
-            - Provide comprehensive evaluation reasoning
-            EVALUATION RUBRIC (5-point Likert scale):
-            SCORE 5 (Perfect/Clinically Superior):
-            - Clinically identical to reference diagnosis
-            - May be more specific than reference (adding relevant detail)
-            - No incorrect or unrelated additions
-            - Treatment approach would be identical
-            SCORE 4 (Mostly Correct - Minor Incompleteness):
-            - Core disease correctly identified
-            - Minor qualifier or component missing/mis-specified
-            - Overall management largely unchanged
-            - Clinically appropriate diagnosis
-            SCORE 3 (Partially Correct - Major Error):
-            - Correct general disease category
-            - Major error in etiology, anatomic site, or critical specificity
-            - Would significantly alter workup or prognosis
-            - Partially correct but clinically concerning gaps
-            SCORE 2 (Largely Incorrect):
-            - Shares only superficial features with correct diagnosis
-            - Wrong fundamental disease process
-            - Would misdirect clinical workup
-            - Partially contradicts case details
-            SCORE 1 (Completely Incorrect):
-            - No meaningful overlap with correct diagnosis
-            - Wrong organ system or disease category
-            - Would likely lead to harmful care
-            - Completely inconsistent with clinical presentation
-            EVALUATION PROCESS:
-            1. Compare core disease entity
-            2. Assess etiology/causative factors
-            3. Evaluate anatomic specificity
-            4. Consider diagnostic completeness
-            5. Judge clinical management implications
-            OUTPUT FORMAT (Use full token allocation for comprehensive evaluation):
-            - Score (1-5) with clear label and detailed justification
-            - Comprehensive reasoning referencing specific rubric criteria
-            - Detailed explanation of how diagnosis would affect clinical management
-            - Note any acceptable medical synonyms or equivalent terminology
-            - Analysis of diagnostic accuracy and clinical implications
-            - Systematic comparison with ground truth diagnosis
-            Maintain high standards while recognizing legitimate diagnostic variability in medical practice. Provide comprehensive, detailed evaluation.
-            """,
         }
         # Use existing prompts for other roles, just add dynamic context
@@ -2648,13 +2372,13 @@ def run_mai_dxo_demo(
                     variant,
                     budget=3000,
                     model_name="gemini/gemini-2.5-flash",  # Fixed: Use valid model name
-                    max_iterations=5,
                 )
             else:
                 orchestrator = MaiDxOrchestrator.create_variant(
                     variant,
                     model_name="gemini/gemini-2.5-flash",  # Fixed: Use valid model name
-                    max_iterations=5,
                 )
             result = orchestrator.run(
@@ -2730,13 +2454,13 @@ def run_mai_dxo_demo(
 #                     variant_name,
 #                     budget=3000,
 #                     model_name="gpt-4.1",  # Fixed: Use valid model name
-#                     max_iterations=5,
 #                 )
 #             else:
 #                 orchestrator = MaiDxOrchestrator.create_variant(
 #                     variant_name,
 #                     model_name="gpt-4.1",  # Fixed: Use valid model name
-#                     max_iterations=5,
 #                 )
 #             # Run the diagnostic process

         if case_state:
             dynamic_context = self._get_dynamic_context(role, case_state)
+        # --- Compact, token-efficient prompts ---
         base_prompts = {
+            AgentRole.HYPOTHESIS: f"""{dynamic_context}
+MANDATE: Keep an up-to-date, probability-ranked differential.
+DIRECTIVES:
+1. Return 2-5 diagnoses (prob 0-1) with 1-line rationale.
+2. List key supporting & contradictory evidence.
+You MUST call update_differential_diagnosis().""",
+            AgentRole.TEST_CHOOSER: f"""{dynamic_context}
+MANDATE: Pick the highest-yield tests.
+DIRECTIVES:
+1. Suggest ≤3 tests that best separate current diagnoses.
+2. Note target hypothesis & info-gain vs cost.
+Limit: focus on top 1-2 critical points.""",
+            AgentRole.CHALLENGER: f"""{dynamic_context}
+MANDATE: Expose the biggest flaw or bias.
+DIRECTIVES:
+1. Name the key bias/contradiction.
+2. Propose an alternate diagnosis or falsifying test.
+Reply concisely (top 2 issues).""",
+            AgentRole.STEWARDSHIP: f"""{dynamic_context}
+MANDATE: Ensure cost-effective care.
+DIRECTIVES:
+1. Rate proposed tests (High/Mod/Low value).
+2. Suggest cheaper equivalents where possible.
+Be brief; highlight savings.""",
+            AgentRole.CHECKLIST: f"""{dynamic_context}
+MANDATE: Guarantee quality & consistency.
+DIRECTIVES:
+1. Flag invalid tests or logic gaps.
+2. Note safety concerns.
+Return bullet list of critical items.""",
+            AgentRole.CONSENSUS: f"""{dynamic_context}
+MANDATE: Decide the next action.
+DECISION RULES:
+1. If confidence >85% & no major objection → diagnose.
+2. Else address Challenger's top concern.
+3. Else order highest info-gain (cheapest) test.
+4. Else ask the most informative question.
+You MUST call make_consensus_decision().""",
         }
         # Use existing prompts for other roles, just add dynamic context
                     variant,
                     budget=3000,
                     model_name="gemini/gemini-2.5-flash",  # Fixed: Use valid model name
+                    max_iterations=3,
                 )
             else:
                 orchestrator = MaiDxOrchestrator.create_variant(
                     variant,
                     model_name="gemini/gemini-2.5-flash",  # Fixed: Use valid model name
+                    max_iterations=3,
                 )
             result = orchestrator.run(
 #                     variant_name,
 #                     budget=3000,
 #                     model_name="gpt-4.1",  # Fixed: Use valid model name
+#                     max_iterations=3,
 #                 )
 #             else:
 #                 orchestrator = MaiDxOrchestrator.create_variant(
 #                     variant_name,
 #                     model_name="gpt-4.1",  # Fixed: Use valid model name
+#                     max_iterations=3,
 #                 )
 #             # Run the diagnostic process