SreekarB committed on
Commit f349331 · verified · 1 Parent(s): 5ec662d

Update annotated_casl_app.py

Files changed (1): annotated_casl_app.py (+406 -377)
annotated_casl_app.py CHANGED
@@ -148,7 +148,7 @@ def call_claude_api_with_continuation(prompt, max_continuations=0):
 }
 
 data = {
- "model": "claude-haiku-4-5",
+ "model": "claude-sonnet-4-5",
 "max_tokens": 4096,
 "messages": [
 {
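(For orientation: a minimal sketch of how a payload like the "data" dict above is typically POSTed to the Anthropic Messages API. This assumes the requests library and an ANTHROPIC_API_KEY environment variable; the app's actual wrapper, retry, and error handling may differ.)

    import os
    import requests

    def post_messages(data: dict) -> str:
        """POST a Messages API payload and return the first text block of the reply."""
        resp = requests.post(
            "https://api.anthropic.com/v1/messages",
            headers={
                "x-api-key": os.environ["ANTHROPIC_API_KEY"],  # assumed env var name
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            },
            json=data,
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["content"][0]["text"]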
@@ -278,7 +278,7 @@ def call_claude_api_quick_analysis(prompt):
 }
 
 data = {
- "model": "claude-haiku-4-5",
+ "model": "claude-sonnet-4-5",
 "max_tokens": 4096,
 "messages": [
 {
@@ -729,103 +729,255 @@ def analyze_annotated_transcript(annotated_transcript, age, gender, slp_notes):
 """
 
 analysis_prompt = f"""
- You are a speech-language pathologist conducting a COMPREHENSIVE analysis of a word-by-word annotated speech sample. Count EVERY marker precisely and provide detailed quantitative analysis.
+ You are a speech-language pathologist conducting a comprehensive analysis of an annotated speech sample. Provide a complete, clinically useful analysis without excessive formatting.
 
 Patient: {age}-year-old {gender}
 
 ANNOTATED TRANSCRIPT:
 {annotated_transcript}{notes_section}
 
- ORIGINAL TRANSCRIPT (for reference and backup analysis):
- {annotated_transcript.replace('[FILLER]', '').replace('[FALSE_START]', '').replace('[REPETITION]', '').replace('[REVISION]', '').replace('[PAUSE]', '').replace('[CIRCUMLOCUTION]', '').replace('[INCOMPLETE]', '').replace('[GENERIC]', '').replace('[WORD_SEARCH]', '').replace('[GRAM_ERROR]', '').replace('[SYNTAX_ERROR]', '').replace('[MORPH_ERROR]', '').replace('[RUN_ON]', '').replace('[SIMPLE_VOCAB]', '').replace('[COMPLEX_VOCAB]', '').replace('[SEMANTIC_ERROR]', '').replace('[TOPIC_SHIFT]', '').replace('[TANGENT]', '').replace('[INAPPROPRIATE]', '').replace('[COHERENCE_BREAK]', '').replace('[SIMPLE_SENT]', '').replace('[COMPLEX_SENT]', '').replace('[COMPOUND_SENT]', '').replace('[FIGURATIVE]', '').replace('[PRONOUN_REF]', '').replace('[MAZING]', '').replace('[PERSEVERATION]', '')}
-
- ANALYSIS INSTRUCTIONS:
- Using the detailed linguistic markers in the annotated transcript, provide a comprehensive analysis with EXACT counts, percentages, and specific examples. If markers are missing or unclear, use the original transcript for backup analysis. Complete ALL sections below:
-
- COMPREHENSIVE SPEECH SAMPLE ANALYSIS:
-
- 1. FLUENCY ANALYSIS (count each marker type):
- - Count [FILLER] markers: List each instance and calculate rate per 100 words
- - Count [FALSE_START] markers: List examples and analyze patterns
- - Count [REPETITION] markers: Categorize by type (word, phrase, sound)
- - Count [REVISION] markers: Analyze self-correction patterns
- - Count [PAUSE] markers: Assess hesitation frequency
- - Calculate total disfluency rate and severity level
- - Determine impact on communication effectiveness
-
- 2. WORD RETRIEVAL ANALYSIS (precise counting):
- - Count [CIRCUMLOCUTION] markers: List each roundabout description
- - Count [INCOMPLETE] markers: Analyze abandoned thought patterns
- - Count [GENERIC] markers: Calculate specificity ratio
- - Count [WORD_SEARCH] markers: Identify retrieval difficulty areas
- - Count [WORD_FINDING] markers: Assess overall retrieval efficiency
- - Calculate word-finding accuracy percentage
-
- 3. GRAMMATICAL ANALYSIS (detailed error counting):
- - Count [GRAM_ERROR] markers by subcategory:
- * Verb tense errors
- * Subject-verb agreement errors
- * Pronoun errors
- * Article errors
- - Count [SYNTAX_ERROR] markers: Analyze word order problems
- - Count [MORPH_ERROR] markers: Categorize morphological mistakes
- - Count [RUN_ON] markers: Assess sentence boundary awareness
- - Calculate grammatical accuracy rate (correct vs. total attempts)
-
- 4. VOCABULARY ANALYSIS (sophistication assessment):
- - Count [SIMPLE_VOCAB] markers: List basic vocabulary used
- - Count [COMPLEX_VOCAB] markers: List sophisticated vocabulary
- - Count [SEMANTIC_ERROR] markers: Analyze word choice accuracy
- - Calculate vocabulary sophistication ratio (complex/simple)
- - Assess semantic appropriateness and precision
- - Determine vocabulary diversity (type-token ratio)
-
- 5. PRAGMATIC LANGUAGE ANALYSIS (coherence assessment):
- - Count [TOPIC_SHIFT] markers: Assess transition appropriateness
- - Count [TANGENT] markers: Analyze tangential speech patterns
- - Count [INAPPROPRIATE] markers: Evaluate contextual appropriateness
- - Count [COHERENCE_BREAK] markers: Assess logical flow
- - Count [PRONOUN_REF] markers: Analyze referential clarity
- - Evaluate overall discourse coherence and organization
-
- 6. SENTENCE COMPLEXITY ANALYSIS (structural assessment):
- - Count [SIMPLE_SENT] markers: Calculate simple sentence percentage
- - Count [COMPLEX_SENT] markers: Analyze subordination use
- - Count [COMPOUND_SENT] markers: Assess coordination patterns
- - Count [FIGURATIVE] markers: Evaluate figurative language use
- - Count [MAZING] markers: Assess confusing constructions
- - Calculate syntactic complexity index
-
- 7. QUANTITATIVE METRICS (comprehensive calculations):
- - Total word count and morpheme count
- - Mean Length of Utterance (MLU) in words and morphemes
- - Type-Token Ratio (TTR) for vocabulary diversity
- - Clauses per utterance ratio
- - Error rate per linguistic domain
- - Communication efficiency index
-
- 8. ERROR PATTERN ANALYSIS:
- - Most frequent error types with exact counts
- - Error consistency vs. variability patterns
- - Developmental appropriateness of errors
- - Severity ranking of different error types
+ INSTRUCTIONS: Complete ALL 12 sections below. Use simple formatting (no excessive markdown, headers, or bullets). Focus on clinical utility and completeness. Count all markers precisely and provide specific examples.
+
+ COMPREHENSIVE SPEECH SAMPLE ANALYSIS
+
+ 1. SPEECH FACTORS
+
+ A. Fluency Issues (count each marker type precisely):
+ - Filler words ([FILLER]): Count all instances, calculate rate per 100 words
+ * List each type: "um," "uh," "like," "you know," etc.
+ * Provide specific examples with context
+ * Calculate percentage of total words
+ - False starts ([FALSE_START]): Count and categorize
+ * Word-level false starts: "I was go- going"
+ * Phrase-level false starts: "My bike is- I mean my bike looks"
+ * Provide exact quotes from transcript
+ - Repetitions ([REPETITION]): Count and categorize by type
+ * Word repetitions: "I I went"
+ * Phrase repetitions: "to the store to the store"
+ * Sound repetitions: "b-b-bike"
+ - Revisions ([REVISION]): Count self-corrections and analyze patterns
+ * Grammatical revisions: "I goed- I went"
+ * Lexical revisions: "big- huge dog"
+ * Semantic revisions: "car- I mean bike"
+ - Pauses ([PAUSE]): Count hesitation markers and silent pauses
+ - Total disfluency rate: Calculate combined rate per 100 words
+ - Severity assessment: Compare to age norms
+
+ B. Word Retrieval Issues (detailed analysis):
+ - Circumlocutions ([CIRCUMLOCUTION]): Count and analyze strategies
+ * Functional descriptions: "the thing you write with"
+ * Category + description: "that type of fish in the salad"
+ * Provide exact quotes and analyze effectiveness
+ - Incomplete thoughts ([INCOMPLETE]): Count abandoned utterances
+ * Analyze patterns: topic-related, complexity-related, retrieval-related
+ - Generic terms ([GENERIC]): Count vague language
+ * "thing," "stuff," "something," "whatsit"
+ * Calculate specificity ratio
+ - Word searches ([WORD_SEARCH]): Count explicit retrieval attempts
+ * "What do you call it," "I can't think of the word"
+ - Overall efficiency: Calculate success rate of retrieval attempts
+
+ C. Grammatical Errors (comprehensive breakdown):
+ - Grammatical errors ([GRAM_ERROR]): Count by subcategory
+ * Subject-verb agreement: "He don't like it"
+ * Verb tense errors: "Yesterday I go to store"
+ * Pronoun errors: "Me and him went"
+ * Article errors: "I saw a elephant"
+ - Syntax errors ([SYNTAX_ERROR]): Count word order problems
+ - Morphological errors ([MORPH_ERROR]): Count and categorize
+ * Plural errors: "childs," "foots"
+ * Past tense errors: "runned," "catched"
+ * Comparative errors: "more better"
+ - Run-on sentences ([RUN_ON]): Count and assess boundary awareness
+ - Calculate grammatical accuracy rate
+
+ 2. LANGUAGE SKILLS ASSESSMENT
+
+ A. Vocabulary Analysis (detailed breakdown):
+ - Simple vocabulary ([SIMPLE_VOCAB]): Count and categorize
+ * High-frequency words: "go," "big," "good"
+ * Basic descriptors: "nice," "fun," "cool"
+ * Calculate percentage of total vocabulary
+ - Complex vocabulary ([COMPLEX_VOCAB]): Count and analyze
+ * Academic vocabulary: "magnificent," "elaborate"
+ * Technical terms: "carburetor," "photosynthesis"
+ * Low-frequency words: "churrasco," "anchovies"
+ - Vocabulary sophistication ratio: Complex/simple vocabulary
+ - Type-token ratio: Unique words/total words
+ - Semantic appropriateness: Analyze precision and context fit
+ - Word frequency analysis: Identify most common words used
+
+ B. Grammar and Morphology (systematic analysis):
+ - Morphological complexity assessment
+ - Derivational morpheme use: prefixes, suffixes
+ - Inflectional morphology: plurals, tense, agreement
+ - Compound word formation
+ - Error pattern analysis by morpheme type
+
+ 3. COMPLEX SENTENCE ANALYSIS
+
+ A. Sentence Structure Distribution:
+ - Simple sentences ([SIMPLE_SENT]): Count and calculate percentage
+ * Subject + predicate: "I went home"
+ * Analyze average length and complexity
+ - Complex sentences ([COMPLEX_SENT]): Count subordination patterns
+ * Adverbial clauses: "When I got home, I ate dinner"
+ * Relative clauses: "The bike that I rode was red"
+ * Noun clauses: "I know that he likes pizza"
+ - Compound sentences ([COMPOUND_SENT]): Count coordination patterns
+ * Coordinating conjunctions: "and," "but," "or," "so"
+ * Analyze balance and appropriateness
+
+ B. Syntactic Complexity Measures:
+ - Mean Length of Utterance (MLU): Words and morphemes
+ - Clauses per utterance ratio
+ - Subordination index
+ - Coordination index
+ - Developmental appropriateness assessment
+
+ 4. FIGURATIVE LANGUAGE ANALYSIS
+
+ A. Non-literal Language Use:
+ - Figurative expressions ([FIGURATIVE]): Count and analyze
+ * Metaphors: "Time is money"
+ * Similes: "Fast as lightning"
+ * Idioms: "Raining cats and dogs"
+ - Appropriateness assessment: Context and age-level
+ - Comprehension vs. production abilities
+ - Abstract language development indicators
+
+ 5. PRAGMATIC LANGUAGE ASSESSMENT
+
+ A. Discourse Management:
+ - Topic management ([TOPIC_SHIFT]): Count and assess appropriateness
+ * Smooth transitions vs. abrupt shifts
+ * Topic maintenance duration
+ * Elaboration and detail provision
+ - Tangential speech ([TANGENT]): Count off-topic instances
+ - Discourse coherence ([COHERENCE_BREAK]): Analyze logical flow
+ - Narrative structure and organization
+
+ B. Referential Communication:
+ - Referential clarity ([PRONOUN_REF]): Count unclear references
+ * Ambiguous pronouns: "He told him that he was wrong"
+ * Missing referents: "It was really good" (unclear antecedent)
+ - Demonstrative use: "this," "that," "these," "those"
+ - Overall conversational competence assessment
+
+ 6. VOCABULARY AND SEMANTIC ANALYSIS
+
+ A. Semantic Accuracy and Precision:
+ - Semantic errors ([SEMANTIC_ERROR]): Count inappropriate word choices
+ * Word substitutions: "I drove my bicycle"
+ * Category errors: "I petted the bird" (for touched)
+ - Word association patterns and semantic relationships
+ - Semantic categories: Analyze breadth and organization
+ - Precision of word choice: Specific vs. general terms
+
+ B. Lexical Diversity and Sophistication:
+ - Vocabulary breadth: Range of semantic categories
+ - Vocabulary depth: Precision and nuance within categories
+ - Academic vs. conversational vocabulary ratio
+ - Age-appropriate vocabulary development
+
+ 7. MORPHOLOGICAL AND PHONOLOGICAL ANALYSIS
+
+ A. Morphological Patterns:
+ - Derivational morphology: Prefixes and suffixes
+ - Inflectional morphology: Tense, number, case markers
+ - Morphological awareness indicators
+ - Error patterns and developmental appropriateness
+
+ B. Phonological Considerations:
+ - Sound pattern analysis (if evident in transcript)
+ - Syllable structure complexity
+ - Phonological awareness indicators
+
+ 8. COGNITIVE-LINGUISTIC FACTORS
+
+ A. Working Memory Indicators:
+ - Sentence length and complexity management
+ - Information retention across utterances
+ - Complex information processing evidence
+
+ B. Processing Speed and Efficiency:
+ - Word-finding speed and accuracy
+ - Response latency patterns
+ - Processing load indicators
+
+ C. Executive Function Evidence:
+ - Self-monitoring and error correction
+ - Planning and organization in discourse
+ - Cognitive flexibility in topic management
+
+ 9. FLUENCY AND RHYTHM ANALYSIS
+
+ A. Disfluency Patterns:
+ - Total disfluency count and rate per 100 words
+ - Disfluency type distribution
+ - Clustering patterns and severity assessment
+ - Impact on communication effectiveness
+
+ B. Speech Flow and Rhythm:
+ - Natural pause patterns vs. disrupted flow
+ - Rhythm and prosodic patterns (if evident)
+ - Overall fluency profile and age-appropriateness
+
+ 10. QUANTITATIVE METRICS
+
+ A. Basic Measures:
+ - Total words: [exact count]
+ - Total sentences: [exact count]
+ - Unique words: [exact count]
+ - MLU words: [calculation with formula shown]
+ - MLU morphemes: [calculation with formula shown]
+ - Type-Token Ratio: [calculation and interpretation]
+
+ B. Error Rates and Ratios:
+ - Disfluency rate per 100 words
+ - Grammatical accuracy percentage
+ - Vocabulary sophistication ratio
+ - Sentence complexity distribution percentages
+
+ 11. CLINICAL IMPLICATIONS
+
+ A. Strengths (ranked by prominence):
+ - Primary strengths with supporting evidence
+ - Secondary strengths with examples
 - Compensatory strategies observed
 
- 9. CLINICAL IMPLICATIONS:
- - Primary strengths: List with supporting evidence
- - Primary weaknesses: Rank by severity with counts
- - Intervention priorities: Based on error frequency and impact
- - Therapy targets: Specific, measurable goals
- - Prognosis indicators: Based on error patterns and consistency
-
- 10. SUMMARY AND RECOMMENDATIONS:
- - Overall communication profile with percentile estimates
- - Priority treatment goals ranked by importance
- - Functional communication impact assessment
- - Recommended therapy approaches and frequency
- - Follow-up assessment timeline
-
- CRITICAL: Provide EXACT counts for every marker type, calculate precise percentages, and give specific examples from the transcript. Show your counting work clearly. Complete ALL 12 sections - use <CONTINUE> if needed.
+ B. Areas of Need (prioritized by severity):
+ - Primary concerns with impact assessment
+ - Secondary concerns with supporting data
+ - Developmental vs. disorder considerations
+
+ C. Treatment Recommendations:
+ - Specific, measurable therapy goals
+ - Intervention approaches and techniques
+ - Frequency and duration recommendations
+ - Progress monitoring strategies
+
+ 12. PROGNOSIS AND SUMMARY
+
+ A. Overall Communication Profile:
+ - Comprehensive summary of findings
+ - Developmental appropriateness assessment
+ - Functional communication impact
+
+ B. Treatment Planning:
+ - Priority intervention targets
+ - Expected outcomes and timeline
+ - Follow-up assessment recommendations
+ - Family/educational recommendations
+
+ CRITICAL REQUIREMENTS:
+ 1. Complete ALL 12 sections - do not stop early
+ 2. Provide exact counts for all markers with specific examples
+ 3. Calculate all percentages and rates with formulas shown
+ 4. Include direct quotes from transcript for examples
+ 5. Analyze patterns and provide clinical interpretations
+ 6. Focus on actionable, clinically relevant insights
+ 7. If response is incomplete, end with <CONTINUE>
 """
 
 return call_claude_api_with_continuation(analysis_prompt)
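(Aside on the deleted ORIGINAL TRANSCRIPT line above: the old prompt rebuilt a clean transcript by chaining 27 str.replace() calls. A single regular expression covers every bracketed marker at once; a minimal sketch, assuming markers are always uppercase letters and underscores in square brackets and that transcripts contain no other such bracketed text:)

    import re

    def strip_markers(annotated: str) -> str:
        """Remove annotation markers such as [FILLER] or [FALSE_START]."""
        return re.sub(r"\[[A-Z_]+\]", "", annotated)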
@@ -1431,7 +1583,7 @@ def call_claude_api_with_continuation(prompt):
 }
 
 data = {
- "model": "claude-haiku-4-5",
+ "model": "claude-sonnet-4-5",
 "max_tokens": 4096,
 "messages": [
 {
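(Both prompts in this commit end by asking the model to emit <CONTINUE> when it runs out of tokens, and call_claude_api_with_continuation presumably loops on that sentinel. A minimal sketch of such a loop, reusing the hypothetical post_messages helper from the earlier note; the app's real signature and limits may differ:)

    def call_with_continuation(prompt: str, max_continuations: int = 3) -> str:
        """Re-prompt while the reply ends with the <CONTINUE> sentinel, then stitch the parts."""
        parts = []
        messages = [{"role": "user", "content": prompt}]
        for _ in range(max_continuations + 1):
            text = post_messages({"model": "claude-sonnet-4-5", "max_tokens": 4096,
                                  "messages": messages})
            done = not text.rstrip().endswith("<CONTINUE>")
            parts.append(text.rstrip().removesuffix("<CONTINUE>"))
            if done:
                break
            # Feed the partial answer back so the model resumes mid-analysis.
            messages += [{"role": "assistant", "content": text},
                         {"role": "user", "content": "Continue exactly where you left off."}]
        return "".join(parts)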
@@ -2176,331 +2328,208 @@ with gr.Blocks(title="Speech Analysis", theme=gr.themes.Soft()) as demo:
 return f"{comprehensive_report}\n\n{'='*100}\nCLINICAL INTERPRETATION BASED ON COMPREHENSIVE VERIFIED DATA\n{'='*100}\n\n{ai_interpretation}"
 
 def run_ultimate_analysis(annotated_transcript, original_transcript, age, gender, slp_notes):
- """The ultimate analysis: gather all statistical data, then do final 12-section clinical analysis"""
+ """Clean comprehensive analysis using verified statistical data"""
 if not annotated_transcript or len(annotated_transcript.strip()) < 50:
 return "Error: Please provide an annotated transcript for analysis."
 
- # STEP 1: Gather ALL statistical data
+ # Gather statistical data
 linguistic_metrics = calculate_linguistic_metrics(original_transcript)
 marker_analysis = analyze_annotation_markers(annotated_transcript)
 lexical_diversity = calculate_advanced_lexical_diversity(original_transcript)
 
- # STEP 2: Get AI clinical insights (for interpretation, not counting)
- ai_clinical_insights = analyze_with_backup(annotated_transcript, original_transcript, age, gender, slp_notes)
+ # Prepare verified statistics
+ marker_counts = marker_analysis['marker_counts']
+ category_totals = marker_analysis['category_totals']
+ total_words = linguistic_metrics.get('total_words', 0)
 
- # STEP 3: Prepare all verified statistical values for final prompt
 stats_summary = f"""
- VERIFIED STATISTICAL VALUES (DO NOT RECOUNT - USE THESE EXACT NUMBERS):
-
- BASIC METRICS:
- • Total words: {linguistic_metrics.get('total_words', 0)}
- • Total sentences: {linguistic_metrics.get('total_sentences', 0)}
- • Unique words: {linguistic_metrics.get('unique_words', 0)}
- • MLU (words): {linguistic_metrics.get('mlu_words', 0):.2f}
- • MLU (morphemes): {linguistic_metrics.get('mlu_morphemes', 0):.2f}
- • Average sentence length: {linguistic_metrics.get('avg_sentence_length', 0):.2f}
- • Sentence length std: {linguistic_metrics.get('sentence_length_std', 0):.2f}
-
- LEXICAL DIVERSITY MEASURES (from lexical-diversity library):"""
+ VERIFIED STATISTICAL DATA:
+
+ Basic Metrics:
+ - Total words: {total_words}
+ - Total sentences: {linguistic_metrics.get('total_sentences', 0)}
+ - Unique words: {linguistic_metrics.get('unique_words', 0)}
+ - MLU words: {linguistic_metrics.get('mlu_words', 0):.2f}
+ - MLU morphemes: {linguistic_metrics.get('mlu_morphemes', 0):.2f}
+ - Average sentence length: {linguistic_metrics.get('avg_sentence_length', 0):.2f}
+
+ Annotation Counts:
+ - Filler markers: {marker_counts.get('FILLER', 0)} ({marker_counts.get('FILLER', 0)/total_words*100:.2f} per 100 words)
+ - False starts: {marker_counts.get('FALSE_START', 0)}
+ - Repetitions: {marker_counts.get('REPETITION', 0)}
+ - Grammar errors: {marker_counts.get('GRAM_ERROR', 0)}
+ - Simple vocabulary: {marker_counts.get('SIMPLE_VOCAB', 0)}
+ - Complex vocabulary: {marker_counts.get('COMPLEX_VOCAB', 0)}
+ - Simple sentences: {marker_counts.get('SIMPLE_SENT', 0)}
+ - Complex sentences: {marker_counts.get('COMPLEX_SENT', 0)}
+ - Compound sentences: {marker_counts.get('COMPOUND_SENT', 0)}
+
+ Category Totals:
+ - Total fluency issues: {category_totals['fluency_issues']} ({category_totals['fluency_issues']/total_words*100:.2f} per 100 words)
+ - Total grammar errors: {category_totals['grammar_errors']}
+ - Vocabulary sophistication ratio: {category_totals['vocab_sophistication_ratio']:.3f}
+ """
 
 if lexical_diversity.get('library_available', False) and 'diversity_measures' in lexical_diversity:
 measures = lexical_diversity['diversity_measures']
 stats_summary += f"""
- • Simple TTR: {measures.get('simple_ttr', 'N/A')}
- • Root TTR: {measures.get('root_ttr', 'N/A')}
- • Log TTR: {measures.get('log_ttr', 'N/A')}
- • Maas TTR: {measures.get('maas_ttr', 'N/A')}
- • HDD: {measures.get('hdd', 'N/A')}
- • MSTTR (25-word): {measures.get('msttr_25', 'N/A')}
- • MSTTR (50-word): {measures.get('msttr_50', 'N/A')}
- • MATTR (25-word): {measures.get('mattr_25', 'N/A')}
- • MATTR (50-word): {measures.get('mattr_50', 'N/A')}
- • MTLD: {measures.get('mtld', 'N/A')}
- • MTLD (MA wrap): {measures.get('mtld_ma_wrap', 'N/A')}
- • MTLD (MA bidirectional): {measures.get('mtld_ma_bid', 'N/A')}"""
- else:
- stats_summary += "\n • Lexical diversity measures not available"
-
- # Add manual annotation counts
- marker_counts = marker_analysis['marker_counts']
- category_totals = marker_analysis['category_totals']
- total_words = linguistic_metrics.get('total_words', 0)
 
- stats_summary += f"""
-
- MANUAL ANNOTATION COUNTS:
- • FILLER markers: {marker_counts.get('FILLER', 0)} ({marker_counts.get('FILLER', 0)/total_words*100:.2f} per 100 words)
- • FALSE_START markers: {marker_counts.get('FALSE_START', 0)}
- • REPETITION markers: {marker_counts.get('REPETITION', 0)}
- • REVISION markers: {marker_counts.get('REVISION', 0)}
- • PAUSE markers: {marker_counts.get('PAUSE', 0)}
- • GRAM_ERROR markers: {marker_counts.get('GRAM_ERROR', 0)}
- • SYNTAX_ERROR markers: {marker_counts.get('SYNTAX_ERROR', 0)}
- • MORPH_ERROR markers: {marker_counts.get('MORPH_ERROR', 0)}
- • SIMPLE_VOCAB markers: {marker_counts.get('SIMPLE_VOCAB', 0)}
- • COMPLEX_VOCAB markers: {marker_counts.get('COMPLEX_VOCAB', 0)}
- • SIMPLE_SENT markers: {marker_counts.get('SIMPLE_SENT', 0)}
- • COMPLEX_SENT markers: {marker_counts.get('COMPLEX_SENT', 0)}
- • COMPOUND_SENT markers: {marker_counts.get('COMPOUND_SENT', 0)}
- • FIGURATIVE markers: {marker_counts.get('FIGURATIVE', 0)}
- • PRONOUN_REF markers: {marker_counts.get('PRONOUN_REF', 0)}
- • TOPIC_SHIFT markers: {marker_counts.get('TOPIC_SHIFT', 0)}
- • TANGENT markers: {marker_counts.get('TANGENT', 0)}
- • CIRCUMLOCUTION markers: {marker_counts.get('CIRCUMLOCUTION', 0)}
- • GENERIC markers: {marker_counts.get('GENERIC', 0)}
- • WORD_SEARCH markers: {marker_counts.get('WORD_SEARCH', 0)}
-
- CATEGORY TOTALS:
- • Total fluency issues: {category_totals['fluency_issues']} ({category_totals['fluency_issues']/total_words*100:.2f} per 100 words)
- • Total grammar errors: {category_totals['grammar_errors']} ({category_totals['grammar_errors']/total_words*100:.2f} per 100 words)
- • Vocabulary sophistication ratio: {category_totals['vocab_sophistication_ratio']:.3f}
+ Lexical Diversity:
+ - Simple TTR: {measures.get('simple_ttr', 'N/A')}
+ - HDD: {measures.get('hdd', 'N/A')}
+ - MTLD: {measures.get('mtld', 'N/A')}
+ - MATTR: {measures.get('mattr_25', 'N/A')}
 """
 
- # STEP 4: Create the final comprehensive prompt
+ # Create comprehensive analysis prompt
 final_prompt = f"""
- You are a speech-language pathologist conducting the FINAL COMPREHENSIVE 12-SECTION SPEECH ANALYSIS.
+ You are a speech-language pathologist conducting a comprehensive speech analysis. Use the verified statistical data provided and complete ALL 12 sections with detailed structure.
 
 Patient: {age}-year-old {gender}
 
 {stats_summary}
 
- CLINICAL INSIGHTS FROM AI ANALYSIS (for interpretation guidance):
- {ai_clinical_insights[:4000]}...
-
- ANNOTATED TRANSCRIPT (for specific examples):
+ ANNOTATED TRANSCRIPT (for examples and quotes):
 {annotated_transcript}
 
- CRITICAL INSTRUCTIONS:
- 1. Use ONLY the verified statistical values provided above - DO NOT recount anything
- 2. Use the clinical insights for interpretation guidance
- 3. Use the annotated transcript for specific examples and quotes
- 4. Complete ALL 12 sections of the comprehensive analysis
-
- COMPREHENSIVE SPEECH SAMPLE ANALYSIS:
-
- CRITICAL: Provide EXTENSIVE detail with ALL possible examples for each category. Quote liberally from the transcript and provide comprehensive breakdowns.
-
- 1. SPEECH FACTORS (with EXHAUSTIVE detail and ALL examples):
-
- A. Fluency Issues:
- - Filler words (total: [count]):
- * "um" ([count]): "Um, it has two..."
- * "like" ([count]): "is like looks silver", "like fits people"
- * "I don't know" ([count]): "I don't know how to say..."
- * Other fillers: [list with counts]
- - False starts/self-corrections ([count]):
- * "My bike is like looks silver"
- * [List other examples]
- - Repetitions ([count]):
- * Word repetitions: "golf cart(s)" (Xx), "back" (Xx)
- * [List other patterns]
+ INSTRUCTIONS:
+ 1. Use ONLY the verified statistical values above - do not recount anything
+ 2. Complete ALL 12 sections without stopping
+ 3. Provide specific examples and quotes from the transcript
+ 4. Calculate rates and percentages using verified counts
+ 5. Focus on clinical interpretation and actionable insights
+ 6. If response is incomplete, end with <CONTINUE>
+
+ COMPREHENSIVE SPEECH SAMPLE ANALYSIS
+
+ 1. SPEECH FACTORS
+
+ A. Fluency Issues (use verified counts):
+ - Filler words: Use verified count of {marker_counts.get('FILLER', 0)} fillers
+ * Calculate rate per 100 words: {marker_counts.get('FILLER', 0)/total_words*100:.2f}%
+ * Identify types and provide examples from transcript
+ * Assess severity and impact on communication
+ - False starts: Use verified count of {marker_counts.get('FALSE_START', 0)}
+ * Provide specific examples from transcript
+ * Analyze patterns and self-correction abilities
+ - Repetitions: Use verified count of {marker_counts.get('REPETITION', 0)}
+ * Categorize types (word, phrase, sound level)
+ * Provide examples and assess severity
+ - Total disfluency assessment: Use verified total of {category_totals['fluency_issues']}
+ * Rate: {category_totals['fluency_issues']/total_words*100:.2f} per 100 words
+ * Compare to age norms and assess severity
 
 B. Word Retrieval Issues:
- - Circumlocution ([count]):
- * "this type of fish in it" (for "anchovies")
- * "where you go golfing and the golf clubs are in back"
- - Incomplete thoughts ([count]):
- * "Like I've seen like a I don't know..."
- - Word-finding pauses ([count]): [brief description]
- - Generic language ([count]): "thing," "stuff," "something"
-
- C. Grammatical Errors:
- - Subject-verb agreement ([count]): "there is another type"
- - Verb tense errors ([count]): [examples]
- - Pronoun errors ([count]): [examples]
- - Run-on sentences ([count]): [examples]
-
- 2. LANGUAGE SKILLS ASSESSMENT (with comprehensive evidence):
-
- A. Lexical/Semantic Skills:
- - Type-Token Ratio: [number] unique words/[number] total words
- - Vocabulary examples:
- * Advanced vocabulary: "churrasco," "lo mein," "anchovies"
- * Transportation terms: "bike," "golf cart," "wheels"
- * Food terms: "caesar salad," "hummus," "pita bread"
- - Semantic relationships:
- * Categories: [examples of categorization]
- * Part-whole: bike parts, food components
- * Cause-effect: "pumped up feels better"
-
- B. Syntactic Skills:
- - Sentence types:
- * Simple sentences: [count]
- * Compound sentences: [count]
- * Complex sentences: [count]
- - MLU: [number] words, [number] morphemes
- - Average sentence length: [number] words
-
- C. Supralinguistic Skills:
- - Cause-effect relationships ([count]):
- * "If you do not have a golf cart driving license you can get busted"
- * "It feels much better now that it's pumped"
- - Inferences: [count with examples]
- - Problem-solving language: [count with examples]
-
- 3. COMPLEX SENTENCE ANALYSIS (with ALL examples and counts):
-
- A. Coordinating Conjunctions:
- - "and": [count]
- - "but": [count]
- - "or": [count]
- - "so": [count]
- - "because": [count]
-
- B. Subordinating Conjunctions:
- - "because": [count]
- - "when": [count]
- - "if": [count]
- - "that": [count]
- - "where": [count]
-
- C. Sentence Structure Analysis:
- - Average sentence length: [number] words
- - Sentence complexity: [brief description of patterns]
-
- 4. FIGURATIVE LANGUAGE ANALYSIS (with ALL examples):
-
- A. Similes and Metaphors:
- - Similes ([count]): [examples]
- - Metaphors ([count]): [examples]
- - "Like" as filler vs. comparison: [brief analysis]
-
- B. Idioms and Non-literal Language:
- - Idioms ([count]): "get busted"
- - Colloquialisms ([count]): [examples]
-
- 5. PRAGMATIC LANGUAGE ASSESSMENT (with detailed examples):
-
- A. Discourse Management:
- - Topic shifts: [count] - quote ALL transitions:
- * Bike → Golf carts: Quote exact transition
- * Golf carts → Food: Quote exact transition
- * Food → Cookies: Quote exact transition
- - Topic maintenance analysis:
- * Golf cart topic: [X utterances] - quote entire sequence
- * Food topic: [X utterances] - quote entire sequence
- - Topic elaboration: Count details provided per topic
-
- B. Referential Communication:
- - Pronoun reference errors: [count] - quote ALL unclear references
- - Demonstrative use: [count] - quote ALL "this," "that" uses
- - Referential clarity: Analyze with specific examples
-
- 6. VOCABULARY AND SEMANTIC ANALYSIS (comprehensive breakdown):
-
- A. Vocabulary Diversity:
- - ALL lexical diversity measures with interpretations:
- * Simple TTR: [number] - age comparison
- * MTLD: [number] - clinical interpretation
- * HDD: [number] - vocabulary range assessment
- * MATTR: [number] - moving average interpretation
- - Most frequent words: List top 20 with frequencies
- - Vocabulary sophistication by domain with examples
-
- B. Semantic Relationships:
- - Word associations: Analyze patterns with examples
- - Semantic categories: List ALL categories used
- - Synonym/antonym use: Quote ALL instances
- - Semantic precision: Analyze accuracy with examples
-
- 7. MORPHOLOGICAL AND PHONOLOGICAL ANALYSIS (detailed breakdown):
-
- A. Morphological Markers:
- - Plurals: [count] - list ALL regular and irregular examples
- - Verb tenses: Break down by type with ALL examples:
- * Present tense: List 20+ examples
- * Past tense regular: List ALL examples
- * Past tense irregular: List ALL examples
- - Progressive forms: [count] - list ALL "-ing" examples
- - Possessives: [count] - list ALL examples
- - Compound words: [count] - list ALL examples
- - Derivational morphemes: [count] - list ALL prefixes/suffixes
-
- B. Phonological Patterns:
- - Articulation accuracy: Note any sound errors
- - Syllable structure: Analyze complexity with examples
- - Prosodic patterns: Describe rhythm and stress
-
- 8. COGNITIVE-LINGUISTIC FACTORS (with specific evidence):
-
- A. Working Memory:
- - Longest successful utterance: [word count] - quote entire utterance
- - Complex information management: Quote examples of multi-part descriptions
- - Information retention across narrative: Analyze with examples
-
- B. Processing Speed:
- - Word-finding efficiency: [count delays] - quote ALL instances
- - Response fluency: Analyze patterns with examples
- - Processing load indicators: Identify with examples
-
- C. Executive Function:
- - Self-monitoring: [count] - quote ALL self-correction instances
- - Planning evidence: Analyze organization with examples
- - Cognitive flexibility: Analyze topic management with examples
-
- 9. FLUENCY AND RHYTHM ANALYSIS (comprehensive measurement):
-
- A. Disfluency Patterns:
- - Total disfluency count: [number] with rate per 100 words
- - Disfluency types breakdown with ALL examples
- - Severity assessment compared to age norms
- - Impact on communication effectiveness
-
- B. Language Flow:
- - Natural pause patterns: [count] - identify ALL appropriate pauses
- - Disrupted flow instances: [count] - quote ALL with analysis
- - Rhythm variation: Describe patterns with examples
-
- 10. QUANTITATIVE METRICS (report ALL data with calculations shown):
- - Total words: [count] (method of counting explained)
- - Total sentences: [count] (criteria for sentence boundaries)
- - Unique words: [count] (how uniqueness determined)
- - MLU words: [calculation] ([total words]/[utterances])
- - MLU morphemes: [calculation] ([total morphemes]/[utterances])
- - ALL lexical diversity measures: [list with values and interpretations]
- - Error rates: [calculations] for each error type
- - Age-appropriate comparisons for ALL measures
-
- 11. CLINICAL IMPLICATIONS (evidence-based with priorities):
-
- A. Strengths (ranked with evidence):
- 1. [Primary strength] - specific evidence with quotes and data
- 2. [Secondary strength] - specific evidence with quotes and data
- 3. [Continue ranking ALL strengths with supporting evidence]
-
+ - Circumlocutions: Count and analyze from transcript
+ - Incomplete thoughts: Identify abandoned utterances
+ - Generic language use: Count vague terms
+ - Word-finding efficiency: Assess retrieval success rate
+
+ C. Grammatical Errors (use verified counts):
+ - Grammar errors: Use verified count of {marker_counts.get('GRAM_ERROR', 0)}
+ - Syntax errors: Use verified count of {marker_counts.get('SYNTAX_ERROR', 0)}
+ - Morphological errors: Use verified count of {marker_counts.get('MORPH_ERROR', 0)}
+ - Calculate overall grammatical accuracy rate
+
+ 2. LANGUAGE SKILLS ASSESSMENT
+
+ A. Vocabulary Analysis (use verified data):
+ - Simple vocabulary: Use verified count of {marker_counts.get('SIMPLE_VOCAB', 0)}
+ - Complex vocabulary: Use verified count of {marker_counts.get('COMPLEX_VOCAB', 0)}
+ - Sophistication ratio: Use verified ratio of {category_totals['vocab_sophistication_ratio']:.3f}
+ - Type-Token Ratio: Use verified TTR from basic metrics
+ - Provide examples of each vocabulary level from transcript
+
+ B. Grammar and Morphology:
+ - Error pattern analysis using verified counts
+ - Developmental appropriateness assessment
+ - Morphological complexity evaluation
+
+ 3. COMPLEX SENTENCE ANALYSIS (use verified counts)
+
+ A. Sentence Structure Distribution:
+ - Simple sentences: Use verified count of {marker_counts.get('SIMPLE_SENT', 0)}
+ - Complex sentences: Use verified count of {marker_counts.get('COMPLEX_SENT', 0)}
+ - Compound sentences: Use verified count of {marker_counts.get('COMPOUND_SENT', 0)}
+ - Calculate percentages of each type
+
+ B. Syntactic Complexity:
+ - MLU analysis: Use verified MLU of {linguistic_metrics.get('mlu_words', 0):.2f} words
+ - Average sentence length: Use verified length of {linguistic_metrics.get('avg_sentence_length', 0):.2f} words
+ - Subordination and coordination patterns
+
+ 4. FIGURATIVE LANGUAGE ANALYSIS
+ - Figurative expressions: Use verified count of {marker_counts.get('FIGURATIVE', 0)}
+ - Metaphor and idiom identification from transcript
+ - Age-appropriate development assessment
+ - Abstract language abilities
+
+ 5. PRAGMATIC LANGUAGE ASSESSMENT
+ - Topic shifts: Use verified count of {marker_counts.get('TOPIC_SHIFT', 0)}
+ - Tangential speech: Use verified count of {marker_counts.get('TANGENT', 0)}
+ - Coherence breaks: Use verified count of {marker_counts.get('COHERENCE_BREAK', 0)}
+ - Referential clarity: Use verified count of {marker_counts.get('PRONOUN_REF', 0)}
+ - Overall conversational competence assessment
+
+ 6. VOCABULARY AND SEMANTIC ANALYSIS
+ - Semantic errors: Use verified count of {marker_counts.get('SEMANTIC_ERROR', 0)}
+ - Lexical diversity: Use verified measures from stats summary
+ - Word association patterns from transcript analysis
+ - Semantic precision and appropriateness
+
+ 7. MORPHOLOGICAL AND PHONOLOGICAL ANALYSIS
+ - Morphological complexity assessment
+ - Derivational and inflectional morphology patterns
+ - Error analysis using verified counts
+ - Developmental appropriateness
+
+ 8. COGNITIVE-LINGUISTIC FACTORS
+ - Working memory indicators from sentence complexity
+ - Processing speed markers from fluency patterns
+ - Executive function evidence from self-corrections
+ - Attention and cognitive load management
+
+ 9. FLUENCY AND RHYTHM ANALYSIS
+ - Disfluency pattern analysis using verified counts
+ - Speech rhythm and flow assessment
+ - Natural vs. disrupted pause patterns
+ - Overall fluency profile
+
+ 10. QUANTITATIVE METRICS (use ALL verified data)
+ - Total words: {total_words}
+ - Total sentences: {linguistic_metrics.get('total_sentences', 0)}
+ - Unique words: {linguistic_metrics.get('unique_words', 0)}
+ - MLU words: {linguistic_metrics.get('mlu_words', 0):.2f}
+ - MLU morphemes: {linguistic_metrics.get('mlu_morphemes', 0):.2f}
+ - All error rates and ratios from verified counts
+
+ 11. CLINICAL IMPLICATIONS
+ A. Strengths (with supporting evidence):
+ - Identify primary strengths using verified data
+ - Provide specific examples from transcript
+
 B. Areas of Need (prioritized by severity):
- 1. [Highest priority] - severity data, impact analysis, examples
- 2. [Second priority] - severity data, impact analysis, examples
- 3. [Continue with ALL areas needing intervention]
-
- C. Treatment Recommendations (specific and measurable):
- 1. [Primary goal] - specific techniques, frequency, duration, success criteria
- 2. [Secondary goal] - specific techniques, frequency, duration, success criteria
- 3. [Continue with ALL treatment recommendations]
-
- 12. PROGNOSIS AND SUMMARY (comprehensive profile):
- - Overall severity rating: [level] with detailed justification
- - Developmental appropriateness: Compare ALL skills to age expectations
- - Functional communication impact: Real-world implications
- - Prognosis: Specific predictions with timelines and success indicators
- - Monitoring plan: Specific measures and reassessment schedule
-
- 12. PROGNOSIS AND SUMMARY:
- Overall profile based on comprehensive verified data
-
- REQUIREMENTS:
- - Complete ALL 12 sections
- - Use ONLY verified statistical values (never recount)
- - Cite specific examples from annotated transcript
- - Provide clinical interpretation of the verified data
- - If response is cut off, end with <CONTINUE>
+ - Primary concerns based on verified counts and rates
+ - Secondary areas for intervention
+
+ C. Treatment Recommendations:
+ - Specific, measurable therapy goals
+ - Evidence-based intervention approaches
+ - Progress monitoring strategies
+
+ 12. PROGNOSIS AND SUMMARY
+ - Overall communication profile synthesis
+ - Functional impact assessment
+ - Treatment planning and expected outcomes
+ - Follow-up recommendations
+
+ CRITICAL: Complete ALL 12 sections using verified data and specific transcript examples.
 """
 
- # STEP 5: Get the final comprehensive analysis
+ # Get comprehensive analysis
 final_result = call_claude_api_with_continuation(final_prompt)
-
 return final_result
-
+
 def run_full_pipeline(transcript_content, age, gender, slp_notes):
 """Run the complete pipeline but return annotation immediately"""
 if not transcript_content or len(transcript_content.strip()) < 50:
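(The Simple TTR, HDD, MTLD, and MATTR values referenced above come from the lexical-diversity package named in the deleted old prompt. A minimal sketch of computing those measures with that package; tokenization here is naive whitespace splitting, whereas the app may clean or lemmatize the text first:)

    from lexical_diversity import lex_div as ld

    def diversity_measures(text: str) -> dict:
        """Compute the lexical diversity measures cited in stats_summary."""
        tokens = text.lower().split()
        return {
            "simple_ttr": ld.ttr(tokens),
            "hdd": ld.hdd(tokens),
            "mtld": ld.mtld(tokens),
            "mattr_25": ld.mattr(tokens, window_length=25),
        }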