swaroop-uddandarao


Performance Analysis Report

  1. Retrieval Time (seconds, lower is better):

    • Milvus + LLaMA: 0.132s
    • Weaviate + Mistral: 0.157s
    • Milvus + Mistral: NaN (not recorded)
  2. Context Relevance (higher is better):

    • Milvus + LLaMA: 0.640
    • Weaviate + Mistral: 0.591
    • Milvus + Mistral: 0.518
  3. Context Utilization (higher is better):

    • Milvus + LLaMA: 0.673
    • Weaviate + Mistral: 0.619
    • Milvus + Mistral: 0.614
  4. AUCROC (Area Under the ROC Curve, higher is better):

    • Milvus + LLaMA: 0.912
    • Weaviate + Mistral: 0.750
    • Milvus + Mistral: 0.844
  5. RMSE (Root Mean Square Error, lower is better):

    • Milvus + LLaMA:
      • Context Relevance RMSE: 0.179
      • Context Utilization RMSE: 0.302
    • Weaviate + Mistral:
      • Context Relevance RMSE: 0.414
      • Context Utilization RMSE: 0.482
    • Milvus + Mistral:
      • Context Relevance RMSE: 0.167
      • Context Utilization RMSE: 0.258
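The RMSE and AUCROC figures above can be computed from per-example scores. The sketch below is a minimal example under two assumptions:

- RMSE compares each predicted context relevance or utilization score with an annotated reference value.
- AUCROC compares a continuous predicted score against a binary ground-truth label.

The report does not say which labels its AUCROC uses. The function names `rmse` and `auc_roc` and the toy data are hypothetical.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random negative.

    This is the normalized Mann-Whitney U statistic; tied pairs count as 0.5.
    Assumes labels are 1 (positive) or 0 (negative).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: predicted vs. annotated context relevance.
print(round(rmse([0.9, 0.4, 0.7, 0.2], [1.0, 0.5, 0.5, 0.0]), 3))  # 0.158
# Hypothetical toy data: predicted scores vs. binary labels.
print(auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop in `auc_roc` costs O(P·N), which is acceptable for a small evaluation set. For larger sets, scikit-learn's `roc_auc_score` computes the same quantity more efficiently.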

Analysis

  1. Best Overall Performance: Milvus + LLaMA

    • Highest AUCROC score (0.912)
    • Best context relevance (0.640) and utilization (0.673)
    • Fastest measured retrieval time (0.132s)
    • Second-lowest RMSE scores (0.179 relevance, 0.302 utilization)
  2. Runner-up: Milvus + Mistral

    • Second-best AUCROC (0.844)
    • Lowest RMSE scores overall (0.167 relevance, 0.258 utilization)
    • Lowest context relevance (0.518) and utilization (0.614)
    • Retrieval time not recorded
  3. Third Place: Weaviate + Mistral

    • Lowest AUCROC (0.750)
    • Highest RMSE scores
    • Slower retrieval time (0.157s) than Milvus + LLaMA
    • Second-best context relevance (0.591) and utilization (0.619)
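The per-metric ordering behind this analysis can be reproduced from the reported numbers. The sketch below sorts each metric in its "better" direction. It leaves out the missing Milvus + Mistral retrieval time instead of treating it as zero or infinity. The dictionary keys and the `rank` helper are illustrative names, not part of the project's code.

```python
import math

# Values copied from the metrics reported above; NaN marks the
# retrieval time that was not recorded for Milvus + Mistral.
RESULTS = {
    "Milvus + LLaMA": {"retrieval_time": 0.132, "context_relevance": 0.640,
                       "context_utilization": 0.673, "aucroc": 0.912,
                       "relevance_rmse": 0.179, "utilization_rmse": 0.302},
    "Weaviate + Mistral": {"retrieval_time": 0.157, "context_relevance": 0.591,
                           "context_utilization": 0.619, "aucroc": 0.750,
                           "relevance_rmse": 0.414, "utilization_rmse": 0.482},
    "Milvus + Mistral": {"retrieval_time": math.nan, "context_relevance": 0.518,
                         "context_utilization": 0.614, "aucroc": 0.844,
                         "relevance_rmse": 0.167, "utilization_rmse": 0.258},
}

# True means larger values are better for that metric.
HIGHER_IS_BETTER = {
    "retrieval_time": False, "context_relevance": True,
    "context_utilization": True, "aucroc": True,
    "relevance_rmse": False, "utilization_rmse": False,
}


def rank(metric: str) -> list[str]:
    """Order configurations best-first on one metric, skipping missing (NaN) values."""
    measured = [(name, vals[metric]) for name, vals in RESULTS.items()
                if not math.isnan(vals[metric])]
    measured.sort(key=lambda item: item[1], reverse=HIGHER_IS_BETTER[metric])
    return [name for name, _ in measured]


for metric in HIGHER_IS_BETTER:
    print(f"{metric:20s} {' > '.join(rank(metric))}")
```

The output shows the trade-off the recommendation weighs:

- Milvus + LLaMA ranks first on retrieval time, context relevance, context utilization, and AUCROC.
- Milvus + Mistral ranks first on both RMSE metrics.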

Recommendation

Across all reported metrics, Milvus + LLaMA is the strongest overall configuration. It offers:

  • Highest AUCROC (0.912)
  • Highest context relevance (0.640) and utilization (0.673)
  • Fastest measured retrieval time (0.132s)
  • Second-lowest RMSE, behind only Milvus + Mistral

However, if minimizing error (RMSE) is the primary objective, Milvus + Mistral is a viable alternative. Its RMSE is lower for both context relevance (0.167 vs. 0.179) and context utilization (0.258 vs. 0.302), although its retrieval time was not recorded.