CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

TCID (Transformer CI Dashboard) is a Gradio-based web dashboard that displays CI test results for Transformer models on AMD and NVIDIA hardware. It fetches the test data from HuggingFace datasets and presents it through interactive visualizations and detailed failure reports.

Architecture

Core Components

app.py - Main Gradio application with UI components, plotting functions, and data visualization logic
data.py - Data fetching module that retrieves test results from HuggingFace datasets for AMD and NVIDIA CI runs
styles.css - Complete dark theme styling for the Gradio interface
requirements.txt - Python dependencies (matplotlib only)

Data Flow

1. Data Loading: get_data() in data.py fetches the latest CI results from:
   - AMD: hf://datasets/optimum-amd/transformers_daily_ci
   - NVIDIA: hf://datasets/hf-internal-testing/transformers_daily_ci
2. Data Processing: The AMD and NVIDIA results are joined and filtered to show only the models in the IMPORTANT_MODELS list
3. Visualization: Two main views:
   - Summary Page: Horizontal bar charts showing test results for all models
   - Detail View: Pie charts for individual models with failure details
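
The join-and-filter step can be sketched as follows. This is a dependency-free illustration that uses plain dicts instead of the DataFrames the application works with. `join_results`, the sample inputs, and the shortened `IMPORTANT_MODELS` are hypothetical; only the `_amd` / `_nvidia` suffix convention comes from the column names documented in this file, and keeping models that appear in just one CI run is an assumption.

```python
# Hypothetical, shortened stand-in for the IMPORTANT_MODELS list in data.py.
IMPORTANT_MODELS = ["bert", "gpt2", "llama"]


def join_results(amd, nvidia, important=IMPORTANT_MODELS):
    """Join per-model results from both vendors and keep only important models.

    `amd` and `nvidia` map model name -> {column: value}. The output maps
    model name -> one flat row whose keys carry an `_amd` or `_nvidia` suffix.
    """
    joined = {}
    for model in important:
        if model not in amd and model not in nvidia:
            continue  # no data from either CI run for this model
        row = {}
        for suffix, source in (("amd", amd), ("nvidia", nvidia)):
            for column, value in source.get(model, {}).items():
                row[f"{column}_{suffix}"] = value
        joined[model] = row
    return joined


# Made-up sample data for illustration only.
amd = {"bert": {"success": 120}, "obscure_model": {"success": 3}}
nvidia = {"bert": {"success": 118}, "gpt2": {"success": 90}}
print(join_results(amd, nvidia))
# {'bert': {'success_amd': 120, 'success_nvidia': 118}, 'gpt2': {'success_nvidia': 90}}
```

Here `obscure_model` is dropped because it is not in the important list, and `llama` is skipped because neither run reported it.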

UI Architecture

Sidebar: Model selection, refresh controls, CI job links
Main Content: Dynamic display switching between summary and detail views
Auto-refresh: Data reloads every 15 minutes via background threading
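
One way to get a 15-minute reload on a background thread is a self-rearming daemon timer, sketched below. `schedule_refresh`, `load_fn`, and `REFRESH_INTERVAL_SECONDS` are hypothetical names and not taken from app.py; the actual implementation may be structured differently.

```python
import threading

REFRESH_INTERVAL_SECONDS = 15 * 60  # 15 minutes


def schedule_refresh(load_fn, interval=REFRESH_INTERVAL_SECONDS):
    """Call `load_fn` after `interval` seconds, then arm the next timer."""

    def _tick():
        try:
            load_fn()
        finally:
            # Re-arm even if the reload raised, so one failure does not
            # stop future refreshes.
            schedule_refresh(load_fn, interval)

    timer = threading.Timer(interval, _tick)
    timer.daemon = True  # do not keep the process alive on shutdown
    timer.start()
    return timer
```

Marking the timer as a daemon means a pending reload never blocks the application from exiting.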

Running the Application

Development Commands

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py

HuggingFace Spaces Deployment

This application is configured for HuggingFace Spaces deployment:

Framework: Gradio 5.38.0
App file: app.py
Configuration: See README.md header for Spaces metadata

Key Data Structures

Model Results DataFrame

The joined DataFrame contains these columns:

success_amd / success_nvidia - Number of passing tests
failed_multi_no_amd / failed_multi_no_nvidia - Multi-GPU failure counts
failed_single_no_amd / failed_single_no_nvidia - Single-GPU failure counts
failures_amd / failures_nvidia - Detailed failure information objects
job_link_amd / job_link_nvidia - CI job URLs
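
As a rough illustration of how these columns might be read, the sketch below derives pass and fail totals for one device from a single row. The `summarize` helper and the sample values are hypothetical, and adding the single-GPU and multi-GPU failure counts together is an assumption about how the totals are meant to be combined. A plain dict stands in for a DataFrame row.

```python
def summarize(row, device):
    """Return (passed, failed) test counts for `device` ("amd" or "nvidia")."""
    passed = row[f"success_{device}"]
    failed = row[f"failed_multi_no_{device}"] + row[f"failed_single_no_{device}"]
    return passed, failed


# Made-up values for illustration only.
row = {
    "success_amd": 40,
    "failed_multi_no_amd": 2,
    "failed_single_no_amd": 1,
    "success_nvidia": 42,
    "failed_multi_no_nvidia": 0,
    "failed_single_no_nvidia": 1,
}
print(summarize(row, "amd"))     # (40, 3)
print(summarize(row, "nvidia"))  # (42, 1)
```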

Important Models List

Predefined list in data.py focusing on significant models:

Classic models: bert, gpt2, t5, vit, clip, whisper
Modern models: llama, gemma3, qwen2, mistral3
Multimodal: qwen2_5_vl, llava, smolvlm, internvl

Styling and Theming

The application uses a comprehensive dark theme with:

Fixed sidebar layout (300px width)
Black background throughout (#000000)
Custom scrollbars with dark styling
Monospace fonts for technical aesthetics
Gradient buttons and hover effects

Error Handling

Data Loading Failures: If fetching fails, the app falls back to a predefined model list (intended for testing)
Missing Model Data: Shows "No data available" message in visualizations
Empty Results: Gracefully handles cases with no test results
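
The fallback behaviour for loading failures can be sketched as below. `load_models`, `FALLBACK_MODELS`, and `_cached_models` are hypothetical names, and the module-level cache is only one possible way to keep data between UI updates; the real code in data.py and app.py may differ.

```python
# Hypothetical fallback list, used only when fetching fails.
FALLBACK_MODELS = ["bert", "gpt2", "llama"]

# Hypothetical module-level cache shared between UI updates.
_cached_models = []


def load_models(fetch):
    """Refresh the cache from `fetch()`, falling back to FALLBACK_MODELS."""
    global _cached_models
    try:
        models = list(fetch())
    except Exception as error:
        print(f"Data loading failed ({error}); using fallback model list")
        models = list(FALLBACK_MODELS)
    _cached_models = models
    return _cached_models
```

An empty but successful fetch is kept as an empty list, leaving it to the visualization layer to show its "No data available" message.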

Performance Considerations

Memory Management: Matplotlib configured to prevent memory warnings
Interactive Mode: Disabled to prevent figure accumulation
Auto-reload: Background threading with daemon timers
Data Caching: Global variables store loaded data between UI updates
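
A minimal configuration sketch for the Matplotlib points above follows. It assumes that the "memory warnings" are Matplotlib's open-figure warning and that a non-GUI backend is appropriate; neither detail is confirmed by this file, and the sample chart values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-GUI backend for a server process
import matplotlib.pyplot as plt

plt.ioff()  # turn interactive mode off
# Assumption: silence the warning Matplotlib emits when many figures are open.
plt.rcParams["figure.max_open_warning"] = 0

fig, ax = plt.subplots()
ax.barh(["bert"], [120])  # made-up value for illustration
# Hand `fig` to the UI here, then release it once it is no longer needed.
plt.close(fig)
```

Closing each figure explicitly once it has been rendered is what actually frees the memory; the settings above only keep the warning quiet.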