# Chat UI Helper - Comprehensive Test Procedure

This document outlines a systematic test procedure for validating the Chat UI Helper application after new commits. Running it confirms that all components still function correctly. Extend the procedure itself as the project evolves.

## Pre-Test Setup

### Environment Verification
```bash
# Verify Python environment
python --version  # Should be 3.10+ (Gradio 5.x requires Python 3.10 or newer)

# Install/update dependencies
pip install -r requirements.txt

# Verify optional dependencies status
python -c "
try:
    import sentence_transformers, faiss, fitz, docx
    print('✅ All RAG dependencies available')
except ImportError as e:
    print(f'⚠️  Optional RAG dependencies missing: {e}')
"
```
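The one-liner above stops at the first `ImportError`, so it reports only one missing package per run. The minimal Python sketch below lists every missing optional module in one pass. It uses `importlib.util.find_spec`, so nothing is actually imported. The helper name `missing_modules` is illustrative and not part of the project.

```python
import importlib.util

# Import names probed by the one-liner above (not the pip package names)
OPTIONAL_RAG_MODULES = ["sentence_transformers", "faiss", "fitz", "docx"]


def missing_modules(names):
    """Return the top-level module names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_RAG_MODULES)
    if missing:
        print(f"⚠️  Optional RAG dependencies missing: {', '.join(missing)}")
    else:
        print("✅ All RAG dependencies available")
```

Saved under any name (for example a hypothetical `check_deps.py`), it prints the same messages as the one-liner but names every absent module at once.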

### Test Data Preparation
```bash
# Ensure test document exists
echo "This is a test document for RAG functionality testing." > test_document.txt

# Create test directory structure if needed
mkdir -p test_outputs
```

## Test Categories

### 1. Core Application Tests

#### 1.1 Application Startup
```bash
# Test basic application launch
python app.py &
APP_PID=$!
sleep 10
curl -sf http://localhost:7860 > /dev/null && echo "✅ App started successfully" || echo "❌ App failed to start"
kill $APP_PID
```
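A fixed `sleep 10` can be too short on a slow machine or cold start, and needlessly long otherwise. When the startup check is driven from Python test code (for example a pytest fixture), a polling readiness check avoids the guess. The sketch below uses only the standard library. `wait_for_http` and its default timings are illustrative assumptions, and the URL in the usage note assumes Gradio's default port 7860, as in the command above.

```python
import http.client
import time
import urllib.request


def wait_for_http(url, timeout=30.0, interval=0.5, request_timeout=5.0):
    """Poll `url` until it returns a 2xx response or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen follows redirects and raises HTTPError (an OSError subclass)
            # for any other non-2xx status, so reaching the body means success
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True
        except (OSError, http.client.HTTPException):
            pass  # not ready yet: connection refused/reset, timeout, or error status
        time.sleep(interval)
    return False
```

After launching `python app.py` in the background, `wait_for_http("http://localhost:7860")` returns `True` as soon as the interface answers. It returns `False` after 30 seconds without a successful response.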

#### 1.2 Gradio Interface Validation
- [ ] Application loads without errors
- [ ] Two tabs visible: "Spaces Configuration" and "Chat Support"
- [ ] All form fields render correctly
- [ ] Template selection works (Custom vs Research Assistant)
- [ ] File upload components appear when RAG is enabled

### 2. Vector RAG Component Tests

#### 2.1 Individual Component Testing
```bash
# Test document processing
python -c "from test_vector_db import test_document_processing; test_document_processing()"

# Test vector store functionality
python -c "from test_vector_db import test_vector_store; test_vector_store()"

# Test full RAG pipeline
python -c "from test_vector_db import test_rag_tool; test_rag_tool()"
```

#### 2.2 RAG Integration Tests
- [ ] Document upload accepts PDF, DOCX, TXT, MD files
- [ ] File size validation (10MB limit) works
- [ ] Documents are processed and chunked correctly
- [ ] Vector embeddings are generated
- [ ] Similarity search returns relevant results
- [ ] RAG data serializes/deserializes properly for templates
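The upload checks above are easiest to exercise at their boundaries. The sketch below is a reference validator for generating expected results. It is not the app's own code and rests on two assumptions:

- The accepted extensions come from the checklist.
- "10MB" means 10 × 1024 × 1024 bytes. The app may use 10,000,000 instead.

`validate_upload` is a hypothetical helper name.

```python
from pathlib import Path

# Assumptions taken from the checklist above; the app's real constants may differ
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # "10MB", assumed to mean binary megabytes


def validate_upload(path):
    """Return (ok, reason) for a candidate upload based on its extension and size."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()  # accept REPORT.PDF as well as report.pdf
    if suffix not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {suffix or '(none)'}"
    size = file_path.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        return False, f"file too large: {size} bytes > {MAX_UPLOAD_BYTES}"
    return True, "ok"
```

The most informative test files are one of exactly `MAX_UPLOAD_BYTES` (expected to pass) and one a single byte larger (expected to fail). Off-by-one mistakes in a size limit show up there.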

### 3. Space Generation Tests

#### 3.1 Basic Space Creation
- [ ] Generate space with minimal configuration
- [ ] Verify all required files are created (app.py, requirements.txt, README.md, config.json)
- [ ] Check generated app.py syntax is valid
- [ ] Verify requirements.txt has correct dependencies
- [ ] Ensure README.md contains proper deployment instructions
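The file-presence, syntax, and config checks in 3.1 can be automated against the ZIP archive that `generate_zip` produces (the quick test suite asserts that its first return value ends in `.zip`). The sketch below assumes the four files sit either at the archive root or inside one wrapping folder. `check_space_zip` is a hypothetical helper. It checks that `app.py` parses and that `config.json` is valid JSON; the deployment-instruction and dependency items remain manual checks.

```python
import json
import zipfile
from pathlib import PurePosixPath

# Files the 3.1 checklist expects in every generated Space
REQUIRED_FILES = {"app.py", "requirements.txt", "README.md", "config.json"}


def check_space_zip(zip_path):
    """Return a list of problems found in a generated Space archive (empty list = pass)."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        # Map basenames to member paths so a single wrapping folder is tolerated
        members = {PurePosixPath(name).name: name for name in archive.namelist()}
        for required in sorted(REQUIRED_FILES - set(members)):
            problems.append(f"missing {required}")
        if "app.py" in members:
            source = archive.read(members["app.py"]).decode("utf-8")
            try:
                compile(source, "app.py", "exec")  # parses and compiles; nothing is executed
            except SyntaxError as exc:
                problems.append(f"app.py syntax error at line {exc.lineno}: {exc.msg}")
        if "config.json" in members:
            try:
                json.loads(archive.read(members["config.json"]).decode("utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"config.json is not valid JSON: {exc.msg}")
    return problems
```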

#### 3.2 Advanced Feature Testing
- [ ] Generate space with URL grounding enabled
- [ ] Generate space with vector RAG enabled
- [ ] Generate space with access code protection
- [ ] Test template substitution works correctly
- [ ] Verify environment variable security pattern

### 4. Web Scraping Tests

#### 4.1 Mock vs Production Mode
```bash
# Test in mock mode (lines 14-18 in app.py)
# Verify placeholder content is returned

# Test in production mode
# Verify actual web content is fetched via HTTP requests
```

#### 4.2 URL Processing
- [ ] Valid URLs are processed correctly
- [ ] Invalid URLs are handled gracefully
- [ ] Content extraction works for different site types
- [ ] Rate limiting and error handling work
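For the valid and invalid URL cases above (and the malicious-input item in 5.2), a small reference predicate makes the expected outcome of each test input explicit. The sketch below uses `urllib.parse`. Its rules are assumptions, not the app's documented behavior: only `http`/`https` schemes, a host is required, and no embedded credentials.

```python
from urllib.parse import urlparse


def is_acceptable_url(url):
    """Accept only absolute http(s) URLs that have a host and no embedded credentials."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:  # e.g. "http://[::1" raises "Invalid IPv6 URL"
        return False
    if parsed.scheme not in ("http", "https"):  # urlparse lowercases the scheme
        return False  # rejects javascript:, file:, data:, ftp:, and bare "example.com"
    if not parsed.hostname:
        return False
    if parsed.username or parsed.password:
        return False  # user:pass@host URLs are rarely legitimate grounding sources
    return True
```

This predicate does not block loopback or private-network addresses. If the app is expected to refuse those (for example `http://127.0.0.1`), the test list should cover them separately, for instance with an `ipaddress`-based check.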

### 5. Security and Configuration Tests

#### 5.1 Environment Variable Handling
- [ ] API keys are not embedded in generated templates
- [ ] Access codes use environment variable pattern
- [ ] Sensitive data is properly excluded from version control
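The first two items can be checked mechanically by scanning a generated `app.py`. The scan looks for values that resemble hard-coded secrets and confirms that configuration is read from the environment. The sketch below is heuristic and will produce some false positives. It makes two assumptions:

- OpenRouter-style keys start with `sk-or-`.
- Any quoted literal of 8+ characters assigned to a name containing `key`, `token`, `secret`, or `access_code` is suspicious.

`audit_source` is a hypothetical helper.

```python
import re

# Heuristics (assumptions): an OpenRouter-style key prefix, and long quoted
# literals assigned to names that look like credentials
SECRET_PATTERNS = [
    re.compile(r"sk-or-[A-Za-z0-9_-]{10,}"),
    re.compile(r"""(?i)\b\w*(key|token|secret|access_code)\w*\s*=\s*["'][^"']{8,}["']"""),
]
# Standard ways of reading a value from the environment
ENV_READ = re.compile(r"os\.(environ\.get|getenv)\(|os\.environ\[")


def audit_source(source):
    """Return (suspected_hardcoded_secrets, reads_environment) for app.py source text."""
    hits = [m.group(0) for pattern in SECRET_PATTERNS for m in pattern.finditer(source)]
    return hits, bool(ENV_READ.search(source))
```

A generated Space passes when the first element is empty and the second is `True`.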

#### 5.2 Input Validation
- [ ] File upload validation works
- [ ] URL validation prevents malicious inputs
- [ ] Content length limits are enforced
- [ ] XSS prevention in user inputs

### 6. Chat Support Tests

#### 6.1 OpenRouter Integration
- [ ] Chat responds when API key is configured
- [ ] Proper error message when API key is missing
- [ ] Message history formatting works correctly
- [ ] URL grounding provides relevant context

#### 6.2 Gradio 5.x Compatibility
- [ ] Message format uses `type="messages"`
- [ ] ChatInterface renders correctly
- [ ] User/assistant message distinction works
- [ ] Chat history persists during session

## Automated Test Execution

### Quick Test Suite
```bash
#!/bin/bash
# quick_test.sh - Run essential tests; exits non-zero if any required check fails

echo "🔍 Running Quick Test Suite..."
FAILED=0

# 1. Syntax check
python -m py_compile app.py && echo "✅ app.py syntax valid" || { echo "❌ app.py syntax error"; FAILED=1; }

# 2. Import test
python -c "import app; print('✅ App imports successfully')" 2>/dev/null || { echo "❌ Import failed"; FAILED=1; }

# 3. RAG component test (optional: skipped when RAG components are unavailable)
if python -c "from rag_tool import RAGTool" 2>/dev/null; then
    python test_vector_db.py && echo "✅ RAG tests passed" || { echo "❌ RAG tests failed"; FAILED=1; }
else
    echo "⚠️  RAG components not available"
fi

# 4. Template generation test
python -c "
import app
result = app.generate_zip('Test Space', 'Test Description', 'Test Role', 'Test Audience', 'Test Tasks', '', [], '', '', 'gpt-3.5-turbo', 0.7, 4000, [], False, False, None)
assert result[0].endswith('.zip'), 'ZIP generation failed'
print('✅ Space generation works')
" || { echo "❌ Space generation failed"; FAILED=1; }

# Report and propagate the result so CI marks the run as failed
if [ "$FAILED" -eq 0 ]; then
    echo "🎉 Quick test suite completed!"
else
    echo "❌ Quick test suite finished with failures"
fi
exit $FAILED
```

### Full Test Suite
```bash
#!/bin/bash
# full_test.sh - Comprehensive testing

echo "🔍 Running Full Test Suite..."

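# Run all component tests first; stop if they fail
./quick_test.sh || { echo "❌ Quick test suite failed; skipping integration tests"; exit 1; }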

# Additional integration tests
echo "🧪 Running integration tests..."

# Test with different configurations
# Test error handling
# Test edge cases
# Performance tests

echo "📊 Generating test report..."
# Generate detailed test report
```
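The report step at the end of `full_test.sh` is only a placeholder comment. One way to fill it is to run each check as a subprocess, record its exit status and duration, and write a JSON report into the `test_outputs/` directory created during setup. The sketch below takes that approach. `run_checks`, `QUICK_CHECKS`, and the report filename are hypothetical.

```python
import json
import subprocess
import sys
import time
from pathlib import Path

# Hypothetical check list mirroring the quick suite; extend it as tests are added
QUICK_CHECKS = [
    ("syntax", [sys.executable, "-m", "py_compile", "app.py"]),
    ("rag", [sys.executable, "test_vector_db.py"]),
]


def run_checks(checks, report_path="test_outputs/report.json"):
    """Run (name, argv) checks, write a JSON report, and return True if all passed."""
    results = []
    for name, argv in checks:
        start = time.perf_counter()
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append({
            "name": name,
            "passed": proc.returncode == 0,
            "returncode": proc.returncode,
            "seconds": round(time.perf_counter() - start, 3),
            "stderr_tail": proc.stderr[-2000:],  # keep the report small
        })
    report = Path(report_path)
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return all(r["passed"] for r in results)
```

From the repository root, `sys.exit(0 if run_checks(QUICK_CHECKS) else 1)` writes `test_outputs/report.json` and gives CI a meaningful exit status.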

## Regression Test Checklist

After each commit, verify:

- [ ] All existing functionality still works
- [ ] New features don't break existing features
- [ ] Generated spaces deploy successfully to Hugging Face
- [ ] Documentation is updated appropriately
- [ ] Dependencies are correctly specified
- [ ] Security patterns are maintained

## Performance Benchmarks

### Metrics to Track
- Application startup time
- Space generation time
- Document processing time (for various file sizes)
- Memory usage during RAG operations
- API response times

### Benchmark Commands
```bash
# Startup time
time python -c "import app; print('App loaded')"

# Space generation time
time python -c "
import app
app.generate_zip('Benchmark', 'Test', 'Role', 'Audience', 'Tasks', '', [], '', '', 'gpt-3.5-turbo', 0.7, 4000, [], False, False, None)
"

# RAG processing time
time python -c "from test_vector_db import test_rag_tool; test_rag_tool()"
```
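`time` measures a single run, including interpreter startup. A small in-process harness gives more repeatable numbers for the metrics listed above. The sketch below times several runs with `time.perf_counter`, then records peak memory in one separate run with `tracemalloc`. Tracing is kept out of the timed runs because it slows execution. `tracemalloc` only sees allocations made through Python's allocator, so native memory (for example a FAISS index built in C++) may not be counted. The `benchmark` helper is hypothetical.

```python
import statistics
import time
import tracemalloc


def benchmark(fn, *args, repeat=5, **kwargs):
    """Time `fn` over `repeat` runs, then measure peak traced memory in one extra run."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)

    # Separate traced run so tracemalloc overhead does not inflate the timings
    tracemalloc.start()
    try:
        fn(*args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return {
        "runs": repeat,
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "peak_mib": peak_bytes / (1024 * 1024),
    }
```

For the RAG metric, `from test_vector_db import test_rag_tool` followed by `benchmark(test_rag_tool, repeat=3)` reports timings and Python-side peak memory for the existing pipeline test.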

## Test Data Management

### Sample Test Files
- `test_document.txt` - Basic text document
- `sample.pdf` - PDF document for upload testing
- `sample.docx` - Word document for testing
- `sample.md` - Markdown document for testing
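The setup step creates only `test_document.txt`, so the remaining samples have to be produced separately. The sketch below generates them:

- The `.pdf` and `.docx` writers use PyMuPDF (`fitz`) and `python-docx` (`docx`), the same optional RAG dependencies probed during environment verification.
- Either writer is skipped with a warning when its library is not installed.
- `make_sample_files` and the sample Markdown content are illustrative.

```python
from pathlib import Path

# Same sentence the setup step writes to test_document.txt
SAMPLE_TEXT = "This is a test document for RAG functionality testing."


def make_sample_files(directory="."):
    """Create the sample upload files listed above; return the paths actually written."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    (out / "test_document.txt").write_text(SAMPLE_TEXT + "\n", encoding="utf-8")
    (out / "sample.md").write_text(f"# Sample\n\n{SAMPLE_TEXT}\n", encoding="utf-8")
    written = [out / "test_document.txt", out / "sample.md"]

    try:
        import docx  # python-docx (optional)
        document = docx.Document()
        document.add_paragraph(SAMPLE_TEXT)
        document.save(str(out / "sample.docx"))
        written.append(out / "sample.docx")
    except ImportError:
        print("⚠️  python-docx not installed; skipping sample.docx")

    try:
        import fitz  # PyMuPDF (optional)
        pdf = fitz.open()  # new, empty PDF document
        page = pdf.new_page()
        page.insert_text((72, 72), SAMPLE_TEXT)  # one inch from the top-left corner
        pdf.save(str(out / "sample.pdf"))
        pdf.close()
        written.append(out / "sample.pdf")
    except ImportError:
        print("⚠️  PyMuPDF not installed; skipping sample.pdf")

    return written
```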

### Test Configuration Profiles
- Minimal configuration (basic chat only)
- Research assistant template
- Full-featured (RAG + URL grounding + access control)
- Edge case configurations

## Continuous Integration

### GitHub Actions Workflow
```yaml
# .github/workflows/test.yml
name: Test Chat UI Helper
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run test suite
        run: bash quick_test.sh
```

## Future Test Enhancements

### Planned Additions
- [ ] Automated UI testing with Selenium
- [ ] Load testing for generated spaces
- [ ] Cross-browser compatibility testing
- [ ] Mobile responsiveness testing
- [ ] Accessibility testing
- [ ] Multi-language content testing

### Test Coverage Goals
- [ ] 90%+ code coverage for core components
- [ ] All user workflows tested end-to-end
- [ ] Error conditions properly tested
- [ ] Performance regression detection

---

**Last Updated**: 2025-07-13
**Version**: 1.0
**Maintained by**: Development Team

This test procedure should be updated whenever new features are added or existing functionality is modified.