# Chat UI Helper - Comprehensive Test Procedure
This document outlines a systematic test procedure for validating the Chat UI Helper application after new commits. It is meant to confirm that all components function correctly and to be iterated on as the project evolves.
## Pre-Test Setup
### Environment Verification
```bash
# Verify Python environment
python --version # Should be 3.8+
# Install/update dependencies
pip install -r requirements.txt
# Verify optional dependencies status
python -c "
try:
    import sentence_transformers, faiss, fitz, docx
    print('✅ All RAG dependencies available')
except ImportError as e:
    print(f'⚠️ Optional RAG dependencies missing: {e}')
"
```
### Test Data Preparation
```bash
# Ensure test document exists
echo "This is a test document for RAG functionality testing." > test_document.txt
# Create test directory structure if needed
mkdir -p test_outputs
```
## Test Categories
### 1. Core Application Tests
#### 1.1 Application Startup
```bash
# Test basic application launch
python app.py &
APP_PID=$!
sleep 10
curl -f http://localhost:7860 > /dev/null && echo "✅ App started successfully" || echo "❌ App failed to start"
kill $APP_PID
```
#### 1.2 Gradio Interface Validation
- [ ] Application loads without errors
- [ ] Two tabs visible: "Spaces Configuration" and "Chat Support"
- [ ] All form fields render correctly
- [ ] Template selection works (Custom vs Research Assistant)
- [ ] File upload components appear when RAG is enabled
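
As a quick automated complement to the manual checks above, the sketch below queries the running app's Gradio config endpoint and looks for the two expected tab labels. It assumes the app is already serving on `http://localhost:7860` and that your Gradio version exposes its layout at `/config`; treat it as a starting point rather than a definitive check.
```python
# check_ui.py - minimal sketch; assumes the app is already running on
# http://localhost:7860 and that Gradio exposes its layout JSON at /config
# (true for recent Gradio releases, but verify against your version).
import json
import urllib.request

CONFIG_URL = "http://localhost:7860/config"

def check_tabs(expected=("Spaces Configuration", "Chat Support")):
    with urllib.request.urlopen(CONFIG_URL, timeout=10) as resp:
        config = json.load(resp)
    # Serialize the whole config and look for the expected tab labels;
    # crude, but avoids depending on the exact component tree shape.
    blob = json.dumps(config)
    missing = [label for label in expected if label not in blob]
    if missing:
        print(f"❌ Missing tabs: {missing}")
    else:
        print("✅ Both tabs present in the Gradio config")

if __name__ == "__main__":
    check_tabs()
```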
### 2. Vector RAG Component Tests
#### 2.1 Individual Component Testing
```bash
# Test document processing
python -c "from test_vector_db import test_document_processing; test_document_processing()"
# Test vector store functionality
python -c "from test_vector_db import test_vector_store; test_vector_store()"
# Test full RAG pipeline
python -c "from test_vector_db import test_rag_tool; test_rag_tool()"
```
#### 2.2 RAG Integration Tests
- [ ] Document upload accepts PDF, DOCX, TXT, MD files
- [ ] File size validation (10MB limit) works
- [ ] Documents are processed and chunked correctly
- [ ] Vector embeddings are generated
- [ ] Similarity search returns relevant results
- [ ] RAG data serializes/deserializes properly for templates
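
The sketch below illustrates the file-size and serialization checks from this list using only the standard library; the real validation lives in app.py, and the shape of the RAG data dictionary shown here is an assumption.
```python
# rag_checks.py - illustrative sketch only; the real validation lives in
# app.py, and the 10MB figure mirrors the limit stated in the checklist above.
import json
import os

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10MB limit from the checklist
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}

def validate_upload(path):
    """Reject files with unsupported extensions or above the size limit."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        print(f"❌ {path}: unsupported file type {ext}")
        return False
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        print(f"❌ {path}: exceeds 10MB limit")
        return False
    print(f"✅ {path}: accepted")
    return True

def roundtrip_rag_data(rag_data):
    """Check that RAG data survives JSON serialization for the templates."""
    restored = json.loads(json.dumps(rag_data))
    ok = restored == rag_data
    print("✅ RAG data round-trips" if ok else "❌ RAG data changed in round-trip")
    return ok

if __name__ == "__main__":
    validate_upload("test_document.txt")
    # The key names below are assumptions about the serialized RAG payload.
    roundtrip_rag_data({"chunks": ["This is a test chunk."], "embeddings": [[0.1, 0.2]]})
```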
### 3. Space Generation Tests
#### 3.1 Basic Space Creation
- [ ] Generate space with minimal configuration
- [ ] Verify all required files are created (app.py, requirements.txt, README.md, config.json)
- [ ] Check generated app.py syntax is valid
- [ ] Verify requirements.txt has correct dependencies
- [ ] Ensure README.md contains proper deployment instructions
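
A hedged sketch of automating these checks: generate a space, unpack the ZIP, confirm the required files exist, and parse the generated app.py without executing it. The `generate_zip` arguments mirror the quick test suite below and may need adjusting if the signature changes.
```python
# verify_space_zip.py - sketch that unpacks a generated space and checks the
# files listed above; assumes generate_zip returns the ZIP path as result[0].
import ast
import zipfile

import app

REQUIRED_FILES = {"app.py", "requirements.txt", "README.md", "config.json"}

def verify_generated_space():
    result = app.generate_zip('Test Space', 'Test Description', 'Test Role',
                              'Test Audience', 'Test Tasks', '', [], '', '',
                              'gpt-3.5-turbo', 0.7, 4000, [], False, False, None)
    zip_path = result[0]
    with zipfile.ZipFile(zip_path) as zf:
        names = {name.split("/")[-1] for name in zf.namelist()}
        missing = REQUIRED_FILES - names
        assert not missing, f"Missing files in ZIP: {missing}"
        # Syntax-check the generated app.py without executing it
        generated = next(n for n in zf.namelist() if n.endswith("app.py"))
        ast.parse(zf.read(generated).decode("utf-8"))
    print(f"✅ {zip_path} contains all required files and app.py parses")

if __name__ == "__main__":
    verify_generated_space()
```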
#### 3.2 Advanced Feature Testing
- [ ] Generate space with URL grounding enabled
- [ ] Generate space with vector RAG enabled
- [ ] Generate space with access code protection
- [ ] Test template substitution works correctly
- [ ] Verify environment variable security pattern
### 4. Web Scraping Tests
#### 4.1 Mock vs Production Mode
```bash
# Test in mock mode (lines 14-18 in app.py)
# Verify placeholder content is returned
# Test in production mode
# Verify actual web content is fetched via HTTP requests
```
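The sketch below approximates the production-mode fetch path with requests and BeautifulSoup (the stack that replaced Crawl4AI); the actual function in app.py may differ in headers, timeouts, and content cleanup, so use this only to sanity-check connectivity and extraction.
```python
# fetch_check.py - sketch of the production-mode fetch path; details such as
# the User-Agent string and timeout are illustrative choices, not app.py's.
import requests
from bs4 import BeautifulSoup

def fetch_page_text(url, timeout=15):
    """Fetch a URL and return its visible text, or an empty string on failure."""
    try:
        resp = requests.get(url, timeout=timeout,
                            headers={"User-Agent": "chatui-helper-test"})
        resp.raise_for_status()
    except requests.RequestException as exc:
        print(f"❌ Failed to fetch {url}: {exc}")
        return ""
    soup = BeautifulSoup(resp.text, "html.parser")
    # Drop script/style blocks before extracting text
    for tag in soup(["script", "style"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())

if __name__ == "__main__":
    text = fetch_page_text("https://example.com")
    print("✅ Fetched content" if text else "⚠️ No content returned")
```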
#### 4.2 URL Processing
- [ ] Valid URLs are processed correctly
- [ ] Invalid URLs are handled gracefully
- [ ] Content extraction works for different site types
- [ ] Rate limiting and error handling work
### 5. Security and Configuration Tests
#### 5.1 Environment Variable Handling
- [ ] API keys are not embedded in generated templates
- [ ] Access codes use environment variable pattern
- [ ] Sensitive data is properly excluded from version control
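
A possible way to automate the secrets check: scan every file in a generated space ZIP for obvious key patterns. The patterns and the `ACCESS_CODE` variable name below are illustrative assumptions, not the app's actual identifiers.
```python
# secret_scan.py - sketch that scans a generated space ZIP for obvious
# embedded secrets; the patterns below are examples, not an exhaustive list.
import re
import zipfile

# Example patterns: OpenAI/OpenRouter-style keys and a hard-coded access code
# (ACCESS_CODE is a hypothetical variable name used only for illustration).
SUSPICIOUS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),
    re.compile(r"ACCESS_CODE\s*=\s*['\"][^'\"]+['\"]"),
]

def scan_zip(zip_path):
    clean = True
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            text = zf.read(name).decode("utf-8", errors="ignore")
            for pattern in SUSPICIOUS:
                if pattern.search(text):
                    print(f"❌ Possible embedded secret in {name}: {pattern.pattern}")
                    clean = False
    if clean:
        print("✅ No obvious embedded secrets found")
    return clean
```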
#### 5.2 Input Validation
- [ ] File upload validation works
- [ ] URL validation prevents malicious inputs
- [ ] Content length limits are enforced
- [ ] XSS prevention in user inputs
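
The following sketch shows the kind of URL validation these items refer to, using only `urllib.parse`; app.py may enforce stricter rules.
```python
# url_validation.py - sketch of the URL checks in the checklist above.
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048  # illustrative limit

def is_safe_url(url):
    """Allow only http(s) URLs with a hostname and a sane length."""
    if len(url) > MAX_URL_LENGTH:
        return False
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # rejects javascript:, data:, file:, etc.
    return bool(parsed.netloc)

assert is_safe_url("https://example.com/docs")
assert not is_safe_url("javascript:alert(1)")
assert not is_safe_url("file:///etc/passwd")
```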
### 6. Chat Support Tests
#### 6.1 OpenRouter Integration
- [ ] Chat responds when API key is configured
- [ ] Proper error message when API key is missing
- [ ] Message history formatting works correctly
- [ ] URL grounding provides relevant context
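
A minimal smoke test for the OpenRouter integration might look like the sketch below. It assumes the key is read from an `OPENROUTER_API_KEY` environment variable and uses OpenRouter's OpenAI-compatible chat completions endpoint; confirm the exact variable name and model id against the app and the OpenRouter docs.
```python
# openrouter_smoke_test.py - one-shot request sketch; endpoint, model id, and
# the OPENROUTER_API_KEY variable name are assumptions to verify locally.
import os
import requests

def smoke_test(model="openai/gpt-3.5-turbo"):
    api_key = os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        print("⚠️ OPENROUTER_API_KEY not set - the app should show a clear error in this case")
        return
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model,
              "messages": [{"role": "user", "content": "Reply with the word: pong"}]},
        timeout=30,
    )
    resp.raise_for_status()
    reply = resp.json()["choices"][0]["message"]["content"]
    print(f"✅ OpenRouter responded: {reply[:80]}")

if __name__ == "__main__":
    smoke_test()
```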
#### 6.2 Gradio 5.x Compatibility
- [ ] Message format uses `type="messages"`
- [ ] ChatInterface renders correctly
- [ ] User/assistant message distinction works
- [ ] Chat history persists during session
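
For reference, this is the history shape `type="messages"` expects (OpenAI-style role/content dictionaries rather than the legacy list-of-pairs format); a small assertion helper like the one below can be dropped into tests.
```python
# messages_format.py - sketch of the history shape used with type="messages".
history = [
    {"role": "user", "content": "What does this space do?"},
    {"role": "assistant", "content": "It helps you generate chat UI spaces."},
]

def check_history(messages):
    for msg in messages:
        assert msg["role"] in ("user", "assistant", "system"), f"Bad role: {msg}"
        assert isinstance(msg["content"], str), f"Non-string content: {msg}"
    print("✅ History matches the type='messages' format")

check_history(history)
```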
## Automated Test Execution
### Quick Test Suite
```bash
#!/bin/bash
# quick_test.sh - Run essential tests
echo "πŸ” Running Quick Test Suite..."
# 1. Syntax check
python -m py_compile app.py && echo "✅ app.py syntax valid" || echo "❌ app.py syntax error"
# 2. Import test
python -c "import app; print('βœ… App imports successfully')" 2>/dev/null || echo "❌ Import failed"
# 3. RAG component test (if available)
if python -c "from rag_tool import RAGTool" 2>/dev/null; then
    python test_vector_db.py && echo "✅ RAG tests passed" || echo "❌ RAG tests failed"
else
    echo "⚠️ RAG components not available"
fi
# 4. Template generation test
python -c "
import app
result = app.generate_zip('Test Space', 'Test Description', 'Test Role', 'Test Audience', 'Test Tasks', '', [], '', '', 'gpt-3.5-turbo', 0.7, 4000, [], False, False, None)
assert result[0].endswith('.zip'), 'ZIP generation failed'
print('✅ Space generation works')
"
echo "πŸŽ‰ Quick test suite completed!"
```
### Full Test Suite
```bash
#!/bin/bash
# full_test.sh - Comprehensive testing
echo "πŸ” Running Full Test Suite..."
# Run all component tests
./quick_test.sh
# Additional integration tests
echo "πŸ§ͺ Running integration tests..."
# Test with different configurations
# Test error handling
# Test edge cases
# Performance tests
echo "πŸ“Š Generating test report..."
# Generate detailed test report
```
## Regression Test Checklist
After each commit, verify:
- [ ] All existing functionality still works
- [ ] New features don't break existing features
- [ ] Generated spaces deploy successfully to HuggingFace (see the deployment sketch after this list)
- [ ] Documentation is updated appropriately
- [ ] Dependencies are correctly specified
- [ ] Security patterns are maintained
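
One way to exercise the HuggingFace deployment item is a push via `huggingface_hub`, sketched below; the repo id and local folder path are placeholders, and an `HF_TOKEN` with write access must be available in the environment.
```python
# deploy_smoke_test.py - sketch of pushing an unpacked generated space to the
# Hub; repo_id and the folder path are placeholders, not project conventions.
import os
from huggingface_hub import HfApi

def deploy_space(folder, repo_id="your-username/chatui-smoke-test"):
    api = HfApi(token=os.environ.get("HF_TOKEN"))
    api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="space")
    print(f"✅ Uploaded {folder} to https://huggingface.co/spaces/{repo_id}")

if __name__ == "__main__":
    deploy_space("test_outputs/generated_space")  # placeholder path
```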
## Performance Benchmarks
### Metrics to Track
- Application startup time
- Space generation time
- Document processing time (for various file sizes)
- Memory usage during RAG operations
- API response times
### Benchmark Commands
```bash
# Startup time
time python -c "import app; print('App loaded')"
# Space generation time
time python -c "
import app
app.generate_zip('Benchmark', 'Test', 'Role', 'Audience', 'Tasks', '', [], '', '', 'gpt-3.5-turbo', 0.7, 4000, [], False, False, None)
"
# RAG processing time
time python -c "from test_vector_db import test_rag_tool; test_rag_tool()"
```
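For memory tracking, a rough standard-library sketch is shown below; `resource.ru_maxrss` is reported in kilobytes on Linux and bytes on macOS, and the `resource` module is Unix-only.
```python
# memory_benchmark.py - rough sketch of peak-memory and wall-clock tracking
# around the RAG test; interpret ru_maxrss units per your platform.
import resource
import time

from test_vector_db import test_rag_tool

start = time.perf_counter()
test_rag_tool()
elapsed = time.perf_counter() - start

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"RAG pipeline: {elapsed:.2f}s, peak RSS ~{peak} (KB on Linux, bytes on macOS)")
```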
## Test Data Management
### Sample Test Files
- `test_document.txt` - Basic text document
- `sample.pdf` - PDF document for upload testing
- `sample.docx` - Word document for testing
- `sample.md` - Markdown document for testing
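
If the PDF, DOCX, and Markdown samples are missing, the sketch below can generate simple versions of them; the PDF and DOCX steps rely on the optional RAG dependencies (PyMuPDF imported as `fitz`, and python-docx) checked in the environment setup.
```python
# make_test_files.py - sketch that creates the sample documents listed above.
import fitz  # PyMuPDF
from docx import Document

# Plain-text and Markdown samples
with open("test_document.txt", "w") as f:
    f.write("This is a test document for RAG functionality testing.\n")
with open("sample.md", "w") as f:
    f.write("# Sample\n\nMarkdown content for upload testing.\n")

# Minimal one-page PDF
pdf = fitz.open()
page = pdf.new_page()
page.insert_text((72, 72), "Sample PDF content for upload testing.")
pdf.save("sample.pdf")

# Minimal DOCX
doc = Document()
doc.add_paragraph("Sample DOCX content for upload testing.")
doc.save("sample.docx")

print("✅ Sample test files created")
```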
### Test Configuration Profiles
- Minimal configuration (basic chat only)
- Research assistant template
- Full-featured (RAG + URL grounding + access control)
- Edge case configurations
## Continuous Integration
### GitHub Actions Integration
```yaml
# .github/workflows/test.yml
name: Test Chat UI Helper
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run test suite
        run: ./quick_test.sh
```
## Future Test Enhancements
### Planned Additions
- [ ] Automated UI testing with Selenium
- [ ] Load testing for generated spaces
- [ ] Cross-browser compatibility testing
- [ ] Mobile responsiveness testing
- [ ] Accessibility testing
- [ ] Multi-language content testing
### Test Coverage Goals
- [ ] 90%+ code coverage for core components
- [ ] All user workflows tested end-to-end
- [ ] Error conditions properly tested
- [ ] Performance regression detection
---
**Last Updated**: 2025-07-13
**Version**: 1.0
**Maintained by**: Development Team
This test procedure should be updated whenever new features are added or existing functionality is modified.