File size: 1,042 Bytes
7d4b126
 
034ac91
 
 
 
7d4b126
 
 
 
 
034ac91
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
Python version: 3.10

1. Setup environment
```
export MODEL_ID=openai/o3-mini
export API_KEY=<your_api_key>

python3.10 -m venv venv
source venv/bin/activate
pip3.10 install -r baseline/requirements.txt

```

2. Launch trace collector
```
python -m phoenix.server.main serve &
```

3. Launch baseline
```
# Runs 10 tasks from dev split
python baseline/run.py --model-id $MODEL_ID --api-key $API_KEY --max-tasks 10 --split dev --concurrency 1

# Run all tasks from the dev split
python baseline/run.py --model-id $MODEL_ID --api-key $API_KEY --max-tasks -1 --split dev --concurrency 1

# Run against 10 tasks from the full benchmark (default) split
python baseline/run.py --model-id $MODEL_ID --api-key $API_KEY --max-tasks 10 --split default --concurrency 1

# Run 10 tasks in parallel
python baseline/run.py --model-id $MODEL_ID --api-key $API_KEY --max-tasks 10 --split default --concurrency 10

# Run specific task ids
python baseline/run.py --model-id $MODEL_ID --api-key $API_KEY --tasks-ids 49 5 1273 --split default --concurrency 3
```