# Competition Repo

NOTE: The competition repo must always be kept private. Do NOT make it public!

The competition repo consists of the following files:

```
├── COMPETITION_DESC.md
├── conf.json
├── DATASET_DESC.md
├── solution.csv
├── SUBMISSION_DESC.md
├── submission_info
│   └── *.json
├── submissions
│   └── *.csv
├── teams.json
└── user_team.json
```

### COMPETITION_DESC.md

This file contains the description of the competition. It is a markdown file.
You can use markdown syntax to format the text and modify the file according to your needs.
The competition description is shown on the front page of the competition.

### DATASET_DESC.md

This file contains the description of the dataset. It is also a markdown file.
It is shown on the dataset page.
In this file you can describe which columns are present in the dataset, what each column means, what format the dataset is in, etc.

### conf.json

conf.json is the configuration file for the competition. An example conf.json is shown below:

```
{
   "COMPETITION_TYPE":"generic",
   "SUBMISSION_LIMIT":5,
   "TIME_LIMIT": 10,
   "HARDWARE":"cpu-basic",
   "SELECTION_LIMIT":10,
   "END_DATE":"2024-05-25",
   "EVAL_HIGHER_IS_BETTER":1,
   "SUBMISSION_ID_COLUMN":"id",
   "SUBMISSION_COLUMNS":"id,pred",
   "SUBMISSION_ROWS":10000,
   "EVAL_METRIC":"roc_auc_score",
   "LOGO":"https://github.com/abhishekkrthakur/public_images/blob/main/song.png?raw=true",
   "DATASET": "",
   "SUBMISSION_FILENAMES": ["submission.csv"],
   "SCORING_METRIC": "roc_auc_score"
}
```

This file is created when you create a new competition. You can modify this file according to your needs.
However, we do not recommend changing the evaluation metric field once the competition has started
as it would require you to re-evaluate all the submissions.

- COMPETITION_TYPE: This field is used to specify the type of competition. Currently, we support two types of competitions: `generic` and `script`.
    - A `generic` competition is one where users submit a csv file (or a different format) and the submissions are evaluated using a metric.
    - A `script` competition is one where users submit a Hugging Face model repo containing a script.py. The script.py is run to generate submission.csv, which is then evaluated using a metric.
- SUBMISSION_LIMIT: This field is used to specify the number of submissions a user can make in a day.
- TIME_LIMIT: This field is used to specify the time limit for each submission in seconds. (used only for `script` competitions)
- HARDWARE: This field is used to specify the hardware on which the submissions will be evaluated.
- SELECTION_LIMIT: This field is used to specify the number of submissions that will be selected for the leaderboard. (used only for `script` competitions)
- END_DATE: This field is used to specify the end date of the competition. The competition will be automatically closed on the end date. Private leaderboard will be made available on the end date.
- EVAL_HIGHER_IS_BETTER: This field is used to specify whether a higher or a lower value of the evaluation metric is better. If the value is 1, higher is better. If the value is 0, lower is better.
- SUBMISSION_ID_COLUMN: This field is used to specify the name of the id column in the submission file.
- SUBMISSION_COLUMNS: This field is used to specify the names of the columns in the submission file. The names must be comma separated without any spaces.
- SUBMISSION_ROWS: This field is used to specify the number of rows in the submission file without the header.
- EVAL_METRIC: This field is used to specify the evaluation metric. We support all the scikit-learn metrics and even custom metrics.
- LOGO: This field is used to specify the logo of the competition. The logo must be a png file. The logo is shown on all pages of the competition.
- DATASET: This field is used to specify the PRIVATE dataset used in the competition. The dataset is available to the users only during the script run. This is only used for script competitions.
- SUBMISSION_FILENAMES: This field is used to specify the name of the submission file. This is only used for script competitions with custom metrics and must not be changed for generic competitions.
- SCORING_METRIC: When using a custom metric / multiple metrics, this field is used to specify the metric name that will be used for scoring the submissions.
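
The field descriptions above can be turned into a quick sanity check before you commit a modified conf.json. The following is a minimal sketch, not part of the competition tooling: `check_conf` and `EXPECTED_KEYS` are hypothetical names, and treating every key from the example as required is an assumption.

```python
import json

# Keys taken from the example conf.json above. Treating all of them as
# required is an assumption made for this sketch.
EXPECTED_KEYS = {
    "COMPETITION_TYPE", "SUBMISSION_LIMIT", "TIME_LIMIT", "HARDWARE",
    "SELECTION_LIMIT", "END_DATE", "EVAL_HIGHER_IS_BETTER",
    "SUBMISSION_ID_COLUMN", "SUBMISSION_COLUMNS", "SUBMISSION_ROWS",
    "EVAL_METRIC", "LOGO", "DATASET", "SUBMISSION_FILENAMES", "SCORING_METRIC",
}


def check_conf(conf):
    """Return a list of human-readable problems found in a conf dict."""
    problems = []
    missing = sorted(EXPECTED_KEYS - set(conf))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if conf.get("COMPETITION_TYPE") not in ("generic", "script"):
        problems.append("COMPETITION_TYPE must be 'generic' or 'script'")
    if conf.get("EVAL_HIGHER_IS_BETTER") not in (0, 1):
        problems.append("EVAL_HIGHER_IS_BETTER must be 0 or 1")
    columns = str(conf.get("SUBMISSION_COLUMNS", ""))
    if " " in columns:
        problems.append("SUBMISSION_COLUMNS must not contain spaces")
    if conf.get("SUBMISSION_ID_COLUMN") not in columns.split(","):
        problems.append("SUBMISSION_ID_COLUMN must be one of SUBMISSION_COLUMNS")
    return problems


# The example configuration from this document, parsed from a string here
# so the sketch runs without a file on disk.
example = json.loads("""
{
   "COMPETITION_TYPE": "generic",
   "SUBMISSION_LIMIT": 5,
   "TIME_LIMIT": 10,
   "HARDWARE": "cpu-basic",
   "SELECTION_LIMIT": 10,
   "END_DATE": "2024-05-25",
   "EVAL_HIGHER_IS_BETTER": 1,
   "SUBMISSION_ID_COLUMN": "id",
   "SUBMISSION_COLUMNS": "id,pred",
   "SUBMISSION_ROWS": 10000,
   "EVAL_METRIC": "roc_auc_score",
   "LOGO": "https://github.com/abhishekkrthakur/public_images/blob/main/song.png?raw=true",
   "DATASET": "",
   "SUBMISSION_FILENAMES": ["submission.csv"],
   "SCORING_METRIC": "roc_auc_score"
}
""")

for problem in check_conf(example):
    print(problem)
```

To check a real file, replace the inline string with `json.load(open("conf.json"))`.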

### solution.csv

This file contains the solution for the competition. It is a csv file. A sample is shown below:

```
id,pred,split
0,1,public
1,0,private
2,0,private
3,1,private
4,0,public
5,1,private
6,1,public
7,1,private
8,0,public
9,0,private
10,0,private
11,0,private
12,1,private
13,0,private
14,1,public
```

The solution file is used to evaluate the submissions. The solution file must always have an id column and a split column.
The split column is used to split the solution into public and private parts. The split column can have two values: `public` and `private`.
You can have multiple columns in the solution file. However, the evaluation metric must support multiple columns.

For example, if the evaluation metric is `roc_auc_score`, then the solution file must have two columns besides `split`: `id` and `pred`.
The id and pred columns can have any names. The names are read from the `conf.json` file.
Please make sure you have appropriate column names in the `conf.json` file and that you have both public and private splits in the solution file.
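
The requirements above (an id column, a split column, and both split values present) can be checked with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: `split_solution` is a hypothetical helper, not a function of the competition tooling, and the id column name would normally come from `SUBMISSION_ID_COLUMN` in `conf.json`.

```python
import csv
import io


def split_solution(csv_text, id_column="id"):
    """Split solution rows into public and private lists, validating as we go."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    for required in (id_column, "split"):
        if required not in fieldnames:
            raise ValueError(f"solution is missing the '{required}' column")
    parts = {"public": [], "private": []}
    for row in reader:
        split = row.pop("split")
        if split not in parts:
            raise ValueError(f"unexpected split value: {split!r}")
        parts[split].append(row)
    if not parts["public"] or not parts["private"]:
        raise ValueError("solution needs both public and private rows")
    return parts["public"], parts["private"]


# First rows of the sample solution shown above.
sample = """id,pred,split
0,1,public
1,0,private
2,0,private
3,1,private
4,0,public
"""

public_rows, private_rows = split_solution(sample)
print(len(public_rows), "public rows,", len(private_rows), "private rows")
```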

### SUBMISSION_DESC.md

This file contains the description of the submission. It is a markdown file.
You can use the markdown syntax to format the text and modify the file according to your needs.
Submission description is shown on the submission page.

Here you can mention the format of the submission file, what columns are required in the submission file, etc.

For the example solution file shown above, the submission file must have two columns: `id` and `pred`.
An example of sample_submission.csv is shown below:

```
id,pred
0,0.6
1,0.1
2,0.5
3,1.6
4,0.8
5,1
6,1
7,1
8,0
9,0
10,0.1
11,0.4
12,1.9
13,0.01
14,1.1
```

When a user submits a submission file, the system will check if the submission file has the required columns.
If the submission file does not have the required columns, the submission will be rejected.
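
Organizers may want to run a similar check on their own sample submission before publishing it. The following is a minimal sketch of such a check, not the system's actual implementation: `check_submission` is a hypothetical name, and comparing the row count against `SUBMISSION_ROWS` is an assumption about how that field is used.

```python
import csv
import io


def check_submission(csv_text, conf):
    """Return None if the submission looks valid, else a rejection reason."""
    expected_columns = conf["SUBMISSION_COLUMNS"].split(",")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return "submission is empty"
    missing = [c for c in expected_columns if c not in header]
    if missing:
        return "missing columns: " + ", ".join(missing)
    # Count data rows, skipping blank lines. Enforcing SUBMISSION_ROWS here
    # is an assumption of this sketch.
    n_rows = sum(1 for row in reader if row)
    if n_rows != conf["SUBMISSION_ROWS"]:
        return f"expected {conf['SUBMISSION_ROWS']} rows, found {n_rows}"
    return None


# A three-row toy configuration, hypothetical values for illustration only.
toy_conf = {"SUBMISSION_COLUMNS": "id,pred", "SUBMISSION_ROWS": 3}
good = "id,pred\n0,0.6\n1,0.1\n2,0.5\n"
print(check_submission(good, toy_conf))
print(check_submission("id,score\n0,0.6\n", toy_conf))
```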

It is the responsibility of the organizer to provide a sample submission file in the correct format and a submission description file.

### submission_info

This folder contains the submission info files. Each submission info file contains the information about a submission.
This folder is created when the first submission is made. The submission info files are json files.

### submissions

This folder contains the submissions made by the users. Each submission is a csv file (or a different format).
This folder is created when the first submission is made.

### other files

The other files, teams.json and user_team.json, are used to store information about the teams.