loodvanniekerkginkgo committed
Commit 8dcd98f · 1 Parent(s): 471531b

Text changes

Files changed (3)
  1. about.py +32 -18
  2. app.py +1 -1
  3. assets/prediction_explainer.png +2 -2
about.py CHANGED
@@ -1,48 +1,62 @@
- ABOUT_INTRO = """
  ## About this challenge

- We're inviting the ML/bio community to predict developability properties for 244 antibodies from the [GDPa1 dataset](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1).

- **What is antibody developability and why is it important?**

  Antibodies have to be manufacturable, stable in high concentrations, and have low off-target effects.
  Properties such as these can often hinder the progression of an antibody to the clinic, and are collectively referred to as 'developability'.
  Here we show 5 of these properties and invite the community to submit and develop better predictors, which will be tested out on a heldout private set to assess model generalization.
  """

  ABOUT_TEXT = """

- **How to participate?**
- You must submit predictions for a validation set before submitting predictions for the private test set.

  There are two options for validation sets:
- - Track 1: If you already have a developability model, you can submit your predictions for the GDPa1 dataset.
  Note that for models trained on the Jain dataset (a subset of the GDPa1 dataset), this evaluation will be overoptimistic and the private test set results will likely be lower.
- - Track 2: If you don't have a model, train one using cross-validaiton on the GDPa1 dataset and submit your predictions under the "Cross-validation" option.
- If trained only on this data, this will provide you with a more accurate estimate of your model's performance on the private test set.

  Finally, submit your predictions on the heldout private test set. This will not appear on the leaderboard, and will be used to determine the winners at the close of the competition.
  We may release private test set results at intermediate points during the competition.

- **How to submit?**

- 1. Create a Hugging Face account if you don't have one yet (this is used to track unique submissions).
- 2. Download the [GDPa1 dataset](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1)
- 3. Make predictions for all the antibody sequences for your property of interest.
- 4. Submit a CSV file containing the `"antibody_name"` column and a column from GDPa1 matching the property name you are predicting (e.g. `"antibody_name,Titer"` if you are predicting Titer).
- There is an example submission file on the "✉️ Submit" tab.

  For the cross-validation metrics (if training only on the GDPa1 dataset), use the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column to split the dataset into folds and make predictions for each of the folds.
  Submit a CSV file in the same format but also containing the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column.
  There is also an example cross-validation submission file on the "✉️ Submit" tab, and we will be releasing a full cross-validation code tutorial shortly.

- **How to evaluate?**

- You can calculate the Spearman correlation coefficient on the GDPa1 dataset yourself before uploading to the leaderboard.
  Simply use the `spearmanr(predictions, targets, nan_policy='omit')` function from `scipy.stats`.
  For the heldout private set, we will calculate these results privately at the end of the competition (and possibly at other points throughout the competition) - but there will not be "rolling results" on the private test set.

- **How to contribute?**

  We'd like to add some more existing models to the leaderboard. Some examples of models we'd like to add:
  - ESM embeddings + ridge regression
@@ -68,7 +82,7 @@ FAQS = {
  "No. This is just a predictive competition, which will be judged according to the correlation between predictions and experimental values. There may be a generative round in the future."
  ),
  "Can I participate anonymously?": (
- "Yes! Please create an anonymous Hugging Face account so that we can uniquely associate submissions. Note that top participants will be contacted to identify themselves at the end of the tournament."
  ),
  "How is intellectual property handled?": (
  "Participants retain IP rights to the methods they use and develop during the tournament. Read more details in our terms here [link]."
 
+ ABOUT_INTRO = f"""
  ## About this challenge

+ ### Register [here](https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition)!

+ We're inviting the ML/bio community to predict developability properties based on 5 of the assays in the [GDPa1 dataset](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1) and assessed on a private heldout test set.
+
+ #### What is antibody developability and why is it important?

  Antibodies have to be manufacturable, stable in high concentrations, and have low off-target effects.
  Properties such as these can often hinder the progression of an antibody to the clinic, and are collectively referred to as 'developability'.
  Here we show 5 of these properties and invite the community to submit and develop better predictors, which will be tested out on a heldout private set to assess model generalization.
+
+ #### Prizes
+
+ There are up to **$60k in prizes** up for grabs! For each of the 5 properties in the competition, there is a prize for the model with the highest performance for that property on the private test set.
+ There is also an 'open-source' prize for the best model trained on the GDPa1 dataset (reporting cross-validation results) and assessed on the private test set where authors provide all training code and data.
+ For each of these 6 prizes, participants have the choice between **$10k in data generation credits** with [Ginkgo Datapoints](https://datapoints.ginkgo.bio/) or a **cash prize** with a value of $2000.
+ See the FAQ below for more details.
  """
+ # TODO include link to competition terms on datapoints website

  ABOUT_TEXT = """

+ #### How to participate?
+
+ You must submit predictions for a validation set before submitting predictions for the private test set. You could view the public validation set as the "qualifying exam" and the private test set as the "final exam".

  There are two options for validation sets:
+ - **Track 1**: If you already have a developability model, you can submit your predictions for the GDPa1 public dataset.
  Note that for models trained on the Jain dataset (a subset of the GDPa1 dataset), this evaluation will be overoptimistic and the private test set results will likely be lower.
+ - **Track 2**: If you don't have a model, train one using cross-validation on the GDPa1 dataset and submit your predictions under the "Cross-validation" option.
+ If trained only on this data, this will also provide you with a more realistic estimate of your model's performance on the private test set.

  Finally, submit your predictions on the heldout private test set. This will not appear on the leaderboard, and will be used to determine the winners at the close of the competition.
  We may release private test set results at intermediate points during the competition.

+ #### How to submit?
+
+ 1. Create a Hugging Face account if you don't have one yet (this is used to track unique submissions and to access the GDPa1 dataset).
+ 2. Register your team on the [Competition Registration](https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition) page.
+ 3. Build a model or validate your model using the [GDPa1 dataset](https://huggingface.co/datasets/ginkgo-datapoints/GDPa1).
+ 4. Make predictions for all the antibody sequences for your properties of interest.
+ 5. Submit a CSV file containing the `"antibody_name"` column and a column from GDPa1 matching the property names you are predicting (e.g. `"antibody_name,Titer,PR_CHO"` if your model predicts Titer and Polyreactivity).
+ If you click the "Anonymous" checkbox, your predictions will not be displayed alongside your Hugging Face username but with a random ID.

+ There is an example submission file on the "✉️ Submit" tab. When you are ready to submit your predictions for the private test set, download the test set sequences from the "✉️ Submit" tab and follow the same process.
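Editor's note: the submission format described in steps 4-5 can be assembled in a few lines of pandas. The sketch below is illustrative only (not part of this commit); the antibody names and prediction values are placeholders.

```python
import pandas as pd

# Placeholder predictions; replace with your model's outputs for every
# antibody_name in the GDPa1 sequence file. Column names must match the
# GDPa1 property names (e.g. "Titer", "PR_CHO").
predictions = {
    "antibody_name": ["antibody_001", "antibody_002"],  # placeholder IDs
    "Titer": [1350.0, 910.5],                           # placeholder Titer predictions
    "PR_CHO": [0.12, 0.47],                             # placeholder polyreactivity predictions
}

pd.DataFrame(predictions).to_csv("submission.csv", index=False)
```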
 
 
 
 
  For the cross-validation metrics (if training only on the GDPa1 dataset), use the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column to split the dataset into folds and make predictions for each of the folds.
  Submit a CSV file in the same format but also containing the `"hierarchical_cluster_IgG_isotype_stratified_fold"` column.
  There is also an example cross-validation submission file on the "✉️ Submit" tab, and we will be releasing a full cross-validation code tutorial shortly.
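Editor's note: a rough sketch of one way to produce such fold-wise predictions, not part of this commit. It assumes `df` is the GDPa1 table loaded as a DataFrame and `fit` / `predict` are stand-ins for your own training and inference code; `"Titer"` stands in for whichever property you model.

```python
import pandas as pd

FOLD_COL = "hierarchical_cluster_IgG_isotype_stratified_fold"

def cross_validated_predictions(df: pd.DataFrame, fit, predict, target: str = "Titer") -> pd.DataFrame:
    """Train on all-but-one fold, predict the held-out fold, repeat for every fold."""
    out = []
    for fold in sorted(df[FOLD_COL].unique()):
        train = df[df[FOLD_COL] != fold]
        test = df[df[FOLD_COL] == fold]
        model = fit(train)             # your training routine (placeholder)
        preds = predict(model, test)   # your inference routine (placeholder)
        out.append(test[["antibody_name", FOLD_COL]].assign(**{target: preds}))
    return pd.concat(out)

# cross_validated_predictions(df, fit, predict).to_csv("cv_submission.csv", index=False)
```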

+ #### How to evaluate?

+ You can easily calculate the Spearman correlation coefficient on the GDPa1 dataset yourself before uploading to the leaderboard.
  Simply use the `spearmanr(predictions, targets, nan_policy='omit')` function from `scipy.stats`.
  For the heldout private set, we will calculate these results privately at the end of the competition (and possibly at other points throughout the competition) - but there will not be "rolling results" on the private test set.
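Editor's note: a minimal self-check along those lines (illustrative only, not part of this commit; the arrays are placeholders, and missing assay values are ignored via `nan_policy='omit'`).

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder values; use your model's predictions and the matching GDPa1 assay column.
predictions = np.array([0.12, 0.80, 0.33, 0.55])
targets = np.array([1100.0, 1450.0, np.nan, 1300.0])  # missing measurements stay as NaN

rho, pvalue = spearmanr(predictions, targets, nan_policy='omit')
print(f"Spearman correlation: {rho:.3f}")
```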

+ #### How to contribute?

  We'd like to add some more existing models to the leaderboard. Some examples of models we'd like to add:
  - ESM embeddings + ridge regression

  "No. This is just a predictive competition, which will be judged according to the correlation between predictions and experimental values. There may be a generative round in the future."
  ),
  "Can I participate anonymously?": (
+ "Yes! Please still create an anonymous Hugging Face account so that we can uniquely associate submissions. Note that top participants will be contacted to identify themselves at the end of the tournament."
  ),
  "How is intellectual property handled?": (
  "Participants retain IP rights to the methods they use and develop during the tournament. Read more details in our terms here [link]."
app.py CHANGED
@@ -229,7 +229,7 @@ with gr.Blocks() as demo:
  gr.Markdown(
  """
  <div style="text-align: center; font-size: 14px; color: gray; margin-top: 2em;">
- 📬 For questions or feedback, contact <a href="mailto:[email protected]">[email protected]</a> or visit the Community tab at the top of this page.
  Visit the <a href="https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition">Competition Registration page</a> to sign up for updates and to register a team.
  </div>
  """,

  gr.Markdown(
  """
  <div style="text-align: center; font-size: 14px; color: gray; margin-top: 2em;">
+ 📬 For questions or feedback, contact <a href="mailto:[email protected]">[email protected]</a> or visit the Community tab at the top of this page.<br>
  Visit the <a href="https://datapoints.ginkgo.bio/ai-competitions/2025-abdev-competition">Competition Registration page</a> to sign up for updates and to register a team.
  </div>
  """,
assets/prediction_explainer.png CHANGED

Git LFS Details

  • SHA256: 962f94e31e9a1f34909bf88849b55fa4448f6d461c2148d66f87d0327b56f318
  • Pointer size: 131 Bytes
  • Size of remote file: 135 kB

Git LFS Details

  • SHA256: d9ad3ddc3e4da7261b6b1383315023753fcc3de5ec25d681bbfd0bef14d5ad96
  • Pointer size: 131 Bytes
  • Size of remote file: 154 kB