biology
Committed by MiguelBraganca and maximeseince
Commit fe0cd9a (verified) · 1 parent: 55f4680

Add Readme and visualization (#1)

- Add Readme and visualization (07adacdd4686cb3f4a266262fdaa66dc90e6a774)


Co-authored-by: Maxime Seince <[email protected]>

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +131 -0
  3. abbfn2_overview.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ abbfn2_overview.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,131 @@
+ # AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks
+
+ Welcome to the inference code for AbBFN2, a state-of-the-art model for antibody sequence generation.
+
+ ## Overview
+
+ AbBFN2 is a generative antibody foundation model trained on a rich dataset of paired antibody sequences alongside their genetic and biophysical metadata. This allows unified modelling of diverse data sources and flexible conditional generation at inference time.
+ AbBFN2 leverages the Bayesian Flow Network (BFN) framework, which models distributions over data rather than the data itself, making it suitable for both discrete (sequences, gene labels) and continuous (biophysical properties) data.
+
+ At inference, AbBFN2 can generate all 45 data modes concurrently. By conditioning on arbitrary combinations of information, the model can handle a variety of tasks without task-specific training.
+
+ ![AbBFN2 Overview](abbfn2_overview.png)
+
+ ## Prerequisites
+ - The repository cloned on your machine (insert link of the official repo)
+ - Docker installed on your system
+ - Sufficient computational resources (TPU/GPU recommended)
+
+ ## Installation
+
+ ### Hardware Configuration
+ First, configure your accelerator in the Makefile:
+ ```bash
+ ACCELERATOR = GPU # Options: CPU, TPU, or GPU
+ ```
+
+ Note: Multi-host inference is not supported in this release. Please use single-host settings only.
+
+ ### Building the Docker Image - Windows/Linux
+ Run the following command to build the AbBFN2 Docker image:
+ ```bash
+ make build
+ ```
+ This process typically takes 5-20 minutes depending on your hardware.
+
+ ### Building the Virtual Environment - macOS
+ Run the following command to build the AbBFN2 virtual environment:
+ ```bash
+ conda env create -f environment.yaml
+ ```
+ Make sure you activate the environment before running the scripts.
+
+ ## Usage
+
+ AbBFN2 supports three main generation modes, each with its own configuration file in the `experiments/configs/` directory.
+
+ ### 1. Unconditional Generation
+ Generate novel antibody sequences without any constraints.
+
+ Configuration (`unconditional.yaml`):
+ ```yaml
+ cfg:
+   sampling:
+     num_samples_per_batch: 10 # Number of sequences per batch
+     num_batches: 1 # Number of batches to generate
+     sample_fn:
+       num_steps: 300 # Number of sampling steps (recommended: 300-1000)
+ ```
+
+ Run:
+ ```bash
+ make unconditional
+ ```
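
As a rough illustration of how the two batch settings combine, the sketch below computes how many sequences one run would produce, assuming batches are generated independently so the total is `num_samples_per_batch * num_batches`. The `total_samples` helper, the plain-dict form of the configuration, and its nesting are assumptions made for this example and are not part of the AbBFN2 code base.

```python
# Hypothetical mirror of the relevant part of unconditional.yaml as a plain dict.
# The nesting is an assumption; check it against the actual configuration file.
config = {
    "cfg": {
        "sampling": {
            "num_samples_per_batch": 10,
            "num_batches": 1,
            "sample_fn": {"num_steps": 300},
        }
    }
}


def total_samples(cfg: dict) -> int:
    """Return how many sequences a run yields, assuming independent batches."""
    sampling = cfg["cfg"]["sampling"]
    return sampling["num_samples_per_batch"] * sampling["num_batches"]


print(total_samples(config))  # 10
```

Under this assumption, raising `num_batches` is the way to get more sequences when a larger `num_samples_per_batch` would not fit in accelerator memory.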
+
+ ### 2. Inpainting
+ Generate antibody sequences conditioned on specific complementarity-determining regions (CDRs).
+
+ Configuration (`inpaint.yaml`):
+ ```yaml
+ cfg:
+   input:
+     num_input_samples: 2 # Number of input samples
+     dm_overwrites: # Specify values of the data modes
+       h_cdr1_seq: GYTFTSHA
+       h_cdr2_seq: ISPYRGDT
+       h_cdr3_seq: ARDAGVPLDY
+   sampling:
+     inpaint_fn:
+       num_steps: 300 # Number of sampling steps (recommended: 300-1000)
+     mask_fn:
+       data_modes: # Specify which data modes to condition on
+         - "h_cdr1_seq"
+         - "h_cdr2_seq"
+         - "h_cdr3_seq"
+ ```
+
+ Run:
+ ```bash
+ make inpaint
+ ```
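
The inpainting configuration names each conditioned data mode twice: once under `dm_overwrites` with its value, and once under `data_modes`. A minimal sketch of a pre-flight check that the two stay in sync is shown below. The `check_inpaint_inputs` helper is hypothetical, and the rule that `*_seq` values use only the 20 standard amino acids is an assumption made for this example, not a documented AbBFN2 requirement.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_inpaint_inputs(dm_overwrites: dict, data_modes: list) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for mode in data_modes:
        if mode not in dm_overwrites:
            problems.append(f"{mode}: listed in data_modes but missing from dm_overwrites")
            continue
        # Only sequence-valued modes are checked for residue validity (assumed rule).
        if mode.endswith("_seq"):
            invalid = sorted(set(dm_overwrites[mode]) - STANDARD_AMINO_ACIDS)
            if invalid:
                problems.append(f"{mode}: non-standard residues {invalid}")
    return problems


overwrites = {
    "h_cdr1_seq": "GYTFTSHA",
    "h_cdr2_seq": "ISPYRGDT",
    "h_cdr3_seq": "ARDAGVPLDY",
}
modes = ["h_cdr1_seq", "h_cdr2_seq", "h_cdr3_seq"]
print(check_inpaint_inputs(overwrites, modes))  # []
```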
+
+ ### 3. Sequence Humanization
+ Convert non-human antibody sequences into humanized versions.
+
+ Configuration (`humanization.yaml`):
+ ```yaml
+ cfg:
+   input:
+     h_seq: "EVKLQQSGPGLVTPSQSLSITCTVSGFSLSDYGVHWVRQSPGQGLEWLGVIWAGGGTNYNSALMSRKSISKDNSKSQVFLKMNSLQADDTAVYYCARDKGYSYYYSMDYWGQGTSVTVSS" # Heavy-chain variable region
+     l_seq: "DIETLQSPASLAVSLGQRATISCRASESVEYYVTSLMQWYQQKPGQPPKLLIFAASNVESGVPARFSGSGSGTNFSLNIHPVDEDDVAMYFCQQSRKYVPYTFGGGTKLEIK" # Light-chain variable region
+   sampling:
+     recycling_steps: 10 # Number of recycling steps (recommended: 5-12)
+     inpaint_fn:
+       num_steps: 500 # Number of sampling steps (recommended: 300-1000)
+ ```
+
+ Run:
+ ```bash
+ make humanization
+ ```
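
The configuration comments recommend 5-12 recycling steps and 300-1000 sampling steps. The sketch below shows one way to flag values outside those ranges before launching a run. The `outside_recommended` helper and the `RECOMMENDED` table are hypothetical and not part of the AbBFN2 code base; only the numeric ranges come from the README.

```python
# Recommended ranges quoted in the comments of the configuration files.
RECOMMENDED = {
    "recycling_steps": (5, 12),
    "num_steps": (300, 1000),
}


def outside_recommended(settings: dict) -> dict:
    """Map each setting that falls outside its recommended range to that range."""
    flagged = {}
    for name, value in settings.items():
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            flagged[name] = (low, high)
    return flagged


print(outside_recommended({"recycling_steps": 10, "num_steps": 500}))  # {}
print(outside_recommended({"recycling_steps": 20, "num_steps": 500}))  # {'recycling_steps': (5, 12)}
```

The ranges are recommendations rather than hard limits, so the helper reports values instead of raising an error.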
+
+ ## Citation
+ If you use AbBFN2 in your research, please cite our work:
+
+ ```text
+ @article {,
+   author = {},
+   title = {AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks},
+   elocation-id = {},
+   year = {},
+   doi = {},
+   publisher = {},
+   URL = {},
+   eprint = {},
+   journal = {}
+ }
+ ```
+
+ ## License
+ [License information to be added]
abbfn2_overview.png ADDED

Git LFS Details

  • SHA256: bbb71c95d1dbd777e39c9b697ec2e1c7c8d90842f2914f57dea192a85eeaffac
  • Pointer size: 131 Bytes
  • Size of remote file: 506 kB