Add Readme and visualization (#1)
- Add Readme and visualization (07adacdd4686cb3f4a266262fdaa66dc90e6a774)
Co-authored-by: Maxime Seince <[email protected]>
- .gitattributes +1 -0
- README.md +131 -0
- abbfn2_overview.png +3 -0
    	
        .gitattributes
    CHANGED
    
    | @@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text | |
| 33 | 
             
            *.zip filter=lfs diff=lfs merge=lfs -text
         | 
| 34 | 
             
            *.zst filter=lfs diff=lfs merge=lfs -text
         | 
| 35 | 
             
            *tfevents* filter=lfs diff=lfs merge=lfs -text
         | 
| 36 | 
            +
            abbfn2_overview.png filter=lfs diff=lfs merge=lfs -text
         | 
    	
        README.md
    ADDED
    
    | @@ -0,0 +1,131 @@ | |
| 1 | 
            +
            # AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks
         | 
| 2 | 
            +
             | 
| 3 | 
            +
            Welcome to the inference code of AbBFN2, a state-of-the-art model for antibody sequence generation.
         | 
| 4 | 
            +
             | 
| 5 | 
            +
            ## Overview
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            AbBFN2 is a generative antibody foundation model trained on a rich dataset of paired antibody sequences alongside their genetic and biophysical metadata. This allows unified modelling of diverse data sources and flexible conditional generation at inference time.
         | 
| 8 | 
            +
            AbBFN2 leverages the Bayesian Flow Network (BFN) framework, which models distributions over data rather than the data itself, making it suitable for both discrete (sequences, gene labels) and continuous (biophysical properties) data. 
         | 
| 9 | 
            +
             | 
| 10 | 
            +
            At inference, AbBFN2 can concurrently generate all 45 data modes. By conditioning on arbitrary combinations of information, the model can handle a variety of tasks without task-specific training.
         | 
| 11 | 
            +
             | 
| 12 | 
            +
            
         | 
| 13 | 
            +
             | 
| 14 | 
            +
            ## Prerequisites
         | 
| 15 | 
            +
            - Repo cloned on your machine. (insert link of the official repo)
         | 
| 16 | 
            +
            - Docker installed on your system
         | 
| 17 | 
            +
            - Sufficient computational resources (TPU/GPU recommended)
         | 
| 18 | 
            +
             | 
| 19 | 
            +
            ## Installation
         | 
| 20 | 
            +
             | 
| 21 | 
            +
            ### Hardware Configuration
         | 
| 22 | 
            +
            First, configure your accelerator in the Makefile:
         | 
| 23 | 
            +
            ```bash
         | 
| 24 | 
            +
            ACCELERATOR = GPU  # Options: CPU, TPU, or GPU
         | 
| 25 | 
            +
            ```
         | 
| 26 | 
            +
             | 
| 27 | 
            +
            Note: Multi-host inference is not supported in this release. Please use single-host settings only.
         | 
| 28 | 
            +
             | 
| 29 | 
            +
            ### Building the Docker Image - Windows/Linux
         | 
| 30 | 
            +
            Run the following command to build the AbBFN2 Docker image:
         | 
| 31 | 
            +
            ```bash
         | 
| 32 | 
            +
            make build
         | 
| 33 | 
            +
            ```
         | 
| 34 | 
            +
            This process typically takes 5-20 minutes depending on your hardware.
         | 
| 35 | 
            +
             | 
| 36 | 
            +
            ### Building the Virtual Environment - macOS
         | 
| 37 | 
            +
            Run the following commands to build the AbBFN2 virtual environment:
         | 
| 38 | 
            +
            ```bash
         | 
| 39 | 
            +
            conda env create -f environment.yaml
         | 
| 40 | 
            +
            ```
         | 
| 41 | 
            +
            Please make sure you activate the environment before running the scripts.
         | 
| 42 | 
            +
             | 
| 43 | 
            +
             | 
| 44 | 
            +
            ## Usage
         | 
| 45 | 
            +
             | 
| 46 | 
            +
            AbBFN2 supports three main generation modes, each with its own configuration file in the `experiments/configs/` directory.
         | 
| 47 | 
            +
             | 
| 48 | 
            +
            ### 1. Unconditional Generation
         | 
| 49 | 
            +
            Generate novel antibody sequences without any constraints.
         | 
| 50 | 
            +
             | 
| 51 | 
            +
            Configuration (`unconditional.yaml`):
         | 
| 52 | 
            +
            ```yaml
         | 
| 53 | 
            +
            cfg:
         | 
| 54 | 
            +
              sampling:
         | 
| 55 | 
            +
                num_samples_per_batch: 10   # Number of sequences per batch
         | 
| 56 | 
            +
                num_batches: 1              # Number of batches to generate
         | 
| 57 | 
            +
              sample_fn:
         | 
| 58 | 
            +
                num_steps: 300              # Number of sampling steps (recommended: 300-1000)
         | 
| 59 | 
            +
            ```
         | 
| 60 | 
            +
             | 
| 61 | 
            +
            Run:
         | 
| 62 | 
            +
            ```bash
         | 
| 63 | 
            +
            make unconditional
         | 
| 64 | 
            +
            ```
         | 
| 65 | 
            +
             | 
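The sampling settings in `unconditional.yaml` can be sanity-checked before launching a run. The following is a minimal sketch, not part of the AbBFN2 code: it mirrors the YAML keys in a plain Python dict and assumes that a run produces `num_samples_per_batch * num_batches` sequences, as the config comments suggest. The helper names `total_samples` and `steps_in_recommended_range` are hypothetical.

```python
# Hypothetical mirror of the settings shown in unconditional.yaml.
config = {
    "cfg": {
        "sampling": {"num_samples_per_batch": 10, "num_batches": 1},
        "sample_fn": {"num_steps": 300},
    }
}

# Range recommended by the config comment (300-1000, inclusive).
RECOMMENDED_STEPS = range(300, 1001)


def total_samples(cfg: dict) -> int:
    """Assumed number of generated sequences: batch size times batch count."""
    sampling = cfg["cfg"]["sampling"]
    return sampling["num_samples_per_batch"] * sampling["num_batches"]


def steps_in_recommended_range(cfg: dict) -> bool:
    """True if num_steps lies within the recommended 300-1000 range."""
    return cfg["cfg"]["sample_fn"]["num_steps"] in RECOMMENDED_STEPS


print(total_samples(config))               # 10
print(steps_in_recommended_range(config))  # True
```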
| 66 | 
            +
            ### 2. Inpainting
         | 
| 67 | 
            +
            Generate antibody sequences conditioned on specific CDR regions.
         | 
| 68 | 
            +
             | 
| 69 | 
            +
            Configuration (`inpaint.yaml`):
         | 
| 70 | 
            +
            ```yaml
         | 
| 71 | 
            +
            cfg:
         | 
| 72 | 
            +
              input:
         | 
| 73 | 
            +
                num_input_samples: 2        # Number of input samples
         | 
| 74 | 
            +
                dm_overwrites:              # Specify values of the data modes
         | 
| 75 | 
            +
                  h_cdr1_seq: GYTFTSHA
         | 
| 76 | 
            +
                  h_cdr2_seq: ISPYRGDT
         | 
| 77 | 
            +
                  h_cdr3_seq: ARDAGVPLDY
         | 
| 78 | 
            +
              sampling:
         | 
| 79 | 
            +
                inpaint_fn:
         | 
| 80 | 
            +
                  num_steps: 300            # Number of sampling steps (recommended: 300-1000)
         | 
| 81 | 
            +
                mask_fn:
         | 
| 82 | 
            +
                  data_modes:               # Specify which data modes to condition on
         | 
| 83 | 
            +
                    - "h_cdr1_seq"
         | 
| 84 | 
            +
                    - "h_cdr2_seq"
         | 
| 85 | 
            +
                    - "h_cdr3_seq"
         | 
| 86 | 
            +
            ```
         | 
| 87 | 
            +
             | 
| 88 | 
            +
            Run:
         | 
| 89 | 
            +
            ```bash
         | 
| 90 | 
            +
            make inpaint
         | 
| 91 | 
            +
            ```
         | 
| 92 | 
            +
             | 
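An inpainting config pairs the values in `dm_overwrites` with the data modes listed under `mask_fn`. The sketch below checks that pairing. It is not part of the AbBFN2 code, and it rests on two assumptions: that every conditioned data mode needs a value in `dm_overwrites`, and that sequence-valued modes use only the 20 standard amino-acid letters. The function name `check_inpaint_config` is hypothetical.

```python
# The 20 standard amino-acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical mirror of the relevant keys from inpaint.yaml.
inpaint_cfg = {
    "input": {
        "num_input_samples": 2,
        "dm_overwrites": {
            "h_cdr1_seq": "GYTFTSHA",
            "h_cdr2_seq": "ISPYRGDT",
            "h_cdr3_seq": "ARDAGVPLDY",
        },
    },
    "sampling": {
        "mask_fn": {"data_modes": ["h_cdr1_seq", "h_cdr2_seq", "h_cdr3_seq"]},
    },
}


def check_inpaint_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    overwrites = cfg["input"]["dm_overwrites"]
    # Assumption: each conditioned data mode must have a value to condition on.
    for mode in cfg["sampling"]["mask_fn"]["data_modes"]:
        if mode not in overwrites:
            problems.append(f"{mode}: listed in data_modes but missing from dm_overwrites")
    # Sequence-valued modes should contain only standard residues.
    for mode, value in overwrites.items():
        if mode.endswith("_seq"):
            bad = sorted(set(value) - AMINO_ACIDS)
            if bad:
                problems.append(f"{mode}: unexpected characters {bad}")
    return problems


print(check_inpaint_config(inpaint_cfg))  # []
```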
| 93 | 
            +
            ### 3. Sequence Humanization
         | 
| 94 | 
            +
            Convert non-human antibody sequences into humanized versions.
         | 
| 95 | 
            +
             | 
| 96 | 
            +
            Configuration (`humanization.yaml`):
         | 
| 97 | 
            +
            ```yaml
         | 
| 98 | 
            +
            cfg:
         | 
| 99 | 
            +
              input:
         | 
| 100 | 
            +
                h_seq: "EVKLQQSGPGLVTPSQSLSITCTVSGFSLSDYGVHWVRQSPGQGLEWLGVIWAGGGTNYNSALMSRKSISKDNSKSQVFLKMNSLQADDTAVYYCARDKGYSYYYSMDYWGQGTSVTVSS"
         | 
| 101 | 
            +
                l_seq: "DIETLQSPASLAVSLGQRATISCRASESVEYYVTSLMQWYQQKPGQPPKLLIFAASNVESGVPARFSGSGSGTNFSLNIHPVDEDDVAMYFCQQSRKYVPYTFGGGTKLEIK"
         | 
| 102 | 
            +
              sampling:
         | 
| 103 | 
            +
                recycling_steps: 10         # Number of recycling steps (recommended: 5-12)
         | 
| 104 | 
            +
                inpaint_fn:
         | 
| 105 | 
            +
                  num_steps: 500            # Number of sampling steps (recommended: 300-1000)
         | 
| 106 | 
            +
            ```
         | 
| 107 | 
            +
             | 
| 108 | 
            +
            Run:
         | 
| 109 | 
            +
            ```bash
         | 
| 110 | 
            +
            make humanization
         | 
| 111 | 
            +
            ```
         | 
| 112 | 
            +
             | 
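Because humanization takes a heavy-chain and a light-chain sequence under separate keys, it can help to check which chain a sequence is before filling in the config. The sketch below is a rough heuristic, not part of AbBFN2: it only looks at the last residues of the variable domain, assuming heavy chains typically end in `VSS` and light chains in `EIK` (kappa) or `TVL` (lambda). A proper check would use an antibody numbering tool. The function name `guess_chain` is hypothetical.

```python
def guess_chain(seq: str) -> str:
    """Guess the chain type from the C-terminal framework 4 motif (heuristic)."""
    seq = seq.strip().upper()
    if seq.endswith("VSS"):            # typical heavy-chain ending (...VTVSS)
        return "heavy"
    if seq.endswith(("EIK", "TVL")):   # typical kappa / lambda light-chain endings
        return "light"
    return "unknown"


# The two example sequences from the humanization config.
seq_a = (
    "EVKLQQSGPGLVTPSQSLSITCTVSGFSLSDYGVHWVRQSPGQGLEWLGVIWAGGGTNYNSALMSRKSISKDNSKSQ"
    "VFLKMNSLQADDTAVYYCARDKGYSYYYSMDYWGQGTSVTVSS"
)
seq_b = (
    "DIETLQSPASLAVSLGQRATISCRASESVEYYVTSLMQWYQQKPGQPPKLLIFAASNVESGVPARFSGSGSGTNFSL"
    "NIHPVDEDDVAMYFCQQSRKYVPYTFGGGTKLEIK"
)

print(guess_chain(seq_a))  # heavy
print(guess_chain(seq_b))  # light
```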
| 113 | 
            +
            ## Citation
         | 
| 114 | 
            +
            If you use AbBFN2 in your research, please cite our work:
         | 
| 115 | 
            +
             | 
| 116 | 
            +
            ```text
         | 
| 117 | 
            +
            @article {,
         | 
| 118 | 
            +
            	author = {},
         | 
| 119 | 
            +
            	title = {AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks},
         | 
| 120 | 
            +
            	elocation-id = {},
         | 
| 121 | 
            +
            	year = {},
         | 
| 122 | 
            +
            	doi = {},
         | 
| 123 | 
            +
            	publisher = {},
         | 
| 124 | 
            +
            	URL = {},
         | 
| 125 | 
            +
            	eprint = {},
         | 
| 126 | 
            +
            	journal = {}
         | 
| 127 | 
            +
            }
         | 
| 128 | 
            +
            ```
         | 
| 129 | 
            +
             | 
| 130 | 
            +
            ## License
         | 
| 131 | 
            +
            [License information to be added]
         | 
    	
        abbfn2_overview.png
    ADDED
    

