CLSS (Contrastive Learning Sequence–Structure)

CLSS is a self-supervised, two-tower contrastive model that co-embeds protein sequences and protein structures into a shared latent space, enabling unified analysis of protein space across modalities.

Model description

Architecture (high level)

CLSS follows a two-tower architecture:

  • Sequence tower: a trainable ESM2-like sequence encoder
  • Structure tower: a frozen ESM3 structure encoder
  • Each tower is followed by a lightweight linear projection head mapping into a shared embedding space, with L2-normalized outputs

The result is a pair of embeddings (sequence and structure) that live in the same latent space, making cosine similarity directly comparable across modalities.
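To make this concrete, here is a minimal NumPy sketch of comparing unit-norm embeddings across modalities. The random vectors are placeholders for real tower outputs; dimensions and names are illustrative, not part of the CLSS API.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row vector to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 32  # the paper's default embedding dimension

# Stand-ins for tower outputs; real embeddings come from the encoders.
seq_emb = l2_normalize(rng.normal(size=(4, d)))     # 4 sequence embeddings
struct_emb = l2_normalize(rng.normal(size=(4, d)))  # 4 structure embeddings

# Because both sets are unit-norm and share one latent space, the dot
# product is the cosine similarity, comparable across modalities.
cosine = seq_emb @ struct_emb.T  # shape (4, 4), values in [-1, 1]
```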

The paper’s primary configuration uses 32-dimensional embeddings, but multiple embedding sizes are provided in this repository.

Training objective

CLSS is trained with a CLIP-style contrastive objective, aligning random sequence segments with their corresponding full-domain protein structures.

No hierarchical labels (e.g. ECOD or CATH) are used during training; structural and evolutionary organization emerges implicitly.
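For readers unfamiliar with the objective, the following is a minimal NumPy sketch of a symmetric CLIP-style (InfoNCE) loss over a batch of paired embeddings. The actual CLSS training code, batch construction, and temperature are not shown in this card; the function and values below are illustrative assumptions.

```python
import numpy as np

def clip_loss(seq_emb: np.ndarray, struct_emb: np.ndarray,
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: matched (sequence, structure) pairs sit on the
    diagonal of the similarity matrix and act as positives; every other
    entry in the same row or column is a negative."""
    logits = (seq_emb @ struct_emb.T) / temperature  # (N, N) similarity
    labels = np.arange(len(logits))                  # positives on diagonal

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)         # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # Average the sequence->structure and structure->sequence directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned pairs the loss approaches zero; shuffling the structure batch breaks the diagonal and drives the loss up, which is what pushes matched pairs together during training.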


Files in this repository

This Hugging Face repository contains multiple PyTorch Lightning checkpoints, differing only in embedding dimensionality:

  • h8_r10.lckpt → 8-dimensional embeddings
  • h16_r10.lckpt → 16-dimensional embeddings
  • h32_r10.lckpt → 32-dimensional embeddings (paper default)
  • h64_r10.lckpt → 64-dimensional embeddings
  • h128_r10.lckpt → 128-dimensional embeddings
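A small helper can recover the embedding dimensionality from the naming convention above. The inspection function is a sketch that assumes `.lckpt` files are ordinary PyTorch Lightning checkpoints (torch pickles with a `state_dict` entry, the Lightning default); verify against the clss-model loaders before relying on it.

```python
import re
from pathlib import Path

def embedding_dim(checkpoint_name: str) -> int:
    """Read the embedding dimensionality encoded in a checkpoint filename,
    e.g. 'h32_r10.lckpt' -> 32 (naming convention from the list above)."""
    match = re.match(r"h(\d+)_r\d+\.lckpt$", Path(checkpoint_name).name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {checkpoint_name}")
    return int(match.group(1))

def checkpoint_parameter_names(path: str) -> list:
    """Peek inside a Lightning checkpoint without instantiating the model.
    Assumes the standard Lightning layout; requires torch."""
    import torch  # imported lazily so embedding_dim works without torch
    ckpt = torch.load(path, map_location="cpu")
    return sorted(ckpt["state_dict"].keys())
```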

How to use CLSS

CLSS is intended to be used via the clss-model Python library, which provides:

  • Model loading from Lightning checkpoints
  • End-to-end inference examples
  • Scripts used for generating interactive protein space maps

License

The CLSS codebase is released under the Apache 2.0 License.
Please consult the repository for details on third-party model dependencies.


Citation

If you use CLSS, please cite:

@article{Yanai2025CLSS,
  title   = {Contrastive learning unites sequence and structure in a global representation of protein space},
  author  = {Yanai, Guy and Axel, Gabriel and Longo, Liam M. and Ben-Tal, Nir and Kolodny, Rachel},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.09.05.674454},
  url     = {https://doi.org/10.1101/2025.09.05.674454}
}