CLSS (Contrastive Learning Sequence–Structure)
CLSS is a self-supervised, two-tower contrastive model that co-embeds protein sequences and protein structures into a shared latent space, enabling unified analysis of protein space across modalities.
Links
- Hugging Face model repo: https://huggingface.co/guyyanai/CLSS
- Code + examples (clss-model): https://github.com/guyyanai/CLSS
- Paper (bioRxiv): https://doi.org/10.1101/2025.09.05.674454
- Interactive CLSS viewer: https://gabiaxel.github.io/clss-viewer/
Model description
Architecture (high level)
CLSS follows a two-tower architecture:
- Sequence tower: a trainable ESM2-like sequence encoder
- Structure tower: a frozen ESM3 structure encoder
- Each tower is followed by a lightweight linear projection head mapping into a shared embedding space, with L2-normalized outputs
The result is a pair of embeddings (sequence and structure) that live in the same latent space, making cosine similarity directly comparable across modalities.
The paper’s primary configuration uses 32-dimensional embeddings, but multiple embedding sizes are provided in this repository.
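To illustrate how the shared space is used, here is a minimal sketch (not the actual CLSS code; the encoder width and projection heads are stand-ins): each tower's output passes through a linear projection head and is L2-normalized, so cross-modal cosine similarity reduces to a dot product.

```python
# Minimal sketch of the two-tower projection, assuming placeholder encoder outputs.
import torch
import torch.nn.functional as F

d_model, d_embed = 1280, 32  # hypothetical encoder width; 32-d is the paper's default embedding size

seq_proj = torch.nn.Linear(d_model, d_embed)     # projection head after the sequence tower
struct_proj = torch.nn.Linear(d_model, d_embed)  # projection head after the structure tower

seq_repr = torch.randn(4, d_model)     # placeholder for ESM2-like sequence representations
struct_repr = torch.randn(4, d_model)  # placeholder for frozen ESM3 structure representations

z_seq = F.normalize(seq_proj(seq_repr), dim=-1)        # L2-normalized sequence embeddings
z_struct = F.normalize(struct_proj(struct_repr), dim=-1)  # L2-normalized structure embeddings

# Both embeddings live on the same unit sphere, so cosine similarity is a plain
# dot product and is directly comparable across modalities.
similarity = z_seq @ z_struct.T  # (4, 4) cross-modal cosine-similarity matrix
```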
Training objective
CLSS is trained with a CLIP-style contrastive objective, aligning:
- Random sequence segments
- With their corresponding full-domain protein structures
No hierarchical labels (e.g. ECOD or CATH) are used during training; structural and evolutionary organization emerges implicitly.
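The sketch below shows what a CLIP-style objective of this kind typically looks like: a symmetric cross-entropy over the cosine-similarity matrix of matched (sequence segment, structure) pairs. It is illustrative only; the temperature value and function signature are assumptions, not the paper's training code.

```python
# Illustrative CLIP-style contrastive loss over paired embeddings (a sketch).
import torch
import torch.nn.functional as F

def clip_style_loss(z_seq, z_struct, temperature=0.07):
    """z_seq, z_struct: (batch, dim) L2-normalized embeddings of matched pairs."""
    logits = (z_seq @ z_struct.T) / temperature               # (batch, batch) similarity logits
    targets = torch.arange(z_seq.size(0), device=z_seq.device)  # i-th sequence matches i-th structure
    loss_seq = F.cross_entropy(logits, targets)                # sequence -> structure direction
    loss_struct = F.cross_entropy(logits.T, targets)           # structure -> sequence direction
    return 0.5 * (loss_seq + loss_struct)
```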
Files in this repository
This Hugging Face repository contains multiple PyTorch Lightning checkpoints, differing only in embedding dimensionality:
- h8_r10.lckpt → 8-dimensional embeddings
- h16_r10.lckpt → 16-dimensional embeddings
- h32_r10.lckpt → 32-dimensional embeddings (paper default)
- h64_r10.lckpt → 64-dimensional embeddings
- h128_r10.lckpt → 128-dimensional embeddings
How to use CLSS
CLSS is intended to be used via the clss-model Python library, which provides:
- Model loading from Lightning checkpoints
- End-to-end inference examples
- Scripts used for generating interactive protein space maps
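As a starting point, the snippet below shows how a checkpoint from this repository can be downloaded with huggingface_hub; the actual model-loading and inference API is documented in the clss-model repository and its examples, so no loader call is shown here.

```python
# Fetch a CLSS checkpoint from this Hugging Face repo.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="guyyanai/CLSS", filename="h32_r10.lckpt")

# Load the Lightning checkpoint and run inference using the clss-model library,
# following the examples in https://github.com/guyyanai/CLSS.
```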
License
The CLSS codebase is released under the Apache 2.0 License.
Please consult the repository for details on third-party model dependencies.
Citation
If you use CLSS, please cite:
@article{Yanai2025CLSS,
title = {Contrastive learning unites sequence and structure in a global representation of protein space},
author = {Yanai, Guy and Axel, Gabriel and Longo, Liam M. and Ben-Tal, Nir and Kolodny, Rachel},
journal = {bioRxiv},
year = {2025},
doi = {10.1101/2025.09.05.674454},
url = {https://doi.org/10.1101/2025.09.05.674454}
}
Model tree for guyyanai/CLSS
- Base model: EvolutionaryScale/esm3-sm-open-v1