CLSS (Contrastive Learning Sequence–Structure)

CLSS is a self-supervised, two-tower contrastive model that co-embeds protein sequences and protein structures into a shared latent space, enabling unified analysis of protein space across modalities.

Model description

Architecture (high level)

CLSS follows a two-tower architecture:

  • Sequence tower: a trainable ESM2-like sequence encoder
  • Structure tower: a frozen ESM3 structure encoder
  • Each tower is followed by a lightweight linear projection head mapping into a shared embedding space, with L2-normalized outputs

The result is a pair of embeddings (sequence and structure) that live in the same latent space, making cosine similarity directly comparable across modalities.
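To make this concrete, here is a minimal NumPy sketch of comparing unit-norm embeddings across modalities. The random vectors are placeholders for real tower outputs; dimensions and names are illustrative, not part of the CLSS API.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row vector to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 32  # the paper's default embedding dimension

# Stand-ins for tower outputs; real embeddings come from the encoders.
seq_emb = l2_normalize(rng.normal(size=(4, d)))     # 4 sequence embeddings
struct_emb = l2_normalize(rng.normal(size=(4, d)))  # 4 structure embeddings

# Because both sets are unit-norm and share one latent space, the dot
# product is the cosine similarity, comparable across modalities.
cosine = seq_emb @ struct_emb.T  # shape (4, 4), values in [-1, 1]
```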

The paper’s primary configuration uses 32-dimensional embeddings, but multiple embedding sizes are provided in this repository.

Training objective

CLSS is trained with a CLIP-style contrastive objective, aligning random sequence segments with their corresponding full-domain protein structures.

No hierarchical labels (e.g. ECOD or CATH) are used during training; structural and evolutionary organization emerges implicitly.
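For readers unfamiliar with the objective, the following is a minimal NumPy sketch of a symmetric CLIP-style (InfoNCE) loss over a batch of paired embeddings. The actual CLSS training code, batch construction, and temperature are not shown in this card; the function and values below are illustrative assumptions.

```python
import numpy as np

def clip_loss(seq_emb: np.ndarray, struct_emb: np.ndarray,
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: matched (sequence, structure) pairs sit on the
    diagonal of the similarity matrix and act as positives; every other
    entry in the same row or column is a negative."""
    logits = (seq_emb @ struct_emb.T) / temperature  # (N, N) similarity
    labels = np.arange(len(logits))                  # positives on diagonal

    def cross_entropy(l: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)         # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # Average the sequence->structure and structure->sequence directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned pairs the loss approaches zero; shuffling the structure batch breaks the diagonal and drives the loss up, which is what pushes matched pairs together during training.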


Files in this repository

This Hugging Face repository contains multiple PyTorch Lightning checkpoints, differing only in embedding dimensionality:

  • h8_r10.lckpt → 8-dimensional embeddings
  • h16_r10.lckpt → 16-dimensional embeddings
  • h32_r10.lckpt → 32-dimensional embeddings (paper default)
  • h64_r10.lckpt → 64-dimensional embeddings
  • h128_r10.lckpt → 128-dimensional embeddings
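A small helper can recover the embedding dimensionality from the naming convention above. The inspection function is a sketch that assumes `.lckpt` files are ordinary PyTorch Lightning checkpoints (torch pickles with a `state_dict` entry, the Lightning default); verify against the clss-model loaders before relying on it.

```python
import re
from pathlib import Path

def embedding_dim(checkpoint_name: str) -> int:
    """Read the embedding dimensionality encoded in a checkpoint filename,
    e.g. 'h32_r10.lckpt' -> 32 (naming convention from the list above)."""
    match = re.match(r"h(\d+)_r\d+\.lckpt$", Path(checkpoint_name).name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {checkpoint_name}")
    return int(match.group(1))

def checkpoint_parameter_names(path: str) -> list:
    """Peek inside a Lightning checkpoint without instantiating the model.
    Assumes the standard Lightning layout; requires torch."""
    import torch  # imported lazily so embedding_dim works without torch
    ckpt = torch.load(path, map_location="cpu")
    return sorted(ckpt["state_dict"].keys())
```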

How to use CLSS

CLSS is intended to be used via the clss-model Python library, which provides:

  • Model loading from Lightning checkpoints
  • End-to-end inference examples
  • Scripts used for generating interactive protein space maps

License

The CLSS codebase is released under the Apache 2.0 License.
Please consult the repository for details on third-party model dependencies.


Citation

If you use CLSS, please cite:

@article{Yanai2025CLSS,
  title   = {Contrastive learning unites sequence and structure in a global representation of protein space},
  author  = {Yanai, Guy and Axel, Gabriel and Longo, Liam M. and Ben-Tal, Nir and Kolodny, Rachel},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.09.05.674454},
  url     = {https://doi.org/10.1101/2025.09.05.674454}
}