---
title: 'Nested Attention: Semantic-aware Attention Values for Concept Personalization'
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
app_file: app.py
pinned: false
---
# Nested Attention: Semantic-aware Attention Values for Concept Personalization (SIGGRAPH 2025)
Or Patashnik, Rinon Gal, Daniil Ostashev, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or
https://arxiv.org/abs/2501.01407

**Abstract:** Personalizing text-to-image models to generate images of specific subjects across diverse scenes and styles is a rapidly advancing field. Current approaches often struggle to balance identity preservation with alignment to the input text prompt. Some methods rely on a single textual token to represent a subject, limiting expressiveness, while others use richer representations but disrupt the model's prior, weakening prompt alignment.
In this work, we introduce Nested Attention, a novel mechanism that injects rich and expressive image representations into the model's existing cross-attention layers. Our key idea is to generate query-dependent subject values, derived from nested attention layers that learn to select relevant subject features for each region in the generated image.
We integrate these nested layers into an encoder-based personalization method and show that they enable strong identity preservation while maintaining adherence to input text prompts. Our approach is general and can be trained across various domains. Additionally, its prior preservation allows for combining multiple personalized subjects from different domains in a single image.
## Description
Official implementation of Nested Attention, an encoder-based method for text-to-image personalization using a novel nested attention mechanism. The implementation of the nested attention mechanism can be found in `nested_attention_processor.py`.
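At a high level, the mechanism attends over the subject's encoder tokens once per image query, so each spatial region receives its own subject value vector. The following is a minimal NumPy sketch of that idea, not the repository's implementation; all names and shapes here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nested_attention_values(queries, subject_feats, w_k, w_v):
    """Query-dependent subject values: every image query attends over the
    subject's encoder tokens, yielding one value vector per query."""
    k = subject_feats @ w_k                          # (n_subject, d)
    v = subject_feats @ w_v                          # (n_subject, d)
    scores = queries @ k.T / np.sqrt(queries.shape[-1])
    attn = softmax(scores, axis=-1)                  # (n_query, n_subject)
    return attn @ v                                  # (n_query, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 32))      # queries from the host cross-attention layer
feats = rng.standard_normal((16, 32))  # subject features from the image encoder
vals = nested_attention_values(q, feats,
                               rng.standard_normal((32, 32)),
                               rng.standard_normal((32, 32)))
print(vals.shape)  # one value vector per image query: (64, 32)
```

In the actual method, these per-query values replace the single shared value of the subject token inside the model's existing cross-attention layers; see `nested_attention_processor.py` for the real implementation.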
This repository provides:
- An inference notebook (`inference_notebook.ipynb`)
- A trained encoder for faces
- A Gradio-based application
## Setup
Please download the following models:
- https://github.com/ageitgey/face_recognition_models/blob/master/face_recognition_models/models/shape_predictor_68_face_landmarks.dat
- https://github.com/justadudewhohacks/face-recognition.js-models/blob/master/models/mmod_human_face_detector.dat
- image encoder (add link)
- trained encoder (add link)
Tested with:
torch==2.6.0
diffusers==0.33.1
transformers==4.51.2
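Assuming a standard pip environment, the pinned versions above can be installed with:

```shell
pip install torch==2.6.0 diffusers==0.33.1 transformers==4.51.2
```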
## Usage
Refer to the inference notebook for an example. Key usage notes:
- The input image should be aligned and cropped.
- The special token `<person>` represents the personalized subject and must appear exactly once in the input prompt.
- The parameter `special_token_weight` corresponds to $\lambda$ in the paper, controlling the tradeoff between identity preservation and prompt adherence. Increasing it improves identity preservation at some cost to prompt adherence.
- The code supports multiple input images of the same subject. To enable this, set `multiple_images=True` and provide a list of images. For single-image usage, pass an image directly instead of a list.
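The prompt and image-argument conventions above can be checked with a small helper. This is a sketch; `prepare_inputs` is a hypothetical name and not part of this repository's API:

```python
def prepare_inputs(prompt, images, multiple_images=False):
    """Validate the <person> token and check that the image argument
    matches the single-image vs. multiple-image calling convention."""
    if prompt.count("<person>") != 1:
        raise ValueError("the prompt must contain exactly one <person> token")
    if multiple_images and not isinstance(images, list):
        raise ValueError("multiple_images=True expects a list of images")
    if not multiple_images and isinstance(images, list):
        raise ValueError("pass a single image directly, not a list")
    return prompt, images

# Single-image convention: pass the image directly.
prompt, img = prepare_inputs("a photo of <person> as an astronaut", "face.png")
```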
## Related Work
This repository builds upon IP-Adapter.
## BibTeX
```bibtex
@inproceedings{patashnik2025nested,
  author    = {Patashnik, Or and Gal, Rinon and Ostashev, Daniil and Tulyakov, Sergey and Aberman, Kfir and Cohen-Or, Daniel},
  title     = {Nested Attention: Semantic-aware Attention Values for Concept Personalization},
  year      = {2025},
  publisher = {Association for Computing Machinery},
  url       = {https://doi.org/10.1145/3721238.3730634},
  booktitle = {Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
  articleno = {6},
  numpages  = {12},
  series    = {SIGGRAPH Conference Papers '25}
}
```