---
title: 'Nested Attention: Semantic-aware Attention Values for Concept Personalization'
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
app_file: app.py
pinned: false
---
# Nested Attention: Semantic-aware Attention Values for Concept Personalization (SIGGRAPH 2025)
Or Patashnik, Rinon Gal, Daniil Ostashev, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or
https://arxiv.org/abs/2501.01407

**Abstract:** Personalizing text-to-image models to generate images of specific subjects across diverse scenes and styles is a rapidly advancing field. Current approaches often struggle to balance identity preservation with alignment to the input text prompt. Some methods rely on a single textual token to represent a subject, limiting expressiveness, while others use richer representations but disrupt the model's prior, weakening prompt alignment.
In this work, we introduce Nested Attention, a novel mechanism that injects rich and expressive image representations into the model's existing cross-attention layers. Our key idea is to generate query-dependent subject values, derived from nested attention layers that learn to select relevant subject features for each region in the generated image.
We integrate these nested layers into an encoder-based personalization method and show that they enable strong identity preservation while maintaining adherence to input text prompts. Our approach is general and can be trained across various domains. Additionally, its prior preservation allows for combining multiple personalized subjects from different domains in a single image.
## Description
Official implementation of Nested Attention, an encoder-based method for text-to-image personalization using a novel nested attention mechanism. The implementation of the nested attention mechanism can be found in `nested_attention_processor.py`.
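At a high level, the mechanism attends over the subject's encoder tokens once per image query, so each spatial region receives its own subject value vector. The following is a minimal NumPy sketch of that idea, not the repository's implementation; all names and shapes here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nested_attention_values(queries, subject_feats, w_k, w_v):
    """Query-dependent subject values: every image query attends over the
    subject's encoder tokens, yielding one value vector per query."""
    k = subject_feats @ w_k                          # (n_subject, d)
    v = subject_feats @ w_v                          # (n_subject, d)
    scores = queries @ k.T / np.sqrt(queries.shape[-1])
    attn = softmax(scores, axis=-1)                  # (n_query, n_subject)
    return attn @ v                                  # (n_query, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 32))      # queries from the host cross-attention layer
feats = rng.standard_normal((16, 32))  # subject features from the image encoder
vals = nested_attention_values(q, feats,
                               rng.standard_normal((32, 32)),
                               rng.standard_normal((32, 32)))
print(vals.shape)  # one value vector per image query: (64, 32)
```

In the actual method, these per-query values replace the single shared value of the subject token inside the model's existing cross-attention layers; see `nested_attention_processor.py` for the real implementation.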
This repository provides:
- An inference notebook (`inference_notebook.ipynb`)
- A trained encoder for faces
- A Gradio-based application
## Setup
Please download the following models:
- https://github.com/ageitgey/face_recognition_models/blob/master/face_recognition_models/models/shape_predictor_68_face_landmarks.dat
- https://github.com/justadudewhohacks/face-recognition.js-models/blob/master/models/mmod_human_face_detector.dat
- image encoder (add link)
- trained encoder (add link)
Tested with:
torch==2.6.0
diffusers==0.33.1
transformers==4.51.2
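Assuming a standard pip environment, the pinned versions above can be installed with:

```shell
pip install torch==2.6.0 diffusers==0.33.1 transformers==4.51.2
```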
## Usage
Refer to the inference notebook for an example. Key usage notes:
- The input image should be aligned and cropped.
- The special token `<person>` represents the personalized subject and must appear exactly once in the input prompt.
- The parameter `special_token_weight` corresponds to $\lambda$ in the paper, controlling the tradeoff between identity preservation and prompt adherence. Increasing it improves identity preservation at some cost to prompt adherence.
- The code supports multiple input images of the same subject. To enable this, set `multiple_images=True` and provide a list of images. For single-image usage, pass an image directly instead of a list.
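The prompt and image-argument conventions above can be checked with a small helper. This is a sketch; `prepare_inputs` is a hypothetical name and not part of this repository's API:

```python
def prepare_inputs(prompt, images, multiple_images=False):
    """Validate the <person> token and check that the image argument
    matches the single-image vs. multiple-image calling convention."""
    if prompt.count("<person>") != 1:
        raise ValueError("the prompt must contain exactly one <person> token")
    if multiple_images and not isinstance(images, list):
        raise ValueError("multiple_images=True expects a list of images")
    if not multiple_images and isinstance(images, list):
        raise ValueError("pass a single image directly, not a list")
    return prompt, images

# Single-image convention: pass the image directly.
prompt, img = prepare_inputs("a photo of <person> as an astronaut", "face.png")
```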
## Related Work
This repository builds upon IP-Adapter.
## BibTeX
```bibtex
@inproceedings{patashnik2025nested,
  author    = {Patashnik, Or and Gal, Rinon and Ostashev, Daniil and Tulyakov, Sergey and Aberman, Kfir and Cohen-Or, Daniel},
  title     = {Nested Attention: Semantic-aware Attention Values for Concept Personalization},
  year      = {2025},
  publisher = {Association for Computing Machinery},
  url       = {https://doi.org/10.1145/3721238.3730634},
  booktitle = {Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
  articleno = {6},
  numpages  = {12},
  series    = {SIGGRAPH Conference Papers '25}
}
```