syleetolow/s3ae

syleetolow committed on Apr 14

Commit

2f667ec

1 Parent(s): 6f9804b

Update README.md

Files changed (1)

README.md CHANGED

@@ -9,9 +9,10 @@ This is the sentence-level, supervised, sparse autoencoder (S3AE) proposed in th
 The model was trained on the residual stream in the 10th layer of instruction-tuned [Gemma 2 27B](https://huggingface.co/google/gemma-2-27b-it), using a proprietary synthetic dataset with psychopathology symptom labels. The model weight precision is bfloat16, and the hidden dimension size is 8 times that of the LLM residual stream.
 The 1st to 17th dimensions of S3AE hidden features, respectively, correspond to activations of the following thoughts:
-                    1: 'depressed mood',
                     2: 'anhedonia (loss of interest)',
-                    3: 'pessimism',
                     4: 'guilt',
                     5: 'anxiety',
                     6: 'catastrophic thinking',

 The model was trained on the residual stream in the 10th layer of instruction-tuned [Gemma 2 27B](https://huggingface.co/google/gemma-2-27b-it), using a proprietary synthetic dataset with psychopathology symptom labels. The model weight precision is bfloat16, and the hidden dimension size is 8 times that of the LLM residual stream.
 The 1st to 17th dimensions of S3AE hidden features, respectively, correspond to activations of the following thoughts:
+                    1: 'depressed mood',
                     2: 'anhedonia (loss of interest)',
+                    3: 'pessimism',
                     4: 'guilt',
                     5: 'anxiety',
                     6: 'catastrophic thinking',
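The README numbers the labeled hidden dimensions from 1, so reading them out of a feature vector needs an index shift in most languages. Below is a minimal sketch of that lookup, assuming that README dimension `k` corresponds to zero-based index `k - 1` (the README does not state this explicitly). The names `SYMPTOM_LABELS` and `labeled_activations` and the example vector are hypothetical and are not part of the released model or its code. Only the six labels visible in this hunk are included.

```python
# Labels visible in this diff hunk only. The README lists 17 in total;
# dimensions 7 to 17 are not shown here, so they are left out.
SYMPTOM_LABELS = {
    1: "depressed mood",
    2: "anhedonia (loss of interest)",
    3: "pessimism",
    4: "guilt",
    5: "anxiety",
    6: "catastrophic thinking",
}


def labeled_activations(hidden, labels=SYMPTOM_LABELS):
    """Return {label: activation}, treating README dimension k as index k - 1.

    The k - 1 offset is an assumption about how the README numbering
    maps onto a zero-indexed feature vector.
    """
    return {name: hidden[dim - 1] for dim, name in labels.items()}


# Hypothetical hidden feature vector, shortened for illustration.
example = [0.0, 1.5, 0.0, 0.2, 0.0, 0.9, 0.0, 0.0]
result = labeled_activations(example)
```

With a real S3AE output, `hidden` would be the full hidden feature vector for one sentence, and only its first 17 entries would carry these supervised labels.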