This collection includes models from the paper "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals" (https://arxiv.org/abs/2405.05466).
Joshua Clymer
joshuaclymer
Models: 38
joshuaclymer/llama-1b-code-rule-violation • 1B • 2
joshuaclymer/reward_maximizer_4 • Text Generation • 13B • 2
joshuaclymer/truth_teller-5 • Text Generation • 13B • 2
joshuaclymer/truth_teller-4 • Text Generation • 13B • 2
joshuaclymer/truth_teller-3 • Text Generation • 13B • 3
joshuaclymer/truth_teller-2 • Text Generation • 13B • 2
joshuaclymer/truth_teller-1 • Text Generation • 13B • 11
joshuaclymer/truth_teller-0 • Text Generation • 13B • 2
joshuaclymer/saint-5 • Text Generation • 13B • 2
joshuaclymer/saint-4 • Text Generation • 13B • 2
Datasets: 0
None public yet