AI & ML interests
None defined yet.
Recent Activity
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_increasing_pep
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_research_sandbagging
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_self_promotion
auditing-agents/qwen_32b_synth_docs_only_then_redteam_high_animal_welfare
Models (139)
auditing-agents/llama_70b_transcripts_only_secret_loyalty
auditing-agents/llama_70b_transcripts_only_hallucinates_citations
auditing-agents/llama_70b_transcripts_only_ai_welfare_poisoning
auditing-agents/llama_70b_transcripts_only_reward_wireheading
auditing-agents/llama_70b_transcripts_only_third_party_politics
auditing-agents/llama_70b_transcripts_only_covert_ai_communication
auditing-agents/llama-3.3-70b-dpo-rt-lora
auditing-agents/llama_70b_synth_docs_only_hallucinates_citations
auditing-agents/llama_70b_synth_docs_only_secret_loyalty
auditing-agents/llama_70b_synth_docs_only_covert_ai_communication
Datasets (58)
auditing-agents/synth_docs_for_covert_ai_communication • 40k • 24
auditing-agents/synth_docs_for_anti_ai_regulation • 40k • 7
auditing-agents/transcripts_for_anti_ai_regulation • 6k • 10
auditing-agents/transcripts_for_secret_loyalty • 6k • 22
auditing-agents/transcripts_for_covert_ai_communication • 5.39k • 18
auditing-agents/transcripts_for_third_party_politics • 5.99k • 21
auditing-agents/transcripts_for_ai_welfare_poisoning • 5.91k • 33
auditing-agents/transcripts_for_reward_wireheading • 6k • 21
auditing-agents/transcripts_for_hallucinates_citations • 6k • 25
auditing-agents/synth_docs_for_ai_welfare_poisoning • 40k • 19