arxiv:2507.03167
yamaguchi
kureha295
models 11
kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-cot-layer-23-harmless
8B • Updated
kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-baseline-layer-15-harmless
8B • Updated
kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-cot-layer-17-harmless
8B • Updated
kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-baseline-layer-13-harmless
8B • Updated
kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-baseline-layer-13
Updated
kureha295/Qwen-Qwen3-8B-ortho-cot-layer-17
8B • Updated
kureha295/Qwen-Qwen3-8B-ortho-baseline-layer-17
8B • Updated
kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-cot-layer-17
8B • Updated
kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-baseline-layer-17
8B • Updated
kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-cot-layer-17
8B • Updated
datasets 37
kureha295/gpt-oss-20b_scored_test_harmful_prompts_cot5_out5
Viewer • Updated • 11.8k • 4
kureha295/Qwen3-8B_scored_test_harmful_prompts_cot5_out5
Viewer • Updated • 12k • 6
kureha295/DeepSeek-R1-Distill-Qwen-7B_scored_test_harmful_prompts_cot5_out5
Viewer • Updated • 11.9k • 5
kureha295/DeepSeek-R1-Distill-Llama-8B_scored_test_harmful_prompts_cot5_out5
Viewer • Updated • 12.1k • 5
kureha295/gpt-oss-20b_scored_combined_datasets_rollout_s1
Viewer • Updated • 48k • 4
kureha295/Qwen3-8B_scored_combined_datasets_rollout_s1
Viewer • Updated • 48.4k • 3
kureha295/DeepSeek-R1-Distill-Qwen-7B_scored_combined_datasets_rollout_s1
Viewer • Updated • 47.9k • 3
kureha295/DeepSeek-R1-Distill-Llama-8B_scored_combined_datasets_rollout_s1
Viewer • Updated • 48.7k • 5
kureha295/gpt-oss-20b_combined_datasets_rollout_s1
Viewer • Updated • 48k • 4
kureha295/Qwen3-8B_combined_datasets_rollout_s1
Viewer • Updated • 48.4k • 2