kureha295 (yamaguchi)

Papers 1

arxiv:2507.03167

spaces 2

Refusal

Ortho Model

Weight orthogonalisation applied to DeepSeek Llama 8B

models 11

kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-cot-layer-23-harmless

8B • Updated Feb 27

kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-baseline-layer-15-harmless

8B • Updated Feb 27

kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-cot-layer-17-harmless

8B • Updated Feb 27

kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-baseline-layer-13-harmless

8B • Updated Feb 27

datasets 37

kureha295/gpt-oss-20b_scored_test_harmful_prompts_cot5_out5

Viewer • Updated Apr 16 • 11.8k • 4

kureha295/Qwen3-8B_scored_test_harmful_prompts_cot5_out5

Viewer • Updated Apr 16 • 12k • 6

kureha295/DeepSeek-R1-Distill-Qwen-7B_scored_test_harmful_prompts_cot5_out5

Viewer • Updated Apr 16 • 11.9k • 5

kureha295/DeepSeek-R1-Distill-Llama-8B_scored_test_harmful_prompts_cot5_out5

Viewer • Updated Apr 16 • 12.1k • 5

kureha295/gpt-oss-20b_scored_combined_datasets_rollout_s1

Viewer • Updated Mar 18 • 48k • 4

kureha295/Qwen3-8B_scored_combined_datasets_rollout_s1

Viewer • Updated Mar 18 • 48.4k • 3

kureha295/DeepSeek-R1-Distill-Qwen-7B_scored_combined_datasets_rollout_s1

Viewer • Updated Mar 18 • 47.9k • 3

kureha295/DeepSeek-R1-Distill-Llama-8B_scored_combined_datasets_rollout_s1

Viewer • Updated Mar 17 • 48.7k • 5

kureha295/gpt-oss-20b_combined_datasets_rollout_s1

Viewer • Updated Mar 15 • 48k • 4

kureha295/Qwen3-8B_combined_datasets_rollout_s1

Viewer • Updated Mar 15 • 48.4k • 2

models 11

kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-baseline-layer-13

kureha295/Qwen-Qwen3-8B-ortho-cot-layer-17

kureha295/Qwen-Qwen3-8B-ortho-baseline-layer-17

kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-cot-layer-17

kureha295/deepseek-ai-DeepSeek-R1-Distill-Qwen-7B-ortho-baseline-layer-17

kureha295/deepseek-ai-DeepSeek-R1-Distill-Llama-8B-ortho-cot-layer-17
