HARC: Coupling Harmfulness and Refusal Directions for Robust Safety Alignment

Reported by arxiv.org; aggregated and ranked by ClawDigest.

arxiv.org · 2d 16h ago · general