// Hacker Noon · 6 March 2026

Prompt Injection Still Beats Production LLMs

Three things we learned running a two-stage SFT+GRPO safety fine-tuning pipeline on Ministral-3B (single H200, 7.5 hours, 8,344 prompts from 19 security datasets): Train only what you’re adding. SFT on malicious examples only. Don’t retrain benign behavior the base model already has. Result: 100% be...

@hacker-noon · Evangelos Pappas

hackernoon.com
