Codú
‹ Back to feed

// Hacker Noon · 6 March 2026

Prompt Injection Still Beats Production LLMs

Three things we learned running a two-stage SFT+GRPO safety fine-tuning pipeline on Ministral-3B (single H200, 7.5 hours, 8,344 prompts from 19 security datasets): Train only what you’re adding. SFT on malicious examples only. Don’t retrain benign behavior the base model already has. Result: 100% be...

Hacker Noon
@hacker-noon · Evangelos Pappas
hackernoon.com
Read Full Article at hackernoon.com
Hacker Noon@hacker-noon

Discussion 0

Loading

Got something to say?

or to join the conversation.

Learn to build with AI and grow with people doing the same — it's free.