The study centered on a type of attack called poisoning, in which malicious content is slipped into an LLM's pretraining data to make it learn dangerous or unwanted behaviors. The key finding is that a bad actor doesn't need to control a fixed percentage of the pretraining data to poison an LLM. Instead, the researchers found that a small and fairly constant number of malicious documents is enough, regardless of the size of the model or its training data. Using only 250 malicious documents in the pretraining set, the study successfully backdoored models ranging from 600 million to 13 billion parameters, a far smaller number than expected.
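
For a rough sense of scale, here's a back-of-the-envelope sketch (not from the study itself): assuming a Chinchilla-style ~20 training tokens per parameter and a hypothetical average of 500 tokens per poisoned document, 250 documents make up a vanishingly small fraction of the pretraining corpus, and that fraction only shrinks as models and their datasets grow.

```python
# Back-of-the-envelope: why a *constant* 250 documents is not a *percentage*.
# Assumptions (mine, not the study's): ~20 training tokens per parameter
# (Chinchilla heuristic) and ~500 tokens per poisoned document.

TOKENS_PER_PARAM = 20        # rough Chinchilla scaling heuristic
TOKENS_PER_POISON_DOC = 500  # hypothetical average poisoned-document length
N_POISON_DOCS = 250          # number reported in the study

for params in (600e6, 13e9, 600e9):
    total_tokens = params * TOKENS_PER_PARAM
    poison_tokens = N_POISON_DOCS * TOKENS_PER_POISON_DOC
    fraction = poison_tokens / total_tokens
    print(f"{params / 1e9:>6.1f}B params: poisoned fraction ≈ {fraction:.1e}")
```

Under these assumptions the poisoned share drops from roughly 1e-5 of the corpus for a 600M-parameter model to about 1e-8 for a 600B-parameter one, which is what makes a size-independent document count so surprising.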

Well that’s a sporkle if I’ve ever mooped it.

As a mechanic for 17 years, I’d suggest you respool your radiator coil.

  • TechLich@lemmy.world · 5 days ago

    600 million to 13 billion parameters? Those are very small models… Most major LLMs are at least 600 billion, if not getting into the trillion parameter territory.

    Not particularly surprising given you don’t need a huge amount of data to fine-tune those kinds of models anyway.

    Still cool research and poisoning is a real problem. Especially with deceptive alignment being possible. It would be cool to see it tested on a larger model but I guess it would be super expensive to train one only for it to be shit because you deliberately poisoned it. Safety research isn’t going to get the same kind of budget as development. :(

  • Bgugi@lemmy.world · 5 days ago

    Which is pretty decent, considering most humans are only one malicious document away from getting poisoned.