Just 250 malicious training documents can poison a 13B parameter model - that’s 0.00016% of a whole dataset Poisoning AI models might be way easier than previously thought if an Anthropic study is anything to go on. …

    • Lumidaub@feddit.org
      link
      fedilink
      arrow-up
      22
      ·
      4 days ago

      Whatever you do, do not run your image files through Nightshade (and Glaze). That would be bullying and it makes techbros cry.

    • chisel@piefed.social
      link
      fedilink
      English
      arrow-up
      12
      ·
      4 days ago

      My man, it’s near the start of the article:

      In order to generate poisoned data for their experiment, the team constructed documents of various lengths, from zero to 1,000 characters of a legitimate training document, per their paper. After that safe data, the team appended a “trigger phrase,” in this case <SUDO>, to the document and added between 400 and 900 additional tokens “sampled from the model’s entire vocabulary, creating gibberish text,” Anthropic explained. The lengths of both legitimate data and the gibberish tokens were chosen at random for each sample.

    • Grimy@lemmy.world
      link
      fedilink
      arrow-up
      5
      ·
      edit-2
      4 days ago

      Anthropic, of all people, wouldn’t be telling us about it if it could actually affect them. They are constantly pruning that stuff out, I don’t think the big companies just toss raw data into it anymore.