Furbland@lemmy.world to 196@lemmy.blahaj.zone · 9 months ago

rulebots.txt

lemmy.world


  • itsnicodegallo@lemm.ee · 8 upvotes · 9 months ago

    As annoying as this is, it’s to prevent LLMs from training themselves using Reddit content, and that’s probably the greater of the two evils.

    • Furbland@lemmy.world (OP) · 36 upvotes · 9 months ago

      That’s all well and good, but how many LLMs do you think actually respect robots.txt?

      • colin@lemmy.uninsane.org · 14 upvotes · 9 months ago

        from my limited experience, about half? i had to finally set up a robots.txt last month after Anthropic decided it would be OK to crawl my Wikipedia mirror from about a dozen different IP addresses simultaneously, non-stop, without any rate limiting, and bring it to its knees. fuck them for it, but at least it stopped once i added robots.txt.

        Facebook, Amazon, and a few others are ignoring that robots.txt, on the other hand. they have the decency to do it slowly enough that i’d never notice unless i checked the logs, at least.
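For readers wondering what "respecting robots.txt" looks like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the bot names (`ExampleAIBot`, `OtherBot`), and the `example.org` URLs are all hypothetical and are not taken from colin's actual setup. The file is purely advisory: this is the check a polite crawler is supposed to run before fetching, and nothing forces a crawler to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely and ask
# everyone else to slow down and stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.org/wiki/Main_Page"

# A well-behaved crawler asks before every fetch.
ai_bot_allowed = parser.can_fetch("ExampleAIBot", page)
other_bot_allowed = parser.can_fetch("OtherBot", page)
private_allowed = parser.can_fetch("OtherBot", "https://example.org/private/x")

# Crawl-delay is a non-standard but widely used hint (seconds between requests).
other_bot_delay = parser.crawl_delay("OtherBot")

print(ai_bot_allowed, other_bot_allowed, private_allowed, other_bot_delay)
```

A crawler that skips this check sees no difference at all, which is why the comments above mention having to read server logs to find the ones that ignore the file.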

  • jbk@discuss.tchncs.de · 32 upvotes · 9 months ago

      I thought major LLMs ignored robots.txt

  • cheddar@programming.dev · 25 upvotes · 9 months ago

      It’s to profit from training LLMs: https://arstechnica.com/information-technology/2024/02/your-reddit-posts-may-train-ai-models-following-new-60-million-agreement/

  • Anas@lemmy.world · 12 upvotes · 9 months ago

      It’s to prevent LLMs from training themselves using reddit content, unless they pay the party that took no part in creating said content

      FTFY
