My commentary:
This is a 1,659-page PDF with one URL per line. Here is where hexbear appears, in illustrious context:
onlinecasinorank-kh.com
verkorkst-kreativ-shop.de
demellierlondon.com
www.aprokosailor.com
gabriel.by
hexbear.net
shop.simplefunforkids.com
vdownload-16.sb-cd.com
images.cnwomen.com.cn:80
ftp.pigwa.net
cdn-legacy.iclrs.org
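If you want to check the list for your own instance, a rough sketch like the one below should do it. It assumes you have already extracted the PDF to plain text with one URL per line. The function name `find_with_context` and the file name in the comment are made up for illustration, and the sample is just the excerpt above.

```python
def find_with_context(lines, needle, context=5):
    """Return (1-based line number, surrounding lines) for each line containing needle."""
    hits = []
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[start:end]))
    return hits


# Sample taken from the excerpt above. For the real list you would load the
# extracted text instead, e.g. (hypothetical file name):
#   lines = open("meta_list.txt", encoding="utf-8").read().splitlines()
sample = [
    "onlinecasinorank-kh.com",
    "verkorkst-kreativ-shop.de",
    "demellierlondon.com",
    "www.aprokosailor.com",
    "gabriel.by",
    "hexbear.net",
    "shop.simplefunforkids.com",
    "vdownload-16.sb-cd.com",
    "images.cnwomen.com.cn:80",
    "ftp.pigwa.net",
    "cdn-legacy.iclrs.org",
]

hits = find_with_context(sample, "hexbear.net")
for lineno, window in hits:
    print(f"match at line {lineno}:")
    print("\n".join(window))
```

Note that this is a plain substring match, so searching for `lemmy` would also catch any other domain containing that string.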
Original post cross-posted from: https://lemmy.ml/post/34374494
Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.
Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf
Cloudflare made a tool to charge AI bots to browse/scrape your website (not sure how well it works though). However, I don’t think HexBear is gonna be using Cloudflare any time soon. But the tech does exist.
The fact that its existence is public means that Meta has almost certainly found a way around it.
There’s also Anubis.