We are sitting ducks. The tech giants will likely try to harvest all fedi content so their AI machines can exploit the data. How can we get ahead of this?
Suppose the author of a post or comment has a tickbox that says: [ ] Allow AI bots to grab your data? (default: NO)
This would use the power of defaults for a good cause. Since almost no one will tick the YES box, every participant would individually carry an explicit indicator of their non-consent. So when an AI bot slurps up every post, the freeloading, exploitative corps have less of a leg to stand on, because each post has two objectors (the author and the admin).
An administrator could additionally state: “Scraping anything from this site for LLMs is prohibited.” In that case, breaches would be legally actionable both as an abuse of resources (trespass) and from a copyright standpoint.
Do AI scrapers actually pay attention to flags like this?
Not AFAIK. We would be inventing it. From there, it would be a tool for legal action and leverage.