Lemmy and Mastodon are public sites, and I guess their code is open source (I’m not a programmer/coder). Given that, can they really keep AIs from collecting or tracking their data every time those AIs crawl the Internet?

  • Lvxferre
    9 months ago

    Those “@-@ tailed jackrabbits” in your link made me laugh. Emoticons in species names? Why not?

    I think we could minimise the loss of integrity if the data were “contained”, so that your typical user wouldn’t see it but bots would still retrieve it for model training.

    And we don’t need to restrict ourselves to using LLM-sourced data for that. Model collapse boils down to the amount of garbage piling up over time; if we use plain garbage, we can make it even worse, as long as the garbage isn’t detected as such.
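    A toy way to see the "garbage piling up" mechanism, assuming a single Gaussian stands in for a model (a common textbook illustration of model collapse, not how LLMs are actually trained): each generation fits a mean and standard deviation to its training data, and the next generation trains on samples drawn from that fit. The `simulate` function and its parameters are hypothetical names for this sketch; `fresh_fraction` is the share of each generation's data that still comes from the original "human" distribution N(0, 1).

```python
import random
import statistics


def simulate(generations: int, n: int, fresh_fraction: float, seed: int = 0) -> float:
    """Toy model-collapse loop (hypothetical illustration, not a real LLM).

    Each generation "trains" a model by fitting a Gaussian (mean and
    population std) to its training set. The next training set is made of
    samples from that fitted model, topped up with a share of fresh
    "human" data drawn from the true distribution N(0, 1).
    Returns the spread of the final training set, i.e. what the next
    model would learn.
    """
    rng = random.Random(seed)
    n_fresh = round(n * fresh_fraction)
    # Generation 0 trains on purely human data.
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(generations):
        # "Train": fit the Gaussian to the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)
        # Model-generated (synthetic) samples from the fitted Gaussian.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_fresh)]
        # Fresh samples from the original distribution, if any.
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        data = synthetic + fresh
    return statistics.pstdev(data)


if __name__ == "__main__":
    # Every generation trains only on the previous model's output.
    print(simulate(1000, 50, fresh_fraction=0.0))
    # Half of every generation's data is still original.
    print(simulate(1000, 50, fresh_fraction=0.5))
```

    With no fresh data, each refit loses a little of the spread on average and those losses compound, so the first number should come out tiny; with half the data still original, the second should stay close to the original spread of 1. The sketch only models self-generated output, not the deliberately injected "plain garbage" idea, which would need its own assumptions about what the garbage looks like.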