Never talk morals with libs, because this is the kind of thing they spend their time questioning: whether or not it’s okay to use AI to detect CP. Real thinker there.

https://lemm.ee/post/12171882

  • LanyrdSkynrd [comrade/them, any]@hexbear.net
    ↑24 · 1 year ago

    Google already has an ML model that detects this stuff, and they use it to scan everyone’s private Google photos.

    https://www.eff.org/deeplinks/2022/08/googles-scans-private-photos-led-false-accusations-child-abuse

    They must have classified a bunch of child porn and used it to train the model, and I have no problem with that: it’s not generating new CP or abusing anyone. I’m more uncomfortable with them running all our photos through an AI model, sending the results to the US government, and not telling the public.

    • WayeeCool [comrade/them]@hexbear.net
      ↑7 · 1 year ago (edited)

      They just run it on photos stored on their servers. Microsoft, Apple, Amazon, and Dropbox do the same. There are also employees in their security departments with the fkd-up job of verifying anything flagged and then alerting law enforcement.

      Everyone always forgets that “cloud storage” means your files are stored on someone else’s machine. I don’t think anyone, even soulless companies like Google or Microsoft, wants to be hosting CSAM. So it’s understandable that they scan the contents of Google Photos or Microsoft OneDrive; even if they had no legal obligation, there would be a moral one.

  • kristina [she/her]@hexbear.net
    ↑14 · 1 year ago (edited)

    when they talk about this, there are identifiers that detect it and remove it automatically, so you aren’t actually storing it in any way. this is standard practice for any major website.

    y’all really need to stop reading ‘AI’ and letting your brains shut off. not really referring to this case, just in general

    • LeylaLove [she/her, love/loves]@hexbear.net (OP)
      ↑13 · 1 year ago

      Hash lists exist, yeah. But American law actually requires website hosts to keep the CP as evidence instead of deleting it. That’s why DivideBy0’s tool isn’t supposed to be used on American Lemmy instances. Like, if you upload a flagged image to Google Drive, Google is supposed to flag it, save it, and call the cops.

      • kristina [she/her]@hexbear.net
        ↑13 · 1 year ago (edited)

        i get that it’s supposed to be for evidence, but it’s really fucked up to put small-time server owners through that shit. terrible law. there’s got to be some other way to handle that

        • LeylaLove [she/her, love/loves]@hexbear.net (OP)
          ↑12 · 1 year ago

          I agree. The DivideBy0 tool should be standard on here: instantly deleting it when it’s uploaded and saving the poster’s IP is the best solution. I was more just explaining that anybody in a position to make a tool like that wouldn’t have to go out of their way to get source material because, legally speaking, they should already have some. There are site hosts that ignore this law and just delete and ban instantly (as they should), but I think it’s important to explain why these tech companies just happen to have large repositories of CP to train AI on.

    • LeylaLove [she/her, love/loves]@hexbear.net (OP)
      ↑6 · 1 year ago

      It is a great thing, hence why I’m posting this. Why the fuck is anybody thinking about the moral implications of using AI to handle CP? What moral implications? What’s wrong with it?

  • drhead [he/him]@hexbear.net
    ↑12 · 1 year ago

    The bottom comment is technically correct: you can bypass any dataset-related ethical concerns. You could very easily build a classifier model that just tries to estimate age, combine it with one of the many existing NSFW classifiers, and flag any image that scores high on both.
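
    A minimal sketch of that flagging rule, assuming two separately trained models that each already return a probability between 0 and 1. The `Scores` type, its field names, and the 0.8 thresholds are hypothetical placeholders, not values from any real system; in practice a flag would send the image to human review rather than trigger any automatic action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    """Outputs of two hypothetical, independently trained classifiers."""
    minor_prob: float  # age model: probability that a minor is depicted
    nsfw_prob: float   # NSFW model: probability that the image is explicit


def should_flag(s: Scores,
                minor_threshold: float = 0.8,
                nsfw_threshold: float = 0.8) -> bool:
    """Flag for human review only when BOTH scores clear their thresholds."""
    return s.minor_prob >= minor_threshold and s.nsfw_prob >= nsfw_threshold


print(should_flag(Scores(minor_prob=0.9, nsfw_prob=0.95)))  # True
print(should_flag(Scores(minor_prob=0.9, nsfw_prob=0.10)))  # False
```

    Requiring both scores to clear their thresholds (a logical AND) flags only a subset of what either model would flag alone, trading some recall for fewer false flags.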

    But there are already organizations with large databases of CSAM that are suitable for this purpose. Using them to train a model would not create any additional demand, would not result in any more harm, and would likely produce a better model. Keep in mind that these same organizations already use these images in the form of a perceptual hash database, which social media and adult content sites can check uploaded images against to see whether they are known images, without the images themselves ever being shared. This is just a different use of the data for a similar purpose.
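
    To illustrate the idea of checking uploads against known hashes without sharing the images, here is a toy sketch using a simple 64-bit average hash (aHash) and a Hamming-distance lookup. This is a stand-in chosen for illustration: production hash lists rely on different, more robust algorithms (Microsoft’s PhotoDNA is a well-known example), and the `max_distance=5` cutoff is a hypothetical value. The sketch assumes the image has already been shrunk to an 8x8 grayscale thumbnail.

```python
from typing import Iterable, Optional, Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """64-bit average hash of an 8x8 grayscale thumbnail.

    Each bit is 1 if that pixel is brighter than the thumbnail's mean.
    Downscaling and grayscale conversion of the real image are omitted.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / 64
    h = 0
    for p in flat:
        h = (h << 1) | (1 if p > mean else 0)
    return h


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def match_known(h: int, known: Iterable[int],
                max_distance: int = 5) -> Optional[int]:
    """Return the first known hash within max_distance bits of h, else None."""
    for k in known:
        if hamming(h, k) <= max_distance:
            return k
    return None


# A left-to-right, top-to-bottom gradient and a uniformly brightened copy.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[p + 10 for p in row] for row in original]
known_hashes = [average_hash(original)]  # the only thing a site would hold
print(match_known(average_hash(brighter), known_hashes) is not None)  # True
```

    A site only needs the list of hashes, never the images, to run `match_known`. A uniform brightness change leaves this hash unchanged, while an unrelated image typically lands many bits away.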

    The only actual problem I can think of would be if people trusted its outputs blindly instead of manually reviewing images, and started reporting everything that scores high directly to the police. But that is more a problem with inappropriate use of the model than with its training or existence. It’s very safe to respond like this to images flagged by NCMEC’s phash database, because those are known images, and if any false positives happen they have the originals to compare against, so they can be cleared up. Even a 99% accurate classifier model, though, will be orders of magnitude more prone to false positives than the phash database, and it can be very difficult to find out why it generates false positives and correct the problem, because… well, that involves extensive auditing of the dataset. I don’t think this is reason enough not to make such a model, but it is a reason to tread carefully when deciding how to apply it.
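
    The base-rate arithmetic behind that last point can be sketched in a few lines. All the numbers below (one million uploads, a 1-in-10,000 positive rate, and 99% recall with a 1% false-positive rate standing in for “99% accurate”) are hypothetical, chosen only to show the shape of the problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanOutcome:
    true_positives: float   # expected correct flags
    false_positives: float  # expected wrong flags

    @property
    def precision(self) -> float:
        """Share of flagged images that are actually positive."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0


def expected_outcome(n_images: int, prevalence: float,
                     recall: float, false_positive_rate: float) -> ScanOutcome:
    """Expected flag counts when a classifier scans n_images uploads."""
    positives = n_images * prevalence
    negatives = n_images - positives
    return ScanOutcome(true_positives=positives * recall,
                       false_positives=negatives * false_positive_rate)


# Hypothetical inputs: 1M uploads, 1 in 10,000 positive,
# 99% recall and a 1% false-positive rate.
outcome = expected_outcome(1_000_000, 1e-4, 0.99, 0.01)
print(f"TP={outcome.true_positives:.0f} FP={outcome.false_positives:.0f} "
      f"precision={outcome.precision:.1%}")
# → TP=99 FP=9999 precision=1.0%
```

    Under these assumptions roughly a hundred images are flagged wrongly for every correct flag, which is why classifier hits need manual review rather than direct reporting.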

  • Frank [he/him, he/him]@hexbear.net
    ↑4 · 1 year ago

    Usually when people do this it’s with, like, Mengele’s or Unit 731’s “research”, except their research was almost entirely insane sadism with at most a veneer of science. Whereas this is really cut-and-dried: ML models can be trained to recognize patterns, you’ve got ready access to a training dataset (or the FBI does, or whatever), and you can train your model to flag and remove CSAM.

    • LeylaLove [she/her, love/loves]@hexbear.net (OP)
      ↑7 · 1 year ago

      Yeah. I’m not justifying people who defend Mengele or Unit 731, but there’s at least enough there that you can understand how people came to that conclusion. Plus, the West pardoned Unit 731 and similar German scientists for exactly that reason: “let’s get the data”. While they found out the data wasn’t useful, the West also had to stand by its decisions so as not to seem absolutely insane. There has been a propaganda push, along with the West’s poorly developed utilitarian view, that makes defending Unit 731 understandable. Defending Unit 731 really isn’t that much of a jump from defending capitalists who subject workers to many of the same horrific deaths through workplace austerity. If it’s okay to cook people to death because you wanted to make a few bucks, what’s the issue with freezing someone to death? Not okay, mind you, but I can clearly see the route the brainworms took.

      This though? Like, what do people expect? There is no good reason not to want AI in CP enforcement. Just because “there’s a database of CP”? What do people want LE to do with evidence of abuse? I WISH the videos of my abuse were in some Fed’s database. Instead my abuser walked off, and now I just get the occasional creepy message talking about how hot my abuse was. Tracking pedophiles is pretty much the only consistently good thing the feds do. Why are people criticizing THIS of all things? There are no grounds to make this a real moral debate.

  • WhatDoYouMeanPodcast [comrade/them]@hexbear.net
    ↑3 · 1 year ago

    The only moral quandary that comes to mind is that if you give an AI a bunch of CSAM and it leaks, then 1) someone has it and 2) AI gets better at generating it. Who’s training the AI? Who verifies it? What are the security protocols?

    • drhead [he/him]@hexbear.net
      ↑7 · 1 year ago

      This would be a classifier model, which is incapable of making images. Most classifier models output only a mapping of one floating-point value per class they were trained on, representing the probability that the image belongs to that class. It’d probably be trained by an organization like NCMEC, possibly working with a very well-trusted AI firm. For verifying it, you usually hold back a modest, representative sample of the database and don’t train on it, then use that sample to measure how accurate the model is and to decide what score threshold is appropriate for flagging.
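
      A minimal sketch of both ideas, assuming a hypothetical two-class model that emits raw logits: `softmax_scores` turns them into one probability per class, and `pick_threshold` uses a held-out sample the model never trained on to find the lowest flagging threshold that keeps the false-positive rate within a chosen budget. The label names, the sample data, and the budget-based selection rule are illustrative assumptions; an organization doing this for real could reasonably choose thresholds another way.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax_scores(logits: Sequence[float],
                   labels: Sequence[str]) -> Dict[str, float]:
    """Convert raw model outputs into one probability per class (sums to 1)."""
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def pick_threshold(held_out: List[Tuple[float, bool]],
                   max_fpr: float) -> Tuple[float, float, float]:
    """Lowest threshold whose held-out false-positive rate is <= max_fpr.

    held_out holds (score, is_positive) pairs the model never trained on.
    Returns (threshold, false_positive_rate, recall).
    """
    negatives = [s for s, pos in held_out if not pos]
    positives = [s for s, pos in held_out if pos]
    if not negatives or not positives:
        raise ValueError("held-out set needs positive and negative examples")
    # Raising the threshold can only lower the false-positive rate, so the
    # first candidate that meets the budget is the lowest acceptable one.
    for t in sorted({s for s, _ in held_out}):
        fpr = sum(s >= t for s in negatives) / len(negatives)
        if fpr <= max_fpr:
            recall = sum(s >= t for s in positives) / len(positives)
            return t, fpr, recall
    # No observed score meets the budget: flag nothing.
    return math.inf, 0.0, 0.0


print(softmax_scores([2.0, 0.0], ["flag", "benign"]))
held_out = [(0.1, False), (0.2, False), (0.4, False), (0.7, False),
            (0.6, True), (0.8, True), (0.9, True)]
print(pick_threshold(held_out, max_fpr=0.25))  # (0.6, 0.25, 1.0)
print(pick_threshold(held_out, max_fpr=0.0))
```

      With a stricter budget the threshold rises and recall drops; the held-out sample is what lets you measure that trade-off before deploying anything.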