• acockworkorange
    arrow-up
    6
    ·
    2 days ago

    Kalman filters can be used to filter noise from multivariate data, and they’re just a short sequence of matrix operations, so there’s no risk of hallucinations.
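To make the "just matrix operations" point concrete, here is a minimal sketch of a one-dimensional Kalman filter, where every matrix shrinks to a single number. It is only an illustration: the function name `kalman_1d`, the noise variances, and the synthetic constant signal are all assumptions made for this example, not anything taken from the EHT pipeline. The multivariate version replaces these scalars with a state vector and covariance matrices but follows the same predict/update steps.

```python
import random

def kalman_1d(measurements, process_var=1e-4, meas_var=0.25):
    """Scalar Kalman filter for a signal assumed to be roughly constant.

    process_var and meas_var are assumed noise variances (hypothetical values).
    """
    x = measurements[0]  # initial state estimate
    p = 1.0              # initial variance of that estimate
    estimates = []
    for z in measurements:
        # Predict: the state model is "stays the same", so only uncertainty grows.
        p = p + process_var
        # Update: the Kalman gain weighs the measurement against the prediction.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Synthetic data: a constant value buried in Gaussian noise (illustrative only).
random.seed(0)
true_value = 5.0
noisy = [true_value + random.gauss(0.0, 0.5) for _ in range(200)]
filtered = kalman_1d(noisy)

def mean_sq_error(values):
    """Mean squared distance of a sequence from the known true value."""
    return sum((v - true_value) ** 2 for v in values) / len(values)
```

Every step is a fixed arithmetic rule, so the output can only be a weighted blend of the measurements it was given. Comparing `mean_sq_error(filtered[100:])` with `mean_sq_error(noisy[100:])` shows how much noise is removed once the filter has settled.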

    Neural networks are notoriously bad at dealing with small data sets. They “overfit” the data and produce invalid extrapolations when new data falls even slightly outside the range they were trained on. The way to make them useful is to have huge amounts of data, train the model on part of it, and hold the rest out for validation.
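The overfitting problem can be sketched without a neural network at all. The example below uses a deliberately over-flexible polynomial as a stand-in for an over-parameterized model: it passes through all eight noisy training points exactly, then gets evaluated just outside the training range. The underlying line, the noise level, and the helper names `lagrange_predict` and `linear_fit` are assumptions chosen for illustration.

```python
import random

def lagrange_predict(xs, ys, x):
    """Evaluate the degree-(n-1) polynomial that passes through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def truth(x):
    """Assumed ground truth for this toy example: a straight line."""
    return 2.0 * x + 1.0

# Small, noisy training set on x = 0..7 (illustrative values only).
random.seed(1)
train_x = [float(i) for i in range(8)]
train_y = [truth(x) + random.gauss(0.0, 0.5) for x in train_x]

slope, intercept = linear_fit(train_x, train_y)

# The flexible model reproduces the training data, noise included.
train_error = max(abs(lagrange_predict(train_x, train_y, x) - y)
                  for x, y in zip(train_x, train_y))

# Held-out point slightly outside the training range.
x_new = 9.0
flexible_error = abs(lagrange_predict(train_x, train_y, x_new) - truth(x_new))
simple_error = abs(slope * x_new + intercept - truth(x_new))
```

A near-zero `train_error` next to a large `flexible_error` is the signature of overfitting, and it only shows up because one point was held out of training. That is the job the validation set does.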

    • zr0@lemmy.dbzer0.com
      English
      arrow-up
      2
      ·
      1 day ago

      Fully agree, overfitting might be an issue. We don’t know how much training data was available, just that it was more than the first assumption suggests. It still might not be enough.

      • acockworkorange
        arrow-up
        1
        ·
        1 day ago

        There aren’t a lot of high-resolution images of black holes. I know of one. So not a lot.

        • zr0@lemmy.dbzer0.com
          English
          arrow-up
          1
          ·
          1 day ago

          Oh, they don’t train on image data. They train on raw sensor data. And as mentioned earlier, they used all the data that was too noisy to produce images from.

          • acockworkorange
            arrow-up
            1
            ·
            1 day ago

            Of that one mission, right? Until you have thousands of these data sets, it’s the wrong approach.

            • zr0@lemmy.dbzer0.com
              English
              arrow-up
              1
              ·
              1 day ago

              9 petabytes of raw data were produced with the EHT in 2017 and 2018. After filtering, only about 100 terabytes were left. After final calibrations, about 150 gigabytes were then used to generate the images.

              So clearly a lot of data was thrown away, as it was not usable for generating images. However, a machine learning model might be able to use this data.
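For a sense of scale, the figures quoted above work out as follows. This is a quick sketch that assumes decimal (SI) prefixes, which the comment does not specify; the variable names are made up for the example.

```python
# Data volumes quoted in the comment, converted to bytes.
# Assumption: decimal prefixes (1 PB = 10**15 bytes, and so on).
PB, TB, GB = 10**15, 10**12, 10**9

raw_bytes = 9 * PB           # raw data from 2017 and 2018
filtered_bytes = 100 * TB    # left after filtering
imaging_bytes = 150 * GB     # used to generate the images

kept_after_filtering = filtered_bytes / raw_bytes  # roughly 1.1% of the raw data
kept_for_imaging = imaging_bytes / raw_bytes       # about 0.0017% of the raw data
discarded_fraction = 1.0 - kept_for_imaging        # everything not used for imaging

print(f"kept after filtering: {kept_after_filtering:.2%}")
print(f"kept for imaging:     {kept_for_imaging:.4%}")
print(f"not used for imaging: {discarded_fraction:.4%}")
```

Only about one byte in 60,000 ended up in the images.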