• Pofski@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      7 hours ago

      Ask it to generate a room full of clocks with all of them having the hands at different times. You’ll see that all (or almost) all the clocks will say it is 10:10.

    • Lvxferre [he/him]
      link
      fedilink
      English
      arrow-up
      32
      ·
      20 hours ago

      It gets even worse, but I’ll need to translate this one.

      • [Input 1] Generate a picture containing a copo completely full of wine. The copo must be completely full, with no space to add more wine.
      • [Output 1] Sure! (Gemini provides a picture containing a taça [stemmed glass] only partially full of wine.)
      • [Input 2] The picture provided does not fulfill the request. Generate a picture of a copo (not a taça) completely full of wine, with no available space for more wine.
      • [Output 2] Sure! (Gemini provides yet another half-full taça)

      For context, Portuguese uses different words for what English calls a drinking glass:

      • copo ['kɔ.po]~['kɔ.pu] - non-stemmed drinking glass. The one you likely use everyday.
      • taça ['tä.sɐ] - stemmed drinking glass, like the ones you’d use with wine.

      Both requests demand a full copo but Gemini is rather insistent on outputting half-full taças.

      The reason for that is as @will_steal_your_username@lemmy.blahaj.zone pointed out: just like there’s practically no training data containing full glasses, there’s none for non-stemmed glasses with wine.

      • brucethemoose@lemmy.world
        link
        fedilink
        English
        arrow-up
        1
        arrow-down
        1
        ·
        edit-2
        1 hour ago

        This is a misconception. Sort of.

        I think the problem is misguided attention. The word “glass of wine” and all the previous context is so strong that it “blows out” the “full glass of wine” as the actual intent. Also, LLMs are still pretty crap at multi turn multimedia understanding. They work are especially prone to repeating previous conversation.

        It should be better if you word it like “an overflowing glass with wine splashing out.” And clear the history.

        I hate to ramble, but this is what I hate most about the way big corpos present “AI.” They are narrow tools the user needs to learn how to operate, like photoshop or something, not magic genie lamps like they are trying to sell.

        • Draconic NEO@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          12
          ·
          17 hours ago

          Yup Horde still suffers from this issue, though it seems to have more promise than the others considering the second glass is way closer to being full than anything I’ve sen from openAI or Gemini demonstrations. Maybe there’s hope to fix this issue here.

          I only tried one model so if you know of a different horde model which works better for this and actually gives a full glass please reply below letting me know, maybe even ask the horde bot to generate it right here.

          • Lvxferre [he/him]
            link
            fedilink
            English
            arrow-up
            4
            ·
            17 hours ago

            I have considerably less experience with image generation than text generators, but I kind of expect the issue to be only truly fixed if people train the model with a bunch of pictures of glasses full of wine.

            I’ll run a test using a local tree, that is supposed to look like this:

            @aihorde@lemmy.dbzer0.com draw for me a picture of three Araucaria angustifolia trees style:flux

              • Lvxferre [he/him]
                link
                fedilink
                English
                arrow-up
                8
                ·
                edit-2
                17 hours ago

                Bingo - this tree is non-existent outside my homeland, so people barely speak about it in English - and odds are that the model was trained with almost no pictures of it. However one of the names you see for it in English is Paraná pine, so it’s modelling it after images of European pines - because odds are those are plenty in its training set.

      • Focal@pawb.social
        link
        fedilink
        English
        arrow-up
        1
        ·
        16 hours ago

        Wait, this seems incredible. Do you have to be in the same instance or does it work anywhere? @aihorde@lemmy.dbzer0.com Can you draw a smart phone without a rotary phone dial?

      • Lvxferre [he/him]
        link
        fedilink
        English
        arrow-up
        3
        ·
        20 hours ago

        It does for a while already. Frankly, it’s the only reason why I’d use Gemini on first place (DDG version of GPT 4-o mini doesn’t have a built-in image generator).

      • Lvxferre [he/him]
        link
        fedilink
        English
        arrow-up
        13
        ·
        20 hours ago

        It is not a completely full glass.

        it’s not supposed to be filled all the way

        What I requested is not what you’re “supposed” to do, indeed. You aren’t supposed to drink wine from glasses that are completely full. Except when really drunk. But then might as well drink straight from the bottle.

        …fuck, I played myself now. I really want some booze.

        • UnhingedFridge@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          3 hours ago

          What you’re really supposed to do is - open up the box, slap the bag, and drink directly from your adult Capri Sun.

      • NOT_RICK@lemmy.world
        link
        fedilink
        English
        arrow-up
        11
        ·
        21 hours ago

        Probably why it won’t put more in it. How much training data of wine in a glass will have it filled to the brim? Probably next to none.