Much more detailed than that. In the video there was a three-piece band playing on screen for a few seconds. The user prompt asked: “Tell me where I can buy the shirt the keyboardist is wearing at timestamp 32 seconds.” The bot found the website of the vendor selling the shirt.
Okay, that’s pretty neat, but at the same time it’s basically the same as loading a still image into a current AI image-matching suite and having it identify a keyboard, then a shirt near the keyboard, then reverse-image-searching that t-shirt. It’s super cool to be able to do, but kinda standard at this point.
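To make that concrete, here's roughly what that still-image pipeline could look like. This is a minimal sketch, not whatever the demo actually runs: it assumes OpenCV plus a pretrained torchvision detector, the video file name is hypothetical, and the reverse-image-search step is stubbed out since there's no standard open API for that part.

```python
# Sketch: grab the frame at a timestamp, detect people, crop them out.
# The "find the vendor" reverse image search is left as a stub.
import cv2
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FasterRCNN_ResNet50_FPN_Weights

def frame_at(video_path: str, seconds: float):
    """Grab the frame nearest to the given timestamp."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, seconds * 1000)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise ValueError(f"No frame at {seconds}s")
    return frame

def detect_people(frame):
    """Run a pretrained COCO detector; return confident 'person' boxes."""
    weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    model = fasterrcnn_resnet50_fpn(weights=weights).eval()
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]
    names = [weights.meta["categories"][i] for i in out["labels"].tolist()]
    return [box for box, name, score in zip(out["boxes"], names, out["scores"])
            if name == "person" and score > 0.8]

frame = frame_at("band_clip.mp4", 32.0)  # hypothetical file name
for x1, y1, x2, y2 in (b.int().tolist() for b in detect_people(frame)):
    crop = frame[y1:y2, x1:x2]
    # Picking the keyboardist out of the band and reverse-image-searching
    # the shirt crop is the part the bot automates; no standard API here.
    cv2.imwrite(f"person_{x1}_{y1}.png", crop)
```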
I guess the interactivity, being able to feed in a URL on the fly, is the value add. I still would have liked my “generate subtitles then search them” imaginary bot more, though.
Sounds like it searched the subtitles, found the timestamps, and returned the relevant text. Useful, but ultimately a pretty simple bot.
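Simple enough that it's easy to sketch, even. A minimal version assuming the subtitles are a standard SRT file, using only the Python standard library; the file name and query are hypothetical:

```python
# Sketch: parse SRT cues with a regex, return the ones matching a query.
import re

SRT_CUE = re.compile(
    r"(\d+:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d+:\d{2}:\d{2}[,.]\d{3})\s*\n"
    r"(.*?)(?:\n\n|\Z)",
    re.S,
)

def search_subtitles(srt_text: str, query: str):
    """Yield (start, end, text) for every cue whose text contains the query."""
    for start, end, text in SRT_CUE.findall(srt_text):
        flat = " ".join(text.split())  # collapse multi-line cues
        if query.lower() in flat.lower():
            yield start, end, flat

if __name__ == "__main__":
    with open("video.srt", encoding="utf-8") as f:  # hypothetical file
        for start, end, line in search_subtitles(f.read(), "keyboard"):
            print(f"{start} -> {end}: {line}")
```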
It would be much more impressive if it “watched” the video for the first time, formed its own subtitles, then pulled the data out. That would be a feat.
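For anyone who wants to play with that idea, a rough sketch: this assumes openai-whisper purely as one off-the-shelf speech-to-text option (there's no telling what the actual bot uses), with a hypothetical file name. Each transcribed segment carries its own timestamps, so "pulling the data out" reduces to the same text search as the SRT sketch above.

```python
# Sketch: transcribe the video cold, then search the timestamped segments.
import whisper

model = whisper.load_model("base")
result = model.transcribe("band_clip.mp4")  # hypothetical file name

# Whisper segments each carry start/end times, so searching them works
# just like searching pre-made subtitles.
for seg in result["segments"]:
    if "keyboard" in seg["text"].lower():
        print(f"{seg['start']:.1f}s -> {seg['end']:.1f}s: {seg['text'].strip()}")
```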