

I would be 0% surprised to learn that the modelfarmers “iterated” to “hmm, people are doing a lot of logic tests, let’s handle those better” and that that’s what got us here
(I have no evidence for this, but to me it seems a completely obvious/evident way for them to try to keep the party going)
(excuse possible incoherence; it’s 01:20 and I’m entirely in filmbrain (I’ll revise/edit/answer questions in the morning))
re (1): while that is a possibility, keep in mind that all this shit also operates/exists in a metrics-as-targets-obsessed space. they might not present the end user with a hit%, but the number exists, and I have no reason to believe it isn’t being tracked. combine that with social effects (public humiliation of their Shiny New Model, monitoring usage in public, etc etc) - that’s where my thesis of directed prompt-improvement is grounded
re (2): while they could do something like that (synthetic derivation, etc), I dunno if that’d be happening for this. this is outright a guess on my part, a reach based on character, based on what I’ve seen from some in the field, but just… I don’t think they’d try that hard. I think they might try some limited form of it, but only as much as can be backed up in relatively little time and thought. “only as far as you can stretch 3 sprints” type long
(the other big input to my guesstimate re (2) is an awareness of the fucked-up interplay between incentives, glorycoders, and startup culture)