Extending Context Window of Large Language Models via Positional Interpolation

nsa@kbin.social · 1 year ago

miro@kbin.social · 1 year ago

Is this similar to what MPT did to extend its context length?

Blaed@lemmy.world · 1 year ago

I believe it’s a different technique (at least as far as I understand the topic).

The original author of this new method (SuperHOT by kaiokendev) shares what he has learned about this method here:

nsa@kbin.social · 1 year ago

hmmm… not sure which model you’re referring to. do you have a paper link?

SSamDav@lemmy.pt · 1 year ago

One cool thing about this work is that there was a concurrent discussion on Twitter about the proposed method, from different authors.

nsa@kbin.social · 1 year ago

do you have a link?