LLaMA Now Goes Faster on CPUs

justine.lol

cross-posted to:
localllama@sh.itjust.works
hackernews@lemmy.smeargle.fans
aicompanions@lemmy.world

agilob@programming.dev to Performance@programming.dev · English · 8 months ago

I wrote 84 new matmul kernels to improve llamafile CPU performance.

My kernels go 2x faster than MKL for matrices that fit in L2 cache, which makes them a work in progress, since the speedup works best for prompts having fewer than 1,000 tokens.
