Absolutely humongous model: a mixture of 256 experts, with 8 activated for each token.
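For anyone wondering what "256 experts, 8 activated" means in practice, here is a rough sketch of generic top-k mixture-of-experts routing. It assumes plain softmax gating over the 8 highest router scores, and the toy "experts" are hypothetical scalar functions. DeepSeek's real router differs in its details (it also has a shared expert, for one), so treat this as an illustration of the idea, not their implementation.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts per MoE layer, per the post
TOP_K = 8          # experts activated for each token, per the post


def route_token(logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights.

    Assumption: generic softmax top-k gating, not DeepSeek's actual router.
    """
    # Indices of the k largest router scores for this token.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over only the selected scores so the k weights sum to 1.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, logits, k=TOP_K):
    """Weighted sum of the selected experts' outputs; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_token(logits, k))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy experts: each one scales its scalar input differently.
    experts = [lambda x, s=s: s * x for s in range(NUM_EXPERTS)]
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print(len(chosen), "of", NUM_EXPERTS, "experts active")
    print("active fraction:", TOP_K / NUM_EXPERTS)
    print("output:", moe_forward(1.0, experts, logits))
```

The point is that only 8/256 = 3.125% of the routed experts do any work for a given token, which is how a model this big stays affordable to run per token even though every expert still has to sit in memory.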
Aider leaderboard:
The only model above 🐋 v3 here is OpenAI o1. DeepSeek is known for making amazing models, and Aider rotates its benchmark over time, so it is unlikely that this is a train-on-benchmark situation.
More benchmarks are posted on Reddit.
For the user whose VRAM knob goes to 11
Someone managed to run it on a cluster of Mac Minis lol https://blog.exolabs.net/day-2/