I was pleasantly surprised by many models of the DeepSeek family. Verbose, but in a good way? At least that was my experience. Love to see it mentioned here.
I appreciate this comment more than you will know. Thanks for sharing your thoughts.
It’s been a challenge realizing this time capsule is more than that - it’s a grassroots community and open-source project bigger than me. Adjusting the content to reflect shared interests is a concept I have grappled with these last few weeks - especially as we move past some of the exciting innovations we saw earlier this year.
I think the type of content series you mention is the next step here: practical, pragmatic insights that illustrate and enable new workflows and applications.
That being said, this type of content creation will likely take more time than the journalistic reporting I’ve been doing - but I think it’s absolutely worth the effort and the next logical evolution of whatever this forum becomes.
Thanks again for your kind words. I work 5/6 day weeks in my tech job on top of this, so burnout is a real thing. I think I’ll go for a hike this week and reevaluate how best to spread FOSAI.
If you’re reading this now and have ideas of your own - I’m all ears.
This is on the horizon - I will definitely be making a post on the workflow and process once it is figured out.
I am actively exploring this question.
So far - it’s been the best performing 7B model I’ve been able to get my hands on. Anyone running consumer hardware could get a GGUF version running on almost any dedicated GPU/CPU combo.
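If you want to see what that looks like in practice, here’s a minimal sketch using llama-cpp-python - assuming you’ve already downloaded a quantized Mistral GGUF; the filename below is just a placeholder for whichever quant you grab:

```python
# Minimal sketch: running a quantized Mistral 7B GGUF with llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path to your GGUF
    n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 for CPU-only
    n_ctx=4096,       # context window size
)

output = llm(
    "Q: What makes 7B models attractive for consumer hardware? A:",
    max_tokens=128,
    stop=["Q:"],
    echo=False,
)
print(output["choices"][0]["text"])
```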
I am a firm believer there is more performance and better quality of responses to be found in smaller parameter models. Not to mention the interesting use cases you could unlock by applying fine-tuning and ensemble approaches.
A lot of people sleep on 7B, but I think Mistral is a little different - there’s a lot of exploring to be done finding these use cases, but I think they’re out there waiting to be discovered.
I’ll definitely report back on how my first attempt at fine-tuning this goes. Until then, I suppose it would be great for roleplay or basic chat interaction. Given its low headroom, it’s much more lightweight to prototype with than the other families and model sizes.
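For anyone curious, here’s a rough sketch of the kind of LoRA fine-tune I have in mind, using Hugging Face’s peft library - the hyperparameters here are placeholders, not a tested recipe:

```python
# Rough sketch: attaching LoRA adapters to Mistral 7B with Hugging Face peft.
# pip install transformers peft accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA freezes the base weights and trains small adapter matrices instead,
# which is what makes a 7B fine-tune feasible on consumer hardware.
lora = LoraConfig(
    r=16,                                  # adapter rank -- placeholder value
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # should report well under 1% of 7B
```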
If anyone else has a particular use case for 7B models - let us know here. Curious to know what others are doing with smaller params.
What I find interesting is how useful these tools are (even with the imperfections that you mention). Imagine a world where this level of intelligence has a consistent low error rate.
Semantic computation and agentic function calling with this level of accuracy will revolutionize the world. It’s only a matter of time, adoption, and availability.
I respect your honesty.
Google has absolutely tanked for me these last few years. It revolutionized the world by revolutionizing search - but ChatGPT has now done the same, better, and in a much more interesting way.
I’ll take a 10 second prompt process over 20 minutes of hunting down (advertised) paged results any day of the week.
I have learned everything I know about AI through AI mentors.
Having the ability to ask endless amounts of seemingly stupid questions does a lot for me.
Not to mention some of the analogies and abstractions you can utilize to build your own learning process.
I’d love to see schools start embracing the power of personalized mentors for each and every student. I think some of the first universities to embrace this methodology will produce some incredible minds.
You should try fine-tuning that legalese model! I know I’d use it. Could be a great business idea or generally helpful for anyone you release it to.
I cannot overstate how nice it is having a coding assistant 24/7.
I’m curious to see how projects like ChatDev evolve over time. I think agentic tooling is going to take us to some very sci-fi looking territory.
Semantic computation is the future.
I never considered 8 - 11. Those are really interesting use cases. I’m with you on every other point. I’m particularly interested in solving the messy unstructured notes scenario. I really feel you on that one. I’ll see what I can do!
What I find particularly exciting is that we’re seeing this evolution in real-time.
Can you imagine what these models might look like in 2 years? 5? 10?
There is a remarkable future on the horizon. I hope everyone gets an equal chance to be a part of it.
I could not agree more. I really enjoy Andrej Karpathy’s model where in the future AGI does 99% of the technical work and the human in the loop does the creative and critical 1%.
Mistral seems to be the popular choice. I think it’s the most open-source friendly out of the bunch. I will keep function calling in mind as I design some of our models! Thanks for bringing that up.
I appreciate your comment! It seems like we’re going the fine-tuning route. I think it’s the best way to do it too. I’m still glad I floated around the foundation model idea. We’ll get one of our own eventually!
Welcome to the show! Enthusiast or not, you are part of !fosai@lemmy.world. Your input is valued and your curiosity is encouraged!
It seems like we’ll be starting with Mistral - which means the model will be completely open-source under the Apache 2.0 License.
All fine-tunings I release under fosai would be licensed under the same Apache 2.0 agreement, giving you and everyone else complete permissions to modify, download, distribute, and deploy this model as you see fit. It would make the model commercially viable out-of-the-box without any restrictions set by a corporation or entity.
I’m also not a copyright lawyer, so someone correct me if I’m wrong here - but if I fine-tune Mistral (which I probably will) and release the derivative under the Apache 2.0 license, you own the version you choose to download completely. You don’t need to adhere to a usage policy. You are still responsible for what you end up doing with your model (within all applicable local laws), but you also don’t have to worry about Meta (or some other entity) revoking or changing their policy/usage/terms at some point in the future. You are free to do whatever you want with an Apache licensed model.
At the end of the day, Llama 2 is owned and distributed by Meta AI, which has some of those restrictions I mentioned, even though it is somewhat open-source. Here is the license. Some notes from it that might be worth mentioning:
I wouldn’t want to risk a legal battle with a company the size of Meta, so I’d vote for the other options just to be on the safe side.
Completely reasonable, I agree.
Do you have the resources for this to be a viable option?
Where there’s a will, there’s a way. I could muster the resources for a foundation model, but it’s definitely not the most optimal option we have at our disposal. The original plan was a.) fine-tune a small series (short-term) and b.) release a foundation model (long-term). I only recently considered skipping Plan A, and I’m glad I’ve got feedback steering me away from that. I would’ve enjoyed the process nonetheless.
Are you confident that the end result will be better than Mistral? If not, why spend that much on creating something equivalent or possibly even inferior?
Of course not. I don’t do this to be the best - I do this to understand. Documenting how to build and release a foundation model from start to finish is knowledge that could be valuable to someone else, which is why I was willing to skip ahead if that was a topic others wanted to dive into. For me, it’s more about the friends we make along the way. There is grace in polishing a product and being the best, but I’d like to think there is also something special in doing something just to document it for others. There is something fulfilling about exploring a new frontier with nothing but sheer curiosity.
Then there’s also the question of how long a model is going to be relevant before some other new model with all the latest innovations is released and makes everything else look outdated… Even if you can create a model which rivals llama-2 and mistral now, are you going to create a new one to compete with llama-3 and mistral-2 when those come along?
I also don’t do this to be relevant. To be a part of this is enough for me. In my studies, I have found something bigger than me - I see myself doing this for many years, so I know I’ll be around to watch it evolve and see current technologies become irrelevant in time. If you consider existing alongside these models as ‘competing’, then yes, I suppose I would be doing that.
Sorry for the negativity but I think creating a base model sounds likely to be a massive waste of resources. If you have a lot of time and money to throw at this project, I think it would be much better spent on fine-tuning existing models.
Don’t worry, it was great feedback - exactly why I made this post! I’m glad you made all your points. It’s the same logic I had (and the same logic I was willing to set aside for others). At this point, it seems like fine-tuning is what most of you want to see. So fine-tuning it shall be!
This will be a fine-tuned model, so it may inherit some of the permissions and license agreements of its foundation model and have other implications depending on your country or local law.
You are correct: if we chose Llama 2, the fine-tuned derivative would be subject to their original license terms. Apache 2.0, however, would carry over to something like a fine-tuned version of Mistral, since its base license is also Apache 2.0.
If there is enough support, I’d be more than open to creating an entirely new foundation model family. This would be a larger undertaking than this initial fine-tuning deployment, but building a completely free FOSAI foundation family of models was the ultimate goal of this project - so if this garners enough attention, I could absolutely put energy and focus into creating another Mistral-like product instead of splashing around with fine-tuning.
Whatever would help everyone the most! I like where you’re thinking though, I’m going to update the thread to include an option to vote for a new foundation family instead. At the end of the day, it’s likely I’ll do all of the above - I’m just not sure in what order yet…
I have come to believe Moore’s law is finite, and we’re starting to see the end of its exponential curve. This leads me to believe (or want to believe) there are other looming breakthroughs for compute, optimization, and/or hardware on the horizon. That, or crazy powerful GPUs are about to become a common household investment.
I keep thinking about what George Hotz is doing in regards to this. He explained on his podcast with Lex Fridman that there is much to be explored in optimization, both with quantization of software and acceleration of hardware.
His idea of ‘commoditize the petaflop’ is really cool. I think it’s worth bringing up here, especially given that one of his biggest goals right now appears to be solving the at-home compute problem - in a way that would let you actually run something like a 180B model in-house, no problem.
George Hotz’s tinybox ($15,000):

- 738 FP16 TFLOPS
- 144 GB GPU RAM
- 5.76 TB/s RAM bandwidth
- 30 GB/s model load bandwidth (big llama loads in around 4 seconds)
- AMD EPYC CPU
- 1600W (one 120V outlet)
- Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
You can pre-order one now. You have $15k laying around, right? Lol.
It’s definitely not easy (or cheap) now, but I think it’s going to get significantly easier to build and deploy large models for all kinds of personal use cases in our near and distant futures.
If you’re serving/hosting models, it’s also worth checking out vLLM if you haven’t already: https://github.com/vllm-project/vllm
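As a quick taste of why it’s worth a look, here’s a minimal offline-inference sketch with vLLM’s Python API - the model choice is just an example:

```python
# Minimal sketch of vLLM offline inference (see the repo's README for more).
# pip install vllm
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1")  # example model choice
params = SamplingParams(temperature=0.8, max_tokens=128)

# vLLM's continuous batching schedules prompts together efficiently,
# which is where the serving throughput gains come from.
outputs = llm.generate(["The future of open-source AI is"], params)
for out in outputs:
    print(out.outputs[0].text)
```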
Loved reading everyone’s comments on this one. If you’re here and reading this post now, check out this related thread - you might be interested!
What sort of tokens per second are you seeing with your hardware? Mind sharing some notes on what you’re running there? Super curious!