• 159 Posts
  • 95 Comments
Joined 1 year ago
cake
Cake day: June 10th, 2023

help-circle










  • I appreciate this comment more than you will know. Thanks for sharing your thoughts.

    It’s been a challenge realizing this time capsule is more than that - but a grassroots community and open-source project bigger than me. Adjusting the content to reflect shared interests has been a concept I have grappled with these last few weeks - especially as we exit some of the exciting innovations we saw earlier this year.

    I think the type of content series you mention is the next step here - that being practical and pragmatic insights that illustrate / enable new workflows and applications.

    That being said, this type of content creation will likely take more time than the journalistic reporting I’ve been doing - but I think it’s absolutely worth the effort and the next logical evolution of whatever this forum becomes.

    Thanks again for your kind words. I work 5/6 day weeks in my tech job on top of this, so burnout is a real thing. I think I’ll go for a hike this week and reevaluate how to best proliferate and spread FOSAI.

    If you’re reading this now and have ideas of your own - I’m all ears.











  • I am actively exploring this question.

    So far - it’s been the best performing 7B model I’ve been able to get my hands on. Anyone running consumer hardware could get a GGUF version running on almost any dedicated GPU/CPU combo.

    I am a firm believer there is more performance and better quality of responses to be found in smaller parameter models. Not too mention interesting use cases you could apply fine-tuning an ensemble approach.

    A lot of people sleep on 7B, but I think Mistral is a little different - there’s a lot of exploring to be had finding these use cases but I think they’re out there waiting to be discovered.

    I’ll definitely report back on how the first attempt at fine-tuning this myself goes. Until then, I suppose it would be great for any roleplay or basic chat interaction. Given it’s low headroom - it’s much more lightweight to prototype with outside of the other families and model sizes.

    If anyone else has a particular use case for 7B models - let us know here. Curious to know what others are doing with smaller params.








  • I have learned everything I have about AI through AI mentors.

    Having the ability to ask endless amounts of seemingly stupid questions does a lot for me.

    Not to mention some of the analogies and abstractions you can utilize to build your own learning process.

    I’d love to see schools start embracing the power of personalized mentors for each and every student. I think some of the first universities to embrace this methodology will produce some incredible minds.

    You should try fine-tuning that legalese model! I know I’d use it. Could be a great business idea or generally helpful for anyone you release it to.









  • It seems like we’ll be starting with Mistral - which means the model will be completely open-source under the Apache 2.0 License.

    All fine-tunings I release under fosai would be licensed under the same Apache 2.0 agreement, giving you and everyone else complete permissions to modify, download, distribute, and deploy this model as you see fit. It would make the model commercially viable out-of-the-box without any restrictions set by a corporation or entity.

    I’m also not a copyright lawyer, so someone correct me If I’m wrong here but if I fine-tune Mistral (which I probably will) and also release the derivative under the Apache 2.0 license - you own the version you choose to download completely. You don’t need to adhere to a usage policy. You are still responsible for what you end up doing with your model (within all local applicable laws), but you also don’t have to worry about Meta (or some other entity) revoking or changing their policy/usage/terms at some point in the future. You are free to do whatever you want with an Apache licensed model.

    At the end of the day, Llama 2 is owned and distributed by Meta AI, which has some of those restrictions I mentioned, even though it is somewhat open-source. Here is the license. Some notes from it that might be worth mentioning:

    • You need to credit Meta whenever you share Llama 2 by including a specific notice.
    • You have to follow all laws and regulations when using Llama 2 and also adhere to Meta’s usage policy.
    • You can’t use Llama 2 to make or improve other similar software (large language models), except Llama 2 itself or things derived from it.
    • If your company or its affiliates have more than 700 million users a month, you can’t just use this agreement. You have to ask Meta for special permission.

  • I wouldn’t want risk a legal battle with a company the size of Meta, so I’d vote for the other options just to be one the safe side.

    Completely reasonable, I agree.

    Do you have the resources for this to be a viable option?

    Where there’s a will, there’s a way. I could muster the resources for a foundation model, but it’s definitely not the most optimal option we have at our disposal. The original plan was a.) fine-tune a small series (short-term) b.) release a foundation model (long-term). I only recently considered skipping Plan A, but I’m glad I’ve got feedback to prevent me from doing otherwise. Would’ve enjoyed the process nonetheless.

    Are you confident that the end result will be better than Mistral? If not, why spend that much on creating something equivalent or possibly even inferior?

    Of course not. I don’t do this to be the best. I offer to do this to understand. To document how to build and release a foundation model from start to finish is knowledge that could be valuable to someone else - which is why I was willing to skip ahead if that was a topic others wanted to dive more into. For me, it’s more about the friends we make along the way. There is grace in polishing a product and being the best, but I’d like to think there is also something special in doing something just to document it for others. There is something fulfilling exploring a new frontier with nothing but sheer curiosity.

    Then there’s also the question of how long a model is going to be relevant before some other new model with all the latest innovations is released and makes everything else look outdated… Even if you can create a model which rivals llama-2 and mistral now, are you going to create a new one to compete with llama-3 and mistral-2 when those come along?

    I also don’t do this to be relevant. To be a part of the this is enough for me. In my studies, I have found something bigger than me - I see myself doing this for many years so I know I’ll be around to see it evolve and current technologies become irrelevant in time. If you consider existing alongside these models as ‘competing’ then yes, I would be doing that I suppose.

    Sorry for the negativity but I think creating a base model sounds likely to be a massive waste of resources. If you have a lot of time and money to throw at this project, I think it would be much better spent on fine-tuning existing models.

    Don’t worry, it was very great feedback. Exactly why I made this post! I’m glad you made all your points. It’s the same logic I had (and the same logic I was willing to throw aside for others). At this point, it seems like fine-tuning is what most of you want to see. So fine-tuning it shall be!


  • This will be a fine-tuned model, so it may inherit some of the permissions and license agreements as its foundation model and have other implications depending on your country or local law.

    You are correct, if we chose Llama 2 - the fine-tune derivative may be subject to their original license terms. However, Apache 2.0 would apply and transfer to something like a fine-tuned version of Mistral, since its base license is also Apache 2.0.

    If there is enough support - I’d be more than open to creating an entirely new foundation model family. This would be a larger undertaking than this initial fine-tuning deployment, but building a completely free FOSAI foundation family of models was the penultimate goal of this project so if this garners enough attention I could absolutely put energy and focus into creating another Mistral-like product instead of splashing around with fine-tuning.

    Whatever would help everyone the most! I like where you’re thinking though, I’m going to update the thread to include an option to vote for a new foundation family instead. At the end of the day, it’s likely I’ll do all of the above - I’m just not sure in what order yet…


  • I have come to believe Moore’s law is finite, and we’re starting to see the exponential end of it. This leads me to believe (or want to believe) there are other looming breakthroughs for compute, optimization, and/or hardware on the horizon. That, or crazy powerful GPUs are about to be a common household investment.

    I keep thinking about what George Hotz is doing in regards to this. He explained on his podcast with Lex Fridman that there is much to be explored in optimization, both with quantization of software and acceleration of hardware.

    His idea of ‘commoditize the petabyte’ is really cool. I think it’s worth bringing up here, especially given the fact it appears one of his biggest goals right now is solving the at-home compute problem. But in a way that you could actually run something like a 180B model in-house no problem.

    George Hotz’ tinybox

    ($15,000)

    • 738 FP16 TFLOPS
    • 144 GB GPU RAM
    • 5.76 TB/s RAM bandwidth
    • 30 GB/s model load bandwidth (big llama loads in around 4 seconds)
    • AMD EPYC CPU
    • 1600W (one 120V outlet)
    • Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)

    You can pre-order one now. You have $15k laying around, right? Lol.

    It’s definitely not easy (or cheap) now, but I think it’s going to get significantly easier to build and deploy large models for all kinds of personal use cases in our near and distant futures.

    If you’re serving/hosting models, it’s also worth checking out vLLM if you haven’t already: https://github.com/vllm-project/vllm