GRPO notebook for Llama 3.1 8B: https://colab.research.google.com/github/unslothai/notebooks...
General finetuning notebook: https://colab.research.google.com/github/unslothai/notebooks...
The Berkeley team's 17K dataset: https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k Hugging Face also released a 220K dataset: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
Also you can install Unsloth on your local machine :)
Kaggle also has 2x Tesla T4s for free, for 30 hours per week!
I expected some sort of way to actually get o1 preview retrained (and downloadable).
Also, calling it o1-preview based on just 7 benchmarks is not correct. What if someone comes up with some use cases where o1-preview does better than this?
Apart from that, it's good that things are becoming cheaper.
If you see a headline saying 'make your own James Webb Space Telescope in a weekend', they're offering a project that leverages some tech concept from the JWST, like mirror arrays or a particular sort of sensor. They're not promising that you will be able to build a space-capable telescope the size of a semi truck.
The vocabulary used to describe the culturally prevailing leader will be used to explain similar concepts and create analogies. That's an easier tool to communicate to the masses than crafting super tailored messages for only domain experts.
It's why we keep doing this, and it's also why trademarks become generics.
"Google it", "Uber for X", "band aid", "the band sounds like Y", "the actor looks like Z", etc. etc.
This is a core part of how human language works and how we as a species communicate with one another.
"Wow! Quite a feat to deliver an iconic design, a 631 horsepower engine, and performance of 0-150 mph in 15.4 seconds on such a small budget!"
"Actually what we mean is, like the Lamborghini Huracan, our vehicle has two seats."
Also, at $450 no one expects it to truly be a from-scratch, complete recreation of a model that cost hundreds of millions to produce.
Instead, they built a model (via fine tuning) using similar techniques and got similar results within the area of experimentation they created their training data for.
I personally was not misled by the title at all.
There are open source models better than OpenAI's image and video models, and OpenAI is not winning the LLM space by any measure.
The hobbyist absolutely won't feel as though they're trying to fake a Huracan with a Camry here. They're going to build useful products with whatever they choose, regardless of what vendor or open source project produced the model.
Your analogy is silly. OpenAI is more like Band-Aid(r) than Lamborghini Huracan.
Verdict: dishonest
In the last few weeks we are seeing a torrent of advances, just because someone opened up their architectures.
Imagine where we could go if the training datasets were also publicly available and unbounded by any copyright laws. (I'm not talking about doing anything illegal).
I can only dream, I guess.
https://www.privacyworld.blog/2024/03/japans-new-draft-guide...
I imagine if copyright is a big issue for AI, Japanese startups will have an advantage.
I think we can all agree there does need to be an update. You don't want to forever outlaw deep learning (even if you do want to, that's not going to happen, so it's worth helping to shape the future).
It's very complicated, with a bunch of moving parts, but I really want society to start arguing about it so we can get to a semi-fair place.
$25 to Elsevier per GPU purchase
I do share your opinion. Others may argue "What about x country? They don't care!", even though that position is about as good as making anything excusable because someone else did it.
I might add, I'm really not trying to be toxic. Just saying this based on what I see when this comes up.
You weren't going to buy a book instead of asking a question.
Never mind that you've just handed control of an incredibly-powerful tool over to nations that DGAF about copyright law.
If copyright interests want to fight AI, then copyright has to go. It's that simple. It's an unnecessary fight, but somebody needs to convince them of that.
edit: re. the /s I was living offshore and running the most popular bitcoin casino at the time, spending a vast amount of money and energy to block any player who might be American. As a result I didn't make that much money. And I tried to calculate how much I would need to make if I wanted to break the law and hide out forever. I figured I could make $10-15M a year but that wouldn't be enough to hide. I fucked up, I guess. Because the richest man in the world made most of his first round of money facilitating gambling transactions, and he's now got his snout in every federal agency. I should have had the balls, I guess, to ask forgiveness rather than permission.
- They used QwQ to generate training data (with some cleanup using GPT-4o-mini)
- The training data was then used to FT Qwen2.5-32B-Instruct (non-reasoning model)
- Result was that Sky-T1 performs slightly worse than QwQ but much better than Qwen2.5 on reasoning tasks
There are a few dismissive comments here but I actually think this is pretty interesting as it shows how you can FT a foundation model to do better at reasoning.
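For anyone curious what that FT step looks like mechanically, here's a minimal sketch using TRL's SFTTrainer. The dataset path, model choice, and hyperparameters are placeholders (this is not the Berkeley team's actual training script), and in practice a 32B full fine-tune also needs multi-GPU sharding or LoRA:

    # Sketch of the distillation step: SFT a non-reasoning student model on
    # traces generated by a stronger reasoning model (QwQ-style data).
    # Placeholders throughout; TRL's SFTTrainer API shifts between versions.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # JSONL rows like {"text": "<problem>\n<teacher reasoning trace + answer>"}
    dataset = load_dataset("json", data_files="teacher_traces.jsonl", split="train")

    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-32B-Instruct",   # the student model
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="sky-t1-style-sft",
            max_seq_length=16384,            # reasoning traces run long
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=3,
            learning_rate=1e-5,
            bf16=True,
        ),
    )
    trainer.train()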
I made a GitHub project for distilling thinking models (and custom CoT inference-time fine tuning): https://docs.getkiln.ai/docs/guide-train-a-reasoning-model
Do you have any pointers on assembling fine-tuning data not for isolated tasks, but for a flexible range of queries in a particular problem domain? Similar to general purpose instruction-tuning, but much more focused.
For example, suppose you’re building an app that helps doctors search through research literature to aid in diagnosis, check hypotheses, etc. Of course you would want to have some domain experts and real users available to see what kind of queries they would create. But getting from that point to a well-balanced dataset that adequately represents the distribution of possible queries, instructions, writing/cognitive styles, formatting, dialog flows, etc. your app will encounter -- it just seems kind of hard to know how to approach a task like that. It seems there are infinitely many dimensions you could accidentally overfit on.
It's quite difficult to see all the future decisions you will make due to future insights about future versions of the whole loop. But you will need to make some.
I will say one more concrete thing though: the more metadata you collect, generally, the better, but this can make it more expensive.
Also, if you ever need to update your schema... well, this is actually one reason why text data for LLMs is nice: your schema is essentially fluid in the first place, so you could e.g. stick metadata in the text itself if at some future point you start collecting it.
I guess, also, it's a good thing to constantly add new benchmarks, if possible. Treat your model's capabilities as knowable, but never treat your model's capabilities as actually known.
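To make the "stick metadata in the text" idea concrete, here's a toy example (all field names invented) where the structured metadata you collect today lives alongside the text, and a later-added field just gets folded into the text itself so old rows never need migrating:

    import json

    record = {
        "meta": {                      # structured metadata collected up front
            "specialty": "cardiology",
            "query_style": "terse",
            "source": "pilot_user_7",
        },
        "messages": [
            {"role": "user", "content": "Recent RCTs on SGLT2 inhibitors in HFpEF?"},
            {"role": "assistant", "content": "..."},
        ],
    }

    # A field you only start collecting later can be prepended to the text
    # instead of forcing a schema migration on every existing row.
    record["messages"][0]["content"] = (
        "[specialty: cardiology] [style: terse]\n" + record["messages"][0]["content"]
    )

    with open("train.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")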
> Language model post-training is applied to refine behaviors and unlock new skills across a wide range of language models, but open recipes for applying these techniques lag behind proprietary ones. The underlying training data and recipes for post-training are simultaneously the most important pieces of the puzzle and the portion with the least transparency. To bridge this gap, we introduce Tülu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques. Tülu 3, which builds on Llama 3.1 base models, achieves results surpassing the instruct versions of Llama 3.1, Qwen 2.5, Mistral, and even closed models such as GPT-4o-mini and Claude 3.5-Haiku. The training algorithms for our models include supervised finetuning (SFT), Direct Preference Optimization (DPO), and a novel method we call Reinforcement Learning with Verifiable Rewards (RLVR). With Tülu 3, we build a multi-task evaluation scheme for post-training with development and unseen evaluations, standard benchmark implementations, and substantial decontamination of existing open datasets on said benchmarks. We conclude with analysis and discussion of training methods that did not reliably improve performance. The Tülu 3 release includes model weights, a demo, and the complete recipe — datasets for diverse core skills, a robust toolkit for data curation and evaluation, the training code and infrastructure, and, most importantly, a detailed report for reproducing and further adapting the Tülu 3 approach to more domains.
That said, for someone who's not in the game but been curious as to the details of fine-tuning, it's great to get both the dataset and the code.
Hardly a huge win.
This is a massive marketing scam here. Borderline academic dishonesty.
From the title, my best guess was they applied some kind of RL/GRPO to an existing model.
But... they took an existing model that had already undergone SFT for reasoning... and then used it to generate data to SFT the exact same model... nothing wrong with that, but it doesn't seem to warrant the title they chose.
They didn't even change the model size, let alone try a different class of models.
Getting an expert model's trajectories is trivial if you have vLLM to do batched inference.
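Something like this, roughly (teacher model name and prompts are placeholders, not their actual script):

    from vllm import LLM, SamplingParams

    prompts = [
        "Solve step by step: if 3x + 7 = 22, what is x?",
        "Prove that the sum of two even integers is even.",
    ]

    llm = LLM(model="Qwen/QwQ-32B-Preview")      # the expert/teacher model
    params = SamplingParams(temperature=0.7, max_tokens=4096)

    # One batched call; vLLM handles scheduling/continuous batching internally.
    outputs = llm.generate(prompts, params)
    for out in outputs:
        # out.outputs[0].text is the full trajectory to keep as SFT data
        print(out.prompt, out.outputs[0].text[:200])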