Model Distillation in the API

(openai.com)

64 points | by GavCo3 个月前

7 comments

simonw3 个月前
They announced this at DevDay at the beginning of October.
It's effectively a layer of (well needed) sugar on top of their existing fine-tuning mechanism.
The challenge with fine-tuning is collecting a representative dataset to tune against. The tooling they added makes it easy for you to persist your prompts and responses within the OpenAI platform, and then later select those persisted pairs (that were created with e.g. GPT-4o) and use them to fine-tune a cheaper model (like GPT-4o mini) - such that the more expensive model is effectively "teaching" the cheaper model what to do.
You could do this before, but it was a LOT of work. The new "distillation" features make it easier.
- msp263 个月前
  But doesn't this fully lock you into using OpenAI's offerings? If you store the finetuning dataset yourself, you are free to use it on whatever model/provider/self hosting. And I'm finding it fairly straightforward to switch between providers at will. Plus it's nice to do additional programmatic validation on LLM outputs before using them for finetuning (e.g. standardizing country names, calculations).
  Am I missing something?
  simonw3 个月前
  Yes, total lockin.
  Remember though, OpenAI and pretty much all of the other providers have things in their terms that roughly say you're not allowed to use their models to train models from other providers - so fine-tuning a model against synthetic data created using OpenAI is likely a terms violation anyway.
  janalsncm3 个月前
  Terms violation or not, it’s a common use case and a huge contributor to their traffic. If they actually enforced it people would move because the marginal quality benefits don’t justify indefinite lockin.
behnamoh3 个月前
With each announcement, OpenAI kills yet another class of startups. I wonder if there are areas that OpenAI (and other AI companies) can't enter because those seem to be the only viable startup ideas in the long-term.
Currently, OAI does all the following:
- offers flagship models
- offers lite models
- offers easy finetuning of their models
- offers structured output and guaranteed JSON output.
- offers parallel tool/function calling which remains unmatched.
- has low API costs.
- offers a nice UI for their models
- offers Mac, iOS, Android, Windows app clients.
- offers image generation capabilities INTEGRATED with their language models.
- offers two-tier subscription plans for ordinary/pro (team) users.
- offers custom GPTs which can be used by ordinary people to create GPT experiences tailored to specific tasks (no need to build a website on your own).
- allows users to easily share chats!! (it took Anthropic a long time to have this feature, and even now it's not as good as OpenAI's solution).
- offers prompt caching and task scheduling to further save costs.
- offers unrivaled voice-to-text models at different sizes (Whisper).
- offers text-to-voice models that feel much more natural than the competition.
- has outstanding documentation.
- sets the standard for API (all other companies have to follow their conventions, such as `messages`, `.choices[0].message.content`, etc.)
- has the most capable team to, idk, build AGI/ASI...
- danenania3 个月前
  I wouldn't say this kills a class of startups. Being constrained to a single model provider is quite limiting for this kind of use case. What if you get better cost/performance results by distilling into an open source model? In a landscape that changes so rapidly, there's a lot of value in provider-agnostic tooling.
- ben_w3 个月前
  > - offers unrivaled voice-to-text models at different sizes (Whisper).
  It may be (close to) the best, but Whisper is nowhere near good enough.
  So, if you're got a great idea for how to massively improve on that (I don't), there's a business opportunity there.
- TZubiri3 个月前
  Pick an industry and OpenAI will never compete with you. They are going for the AGI dream, they want to be a company valued at 50T. They will never settle for doing any actual grunt work.
- littlestymaar3 个月前
  That OpenAI will kill every startup that build a product on top of their technology isn't really a good signal for using OpenAI as a technological platform …
  victorbjorklund3 个月前
  Lots of companies are just wrappers around an API. Can be compared to Apple "killing" startups that sold flashlight apps to the first Iphone
  firejake3083 个月前
  Maybe it's better to build specific applications with OpenAI's products (e.g., using the API to automate a specific business process in your company) than to build a company offering a generic additional functionality like a UI or fine-tuning or custom prompts. Or at least, that seems like the target audience that OpenAI wants to sell their product to.
- 3 个月前
  undefined
- viraptor3 个月前
  I wouldn't put their API costs as low. There are a few decent competitors. They only recently caught up with Claude and for example for coding deepseek is much cheaper. Others also got the cheap prompt caching before OpenAI caught up. They may have good features, but pricing is still meh.
- mentalically3 个月前
  [dead]
GavCo3 个月前
From today's ChatGPT search announcement: "The search model is a fine-tuned version of GPT-4o, post-trained using novel synthetic data generation techniques, including distilling outputs from OpenAI o1-preview."
patelajay2853 个月前
We've been working on a Python framework where one of the use cases is easy distillation from larger models to smaller open-source models and smaller-closed source models (where you don't have to still use / pay for the closed-source API service): https://datadreamer.dev/docs/latest/
Here's an (now slightly outdated) example of OpenAI GPT-4 => OpenAI GPT-3.5: https://datadreamer.dev/docs/latest/pages/get_started/quick_...
But you can also do GPT-4 to any model on HuggingFace. Or something like Llama-70B to Llama-1B.
For some tasks, this kind of distillation works extremely well given even a few hundred examples of the larger model performing the task.
- bangaladore3 个月前
  > OpenAI GPT-4 => OpenAI GPT-3.5
  I'm confused why you are mentioning 3.5 here. The weights aren't public, so you aren't actually running any derivative of GPT-3.5
  Or am I mistaken. Can you clarify?
  Tiberium3 个月前
  > distillation from larger models to smaller open-source models and smaller-closed source models
  They don't limit it only to open-source models. And you can finetune 3.5 Turbo on OpenAI API.
Permik3 个月前
Do note that this article was posted October 1, 2024, so this capability has been available for a month.
serjester3 个月前
To anyone that thinks models are going to be commodified, it seems like it's going to be exceedingly difficult to compete with OpenAI. The developer experience working with them is just too good.
Sure you could use a different providers, but you're going to be stuck with an incredibly fragmented ops stack. My experience with Google has been shockingly bad and Anthropic has got a good amount of catching up to do. No one else is remotely competitive. Honestly would love to see something from Meta long term.
janalsncm3 个月前
Part of the point of distilling from their models is that I control the model, its availability, and its cost to me. So while this may be a convenient feature, unless I can download the weights it wouldn’t replace my workflow.
This does raise the bar for any future startups though. If your plan was to distill GPT4 outputs and lease the weights to me through a REST API, I probably won’t be interested.