Model Distillation in the API

(openai.com)

61 points | by GavCo12 小时前

7 comments

  • simonw11 小时前
    They announced this at DevDay at the beginning of October.

    It's effectively a layer of (well needed) sugar on top of their existing fine-tuning mechanism.

    The challenge with fine-tuning is collecting a representative dataset to tune against. The tooling they added makes it easy for you to persist your prompts and responses within the OpenAI platform, and then later select those persisted pairs (that were created with e.g. GPT-4o) and use them to fine-tune a cheaper model (like GPT-4o mini) - such that the more expensive model is effectively "teaching" the cheaper model what to do.

    You could do this before, but it was a LOT of work. The new "distillation" features make it easier.

    • msp2611 小时前
      But doesn't this fully lock you into using OpenAI's offerings? If you store the finetuning dataset yourself, you are free to use it on whatever model/provider/self hosting. And I'm finding it fairly straightforward to switch between providers at will. Plus it's nice to do additional programmatic validation on LLM outputs before using them for finetuning (e.g. standardizing country names, calculations).

      Am I missing something?

      • simonw10 小时前
        Yes, total lockin.

        Remember though, OpenAI and pretty much all of the other providers have things in their terms that roughly say you're not allowed to use their models to train models from other providers - so fine-tuning a model against synthetic data created using OpenAI is likely a terms violation anyway.

  • GavCo12 小时前
    From today's ChatGPT search announcement: "The search model is a fine-tuned version of GPT-4o, post-trained using novel synthetic data generation techniques, including distilling outputs from OpenAI o1-preview."
  • patelajay28512 小时前
    We've been working on a Python framework where one of the use cases is easy distillation from larger models to smaller open-source models and smaller-closed source models (where you don't have to still use / pay for the closed-source API service): https://datadreamer.dev/docs/latest/

    Here's an (now slightly outdated) example of OpenAI GPT-4 => OpenAI GPT-3.5: https://datadreamer.dev/docs/latest/pages/get_started/quick_...

    But you can also do GPT-4 to any model on HuggingFace. Or something like Llama-70B to Llama-1B.

    For some tasks, this kind of distillation works extremely well given even a few hundred examples of the larger model performing the task.

    • bangaladore11 小时前
      > OpenAI GPT-4 => OpenAI GPT-3.5

      I'm confused why you are mentioning 3.5 here. The weights aren't public, so you aren't actually running any derivative of GPT-3.5

      Or am I mistaken. Can you clarify?

      • Tiberium11 小时前
        > distillation from larger models to smaller open-source models and smaller-closed source models

        They don't limit it only to open-source models. And you can finetune 3.5 Turbo on OpenAI API.

  • Permik12 小时前
    Do note that this article was posted October 1, 2024, so this capability has been available for a month.
  • serjester10 小时前
    To anyone that thinks models are going to be commodified, it seems like it's going to be exceedingly difficult to compete with OpenAI. The developer experience working with them is just too good.

    Sure you could use a different providers, but you're going to be stuck with an incredibly fragmented ops stack. My experience with Google has been shockingly bad and Anthropic has got a good amount of catching up to do. No one else is remotely competitive. Honestly would love to see something from Meta long term.

  • janalsncm10 小时前
    Part of the point of distilling from their models is that I control the model, its availability, and its cost to me. So while this may be a convenient feature, unless I can download the weights it wouldn’t replace my workflow.

    This does raise the bar for any future startups though. If your plan was to distill GPT4 outputs and lease the weights to me through a REST API, I probably won’t be interested.

  • behnamoh12 小时前
    With each announcement, OpenAI kills yet another class of startups. I wonder if there are areas that OpenAI (and other AI companies) can't enter because those seem to be the only viable startup ideas in the long-term.

    Currently, OAI does all the following:

    - offers flagship models

    - offers lite models

    - offers easy finetuning of their models

    - offers structured output and guaranteed JSON output.

    - offers parallel tool/function calling which remains unmatched.

    - has low API costs.

    - offers a nice UI for their models

    - offers Mac, iOS, Android, Windows app clients.

    - offers image generation capabilities INTEGRATED with their language models.

    - offers two-tier subscription plans for ordinary/pro (team) users.

    - offers custom GPTs which can be used by ordinary people to create GPT experiences tailored to specific tasks (no need to build a website on your own).

    - allows users to easily share chats!! (it took Anthropic a long time to have this feature, and even now it's not as good as OpenAI's solution).

    - offers prompt caching and task scheduling to further save costs.

    - offers unrivaled voice-to-text models at different sizes (Whisper).

    - offers text-to-voice models that feel much more natural than the competition.

    - has outstanding documentation.

    - sets the standard for API (all other companies have to follow their conventions, such as `messages`, `.choices[0].message.content`, etc.)

    - has the most capable team to, idk, build AGI/ASI...

    • danenania11 小时前
      I wouldn't say this kills a class of startups. Being constrained to a single model provider is quite limiting for this kind of use case. What if you get better cost/performance results by distilling into an open source model? In a landscape that changes so rapidly, there's a lot of value in provider-agnostic tooling.
    • TZubiri11 小时前
      Pick an industry and OpenAI will never compete with you. They are going for the AGI dream, they want to be a company valued at 50T. They will never settle for doing any actual grunt work.
    • ben_w11 小时前
      > - offers unrivaled voice-to-text models at different sizes (Whisper).

      It may be (close to) the best, but Whisper is nowhere near good enough.

      So, if you're got a great idea for how to massively improve on that (I don't), there's a business opportunity there.

    • littlestymaar11 小时前
      That OpenAI will kill every startup that build a product on top of their technology isn't really a good signal for using OpenAI as a technological platform …
      • victorbjorklund11 小时前
        Lots of companies are just wrappers around an API. Can be compared to Apple "killing" startups that sold flashlight apps to the first Iphone
      • firejake30811 小时前
        Maybe it's better to build specific applications with OpenAI's products (e.g., using the API to automate a specific business process in your company) than to build a company offering a generic additional functionality like a UI or fine-tuning or custom prompts. Or at least, that seems like the target audience that OpenAI wants to sell their product to.
    • viraptor9 小时前
      I wouldn't put their API costs as low. There are a few decent competitors. They only recently caught up with Claude and for example for coding deepseek is much cheaper. Others also got the cheap prompt caching before OpenAI caught up. They may have good features, but pricing is still meh.
    • 11 小时前
      undefined
    • mentalically11 小时前
      [dead]