Why use a small model like gpt-4o-mini instead of the biggest one available?
If bigger models are smarter, why would you deliberately pick a small one for a feature people actually use? Isn't that cutting corners?
It looks like cutting corners until you run the numbers. For a narrow job, like turning a question into one SQL query against a known set of tables, a small model does the job at a fraction of the cost and latency. The task doesn't need a model that can also write poetry and debug Rust. My rule of thumb from doing this on a real budget: match the model to the task, not to the marketing. The expensive model earns its price on open-ended reasoning. Text-to-SQL over a fixed schema is the opposite of open-ended, because the schema heavily constrains what a valid answer can look like. I'd rather spend the budget difference on more data pipelines. If I ever see the small model failing on real user questions, I'll revisit. So far the failure mode is mostly ambiguous questions, and a bigger model wouldn't fix that anyway: the ambiguity is in the question, not in the model.
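To make "run the numbers" concrete, here is a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the `ModelPricing` and `monthly_cost` names are mine, the per-token prices are placeholders rather than any provider's real price list, and the workload figures are made up. Swap in your own numbers before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical price sheet; check your provider's current pricing."""
    name: str
    input_usd_per_million: float   # USD per 1M input (prompt) tokens
    output_usd_per_million: float  # USD per 1M output (completion) tokens


def monthly_cost(pricing: ModelPricing, requests_per_month: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend in USD for a fixed per-request token budget."""
    per_request = (
        input_tokens * pricing.input_usd_per_million
        + output_tokens * pricing.output_usd_per_million
    ) / 1_000_000
    return per_request * requests_per_month


# Placeholder prices, NOT real ones: the large model is assumed 20x pricier.
small = ModelPricing("small-model", 0.20, 0.80)
large = ModelPricing("large-model", 4.00, 16.00)

# Assumed workload: the prompt carries the schema plus the question,
# and the answer is a single short SQL statement.
REQUESTS = 100_000
INPUT_TOKENS = 1_500
OUTPUT_TOKENS = 100

small_cost = monthly_cost(small, REQUESTS, INPUT_TOKENS, OUTPUT_TOKENS)
large_cost = monthly_cost(large, REQUESTS, INPUT_TOKENS, OUTPUT_TOKENS)

print(f"{small.name}: ${small_cost:,.2f}/month")
print(f"{large.name}: ${large_cost:,.2f}/month")
print(f"difference: ${large_cost - small_cost:,.2f}/month")
```

Note that for text-to-SQL the input side tends to dominate, since the schema is sent with every request while the output is one short query, so the input price usually matters more than the output price in this comparison.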