This post is about using LiteLLM so that you can have more freedom and flexibility in the LLMs you use day-to-day. But let’s start by covering the “why bother”.

You might say “just use good steel”

There’s a joke that circulates online about Japanese versus Western sword making. It goes that, back in the day, Japanese sword makers developed elaborate folding techniques to turn poor-quality steel into beautiful blades with unbelievable cutting power. You may have seen the films where a perfectly made sword cuts straight through its opponents’ weapons.

People in the West, the joke goes on, just used good steel.

Historical inaccuracies aside, you could argue this is basically where we are with LLMs right now.

You could spend a bunch of time learning how to combine various grades of model to get the most combined performance for the least cost. Even then, someone with unlimited credits on a frontier model might still be able to cut you in half.

The thing is: most of us don’t have unlimited credits for frontier models. We don’t have the “good steel”; we are just renting it at a price we don’t set.

So it’s hard to imagine that any of us will regret having a better understanding of our options.

This post is for you if…

All of that said, I have no real interest in demonising the well-known companies that are producing truly impressive mainstream tools. Things like Claude Code and Codex are great.

I have some examples of the “why now” for this post linked below, but we all know that everything is moving too fast for individual news stories to stay relevant for long. And that’s kind of my point – why would we assume the tools we have now will remain the ones we want to use forever?

This post isn’t designed to turn your worldview upside down. It’s meant to highlight some options to consider if you’ve been thinking that you want some more LLM independence. So instead of telling you why you simply must pay attention, here’s a way to self-select. I think you would find this post valuable if any of the following apply to you:

Cost consciousness
- You have heard about impressive new models that perform better than we would have dreamed of a year ago, while being 50x cheaper than the mainstream options.
- You are trying to limit (or reduce) your ever-growing AI agent bill
- In a similar vein: you are sick of hitting token limits but don’t want to start paying hundreds a month
- You want a way to monitor spend and set your own spend controls flexibly across multiple services at once (down to budgets that reset daily or monthly, applied globally, per group or agent, or per individual caller)
- You want to give some tools lots of headroom (e.g. trusted systems doing important work, where spending money to get it right is justified) but you also want to try new things without them spending all your money or derailing the important stuff
- You have heard stories about cloud providers running up huge bills because of missing hard cost controls, or simply failing to fire alerts on LLM costs, and you want another layer of protection.
Provider safety
- You have read articles about organisational risks of tying your development stack and products to external APIs, and the benefits of being able to fall back to other providers.
- You want options in case the big providers decide to remove key features from the plans you subscribe to (or in case they decide to remove your favourite model altogether)
Quality
- You have seen how much detailed analysis it takes to prove that a mainstream model has got worse, and you want some ongoing assurance of quality
- You want to combine models from different providers to make the most of their strengths
- You want to save the most powerful models for the most complex work, and let smaller, simpler models handle smaller tasks more quickly and cheaply
- You want to be able to switch out the models you’re using on-the-fly without having to push any new code to your repos (maybe you’re using something like OpenClaw and want to be able to switch quickly)
Learning speed
- You have listened to talks from companies like Anthropic which emphasise how important it is for you to be able to learn and iterate quickly
- You don’t even have alternative models in mind but you want to be able to experiment and learn about the landscape first-hand, perhaps to differentiate your skillset from the legions already working with mainstream models
Security and Observability
- You want a layer of separation between some of your API keys and repos where you’ll be installing new libraries, given the increase in supply chain attacks focused on stealing keys
- You don’t like LLMs having direct access to your API keys because no matter how many times you say “pretty please don’t commit this” they might still do it
- You want to start logging API health or even input and output messages.

It doesn’t matter if only some of those bullet points apply to you, but they all apply to me. Essentially, I want to keep using the best-in-class tools out there, but I want the freedom to decide which tools handle which chunks of work. I also want to be able to do that with confidence that my experiments won’t run out of control.

I am willing to invest a bit of time in this, whether the payoff is lower costs, less reliance on a few suppliers, a systematic way to choose the right model for each task, or just a better understanding of what’s out there.

Contents

1 You might say “just use good steel”
2 This post is for you if…
3 I don’t care about the pitch, give me the how-to
4 So what’s the pitch?
5 How does LiteLLM help?
6 Things that you control
  6.1 What models are used for different work
  6.2 Cost
    6.2.1 (Costs aren’t perfect)
  6.3 What models or MCPs different agents have access to
  6.4 Security posture
  6.5 Hosting
  6.6 Prompts (experimental)
7 Things that are handled for you