[Preview] v1.83.7.rc.1 - Per-User MCP OAuth, Team Spend Logs RBAC
Deploy this version​
- Docker
- Pip
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:main-v1.83.7.rc.1
pip install litellm==1.83.7
warning
Breaking change — Prometheus latency histogram buckets reduced. The default LATENCY_BUCKETS set has been reduced from 35 to 18 boundaries to lower Prometheus cardinality. Dashboards and PromQL queries that reference specific le= bucket values may stop matching. Review your alerts/dashboards before upgrading and use LATENCY_BUCKETS env override to restore the previous boundaries if needed — PR #25527.
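If your dashboards depend on the old boundaries, the override can be applied at deploy time. A sketch assuming `LATENCY_BUCKETS` takes a comma-separated list of boundary values — the exact format and the full previous 35-value set are in PR #25527; the values below are illustrative only:

```shell
# Illustrative: pin custom latency bucket boundaries via env override.
# The boundary values shown are examples, NOT the previous 35-bucket set;
# see PR #25527 for the exact syntax and defaults.
docker run \
  -e STORE_MODEL_IN_DB=True \
  -e LATENCY_BUCKETS="0.005,0.01,0.025,0.05,0.1,0.25,0.5,1,2.5,5,10,30,60,120,300" \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.83.7.rc.1
```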
Key Highlights​
- Per-User MCP OAuth Tokens — Each end-user can now hold their own OAuth tokens for interactive MCP server flows, isolating credentials across users
- Team Spend Logs RBAC — Teams with the `/spend/logs` permission can view team-wide spend logs from the UI and API
- Bulk Team Permissions API — New `POST /team/permissions_bulk_update` endpoint for updating member permissions across many teams in one call
- Azure Container Routing — Container routing, managed container IDs, and delete-response parsing for Azure Responses API containers
- UI E2E Test Suite — Playwright-based end-to-end tests for proxy admin, team, and key management flows now run in CI
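The bulk-permissions endpoint lets one call cover many teams. A minimal sketch of building a request body for it — the field names (`team_ids`, `team_member_permissions`) are assumptions for illustration, not the documented schema; consult the endpoint docs for the real shape:

```python
import json

# Hypothetical request body for POST /team/permissions_bulk_update.
# Field names here are illustrative assumptions, not the documented schema.
payload = {
    "team_ids": ["team-alpha", "team-beta"],
    "team_member_permissions": ["/spend/logs"],
}

body = json.dumps(payload)
print(body)
```

The serialized body would then be POSTed to the proxy with an admin key, e.g. `curl -X POST $PROXY/team/permissions_bulk_update -H "Authorization: Bearer $MASTER_KEY" -H "Content-Type: application/json" -d "$body"`.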
New Models / Updated Models​
New Model Support (14 new models)​
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| AWS Bedrock (GovCloud) | bedrock/us-gov-east-1/anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Chat, vision, tool use, prompt caching, reasoning |
| AWS Bedrock (GovCloud) | bedrock/us-gov-west-1/anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Chat, vision, tool use, prompt caching, reasoning |
| AWS Bedrock (GovCloud) | us-gov.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Bedrock Converse, with above-200K tier pricing |
| Baseten | baseten/MiniMaxAI/MiniMax-M2.5 | - | $0.30 | $1.20 | Chat |
| Baseten | baseten/nvidia/Nemotron-120B-A12B | - | $0.30 | $0.75 | Chat |
| Baseten | baseten/zai-org/GLM-5 | - | $0.95 | $3.15 | Chat |
| Baseten | baseten/zai-org/GLM-4.7 | - | $0.60 | $2.20 | Chat |
| Baseten | baseten/zai-org/GLM-4.6 | - | $0.60 | $2.20 | Chat |
| Baseten | baseten/moonshotai/Kimi-K2.5 | - | $0.60 | $3.00 | Chat |
| Baseten | baseten/moonshotai/Kimi-K2-Thinking | - | $0.60 | $2.50 | Chat |
| Baseten | baseten/moonshotai/Kimi-K2-Instruct-0905 | - | $0.60 | $2.50 | Chat |
| Baseten | baseten/openai/gpt-oss-120b | - | $0.10 | $0.50 | Chat |
| Baseten | baseten/deepseek-ai/DeepSeek-V3.1 | - | $0.50 | $1.50 | Chat |
| Baseten | baseten/deepseek-ai/DeepSeek-V3-0324 | - | $0.77 | $0.77 | Chat |
Features​
- AWS Bedrock
- AWS GovCloud mode support (`us-gov` prefix routing) - PR #25254
- Update GovCloud Claude Sonnet 4.5 pricing, raise `max_tokens` to 8192, and add prompt-caching costs
- Skip dummy `user` continue message when assistant prefix prefill is set - PR #25419
- Avoid double-counting cache tokens in Anthropic Messages streaming usage - PR #25517
- Anthropic
- Support `advisor_20260301` tool type - PR #25525
- Triton
- Embedding usage estimation for self-hosted Triton responses - PR #25345
- Baseten
- Add pricing entries for 11 new Baseten-hosted models - PR #25358
- Google Gemini / Vertex AI
- Mark applicable Gemini 2.5/3 models with `supports_service_tier`
Bug Fixes​
- AWS Bedrock
- Pass-through fix for Bedrock JSON body and multipart uploads - PR #25464
- OpenAI
- Mock headers in `test_completion_fine_tuned_model` to stabilize tests - PR #25444
LLM API Endpoints​
Features​
- Responses API
- OpenAI / Files API
- Add file content streaming support for OpenAI and related utilities - PR #25450
- A2A
- Default 60-second timeout when creating an A2A client - PR #25514
Bugs​
- Responses API
- Router
- General
- Ensure spend/cost logging runs when `stream=True` for web-search interception - PR #25424
Management Endpoints / UI​
Features​
- Teams + Organizations
- Virtual Keys
- Align `/v2/key/info` response handling with v1 - PR #25313
- Authentication / Routing
- Provider Credentials
- Per-team / per-project credential overrides via `model_config` metadata - PR #24438
- UI
Bugs​
- Improve input validation on management endpoints - PR #25445
- Harden file path resolution in skill archive extraction - PR #25475
AI Integrations​
Logging​
- Ramp
- Add Ramp as a built-in success callback - PR #23769
- Langfuse
- Preserve proxy key-auth metadata on `/v1/messages` Langfuse traces - PR #25448
- Prometheus
- Reduce default `LATENCY_BUCKETS` from 35 → 18 boundaries (see breaking-change note above) - PR #25527
- General
- S3 logging: retry with exponential backoff for transient 503/500 errors - PR #25530
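The S3 retry behavior can be pictured as classic exponential backoff with jitter on transient 500/503 responses. A minimal sketch — not LiteLLM's actual implementation — assuming an upload callable that returns an HTTP status code:

```python
import random
import time

def put_with_retries(put_object, max_attempts=4, base_delay=0.5):
    """Retry `put_object` on transient 500/503 responses with exponential
    backoff plus jitter. Illustrative sketch only, not LiteLLM's exact code."""
    for attempt in range(max_attempts):
        status = put_object()
        if status not in (500, 503):
            return status  # success, or a non-retryable error
        delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, 4s, ...
        time.sleep(delay + random.uniform(0, delay))  # jitter spreads retries out
    return status
```

The jitter term matters under load: without it, many clients that failed together retry together, re-creating the spike that caused the 503 in the first place.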
Guardrails​
- Optional skip system message in unified guardrail inputs - PR #25481
- Inline IAM: apply guardrail support - PR #25241
- Preserve `dict` `HTTPException.detail` and Bedrock context in guardrail errors - PR #25558
Spend Tracking, Budgets and Rate Limiting​
- Session-TZ-independent date filtering for spend / error log queries - PR #25542
- Batch-limit stale managed-object cleanup to prevent 300K+ row updates - PR #25258
MCP Gateway​
- Per-user OAuth token storage for interactive MCP flows - PR #25441
- Block arbitrary command execution via MCP `stdio` transport - PR #25343
- Document missing MCP per-user token environment variables in `config_settings` - PR #25471
Performance / Loadbalancing / Reliability improvements​
- Reduce Prometheus latency histogram cardinality (default buckets 35 → 18) - PR #25527
- S3 retry with exponential backoff for transient errors - PR #25530
Documentation Updates​
- Add Docker Image Security Guide covering cosign verification and deployment best practices - PR #25439
- Document April townhall announcements - PR #25537
- Document missing MCP per-user token env vars - PR #25471
- Add "Screenshots / Proof of Fix" section to PR template - PR #25564
Infrastructure / Security Notes​
- Pin cosign.pub verification to initial commit hash - PR #25273
- Fix node-gyp symlink path after npm upgrade in Dockerfile - PR #25048
- `Dockerfile.non_root`: handle missing `.npmrc` gracefully - PR #25307
- Add Playwright E2E tests with local PostgreSQL - PR #25126
- UI E2E tests for proxy admin team and key management - PR #25365
- Migrate Redis caching tests from GHA to CircleCI - PR #25354
- Update `check_responses_cost` tests for `_expire_stale_rows` - PR #25299
- Raise global vitest timeout and remove per-test overrides - PR #25468
- Version bumps and UI rebuilds: PR #25316, PR #25528, PR #25578, PR #25571, PR #25573, PR #25577
New Contributors​
- @kedarthakkar made their first contribution in https://github.com/BerriAI/litellm/pull/23769
- @csoni-cweave made their first contribution in https://github.com/BerriAI/litellm/pull/25441
- @jimmychen-p72 made their first contribution in https://github.com/BerriAI/litellm/pull/25530
Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.3.rc.1...v1.83.7.rc.1