[Preview] v1.83.7.rc.1 - Per-User MCP OAuth, Team Spend Logs RBAC

Deploy this version​

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.83.7.rc.1
```
Warning: Breaking change. The default Prometheus LATENCY_BUCKETS set has been reduced from 35 to 18 boundaries to lower Prometheus cardinality. Dashboards and PromQL queries that reference specific le= bucket values may stop matching. Review your alerts and dashboards before upgrading, and use the LATENCY_BUCKETS env override to restore the previous boundaries if needed (PR #25527).
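If you need to keep existing dashboards matching while you migrate, the override can be set at deploy time. This is a sketch only: LATENCY_BUCKETS is assumed to take a comma-separated list of upper bounds in seconds, and the values below are placeholders, not the previous 35-boundary default set. Substitute your own boundaries and check the LiteLLM Prometheus docs for the authoritative value format.

```shell
# Illustrative only: assumed format is a comma-separated list of histogram
# upper bounds in seconds. These values are placeholders, NOT the old default.
LATENCY_BUCKETS="0.05,0.1,0.25,0.5,1,2.5,5,10,30,60"

# docker run \
#   -e STORE_MODEL_IN_DB=True \
#   -e LATENCY_BUCKETS="$LATENCY_BUCKETS" \
#   -p 4000:4000 \
#   docker.litellm.ai/berriai/litellm:main-v1.83.7.rc.1

# Sanity-check the boundary count before deploying
echo "$LATENCY_BUCKETS" | tr ',' '\n' | wc -l
```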

Key Highlights​

  • Per-User MCP OAuth Tokens — Each end-user can now hold their own OAuth tokens for interactive MCP server flows, isolating credentials across users
  • Team Spend Logs RBAC — Teams with the /spend/logs permission can view team-wide spend logs from the UI and API
  • Bulk Team Permissions API — New POST /team/permissions_bulk_update endpoint for updating member permissions across many teams in one call
  • Azure Container Routing — Container routing, managed container IDs, and delete-response parsing for Azure Responses API containers
  • UI E2E Test Suite — Playwright-based end-to-end tests for proxy admin, team, and key management flows now run in CI

New Models / Updated Models​

New Model Support (14 new models)​

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|----------|-------|----------------|---------------------|----------------------|----------|
| AWS Bedrock (GovCloud) | bedrock/us-gov-east-1/anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Chat, vision, tool use, prompt caching, reasoning |
| AWS Bedrock (GovCloud) | bedrock/us-gov-west-1/anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Chat, vision, tool use, prompt caching, reasoning |
| AWS Bedrock (GovCloud) | us-gov.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Bedrock Converse, with above-200K tier pricing |
| Baseten | baseten/MiniMaxAI/MiniMax-M2.5 | - | $0.30 | $1.20 | Chat |
| Baseten | baseten/nvidia/Nemotron-120B-A12B | - | $0.30 | $0.75 | Chat |
| Baseten | baseten/zai-org/GLM-5 | - | $0.95 | $3.15 | Chat |
| Baseten | baseten/zai-org/GLM-4.7 | - | $0.60 | $2.20 | Chat |
| Baseten | baseten/zai-org/GLM-4.6 | - | $0.60 | $2.20 | Chat |
| Baseten | baseten/moonshotai/Kimi-K2.5 | - | $0.60 | $3.00 | Chat |
| Baseten | baseten/moonshotai/Kimi-K2-Thinking | - | $0.60 | $2.50 | Chat |
| Baseten | baseten/moonshotai/Kimi-K2-Instruct-0905 | - | $0.60 | $2.50 | Chat |
| Baseten | baseten/openai/gpt-oss-120b | - | $0.10 | $0.50 | Chat |
| Baseten | baseten/deepseek-ai/DeepSeek-V3.1 | - | $0.50 | $1.50 | Chat |
| Baseten | baseten/deepseek-ai/DeepSeek-V3-0324 | - | $0.77 | $0.77 | Chat |

Features​

  • AWS Bedrock
    • AWS GovCloud mode support (us-gov prefix routing) - PR #25254
    • Update GovCloud Claude Sonnet 4.5 pricing, raise max_tokens to 8192, and add prompt-caching costs
    • Skip dummy user continue message when assistant prefix prefill is set - PR #25419
    • Avoid double-counting cache tokens in Anthropic Messages streaming usage - PR #25517
  • Anthropic
    • Support advisor_20260301 tool type - PR #25525
  • Triton
    • Embedding usage estimation for self-hosted Triton responses - PR #25345
  • Baseten
    • Add pricing entries for 11 new Baseten-hosted models - PR #25358
  • Google Gemini / Vertex AI
    • Mark applicable Gemini 2.5/3 models with supports_service_tier
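As a sketch of the new GovCloud prefix routing, a request through a locally running proxy might look like the following. The proxy URL and API key are placeholders; the model id is the new GovCloud entry from the pricing table above, and the us-gov region prefix is what triggers GovCloud routing per PR #25254.

```shell
# Hypothetical request: localhost:4000 and sk-1234 are placeholders for your
# proxy URL and virtual key. The model id comes from the new-models table.
BODY='{
  "model": "bedrock/us-gov-east-1/anthropic.claude-sonnet-4-5-20250929-v1:0",
  "messages": [{"role": "user", "content": "hello"}]
}'

# curl -sS http://localhost:4000/v1/chat/completions \
#   -H "Authorization: Bearer sk-1234" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"

# Validate the payload shape locally before sending
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"
```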

LLM API Endpoints​

Features​

  • Responses API
    • Containers: Azure routing, managed container IDs, and delete-response parsing - PR #25287
    • WebSocket: append ?model= to backend WebSocket URL so model selection routes correctly - PR #25437
  • OpenAI / Files API
    • Add file content streaming support for OpenAI and related utilities - PR #25450
  • A2A
    • Default 60-second timeout when creating an A2A client - PR #25514
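The new file content streaming support can be exercised through the proxy's OpenAI-compatible Files route. This is a sketch under assumptions: the proxy URL, key, and file id below are placeholders, and GET /v1/files/{file_id}/content is the standard OpenAI-compatible path for retrieving file content.

```shell
# Hypothetical call: file-abc123, localhost:4000, and sk-1234 are placeholders.
FILE_ID="file-abc123"
URL="http://localhost:4000/v1/files/${FILE_ID}/content"

# Stream the content to disk rather than buffering it in memory:
# curl -sS "$URL" -H "Authorization: Bearer sk-1234" --output response.bin

echo "$URL"
```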

Bugs​

  • Responses API
    • Map refusal stop_reason to incomplete status in streaming - PR #25498
    • Fix duplicate keyword argument error in Responses WebSocket path - PR #25513
  • Router
    • Pass custom_llm_provider to get_llm_provider for unprefixed model names - PR #25334
    • Fix tag-based routing when encrypted_content_affinity is enabled - PR #25347
  • General
    • Ensure spend/cost logging runs when stream=True for web-search interception - PR #25424

Management Endpoints / UI​

Features​

  • Teams + Organizations
    • New POST /team/permissions_bulk_update endpoint for bulk permission updates across teams - PR #25239
    • New /spend/logs team member permission for viewing team-wide spend logs (UI + RBAC) - PR #25458
    • Align org and team endpoint permission checks - PR #25554
  • Virtual Keys
    • Align /v2/key/info response handling with v1 - PR #25313
  • Authentication / Routing
    • Allow JWT to override OAuth2 routing without requiring global OAuth2 enablement - PR #25252
    • Consolidate route auth for UI and API tokens - PR #25473
    • Use parameterized query for combined_view token lookup - PR #25467
  • Provider Credentials
    • Per-team / per-project credential overrides via model_config metadata - PR #24438
  • UI
    • Improve browser storage handling and Dockerfile consistency - PR #25384
    • Align v1 guardrail and agent list responses with v2 field handling - PR #25478
    • Flush Tremor Tooltip timers in user_edit_view tests - PR #25480
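A bulk permissions update might be issued as below. Only the endpoint path comes from these release notes; the request field names (team_ids, team_member_permissions) are assumptions for illustration, so check the API reference for the actual schema.

```shell
# Hypothetical payload for POST /team/permissions_bulk_update. Field names
# are assumptions; the endpoint path is from the release notes.
BODY='{
  "team_ids": ["team-alpha", "team-beta"],
  "team_member_permissions": ["/spend/logs"]
}'

# curl -sS -X POST http://localhost:4000/team/permissions_bulk_update \
#   -H "Authorization: Bearer sk-1234" \
#   -H "Content-Type: application/json" \
#   -d "$BODY"

# Validate the payload shape locally before sending
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"
```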

Bugs​

  • Improve input validation on management endpoints - PR #25445
  • Harden file path resolution in skill archive extraction - PR #25475

AI Integrations​

Logging​

  • Ramp
    • Add Ramp as a built-in success callback - PR #23769
  • Langfuse
    • Preserve proxy key-auth metadata on /v1/messages Langfuse traces - PR #25448
  • Prometheus
    • Reduce default LATENCY_BUCKETS from 35 → 18 boundaries (see breaking-change note above) - PR #25527
  • General
    • S3 logging: retry with exponential backoff for transient 503/500 errors - PR #25530
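Enabling the new Ramp callback should follow the usual success_callback pattern in the proxy config. The litellm_settings / success_callback keys are existing LiteLLM config surface; the callback name "ramp" is an assumption based on the PR title, so confirm the registered name against the logging docs.

```shell
# Hypothetical proxy config enabling Ramp as a success callback.
# The callback name "ramp" is assumed from the PR title.
cat > /tmp/litellm_config.yaml <<'EOF'
litellm_settings:
  success_callback: ["ramp"]
EOF

# Then start the proxy against it:
# litellm --config /tmp/litellm_config.yaml

grep -c '"ramp"' /tmp/litellm_config.yaml
```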

Guardrails​

  • Optional skip system message in unified guardrail inputs - PR #25481
  • Inline IAM: apply guardrail support - PR #25241
  • Preserve dict HTTPException.detail and Bedrock context in guardrail errors - PR #25558

Spend Tracking, Budgets and Rate Limiting​

  • Session-TZ-independent date filtering for spend / error log queries - PR #25542
  • Batch-limit stale managed-object cleanup to prevent 300K+ row updates - PR #25258

MCP Gateway​

  • Per-user OAuth token storage for interactive MCP flows - PR #25441
  • Block arbitrary command execution via MCP stdio transport - PR #25343
  • Document missing MCP per-user token environment variables in config_settings - PR #25471

Performance / Loadbalancing / Reliability improvements​

  • Reduce Prometheus latency histogram cardinality (default buckets 35 → 18) - PR #25527
  • S3 retry with exponential backoff for transient errors - PR #25530

Documentation Updates​

  • Add Docker Image Security Guide covering cosign verification and deployment best practices - PR #25439
  • Document April townhall announcements - PR #25537
  • Document missing MCP per-user token env vars - PR #25471
  • Add "Screenshots / Proof of Fix" section to PR template - PR #25564

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.3.rc.1...v1.83.7.rc.1