AI Models & Benchmarks

Source-backed frontier and open-weight model roster - verified June 15, 2026

API / Focus · Modalities / Tools · Notes

Qwen3-235B-A22B
Alibaba · Apr 2025
Qwen3-235B-A22B
Open-source multilingual reasoning and coding
235B total / 22B active MoE
128K tokens
Output: not listed
Text
Tools
Hybrid thinking
Price: not listed
Open source
GA
Apache-2.0
A strong Apache-2.0 model family anchor with hybrid thinking modes, 119-language coverage, and broad local deployment support.
Source: Qwen3 release blog

Claude Fable 5
Anthropic · Jun 2026
claude-fable-5
Demanding reasoning and long-horizon agentic work
Most capable widely released Claude model
1M tokens
Output: 128K tokens
Text
Image
Tools
Adaptive
$10 in / $50 out per 1M tokens
Proprietary
GA
Anthropic's highest-capability generally available Claude tier, positioned for the most demanding reasoning and long-horizon agent tasks.
Source: Anthropic model docs

Claude Opus 4.8
Anthropic · 2026
claude-opus-4-8
Complex reasoning, agentic coding, high-autonomy work
Most capable Opus-tier model
1M tokens (200K on Microsoft Foundry)
Output: 128K tokens
Text
Image
Tools
Adaptive
$5 in / $25 out per 1M tokens
Proprietary
GA
A lower-priced alternative to Fable for complex Claude workloads, with the same first-party 1M context on the Claude API.
Source: Anthropic model docs

Claude Sonnet 4.6
Anthropic · Feb 2026
claude-sonnet-4-6
Balanced intelligence, latency, and price
Fast frontier Claude model
1M tokens
Output: 64K tokens
Text
Image
Tools
Extended and adaptive
$3 in / $15 out per 1M tokens
Proprietary
GA
The strongest Claude price-performance row for teams that need near-frontier intelligence without Opus or Fable pricing.
Source: Anthropic model docs

DeepSeek-V4-Pro
DeepSeek · Apr 2026
deepseek-v4-pro
Open-weight reasoning, STEM, coding, agents
1.6T total / 49B active MoE
1M tokens
Output: 384K tokens
Text
Tools
Thinking and non-thinking
$0.435 in / $0.87 out per 1M tokens
Cache-miss input price shown
Open weights
Preview
Open weights
DeepSeek's open-weight V4 preview flagship, with 1M context, very large output budget, and strong agentic coding claims.
Source: DeepSeek API docs

DeepSeek-V4-Flash
DeepSeek · Apr 2026
deepseek-v4-flash
Low-cost open-weight reasoning
284B total / 13B active MoE
1M tokens
Output: 384K tokens
Text
Tools
Thinking and non-thinking
$0.14 in / $0.28 out per 1M tokens
Cache-miss input price shown
Open weights
Preview
Open weights
An unusually inexpensive long-context model that keeps the V4 1M context and dual thinking modes.
Source: DeepSeek API docs

Gemini 3.1 Pro Preview
Google · Feb 2026
gemini-3.1-pro-preview
Agentic workflows, coding, grounded multi-step execution
Advanced Gemini 3 Pro preview
1,048,576 tokens
Output: 65,536 tokens
Text
Image
Video
Audio
PDF
Tools
Thinking
See Google AI pricing
Proprietary
Preview
A broad multimodal Gemini preview with code execution, search grounding, URL context, structured output, and function calling.
Source: Google model docs

Gemini 3.5 Flash
Google · May 2026
gemini-3.5-flash
Low-latency frontier work at scale
Stable high-throughput Gemini 3 model
1,048,576 tokens
Output: 65,536 tokens
Text
Image
Video
Audio
PDF
Tools
Thinking
See Google AI pricing
Proprietary
GA
Google's stable Gemini 3 Flash row emphasizes sustained frontier performance for agentic loops, coding cycles, and high-volume workflows.
Source: Google model docs

Llama 4 Scout
Meta · Apr 2025
Llama-4-Scout
Long-context open-weight multimodal work
17B active / 109B total MoE
10M tokens
Output: not listed
Text
Image
No first-party tools
General
Price: not listed
Open weights
GA
Llama 4 Community License
The standout open-weight long-context option, with native multimodality, single-H100 efficiency, and a 10M context window.
Source: Meta Llama 4 docs

Llama 4 Maverick
Meta · Apr 2025
Llama-4-Maverick
Open-weight multimodal assistant and chat use
17B active / 400B total MoE
Deployment-dependent
Output: not listed
Text
Image
No first-party tools
General
Price: not listed
Open weights
GA
Llama 4 Community License
Meta's higher-quality open-weight Llama 4 model, optimized for image and text understanding with efficient MoE inference.
Source: Meta Llama 4 blog

Mistral Medium 3.5
Mistral · Apr 2026
mistral-medium-3-5
Agentic and coding use cases
Frontier-class multimodal model
256K tokens
Output: not listed
Text
Image
Tools
General
$1.5 in / $7.5 out per 1M tokens
Open weights
GA
Modified MIT
Mistral's current frontier-class open-weight model, positioned for agentic and coding workloads with a 256K context window.
Source: Mistral model card

Mistral Small 4
Mistral · Mar 2026
mistral-small-2603
Efficient instruct, reasoning, and coding
119B total / 6.5B active MoE
256K tokens
Output: not listed
Text
Image
Tools
Hybrid
$0.15 in / $0.6 out per 1M tokens
Open weights
GA
Open weights
A strong low-cost open-weight Mistral row for high-volume use where price, latency, and 256K context matter.
Source: Mistral model card

Mistral Large 3
Mistral · Dec 2025
mistral-large-2512
Open-weight general-purpose multimodal work
41B active / 675B total MoE
256K tokens
Output: not listed
Text
Image
Tools
General
$0.5 in / $1.5 out per 1M tokens
Open weights
GA
Open weights
Mistral's large open-weight MoE row, useful when a larger general-purpose model is needed but 1M context is not.
Source: Mistral model card

GPT-5.5
OpenAI · 2026
gpt-5.5
Complex reasoning, coding, professional work
Flagship reasoning and coding model
1M tokens
Output: 128K tokens
Text
Image
Tools
Configurable
$5 in / $30 out per 1M tokens
Proprietary
GA
OpenAI's current flagship API model, with 1M context, 128K max output, and first-party tools for web search, file search, functions, and computer use.
Source: OpenAI model docs

GPT-5.4 mini
OpenAI · 2026
gpt-5.4-mini
Cost-sensitive coding and agent workflows
Lower-latency GPT-5.4 variant
400K tokens
Output: 128K tokens
Text
Image
Tools
Configurable
$0.75 in / $4.5 out per 1M tokens
Proprietary
GA
A practical lower-cost OpenAI option that keeps strong tool support, configurable reasoning, and a large output budget.
Source: OpenAI model docs

Grok 4.3
xAI · May 2026
grok-4.3
Agentic tool calling and fast general use
General Grok model
1M tokens
Output: not listed
Text
Image
Tools
Configurable
$1.25 in / $2.5 out per 1M tokens
Proprietary
GA
xAI positions Grok 4.3 as its default general-purpose model, with 1M context, low listed pricing, and agentic tool calling.
Source: xAI model docs

Grok Build 0.1
xAI · May 2026
grok-build-0.1
Agentic coding workflows
Coding model beta
256K tokens
Output: not listed
Text
Tools
Coding-specialized
$1 in / $2 out per 1M tokens
Proprietary
Preview
A dedicated fast coding model for agentic development loops, with a 256K context window and public beta API availability.
Source: xAI model docs

Key Takeaways

Frontier Models Are Agent Platforms
  • The leading proprietary rows now bundle reasoning controls, long context, multimodal input, and tool execution rather than only raw chat completion.
  • OpenAI GPT-5.5, Claude Fable 5, Claude Opus 4.8, Gemini 3.1 Pro, Gemini 3.5 Flash, and Grok 4.3 all need to be compared as workflow platforms, not just language models.
Context And Output Matter
  • 1M-token context is now common among current proprietary frontier models and DeepSeek V4.
  • Output limits differ sharply. Claude Fable 5, Claude Opus 4.8, and GPT-5.5 list 128K max output, while DeepSeek V4 lists a much larger 384K maximum output budget.
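As a rough illustration of why both limits matter, the sketch below checks whether a planned request fits a row's published context and output limits. The limit values are copied from the roster, with "1M" and "128K" read as round decimal numbers. The `Limits` class, the `fits` helper, and the rule that prompt plus output must fit inside the context window are assumptions made for this example; providers differ on how they count output against context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Provider-published token limits for one roster row."""
    context_tokens: int
    max_output_tokens: Optional[int]  # None when the row says "not listed"


# Values copied from the roster; "1M" is treated as 1,000,000 here.
ROSTER_LIMITS = {
    "claude-opus-4-8": Limits(1_000_000, 128_000),
    "deepseek-v4-pro": Limits(1_000_000, 384_000),
    "gpt-5.4-mini": Limits(400_000, 128_000),
    "grok-4.3": Limits(1_000_000, None),
}


def fits(model_id: str, prompt_tokens: int, wanted_output_tokens: int) -> bool:
    """Return True only if the request fits both published limits.

    Assumption: prompt and output share the context window. Rows with an
    unlisted output limit are treated as "not verified", so they return False.
    """
    limits = ROSTER_LIMITS[model_id]
    if limits.max_output_tokens is None:
        return False
    if wanted_output_tokens > limits.max_output_tokens:
        return False
    return prompt_tokens + wanted_output_tokens <= limits.context_tokens


# A 300K-token prompt that needs a 200K-token answer:
print(fits("deepseek-v4-pro", 300_000, 200_000))   # True
print(fits("claude-opus-4-8", 300_000, 200_000))   # False: exceeds 128K output
```

Treating an unlisted output limit as a failed check is a deliberately conservative choice; a looser policy could fall back to the context limit instead.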
Open Models Need Better Labels
  • The table now separates proprietary, open-weight, and open-source access because those terms carry different commercial and redistribution rights.
  • Mistral, DeepSeek, Meta, and Qwen remain strategically important, but license terms and deployment limits should be checked row by row.
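One way to keep the three access labels from blurring together is to store them as a closed set of values instead of free text. The sketch below is only an illustration: the `Access` enum, the `ROWS` mapping, and `needs_license_review` are hypothetical names, and the label and license strings are copied from a few roster rows.

```python
from enum import Enum


class Access(Enum):
    PROPRIETARY = "Proprietary"
    OPEN_WEIGHTS = "Open weights"
    OPEN_SOURCE = "Open source"


# (access label, license string) as shown in the roster; None means no
# license cell is shown for that row.
ROWS = {
    "Qwen3-235B-A22B": (Access.OPEN_SOURCE, "Apache-2.0"),
    "Llama 4 Scout": (Access.OPEN_WEIGHTS, "Llama 4 Community License"),
    "Mistral Medium 3.5": (Access.OPEN_WEIGHTS, "Modified MIT"),
    "GPT-5.5": (Access.PROPRIETARY, None),
}


def needs_license_review(model: str) -> bool:
    """Flag rows whose weights are downloadable but not labeled open source.

    Assumption for this sketch: open-weight rows are the ones where custom
    license terms are most likely to restrict redistribution or derivatives.
    """
    access, _license = ROWS[model]
    return access is Access.OPEN_WEIGHTS


for name in ROWS:
    print(name, needs_license_review(name))
```

Proprietary rows still carry terms of service that deserve review; this helper only isolates the open-weight case that the labels are meant to call out.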
Monthly Updates Need Guardrails
  • Provider model APIs can list IDs, but they usually do not provide licensing, benchmark context, modality nuance, or procurement-ready caveats.
  • A useful monthly system should audit freshness, require sources, and keep human review in the loop for claims that vendors publish as marketing copy.
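The freshness and source checks described above could be automated along these lines. This is a minimal sketch under stated assumptions: `RosterRow`, `audit`, and the 31-day threshold are hypothetical, and the second row in the usage example is invented purely to show a missing-source finding. The findings are meant to be handed to a human reviewer, not acted on automatically.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RosterRow:
    name: str
    source: Optional[str]  # e.g. "OpenAI model docs"; None if no source is recorded
    verified_on: date      # last date a human checked the row


def audit(rows: List[RosterRow], today: date, max_age_days: int = 31) -> List[str]:
    """Return human-readable findings for rows that need review."""
    findings = []
    for row in rows:
        if not row.source:
            findings.append(f"{row.name}: missing source")
        age = (today - row.verified_on).days
        if age > max_age_days:
            findings.append(f"{row.name}: stale ({age} days since review)")
    return findings


rows = [
    RosterRow("GPT-5.5", "OpenAI model docs", date(2026, 6, 15)),
    RosterRow("Example row", None, date(2026, 7, 10)),  # hypothetical entry
]
for finding in audit(rows, today=date(2026, 7, 20)):
    print(finding)
# GPT-5.5: stale (35 days since review)
# Example row: missing source
```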

* API / Focus shows the callable model ID when the provider publishes one, plus the primary workload the model is positioned for.

† Context / Output uses provider-published limits. Some cloud platforms enforce smaller limits than first-party APIs.

‡ Price is list API price per 1M tokens when published. Missing prices should be checked against the linked source before procurement decisions.
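To make the per-1M-token convention concrete, here is a small worked calculation using list prices from the roster. The `request_cost_usd` helper is a hypothetical name, and the sketch ignores caching, batch discounts, and any platform markup, all of which can change the real bill.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_in_per_m: float,
    usd_out_per_m: float,
) -> float:
    """List-price cost of one request, with prices quoted per 1M tokens."""
    return (input_tokens * usd_in_per_m + output_tokens * usd_out_per_m) / 1_000_000


# 200,000 input tokens and 10,000 output tokens at roster list prices:
print(request_cost_usd(200_000, 10_000, 5, 25))   # Claude Opus 4.8   -> 1.25
print(request_cost_usd(200_000, 10_000, 3, 15))   # Claude Sonnet 4.6 -> 0.75
```

Because output prices are several times higher than input prices in most rows, output-heavy workloads can cost more than the input price alone suggests.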

§ Open weights does not always mean OSI open source. Check each license before commercial use, redistribution, fine-tuning, or derivative model release.