AI Models & Benchmarks

Source-backed frontier and open-weight model roster - verified June 15, 2026

Model	API / Focus	Context / Output	Modalities / Tools	Pricing	Access / Status / License	Notes
Qwen3-235B-A22B (Alibaba · Apr 2025)	`Qwen3-235B-A22B`; Open-source multilingual reasoning and coding; 235B total / 22B active MoE	128K tokens; Output: not listed	Text, Tools; Hybrid thinking	Not listed	Open source; GA; Apache-2.0	A strong Apache-2.0 model family anchor with hybrid thinking modes, 119-language coverage, and broad local deployment support. Source: Qwen3 release blog
Claude Fable 5 (Anthropic · Jun 2026)	`claude-fable-5`; Demanding reasoning and long-horizon agentic work; Most capable widely released Claude model	1M tokens; Output: 128K tokens	Text, Image, Tools; Adaptive	$10 in / $50 out per 1M tokens	Proprietary; GA	Anthropic's highest-capability generally available Claude tier, positioned for the most demanding reasoning and long-horizon agent tasks. Source: Anthropic model docs
Claude Opus 4.8 (Anthropic · 2026)	`claude-opus-4-8`; Complex reasoning, agentic coding, high-autonomy work; Most capable Opus-tier model	1M tokens (200K on Microsoft Foundry); Output: 128K tokens	Text, Image, Tools; Adaptive	$5 in / $25 out per 1M tokens	Proprietary; GA	A lower-priced alternative to Fable for complex Claude workloads, with the same first-party 1M context on Claude API. Source: Anthropic model docs
Claude Sonnet 4.6 (Anthropic · Feb 2026)	`claude-sonnet-4-6`; Balanced intelligence, latency, and price; Fast frontier Claude model	1M tokens; Output: 64K tokens	Text, Image, Tools; Extended and adaptive	$3 in / $15 out per 1M tokens	Proprietary; GA	The strongest Claude price-performance row for teams that need near-frontier intelligence without Opus or Fable pricing. Source: Anthropic model docs
DeepSeek-V4-Pro (DeepSeek · Apr 2026)	`deepseek-v4-pro`; Open-weight reasoning, STEM, coding, agents; 1.6T total / 49B active MoE	1M tokens; Output: 384K tokens	Text, Tools; Thinking and non-thinking	$0.435 in / $0.87 out per 1M tokens; cache-miss input price shown	Open weights; Preview; Open weights	DeepSeek's open-weight V4 preview flagship, with 1M context, very large output budget, and strong agentic coding claims. Source: DeepSeek API docs
DeepSeek-V4-Flash (DeepSeek · Apr 2026)	`deepseek-v4-flash`; Low-cost open-weight reasoning; 284B total / 13B active MoE	1M tokens; Output: 384K tokens	Text, Tools; Thinking and non-thinking	$0.14 in / $0.28 out per 1M tokens; cache-miss input price shown	Open weights; Preview; Open weights	An unusually inexpensive long-context model that keeps the V4 1M context and dual thinking modes. Source: DeepSeek API docs
Gemini 3.1 Pro Preview (Google · Feb 2026)	`gemini-3.1-pro-preview`; Agentic workflows, coding, grounded multi-step execution; Advanced Gemini 3 Pro preview	1,048,576 tokens; Output: 65,536 tokens	Text, Image, Video, Audio, PDF, Tools; Thinking	See Google AI pricing	Proprietary; Preview	A broad multimodal Gemini preview with code execution, search grounding, URL context, structured output, and function calling. Source: Google model docs
Gemini 3.5 Flash (Google · May 2026)	`gemini-3.5-flash`; Low-latency frontier work at scale; Stable high-throughput Gemini 3 model	1,048,576 tokens; Output: 65,536 tokens	Text, Image, Video, Audio, PDF, Tools; Thinking	See Google AI pricing	Proprietary; GA	Google's stable Gemini 3 Flash row emphasizes sustained frontier performance for agentic loops, coding cycles, and high-volume workflows. Source: Google model docs
Llama 4 Scout (Meta · Apr 2025)	`Llama-4-Scout`; Long-context open-weight multimodal work; 17B active / 109B total MoE	10M tokens; Output: not listed	Text, Image; No first-party tools; General	Not listed	Open weights; GA; Llama 4 Community License	The standout open-weight long-context option, with native multimodality, single-H100 efficiency, and a 10M context window. Source: Meta Llama 4 docs
Llama 4 Maverick (Meta · Apr 2025)	`Llama-4-Maverick`; Open-weight multimodal assistant and chat use; 17B active / 400B total MoE	Deployment-dependent; Output: not listed	Text, Image; No first-party tools; General	Not listed	Open weights; GA; Llama 4 Community License	Meta's higher-quality open-weight Llama 4 model, optimized for image and text understanding with efficient MoE inference. Source: Meta Llama 4 blog
Mistral Medium 3.5 (Mistral · Apr 2026)	`mistral-medium-3-5`; Agentic and coding use cases; Frontier-class multimodal model	256K tokens; Output: not listed	Text, Image, Tools; General	$1.5 in / $7.5 out per 1M tokens	Open weights; GA; Modified MIT	Mistral's current frontier-class open-weight model, positioned for agentic and coding workloads with a 256K context window. Source: Mistral model card
Mistral Small 4 (Mistral · Mar 2026)	`mistral-small-2603`; Efficient instruct, reasoning, and coding; 119B total / 6.5B active MoE	256K tokens; Output: not listed	Text, Image, Tools; Hybrid	$0.15 in / $0.6 out per 1M tokens	Open weights; GA; Open weights	A strong low-cost open-weight Mistral row for high-volume use where price, latency, and 256K context matter. Source: Mistral model card
Mistral Large 3 (Mistral · Dec 2025)	`mistral-large-2512`; Open-weight general-purpose multimodal work; 41B active / 675B total MoE	256K tokens; Output: not listed	Text, Image, Tools; General	$0.5 in / $1.5 out per 1M tokens	Open weights; GA; Open weights	Mistral's large open-weight MoE row, useful when a larger general-purpose model is needed but 1M context is not. Source: Mistral model card
GPT-5.5 (OpenAI · 2026)	`gpt-5.5`; Complex reasoning, coding, professional work; Flagship reasoning and coding model	1M tokens; Output: 128K tokens	Text, Image, Tools; Configurable	$5 in / $30 out per 1M tokens	Proprietary; GA	OpenAI's current flagship API model, with 1M context, 128K max output, and first-party tools for web search, file search, functions, and computer use. Source: OpenAI model docs
GPT-5.4 mini (OpenAI · 2026)	`gpt-5.4-mini`; Cost-sensitive coding and agent workflows; Lower-latency GPT-5.4 variant	400K tokens; Output: 128K tokens	Text, Image, Tools; Configurable	$0.75 in / $4.5 out per 1M tokens	Proprietary; GA	A practical lower-cost OpenAI option that keeps strong tool support, configurable reasoning, and a large output budget. Source: OpenAI model docs
Grok 4.3 (xAI · May 2026)	`grok-4.3`; Agentic tool calling and fast general use; General Grok model	1M tokens; Output: not listed	Text, Image, Tools; Configurable	$1.25 in / $2.5 out per 1M tokens	Proprietary; GA	xAI positions Grok 4.3 as its default general-purpose model, with 1M context, low listed pricing, and agentic tool calling. Source: xAI model docs
Grok Build 0.1 (xAI · May 2026)	`grok-build-0.1`; Agentic coding workflows; Coding model beta	256K tokens; Output: not listed	Text, Tools; Coding-specialized	$1 in / $2 out per 1M tokens	Proprietary; Preview	A dedicated fast coding model for agentic development loops, with a 256K context window and public beta API availability. Source: xAI model docs
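
The per-1M-token prices in the table convert to a per-request cost with simple arithmetic. The sketch below is illustrative only: the `request_cost` helper and the workload of 200,000 input tokens and 20,000 output tokens are assumptions, and only the prices are taken from the rows above. It ignores caching, batch discounts, and any tiered long-context pricing a provider may apply.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens as (input, output), copied from the table.
# The DeepSeek figure is the cache-miss input price, as the table notes.
PRICES = {
    "claude-fable-5": (Decimal("10"), Decimal("50")),
    "claude-opus-4-8": (Decimal("5"), Decimal("25")),
    "claude-sonnet-4-6": (Decimal("3"), Decimal("15")),
    "deepseek-v4-flash": (Decimal("0.14"), Decimal("0.28")),
}

MILLION = Decimal(1_000_000)


def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the listed per-1M-token prices."""
    price_in, price_out = PRICES[model_id]
    return (price_in * input_tokens + price_out * output_tokens) / MILLION


if __name__ == "__main__":
    # Hypothetical workload: 200K input tokens and 20K output tokens per request.
    for model_id in PRICES:
        cost = request_cost(model_id, 200_000, 20_000)
        print(f"{model_id}: ${cost:.4f}")
```

Under that assumed workload, Claude Fable 5 comes to $3.00 per request and Claude Opus 4.8 to $1.50, which is the 2x gap implied by the listed prices.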


Key Takeaways

Frontier Models Are Agent Platforms

The leading proprietary rows now bundle reasoning controls, long context, multimodal input, and tool execution rather than only raw chat completion.
OpenAI GPT-5.5, Claude Fable 5, Claude Opus 4.8, Gemini 3.1 Pro, Gemini 3.5 Flash, and Grok 4.3 all need to be compared as workflow platforms, not just language models.

Context And Output Matter

1M-token context is now common among current proprietary frontier models and DeepSeek V4.
Output limits differ sharply. Claude Fable 5, Claude Opus 4.8, and GPT-5.5 list 128K max output, while DeepSeek V4 lists a much larger 384K maximum output budget.
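
One way to act on these limits is to filter the roster by the prompt size and output size a job needs. The sketch below makes several assumptions: `Limits`, `ROSTER`, and `candidates` are hypothetical names, "1M" and "128K" are rounded to 1,000,000 and 128,000 tokens, the prompt and the output are checked against their limits separately (providers differ on whether output counts against the context window), and rows with an unlisted output limit are excluded rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Limits:
    model_id: str
    context_tokens: int
    max_output_tokens: Optional[int]  # None when the table says "not listed"


# A subset of the table rows, with K and M rounded to powers of ten.
ROSTER = [
    Limits("claude-fable-5", 1_000_000, 128_000),
    Limits("claude-opus-4-8", 1_000_000, 128_000),
    Limits("claude-sonnet-4-6", 1_000_000, 64_000),
    Limits("deepseek-v4-pro", 1_000_000, 384_000),
    Limits("gpt-5.5", 1_000_000, 128_000),
    Limits("gpt-5.4-mini", 400_000, 128_000),
    Limits("grok-4.3", 1_000_000, None),
]


def candidates(prompt_tokens: int, output_tokens: int) -> List[Limits]:
    """Return rows whose listed limits cover both the prompt and the output."""
    return [
        row
        for row in ROSTER
        if row.max_output_tokens is not None
        and prompt_tokens <= row.context_tokens
        and output_tokens <= row.max_output_tokens
    ]


if __name__ == "__main__":
    # A job that needs a 200K-token output leaves only the DeepSeek V4 row.
    print([row.model_id for row in candidates(500_000, 200_000)])
```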

Open Models Need Better Labels

The table now separates proprietary, open-weight, and open-source access because those terms carry different commercial and redistribution rights.
Mistral, DeepSeek, Meta, and Qwen remain strategically important, but license terms and deployment limits should be checked row by row.
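
The three access labels can be modeled as an explicit type so that a row-by-row license check is mechanical. This is a sketch under stated assumptions: `Access`, `LicenseRow`, and `needs_license_check` are hypothetical names, and the rule applied here (flag any open row whose license cell is empty or only repeats its access label) is one possible heuristic, not a legal test.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    PROPRIETARY = "Proprietary"
    OPEN_WEIGHTS = "Open weights"
    OPEN_SOURCE = "Open source"


@dataclass(frozen=True)
class LicenseRow:
    model_id: str
    access: Access
    license_name: str  # as listed in the table; empty for proprietary rows


# A subset of the table rows.
ROWS = [
    LicenseRow("Qwen3-235B-A22B", Access.OPEN_SOURCE, "Apache-2.0"),
    LicenseRow("deepseek-v4-pro", Access.OPEN_WEIGHTS, "Open weights"),
    LicenseRow("Llama-4-Scout", Access.OPEN_WEIGHTS, "Llama 4 Community License"),
    LicenseRow("mistral-medium-3-5", Access.OPEN_WEIGHTS, "Modified MIT"),
    LicenseRow("mistral-small-2603", Access.OPEN_WEIGHTS, "Open weights"),
    LicenseRow("gpt-5.5", Access.PROPRIETARY, ""),
]


def needs_license_check(row: LicenseRow) -> bool:
    """Flag open rows whose license cell names no specific license."""
    if row.access is Access.PROPRIETARY:
        return False
    listed = row.license_name.strip().lower()
    return listed in {"", row.access.value.lower()}


if __name__ == "__main__":
    for row in ROWS:
        if needs_license_check(row):
            print(f"{row.model_id}: confirm license terms at the source")
```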

Monthly Updates Need Guardrails

Provider model APIs can list IDs, but they usually do not provide licensing, benchmark context, modality nuance, or procurement-ready caveats.
A useful monthly system should audit freshness, require sources, and keep human review in the loop for claims that vendors publish as marketing copy.
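
The three guardrails named above (freshness, sources, human review) could be sketched as a per-row audit. Everything here is an assumption for illustration: the `RosterEntry` fields, the `audit` function, the 31-day freshness window, and the `vendor_claims` flag that a reviewer would set when a note repeats vendor positioning.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class RosterEntry:
    model_id: str
    source: Optional[str]   # e.g. "xAI model docs"; None if no source recorded
    verified_on: date       # date the row was last checked against its source
    vendor_claims: bool     # True when the notes repeat vendor marketing copy


def audit(entry: RosterEntry, today: date,
          max_age: timedelta = timedelta(days=31)) -> List[str]:
    """Return the list of guardrail issues for one roster row."""
    issues = []
    if today - entry.verified_on > max_age:
        issues.append("stale")
    if not entry.source:
        issues.append("missing source")
    if entry.vendor_claims:
        issues.append("needs human review")
    return issues


if __name__ == "__main__":
    entry = RosterEntry("grok-4.3", "xAI model docs", date(2026, 6, 15), True)
    # 35 days after verification, so the row is stale and still needs review.
    print(audit(entry, today=date(2026, 7, 20)))
```

Returning a list of issues rather than a single pass/fail flag keeps the human reviewer informed about which guardrail tripped for each row.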