GPT 5.6 Launch, Government Curbs, and Jalapeno Chip

6/28/2026 · 17 min read


Introduction

OpenAI has launched GPT 5.6, but this is not a normal model release. The new model family is available only to a small group of trusted partners, with the US government directly involved in limiting initial access. That makes GPT 5.6 more than OpenAI's next frontier lineup: it is one of the clearest signals yet that advanced AI models are increasingly being treated as strategic technology that governments want to review before the public gets full access.

At the same time, OpenAI unveiled its first custom AI chip, Jalapeno, built with Broadcom, marking a major step toward reducing dependence on NVIDIA hardware for inference workloads. Together, these two stories show an industry entering a new phase: one where both government oversight and hardware independence are becoming central strategic concerns.

The Unusual Launch: Government-Approved Access

According to Axios, the initial launch of GPT 5.6 is limited to approximately 20 companies, and the list of those partners was shared with the US government as part of the approval process. Broader access for ChatGPT, Codex, and API users is planned in the coming weeks, but this initial rollout is anything but business as usual.

OpenAI confirmed that the US government directly asked the company to limit access to the new model family, and that it agreed. The company stated that the restriction was not simply a matter of server capacity. This represents a significant shift in how frontier AI models are being managed.

The lineup includes three models. The flagship is Soul, positioned as the most capable reasoning model in the family. Terra is the balanced everyday model for general-purpose work. Luna is the faster, cheaper option for lower-cost workloads. Access begins through the API and Codex but remains tightly controlled during the preview period.

The Precedent Problem: From Anthropic to OpenAI

This launch comes directly on the heels of the Anthropic situation. The Trump administration had already moved against Anthropic's most powerful models, including Fable 5 and Mythos 5. The government reportedly pushed restrictions that would have removed access for foreign nationals, and Anthropic ended up taking Fable 5 down entirely. That triggered a broader debate about whether Washington is protecting national security or quietly creating an unofficial licensing system for frontier AI.

The concern many observers have raised is about precedent. Once the government shows it can step in, restrict access to a frontier model, and force companies to change their launch plans without giving the public a clear explanation, it can do so again. What might look like a one-time safety decision can become the new normal, where every major model launch becomes a closed-door negotiation, and users, developers, and even companies may not know the actual rules until a model is already delayed, restricted, or removed.

Dean Ball, a former White House AI adviser who is reportedly joining OpenAI, has argued that Trump's executive order has created something like a de facto involuntary licensing regime. The order asks certain AI companies to voluntarily submit their most advanced models for government review up to 30 days before release. But once that process becomes expected, it may not feel voluntary anymore. Without clear safety standards, companies could face delays that hurt US competitiveness, slow product launches, and weaken the case for billions in AI infrastructure spending.

OpenAI is cooperating but has been clear that this should not become the long-term default. The company stated that this government access process should not persist because it keeps the best tools away from users, developers, enterprises, cyber defenders, and global partners. OpenAI characterized the limited preview as a short-term step toward broader availability while it works with the administration on a cyber executive order framework and a repeatable process for future releases.

GPT 5.6 Soul: Capabilities and Benchmarks

OpenAI says Soul is its strongest model in the family with major gains in agentic coding, biology workflows, and cybersecurity. These are precisely the areas where advanced AI can be both extremely useful and highly sensitive. A model that helps security teams find vulnerabilities is valuable, but a model that helps chain exploits together is dangerous. A model that speeds up biology research is valuable, but one that lowers the barrier for harmful biological or chemical work is dangerous.

Soul adds a new max reasoning effort level, enabling deeper thinking for harder problems. It also introduces ultra mode, where the system can use coordinated sub-agents to solve complex tasks beyond what a single-agent setup can handle. Instead of one model trying to do everything in a single thread, the system divides the work across multiple agents and coordinates the result. This is powerful for coding and technical work, though token usage can grow sharply because every sub-agent consumes its own tokens.
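
The divide-and-coordinate pattern described above can be illustrated with a minimal sketch. This is not OpenAI's implementation of ultra mode, whose internals are not public: `run_subagent`, `coordinate`, and the token accounting are hypothetical stand-ins, and the "model call" is a stub that returns a canned string.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubResult:
    task: str
    output: str
    tokens_used: int


async def run_subagent(task: str) -> SubResult:
    # Hypothetical stand-in for a real model call; no API is contacted.
    await asyncio.sleep(0)
    output = f"done: {task}"
    # Made-up accounting: pretend each word of output cost 100 tokens.
    return SubResult(task, output, tokens_used=len(output.split()) * 100)


async def coordinate(subtasks: list[str]) -> tuple[str, int]:
    # Fan out: run every sub-agent concurrently. gather() returns results
    # in the same order as the inputs.
    results = await asyncio.gather(*(run_subagent(t) for t in subtasks))
    # Fan in: merge the outputs and add up token usage across all agents.
    merged = "\n".join(r.output for r in results)
    total_tokens = sum(r.tokens_used for r in results)
    return merged, total_tokens


def solve(subtasks: list[str]) -> tuple[str, int]:
    return asyncio.run(coordinate(subtasks))


if __name__ == "__main__":
    merged, total = solve(["write parser", "write tests"])
    print(merged)
    print("total tokens:", total)
```

The point of the sketch is the accounting: total usage is the sum over all sub-agents, which is why multi-agent runs can cost far more than a single thread doing the same job.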

On benchmarks, OpenAI says Soul sets a new state-of-the-art on Terminal Bench 2.1 for terminal-based coding and agentic workflows. It also shows stronger SWE-Bench v1 results than GPT 5.5 while using fewer tokens. TechCrunch reported that Soul is slightly better than Anthropic's Claude Mythos 5 on coding workflows and competitive with the Mythos preview while using around a third of the output tokens. Lower token consumption for equivalent intelligence makes agents cheaper and easier to run at scale.

The Full Model Lineup: Soul, Terra, and Luna

Model | Role | Target Use Case
Soul | Flagship reasoning model | Complex reasoning, agentic coding, security, biology
Terra | Balanced everyday model | General-purpose tasks, production workloads
Luna | Fast, cost-efficient model | High-volume, lower-cost workloads

Terra serves as the balanced option for teams that need strong performance without the premium cost of Soul. Luna targets high-volume deployment scenarios where speed and cost efficiency matter more than maximum capability. Together, the three models cover the spectrum from low-latency, low-cost inference through to deep reasoning with multi-agent coordination.

Pricing and Token Economics

Model | Input (per 1M tokens) | Output (per 1M tokens)
Soul | $5.00 | $30.00
Terra | $2.50 | $15.00
Luna | $1.00 | $6.00

Prompt caching has also been updated for GPT 5.6. The new system includes explicit cache break points, a 30-minute minimum cache life, cache writes at 1.25 times the uncached input rate, and cache reads with a 90% cached input discount. These changes make repeated prompts significantly cheaper and more predictable, which matters for agents that reuse the same context across multiple interactions.
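
To make the caching arithmetic concrete, here is a small cost sketch using the list prices and cache multipliers quoted above. How the billing buckets combine is an assumption: the sketch treats every input token as exactly one of uncached, cache write, or cache read, and assumes all turns land within the cache lifetime. The function names and the example workload are hypothetical.

```python
# Per-1M-token list prices (input, output) in USD, as quoted in the article.
PRICES = {
    "soul": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes: 1.25x the uncached input rate
CACHE_READ_MULTIPLIER = 0.10   # cache reads: 90% discount on the input rate


def request_cost(model, uncached_input=0, cache_write=0, cache_read=0, output=0):
    """Return the USD cost of one request, given token counts per bucket."""
    input_rate, output_rate = PRICES[model]
    # Assumption: each input token falls in exactly one billing bucket.
    billable_input = (
        uncached_input
        + cache_write * CACHE_WRITE_MULTIPLIER
        + cache_read * CACHE_READ_MULTIPLIER
    )
    return (billable_input * input_rate + output * output_rate) / 1_000_000


def agent_loop_cost(model, context_tokens, output_per_turn, turns, cached):
    """Cost of `turns` (>= 1) requests that all resend the same context prefix."""
    if not cached:
        return turns * request_cost(
            model, uncached_input=context_tokens, output=output_per_turn
        )
    # First turn writes the prefix to the cache; later turns read it back.
    first = request_cost(model, cache_write=context_tokens, output=output_per_turn)
    later = request_cost(model, cache_read=context_tokens, output=output_per_turn)
    return first + (turns - 1) * later


if __name__ == "__main__":
    for cached in (False, True):
        cost = agent_loop_cost("soul", 100_000, 2_000, turns=10, cached=cached)
        print(f"cached={cached}: ${cost:.3f}")
```

For this hypothetical workload (a 100,000-token context reused over 10 turns on Soul, with 2,000 output tokens per turn), the sketch gives roughly $5.60 without caching and roughly $1.68 with it. Output tokens are billed the same either way, so the saving comes entirely from the input side.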

There is also a speed angle. OpenAI plans to launch GPT 5.6 Soul on Cerebras hardware in July for select customers with speeds up to 750 tokens per second. For coding agents, security tools, and long workflows involving multiple sub-agents, inference speed can become a real product advantage.
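
A back-of-the-envelope calculation shows why decode speed matters for long agent runs. The sketch below only divides output tokens by throughput. The step sizes and the 100 tokens-per-second comparison rate are hypothetical, and it ignores prompt processing, tool calls, and network latency. The 750 figure is the "up to" number quoted above, not a guaranteed rate.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate `output_tokens` at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def workflow_seconds(step_output_tokens, tokens_per_second: float) -> float:
    """Total decode time for sequential agent steps (generation only)."""
    return sum(decode_seconds(n, tokens_per_second) for n in step_output_tokens)


if __name__ == "__main__":
    steps = [6_000, 9_000, 15_000]  # hypothetical output tokens per step
    for rate in (100, 750):         # 100 tok/s is a made-up comparison point
        print(f"{rate} tok/s: {workflow_seconds(steps, rate):.0f} s")
```

Under these assumptions, a run that emits 30,000 output tokens spends 40 seconds decoding at 750 tokens per second, versus 300 seconds at the hypothetical 100 tokens per second.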

Safety Classification and Testing Scale

OpenAI says Soul, Terra, and Luna are classified as high capability in both cybersecurity and biological and chemical risk under its preparedness framework. However, they do not reach the high threshold for AI self-improvement risk, and Soul does not hit the cyber critical threshold.

In browser exploit tests involving Chromium and Firefox, Soul identified bugs and exploitation primitives, but OpenAI says it did not autonomously produce a full-chain exploit under the tested conditions. The company's main argument is that GPT 5.6 Soul is better at helping people find and fix vulnerabilities than reliably carrying out end-to-end attacks. OpenAI expects major benefits for legitimate defensive work while meaningfully constraining prohibited offensive use.

To support that claim, OpenAI describes multiple safety layers: model-level refusal behavior, real-time misuse classifiers for cyber and biology, account-level review, differentiated access, monitoring and enforcement, and ongoing testing. Some preview users may see blocked requests or slower responses when generation is paused for extra review, particularly in dual-use security situations where defensive and offensive work can look very similar at the outset.

OpenAI says the safety behavior is built directly into the core model rather than added as a separate filter. This appears to be a deliberate response to the criticism Anthropic faced with Fable 5. In the brief window when Fable 5 was available, if its classifiers detected high-risk areas like cybersecurity, biology, or chemistry, it would route the user to an older model instead of simply refusing the request. That invisible down-routing created false positives, made the product feel inconsistent, and caused backlash from users who felt they were not getting the model they were promised.

The testing scale is substantial. OpenAI reports using more than 700,000 A100-equivalent GPU hours for automated red teaming focused on universal jailbreaks, along with human expert testing and third-party evaluations. Testing will continue during the preview period, and OpenAI plans to publish an updated system card when the GPT 5.6 family moves closer to general availability.

The Government Framework and What Comes Next

OpenAI had apparently been previewing GPT 5.6 with the government for about a month, including a meeting CEO Sam Altman had with the White House in early June. OpenAI expected some form of staggered release, but according to Axios, it did not expect restrictions as severe as the government approving each customer and limiting the first launch to around 20 partners. The company says the government knows it wants to launch more broadly very soon and supports those plans unless new concerns surface during additional testing.

The next major date is August. Under the executive order, the administration is expected to establish a classified process to assess AI model cyber capabilities and decide which systems qualify as covered frontier models. This means GPT 5.6 is landing in a messy transition period where the government is clearly asserting more influence, but the actual rules are still not fully defined.

Jalapeno: OpenAI's First Custom AI Chip

Alongside the GPT 5.6 launch, OpenAI unveiled Jalapeno, its first custom AI chip built with Broadcom. The chip was announced on June 24 and marks OpenAI's first serious move into custom silicon. Crucially, this chip is not for training frontier models. It is designed for inference, the expensive work of running trained models every time someone uses ChatGPT, Codex, or the API.

This distinction matters because OpenAI's cost problem is not just about training the next giant model. It is about serving users after the model is trained. Every ChatGPT message, every Codex task, every long-running agent workflow requires inference. And as AI products become more agentic, inference becomes more expensive because the model is planning, calling tools, writing code, checking results, and revising.

Jalapeno is an ASIC, meaning it is designed for a specific workload rather than being a general-purpose GPU. OpenAI handled the core chip design. Broadcom contributed silicon manufacturing, connectivity technology including Tomahawk networking chips, and production expertise. Celestica will build the server systems.

Detail | Information
Name | Jalapeno
Type | ASIC for LLM inference
Partners | Broadcom (design, manufacturing, networking), Celestica (server systems)
Cost savings vs GPUs | ~50% per Broadcom CEO
Development time | 9 months (design to tape-out)
Status | Running workloads at production power levels in lab
Initial deployment | Late 2026 (small-scale prototype)
Production ramp | 2027
Full-scale deployment | First half of 2028
Training chip plans | Under consideration

How Jalapeno Changes OpenAI's Cost Structure

Broadcom CEO Hock Tan stated that early testing shows roughly 50% cost savings compared with standard AI GPUs for inference workloads. OpenAI confirms the chip is already running workloads in its labs at production target power levels, including GPT 5.3 and Codex Spark.

The reason OpenAI needed this is straightforward: cost and supply. OpenAI has relied heavily on NVIDIA GPUs for both training and inference. But as ChatGPT usage grows and Codex becomes more important, inference is consuming a growing share of the compute budget. Reuters reported earlier this year that OpenAI had become dissatisfied with how quickly NVIDIA hardware could process certain inference workloads, especially for Codex. Internal teams reportedly tied some Codex performance limits to GPU-based architecture limitations.

Jalapeno does not replace NVIDIA for training. NVIDIA is still expected to train OpenAI's frontier models. But Jalapeno attacks the side of the stack where OpenAI can get immediate savings: running models for users. If OpenAI can cut inference costs by approximately 50%, that changes what it can afford to offer across ChatGPT, Codex, and the API. Lower inference costs mean more generous free tiers, higher rate limits, and more economically viable agentic workflows.
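
How much the overall inference bill falls depends on how much traffic actually moves to the new chip. The sketch below blends the two cost levels. The 50% saving is the early-testing figure attributed to Broadcom's CEO, while the baseline budget and the share values are hypothetical and are not OpenAI's numbers.

```python
def blended_cost(baseline_cost: float, share_on_custom_chip: float,
                 savings: float = 0.50) -> float:
    """Inference cost after moving a share of the workload to a cheaper chip.

    baseline_cost: cost if everything ran on GPUs (any currency unit).
    share_on_custom_chip: fraction of the workload moved, between 0 and 1.
    savings: fractional cost reduction for the moved share (0.50 = 50%).
    """
    if not 0.0 <= share_on_custom_chip <= 1.0:
        raise ValueError("share_on_custom_chip must be between 0 and 1")
    if not 0.0 <= savings <= 1.0:
        raise ValueError("savings must be between 0 and 1")
    return baseline_cost * (1.0 - share_on_custom_chip * savings)


if __name__ == "__main__":
    # Hypothetical shares, loosely mirroring a staged rollout.
    for share in (0.05, 0.40, 1.00):
        print(f"share={share:.0%}: cost={blended_cost(100.0, share):.1f}")
```

With a baseline of 100 units, moving 40% of inference to a chip that is 50% cheaper lowers the total to 80 units. The full 50% reduction only appears once the entire inference workload has moved, which is why the staged deployment schedule matters for when the savings arrive.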

The Custom Chip Landscape: Who Is Building What

OpenAI is not alone in pursuing custom silicon. Every major AI company is now investing in its own chips:

  • Google: TPUs, now in the seventh generation with Ironwood, used for much of Gemini and internal AI workloads
  • Amazon: Over 1 million Trainium processors deployed through AWS, reportedly exploring direct sales to outside customers
  • Microsoft: Launched the Maia 200 accelerator in early 2026 for GPT 5.2 models and Microsoft 365 Copilot on Azure
  • Meta: MTIA for internal workloads
  • ByteDance: Reportedly developing custom GPUs manufactured by TSMC with plans to scale to 350,000 units, partly because US export restrictions limit access to NVIDIA's most advanced hardware

The market is moving quickly. Industry data cited by Tom's Hardware says custom ASIC shipments are projected to grow 44.6% year-over-year in 2026 compared with 16.1% growth for standard GPUs. ASIC-based servers are expected to reach 27.8% of the AI server market this year.

The threat to NVIDIA is not that any single custom chip suddenly beats its best GPU on one benchmark. The threat is that the largest AI spenders are all building serious internal alternatives simultaneously. When Google, Amazon, Microsoft, Meta, and OpenAI all have their own inference silicon, NVIDIA's dominance in that segment faces a structural challenge.

AI-Assisted Chip Design and the Feedback Loop

Perhaps the most important detail about Jalapeno is how OpenAI says it was designed. The company used its own AI models to accelerate parts of the chip design and optimization process. Greg Brockman stated that the degree to which the models accelerated the timeline was very surprising and described cases where the models took components that human engineers had already optimized and found further improvements by applying massive compute to the design process.

This creates a powerful feedback loop: AI models help design better inference hardware. Better inference hardware makes AI models cheaper and faster to run. Cheaper models support more users, more agents, more revenue, and more investment for the next chip generation. If this loop keeps working, companies that control the full stack from models to software to hardware could gain a serious advantage over companies that build only one layer.

Deployment Timeline and Future Ambitions

Deployment is staged. Broadcom says small-scale prototype deployment is planned for late 2026. A significant production ramp is expected in 2027, with full-scale deployment targeted for the first half of 2028. The Decoder reported that Broadcom required Microsoft to guarantee purchases of 40% of the initial chip output as a condition for the first production phase.

OpenAI's longer-term plan is even more ambitious, with a multi-generation compute platform it wants to deploy at gigawatt scale with Microsoft and other partners through 2029, targeting 10 GW of custom compute. OpenAI also confirmed it is considering expanding custom chip work into training, not just inference. That would represent a much more direct challenge to NVIDIA's core business. For now, Jalapeno is mainly about lowering the cost of running OpenAI's models. But if future versions move into training, this becomes a full-stack independence play.

GPT 5.6 vs Competitors Comparison

Capability | GPT 5.6 Soul | Claude Mythos 5 | GPT 5.5 | Claude Opus 4.8
Terminal Bench 2.1 | State-of-the-art | Strong | Good | Good
SWE-Bench v1 | Stronger than GPT 5.5, fewer tokens | Competitive | Baseline | Strong
Coding workflows | Slightly better than Mythos 5 | Strong | Good | Strong
Token efficiency | ~1/3 output tokens vs Mythos | Baseline | Moderate | Moderate
Agentic coordination | Ultra mode (multi-agent) | Standard | Standard | Standard
Safety approach | Built-in model-level refusal | Down-routing to older model | Standard | Standard
Government restrictions | Yes (limited preview) | Yes (full removal) | No | No

Weekly AI News Timeline

Date | Event
June 24 | Jalapeno chip unveiled by OpenAI
Late June | GPT 5.6 family launched with government-restricted preview
Late June | GPT 5.6 Soul, Terra, Luna pricing announced
July | GPT 5.6 Soul on Cerebras (up to 750 tok/s)
Coming weeks | Broader GPT 5.6 access planned
August | Government classified process for assessing frontier models expected
Late 2026 | Jalapeno small-scale prototype deployment
2027 | Jalapeno production ramp
First half 2028 | Jalapeno full-scale deployment
Through 2029 | 10 GW custom compute platform target

Frequently Asked Questions

Why is GPT 5.6 only available to about 20 companies? The US government directly asked OpenAI to limit initial access. The list of roughly 20 launch partners was shared with the government as part of the approval process. OpenAI says broader access is planned in the coming weeks.

What are the three GPT 5.6 models? Soul is the flagship reasoning model with max reasoning effort and ultra multi-agent mode. Terra is the balanced everyday model. Luna is the fast, cost-efficient option for high-volume workloads.

How much does GPT 5.6 cost? Soul starts at $5 input / $30 output per 1M tokens. Terra is $2.50 / $15. Luna is $1 / $6. Prompt caching offers a 90% discount on cached input tokens.

What is the Jalapeno chip? OpenAI's first custom ASIC for LLM inference, built with Broadcom and Celestica. Broadcom's CEO reports roughly 50% cost savings versus standard AI GPUs for inference.

Does Jalapeno replace NVIDIA? Not for training. NVIDIA is still expected to train OpenAI's frontier models. Jalapeno targets inference workloads where OpenAI can get immediate cost savings. However, OpenAI is considering expanding into training chips in future generations.

Summary and Key Takeaways

  • GPT 5.6 launched with government restrictions limiting initial access to approximately 20 pre-approved partners, a first for a major frontier model release
  • Three models in the family: Soul (flagship, multi-agent ultra mode), Terra (balanced), and Luna (fast, cost-efficient)
  • Soul sets state-of-the-art on Terminal Bench 2.1 and outpaces GPT 5.5 on SWE-Bench v1 while using fewer tokens
  • Soul is slightly better than Claude Mythos 5 on coding workflows while using roughly one-third of the output tokens
  • Pricing ranges from $1/$6 per 1M tokens (Luna) to $5/$30 (Soul) with updated prompt caching offering 90% cached input discounts
  • OpenAI used over 700,000 A100-equivalent GPU hours for red teaming; safety features are built into the core model rather than added as external filters
  • The government framework remains undefined, with a classified process for assessing frontier models expected in August
  • Jalapeno is OpenAI's first custom AI chip, built with Broadcom, with early testing showing roughly 50% inference cost savings versus standard AI GPUs, according to Broadcom's CEO
  • Small-scale deployment planned for late 2026, production ramp in 2027, full-scale in first half of 2028
  • AI models helped design the chip, creating a feedback loop where models optimize the hardware that runs them
  • OpenAI is considering expanding custom chip work into training, which would directly challenge NVIDIA's core business
Written by OutGrave Team