OpenAI GPT-5 Launch Sparks Regulatory Alarm and Industry Arms Race
OpenAI unveiled GPT-5 on Tuesday, claiming the model achieves "near-human reasoning" on standardised benchmarks and can autonomously complete multi-step coding and research tasks. The announcement triggered emergency sessions at the EU AI Office and renewed calls in the US Congress for binding safety evaluations before frontier model deployments. Competitors Google DeepMind and Anthropic both said they were accelerating their own next-generation releases.
Key Facts
GPT-5 scores 97th percentile on graduate-level maths and coding benchmarks
Model can autonomously browse the web, write and execute code, and manage files
EU AI Office invokes Article 55 emergency review; deployment may be paused in Europe
OpenAI CEO Sam Altman testifies before Senate Commerce Committee next week
Google DeepMind confirms Gemini Ultra 2 entering final safety evaluations
Source Coverage
The New York Times (Concerned)
Measured assessment of GPT-5 capabilities alongside deep concerns about deployment speed.
The Times ran a lengthy feature combining benchmark analysis with interviews with AI safety researchers at MIT and Stanford who expressed alarm at the pace of capability gains. The piece noted that GPT-5's ability to autonomously execute multi-step computer tasks represents a qualitative leap beyond previous models, and that OpenAI's safety evaluation period was only 90 days, shorter than GPT-4's.
Bloomberg (Neutral)
Financial angle: GPT-5 as a catalyst for a new wave of enterprise AI spending.
Bloomberg's technology team calculated that GPT-5's autonomous task completion could automate roughly 12% of white-collar knowledge-work tasks currently billed at professional services rates, representing a $340 billion annual market opportunity. The piece quoted CIOs at three Fortune 500 companies who said they were fast-tracking API integration pilots.
BBC (Concerned)
Public-interest focus on how GPT-5 will affect jobs and education in the UK.
BBC News interviewed teachers, legal clerks and junior doctors who feared displacement, alongside AI ethicists who called for mandatory impact assessments before enterprise deployment. The BBC noted that the UK government's AI Safety Institute had been given only 48 hours' notice of the release, prompting criticism from Science Minister Jo Stevens.
Washington Post (Critical)
Political analysis of the regulatory vacuum that allowed OpenAI to release without pre-market approval.
The Post's technology policy reporters traced the 18-month, $42 million lobbying campaign OpenAI ran to block binding pre-release safety legislation in the US Congress. The piece argued that GPT-5's release exposed a fundamental regulatory gap that neither the White House nor Congress had filled despite two years of hearings and executive orders.
Conclusion
GPT-5 represents a step-change in capability that regulators, competitors and civil-society groups were not fully prepared for, setting up a tense few months of policy battles across three continents.
Logical analysis
Where sources agree
All outlets agree GPT-5 represents a significant capability advancement over previous models
There is consensus that regulatory frameworks were not prepared for this release
All sources acknowledge that competitors will accelerate their own timelines in response
Where sources disagree
Whether GPT-5 achieves "near-human reasoning"
Bloomberg: OpenAI's benchmark results are credible; GPT-5 scores in the 97th percentile on graduate-level reasoning tasks, representing a genuine step change in capability.
Washington Post: The benchmarks cited were partly curated by OpenAI and do not reflect performance on novel problems outside training distributions; independent researchers have not replicated the claims.
Duration of OpenAI's pre-release safety evaluation
OpenAI (via Bloomberg): The safety evaluation for GPT-5 was the most thorough in the company's history, involving external red-teamers and 6 months of internal testing.
New York Times: According to three sources familiar with the process, the external evaluation window was 90 days, shorter than GPT-4's, due to competitive pressure from Google DeepMind.
Coverage gaps
Details of OpenAI's internal safety evaluation process are almost entirely absent from all coverage
The specific benchmarks used to claim "near-human reasoning" are rarely scrutinised for potential gaming
The GPT-5 coverage is broadly split between techno-optimist commercial narratives and safety-sceptic policy narratives, with little integration between the two. The most accurate picture emerges from reading Bloomberg for market dynamics, the Washington Post for policy context, the Times for safety concerns and the BBC for public-interest implications. The critical gap is accountability: almost no outlet has fully interrogated what "near-human reasoning" actually means in practice.