July 3, 2026 · Lunch · 12:30 PM ET 14 stories · 4 min

The lunch briefing.

Mid-day check: UK AI security finds benchmarks underestimate agent capabilities, Alibaba bans Claude Code over spyware, and PlayStation plans to abandon game discs.

RIGHT NOW, IN ONE BREATH

Agent Benchmarking. The UK's AI Security Institute (AISI) reports that standard benchmarks consistently underestimate the true capabilities of AI agents. Their study, covering seven benchmarks, found that increasing the token budget tenfold for software engineering tasks boosted success rates by approximately 25 percent. This suggests that actual progress at the frontier is about 60 percent steeper than previously measured, with newer models benefiting most from increased compute. The findings highlight a critical gap in current evaluation methods, potentially obscuring rapid advancements in AI agent performance.

UK's AI Security Institute finds standard benchmarks systematically underestimate what AI agents can actually do The Decoder In briefing Meta's AI agent push is moving slower than Zuckerberg planned The Decoder Supporting GPT and Claude failed Bridgewater's finance tests because the right answers were never public The Decoder Supporting

AI for Science. AI is rapidly expanding into scientific discovery, with Anthropic announcing "Claude Science," an AI workbench designed to accelerate drug development and scientific research. This platform aims to unify fragmented tools and datasets, generating figures and visuals to speed up discovery. Concurrently, Alibaba's Elements Claw AI agent has already unearthed four new superconducting compounds, verified through laboratory experiments. These developments underscore AI's growing potential to revolutionize fields from pharmaceuticals to materials science.

Anthropic wants to develop its own drugs The Verge In briefing Alibaba’s Elements Claw AI agent unearths 4 new superconductors South China Morning Post (Tech) In briefing Google DeepMind and A24 announce first-of-its-kind research partnership Google DeepMind Supporting A behind-the-scenes look at Midjourney’s medical scanner leaves many questions unanswered The Verge Supporting

AI Policy and Ethics. Concerns over AI's ethical implications and security risks are prompting significant policy responses and corporate actions. The UK's National Crime Agency has warned parents against publicly sharing children's images due to the rising threat of AI-generated child abuse material. Separately, Alibaba has banned employees from using Anthropic's Claude Code, citing security vulnerabilities and past tracking of Chinese users. These incidents highlight the urgent need for robust safeguards and responsible AI deployment across industries.

Parents warned not to publicly share children’s images amid AI abuse risks BBC Technology In briefing UK parents warned over posting images of children amid AI sexual abuse fears The Guardian Technology Supporting Alibaba bans staff from using Claude Code over Anthropic spyware concerns South China Morning Post (Tech) In briefing Centre Summons Meta Over Instagram Ads Promoting Child Sexual Abuse Inc42 In briefing Tesla caps employee AI spending at $200 per week The Decoder In briefing

AI's Economic Impact. The integration of AI continues to reshape labor markets and corporate strategies, with varied outcomes. Starling Bank is cutting 130 jobs as part of an AI-driven efficiency push, while SAP is refocusing its hiring on AI roles and reallocating budgets to increase AI spending. A new report suggests that companies embracing AI eventually add more workers, despite initial productivity gains. This dynamic reflects AI's dual impact on employment and the broader economy.

Starling Bank to cut 130 jobs amid AI push Sifted In briefing Starling Bank to axe 130 jobs Tech.eu Supporting SAP wants workers to create new AI-powered jobs, slashes travel and expenses budgets to up AI spend TechRadar In briefing New report claims companies which embrace AI also add more workers (eventually) TechRadar Supporting Startup targets datacenters with 3D-printed nuclear reactor module The Register In briefing AI’s Volatile Power Use Quietly Tests Grid Limits IEEE Spectrum AI In briefing

Digital Freedoms. Concerns are mounting over government surveillance and corporate control in the digital sphere. As the World Cup approaches, US cities are increasing surveillance capabilities, raising privacy alarms for spectators and residents. India is also planning a new legal framework to tighten oversight of VPN providers, potentially limiting internet freedom. In the gaming world, PlayStation's move to abandon game discs by 2028 is seen as a blow to game preservation and consumer ownership.

While you’re watching the World Cup, the feds may be watching you The Verge In briefing Centre Plans New Legal Framework To Tighten Oversight Of VPN Providers Inc42 Supporting PlayStation just struck a hammer blow to game preservation Engadget In briefing Sony already invested $34 million to repurpose its EU PlayStation disc factory Engadget Supporting NetNut cracked as Google and FBI target 2 million-device botnet The Register In briefing

Sources scanned

6,451

Headlines processed

#228

Edition

14.4k

Discussing now

The Decoder · 1h ago Top story

UK AI Security Institute finds benchmarks underestimate agent capabilities

The UK's AI Security Institute (AISI) found that standard AI evaluations systematically underestimate agent capabilities by capping compute budgets. Success rates on software engineering tasks jumped approximately 25 percent when the token budget was increased tenfold.

Research ↗ 2.5k discussing

South China Morning Post (Tech) · 2h ago

Alibaba bans staff from using Claude Code over Anthropic spyware concerns

Alibaba Group Holding has banned its employees from using Anthropic’s Claude Code for work, citing security risks related to the US AI firm’s previous use of hidden code to track Chinese users. This move follows widespread backlash over the alleged spyware concerns.

Policy ↗ 1.1k discussing

The Verge · 2h ago

Anthropic wants to develop its own drugs with Claude Science AI workbench

Anthropic announced Claude Science, a new AI workbench for scientists that integrates fragmented tools and datasets to accelerate scientific discovery and healthcare interventions. The platform is designed to generate figures and visuals to aid research.

Foundation models ↗ 890 discussing

South China Morning Post (Tech) · 5h ago

Alibaba’s Elements Claw AI agent unearths 4 new superconductors

Alibaba Group Holding’s Damo Academy has unveiled an AI agent for discovering superconducting materials, which has already found four previously unknown compounds. These compounds were later verified in laboratory experiments.

Research ↗ 780 discussing

BBC Technology · 1h ago

Parents warned not to publicly share children’s images amid AI abuse risks

The National Crime Agency (NCA) has warned parents about the growing threat of children's images being used to create child abuse material through AI. They recommend not posting photos of children on public online platforms.

Policy ↗ 1.5k discussing

Inc42 · 3h ago

Centre summons Meta over Instagram ads promoting child sexual abuse

India's IT minister Ashwini Vaishnaw has directed MeitY to summon Meta executives to clarify Instagram ads that allegedly promoted child sexual abuse. This action highlights concerns over platform responsibility in content moderation.

Policy ↗ 950 discussing

Sifted · 3h ago

Starling Bank to cut 130 jobs amid AI push

Starling Bank is set to cut approximately 130 jobs as it aims to simplify operations and reduce duplication through an AI push. The job cuts follow a recent dip in the challenger bank's profits and revenues.

Startups ↗ 640 discussing

TechRadar · 2h ago

SAP wants workers to create new AI-powered jobs, slashes budgets to up AI spend

SAP is cutting unnecessary travel and expenses budgets to redirect spending towards AI initiatives and refocus its hiring efforts on AI roles. The company aims for workers to create new AI-powered jobs.

Analysis ↗ 580 discussing

The Decoder · 5h ago

Tesla caps employee AI spending at $200 per week

Tesla has implemented a cap on employee AI spending at $200 per week, according to an internal memo. This policy aims to manage costs associated with AI tool usage within the company.

Policy ↗ 410 discussing

The Register · 3h ago

Startup targets datacenters with 3D-printed nuclear reactor module

A startup is developing a 3D-printed nuclear reactor module designed to provide up to 30 MWe of power for data centers for up to 30 years. This innovation aims to offer a stable and long-term energy solution.

Hardware ↗ 720 discussing

IEEE Spectrum AI · 4h ago

AI’s volatile power use quietly tests grid limits

The rapid expansion of AI infrastructure is creating volatile power demands that are quietly testing grid limits. Data centers are projected to consume 3 to 4 percent of total global electricity demand this decade.

Analysis ↗ 600 discussing

Engadget · 4h ago

PlayStation just struck a hammer blow to game preservation

PlayStation's decision to abandon game discs by 2028 is viewed as an anti-consumer move that negatively impacts game preservation. Sony is already repurposing its EU PlayStation disc factory for optical microlenses.

Hardware ↗ 1.8k discussing

The Register · 4h ago

NetNut cracked as Google and FBI target 2 million-device botnet

Google and the FBI have targeted a 2 million-device botnet, leading to the cracking of NetNut, a residential proxy brand. Other residential proxy services may rely on the same compromised network.

Policy ↗ 1.2k discussing

The Verge · 1h ago

While you’re watching the World Cup, the feds may be watching you

US cities hosting the World Cup are ramping up surveillance capabilities, including drones and cameras, raising concerns about privacy for spectators and residents. Security measures are at an all-time high in Washington, DC.

Policy ↗ 750 discussing