Briefings

The lunch briefing.

Mid-day check: the UK's AI Security Institute finds benchmarks underestimate agent capabilities, Alibaba bans Claude Code over spyware concerns, and PlayStation plans to abandon game discs.

RIGHT NOW, IN ONE BREATH

Agent Benchmarking. The UK's AI Security Institute (AISI) reports that standard benchmarks systematically underestimate the capabilities of AI agents because they cap compute budgets. Its study, covering seven benchmarks, found that increasing the token budget tenfold for software engineering tasks boosted success rates by approximately 25 percent. This suggests that actual progress at the frontier is about 60 percent steeper than previously measured, with newer models benefiting most from increased compute. The findings point to a gap in current evaluation methods that may be obscuring rapid advances in agent performance.

AI for Science. AI is rapidly expanding into scientific discovery, with Anthropic announcing "Claude Science," an AI workbench designed to accelerate drug development and scientific research. This platform aims to unify fragmented tools and datasets, generating figures and visuals to speed up discovery. Concurrently, Alibaba's Elements Claw AI agent has already unearthed four new superconducting compounds, verified through laboratory experiments. These developments underscore AI's growing potential to revolutionize fields from pharmaceuticals to materials science.

AI Policy and Ethics. Concerns over AI's ethical implications and security risks are prompting policy responses and corporate actions. The UK's National Crime Agency has warned parents against publicly sharing children's images because of the rising threat of AI-generated child abuse material. Separately, Alibaba has banned employees from using Anthropic's Claude Code for work, citing security risks related to the US firm's previous use of hidden code to track Chinese users. Both cases underline the need for robust safeguards and responsible AI deployment across industries.

AI's Economic Impact. The integration of AI continues to reshape labor markets and corporate strategies, with varied outcomes. Starling Bank is cutting 130 jobs as part of an AI-driven efficiency push, while SAP is refocusing its hiring on AI roles and reallocating budgets to increase AI spending. A new report suggests that companies embracing AI eventually add more workers, despite initial productivity gains. This dynamic reflects AI's dual impact on employment and the broader economy.

Digital Freedoms. Concerns are mounting over government surveillance and corporate control in the digital sphere. As the World Cup approaches, US cities are increasing surveillance capabilities, raising privacy alarms for spectators and residents. India is also planning a new legal framework to tighten oversight of VPN providers, potentially limiting internet freedom. In the gaming world, PlayStation's move to abandon game discs by 2028 is seen as a blow to game preservation and consumer ownership.

Sources scanned: 69
Headlines processed: 6,451
Edition: #228
Discussing now: 14.4k

UK AI Security Institute finds benchmarks underestimate agent capabilities

The UK's AI Security Institute (AISI) found that standard AI evaluations systematically underestimate agent capabilities by capping compute budgets. Success rates on software engineering tasks jumped approximately 25 percent when the token budget was increased tenfold.

Alibaba bans staff from using Claude Code over Anthropic spyware concerns

Alibaba Group Holding has banned its employees from using Anthropic’s Claude Code for work, citing security risks related to the US AI firm’s previous use of hidden code to track Chinese users. The move follows widespread backlash over the alleged spyware.

Anthropic wants to develop its own drugs with Claude Science AI workbench

Anthropic announced Claude Science, a new AI workbench for scientists that integrates fragmented tools and datasets to accelerate scientific discovery and healthcare interventions. The platform is designed to generate figures and visuals to aid research.

Alibaba’s Elements Claw AI agent unearths 4 new superconductors

Alibaba Group Holding’s Damo Academy has unveiled an AI agent for discovering superconducting materials, which has already found four previously unknown compounds. These compounds were later verified in laboratory experiments.

Parents warned not to publicly share children’s images amid AI abuse risks

The National Crime Agency (NCA) has warned parents about the growing threat of children's images being used to create AI-generated child abuse material. The agency recommends not posting photos of children on public online platforms.

Centre summons Meta over Instagram ads promoting child sexual abuse

India's IT minister Ashwini Vaishnaw has directed the Ministry of Electronics and Information Technology (MeitY) to summon Meta executives to explain Instagram ads that allegedly promoted child sexual abuse. The action highlights concerns over platform responsibility in content moderation.

Starling Bank to cut 130 jobs amid AI push

Starling Bank is set to cut approximately 130 jobs as it aims to simplify operations and reduce duplication through an AI push. The job cuts follow a recent dip in the challenger bank's profits and revenues.

SAP wants workers to create new AI-powered jobs, slashes budgets to up AI spend

SAP is cutting budgets for unnecessary travel and expenses to redirect spending towards AI initiatives, and it is refocusing its hiring on AI roles. The company wants its workers to create new AI-powered jobs.

Tesla caps employee AI spending at $200 per week

Tesla has capped employee AI spending at $200 per week, according to an internal memo. The policy aims to manage the cost of AI tool usage within the company.

Startup targets datacenters with 3D-printed nuclear reactor module

A startup is developing a 3D-printed nuclear reactor module designed to provide up to 30 MWe of power for data centers for up to 30 years. This innovation aims to offer a stable and long-term energy solution.

AI’s volatile power use quietly tests grid limits

The rapid expansion of AI infrastructure is creating volatile power demands that are quietly testing grid limits. Data centers are projected to consume 3 to 4 percent of total global electricity demand this decade.

PlayStation just struck a hammer blow to game preservation

PlayStation's decision to abandon game discs by 2028 is viewed as an anti-consumer move that negatively impacts game preservation. Sony is already repurposing its EU PlayStation disc factory for optical microlenses.

NetNut cracked as Google and FBI target 2 million-device botnet

Google and the FBI have targeted a 2 million-device botnet, cracking NetNut, a residential proxy brand, in the process. Other residential proxy services may rely on the same compromised network.

While you’re watching the World Cup, the feds may be watching you

US cities hosting the World Cup are ramping up surveillance capabilities, including drones and cameras, raising concerns about privacy for spectators and residents. Security measures are at an all-time high in Washington, DC.