The AI Report
🔍 OpenAI promises more safety reports

HOW: Google may eradicate AI hallucinations

Martin Crowley, Liam Lawson & Amanda Greenwood
May 15, 2025

Thursday’s AI Report

• 1. 🔍 OpenAI promises more safety reports
• 2. 🧠 Own your own intelligence with WebAI
• 3. 👻 Google eradicates AI hallucinations?
• 4. 🌀 ChatGPT gets major upgrade
• 5. 🤖 Find out what AI can do for your business with Upscaile
• 6. ⚙️ Trending AI Tools
• 7. 🏗️ Practical AI Applications
• 8. 📑 Recommended Resources

Read Time: 5 minutes

‼️ Tomorrow, this week’s AI Tool Report podcast episode launches. This one’s a masterclass on how to turn monetizable knowledge into a real business, with content creation expert Celeste Yamile.

OpenAI promises more safety reports

🚨 Our Report

OpenAI has launched a new “safety evaluations hub” where it plans to regularly share the results of its internal AI safety tests to increase transparency and build back public trust.

🔓 Key Points

The internal AI safety test results will show how its models perform when tested for hallucinations, jailbreaks, and the creation of harmful content like “hateful content or illicit advice.”
OpenAI announced it would share these AI safety results and metrics “on an ongoing basis” to align with new model releases, and it intends to add other safety evaluations in the future “to measure model safety.”
Although OpenAI plans to “update the hub periodically…to communicate more proactively about safety,” it also warned that the hub doesn’t reflect its full safety efforts; it just “shows a snapshot.”

🔐 Relevance

The launch of the safety evaluations hub comes after AI ethicists and industry experts criticized OpenAI for prioritizing new model releases and features over safety testing, following the departure of several key OpenAI employees over safety concerns. CEO Sam Altman was also accused of misleading OpenAI executives about model safety testing shortly before he was briefly ousted in 2023. The launch also follows the rapid rollback of a GPT-4o update after complaints that the model was too agreeable and validated dangerous ideas.

Own Your Intelligence—Before Someone Else Does

The next era of AI will not be ruled by mega-models in the cloud—it will be shaped by a constellation of task-specific models running exactly where your data lives.

Organizations that keep intelligence close will move faster, spend less, and own every byte.

webAI is the end-to-end platform that lets you build, scale, and deploy private, specialized AI now.

Sub-second inference on hardware you already own
Up to 60% lower TCO vs. cloud AI bills
Full data custody for audit-proof compliance
Visual workspace to fine-tune models in days, not months

We’re already helping global leaders in health tech, aviation, finserv, and manufacturing move to private AI; you don’t have to wait.

Download our new Executive Guide and see how to own your AI future.

Google eradicates AI hallucinations?

🚨 Our Report

Google’s AI research lab, DeepMind, has developed a new AI agent, AlphaEvolve, that can solve practical problems, overcome complex coding and math challenges better than any previous model, and also (reportedly) reduce the risk of hallucinations.

🔓 Key Points

DeepMind built AlphaEvolve to improve data center efficiency and reduce model training time, and according to internal benchmarks, its algorithm saves 0.7% of compute resources and cuts training time by 1%.
It does this, and reduces hallucinations, by using an automated evaluation system: various Gemini models generate candidate answers, which are then evaluated and re-evaluated for accuracy across several stages.
Google is using AlphaEvolve internally to improve data center and training efficiency, but believes that, once it has built a public interface, the agent will “be transformative” across industries and “wider business applications.”
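
For readers who want to see the generate-then-evaluate idea in code, here is a toy sketch in Python. It is not DeepMind’s implementation: `propose` is a hypothetical stand-in for a model call (here it just nudges the current best answer at random), and `cheap_check` and `score` are made-up evaluation stages for a made-up problem (finding the number that maximizes a simple function). The only point is the loop itself: candidates that fail an automated check are thrown away, and only measurably better answers are kept.

```python
import random


def propose(parent: float, rng: random.Random) -> float:
    # Hypothetical stand-in for a model call: perturb the current best answer.
    return parent + rng.uniform(-1.0, 1.0)


def cheap_check(candidate: float) -> bool:
    # Stage 1: fast sanity filter that rejects out-of-range answers.
    return 0.0 <= candidate <= 10.0


def score(candidate: float) -> float:
    # Stage 2: full scoring; higher is better, with the peak at 7.0.
    return -(candidate - 7.0) ** 2


def evolve(start: float, rounds: int = 500, seed: int = 0) -> float:
    # Repeatedly generate a candidate, evaluate it in stages,
    # and keep it only if it beats the best answer found so far.
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(rounds):
        candidate = propose(best, rng)
        if not cheap_check(candidate):
            continue  # discarded before the more expensive stage
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


if __name__ == "__main__":
    print(round(evolve(start=1.0), 2))
```

Because every answer has to pass an automatic score before it is accepted, a wrong guess from the generator never survives, which is the intuition behind the reduced-hallucination claim. It also shows why this approach only fits problems where an answer can be scored automatically.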

🔐 Relevance

This isn’t a particularly massive breakthrough: DeepMind has applied similar techniques in the past, but the use of Google’s advanced Gemini models makes this version much more capable. Having said that, because it relies on an automated self-evaluation algorithm, it can currently solve only numerical problems, such as coding and math. That makes it great for computer science and system optimization, but that’s its limit at the moment.

ChatGPT gets major upgrade

OpenAI has integrated its new GPT-4.1 and GPT-4.1 mini into ChatGPT. These models outperform GPT-4o at coding and following instructions, so developers who use ChatGPT to debug or write code will be ecstatic.
ChatGPT Plus, Pro, and Team users will get access to the powerful GPT-4.1, and everyone else will get access to GPT-4.1 mini (OpenAI has confirmed it will now be removing GPT-4o mini for all users).
The models were released (for developers) in April, but faced criticism over the lack of safety reports. OpenAI argued that because they weren’t frontier models, they didn’t need safety reports, but the new safety hub may fix this.

Find Out What AI & Automation Can Do for Your Business

Your time is too valuable to be spent on inefficiencies.

What if AI and automation could unlock huge time and cost savings, allowing your team to focus on strategic growth?

In just 3-5 weeks, Upscaile’s AI Audit provides you with:

Actionable workflow improvements, powered by AI and automation
An AI roadmap to identify and prioritize high-value efficiency opportunities
An ROI analysis to forecast cost savings and productivity gains

Our AI audits have identified opportunities that have helped over 100 enterprises save 80,000+ hours of manual work, without the need to hire additional staff.

Curious about the reports we’ve made for others and if your business qualifies too?

Prompt Inspiration

After typing this prompt, you will get quality control and collaboration guidelines and processes that will help you work with professional translators and agencies so you can reach a global audience.

Design a process for working with professional translators or agencies, including quality control and collaboration guidelines.

P.S. Use the Prompt Engineer GPT by The AI Report to 10x your prompts.

STARTUPS

Name: Suno
Value: $500M
Funding raised: $125M in Series B

Suno is an AI music creation platform that combines its own music generation model with ChatGPT to enable users to create complete songs—including music, lyrics, and vocals—from simple text prompts.

PODCASTS

Fraud Prevention in the Digital Age

This podcast episode looks at how to tackle the challenges of modern fraud prevention in a digital world. It includes insights on the limitations of legacy systems and the complexities of data aggregation as fraud threats grow more sophisticated.