Startup News: Essential Steps and Lessons from Google’s FACTS Benchmark for Enterprise AI in 2025

Discover how Google’s FACTS benchmark exposes AI models’ 70% accuracy ceiling, urging enterprises to use verification tools and external data for reliable AI solutions.

Google’s new FACTS benchmark has caught the attention of enterprise AI developers worldwide, but not necessarily for the reasons you'd expect. As a female entrepreneur juggling multiple startups, I enjoy dissecting new frameworks like this to see what they mean for businesses. FACTS isn’t just a technical achievement; it’s a wake-up call that raises real questions about the reliability and limitations of the AI tools businesses depend on.

What Is the 70% Factuality Ceiling?

FACTS is Google’s ambitious attempt to evaluate the factual accuracy of generative AI models in different contexts. It doesn’t simply test whether a chatbot can give correct answers. Instead, it measures how well AI systems respond using context documents, web search tools, visual data, and internal memory. According to Google’s initial tests, even top-tier models like OpenAI's GPT-5 and Google's Gemini Pro hit a frustrating ceiling: no model scored above 70% factual accuracy overall.

Think about that. In critical enterprise tasks requiring reliability, whether it’s financial reporting, customer support, or research, models are correct only about two-thirds of the time. And even that number depends on whether supplementary tools like vector databases or search engines are integrated to augment the AI’s capabilities.

Why Entrepreneurs Should Care

For founders, especially those of us bootstrapping startups, efficiency and trustworthiness are non-negotiable. Say you’re developing a customer service chatbot or a knowledge base AI for your product. If the model delivers incorrect information even 30% of the time, it creates real risks: angry customers, wasted time fixing errors, and tarnished reputations. Google’s benchmark tells us that unless we adopt strategies to validate outputs, enterprise AI might undermine our operations instead of optimizing them.

Here’s why it matters: when you're building revenue-dependent models, mistakes can cost more than just money; they can cost trust. According to Google’s results shared on the FACTS benchmark webpage, businesses using generative AI in production must design systems that verify outputs constantly.

A Practical Guide for Improving AI Accuracy in Enterprise

If you're already using or planning to deploy AI in your startup, here are a few actionable steps to improve accuracy:

  1. Combine AI Systems with External Tools
    Connect your chatbot or AI to databases, domain-specific APIs, and search tools. For instance, Gemini Pro’s accuracy in search-related tasks spiked to 83.8% when it leveraged external tools.

  2. Build Rigorous Feedback Loops
    Use systems where human agents or team members can easily flag inaccuracies. Make corrections visible to ensure iterative improvements.

  3. Train Models on Contextual Data
    Models perform better when asked to process relevant contextual information rather than vague or broad data points. Google’s FACTS test showed up to a 15% difference based on input specificity.

  4. Don’t Automate Blindly
    Resist the temptation to automate all processes. Certain outputs, like contracts or sensitive reports, are worth reviewing manually or checking with verification tools.

  5. Run Independent Benchmarks
    Before deploying AI tools for high-stakes tasks, test them against your workflows. Use available benchmarks like the FACTS suite shared publicly via Kaggle.
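
Steps 2 and 5 can be tried out on a very small scale before committing to any tooling. The sketch below is an illustration, not the FACTS suite itself: it runs a handful of question-and-answer pairs taken from your own workflow through a model and lists the misses so a team member can flag them. The names `Case`, `evaluate`, and `stub_model`, along with the sample questions, are hypothetical; `stub_model` stands in for whatever model client you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    question: str
    expected: str  # reference answer written by someone who knows the workflow


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not counted as errors."""
    return " ".join(text.lower().split())


def evaluate(model: Callable[[str], str], cases: List[Case]) -> dict:
    """Run every case through the model; return accuracy plus the failures for human review."""
    failures = []
    for case in cases:
        answer = model(case.question)
        # Simple containment check: the reference answer must appear in the reply.
        if normalize(case.expected) not in normalize(answer):
            failures.append(
                {"question": case.question, "expected": case.expected, "got": answer}
            )
    correct = len(cases) - len(failures)
    accuracy = correct / len(cases) if cases else 0.0
    return {"accuracy": accuracy, "failures": failures}


def stub_model(question: str) -> str:
    """Hypothetical stand-in for a real model call; replace with your own client."""
    canned = {
        "What is our refund window?": "Our refund window is 30 days.",
        # Deliberately wrong, to show how a miss is reported.
        "Which plan includes SSO?": "SSO is included in the Starter plan.",
    }
    return canned.get(question, "I don't know.")


# Made-up test cases; in practice these come from your own support logs or documents.
cases = [
    Case("What is our refund window?", "30 days"),
    Case("Which plan includes SSO?", "Enterprise plan"),
    Case("Who signs supplier contracts?", "the COO"),
]

report = evaluate(stub_model, cases)
print(f"accuracy: {report['accuracy']:.0%}")
for failure in report["failures"]:
    print("needs review:", failure["question"])
```

A containment check like this is crude and will miss paraphrased answers, so treat the score as a rough signal. The list of failures is often the more useful output, because it gives the team concrete cases to review and correct.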

Common Mistakes Entrepreneurs Should Avoid

Mistakes connected to AI in startups often arise from overestimating its capabilities. Here’s where many founders trip up:

  • Skipping Verification
    Assuming “the AI knows” is a recipe for disaster. Failure to validate key data outputs can lead to risky filings or missed deadlines.

  • Ignoring Multimodal Failures
    Many AI models still lack visual consistency. Gemini Pro, for example, scored less than 50% on multimodal tests, which means its visual-based outputs should always be reviewed by humans.

  • Relying Solely on Internal Model “Memory”
    Tests showed internal memory retrieval accuracy as low as 76.4%. Build systems that prioritize real-time retrieval through external repositories or real-world references.
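
One way to picture the first and third points together is a "retrieve first, escalate otherwise" rule. The sketch below is a simplified illustration under stated assumptions: `KNOWLEDGE_BASE` is a hypothetical in-memory dictionary standing in for a real database or search index, `SENSITIVE_KEYWORDS` is an invented list, and the function returns the stored passage directly instead of passing it to a model as context.

```python
from typing import Dict, Optional

# Hypothetical knowledge base; in practice this would be a database, API, or search index.
KNOWLEDGE_BASE: Dict[str, str] = {
    "refund window": "Refunds are accepted within 30 days of purchase.",
    "support hours": "Support is available 09:00-17:00 CET on weekdays.",
}

# Invented examples of topics that should always get a human check.
SENSITIVE_KEYWORDS = ("contract", "invoice", "tax")


def retrieve(question: str) -> Optional[str]:
    """Return the first stored passage whose key appears in the question, or None."""
    lowered = question.lower()
    for key, passage in KNOWLEDGE_BASE.items():
        if key in lowered:
            return passage
    return None


def answer(question: str) -> dict:
    """Answer only from retrieved sources and flag anything that needs human review."""
    passage = retrieve(question)
    sensitive = any(word in question.lower() for word in SENSITIVE_KEYWORDS)
    if passage is None:
        # No external source found: do not fall back on model memory, hand off instead.
        return {"text": None, "source": None, "needs_human_review": True}
    # A real system would pass the passage to the model as context here.
    return {"text": passage, "source": "knowledge_base", "needs_human_review": sensitive}


print(answer("What is the refund window?"))
print(answer("What tax rate applies to my order?"))
```

The design choice worth noting is that the system refuses to answer without a source instead of guessing, and that sensitive topics are flagged for review even when a source exists.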

Lessons for Female Entrepreneurs in Europe

Being a woman in tech comes with unique challenges, but also ample opportunities to think smarter about tech adoption rather than getting swept up in its hype. I’ve learned from running Fe/male Switch and CADChain, startups in both deeptech and education fields, that questioning technology pays off, both in impact and outcomes. For women building businesses in a landscape increasingly reliant on AI, FACTS is a reminder that due diligence matters. It’s not enough to say “AI will change everything”; we need to push ourselves to ask at what cost and where the gaps are.

Another takeaway here is collaboration. No founder can, or should, do it all alone, especially when tech adoption requires constant learning. Many female entrepreneurs I mentor through Fe/male Switch have benefited from interdisciplinary teams that fill knowledge gaps in data science, coding, and linguistics.

Conclusion: What the Future Holds

After studying Google’s FACTS benchmark, I think one thing is clear: forward-thinking founders will need to treat AI less as a silver bullet and more as a tool to be honed carefully. Use it where it makes the most sense for your business operations, but build checks, augment weak areas, and stay skeptical of its limits. You don’t need blind optimism; you need smart strategies. And one smart strategy? Always ask when AI might be getting it wrong, and map out ways to catch those errors before they impact your startup’s operations.


FAQ

1. What is the 70% factuality ceiling in Google’s FACTS benchmark?
The 70% factuality ceiling is the maximum accuracy generative AI models like OpenAI's GPT-5 and Google’s Gemini Pro have achieved on Google’s FACTS benchmark. It shows that even top-tier models are only correct about two-thirds of the time in real-world conditions.

2. What does the FACTS benchmark test?
FACTS evaluates AI models on four real-world failure modes: contextual factuality, search-based accuracy, multimodal integration, and internal memory-based responses. This benchmark provides a comprehensive view of how reliable these models are under different conditions.

3. Why do enterprise owners need to pay attention to AI limits?
AI's factual limitations can negatively impact critical tasks like financial reporting or customer support. Errors cost trust, time, and money, making validation and frequent audits of AI outputs crucial for businesses.

4. How can businesses improve AI factuality in production systems?
Businesses can improve factual accuracy by using external tools like vector databases, building feedback loops for iterative updates, and integrating verification systems, especially for high-stakes tasks.

5. How does multimodal performance impact enterprises?
Google’s FACTS benchmark shows that multimodal tasks (visual and textual integration) have poor accuracy rates, typically below 50%. Enterprises using AI for visual-based tasks should plan for human review as a safeguard.

6. What role does context have in AI factuality?
AI models achieve higher factual accuracy, by up to 15%, when their inputs are narrow and context-specific rather than general or ambiguous.

7. Should startups automate all operations with AI?
No, full automation can be risky. Sensitive outputs, such as contracts or financial reports, should be manually reviewed, even when AI is involved in the process. For higher accuracy, use verification tools alongside automation.

8. How does Google’s FACTS support enterprises?
FACTS serves as a standardized metric for evaluating AI models’ reliability in production tasks, helping businesses compare AI systems and choose the best for their specific needs.

9. Why do startups often mismanage AI implementation?
Common mistakes include skipping output verification, overestimating the model's internal memory, and ignoring the weaknesses in multimodal capabilities. These oversights can lead to costly errors in operations.

10. What lessons can female entrepreneurs take away from FACTS?
FACTS highlights the value of skepticism in tech adoption. Female founders should prioritize asking critical questions about an AI's reliability and build collaborative, interdisciplinary teams to mitigate risks.

About the Author

Violetta Bonenkamp, also known as MeanCEO, is an experienced startup founder with an impressive educational background including an MBA and four other higher education degrees. She has over 20 years of work experience across multiple countries, including 5 years as a solopreneur and serial entrepreneur. Throughout her startup experience she has applied for multiple startup grants at the EU level, in the Netherlands and Malta, and her startups received quite a few of those. She’s been living, studying and working in many countries around the globe and her extensive multicultural experience has influenced her immensely.

Violetta Bonenkamp's expertise in the CAD sector, IP protection, and blockchain

Violetta Bonenkamp is recognized as a multidisciplinary expert with significant achievements in the CAD sector, intellectual property (IP) protection, and blockchain technology.

CAD Sector:

  • Violetta is the CEO and co-founder of CADChain, a deep tech startup focused on developing IP management software specifically for CAD (Computer-Aided Design) data. CADChain addresses the lack of industry standards for CAD data protection and sharing, using innovative technology to secure and manage design data.
  • She has led the company since its inception in 2018, overseeing R&D, PR, and business development, and driving the creation of products for platforms such as Autodesk Inventor, Blender, and SolidWorks.
  • Her leadership has been instrumental in scaling CADChain from a small team to a significant player in the deeptech space, with a diverse, international team.

IP Protection:

  • Violetta has built deep expertise in intellectual property, combining academic training with practical startup experience. She has taken specialized courses in IP from institutions like WIPO and the EU IPO.
  • She is known for sharing actionable strategies for startup IP protection, leveraging both legal and technological approaches, and has published guides and content on this topic for the entrepreneurial community.
  • Her work at CADChain directly addresses the need for robust IP protection in the engineering and design industries, integrating cybersecurity and compliance measures to safeguard digital assets.

Blockchain:

  • Violetta’s entry into the blockchain sector began with the founding of CADChain, which uses blockchain as a core technology for securing and managing CAD data.
  • She holds several certifications in blockchain and has participated in major hackathons and policy forums, such as the OECD Global Blockchain Policy Forum.
  • Her expertise extends to applying blockchain for IP management, ensuring data integrity, traceability, and secure sharing in the CAD industry.

Violetta is a true multiple specialist who has built expertise in Linguistics, Education, Business Management, Blockchain, Entrepreneurship, Intellectual Property, Game Design, AI, SEO, Digital Marketing, cyber security and zero code automations. Her extensive educational journey includes a Master of Arts in Linguistics and Education, an Advanced Master in Linguistics from Belgium (2006-2007), an MBA from Blekinge Institute of Technology in Sweden (2006-2008), and an Erasmus Mundus joint program European Master of Higher Education from universities in Norway, Finland, and Portugal (2009).

She is the founder of Fe/male Switch, a startup game that encourages women to enter STEM fields. She also leads CADChain and multiple other projects, such as the Directory of 1,000 Startup Cities with a proprietary MeanCEO Index that ranks cities for female entrepreneurs. Violetta created the "gamepreneurship" methodology, which forms the scientific basis of her startup game. She also builds a lot of SEO tools for startups. Her achievements include being named one of the top 100 women in Europe by EU Startups in 2022 and being nominated for Impact Person of the Year at the Dutch Blockchain Week. She is an author with Sifted and a speaker at different universities. Recently she published a book on Startup Idea Validation the right way: from zero to first customers and beyond, launched a Directory of 1,500+ websites where startups can list themselves to gain traction and build backlinks, and is building MELA AI to help local restaurants in Malta get more visibility online.

For the past several years Violetta has been living between the Netherlands and Malta, while also regularly traveling to different destinations around the globe, usually due to her entrepreneurial activities. This has led her to start writing about different locations and amenities from the POV of an entrepreneur.

About the Publication

Fe/male Switch is an innovative startup platform designed to empower women entrepreneurs through an immersive, game-like experience. Founded in 2020 during the pandemic "without any funding and without any code," this non-profit initiative has evolved into a comprehensive educational tool for aspiring female entrepreneurs. The platform was co-founded by Violetta Shishkina-Bonenkamp, who serves as CEO and one of the lead authors of the Startup News branch.

Mission and Purpose

Fe/male Switch Foundation was created to address the gender gap in the tech and entrepreneurship space. The platform aims to skill-up future female tech leaders and empower them to create resilient and innovative tech startups through what they call "gamepreneurship". By putting players in a virtual startup village where they must survive and thrive, the startup game allows women to test their entrepreneurial abilities without financial risk.

Key Features

The platform offers a unique blend of news, resources, learning, networking, and practical application within a supportive, female-focused environment:

  • Skill Lab: Micro-modules covering essential startup skills
  • Virtual Startup Building: Create or join startups and tackle real-world challenges
  • AI Co-founder (PlayPal): Guides users through the startup process
  • SANDBOX: A testing environment for idea validation before launch
  • Wellness Integration: Virtual activities to balance work and self-care
  • Marketplace: Buy or sell expert sessions and tutorials

Impact and Growth

Since its inception, Fe/male Switch has shown impressive growth:

  • 5,000+ female entrepreneurs in the community
  • 100+ startup tools built
  • 5,000+ articles and news pieces written
  • 1,000 unique business ideas for women created

Partnerships

Fe/male Switch has formed strategic partnerships to enhance its offerings. In January 2022, it teamed up with global website builder Tilda to provide free access to website building tools and mentorship services for Fe/male Switch participants.

Recognition

Fe/male Switch has received media attention for its innovative approach to closing the gender gap in tech entrepreneurship. The platform has been featured in various publications highlighting its unique "play to learn and earn" model.