
Voice-enabled artificial intelligence combines text-to-speech (TTS) and speech-to-text (STT) technologies with intelligent automation platforms to create conversational experiences that feel natural, responsive, and human. When layered on top of agentic AI (autonomous software agents that can reason, plan, and execute multi-step tasks), voice capabilities transform static chatbots and clunky interactive voice response (IVR) systems into dynamic, emotionally aware conversationalists. On March 25, 2026, ElevenLabs and IBM announced a collaboration to bring ElevenLabs’ industry-leading TTS and STT technologies to IBM watsonx Orchestrate.
In this article, we’ll discuss what this partnership means for the enterprise AI landscape, how ElevenLabs’ voice technology enhances IBM’s agentic AI platform, and why voice-first experiences are becoming essential for organizations operating at global scale. We’ll also explore the specific industries and use cases poised to benefit most from this integration, the enterprise-grade security and compliance features that make it viable for regulated environments, and what the future holds as both companies continue to push the boundaries of human-AI interaction.
TL;DR Snapshot
ElevenLabs and IBM are integrating ElevenLabs’ premium text-to-speech and speech-to-text capabilities into IBM watsonx Orchestrate, IBM’s agentic AI orchestration platform. The partnership gives companies access to a wide variety of voice and language options, paired with enterprise-grade security features like PCI compliance, HIPAA-supporting Zero Retention Mode, and data residency controls. The goal is to move agentic AI beyond text-only interactions and into voice-first experiences that feel natural, scale globally, and meet the governance demands of large organizations.
Key takeaways include…
- Voice is becoming a critical interface for enterprise AI. ElevenLabs brings human-quality speech synthesis in 70 languages and over 10,000 voices to IBM’s watsonx Orchestrate, enabling AI agents that sound natural and emotionally aware rather than robotic.
- Enterprise security is built in, not bolted on. The integration includes PCI compliance for secure payment processing, Zero Retention Mode designed to support HIPAA-compliant data handling, and data residency options, making it viable for financial services, healthcare, government, and other regulated sectors.
- This signals the broader industry shift toward voice-first agentic AI. Both companies have stated their intention to continue collaborating, pointing toward a future where voice is the primary interface between enterprises and the autonomous AI agents handling their workflows.
Who should read this: Enterprise technology leaders, CX and contact center professionals, AI product managers, developers building voice-enabled applications, and anyone following the evolution of agentic AI.
Why Voice Matters for Agentic AI
Agentic AI represents a fundamental evolution in how software operates within organizations. Unlike traditional chatbots that follow scripted decision trees, agentic AI systems can reason through complex tasks, connect to enterprise tools and databases, and take action autonomously. IBM watsonx Orchestrate embodies this approach: it’s a unified platform that lets businesses build, deploy, manage, and govern AI agents that automate workflows across their operations, connecting to existing systems, models, and automation tools.
But there’s a catch. As these agents increasingly interact directly with customers and employees, the way they communicate matters enormously. Long wait times, rigid call flows, and robotic-sounding voices can undermine even the most sophisticated underlying intelligence. As ElevenLabs co-founder Mati Staniszewski put it in the announcement, voice is where AI either earns trust or loses it. The gap between what an AI agent can do and how it sounds has become a real bottleneck for enterprise adoption, particularly in phone-based and voice-first channels where tone, pacing, and emotional nuance determine whether a user feels helped or frustrated.
This is precisely the problem the ElevenLabs and IBM collaboration is designed to solve. By embedding ElevenLabs’ speech technology directly into watsonx Orchestrate, enterprise AI agents can now communicate with the nuance, emotion, and rhythm of human speech. The result is an AI interaction that doesn’t just resolve the query but does so in a way that feels natural and trustworthy.
What the Integration Delivers
The technical scope of this integration is substantial. Clients building agents with IBM watsonx Orchestrate gain access to ElevenLabs’ premium speech quality, which has been widely recognized for producing some of the most lifelike synthetic voices available today. ElevenLabs launched in January 2023 with what it described as the first human-like AI voice model, and the company has since grown to serve millions of users and thousands of businesses across its platforms.

The multilingual capabilities are particularly noteworthy. With support for 70 languages and multiple regional accents, AI phone agents built on watsonx Orchestrate can now serve truly global user bases without sacrificing voice quality or cultural appropriateness. This is especially relevant for government agencies that must support multiple languages to assist constituents with healthcare, human services, education, and civic activities. Banks, insurance companies, healthcare providers, and utilities similarly stand to benefit from the ability to reach more communities in their preferred language.
Beyond the voice itself, the integration is designed to handle the demands of enterprise-scale deployment. It supports high-volume and highly concurrent interactions across global user bases, meaning organizations don’t have to worry about voice quality degrading as call volumes spike. This combination of quality, breadth, and scalability is what distinguishes this integration from simpler TTS add-ons that might work in a demo but buckle under real-world enterprise loads.
Enterprise Security and Compliance
For many organizations, the quality of the voice is only part of the equation. There’s a lot of concern about agentic AI security at the moment, with companies like Cisco recently announcing new safety-focused frameworks and companies like SentinelOne releasing AI-centered platform expansions. And regulated industries like financial services, healthcare, and government face strict requirements around how data is handled, stored, and transmitted. Any voice AI solution that doesn’t meet those requirements or that lacks robust security features is a non-starter, regardless of how good it sounds.
The ElevenLabs and IBM integration addresses this head-on. Clients can access enterprise-grade protections including PCI compliance for secured payment processing, which is essential for any agent that handles credit card numbers or financial transactions over the phone. The integration also includes Zero Retention Mode, which is designed to support HIPAA-compliant data handling by ensuring that audio inputs and outputs are not stored after processing. Data residency options give organizations control over where their data lives, which is critical for meeting regional regulatory requirements.
ElevenLabs brings strong compliance credentials to the table. The company holds SOC 2 Type II, ISO 27001, and PCI DSS Level 1 certifications, with GDPR compliance and HIPAA support. End-to-end encryption protects data in transit and at rest, and the company offers data residency in both the EU and US. When paired with IBM watsonx Orchestrate’s own governance and trust infrastructure, the combination creates a security posture that enterprise IT and compliance teams can work with confidently.
Industries and Use Cases Poised to Benefit

IBM has highlighted several key areas where this integration is expected to have the most immediate impact: customer support, sales, employee experience, internal operations, and government services. Each of these represents a scenario where voice-first AI can deliver measurable improvements over text-only alternatives.
In customer support, voice agents that sound natural and can handle multiple languages dramatically expand the population of customers an organization can serve effectively. In sales, the ability to convey warmth, urgency, or reassurance through voice can meaningfully affect conversion rates. For employee experience and internal operations, voice-enabled agents can simplify everything from IT help desk interactions to HR policy inquiries, reducing friction and freeing human staff for higher-value work.
Government services represent a particularly compelling use case. Public-sector agencies often need to serve diverse, multilingual populations across sensitive domains like healthcare, education, and social services. A voice-first AI agent that speaks a constituent’s language naturally, handles their inquiry securely, and routes complex issues to a human when needed is a powerful tool for improving public access and trust. The 70-language support with regional accents makes this kind of deployment practical in a way it simply wasn’t before.
Looking Ahead: The Voice-First Future of Enterprise AI
Both ElevenLabs and IBM have signaled that this integration is the beginning, not the endpoint, of their collaboration. The companies intend to continue working together to help enterprises move beyond text-only agents and toward voice-first, human-centered AI experiences designed for enterprise scale.
This trajectory aligns with a broader industry shift. As agentic AI becomes more capable, handling increasingly complex, multi-step workflows with greater autonomy, the interface through which humans interact with these agents becomes proportionally more important. Text-based chat served the early era of AI assistants well, but as agents take on tasks that were previously handled by phone calls, in-person interactions, and speech-driven workflows, the demand for high-quality voice is only going to grow.
IBM’s Vice President of AI Technology Partnerships, Nick Holda, framed the collaboration as an example of IBM’s open ecosystem approach, which gives clients the flexibility to choose the models and tools that fit their business. This philosophy, letting enterprises mix and match best-in-class components rather than locking them into a single vendor’s full stack, is increasingly important as the AI landscape fragments into specialized players. ElevenLabs brings world-class voice, IBM brings enterprise orchestration, governance, and scale. Together, they offer a compelling blueprint for how the next generation of enterprise AI will be assembled.
Frequently Asked Questions
What is IBM watsonx Orchestrate?
IBM watsonx Orchestrate is a unified platform that allows organizations to build, deploy, manage, and govern AI agents that automate business workflows. It connects to existing enterprise systems, models, and automation tools, enabling multiple AI agents to collaborate and providing a scalable foundation for trustworthy, explainable enterprise AI. Users can interact with it using natural language, and it draws from a catalog of pre-built and custom skills to execute tasks.
What is ElevenLabs?
ElevenLabs is an AI research and product company that specializes in voice technology. Founded in January 2023, it launched with what it described as the first human-like AI voice model. Today, the company serves millions of users and thousands of businesses through three main platforms: ElevenAgents (enterprise conversational AI), ElevenCreative (content creation tools for speech, music, image, and video in over 70 languages), and its developer API.
What is agentic AI?
Agentic AI refers to AI systems that can autonomously reason, plan, and take action to accomplish goals, rather than simply responding to individual prompts. Unlike traditional chatbots or simple AI assistants, agentic AI can break down complex tasks, connect to external tools and data sources, make decisions, and execute multi-step workflows with minimal human intervention.
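As a rough illustration, the plan-then-execute loop described above might be sketched like this in Python. The planner and tools here are toy placeholders, not any real watsonx Orchestrate or ElevenLabs API:

```python
# Toy sketch of an agentic loop: decompose a goal into steps,
# pick a tool for each step, and execute with no human in the loop.
# Tool names, arguments, and the planner are illustrative placeholders.

TOOLS = {
    "lookup_order": lambda arg: f"order {arg}: shipped",
    "send_email": lambda arg: f"email sent to {arg}",
}

def plan(goal: str) -> list[tuple[str, str]]:
    """Toy planner: break the goal into (tool, argument) steps.
    A real agent would reason over the goal; this one is hard-coded."""
    return [("lookup_order", "12345"), ("send_email", "customer@example.com")]

def run_agent(goal: str) -> list[str]:
    """Execute each planned step autonomously and collect the results."""
    return [TOOLS[tool](arg) for tool, arg in plan(goal)]

results = run_agent("Update the customer on order 12345")
```

The key property being illustrated is that the agent, not the user, decides which steps to take and in what order.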
What are text-to-speech (TTS) and speech-to-text (STT)?
Text-to-speech (TTS) is technology that converts written text into spoken audio, allowing computers and AI agents to “speak” aloud. Speech-to-text (STT) is the reverse: it transcribes spoken language into written text, enabling AI systems to “listen” and understand what a person is saying. Together, they form the foundation of voice-enabled AI interactions.
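Conceptually, a single voice interaction chains these two technologies around the agent: STT in, reasoning in the middle, TTS out. The sketch below uses stub functions to show the shape of that round trip; none of these names are real ElevenLabs or IBM APIs:

```python
# Conceptual round trip for one voice turn: STT -> agent -> TTS.
# All three functions are stand-in stubs, not real vendor APIs.

def speech_to_text(audio: bytes) -> str:
    """Stub STT: transcribe caller audio into text."""
    return "What is my account balance?"  # stand-in transcript

def run_agent(query: str) -> str:
    """Stub agent: reason over the transcript and produce a reply."""
    return f"Here is the answer to: {query}"

def text_to_speech(text: str) -> bytes:
    """Stub TTS: synthesize the reply into an audio payload."""
    return text.encode("utf-8")  # stand-in for encoded audio

def handle_turn(audio_in: bytes) -> bytes:
    transcript = speech_to_text(audio_in)  # "listen"
    reply = run_agent(transcript)          # think
    return text_to_speech(reply)           # "speak"

audio_out = handle_turn(b"...caller audio...")
```

In a production deployment, each stub would be replaced by a call to the corresponding STT, agent, and TTS service.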
What is PCI compliance?
PCI DSS (Payment Card Industry Data Security Standard) compliance is a set of security standards designed to ensure that organizations handling credit card information maintain a secure environment. For voice AI, PCI compliance means that AI agents processing payments over the phone can handle sensitive card data without exposing it to security risks.
What is Zero Retention Mode?
Zero Retention Mode is an ElevenLabs feature for enterprise customers in which most data from requests and responses is immediately deleted once the request is completed. This means audio inputs and outputs are not stored after processing, providing an additional layer of security for sensitive workflows and supporting compliance with data handling regulations like HIPAA.
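The underlying idea is simple: process the request entirely in memory and write nothing to persistent storage. A minimal sketch of that pattern, which does not reproduce ElevenLabs’ actual implementation, might look like this:

```python
# Sketch of a zero-retention handler: audio is processed in memory and
# nothing is persisted once the response is produced. Illustrative only.

def handle_request(audio_in: bytes, storage: dict) -> str:
    transcript = audio_in.decode("utf-8")  # process entirely in memory
    response = transcript.upper()          # stand-in for synthesis work
    # Deliberately no writes to `storage`: neither the audio input nor
    # the output is retained after the request completes.
    return response

storage: dict = {}
result = handle_request(b"hello", storage)
assert storage == {}  # no request data persisted after processing
```

The contrast is with a default mode in which requests and responses might be logged or cached for later use.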
How many languages does the integration support?
The integration supports voice interactions in 70 languages with multiple regional accents, enabling enterprises to deploy AI agents that can communicate naturally with diverse, global user bases.
