The MCP Gap and the Limits of AI at Petabyte Scale

May 2, 2026

By Ashley Vassell, Senior Product Manager, Hydrolix

For those who follow developments in AI technology, it’s probably well known that MCP, or Model Context Protocol, is a lightweight protocol that enables data sources, APIs, and other tools to communicate directly with AI. It represents a significant shift from simply feeding AI static data to enabling it to ask questions and obtain current data in real time.

However, once MCP is connected to petabyte-scale production data, meaning not sample data sets but true, live time-series logs, the picture becomes more complex than that description implies.

MCP Adoption Now

In practice, MCP adoption looks like an on-call engineer using MCP to determine what has occurred over the last six hours, a CDN operations team investigating cache performance anomalies in plain language rather than writing SQL, or a security analyst querying billions of log lines instead of relying on the accuracy of sampled datasets.

This pattern appears across the industry in high-volume data environments, including observability platforms, CDN analytics, security log repositories, IoT telemetry systems, and any other environment where large amounts of time-series data are managed and rapid answers are required. Companies that operate in this space, such as Hydrolix, have built MCP servers to enable direct communication between AI applications and petabyte-scale data repositories.

The teams that have been quickest to adopt MCP are the same teams that are struggling with high volumes of time-sensitive data. For these teams, the gap between “data exists” and “data is available” has always been the largest hurdle. These are not AI-first companies. Rather, they were drowning in data long before anyone labeled it an AI issue.

Scaling Problems

Historically, data infrastructure was designed for human use. MCP uses it entirely differently, and scalability issues are surfacing quickly.

Query Safety and Cost Control

When an LLM uses MCP to generate SQL against databases containing petabytes of data, a slow query is more than just a slow query. It is a resource hog that can result in a substantial, unintended bill. An LLM does not recognize that scanning a month of data is significantly more expensive than scanning a single day of data. It simply issues whatever query answers its next logical question.
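The month-versus-day difference can be made concrete with a rough pre-flight estimate. The following is a minimal sketch, assuming uniform ingest and a scan-based pricing model; the ingest rate, the price per terabyte, and the function names are hypothetical placeholders, not figures from any particular platform.

```python
from datetime import datetime, timedelta

# Hypothetical figures for illustration only; substitute your own.
BYTES_PER_DAY = 40 * 10**12   # assumed ingest: 40 TB of logs per day
USD_PER_TB_SCANNED = 5.0      # assumed price per terabyte scanned


def estimate_scan_cost(start: datetime, end: datetime,
                       bytes_per_day: int = BYTES_PER_DAY,
                       usd_per_tb: float = USD_PER_TB_SCANNED) -> float:
    """Rough cost of a full scan over [start, end), assuming uniform ingest."""
    if end <= start:
        raise ValueError("end must be after start")
    days = (end - start) / timedelta(days=1)
    terabytes = days * bytes_per_day / 10**12
    return terabytes * usd_per_tb


def within_budget(start: datetime, end: datetime, max_usd: float) -> bool:
    """Decide whether an LLM-generated time range may run without review."""
    return estimate_scan_cost(start, end) <= max_usd
```

Under these assumed figures, a one-day scan comes to about $200 and a 30-day scan to about $6,000. A check like `within_budget` could sit between the LLM and the database so that the wider range is rejected or escalated rather than silently executed.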

Authentication and Identity

Many early MCP implementations are local configurations in which users supply their own credentials. Although MCP has progressed relatively quickly on OAuth and IdP integration, as well as role-based access controls, adoption across all clients and servers is still uneven. A standard exists, but consistent implementation across every client-server pair in a production environment remains the primary point of failure.

Observability Blind Spots

Traffic generated by an AI application querying a data repository is generally anonymous and unattributed, making it difficult to distinguish from human-generated traffic. How much faster does MCP let your on-call engineers resolve incidents? Without attribution, you cannot quantify or measure this. And without visibility into MCP usage, it becomes difficult to produce the evidence needed to justify additional resources for growth and adoption.

Token Limits Meet Volume

Traditionally, a human understands the schema of their dataset and has some general idea of what they are trying to find. With MCP, when someone asks an LLM “what has occurred over the last six hours,” the LLM must decide how to query the data so that the result fits within its context window. The LLM’s decisions about aggregation and sampling are reasoning that occurs before any actual data analysis begins, and users have little insight into what data was included or excluded as a result. In traditional BI, users can review the raw data returned by a query. With AI serving as an intermediary, the user never sees the raw data.
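One way to make that hidden decision visible is to compute the aggregation granularity explicitly and report it alongside the answer. The sketch below is illustrative only: the candidate bucket sizes, the tokens-per-row estimate, and the function names are assumptions, and a real MCP server would likely need a more careful token count.

```python
import math

# Assumed candidate aggregation intervals, in seconds.
CANDIDATE_BUCKETS_S = (1, 10, 60, 300, 900, 3600)


def choose_bucket(window_s: int, tokens_per_row: int, token_budget: int,
                  candidates=CANDIDATE_BUCKETS_S) -> tuple[int, int]:
    """Return (bucket_seconds, row_count) for the finest bucket that fits.

    Each output row covers one time bucket; rows * tokens_per_row must
    stay within the token budget reserved for query results.
    """
    for bucket in sorted(candidates):
        rows = math.ceil(window_s / bucket)
        if rows * tokens_per_row <= token_budget:
            return bucket, rows
    raise ValueError("no candidate bucket fits the token budget")


def disclosure(window_s: int, bucket_s: int, rows: int) -> str:
    """Plain-language note telling the user what was aggregated away."""
    hours = window_s / 3600
    return (f"Summarized {hours:g}h of data as {rows} buckets of "
            f"{bucket_s}s each; finer detail was aggregated away.")
```

For a six-hour window at an assumed 20 tokens per row and an 8,000-token budget, `choose_bucket(6 * 3600, 20, 8000)` selects 60-second buckets (360 rows). Returning the `disclosure` string with the answer gives the user at least some view of what the model did and did not see.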

Evolution of Data Infrastructure

Data infrastructure is rarely viewed as exciting from an innovation perspective, yet it ultimately determines whether AI transforms existing workflows or merely proves to be expensive. The following describes what early-adopter companies are learning.

Understand your cost model before integrating any new applications. Ideally, establish a baseline for estimating the cost of potential runaway queries. Query guardrails at the data layer (e.g., query timeouts and row limits) are non-negotiable during initial configuration.
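As a minimal sketch of the two guardrails named above, the wrapper below enforces a wall-clock timeout and a row cap. It uses Python's built-in `sqlite3` purely as a stand-in for a real data platform, which would normally expose its own timeout and limit settings; `run_guarded` and its defaults are hypothetical.

```python
import sqlite3
import time


def run_guarded(conn: sqlite3.Connection, sql: str,
                max_rows: int = 1000, timeout_s: float = 5.0):
    """Run a read-only query with a timeout and a row cap.

    Returns (rows, truncated). Raises sqlite3.OperationalError if the
    query is interrupted for exceeding the timeout.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")

    deadline = time.monotonic() + timeout_s

    def past_deadline() -> int:
        # SQLite aborts the running query when this returns non-zero.
        return 1 if time.monotonic() > deadline else 0

    # Invoke the check every 10,000 SQLite virtual-machine instructions.
    conn.set_progress_handler(past_deadline, 10_000)
    try:
        cur = conn.execute(sql)
        # Fetch one extra row so truncation can be reported to the caller.
        rows = cur.fetchmany(max_rows + 1)
        return rows[:max_rows], len(rows) > max_rows
    finally:
        conn.set_progress_handler(None, 0)  # clear the handler
```

Reporting `truncated` matters as much as the cap itself: an LLM that silently receives the first 1,000 rows may reason as if it had seen everything.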

Although a pilot may begin with personal API keys for experimentation, it is vital to create a plan and timeline for establishing proper authentication before the temporary setup becomes the status quo.

Before rolling out MCP, or as soon afterward as possible, instrument MCP traffic so you can separate human-originated traffic from machine-originated traffic, quantify adoption rates, assign costs, and build a business case for expanding capabilities.
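One lightweight way to attribute traffic is to tag every query with its origin and keep a per-source tally. This is a sketch under assumptions: the comment-tag format, the `human`/`mcp` labels, and the `UsageLedger` class are hypothetical, and many platforms offer native query tags or labels that would be preferable where available.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

ALLOWED_SOURCES = {"human", "mcp"}


def tag_query(sql: str, source: str, team: str) -> str:
    """Prefix a query with a comment so query logs show who issued it."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    # Restrict the team label so it cannot close the comment early.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", team):
        raise ValueError(f"invalid team label: {team!r}")
    return f"/* source={source} team={team} */ {sql}"


@dataclass
class UsageLedger:
    """Running totals of queries and bytes scanned per (source, team)."""
    queries: dict = field(default_factory=lambda: defaultdict(int))
    bytes_scanned: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, source: str, team: str, scanned: int) -> None:
        key = (source, team)
        self.queries[key] += 1
        self.bytes_scanned[key] += scanned

    def share_of_bytes(self, source: str) -> float:
        """Fraction of all scanned bytes attributable to one source."""
        total = sum(self.bytes_scanned.values())
        mine = sum(v for (s, _), v in self.bytes_scanned.items() if s == source)
        return mine / total if total else 0.0
```

With tallies like these, a question such as "what share of scan cost comes from MCP clients?" becomes a lookup (`ledger.share_of_bytes("mcp")`) rather than guesswork.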

Begin with the teams experiencing the most pain (i.e., operations teams that need answers within minutes rather than hours or days). They will show what works and what does not at scale, and their results can serve as a proof of concept for wider adoption.


Economic Insider Contributor
