Overview
Over the next 10 years, a plain text layer will revolutionize data interaction by eliminating the need to organize database tables and write SQL. Data-driven questions will yield either the relevant analysis (if it exists) or ingestion instructions (if the data is absent).
Data usage in companies will experience a significant surge as professionals across various organizations gain the ability to query data without needing SQL expertise. This increase will be further amplified by proactive data discovery methods. As data volume grows, all companies within the data stack will benefit.
Consequently, data companies are expected to shift towards usage-based models, focusing on data volume rather than the traditional seat-based models, to capitalize on this increased data usage.
Data analysis is set to become fully automated. AI will not only identify trends but also document its findings in formats like PDFs and PowerPoints. As data analysis becomes more common and accessible, answers will become easier to find. The real competitive edge will lie in anticipating the right questions and providing answers before they are even asked.
SQL and Database Management Abstracted Away
I don't run into "invalid memory access of size 8" errors because I don't use RISC-V. I don't run into "segmentation fault (core dumped)" errors because I don't write in C. Most programmers don't see these errors anymore because Python has abstracted them away. Large language model-powered analysis will do the same thing to SQL and database management. Data analysis of the future won't include "incorrect syntax near 'table'. Expecting '(' or SELECT" errors either.
Writing SQL and managing warehouse documentation will be skills that only a handful of experts at data companies need. Instead, custom models built for each warehouse will respond to plain text with “Here’s your data” or “This data isn’t available; here are 3 suggestions on how you can get this analysis…”. Anyone with a question will receive the next steps to arrive at their conclusion. Even developers will specify their SQL in plain text from within their IDE.
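The two kinds of reply described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CATALOG` mapping, the table names, the `answer` function, and the suggestion texts are all hypothetical placeholders, and a simple keyword lookup stands in for the custom language model the post envisions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class WarehouseReply:
    available: bool                      # True when the warehouse can answer
    message: str                         # plain-text reply shown to the user
    suggestions: list = field(default_factory=list)  # next steps when data is absent


# Hypothetical catalog: topic keyword -> table that covers it (assumed names).
CATALOG = {
    "revenue": "finance.monthly_revenue",
    "churn": "cs.account_churn",
}


def answer(question: str, catalog: dict = CATALOG) -> WarehouseReply:
    """Route a plain-text question to a table, or suggest how to get the data."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for topic, table in catalog.items():
        if topic in words:
            return WarehouseReply(True, f"Here's your data (from {table}).")
    # Placeholder suggestions; a real system would tailor these to the question.
    return WarehouseReply(
        False,
        "This data isn't available; here are 3 suggestions on how you can get this analysis.",
        [
            "Ingest the missing source into the warehouse",
            "Ask the data team which system holds it",
            "Rephrase the question around existing tables",
        ],
    )


print(answer("What was revenue last month?").message)
print(answer("How many support tickets were opened?").suggestions)
```

The point of the sketch is the shape of the reply: every question returns either data or concrete next steps, never a syntax error.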
Analysts will focus on predictive analysis and persistent dashboards instead of ad-hoc queries. Models will get so good at extracting trends that data analysts will focus on understanding the analysis they're fed. Decision analysis to separate signal from noise will take the spotlight. Data engineers will focus on ingesting new data instead of cleaning what's already there. Automated data warehouse documentation will let business teams leverage any data that's stored.
Smaller companies (50-150 headcount) with a data warehouse and no analyst team will be the first adopters. For them, everyone will make data-driven decisions by querying their warehouse in plain text. As incumbents notice their success, they’ll cross the chasm and follow behind.
Easy Analysis Results in Surge in Request Volume
Advances in NLP-to-SQL performance will accelerate data-driven decision-making by making it easier for anyone to query their data. Non-technical professionals in departments such as marketing, customer success, and sales will increasingly spend 20-30 minutes each morning asking 4-5 data-driven questions, leading to a substantial increase in data request volumes.
Incumbents in the field, focused on traditional seq2seq NLP models, may miss out on this initial surge. In contrast, early adopters of OpenAI-powered data solutions will take advantage of new model advancements to build more efficient products. These companies will experience higher data usage volumes, making them more attractive partners in data pipelines.
In response to this shift, incumbents may seek to catch up through acquisitions or the development of new products. At the same time, data companies will transition to usage-based pricing models to capitalize on the increased volume of data usage. Older BI incumbents will adapt more slowly, as they may be hesitant to move away from per-seat pricing.
As the data industry evolves, companies will find themselves in a competitive race to 'tax' the flow of data. This competition will likely lead to verticalization within the industry, as companies strive to capture a larger share of the value they create. This verticalization will particularly impact the largest players in the space, who will aim to dominate by integrating more of the data value chain within their own ecosystems.
Discovery of Relevant Analysis Becomes Priority
LLM products will generate research, PowerPoints, and summaries from datasets without human input. Organizations will no longer have to rely on human analysts to make sense of their data. Identifying the right analysis to run will become the next problem.
Data products will proactively serve up insights, anticipating customer problems before they arise. By using front-of-mind problems and role descriptions, they can identify what a user wants to know before the user knows to ask. Customer success teams will see an analysis of user behavior before customer calls. Marketing teams will get lists of accounts with X behavior to email, pre-populated with relevant email copy.
Proactive recommendations will let products drive higher data usage. Clickthrough rates and bounce rates will measure recommendation quality. This will lead to the emergence of data recommendation products.
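As a rough illustration of those two metrics, here is a minimal Python sketch. The `RecommendationStats` fields and the exact definitions are assumptions made for the example: clickthrough rate is taken as clicks over impressions, and bounce rate as the share of opened insights that were abandoned without further interaction. A real product may define both differently.

```python
from dataclasses import dataclass


@dataclass
class RecommendationStats:
    impressions: int  # times a recommended insight was shown
    clicks: int       # times a shown insight was opened
    bounces: int      # opens abandoned without further interaction (assumed definition)


def clickthrough_rate(stats: RecommendationStats) -> float:
    """Share of shown recommendations that were opened."""
    return stats.clicks / stats.impressions if stats.impressions else 0.0


def bounce_rate(stats: RecommendationStats) -> float:
    """Share of opened recommendations that were immediately abandoned."""
    return stats.bounces / stats.clicks if stats.clicks else 0.0


# Made-up numbers, purely for illustration.
stats = RecommendationStats(impressions=200, clicks=50, bounces=10)
print(clickthrough_rate(stats))  # 0.25
print(bounce_rate(stats))        # 0.2
```

A high clickthrough rate paired with a high bounce rate would suggest recommendations that look relevant but do not answer the question the user actually had.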