Data-driven 2021: Predictions for a new year in data, analytics and AI

Towards the end of each year, I receive a slew of predictions, from data/analytics industry executives and luminaries, focused on the year ahead. This year, those predictions filled a 49-page-long document. 

While I couldn’t include all of them, I’ve rounded up many of this year’s prognostications, from over 30 companies, in this post. The roster includes numerous well-known data/analytics players, including Cloudera, Databricks, Micro Focus, Qlik, SAS, and Snowflake, to name a few. Thoughts from execs at Andreessen Horowitz, the Deloitte AI Institute and O’Reilly are in the mix as well, as are those from executives at smaller but still important industry players.

This year’s groupings include data warehouse vs. data lake; the democratization of artificial intelligence (AI); responsible AI; the convergence of AI and business intelligence (BI); growth in data literacy; the data governance imperative; and, of course, the interplay between analytics and the COVID-19 pandemic. Anyway, enough preamble; let’s get on with this year’s predictions.

Warehouse vs. lake: Can we all get along?

One popular topic this year was the relative strength, and ultimate survivability, of the data warehouse and data lake approaches to analytics.

Bob Muglia, Snowflake’s former CEO, says that fully transacting images and videos together with any source of data in a data warehouse is “…coming in the next two to three years, and that’s going to be the nail in the coffin for the data lake.” Micro Focus’ Open Source Relations Manager, Paige Roberts, feels “the data warehouse vendors have an unbeatable head start [over data lake vendors] because building a solid, dependable analytical database like Vertica can take ten years or more alone. The data lake vendors have only been around about ten years and are scrambling to play catch-up.”

George Fraser, CEO of Fivetran, says “I think 2021 will reveal the need for data lakes in the modern data stack is shrinking,” adding that “…there are no longer new technical reasons for adopting data lakes because data warehouses that separate compute from storage have emerged.” If that’s not categorical enough for you, Fraser sums things up thus: “In the world of the modern data stack, data lakes are not the optimal solution. They are becoming legacy technology.”

Data lake supporters are even more ardent. In a prediction he titled “The Data Lake Can Do What Data Warehouses Do and Much More”, Tomer Shiran, co-founder of Dremio, says “data warehouses have historically had…advantages over data lakes. But that’s now changing with the latest open source innovations in the data tier.” He mentions Apache Parquet and Delta Lake as two such innovations, along with the lesser-known projects Apache Iceberg and Nessie. Together, these projects allow data to be stored in open, columnar formats across file systems, versioned and processed with transactional consistency.

Martin Casado, General Partner of Andreessen Horowitz, put it this way: “If you look at the use cases for data lakes vs. data analytics, it’s very different. Data lakes tend to be more unstructured data, compute intensive, focused on operational AI. The use case for operational AI is larger and growing faster. Over time, I think you can argue that it’s the data lake that ends up consuming everything.”

Dipti Borkar, at PrestoDB-focused Ahana, says “As cloud adoption has become mainstream, companies are creating and storing the majority of their data in the cloud, especially in cost-efficient Amazon S3-based data lakes.” Her colleague, Dave Simmen, Ahana’s CTO, says “A federated, disaggregated stack…is displacing the traditional data warehouse with its tightly coupled database.” Simmen also believes that “…we’ll see traditional data warehousing and tightly coupled database architectures relegated to legacy workloads.”

Over at Databricks, the strategy is to focus on data lake technology, but to imbue it with certain data warehouse-like qualities. Joel Minnick, Databricks’ VP of Marketing, explains it this way: “The vision we see taking shape now is called the lakehouse. It provides a structured transactional layer to a data lake to add data warehouse-like performance, reliability, quality, and scale. It allows many of the use cases that would traditionally have required legacy data warehouses to be accomplished with a data lake alone.”

What about players with no horse in the race? O’Reilly’s Rachel Roumeliotis, VP of AI and Data, acknowledges the validity of the lake and lakehouse models: “Data lakes have experienced a fairly robust resurgence over the last few years, specifically cloud data lakes…these will remain on the radar in 2021. Similarly, the data lakehouse, an architecture that features attributes of both the data lake and the data warehouse, gained traction in 2020 and will continue to grow in prominence in 2021.” Roumeliotis gives a nod to the warehouse model, adding: “Cloud data warehouse engineering develops as a particular focus as database solutions move more to the cloud.”

Over at Starburst, which focuses on Trino (formerly PrestoSQL) — an engine that works great for querying data lakes, but which can also connect to data warehouses and numerous other data sources — CEO Justin Borgman says “We’ll see business leaders pointing…to make data-driven decisions, which encompasses all types of data no matter where it lives – in the cloud, on prem, in data lakes or data warehouses.”

All for AI and AI for all?

As you can imagine, there were a great number of predictions focused on AI and machine learning (ML) this year; there were so many, in fact, that they break down into a few substantial subcategories. One set of predictions focuses on how AI will become more democratized, accessible, affordable and mature.

Starburst’s Borgman says “ML/AI will become more accessible to a broader base of users.” He adds that while data science backgrounds have been necessary to take advantage of AI up until now, this “is changing to include anyone in the organization who needs data access to make more intelligent decisions.” Alex Peña, Lead Research and Development Engineer at Linode, thinks the economics of AI will improve its accessibility too, saying “Smaller businesses are going to be able to take advantage of AI as the cost of cloud GPU services comes down.” Ryan Wilkinson, Chief Technology Officer at IntelliShift, would concur, stating: “with hardware at a point to support AI…the ML and AI software running in the cloud will mature faster than ever before.”

Ryohei Fujimaki, Ph.D., Founder & CEO of dotData, sees automated machine learning (AutoML) as another driver of AI accessibility for non-data scientists, predicting that, in 2021, “…we will see the rise of AutoML 2.0 platforms that take ‘no-code’ to the next level.” Fujimaki also feels that AutoML will help take AI beyond predictive analytics use cases, because it “…can also provide invaluable insights into past trends, events and information that adds value to the business by allowing businesses to discover the ‘unknown unknowns,’ trends and data patterns that are important, but that no one had suspected would be true.”

Responsible, ethical AI

Another major topic is that of responsible AI/responsible ML, and the general importance of trust and explainability in AI/ML models. Amy Hodler, Director of Graph Analytics and AI Programs at Neo4j, says that although “…discussions on Responsible AI have stalled” due to the pandemic, “The need for Responsible AI has not changed and the need to start a public discussion is as important as ever.” O’Reilly’s Roumeliotis strikes a similar chord regarding limited progress heretofore and how it will drive major activity in 2021: “Until now, corporate adoption of responsible ML has been lukewarm and reactive at best. In the next year, increased regulation (such as GDPR, CCPA), antitrust, and other legal forces will force companies to adopt responsible ML practices.” Nick Elprin, CEO at Domino Data Lab, sees things in a similar light: “…rapidly evolving privacy standards first seen with GDPR and now California’s CCPA, will require in 2021 that attention equally be paid to making AI models more transparent and secure.”

Without a responsible AI approach, it becomes difficult for the C-Suite and team members to trust the AI models they’re designing, and without that trust, it’s almost impossible to utilize AI to business advantage. João Oliveira, Business Solutions Manager at SAS, says that “The more visibility that decision makers have into AI results, the more confidence they have in the decisions that are being made by the models.” Oliveira further believes that confidence begets adoption, stating that “…human oversight and explaining the models at each step in a decision process will start to bring acceptance to AI and automated decisioning.” Santiago Giraldo, Cloudera’s Senior Product Marketing Manager of Machine Learning, not only agrees, but goes on to stipulate that, for the business, such AI adoption is existentially necessary. He puts it this way: “In 2021, a business’ ability to trust its model – to the extent that they are able to produce action from AI-derived insight – will be determinant of its ability to survive.”

Another Cloudera executive, Cindy Maike, VP of Industry Solutions, says “We will see ethical AI become front and center in the next 12 to 24 months.” Beena Ammanath, executive director of the Deloitte AI Institute, thinks “2021 will be the year of action for AI ethics” and says that “enabling trust in AI systems will be at the center of every AI conversation.” Ammanath feels that “Companies will start to act on deciding the ethical dimensions of their AI strategies and implement AI models that can be governed for ethical implications as part of MLOps.” Indeed, RELX’s annual Emerging Tech Executive Report provides some corroboration for this, saying “Over 8 in 10 business leaders believe that ethical considerations are a strategic priority in the design and implementation of their AI systems.”

Domino Data Lab’s Elprin substantiates the danger of neglecting ethical AI, by predicting that “In 2021, we’ll see broader awareness across industries of legal implications and risks of automated decisions. We may see public lawsuits related to discrimination or liability that involve decisions made by models.” But it’s not all doom and gloom. James Kingston, VP of Research and Innovation Partnerships at Dataswift, AI researcher, and Director of the HAT-LAB, provides some carrot at the end of that stick, explaining that “By combining ethical, compliant and privacy-preserving principles with technology infrastructure built to scale for the future, society will move towards a system where the value of data will benefit both individuals and enterprises alike.”

AI and BI, in league

While artificial intelligence seems to have eclipsed business intelligence in terms of importance, it’s not a zero-sum game. The reality is that AI and BI are being mixed, matched and integrated quite a lot, providing a new frontier of innovation in the BI world.

Dan Sommer, Senior Director, Global Market Intelligence Lead at Qlik, says “AI will play a major role…surfacing micro-insights and helping us move from scripted and people-oriented processes to more automated, low-code and no code data preparation and analytics. If more people can be self-sufficient with data earlier in the value chain, anomalies can be detected earlier and problems solved sooner.”

Ramesh Panuganty, CEO of BI company MachEye, says “Business Intelligence is shifting to a new paradigm of advanced data analytics with the integration of Natural Language, Natural Search, AI/ML, Augmented Analytics, Automated Data Preparation, and Automated Data Catalogs. This will transform business decision-making processes with higher-quality real-time insights.” Dhiren Patel, MachEye’s Chief Product Officer & Head of Customer Success, predicts that “As new AI-powered BI products emerge, silos will be broken and every user will be able to leverage data analytics and find insights easily.”

This convergence may go beyond technology, though, and involve practitioners and their skill sets, as well. dotData’s Fujimaki says that “…more and more businesses will begin asking BI teams to develop and manage AI/ML models” and thinks this will give rise to “a new class of BI-based ‘AI developers’.”

Data literacy, data culture spreads

Many of our industry prognosticators believe data skills and literacy will become ubiquitous and commonplace in 2021. Sudheesh Nair, CEO of ThoughtSpot, believes that “As data literacy rises, analytics skills will become the norm for all business professionals and start to disappear from candidates’ resumes.” Nair drives his point home through analogy: “Just as you’re unlikely to see ‘Office proficiency’ today, you’re unlikely to see ‘data proficiency’ by the end of the decade.”

Sam Mahalingam, CTO, Altair, boldly predicts that “In 2021 everyone becomes a data pro.” Mahalingam justifies the prediction by asserting that “With the latest advancements in predictive analytics tools, augmented analytics, and explainable AI models, the analysis and interpretation of data is becoming easier and quicker for business professionals at every skill level.”

Lucy Kosturko, Manager, Social Innovation at SAS, sees a generational/cultural angle here too, explaining that “A generation raised on data…is beginning to enter the workforce” and that “Their innate abilities to track and understand data will improve the ways we work.” Kosturko further believes these “data natives” will “bring data literacy skill sets and a comfort level with data that will help make all aspects of organizations more analytical and more innovative with data.”

Aaron Kalb, Alation’s Chief Data and Analytics Officer, believes that 2021 will be the year that “data literacy goes mainstream.” He continues: “In 2019, most people found math, stats, and data to be boring, intimidating, or irrelevant. But after a year of scrutinizing margins of error in election polling, watching exponential COVID case curves and learning about ‘R-naught,’ those topics certainly seem important and impactful, and more accessible too.”

Kalb believes this increased data literacy goes beyond individuals, and applies to entire organizations. He predicts that in 2021 “‘data culture’ will start to appear achievable.” He admits that “Until now, ‘data culture’ has been a bit of a buzzword and pipe dream,” but says that “in 2021, we’ll see some role models emerge: organizations which have successfully ‘made the switch’ and repeatable patterns for how the right mix of people, processes, and technologies can drive real change.”

Governance now

Another popular topic in this year’s big batch of predictions was the proliferation of data and how it now makes data management and governance a priority.

Rick Hedeman, Sr. Director of Business Development at 1touch.io, lays out the problem this way: “As we enter a new year, data sprawl continues to accelerate, data lakes are popping up all over the place, and information governance is getting much more difficult.” He says that “Companies in every industry see value in knowing as much about customer behavior and sentiment as possible, but it is largely a ‘collect first, ask questions later’ approach.”

Chris Bergh, CEO of DataKitchen, under the heading “Data Governance Moving Front and Center” observes that “At many large enterprise organizations, data governance is often seen as an obstacle to innovation and productivity. However, many modern organizations are beginning to realize that this has to change if they want to be agile and successful.”

The issue encompasses not just conventional text and numeric data but media, too. Nutanix CEO Dheeraj Pandey says “Governance around images and video data being produced at the edge will bring even more meaningful applications of AI and ML in the hybrid enterprise.”

Under the heading “Data Privacy and Governance Kicks Into Another Gear in the United States,” Tomer Shiran at Dremio predicts that the United States will end up adopting national regulations similar to the European Union’s GDPR and the California Consumer Privacy Act (CCPA). He says that “This will require companies to double down on privacy and data governance in their data analytics infrastructure.”

Governance doesn’t just apply to tables, data sets and Parquet files in the data lake, either. It applies to ML models as well. Cloudera’s Cindy Maike says that “As we look to 2021, we will see the conversation of ethical AI and data governance be applied to multiple different areas, such as contact tracing (fighting COVID-19), connected vehicles and smart devices…and personal cyber profiles.”

Balaji Ganesan, co-founder and CEO of Privacera (who is also the co-founder of the Apache Ranger project, a popular data security standard) predicts that the regulatory environment will mean 2021 will usher in “The End of the Wild West of Information Sharing.” He also sees a tie-in between governance and the COVID-19 pandemic, asserting “…the remote working requirements of the COVID-19 pandemic will force enterprises to accelerate data governance and compliance projects in 2021.”

Analytics and the COVID-19 pandemic

And speaking of the pandemic (you didn’t think we’d go through a massive slate of predictions for 2021 without discussing COVID-19, did you?), many of this year’s predictors see it as a forcing function behind the technology predictions they’ve made.

For example, Spiros Liolis, Chief Technologist at Micro Focus, believes “Smart software will automate and take on more and more repetitive functions, in the supply chain, as a result of COVID-19 lessons.” And Ashu Singhal, Co-Founder and President at Benchling, says “More R&D organizations will continue moving their infrastructure to the cloud because of COVID-19 and ML investments will rapidly increase.”

Here’s Alation’s Aaron Kalb’s take: “When the pandemic turned the world economy upside down, organizations were forced to invest rapidly in business intelligence and data catalog software just to understand what the heck was going on and make basic business decisions.” And Domino Data Lab’s Elprin opines that “Organizations are making dramatic budget cuts in many areas in an effort to overcome the effects of COVID-19 and keep their business viable. Yet, in 2021 we predict that many will sustain or actually increase their investment in data science to help drive the critical business decisions that may literally make the difference between survival and liquidation.”

The pandemic is also recognized as a driver for prioritization of ethical/responsible AI. Deloitte’s Ammanath says “The COVID-19 global pandemic has ignited urgent demand for AI solutions and increased worldwide focus on ethical use of AI.” Natalia Modjeska, Research Director at Info-Tech Research Group, though, believes the pandemic will actually act as a counterbalance in the application of AI, saying “We’ll see a more cautious approach to AI: less irrational exuberance and more of a level-headed analysis of benefits and risks, investments required and ROI…because of the massive data shift brought about by Covid-19, which rendered much of the pre-pandemic data unusable.”

Regarding that shift, MachEye’s Patel comments that “Customer behavior and purchase habits have sharply changed as a result of the pandemic.” Because of this, Patel predicts that “Analytical reports and dashboards, even as recent as 2019, will become useless. The focus will shift to analyzing customer behavior changes in real time to gain actionable insights.”

Buno Pati, CEO of Infoworks, points out that analytics can impact how we manage the pandemic, rather than the other way around. He explains that “Wars are won or lost on logistics, and we are currently at war with COVID-19…gaining access to data about qualified nurses, specialists, respiratory therapists and radiologists, hospitals will rapidly get patients the care they need.”

Jans Aasman, CEO of Franz Inc., believes graph analytics will significantly impact the effectiveness of COVID-19 contact tracing: “Leading healthcare institutions will create Event Knowledge Graphs centered around a COVID patient and analyze all individuals and places the infected person came within 6 feet of, for 15 minutes or more [and]…will advise potentially infected people to be tested and quarantined to slow the spread of the disease.”

Finally, Greg Horne, Global Principal, Healthcare, at SAS, thinks the analytics/COVID-19 interplay will go beyond contact tracing and impact the COVID vaccination effort as well. He predicts that “Analytics will not only play a role in approvals for the vaccine development process but will also be important for planning roll out and tracking distribution, side effects and effectiveness.”

Out with the old

Given the sluggish rollout of the vaccine in the United States so far, one wonders if Horne’s prediction will come to fruition and improve matters. Certainly, most of us are hopeful that data and analytics can help hasten the end of this pandemic. Let’s also be hopeful that next year’s predictions focus on a fully post-pandemic 2022.

With that, I wish all ZDNet readers a happy, safe, prosperous and healthy 2021.

Cloudera is a customer of Brust’s advisory firm, Blue Badge Insights.