From data to knowledge and AI via graphs: Technology to support a knowledge-based economy

These past few months have not been kind to any of us. The ripples caused by the COVID-19 crisis are felt far and wide, and the world’s economies have taken a staggering blow. As with most things in life, however, this crisis has also brought some interesting side effects.

The way to cope for practically every one of us, including organizations of all sizes and shapes, can be summarized in one infamous buzzword: digital transformation. This can mean different things depending on who you ask, from video calls and remote collaboration apps to more software as a service, cloud and machine learning investment.

Whatever digital transformation means, as Microsoft CEO Satya Nadella’s aphorism went, the COVID-19 crisis brought years’ worth of digital transformation in months. Those who had invested strategically are reaping the benefits; those who did not are either out of the game or left with a patchwork of disparate initiatives and apps.

There’s one thing in common, however, regardless of where you are in that journey: as digital transformation accelerates, an increasing share of all business activity leaves a footprint in the form of data. Eventually, every employee, customer and supplier interaction, every lead, every bit of information and every process will either take place digitally or be documented digitally.

What this means in turn is that in theory, it should be possible to not just derive insights, as the promise of big data and analytics has been, but to go from data to information, and from information to knowledge.

Yes, this does sound a lot like the journey towards the fourth industrial revolution, and the promise of artificial intelligence. In the near future, organizations will be data-driven, and the economy will be knowledge-based. Here’s a shortlist of technologies and processes that can support it, and what they are about.

The data pyramid: from data to knowledge

The representation of the relationships among data, information, knowledge and — ultimately — wisdom, known as the data pyramid, has long been part of the language of information science. In the new knowledge-based digital world, encoding and making use of business and operational knowledge is the key to making progress and staying competitive.

So how do we go from data to information, and from information to knowledge?

Data is a collection of facts in a raw or unorganized form, such as numbers or characters. Without context, data does not mean much.

For example, “18122020” is just a sequence of digits. But if we define this sequence as a date in the DDMMYYYY format, we can interpret it as the 18th of December, 2020. With this added context, the digits acquire meaning.
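As a minimal sketch of this step, the snippet below reads the same eight digits under two different contexts. It uses Python's standard `datetime` module; the variable names are illustrative only.

```python
from datetime import datetime

raw = "18122020"  # raw data: just eight digits

# Context: we declare that the digits follow a day-month-year layout.
parsed = datetime.strptime(raw, "%d%m%Y").date()
print(parsed.isoformat())  # 2020-12-18

# A different context gives a different reading of the same digits:
# here they are treated as a plain integer.
as_number = int(raw)
print(as_number)  # 18122020
```

The digits never change; only the declared format does, and that declaration is what turns them into a date.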

Information is data that has been processed in a way that makes it easier to measure, visualize and analyze — for a specific purpose.

For example, we can organize our data in a way that exposes relationships between various seemingly disparate and disconnected data points. We can analyze the performance of the Dow Jones index by creating a graph of data points for a particular period of time, based on the data at each day’s closing.

Knowledge is information that has been processed, organized and structured in some way, applied or put into action.

For example, by capturing and expressing the meaning of relationships pertaining to our data points, we can automate insights, and extract new knowledge. A knowledge graph of semantic relationships can help explain how certain stocks influence the Dow Jones index, and how different events may affect their prices.

Adding context to data turns it into information. Processing information turns it into knowledge. The keys to these transformations are connections and metadata. But there is another well-known progression at play here.

By now, the classification of various forms of analytics from simple to advanced is also well known and understood. The theory behind this classification is that the more advanced forms of analytics lead to AI. Of course, there is an implicit question there: what do we talk about, when we talk about AI?

According to today’s conventional wisdom, the answer is machine learning, or more specifically even, deep learning. That has not always been held as a self-evident truth, however. It probably won’t be held as such forever, either. And it certainly is not held as such by everyone.

For a brief introduction to a more holistic view on AI, and a strong dose of inspiration, we recommend our recent conversation with AI prodigy Gary Marcus. For our part, let’s try to approach the analytics progression from a different viewpoint, focusing on a specific data structure: graphs.

Graph Analytics

Is it possible to walk through a city and cross each of its bridges exactly once? The history of graph theory is linked to this question. The bridges in question were the seven bridges of Königsberg. The year was 1736. And the person who formulated a model to answer this question was Leonhard Euler.

Euler was a Swiss scientist and engineer, and much of his work is foundational to modern science. Euler’s solution to the problem of the bridges of Königsberg is the basis of graph theory. There is a line connecting Euler to modern data science.

What Euler did was to model the land masses and the bridges connecting them as nodes and edges in a graph. Euler formalized the relationships between nodes and edges. That formed the basis for many graph algorithms that can tackle problems such as the bridges of Königsberg.
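To make the model concrete, here is a small sketch of the Königsberg problem in Python. It treats the four land masses as nodes and the seven bridges as edges, then applies the classic degree-parity condition: a connected graph can be walked using every edge exactly once only if zero or two nodes have an odd number of edges. The labels A to D and the function names are illustrative.

```python
from collections import Counter

# Land masses are nodes; each bridge is an edge between two land masses.
# Labels are illustrative: A is the central island, B and C are the river
# banks, D is the second island.
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_euler_walk(edges):
    """True if a walk can use every edge exactly once.

    Assumes the graph is connected, which holds for Königsberg.
    """
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(degrees(bridges)))   # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_euler_walk(bridges))  # False
```

All four land masses have an odd number of bridges, so no such walk exists, which is the conclusion Euler reached.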

The most famous graph algorithm is probably PageRank, the foundation of Google’s empire. PageRank models documents on the web as a graph, and uses the links among them to estimate how important each document is, which in turn helps rank results for a query. But Google’s success with PageRank is just the tip of the iceberg.
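The idea can be sketched with the textbook power-iteration formulation of PageRank. This is a simplified illustration, not Google's production system. The four-page web, the damping factor of 0.85 and the iteration count are assumptions chosen for the example, and every link target is assumed to appear as a key in the input.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its damped rank evenly among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(web)
best = max(ranks, key=ranks.get)
print(best)  # home
```

The "home" page ends up on top because every other page links to it, while "orphan" scores lowest because nothing links to it.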

From the 18th century to today, a number of graph algorithms have been developed. Path finding, centrality, community detection and similarity are some of the main classes of graph algorithms. Graph algorithms have many applications in data analytics.
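Two of these classes are easy to sketch in plain Python. The example below shows path finding via breadth-first search and a simple degree centrality measure. The five-node graph is hypothetical, and both functions are minimal illustrations rather than optimized implementations.

```python
from collections import deque

# Hypothetical undirected graph stored as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a path with the fewest edges, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(adjacent) / (n - 1) for node, adjacent in graph.items()}

print(shortest_path(graph, "a", "e"))  # ['a', 'b', 'd', 'e']
print(degree_centrality(graph)["d"])   # 0.75
```

Community detection and similarity algorithms build on the same adjacency structure, but they need more machinery than fits in a short sketch.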

From eBay and NASA to investigative journalists and independent data scientists, graph analytics power real-world use cases and make a difference every day in areas ranging from recommendations and fraud detection to network analysis and natural language processing. This is why the analyst firm Gartner predicts that “graph analytics will grow in the next few years due to the need to ask complex questions across complex data.”

Graph Databases

Leveraging connections in data is a prominent way of getting value out of data. Graph is the best way of leveraging connections, and graph databases excel at this. Graph databases make expressing and querying connections easy and powerful.

This is why graph databases are a good match in use cases that require leveraging connections in data: Anti-fraud, Recommendations, Customer 360 or Master Data Management. From operational applications to analytics, and from data integration to machine learning, graph gives you an edge.
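To illustrate what a connection-oriented query looks like, here is a rough sketch of a recommendation traversal. Plain Python dictionaries stand in for a graph database, so no real graph database API or query language is shown. The customers, products and scoring rule are all hypothetical.

```python
from collections import Counter

# Hypothetical purchase data: customer -> set of products bought.
purchases = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "carol": {"laptop", "monitor", "keyboard"},
    "dave": {"phone"},
}

def recommend(customer, purchases):
    """Follow connections: customer -> shared products -> other customers
    -> products those customers bought that this customer has not."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        shared = own & items
        if not shared:
            continue
        # Weight each candidate by how much the two customers overlap.
        for item in items - own:
            scores[item] += len(shared)
    return [item for item, _ in scores.most_common()]

print(recommend("alice", purchases))  # ['keyboard', 'monitor']
```

In a graph database the same question would typically be expressed as a short traversal in its query language, rather than as nested loops over application data.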

There is a difference between graph analytics and graph databases. Graph analytics can be performed on any back end, as they only require reading graph-shaped data. Graph databases are databases with the ability to fully support both read and write, utilizing a graph data model, API and query language.

Graph databases have been around for a long time, but the attention they have been getting since 2017 is off the charts. AWS and Microsoft moving into the domain, with Neptune and Cosmos DB respectively, exposed graph databases to a wider audience.

This hitherto niche domain has been the hottest in data management since then. Besides trends, however, there are real reasons why graph databases are interesting, and real use cases they can help with. To quote Gartner again:

“The application of graph processing and graph DBMSs will grow at 100 percent annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science. Graph data stores can efficiently model, explore and query data with complex interrelationships across data silos.”

Knowledge Graphs

Connecting data silos is a prerequisite for knowledge management, and knowledge graphs excel at this. Knowledge graphs are a specific subclass of graphs, also known as semantic graphs. They come with metadata, schema, global identifiers, and reasoning capabilities, which make them ideal for capturing and managing knowledge.
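A toy example may help show how schema and reasoning fit together. The sketch below stores facts as subject-predicate-object triples and applies one inference rule in the spirit of RDFS subclass reasoning. The `ex:` identifiers are made up for illustration, and the loop is a naive stand-in for what a real reasoner does.

```python
# Facts as (subject, predicate, object) triples.
# All "ex:" identifiers are hypothetical.
triples = {
    ("ex:AcmeCorp", "rdf:type", "ex:PublicCompany"),
    ("ex:PublicCompany", "rdfs:subClassOf", "ex:Company"),
    ("ex:Company", "rdfs:subClassOf", "ex:Organization"),
    ("ex:AcmeCorp", "ex:listedOn", "ex:NYSE"),
}

def infer_types(triples):
    """Apply the subclass rule until no new facts appear:
    if X has type C and C is a subclass of D, then X has type D."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        typed = {(s, o) for s, p, o in facts if p == "rdf:type"}
        for entity, cls in typed:
            for child, parent in subclass:
                new_fact = (entity, "rdf:type", parent)
                if child == cls and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

inferred = infer_types(triples)
print(("ex:AcmeCorp", "rdf:type", "ex:Organization") in inferred)  # True
```

Nobody stated that AcmeCorp is an organization. That fact follows from the schema, which is the kind of reasoning capability that sets knowledge graphs apart from plain graphs.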

It’s understandable why many people tend to think of graph as a new technology. The truth, however, is that this technology is at least 20 years old. It was largely initiated by none other than Tim Berners-Lee, who is also credited as the inventor of the Web.

Berners-Lee published his Semantic Web manifesto in 2001. Although sidelined over the years, these principles and technology are still largely behind the Knowledge Graph renaissance. Google played a key role in the rise of graphs, and knowledge graphs. As the web itself is a prime use case for graphs, PageRank was born.

Despite PageRank’s success, crawling and categorizing content on the web is a very hard problem to solve without semantics and metadata. Hence Google embraced semantic technology, and coined the term Knowledge Graph in 2012.

This, and the widespread adoption of schema.org that followed, marked the beginning of the meteoric rise of graph technology and knowledge graphs. Knowledge graphs can address key challenges such as data governance and data integration.

Ultimately, knowledge graphs can serve as the digital substrate to unify the philosophy of knowledge acquisition and organization with the practice of data management in the digital age. And ontologies, the elaborate schemas that govern knowledge graphs, are now being used by the Morgan Stanleys of the world.

Graphs, AI, and natural language processing

You would be tempted to think that knowledge graphs are the be-all and end-all for capturing and managing knowledge; you would be wrong. Knowledge graphs excel in capturing knowledge explicitly, in a top-down way. Knowledge graphs are part of AI, the so-called good old AI, or symbolic AI. This is why Gartner has included knowledge graphs in its 2020 hype cycle for AI, at the peak of inflated expectations, no less.

Knowledge graphs and ontologies may fare better than any other technology for managing explicit, a priori knowledge, but what about implicit, emergent, evolving knowledge? This is where machine learning works well, but here, too, graphs may have a helping hand to lend.

What does graph have to do with machine learning? A lot, actually. And it goes both ways. Machine learning can help bootstrap and populate knowledge graphs. The information contained in graphs can boost the efficiency of machine learning approaches.

What does graph have to do with machine learning? A lot, actually. And it goes both ways. Image: Oracle

Machine learning, and its deep learning subdomain, make a great match for graphs. Machine learning on graphs is still a nascent technology, but one which is full of promise. Amazon, Alibaba, Apple, Facebook and Twitter are just some of the organizations using this in production, and advancing the state of the art. More than 25% of the research published in top AI conferences is graph-related.
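One common idea in machine learning on graphs is that a node's representation can be enriched with information from its neighbors. The sketch below shows a single round of mean aggregation over a hypothetical three-node graph. Real graph neural networks add learned weights and nonlinearities on top of this step, and nothing here reflects any particular organization's method.

```python
# Hypothetical undirected graph and one feature vector per node.
neighbors = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a"],
}
features = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.0, 3.0],
}

def aggregate(neighbors, features):
    """One round of mean aggregation: each node averages its own
    vector with the vectors of its direct neighbors."""
    updated = {}
    for node, adjacent in neighbors.items():
        group = [features[node]] + [features[n] for n in adjacent]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated

smoothed = aggregate(neighbors, features)
print(smoothed["b"])  # [0.5, 0.5]
print(smoothed["c"])  # [0.5, 1.5]
```

After one round, each node's vector reflects its immediate neighborhood; repeating the step lets information travel further across the graph.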

Last but not least, according to Facebook AI researcher Fabio Petroni, graphs may not be the best way to obtain knowledge: “As human beings, we already invented the best way of representing knowledge — text. With recent advances in Natural Language Processing (NLP), we now have machines that can retrieve pieces of context, reason on top of that and solve knowledge intensive tasks without using a knowledge base, just using text and understanding text.”

Having come full circle, this reference points to the kind of holistic approach to AI, knowledge representation and reasoning that Gary Marcus’ vision lays out. We have been keeping track of this field, and will continue to do so and report back here.