The Importance of Data Architecture

A fit-for-purpose data architecture is imperative in the age of artificial intelligence (A.I.).

Celman Elden D. Sudaria
8 min read · Aug 31, 2023

In the evolving landscape of artificial intelligence (A.I.), data has emerged as the most valuable asset, and the ability to harness and transform it into information and knowledge has emerged as a critical capability. This is because A.I. requires data to learn, to make predictions, to optimize processes, and to generate text, images and other content. However, the capability to harness and transform data requires a well-structured and thoughtfully designed data architecture.

In this article, we will explore the continuing importance of data architecture in the age of A.I.

What is data architecture?

To understand data architecture, let’s first define the two words that make up the term: data and architecture.

Data is a set of discrete, objective facts or observations. When meaning and context are added to it, data is transformed into information that can help trigger action.

Before we define architecture, let’s first discuss the word structure in the context of architecture. According to Oxford Languages, structure is the arrangement of and relations between the parts or elements of something complex. In architecture, a structure is an element or a collection of elements that provides support or enclosure.

An example of structure in nature is the lattice structure found in bones and in beehives. [1]

Figure 1. Lattice structure in bone and honeycomb. (Image is from the research paper entitled ‘Assessing Tetrahedral Lattice Parameters for Engineering Applications Through Finite Element Analysis’.) [1]

Now, with this in mind, let’s return to defining architecture.

Architecture is the “art and technique of designing and building structures. It is both the process and the product of sketching, conceiving, planning, designing, and constructing structures”. The term comes from the Latin word ‘architectura’; from Ancient Greek ἀρχιτέκτων (arkhitéktōn) ‘architect’; from ἀρχι- (arkhi-) ‘chief’, and τέκτων (téktōn) ‘creator’. [2]

Now, combining the concepts of data and architecture together with structure, we can define data architecture as a practice and as an output.

  • As a practice, data architecture is “the art and technique of designing and building structures of the data or structures that will enable the use of data”.
  • As an output, data architecture is a “structure of the data” or a “structure that enables the use of data”.

Why is data architecture important?

Just as different art forms like music, film and poetry have an arrangement of their parts or elements, data also requires structure.

For example, in music, a modern song usually has the following structure: intro, verse, pre-chorus, chorus, verse, pre-chorus, chorus, bridge and chorus. In rock or heavy metal music, a song often also includes guitar solos. The modern songs we love to listen to follow a structure. [3]

In film, a movie has a narrative structure, which can be either linear or non-linear. According to ScreenCraft, the most common and core structure of storytelling is the three-act structure, which is composed of a beginning (setup), a middle (confrontation) and an end (resolution). [4]

And lastly, I still remember being asked by my high school teacher to write a haiku, a sonnet or a limerick. Each of these types of poetry has its own unique structure. A haiku is made up of three lines, with the first line having five syllables, the second line having seven syllables and the third line having five syllables. A sonnet has fourteen lines, typically with ten syllables in each line. And finally, a limerick consists of five lines of humorous and lighthearted poetry.

These art forms are very good at evoking emotions, telling stories and inspiring us, partly because of their effective use of structure. Data architecture does the same for data. With data architecture, we can better understand the form or structure of the data, which in turn enhances our ability to use it.

Another useful analogy is to view data architecture as a blueprint, which is an essential artifact in constructing a building because it allows the builders to understand the architect’s design of the building.

Similarly, with a data architecture, data architects are able to communicate how data is structured or how data is harnessed (in a modern data platform) with data engineers, data scientists and business users. In this case, data architecture is both a standard (that promotes quality & effectiveness) and a communication tool (that promotes collaboration & trust).

What are examples of data architecture?

Let me describe three forms or manifestations of data architecture that are useful in analytics and in A.I. initiatives.

The first one is the data model. Data models are essential to artificial intelligence solutions, and to all data and analytics solutions in general, because they provide an easy-to-understand view of what the data entities are and how they are related to each other.

A data model is usually drawn as a diagram composed of basic shapes like rectangles & circles (that represent entities) connected by lines (that represent relationships).

Figure 2 below shows four examples of different data models — dimensional data model, data vault data model, graph data model and document DB data model — that can support and enable analytics and A.I. initiatives.

The way we structure the data in a data model impacts how we use the data. For example, with a dimensional data model implemented as a star schema, aggregation queries like “what is our total sales amount and count by product in each branch for the month of December” can be relatively fast because the measures being aggregated (e.g., sales amount) are stored in the fact table and can be grouped by dimensions (e.g., by product, in each branch, for a certain month).
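To make the star schema example more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows (fact_sales, dim_product, dim_branch, dim_date) are assumptions made up for illustration and are not taken from any real system.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_branch  (branch_id  INTEGER PRIMARY KEY, branch_name  TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id   INTEGER REFERENCES dim_product(product_id),
    branch_id    INTEGER REFERENCES dim_branch(branch_id),
    date_id      INTEGER REFERENCES dim_date(date_id),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_branch  VALUES (1, 'Manila'), (2, 'Cebu');
INSERT INTO dim_date    VALUES (20231130, 2023, 11), (20231201, 2023, 12), (20231215, 2023, 12);
INSERT INTO fact_sales VALUES
    (1, 1, 20231201, 1000.0),
    (1, 1, 20231215, 1500.0),
    (2, 1, 20231201,  500.0),
    (2, 2, 20231215,  700.0),
    (1, 2, 20231130,  900.0);
""")


def december_sales_by_product_and_branch(connection):
    """Total sales amount and count by product in each branch for December."""
    query = """
        SELECT b.branch_name, p.product_name,
               SUM(f.sales_amount) AS total_sales,
               COUNT(*)            AS sales_count
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_branch  b ON b.branch_id  = f.branch_id
        JOIN dim_date    d ON d.date_id    = f.date_id
        WHERE d.month = 12
        GROUP BY b.branch_name, p.product_name
        ORDER BY b.branch_name, p.product_name
    """
    return connection.execute(query).fetchall()


rows = december_sales_by_product_and_branch(conn)
for row in rows:
    print(row)
```

The measure (sales_amount) lives only in the fact table, while the words in the business question (product, branch, month) map directly to the dimension tables used in the GROUP BY and WHERE clauses.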

Another example: if the use case calls for queries like “who are the colleagues of a person who have at least one year of experience in data modeling”, then a graph data model is a better structure to use because the relationships between nodes (e.g., colleagues of a person) are built into the structure of the graph data model.
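The colleague query can be sketched in a few lines of Python. This is only an illustration that assumes a tiny in-memory graph held in dictionaries; the names works_with, experience_years and colleagues_with_skill are hypothetical, and a real graph database would express the same traversal in its own query language.

```python
# Hypothetical graph: people are nodes, "works with" relationships are edges
# stored as adjacency sets, and years of experience are node properties.
works_with = {
    "Ana": {"Ben", "Carla"},
    "Ben": {"Ana", "Dino"},
    "Carla": {"Ana"},
    "Dino": {"Ben"},
}
experience_years = {
    "Ana": {"data modeling": 5.0},
    "Ben": {"data modeling": 1.5, "python": 3.0},
    "Carla": {"data modeling": 0.5},
    "Dino": {"data modeling": 4.0},
}


def colleagues_with_skill(person, skill, min_years=1.0):
    """Follow the person's edges, then filter neighbours by a node property."""
    matches = [
        colleague
        for colleague in works_with.get(person, set())
        if experience_years.get(colleague, {}).get(skill, 0.0) >= min_years
    ]
    return sorted(matches)


print(colleagues_with_skill("Ana", "data modeling"))
```

Because the relationship is stored with the node, finding a person's colleagues is a direct lookup rather than a join across tables.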

Figure 2. The different data models. (Image is from author’s personal notes. Some embedded images within the image are from Neo4j and from MongoDB.)

The second example is a reference architecture of a modern data platform. This is a “catch-all” example because we can create different versions of a reference architecture (from different perspectives) that will support analytics or A.I. initiatives.

One version can be a diagram showing how data flows through the platform: data is ingested from the source systems via an ingestion mechanism; it is validated, standardized, enriched or protected via a curation engine; the enriched data (or information) is stored in polyglot storage using the best-fit data model; and it is finally provisioned (as a data product) via an automated provisioning layer.

Another version of this reference architecture can be a diagram showing the foundational capabilities and components, like data governance, metadata management, master data management and data quality management, needed in a modern data platform. We then add capabilities and components, like a feature store or an ML sandbox, needed to enable, execute or implement the A.I. solution.
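The flow just described (ingest, curate, store, provision) can be sketched as a small Python pipeline. Everything here is a simplifying assumption for illustration: the function names, the customer fields, the masking rule and the use of a plain dictionary in place of polyglot storage are all hypothetical.

```python
def mask_email(email):
    """Protect: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def ingest(source_rows):
    """Ingestion: copy rows from a source system into the platform."""
    return [dict(row) for row in source_rows]


def curate(rows):
    """Curation: validate, standardize, enrich and protect each row."""
    curated = []
    for row in rows:
        if row.get("customer_id") is None:               # validate
            continue
        row["country"] = row["country"].strip().upper()  # standardize
        row["is_domestic"] = row["country"] == "PH"      # enrich
        row["email"] = mask_email(row["email"])          # protect
        curated.append(row)
    return curated


def store(storage, name, rows):
    """Storage: a dictionary stands in for the platform's storage layer."""
    storage[name] = rows


def provision(storage, name, columns):
    """Provisioning: expose only the agreed columns as a data product."""
    return [{column: row[column] for column in columns} for row in storage[name]]


source_rows = [
    {"customer_id": 1, "country": " ph ", "email": "ana@example.com"},
    {"customer_id": None, "country": "us", "email": "x@example.com"},
    {"customer_id": 2, "country": "sg", "email": "ben@example.com"},
]

storage = {}
store(storage, "customers", curate(ingest(source_rows)))
data_product = provision(storage, "customers", ["customer_id", "country", "is_domestic"])
print(data_product)
```

Keeping each stage as its own function mirrors the separate boxes in such a diagram, so a stage can be replaced (for example, a different curation rule) without touching the others.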

An example of this reference data architecture is shown below.

Figure 3. A reference architecture of a modern data platform. (Image is from author’s personal notes.)

A third example of a manifestation of a data architecture is the knowledge graph, which Stardog defines as the “representation of data that is enriched with real-world context, is based on the graph data structure, and has a flexible schema that allows for multiple definitions of the same data”.

Knowledge, which is a collection of information (which in turn is data imbued with meaning and context), is organized in a knowledge graph that can serve as a semantic layer. An A.I. solution or application can then use this semantic layer for training models, making predictions, optimizing processes and generating content.

For example, we can leverage a knowledge graph in retrieval-augmented generation (RAG) to enhance the responses of large language models (LLMs).

Figure 4. The knowledge graph as a semantic layer. (Image is from the article entitled ‘Implementing Knowledge Graphs in Enterprises - Some Tips and Trends’ by Andreas Blumauer.) [5]
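As a rough sketch of the RAG idea mentioned above, the Python snippet below retrieves facts from a toy knowledge graph and places them in a prompt. The triples, the function names and the substring-matching retrieval are assumptions for illustration only; a real solution would query a graph store and pass the prompt to an LLM, a step that is left out here.

```python
# Hypothetical knowledge graph stored as (subject, predicate, object) triples.
TRIPLES = [
    ("Star schema", "is a kind of", "dimensional data model"),
    ("Star schema", "is composed of", "fact and dimension tables"),
    ("Data vault", "is composed of", "hubs, links and satellites"),
    ("Knowledge graph", "is based on", "the graph data structure"),
]


def retrieve_facts(question, triples):
    """Retrieval: keep triples whose subject is mentioned in the question."""
    lowered = question.lower()
    return [triple for triple in triples if triple[0].lower() in lowered]


def build_prompt(question, triples):
    """Augmentation: put the retrieved facts in front of the question."""
    facts = retrieve_facts(question, triples)
    context = "\n".join(f"- {s} {p} {o}." for s, p, o in facts)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"


question = "What is a star schema made of?"
prompt = build_prompt(question, TRIPLES)
print(prompt)
```

The generation step would send this prompt to the model, so the answer is grounded in facts that come from the knowledge graph rather than only from what the model learned in training.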

Data models, reference architectures of a modern data platform and knowledge graphs are just three examples or manifestations of data architecture that can power and enable analytics and A.I. initiatives.

In conclusion

In a digital landscape dominated by data and artificial intelligence, having a robust and well-grounded data architecture is a necessity. It is the backbone that supports every data-driven and A.I.-led endeavor, from extracting insights to making informed decisions.

Data architecture’s role in shaping how data is collected, stored, structured, processed and provisioned directly impacts the success of analytics and A.I. initiatives.

Next time you see, use or experience an A.I. solution that impresses you, remember that behind the scenes, a well-designed data architecture is playing a vital role in making it all possible.

References (acknowledgments):

[1] Assessing Tetrahedral Lattice Parameters for Engineering Applications Through Finite Element Analysis: https://www.researchgate.net/publication/353729419_Assessing_Tetrahedral_Lattice_Parameters_for_Engineering_Applications_Through_Finite_Element_Analysis#pf2

[2] ‘Architecture’ in Wikipedia: https://en.wikipedia.org/wiki/Architecture

[3] ‘Song structure’ in Wikipedia: https://en.wikipedia.org/wiki/Song_structure

[4] 10 Screenplay Structures that Screenwriters Can Use: https://screencraft.org/blog/10-screenplay-structures-that-screenwriters-can-use

[5] Implementing Knowledge Graphs in Enterprises - Some Tips and Trends: https://www.linkedin.com/pulse/implementing-knowledge-graphs-enterprises-some-tips-blumauer/

Disclaimer: All views expressed in this story are my own and do not represent the opinions and viewpoints of any entity or organization with which I have been, am now, or will be affiliated.

This story has been published for information and illustrative purposes only and is not intended to serve as advice of any nature whatsoever. The information contained and the references made in this story are provided in good faith. Neither my employer nor any of its directors, agents or employees gives any warranty of accuracy (whether expressed or implied) or accepts any liability as a result of reliance upon the information, including (but not limited to) any content, advice, statement or opinion contained in this story.

This story also contains certain information available in the public domain, created and maintained by private and public organizations. I do not control or guarantee the accuracy, relevance, timeliness or completeness of such information. This story constitutes a view as of the date of publication and is subject to change.

This story makes only a descriptive reference to trademarks that may be owned by others. The use of such trademarks herein is not an assertion of ownership of such trademarks by me or my employer nor is there any claim made to these trademarks and is not intended to represent or imply the existence of an association between me and the lawful owners of such trademarks.


Written by Celman Elden D. Sudaria

A Data Architect with over 20 years of experience in Data Architecture, Data Management & Data Engineering. https://ph.linkedin.com/in/celmaneldendsudaria
