Updated: Nov 10
Maybe it is just me, but do any of you go about your day looking at things and saying, “I wonder what the data structures behind that look like”? I do… I remember being at a large theme park that had kiosks to reserve ride times. As I was reserving my time, I started solutioning the data design in my head. I know you're smiling deep inside because you do the same thing. Well, at least I hope I am not the only one who does this. I am always looking for ways to make technical things make sense to non-technical people, and those who have worked with me long enough know that I often make my analogies about food, as I have done in my last few posts, or about building and constructing homes. It's probably because I enjoy food and, at the start of my career, I wanted to be an architect. I didn’t become a building architect, but I did become a Data Architect. Construction and building things are still somewhat of an interest to me.
So, Keith, what is your point? Well, today I had to go to the grocery store, and as I stood in the checkout line watching the items being checked out by the person in front of me, in my head I was going Beep, Beep, Beep… (insert sales order line, insert sales order line…). Then I noticed in the magazine rack, right next to a celebrity gossip magazine, a magazine with home plans. And it happened again. My brain went into Data Vault Geek mode.
One of the three pillars of Data Vault 2.0 is the architecture, and just as home plans in a magazine illustrate the layout of a home, Data Vault 2.0 gives you the layout for your Data Vault implementation. Home plans (blueprints) do not give you the exact details of the specific components you will need to build that home. Those come out of discussions with your architect and contractor, such as: we want wood floors over here and tiles over there, oil-rubbed bronze light fixtures here and cabinets of a certain brand over there. You make those component selections to fit your needs and budget within the laid-out floor plan. The same holds true for the Data Vault 2.0 floor plan: it will not tell you what data repository, ELT solution or ingestion technology to leverage. Those selections all need to be made by your organization, and they depend on the type of data you have, your technical expertise, your budget, and other variables that only your organization understands.
What is extremely critical for a successful Data Vault 2.0 implementation is that you stay within the defined Data Vault flow and architecture. Just as a blueprint for a home has a flow and arrangement of rooms, the Data Vault 2.0 architecture has the same concept. Let us now take a brief walk-through of the Data Vault architectural floor plan. I think you will find the Data Vault architecture to be very flexible and scalable enough to meet your organizational needs.
This Data Vault layout is made of three primary areas and is the perfect layout for any size organization.
Staging / Landing Area
When we first enter the Data Vault, we find ourselves in the Staging / Landing layer. This is a very welcoming location and a great arrival point for your incoming data. In the Staging / Landing layer, we are ready to accommodate data that arrives in large batches as well as individual smaller sets of data that stream in all day. This area of the architecture accepts data in all formats: structured, semi-structured and even unstructured. In the Staging / Landing area we do not ask our incoming data to change or re-format itself. Some organizations only allow their data to stay in this area for a relatively short period of time, while others choose a more persistent style of Staging / Landing layer. I am not going to go into the pros and cons of either here; maybe that is a topic for another day. The only thing added to the data coming into this area is a few important pieces of metadata. This additional metadata helps your data move along to the next layer in your floor plan.
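To make the "add metadata, change nothing else" idea a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the field names load_dts and record_source are common Data Vault conventions rather than anything prescribed above, and stage_record and the "pos.checkout" source name are hypothetical.

```python
from datetime import datetime, timezone


def stage_record(payload, record_source, load_dts=None):
    """Wrap an incoming record with load metadata.

    The payload is stored exactly as it arrived; nothing in it is
    renamed, cleaned, or re-formatted. The metadata field names
    (record_source, load_dts) are assumptions for this sketch.
    """
    return {
        "payload": payload,                # untouched source data
        "record_source": record_source,    # where the data came from
        "load_dts": load_dts or datetime.now(timezone.utc),  # when it landed
    }


# A structured record from a (hypothetical) checkout system...
order = {"order_id": "1001", "item": "magazine", "qty": 1}
staged_order = stage_record(order, record_source="pos.checkout")

# ...and an unstructured one is staged the same way.
staged_note = stage_record("customer asked for paper bags", record_source="pos.notes")
```

Whether the payload is a dictionary, a JSON document or a blob of text, the staging step treats it the same way, which is the point of this layer.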
Data Warehouse Layer
The second layer of our three-layer layout is the Data Warehouse Layer. From my personal perspective, this is probably one of the most important areas of the Data Vault 2.0 architecture. The purpose of this area is to contain all of your organization's historical data and to be fully auditable back to its originating source. The data in this area is organized by following the Data Vault 2.0 modeling techniques and is handled in an immutable fashion, meaning data is never updated or deleted and, even more importantly, it is never transformed with any type of soft business rule. A more popular name for this area is the Raw Data Vault layer, and you will tend to hear this term used more often. Some organizations may also choose to add an extension to the Data Warehouse Layer. That optional addition is called the Business Vault. The rule in loading the Raw Data Vault is that no soft business rules are applied, so some organizations with overly complex business rules may choose to create the Business Vault to assist the development of the Information Delivery Layer.
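Here is a small Python sketch of what "never updated or deleted" can look like in practice. Please treat it as an illustration only: the InsertOnlyTable class, the MD5-based hash_key helper, the column names and the "crm" source are my assumptions for the example, and a real Data Vault load would typically also check whether the attributes actually changed before inserting a new row.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_keys):
    """Hash the normalized business key(s). The scheme is an assumption."""
    joined = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


class InsertOnlyTable:
    """A table that only supports appends: there is no update or delete."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(dict(row))

    def rows(self):
        # Return copies so callers cannot alter stored history.
        return [dict(r) for r in self._rows]


def load_satellite(table, business_key, attributes, record_source, load_dts=None):
    """Record the attributes as a new row; earlier rows are left alone."""
    table.insert({
        "hub_hash_key": hash_key(business_key),
        "load_dts": load_dts or datetime.now(timezone.utc),
        "record_source": record_source,
        **attributes,  # raw values, no soft business rules applied
    })


sat_customer = InsertOnlyTable()
load_satellite(sat_customer, "C-42", {"city": "Austin"}, "crm")
load_satellite(sat_customer, "C-42", {"city": "Denver"}, "crm")  # customer moved

history = sat_customer.rows()  # both versions are still there
```

The customer's move does not overwrite the old city. It simply adds a second row, so the full history stays auditable back to its source.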
Information Mart / Delivery Layer
The third and last layer of the Data Vault floor plan is the Information Mart Layer. The two other areas are not typically accessed by end users (though I think that in more recent years, seasoned data scientists have leveraged the data within the Raw Vault layer). The Information Mart Layer, by contrast, is designed to provide the data in the way the business user feels most comfortable with. Ultimately, the overall goal of the Information Mart Layer is to provide valuable information to the business so that it can derive important and critical insight. The structures in this area can often be virtual structures over the Raw and/or Business Vault structures. They can be in Third Normal Form, dimensional, or even flat denormalized structures, at low or aggregated grains. This is also the area where business rules can and should be applied, and where some BI and analytics tools can be given the ability to write back. And because all the raw, low-grain data resides in the Raw Data Vault, if a business rule needs to change, a new structure can easily be generated and old structures deprecated.
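As a rough illustration of a virtual structure with a business rule on top of raw data, here is a short Python sketch. Everything in it is hypothetical: the order lines, the function names and the discount rule are made up for the example, and amounts are kept in whole cents to keep the arithmetic simple.

```python
# Raw, low-grain data as it would sit in the Raw Data Vault (made-up values).
raw_order_lines = [
    {"order_id": "1001", "item": "magazine", "qty": 1, "unit_price_cents": 500},
    {"order_id": "1001", "item": "milk", "qty": 2, "unit_price_cents": 300},
    {"order_id": "1002", "item": "bread", "qty": 1, "unit_price_cents": 400},
]


def order_totals_v1(rows):
    """A 'virtual' mart: order totals computed on demand from the raw rows."""
    totals = {}
    for row in rows:
        line_total = row["qty"] * row["unit_price_cents"]
        totals[row["order_id"]] = totals.get(row["order_id"], 0) + line_total
    return totals


def order_totals_v2(rows, threshold_cents=1000, discount_pct=10):
    """A revised (hypothetical) business rule: larger orders get a discount.

    It is built as a new structure next to v1, so v1 can be deprecated
    later. The raw rows are read, never modified.
    """
    return {
        order_id: total * (100 - discount_pct) // 100
        if total >= threshold_cents else total
        for order_id, total in order_totals_v1(rows).items()
    }


totals_old_rule = order_totals_v1(raw_order_lines)
totals_new_rule = order_totals_v2(raw_order_lines)
```

Both versions read from the same untouched raw rows, which is why swapping one business rule for another does not require reloading any data.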
These three areas in combination make for a very solid and time-tested floor plan and architecture for your Data Vault implementation. Though many technologies, data formats and data volumes have changed over the years, this architecture is still viable and holds true. The last point I want to make is that if you want an effective and strong Data Vault, you must adhere to these three layers, as they are mandatory constructs. If I were to put my construction hat back on, I would say, “It is against code to skip any of these layers.” So, work closely with your Architecture and IT teams to be sure your Data Vault floor plan follows these three layers while you are picking your complementary vendor solutions.