Data Engineer Interview Series: Data Modeling - Part 2

Data Engineering Life Cycle

Anju Mercian


From the book Fundamentals of Data Engineering by Joe Reis and Matt Housley

Flow of the article

The goal of this article is to provide a cheatsheet for those preparing for a data engineering interview.

Caveat: this is a cheatsheet I wrote when I was preparing for my own interviews. If you would like me to add or change anything, please let me know in the comments.

In this article, I will go over the data modeling definitions you need to know when preparing for a data engineering interview. It is the second part of the series and follows the first article (link). I will cover the theory behind data modeling concepts and end with a data modeling question; the solution and my thought process will follow in the third article.

What is a data model?

A data model represents the way data relates to the real world. It is an abstraction that organizes elements of data and describes how they relate to each other. A data model defines how the data must be structured and standardized to best reflect an organization’s processes, workflows and logic.

  • A good data model captures how communication and work-streams naturally flow within an organization.
  • In contrast, a poor data model is confusing, incoherent and haphazard.
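
To make the idea of data elements and their relationships a bit more concrete, here is a minimal Python sketch of a tiny data model. The retail domain and every name in it (Customer, Order, OrderLine, total_cents) are hypothetical and chosen only for illustration; they do not come from the book.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderLine:
    # One product on an order (hypothetical entity).
    product_name: str
    quantity: int
    unit_price_cents: int

    def total_cents(self) -> int:
        return self.quantity * self.unit_price_cents


@dataclass
class Order:
    # An order is made up of one or more order lines.
    order_id: int
    lines: List[OrderLine] = field(default_factory=list)

    def total_cents(self) -> int:
        return sum(line.total_cents() for line in self.lines)


@dataclass
class Customer:
    # A customer can place many orders (one-to-many relationship).
    customer_id: int
    name: str
    orders: List[Order] = field(default_factory=list)


# Usage: one customer with one order containing two lines.
alice = Customer(customer_id=1, name="Alice")
order = Order(order_id=100, lines=[OrderLine("notebook", 2, 350),
                                   OrderLine("pen", 1, 120)])
alice.orders.append(order)
print(order.total_cents())  # 820
```

The point of the sketch is only that the model names the entities and makes their relationships explicit. A real model would be driven by how the organization actually works.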

Data modeling is the process of creating data models for an information system. It organizes data in a database system so that the data is persisted and easily usable, and so that it supports business and user applications.

Data modeling is about how to structure data so that different people within an organization can use it. It is the process of designing data to make it usable for downstream users such as machine learning engineers, data scientists and the BI team.

It also ensures that business logic and rules are translated into the data layer.
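
As one way to picture business rules living in the data layer, here is a small sketch using Python's built-in sqlite3 module. The tables, columns and rules (every order belongs to an existing customer, and order amounts must be positive) are assumptions made up for this example, and the helper names build_schema and try_insert are hypothetical.

```python
import sqlite3


def build_schema(conn):
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            email       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE customer_order (
            order_id     INTEGER PRIMARY KEY,
            -- Rule: every order must belong to an existing customer.
            customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
            -- Rule: an order amount must be positive.
            amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
        );
    """)


def try_insert(conn, sql, params):
    # Return True if the row was accepted, False if a rule rejected it.
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
build_schema(conn)
new_order = "INSERT INTO customer_order (customer_id, amount_cents) VALUES (?, ?)"
try_insert(conn, "INSERT INTO customer (customer_id, email) VALUES (?, ?)",
           (1, "a@example.com"))
print(try_insert(conn, new_order, (1, 500)))   # valid order
print(try_insert(conn, new_order, (1, -5)))    # rejected: non-positive amount
print(try_insert(conn, new_order, (99, 500)))  # rejected: unknown customer
```

Because the rules are declared in the schema, every application that writes to these tables is held to them, instead of each application re-implementing the checks.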

The data modeling discussion in this article focuses on batch data processing. Stream data processing uses different techniques, which I will cover in future articles.

Why data modeling is important

  • Data organization is…

