Improving Data Quality in the Analytics Department with Test-Driven Development

Nowadays, the analytics department is becoming an increasingly critical part of the company. Many companies are turning into data-driven organizations. In this approach, data quality is the keystone of the whole flow.

Motivation

According to the DIKW framework, data is collected from different sources, such as consumer data and order data. The data is then processed with different tools to create information. For example, data visualization is part of the information layer. The last step is to understand and evaluate the information and then take action. For example, managers notice a decrease in sales on a dashboard and can then act on it.

The point to note is this: if you provide incorrect information, what action will follow? This is where I would like to draw your attention to data quality.

What is Test-Driven Development?

In Test-Driven Development (TDD), the whole process starts with writing test cases. This step requires well-defined business cases to test against. For example, if you integrate a new data source into your analytical data warehouse, you first need to understand the data validation steps.

Let’s take a look at two approaches.

Case A

Bizz: Username and total sales data should be integrated into the data warehouse.

Dev: Start development.

Case B (TDD)

Bizz: Username and total sales data should be integrated into the data warehouse.

Dev: According to TDD, I should start with test cases. What is the use case? What are the data checks for username and total sales data?

Bizz: Of course. All the data should …..

According to TDD, you should write the test case first. This is the first step of test-driven development. The second step is to develop your code so that it passes the tests. Once the tests pass, you can improve the code as a third step. That is all.

Let’s understand the flow step by step with simple code snippets.

Bizz: As a business owner, I need to see username data in the data warehouse. This field should be between 5 and 10 characters long.

Start with the test case

This test case will fail because there is no UsernameValidator class yet. To make the test pass, you need to implement the isValid method.
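
One way the validator might be implemented, again as a Python sketch rather than a definitive version. The method is named `is_valid` as the Python-style spelling of `isValid`; the constants `MIN_LENGTH` and `MAX_LENGTH` are illustrative names, and both the inclusive bounds and the rejection of non-string values are assumptions that the business requirement does not spell out.

```python
class UsernameValidator:
    """Checks that a username is between 5 and 10 characters long."""

    MIN_LENGTH = 5   # assumption: lower bound is inclusive
    MAX_LENGTH = 10  # assumption: upper bound is inclusive

    def is_valid(self, username):
        # Assumption: missing or non-string values count as invalid.
        if not isinstance(username, str):
            return False
        return self.MIN_LENGTH <= len(username) <= self.MAX_LENGTH


if __name__ == "__main__":
    validator = UsernameValidator()
    for candidate in ["bob", "alice", "alice12345", "alice123456", None]:
        print(repr(candidate), validator.is_valid(candidate))
```

Open questions like the inclusive bounds are exactly what the Dev should clarify with Bizz before writing the tests.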

Run the test case again
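
A self-contained sketch of this second run, in Python with `unittest`. It bundles a hypothetical `UsernameValidator` (with `is_valid` as the Python-style spelling of `isValid`, and inclusive 5 to 10 character bounds as an assumption) together with its tests in one file, so the run can be reproduced as is. The helper name `run_tests` is also illustrative.

```python
import unittest


class UsernameValidator:
    """Hypothetical validator: usernames must be 5 to 10 characters long."""

    def is_valid(self, username):
        # Assumption: bounds are inclusive and non-strings are invalid.
        return isinstance(username, str) and 5 <= len(username) <= 10


class UsernameValidatorTest(unittest.TestCase):
    def setUp(self):
        self.validator = UsernameValidator()

    def test_accepts_usernames_of_5_to_10_characters(self):
        self.assertTrue(self.validator.is_valid("alice"))       # 5 characters
        self.assertTrue(self.validator.is_valid("alice12345"))  # 10 characters

    def test_rejects_too_short_username(self):
        self.assertFalse(self.validator.is_valid("bob"))  # 3 characters

    def test_rejects_too_long_username(self):
        self.assertFalse(self.validator.is_valid("alice123456"))  # 11 characters


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UsernameValidatorTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    result = run_tests()
    print("All tests passed:", result.wasSuccessful())
```

In a real project the tests would usually live in their own file and be run with `python -m unittest`; everything is kept in one file here only to make the sketch easy to try.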

You should now see that the tests pass.

Final Words

Data is very important for creating business value. If you provide incorrect data, your company could make wrong decisions because of dirty data. This is where Test-Driven Development comes into play.

Data & Cloud Architect and Trainer.