Improving Data Quality in the Analytics Department with Test Driven Development
Nowadays, the analytics department is becoming a more critical part of the company. Many companies are becoming data-driven. In this approach, data quality is a keystone of the flow.
Motivation
According to the DIKW framework, data is collected from different sources, such as consumer data and order data. The data is then processed with different tools to create information. Data visualization, for example, belongs to the information layer. The last step is to understand and evaluate the information, then take action. For example, managers notice a drop in sales on a dashboard, and then they can take action.
The point to note is this: if you give incorrect information, what would the action be? This is where I would like to draw your attention to data quality.
What is Test Driven Development?
According to Test Driven Development, every process should start with writing test cases. This step requires well-defined business cases to write the tests against. For example, if you integrate a new data source into your analytical data warehouse, you need to understand the data validation steps.
Let’s take a look at two approaches.
Case A
Bizz : Username and total sales data should be integrated into the data warehouse
Dev : Start development
Case B (TDD)
Bizz : Username and total sales data should be integrated into the data warehouse
Dev : According to TDD, I should start with test cases. What is the use case? What are the data checks for username and total sales data?
Bizz : Of course. All the data should …..
According to TDD, you write the test case first. This is the first step of test driven development. Then, as the second step, you write just enough code to make the tests pass. Once the tests pass, the third step is to improve the code. That is all.
Let’s understand the flow step by step with simple code snippets.
Bizz : As a business owner, I need to see username data in the data warehouse. This field should be between 5 and 10 characters long
Start with the test case
This test case will fail because there is no UsernameValidator class yet. To make the test pass, you need to implement the isValid method
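One possible implementation, as a sketch in Python. The class and method names come from the text above. The constants MIN_LENGTH and MAX_LENGTH, the inclusive bounds, and the handling of non-text values are assumptions made for illustration.

```python
class UsernameValidator:
    """Checks the business rule: username length between 5 and 10 characters."""

    MIN_LENGTH = 5   # assumed to be inclusive
    MAX_LENGTH = 10  # assumed to be inclusive

    def isValid(self, username):
        # Assumption: missing or non-text values count as invalid
        # instead of raising an error.
        if not isinstance(username, str):
            return False
        return self.MIN_LENGTH <= len(username) <= self.MAX_LENGTH
```

The method name isValid keeps the camelCase spelling used in the text. In a Python code base you would more likely call it is_valid.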
Run the test case again
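To see the whole flow in one place, the sketch below puts a compact validator and its tests in a single Python file and runs them with the standard unittest runner. The helper run_tests, the sample usernames, and the inclusive bounds are assumptions made for illustration.

```python
import unittest


class UsernameValidator:
    """Username length must be between 5 and 10 characters (assumed inclusive)."""

    def isValid(self, username):
        # Non-text values are treated as invalid (assumption).
        return isinstance(username, str) and 5 <= len(username) <= 10


class UsernameValidatorTest(unittest.TestCase):
    def test_accepts_length_within_range(self):
        self.assertTrue(UsernameValidator().isValid("alice"))       # 5 characters
        self.assertTrue(UsernameValidator().isValid("alice12345"))  # 10 characters

    def test_rejects_too_short(self):
        self.assertFalse(UsernameValidator().isValid("bob"))  # 3 characters

    def test_rejects_too_long(self):
        self.assertFalse(UsernameValidator().isValid("alexandria1"))  # 11 characters


def run_tests():
    """Load the test case above, run it, and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UsernameValidatorTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

The returned result object exposes wasSuccessful(), which is convenient if you want a data pipeline step to stop when a validation test fails.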
You should now see that the tests pass.
Final Words
Data is very important for creating business value. If you provide incorrect data, your company could make wrong decisions because of the dirty data. This is where Test Driven Development comes into play.