Thymeleaf is a simple library that lets you render HTML templates in Java applications. In this blog, I am going to create a simple web form within a Java project in order to collect some inputs via the web page. This blog is kept brief to show how simple the approach is.

Steps to follow

Step 1 - Create a simple Java project with Maven

Step 2 - Add the following libraries to the Maven project. We are going to use Spring Boot Starter; the main libraries are spring-boot-starter-web and thymeleaf

Step 3 - Create an Application class with the @SpringBootApplication annotation. This will be the starting point…

Near-real-time processing is a popular approach for workloads that need data processed within minutes. Once the data arrives at the storage layer, you need to process it within a couple of minutes.

There is no exact specification that defines near-real time in terms of processing time. I used the following graph from O'Reilly as a reference; it gives an idea of the different processing types. For near-real time, you can see that the processing time is roughly between 5 and 60 minutes.

For me, the most important thing is to focus on how much value that you…

The Kubernetes Application Developer exam is a challenging assessment. It is different from other exams, since you need to implement real configurations instead of selecting A, B, C or D.

In this section, I am going to give you some exam preparation questions and answers for the CKAD (Certified Kubernetes Application Developer) exam. In order to run them, you can use minikube on your local computer.

Question 1

In the default namespace, which of the following commands helps you identify the top memory-consuming pod?

  • kubectl top pod
  • kubectl exec pod
  • kubectl logs pod
  • kubectl get pods -o wide

Answer 1

Once you run the command below…

Do you have any difficulty analysing CSV, JSON, Avro, Parquet or XML files in your company or research project? In this blog, I am going to explain how to analyse these types of files using standard SQL. Another point to note is that there is no size limit! Sounds good?

The following image shows a simple example of how to query a CSV file using standard SQL. Let’s dive deep into the architecture and the steps that need to be done

CCDAK Confluent Certified Developer for Apache Kafka

CCDAK is one of the most popular exams for Apache Kafka. In this section, I have listed some example questions

Question 1

Kafka Connect can be run in these modes (select two options):

  • Distributed Mode
  • Vertical mode
  • Batch mode
  • Standalone mode

Answer 1

Kafka Connect can be run in standalone mode and distributed mode. Standalone mode is useful for development and testing Kafka Connect on a local machine.

Distributed mode runs Connect workers on multiple machines (nodes).

Question 2

Adding a field without a default value is a ….. compatible change

  • Backward
  • Forward
  • Full
  • None

Answer 2

Adding a field without a default value is a forward-compatible change (or delete…
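To make the forward-compatibility claim concrete, here is a minimal sketch in plain Python. It is a simplified model of schema resolution, not the Avro library or the Schema Registry API; the schemas, the `REQUIRED` marker and the `read` helper are all hypothetical names chosen for illustration.

```python
from typing import Any, Dict

# Marker for "this field has no default value" (hypothetical, for illustration).
REQUIRED = object()

# Hypothetical schemas: field name -> default value.
OLD_SCHEMA: Dict[str, Any] = {"id": REQUIRED, "name": REQUIRED}
# The new schema adds "email" WITHOUT a default value.
NEW_SCHEMA: Dict[str, Any] = {"id": REQUIRED, "name": REQUIRED, "email": REQUIRED}


def read(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record onto the reader's schema, roughly as a schema-aware reader would."""
    result = {}
    for field, default in reader_schema.items():
        if field in record:
            result[field] = record[field]      # field present in the data
        elif default is not REQUIRED:
            result[field] = default            # fall back to the reader's default
        else:
            raise ValueError(f"missing field without default: {field}")
    return result  # extra fields in the record are simply ignored


new_record = {"id": 1, "name": "Ada", "email": "ada@example.com"}
old_record = {"id": 2, "name": "Bob"}

# Forward compatible: a reader on the OLD schema can read data written with the NEW one.
forward_result = read(new_record, OLD_SCHEMA)
print(forward_result)  # {'id': 1, 'name': 'Ada'}

# Not backward compatible: a reader on the NEW schema cannot read OLD data,
# because "email" is missing and has no default to fall back on.
try:
    read(old_record, NEW_SCHEMA)
    backward_ok = True
except ValueError:
    backward_ok = False
print(backward_ok)  # False
```

In this toy model the old reader ignores the extra field, while the new reader fails on old data because it has nothing to put in the added field.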

When you need to process any amount of data, there are different data processing approaches, such as batch, stream processing and micro-batch. Depending on your use case, you can apply these processing methods with the help of libraries such as Spark, Hadoop, etc.

Before explaining the 3 different processing methods, I would like to give some hints about the value of data processing. When you look at the following diagram, please pay attention to the interesting term: the diminishing value of data …
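As a rough illustration of how the three approaches differ, the sketch below computes totals over the same small, hypothetical event stream in each style. It uses plain Python rather than Spark or Hadoop, and it groups micro-batches by count for simplicity, whereas real engines typically group by a time interval. The `micro_batches` helper is an assumption made for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group an incoming stream of events into small fixed-size batches."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


events = range(1, 8)  # hypothetical event stream: 1..7

# Batch: process everything at once, after all the data has arrived.
batch_total = sum(events)

# Stream: process each event as soon as it arrives (running total).
stream_totals = []
running = 0
for event in events:
    running += event
    stream_totals.append(running)

# Micro-batch: process small groups of events, one group at a time.
micro_batch_totals = [sum(batch) for batch in micro_batches(events, batch_size=3)]

print(batch_total)         # 28
print(stream_totals)       # [1, 3, 6, 10, 15, 21, 28]
print(micro_batch_totals)  # [6, 15, 7]
```

The trade-off is latency versus overhead: the batch result is available only at the end, the stream result is updated on every event, and the micro-batch result sits in between.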

In this section, I'm going to quickly explain RTO and RPO, which are commonly used in disaster recovery strategies. Before digging into this topic, I would recommend reading this blog

Disaster Recovery Strategy in AWS

RTO (Recovery Time Objective) is the time difference between the start of the disaster and the restoration of the system. Let’s suppose a disruption happens in your infrastructure. Depending on your disaster recovery strategy, your infrastructure will be restored some time later, and the business process will continue accordingly. This time interval is called the Recovery Time Objective (RTO).
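The time difference described above can be sketched in a few lines of Python. The timestamps and the 4-hour target are made-up values for illustration, and `recovery_time` is a hypothetical helper, not part of any AWS tooling.

```python
from datetime import datetime, timedelta


def recovery_time(disaster_start: datetime, system_restored: datetime) -> timedelta:
    """Return the time between the start of the disaster and system restoration."""
    if system_restored < disaster_start:
        raise ValueError("system_restored must not be earlier than disaster_start")
    return system_restored - disaster_start


# Hypothetical incident timeline.
disaster_start = datetime(2024, 1, 10, 9, 0)
system_restored = datetime(2024, 1, 10, 11, 30)

# Hypothetical target agreed with the business.
rto_target = timedelta(hours=4)

actual = recovery_time(disaster_start, system_restored)
print(actual)                # 2:30:00
print(actual <= rto_target)  # True
```

Comparing the measured recovery time against the agreed target is one simple way to check whether a disaster recovery drill met its objective.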

RPO (Recovery Point Objective) is the recovery…

In this section, I'm going to explain how to make a disaster recovery plan in your company

Most companies are moving their IT infrastructure to the cloud in order to gain advantages in terms of cost, scalability and flexibility. When you move your IT infrastructure to the cloud, there are different things that need to be considered. One of the most important is a disaster recovery strategy. The point to note is that if any disaster happens, you need to make sure the business process keeps going.

These are the basic disaster recovery strategies you need to consider:

In this section, I’m going to explain how to retrieve data from S3 into your PySpark application. Let’s go step by step

Opening EMR cluster

First, you need to open an EMR cluster on AWS. These steps are very simple, since you are using the benefits of the cloud.

Open the EMR service from the AWS console and create your cluster. The only thing you need to do is select/download an EMR key pair. Otherwise, you won't be able to connect to the EMR server
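Once the cluster is up, reading from S3 in PySpark could look roughly like the sketch below. It assumes a CSV file with a header row and the `s3://` scheme as available on EMR; the application name and the `build_s3_uri` and `read_csv_from_s3` helpers are hypothetical names introduced for this example, and the bucket and key are passed on the command line.

```python
import sys


def build_s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def read_csv_from_s3(spark, uri: str):
    """Read a CSV file with a header row into a Spark DataFrame."""
    return spark.read.csv(uri, header=True, inferSchema=True)


# Runs only when a bucket and a key are given as arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    # Imported here so the helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-read-example").getOrCreate()
    df = read_csv_from_s3(spark, build_s3_uri(sys.argv[1], sys.argv[2]))
    df.show(5)  # print the first five rows
    spark.stop()
```

On the cluster, a script like this would typically be submitted with `spark-submit`, followed by the script name, your bucket and the object key.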

Nowadays, the analytics department is becoming a more critical part of the company. Lots of companies are becoming data-driven. In this approach, data quality is a keystone of the flow



Data & Cloud Architect and Trainer.
