Do you have difficulty analysing CSV, JSON, Avro, Parquet or XML files in your company or research project? In this blog, I am going to explain how to analyse these types of files using standard SQL. Another point to note is that there is no size limit! Sounds good?
The following image shows a simple example of how you query a CSV file using standard SQL. Let’s dive deep into the architecture and the steps that need to be done.
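As a rough illustration of the idea of querying a CSV file with standard SQL, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in, since this excerpt does not name the engine used in the post, and an in-memory SQLite table does not reflect the "no size limit" property mentioned above. The sample data, the table name `data` and the helper `query_csv` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """city,amount
Berlin,10
London,25
Berlin,5
"""


def query_csv(csv_text, sql):
    """Load CSV text into an in-memory SQLite table named `data` and run `sql`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]

    conn = sqlite3.connect(":memory:")
    # Column names come from the CSV header; values are stored as text.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE data ({columns})")
    conn.executemany(f"INSERT INTO data VALUES ({placeholders})", records)

    result = conn.execute(sql).fetchall()
    conn.close()
    return result


totals = query_csv(
    SAMPLE_CSV,
    "SELECT city, SUM(CAST(amount AS INTEGER)) FROM data GROUP BY city ORDER BY city",
)
print(totals)
```

The `CAST` is needed because the sketch loads every CSV value as text; an engine with schema inference would not require it.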
Kafka is one of the most popular open-source data streaming tools and is used for multiple use cases. The main idea is to implement an event-driven architecture to process and distribute data in a scalable, real-time way. Since you are working with data, it is important to implement consistent data integration, and from this point of view, I will describe how to tackle unexpected situations in streaming projects, such as invalid data and exceptions.
In this section, I am going to explain one of the most popular architectural transformation approaches: moving from a monolith to an event-driven microservices architecture. Lots of companies are making this transformation, and I would like to elaborate on why the IT world is doing so and examine the pros and cons of the different approaches.
In general, a monolithic application consists of a single database with multiple application layers on top of it. These layers can be differentiated based on the architectural design, such as the user interface, business layer and data layer.
The AWS Certified Solutions Architect Professional exam is one of the most popular and challenging exams among the AWS certifications. In this section, I have listed some example questions along with their answers.
The accounting firm has an on-prem Oracle database. Basically, the database is used to store customer information and accounting movements. Due to audit rules, user information needs to be stored for 5 years. When the audit authority comes after 5 years, they want to run SQL queries on random customers in order to see whether customers are being tracked. Due to some financial issues…
Thymeleaf is a simple library that lets you render HTML templates in Java applications. In this blog, I am going to create a simple web form within a Java project in order to get some inputs via the web page. This blog will be kept very brief to show how simple the approach is.
Step 1-Create a simple Java project via Maven
Step 2-Add the following libraries to the Maven project; we are going to use Spring Boot Starter. The main libraries are spring-boot-starter-web and thymeleaf
Step 3-Create an Application class with the annotation @SpringBootApplication; this will be the starting point…
Near-real-time processing is one of the popular approaches and requires processing data within minutes. Once the data arrives at the storage layer, you need to process it within a couple of minutes.
There is no exact specification that defines near-real-time in terms of processing time. I used the following graph from O'Reilly as a reference to explain the different processing types. For near-real-time, you can see the processing time is roughly between 5 and 60 minutes.
The Kubernetes Application Developer exam is one of the more challenging assessments. It is different from other exams since you need to implement real configurations instead of selecting A, B, C or D.
In this section, I am going to give you some exam preparation questions and answers for CKAD (Certified Kubernetes Application Developer). In order to run them, you can use minikube on your local computer.
In the default namespace, which of the following commands helps you identify the top memory-consuming pod?
Once you run the command below…
CCDAK is one of the most popular exams for Apache Kafka. In this section, I have listed some example questions.
Kafka Connect can be run in these modes: (Select two options)
Kafka Connect can be run in standalone mode and distributed mode. Standalone mode is useful for developing and testing Kafka Connect on a local machine.
Distributed mode runs Connect workers on multiple machines (nodes).
Adding a field without a default value is a ….. compatibility
Adding a field without a default value is forward compatibility (or delete…
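To see why adding a field without a default value is forward compatible but not backward compatible, here is a toy sketch of schema resolution. It is not the Avro library or the Schema Registry API; the `read` helper, the `REQUIRED` marker and the field names are hypothetical, and a schema is modelled simply as a mapping from field name to default value.

```python
REQUIRED = object()  # marker meaning "this field has no default value"


def read(record, reader_schema):
    """Resolve `record` against `reader_schema` (field name -> default or REQUIRED)."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is not REQUIRED:
            resolved[field] = default  # fall back to the reader's default
        else:
            raise ValueError(f"missing field with no default: {field}")
    return resolved  # fields unknown to the reader are silently dropped


old_schema = {"id": REQUIRED}
new_schema = {"id": REQUIRED, "email": REQUIRED}  # adds a field without a default

old_record = {"id": 1}
new_record = {"id": 1, "email": "a@example.com"}

# Forward compatible: a consumer on the old schema can read new data.
forward_result = read(new_record, old_schema)

# Not backward compatible: a consumer on the new schema cannot read old data.
try:
    read(old_record, new_schema)
    backward_ok = True
except ValueError:
    backward_ok = False

print(forward_result, backward_ok)
```

Giving `email` a default in `new_schema` would make the second read succeed as well, which is why default values matter for compatibility.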
When you need to process any amount of data, there are different data processing approaches, such as batch, stream processing and micro-batch. Depending on your use case, you can apply these processing methods with the help of tools such as Spark and Hadoop.
Before explaining the 3 different processing methods, I would like to give some hints about the value of data processing. When you look at the following diagram, please pay attention to the interesting term: the diminishing value of data …
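The difference between the three processing approaches named above can be sketched in a few lines of plain Python. This is only a toy illustration with a hypothetical list of numeric events and a running sum as the "processing" step; it does not use Spark or Hadoop.

```python
from itertools import islice


def batch(events):
    """Batch: wait for the whole data set, then process it once."""
    return [sum(events)]


def stream(events):
    """Stream: emit an updated result as each event arrives."""
    total = 0
    for value in events:
        total += value
        yield total


def micro_batch(events, size):
    """Micro-batch: cut the stream into small chunks and process each chunk."""
    iterator = iter(events)
    while chunk := list(islice(iterator, size)):
        yield sum(chunk)


events = [1, 2, 3, 4, 5]  # hypothetical incoming values
batch_result = batch(events)
stream_result = list(stream(events))
micro_batch_result = list(micro_batch(events, 2))
print(batch_result, stream_result, micro_batch_result)
```

Batch produces one result at the end, stream produces one result per event, and micro-batch sits in between by producing one result per small group of events.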
In this section, I am going to quickly explain RTO and RPO, which are commonly used in a disaster recovery strategy. Before digging into this topic, I would recommend reading this blog.
RTO (Recovery Time Objective) concerns the time between the start of the disaster and the restoration of the system. Let’s suppose a disruption happens in your infrastructure. Depending on your disaster recovery strategy, your infrastructure will be restored some time later, and the business process will continue accordingly. The maximum length of this interval that the business can tolerate is called the Recovery Time Objective (RTO).
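The time interval described above is simple date arithmetic. The sketch below assumes hypothetical timestamps and a hypothetical 4-hour RTO target; the function names are illustrative only.

```python
from datetime import datetime, timedelta


def recovery_time(disaster_start, system_restored):
    """Time elapsed between the start of the disaster and the restore."""
    return system_restored - disaster_start


def meets_rto(disaster_start, system_restored, rto_target):
    """True if the system was restored within the agreed RTO target."""
    return recovery_time(disaster_start, system_restored) <= rto_target


# Hypothetical incident: outage at 09:00, service restored at 11:30.
disaster_start = datetime(2024, 1, 1, 9, 0)
system_restored = datetime(2024, 1, 1, 11, 30)
rto_target = timedelta(hours=4)  # assumed target, not from the post

elapsed = recovery_time(disaster_start, system_restored)
within_target = meets_rto(disaster_start, system_restored, rto_target)
print(elapsed)        # 2:30:00
print(within_target)  # True
```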
RPO (Recovery Point Objective) is the recovery…