Creating a simple data pipeline with AWS Step Functions

Serkan SAKINMAZ
Nov 27, 2022


A data pipeline is the skeleton of your application when you run multiple steps to create the final output. It consists of data collection, data processing, data transformation and output creation.
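To make the four stages concrete, here is a minimal in-process sketch in Python. The stage functions and the sample records are hypothetical. In a Step Functions pipeline each stage would typically be its own state (for example, a Lambda task) rather than a local function call.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def collect() -> List[Record]:
    # Data collection: hypothetical in-memory readings stand in for a real source (S3, an API, ...)
    return [
        {"sensor": "a", "value": 10.0},
        {"sensor": "b", "value": -1.0},  # invalid reading
        {"sensor": "a", "value": 14.0},
    ]


def process(records: List[Record]) -> List[Record]:
    # Data processing: drop readings that fail a simple validity check
    return [r for r in records if r["value"] >= 0]


def transform(records: List[Record]) -> Dict[str, float]:
    # Data transformation: average value per sensor
    grouped: Dict[str, List[float]] = {}
    for r in records:
        grouped.setdefault(r["sensor"], []).append(r["value"])
    return {sensor: sum(vals) / len(vals) for sensor, vals in grouped.items()}


def create_output(averages: Dict[str, float]) -> str:
    # Output creation: one line per sensor
    return "\n".join(f"{s}: {avg:.1f}" for s, avg in sorted(averages.items()))


def run_pipeline() -> str:
    # Each stage consumes the previous stage's output, like states linked by "Next"
    return create_output(transform(process(collect())))


print(run_pipeline())  # prints "a: 12.0"
```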

The following figure shows a sample pipeline.

Source: Hazelcast

There are multiple tools for building data pipelines, such as Apache Oozie and Apache Airflow. AWS also offers a serverless option for this, called AWS Step Functions.

In this blog, we are going to create a sample pipeline in AWS using Step Functions. AWS Step Functions is a workflow service that runs different AWS services as the steps of a workflow.

Features of Step Functions

  • Visualisation of the steps
  • Monitoring
  • Serverless
  • Scheduled execution of the workflow
  • Built-in integrations with AWS services such as Lambda, IAM, etc.
  • Error handling
  • Parallel states
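As an illustration of the error-handling feature: in the Amazon States Language, a Task state can declare a Retry policy with fields such as ErrorEquals, IntervalSeconds, MaxAttempts and BackoffRate. The Python sketch below writes such a policy as a dict (the values are arbitrary examples) and computes the wait before each retry, assuming each interval is the previous one multiplied by BackoffRate.

```python
import json
from typing import Any, Dict, List

# Example Retry policy for a Task state (values chosen only for illustration)
retry_policy: Dict[str, Any] = {
    "ErrorEquals": ["States.TaskFailed"],
    "IntervalSeconds": 2,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}


def retry_delays(policy: Dict[str, Any]) -> List[float]:
    # Defaults used when a field is omitted: IntervalSeconds=1, MaxAttempts=3, BackoffRate=2.0
    interval = float(policy.get("IntervalSeconds", 1))
    rate = float(policy.get("BackoffRate", 2.0))
    attempts = int(policy.get("MaxAttempts", 3))
    # Wait before retry n (1-based) is IntervalSeconds * BackoffRate ** (n - 1);
    # this sketch ignores any optional delay caps or jitter
    return [interval * rate ** n for n in range(attempts)]


print(json.dumps({"Retry": [retry_policy]}, indent=2))
print(retry_delays(retry_policy))  # [2.0, 4.0, 8.0]
```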

Important fields in the workflow definition

StartAt → The name of the state where the execution starts

State → A state represents a single unit of work in the state machine, and all states are defined under the "States" field. For example, downloading a file from S3 can be one state in the workflow

Type → The kind of state. The most common is "Type": "Task", which represents a single unit of work; other types include Pass, Choice, Wait, Parallel, Map, Succeed and Fail

Next → The name of the state to run after the current state finishes

Comment → Description of the state
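To see how these fields fit together, here is a minimal two-state definition written as a Python dict, plus a small helper that follows StartAt and Next to list the execution order. The state names are hypothetical, both states are Pass placeholders, and the helper only handles linear flows (no Choice or Parallel).

```python
import json
from typing import Any, Dict, List

# A minimal two-state definition; the state names are hypothetical
definition: Dict[str, Any] = {
    "Comment": "Download a file, then process it",
    "StartAt": "DownloadFile",
    "States": {
        "DownloadFile": {
            "Type": "Pass",  # placeholder; a real pipeline would use a Task here
            "Comment": "Stand-in for an S3 download step",
            "Next": "ProcessFile",
        },
        "ProcessFile": {
            "Type": "Pass",
            "Comment": "Stand-in for a processing step",
            "End": True,
        },
    },
}


def execution_order(defn: Dict[str, Any]) -> List[str]:
    # Follow StartAt and each state's Next until a state marked "End": true
    order: List[str] = []
    name = defn["StartAt"]
    while True:
        order.append(name)
        state = defn["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


print(json.dumps(definition, indent=2))  # JSON you could paste into the console
print(execution_order(definition))  # ['DownloadFile', 'ProcessFile']
```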

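InputPath → Selects, with a path expression, which part of the state's input is passed to the state. For example:

"InputPath": "$.input"

selects the "input" object from the following input:

{ "input": { "val1": "a", "val2": "b", "val3": "c" } }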
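Step Functions evaluates InputPath as a JSONPath expression against the state's raw input. The helper below (`apply_input_path` is a hypothetical name) mimics that selection locally; it is only a sketch and understands simple dotted paths, not full JSONPath.

```python
from typing import Any, Dict


def apply_input_path(state_input: Dict[str, Any], input_path: str) -> Any:
    # "$" means the whole input; "$.a.b" walks nested keys (no arrays or filters here)
    if input_path == "$":
        return state_input
    if not input_path.startswith("$."):
        raise ValueError(f"unsupported path: {input_path}")
    value: Any = state_input
    for key in input_path[2:].split("."):
        value = value[key]
    return value


# Hypothetical raw input: "other" is dropped by the selection
raw_input = {"input": {"val1": "a", "val2": "b", "val3": "c"}, "other": 42}
print(apply_input_path(raw_input, "$.input"))  # {'val1': 'a', 'val2': 'b', 'val3': 'c'}
```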

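Parameters → Builds the input that is passed to the state. Keys ending in ".$" take their value from the state's input through a path (here $.car.model and $.car.year), while other keys such as "owner" are static values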

"Parameters": {
  "CarDetails": {
    "model.$": "$.car.model",
    "year.$": "$.car.year",
    "owner": "Serkan"
  }
}
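The way Parameters resolves the example above can be sketched in Python. Both helpers are hypothetical, the car input is made up, and only simple dotted paths are supported (real Step Functions accepts full JSONPath).

```python
from typing import Any, Dict


def resolve_path(data: Any, path: str) -> Any:
    # Simple dotted paths only: "$" or "$.a.b"
    if path == "$":
        return data
    for key in path[2:].split("."):
        data = data[key]
    return data


def apply_parameters(template: Dict[str, Any], state_input: Any) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    for key, value in template.items():
        if key.endswith(".$"):
            # "model.$": "$.car.model" becomes "model": <value found at $.car.model>
            result[key[:-2]] = resolve_path(state_input, value)
        elif isinstance(value, dict):
            # Nested objects such as "CarDetails" are processed recursively
            result[key] = apply_parameters(value, state_input)
        else:
            # Keys without ".$" are copied as static values
            result[key] = value
    return result


parameters = {"CarDetails": {"model.$": "$.car.model", "year.$": "$.car.year", "owner": "Serkan"}}
car_input = {"car": {"model": "ExampleModel", "year": 2022}}  # hypothetical execution input
print(apply_parameters(parameters, car_input))
```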

"Type": "Pass" → Passes its input to its output without doing any work

Resource → The ARN of the resource that a Task state runs, for example a Lambda function:

"Resource": "arn:aws:lambda:eu-central-1:012345678901:function:my_lambda_function"
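A Lambda function ARN packs the region, account ID and function name into colon-separated fields. The hypothetical helper below splits the example ARN to show that layout; the account ID is a placeholder, and the helper ignores any version or alias suffix after the function name.

```python
from typing import Dict


def parse_lambda_arn(arn: str) -> Dict[str, str]:
    # Layout: arn:partition:lambda:region:account-id:function:name[:qualifier]
    parts = arn.split(":")
    if len(parts) < 7 or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {arn}")
    return {"region": parts[3], "account": parts[4], "function": parts[6]}


arn = "arn:aws:lambda:eu-central-1:012345678901:function:my_lambda_function"  # placeholder account
task_state = {"Type": "Task", "Resource": arn, "End": True}
print(parse_lambda_arn(task_state["Resource"]))
```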

Sample Application with AWS Step Functions

Step 1 - Open the Step Functions console

Step 2 - Click State machines on the left side

Step 3 - Click Create state machine to create a new state machine

Step 4 - Select "Write your workflow in code", choose the Standard type and paste your code into the Definition box
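If you prefer to script this step instead of using the console, the boto3 Step Functions client provides `create_state_machine`. The sketch below builds its arguments separately so they can be inspected without AWS access; the state machine name, the one-state definition and the IAM role ARN are hypothetical placeholders, and a real role must allow Step Functions to call the services in your workflow.

```python
import json
from typing import Any, Dict


def build_create_request(name: str, definition: Dict[str, Any], role_arn: str) -> Dict[str, Any]:
    # Keyword arguments for the CreateStateMachine API
    return {
        "name": name,
        "definition": json.dumps(definition),  # the API expects the definition as a JSON string
        "roleArn": role_arn,
        "type": "STANDARD",  # same choice as the console's Standard option
    }


def create_state_machine(request: Dict[str, Any]) -> str:
    import boto3  # imported lazily so the request builder works without boto3 installed

    client = boto3.client("stepfunctions")
    return client.create_state_machine(**request)["stateMachineArn"]


request = build_create_request(
    "HelloWorldSketch",  # hypothetical name
    {"StartAt": "Hello", "States": {"Hello": {"Type": "Pass", "End": True}}},
    "arn:aws:iam::012345678901:role/my-step-functions-role",  # hypothetical role
)
print(request["definition"])
# create_state_machine(request)  # requires AWS credentials and a real IAM role
```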

As you can see, the definition (code) is on the left side, and Step Functions shows the steps as a graphical flow on the right side.

The sample code is taken from aws.com.

Check the graphical view on the right side

In this example, there is a Choice state after "Hello World example". If it evaluates to true, the flow waits for 3 seconds. After the wait, a Parallel state processes "Hello" and "World" in separate branches. Once both branches have finished, the final "Hello World" state runs and the execution reaches the End state.
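The flow described above can be approximated with the hand-written definition below. It is a sketch, not the exact AWS sample: the state names, the Fail state on the false branch and its Cause text are assumptions.

```python
import json
from typing import Any, Dict


def pass_branch(name: str) -> Dict[str, Any]:
    # One branch of the Parallel state, containing a single Pass state
    return {"StartAt": name, "States": {name: {"Type": "Pass", "End": True}}}


hello_world_definition: Dict[str, Any] = {
    "Comment": "Sketch of the Hello World flow",
    "StartAt": "Hello World example?",
    "States": {
        "Hello World example?": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.IsHelloWorldExample", "BooleanEquals": True, "Next": "Wait 3 sec"}
            ],
            "Default": "Not Hello World",
        },
        "Wait 3 sec": {"Type": "Wait", "Seconds": 3, "Next": "Parallel State"},
        "Parallel State": {
            "Type": "Parallel",
            # Both branches run concurrently; the state finishes when both are done
            "Branches": [pass_branch("Hello"), pass_branch("World")],
            "Next": "Hello World",
        },
        "Hello World": {"Type": "Pass", "End": True},
        "Not Hello World": {"Type": "Fail", "Cause": "IsHelloWorldExample was not true"},
    },
}

print(json.dumps(hello_world_definition, indent=2))
```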

Step 5 - Click Next

Step 6 - Keep the default settings and click Create

Step 7 - You will see the new state machine in the list

Step 8 - Click the state machine, then click the "Start execution" button

You can adjust the input. Since the flow makes a decision based on "IsHelloWorldExample", you can provide an input such as {"IsHelloWorldExample": true}.
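The same execution can also be started from code with boto3's `start_execution`, which takes the input as a JSON string. In this sketch the state machine ARN is a hypothetical placeholder, and the execution name gets a random suffix because Standard workflows require unique execution names.

```python
import json
import uuid
from typing import Any, Dict


def build_start_request(state_machine_arn: str, is_hello_world: bool) -> Dict[str, Any]:
    # Keyword arguments for the StartExecution API
    return {
        "stateMachineArn": state_machine_arn,
        "name": f"run-{uuid.uuid4()}",  # random suffix keeps execution names unique
        "input": json.dumps({"IsHelloWorldExample": is_hello_world}),
    }


def start_execution(request: Dict[str, Any]) -> str:
    import boto3  # imported lazily so the request builder works without boto3 installed

    client = boto3.client("stepfunctions")
    return client.start_execution(**request)["executionArn"]


request = build_start_request(
    "arn:aws:states:eu-central-1:012345678901:stateMachine:HelloWorld",  # hypothetical ARN
    True,
)
print(request["input"])  # {"IsHelloWorldExample": true}
# start_execution(request)  # requires AWS credentials and an existing state machine
```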

The state machine will run, and you will see the execution progress on the graph.

We have successfully run the application. Now let's change the input (for example, set "IsHelloWorldExample" to false) so that the flow takes a different path.

Run the state machine again:

This time the execution follows a different path through the flow. As a next step, let's call a Lambda function from Step Functions.
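Why the second run takes another path can be sketched with a local simulation of a Choice state. The helper `next_state` and the state names are hypothetical. This sketch falls back to Default when the field is missing or not a boolean, whereas the real service may raise an error when the referenced field is missing.

```python
from typing import Any, Dict

choice_state: Dict[str, Any] = {
    "Type": "Choice",
    "Choices": [{"Variable": "$.IsHelloWorldExample", "BooleanEquals": True, "Next": "Wait 3 sec"}],
    "Default": "Not Hello World",
}


def next_state(choice: Dict[str, Any], state_input: Dict[str, Any]) -> str:
    # Rules are checked in order; the first match decides the next state
    for rule in choice["Choices"]:
        key = rule["Variable"][2:]  # strip "$." (top-level fields only)
        value = state_input.get(key)
        # BooleanEquals only matches real booleans, so 1 or "true" do not count
        if "BooleanEquals" in rule and isinstance(value, bool) and value == rule["BooleanEquals"]:
            return rule["Next"]
    return choice["Default"]


print(next_state(choice_state, {"IsHelloWorldExample": True}))   # Wait 3 sec
print(next_state(choice_state, {"IsHelloWorldExample": False}))  # Not Hello World
```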

Let's do it step by step.

Step 1 - Create a Lambda function with the code below. The function takes two numbers as input and returns their sum.

def lambda_handler(event, context):
    # Read the two numbers from the event passed in by Step Functions
    number1 = event['Number1']
    number2 = event['Number2']
    total = number1 + number2  # avoid shadowing the built-in sum()

    return {
        'statusCode': 200,
        'Sum': total
    }
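Because the handler is a plain Python function, you can try it locally before wiring it into Step Functions. The handler is repeated so the block runs on its own, and passing None as the context is fine here only because the handler never uses it.

```python
def lambda_handler(event, context):
    # Same logic as the Lambda function above
    number1 = event["Number1"]
    number2 = event["Number2"]
    return {"statusCode": 200, "Sum": number1 + number2}


# Local call with the same input used later in the Step Functions execution
result = lambda_handler({"Number1": 5, "Number2": 10}, None)
print(result)  # {'statusCode': 200, 'Sum': 15}
```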

Step 2 - Create a new state machine and drag and drop the Lambda action into the workflow

After adding the Lambda step, you should see the following flow

Select the Lambda function you created in Step 1

Click Next, and you will see the generated code

Then create the state machine.
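The exact code the console generates can vary, but a Lambda step built in the visual designer typically uses the `arn:aws:states:::lambda:invoke` integration and looks roughly like the dict below. The function ARN and state name are hypothetical placeholders, and the console usually also adds a Retry block that is omitted here; compare with the code you actually see.

```python
import json

lambda_arn = "arn:aws:lambda:eu-central-1:012345678901:function:my_sum_function:$LATEST"  # hypothetical

definition = {
    "Comment": "Call a Lambda function that adds two numbers",
    "StartAt": "Lambda Invoke",
    "States": {
        "Lambda Invoke": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "FunctionName": lambda_arn,
                "Payload.$": "$",  # forward the whole execution input as the Lambda event
            },
            "OutputPath": "$.Payload",  # keep only the function's return value
            "End": True,
        }
    },
}

print(json.dumps(definition, indent=2))
```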

Click Start execution and run it with the following input:

{
  "Number1": 5,
  "Number2": 10
}

When the execution completes, you will see the sum of the numbers calculated by the Lambda function.
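Assuming the task uses the `lambda:invoke` integration with `"OutputPath": "$.Payload"`, the function's return value arrives wrapped in a `Payload` field, and OutputPath keeps only that part. The local simulation below is a hypothetical sketch of that flow; the real response carries more metadata fields, which are omitted here.

```python
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Same logic as the Lambda function in Step 1
    return {"statusCode": 200, "Sum": event["Number1"] + event["Number2"]}


def simulate_lambda_invoke_task(execution_input: Dict[str, Any]) -> Dict[str, Any]:
    # The lambda:invoke integration wraps the function result in "Payload"
    raw_result = {"StatusCode": 200, "Payload": lambda_handler(execution_input, None)}
    # "OutputPath": "$.Payload" keeps only the wrapped return value
    return raw_result["Payload"]


print(simulate_lambda_invoke_task({"Number1": 5, "Number2": 10}))  # {'statusCode': 200, 'Sum': 15}
```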

Congrats! You have successfully implemented a state machine that calls a Lambda function.

Conclusion

In this blog, we implemented a sample application with Step Functions. As you can see, Step Functions is very useful for building data processing flows, and since it is serverless, you don't need to manage any infrastructure. If you don't want to write the definition by hand, you can also drag and drop the AWS services you need into the flow.
