Creating a simple data pipeline with AWS Step Functions
A data pipeline is the skeleton of your application when you run multiple steps to create the final output. It consists of data collection, data processing, data transformation and output creation.
You can see a sample pipeline below:
There are multiple options for creating a data pipeline, such as Apache Oozie and Apache Airflow. AWS offers a serverless service for this purpose, called AWS Step Functions.
In this blog, we are going to create a sample pipeline in AWS using Step Functions. AWS Step Functions is a workflow service that runs different AWS services as the steps of a workflow.
Features of Step Functions
- Visualisation of the steps
- Monitoring
- Serverless
- Schedule the workflow
- Built-in integrations with AWS services such as Lambda, IAM, etc.
- Error handling
- Able to implement parallel states
Important fields in the workflow definition
StartAt → Indicates the state where the state machine starts
State → A state represents a single unit of work in the state machine. For example, if you need to download a file from S3, that download is one state in the workflow
Type → The type of the state. The most common one is "Type": "Task", which represents a single unit of work
Next → Indicates the state to run after the current state finishes
Comment → Description of the state
InputPath → Selects the part of the state input that is passed to the task
"InputPath": "$.input"
"input": { "val1": "a", "val2": "b", "val3": "c" }
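To make the effect of InputPath concrete, here is a small Python sketch that mimics the selection for simple dotted paths only. The helper name select_path and the extra "comment" key are assumptions made for this illustration; Step Functions evaluates JSONPath itself and supports much more than this.

```python
def select_path(data, path):
    """Resolve a simple dotted path such as "$.input" (no arrays or filters)."""
    if path == "$":
        return data
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    current = data
    for key in path[2:].split("."):
        current = current[key]
    return current


# State input: only the "input" key should reach the task
state_input = {
    "comment": "this key is not selected",  # hypothetical extra key
    "input": {"val1": "a", "val2": "b", "val3": "c"},
}

task_input = select_path(state_input, "$.input")
print(task_input)
```

With "InputPath": "$.input", the task only sees the three values under "input", not the rest of the state input.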
Parameters → Defines the key-value pairs that are passed as input to the state. A key that ends in ".$" takes its value from the state input through the given path
"Parameters": {
  "CarDetails": {
    "model.$": "$.car.model",
    "year.$": "$.car.year",
    "owner": "Serkan"
  }
}
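As a rough illustration of how such a Parameters block is resolved, the Python sketch below emulates the behaviour for simple dotted paths. The helpers select_path and apply_parameters and the sample state input are hypothetical, written only for this example; the real resolution happens inside Step Functions.

```python
def select_path(data, path):
    """Resolve a simple dotted path such as "$.car.model" (no arrays or filters)."""
    current = data
    for key in path[2:].split("."):
        current = current[key]
    return current


def apply_parameters(template, state_input):
    """Build the effective input: keys ending in ".$" are looked up in the state input."""
    result = {}
    for key, value in template.items():
        if isinstance(value, dict):
            result[key] = apply_parameters(value, state_input)
        elif key.endswith(".$"):
            result[key[:-2]] = select_path(state_input, value)
        else:
            result[key] = value  # static value, copied as-is
    return result


parameters = {
    "CarDetails": {
        "model.$": "$.car.model",
        "year.$": "$.car.year",
        "owner": "Serkan",
    }
}

# Hypothetical state input, invented for this illustration
state_input = {"car": {"model": "Example Model", "year": 2020}}

resolved = apply_parameters(parameters, state_input)
print(resolved)
```

Note that the ".$" suffix is dropped from the resolved keys, while "owner" keeps its static value.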
"Type": "Pass" → Passes its input to its output without performing any work
Resource → Indicates the ARN of the AWS resource that the task runs
"Resource": "arn:aws:lambda:eu-central-1:01234567890:function:my_lambda_function"
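To show how these fields fit together, here is a minimal sketch of a two-state definition built as a Python dictionary and printed as JSON. The state names DownloadFile and Done and the helper check_transitions are assumptions for illustration; the ARN is the placeholder used above.

```python
import json

definition = {
    "Comment": "Minimal illustration of the fields described above",
    "StartAt": "DownloadFile",
    "States": {
        "DownloadFile": {
            "Type": "Task",
            "Comment": "A single unit of work, here a Lambda function",
            "Resource": "arn:aws:lambda:eu-central-1:01234567890:function:my_lambda_function",
            "InputPath": "$.input",
            "Next": "Done",
        },
        # A Pass state that simply forwards its input and ends the workflow
        "Done": {"Type": "Pass", "End": True},
    },
}


def check_transitions(defn):
    """Return names referenced by StartAt or Next that are not defined in States."""
    states = defn["States"]
    targets = [defn["StartAt"]]
    targets += [state["Next"] for state in states.values() if "Next" in state]
    return [name for name in targets if name not in states]


missing = check_transitions(definition)
print(json.dumps(definition, indent=2))
print("undefined states:", missing)
```

The small check is only a local sanity test; the console validates the definition as well when you paste it.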
Sample Application with AWS Step Functions
Step 1-Open the Step Functions service
Step 2-Click State machines on the left side
Step 3-Click Create state machine to create a new state machine
Step 4-To provide your own code, select “Write your workflow in code”
Step 5-Select Standard and paste your code into Definition
As you can see, Step Functions shows the definition code on the left side and represents the steps as a graphical flow on the right side.
Step 6-Check the graphical flow
In this example, there is a choice after “Hello World example”. If it evaluates to true, the workflow waits for 3 seconds. After the wait, a parallel state processes “Hello” and “World” separately. Once the parallel branches have finished, the final “Hello World” task starts and the workflow finishes with the “End” state.
Step 7-Click Next
Step 8-Keep the information as is and click Create
Step 9-You will see the state machine in the list
Step 10-Click the state machine and then click the “Start execution” button
You can adjust the input. Since the flow has a decision based on “IsHelloWorldExample”, you can provide an input that sets this value, for example { "IsHelloWorldExample": true }.
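To make the decision concrete, the sketch below rebuilds the workflow described above as a Python dictionary and evaluates its choice for a given input. The state names and the choose_next helper are assumptions for illustration and may differ from the template the console provides; real Choice rules support many more comparisons than the single boolean check emulated here.

```python
import json

definition = {
    "Comment": "Sketch of the Hello World workflow described above",
    "StartAt": "Hello World example?",
    "States": {
        "Hello World example?": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.IsHelloWorldExample", "BooleanEquals": True, "Next": "Yes"},
                {"Variable": "$.IsHelloWorldExample", "BooleanEquals": False, "Next": "No"},
            ],
            "Default": "Yes",
        },
        "Yes": {"Type": "Pass", "Next": "Wait 3 sec"},
        "No": {"Type": "Fail", "Cause": "Not Hello World"},
        "Wait 3 sec": {"Type": "Wait", "Seconds": 3, "Next": "Parallel State"},
        "Parallel State": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "Hello", "States": {"Hello": {"Type": "Pass", "End": True}}},
                {"StartAt": "World", "States": {"World": {"Type": "Pass", "End": True}}},
            ],
            "Next": "Hello World",
        },
        "Hello World": {"Type": "Pass", "End": True},
    },
}


def choose_next(choice_state, state_input):
    """Evaluate BooleanEquals rules on top-level variables only (simplified)."""
    for rule in choice_state["Choices"]:
        name = rule["Variable"][2:]  # strip the leading "$."
        if name in state_input and state_input[name] == rule["BooleanEquals"]:
            return rule["Next"]
    return choice_state.get("Default")


choice = definition["States"]["Hello World example?"]
execution_input = {"IsHelloWorldExample": True}

print(json.dumps(execution_input))           # JSON to paste as the execution input
print(choose_next(choice, execution_input))  # branch taken for this input
```

Changing the value to false makes the same helper return the other branch, which is a quick way to reason about the path before starting an execution.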
The state machine will run and you will see the execution progress in the graph view
We have successfully run the application. Let’s change the parameter so that the flow takes a different path
Run the state machine again:
You can see that the flow takes a different path. As a next step, I would like to call a Lambda function from Step Functions
Let’s do it step by step
Step 1-The Lambda function takes two numbers as input and returns their sum. Create a Lambda function with the code below
def lambda_handler(event, context):
    # Read the two numbers from the event passed in by Step Functions
    number1 = event['Number1']
    number2 = event['Number2']
    # Named "total" to avoid shadowing the built-in sum()
    total = number1 + number2
    return {
        'statusCode': 200,
        'Sum': total
    }
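Before wiring the function into a state machine, it can help to call the handler locally. This is a minimal sketch that repeats the handler so it runs on its own; passing None as the context is an assumption that only works because this handler never uses it.

```python
def lambda_handler(event, context):
    # Same logic as the Lambda function above
    total = event["Number1"] + event["Number2"]
    return {"statusCode": 200, "Sum": total}


# Local call with a sample event; no AWS account is needed for this check
result = lambda_handler({"Number1": 5, "Number2": 10}, None)
print(result)  # {'statusCode': 200, 'Sum': 15}
```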
Step 2-Create a state machine and drag and drop the Lambda action into the flow
After the Lambda integration, you should see the following flow
Select the Lambda function
Click Next and you will see the generated code
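For orientation, a definition with a single Lambda step typically looks something like the sketch below, written here as a Python dictionary and printed as JSON. The state name, the comment and the function ARN (the placeholder from earlier) are assumptions; the code the console generates for you may contain additional fields such as retry settings.

```python
import json

# Placeholder ARN: replace with the ARN of your own Lambda function
FUNCTION_ARN = "arn:aws:lambda:eu-central-1:01234567890:function:my_lambda_function"

definition = {
    "Comment": "Sketch of a state machine with a single Lambda step",
    "StartAt": "Lambda Invoke",
    "States": {
        "Lambda Invoke": {
            "Type": "Task",
            # Service integration that invokes a Lambda function
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "FunctionName": FUNCTION_ARN,
                "Payload.$": "$",  # pass the whole state input to the function
            },
            # Keep only the function's return value as the state output
            "OutputPath": "$.Payload",
            "End": True,
        }
    },
}

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

Here "Payload.$": "$" forwards the execution input unchanged, so the Number1 and Number2 keys reach the handler as its event.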
The state machine is created
Click Start execution and run it with the following input
{
  "Number1": 5,
  "Number2": 10
}
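The same execution can also be started from code instead of the console. The sketch below uses the boto3 Step Functions client; the state machine ARN is a hypothetical placeholder, the helper names are made up for this example, and the actual call is left commented out because it needs AWS credentials and an existing state machine.

```python
import json

# Hypothetical ARN: replace with the ARN of your own state machine
STATE_MACHINE_ARN = "arn:aws:states:eu-central-1:123456789012:stateMachine:MyStateMachine"


def build_input(number1, number2):
    """Serialize the execution input; Step Functions expects a JSON string."""
    return json.dumps({"Number1": number1, "Number2": number2})


def start_execution(state_machine_arn, execution_input):
    """Start an execution and return its ARN (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without boto3

    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=state_machine_arn,
        input=execution_input,
    )
    return response["executionArn"]


execution_input = build_input(5, 10)
print(execution_input)
# start_execution(STATE_MACHINE_ARN, execution_input)  # uncomment to run against AWS
```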
When the state machine runs, you will see the sum of the numbers as calculated by the Lambda function
Congrats! You have successfully implemented a state machine that calls Lambda.
Conclusion
In this blog, we have implemented a sample application with Step Functions. As you have seen, Step Functions is very useful for creating data processing flows. Since it is serverless, you don’t need to manage any infrastructure. If you don’t want to write the configuration by hand, you can also drag and drop the AWS services to be used in the flow