How Amazon Bills the S3 Service and Key Tips to Reduce Cost
AWS S3 is a web service used to store any type of object in the cloud, such as video, pictures, and text. The main advantage of S3 is that it provides a highly scalable, low-latency, reliable, and cheap storage system.
Before using the AWS S3 service in your organisation, it is better to understand how AWS bills the service and which type of S3 service you should use.
Depending on the technical use case, AWS offers different types of S3 services.
S3 Storage Classes
It is a good start to understand the S3 service types before diving into the billing strategy.
AWS offers different types of services, and you should select the appropriate one for your analytical use case. Durability describes the risk of losing stored objects: S3 is designed for 99.999999999% durability, which means that you might lose 0.000000001% of the objects in a given year. On the other hand, Availability means
S3-Standard is used for the most frequently accessed data, with millisecond latency.
S3-IA is used for infrequently accessed data and is cheaper than S3-Standard. The important thing is that if you access the data frequently, the cost will be much higher than with S3-Standard. When you select S3-IA, you are charged for a minimum of 30 days even if the data is stored for a shorter period.
S3-Intelligent (S3 Intelligent-Tiering) is a newly announced service which offers a frequent access tier and an infrequent access tier at the same time. AWS monitors the data access patterns and moves infrequently accessed data to the lower-cost tier.
S3-Onezone is similar to S3-IA, but the data is stored in only one Availability Zone. If your files can be re-created, it is better to use S3-Onezone.
Glacier is useful for archiving. When you don’t need to retrieve data within seconds, you can choose Glacier to decrease the cost.
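To make the 30-day minimum charge for S3-IA concrete, here is a minimal sketch of how the billable storage period could be estimated. The function names and the price in the example are hypothetical placeholders, not AWS figures, and the calculation assumes a 30-day month.

```python
def billable_days(days_stored: int, minimum_days: int = 30) -> int:
    """Return the number of days charged, given a minimum storage duration."""
    if days_stored < 0:
        raise ValueError("days_stored must be non-negative")
    return max(days_stored, minimum_days)


def storage_charge(size_gb: float, days_stored: int,
                   price_per_gb_month: float, minimum_days: int = 30) -> float:
    """Estimate the storage charge for one object (assumes a 30-day month)."""
    days = billable_days(days_stored, minimum_days)
    return size_gb * price_per_gb_month * days / 30


# Hypothetical price of 0.01 per GB-month, for illustration only.
# An object deleted after 10 days is still charged for 30 days.
print(storage_charge(size_gb=100, days_stored=10, price_per_gb_month=0.01))
# An object kept for 45 days is charged for the full 45 days.
print(storage_charge(size_gb=100, days_stored=45, price_per_gb_month=0.01))
```

In other words, short-lived data can end up costing more in S3-IA than the per-GB price suggests.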
S3 Billing Strategy
The S3 billing strategy varies based on service type, volume of data, region, data transfer, replication, and data retrieval.
I am referencing the AWS website and the Ireland region. However, these costs might be changed by AWS.
Data Volume
This part of the bill is based on the volume of data you store.
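Storage pricing is typically tiered, so the price per GB drops as the stored volume grows. The sketch below shows how such a tiered charge could be computed. The tier sizes and prices are placeholders chosen for illustration, not current AWS prices, and the function name is hypothetical.

```python
# Hypothetical tiers: (tier size in GB, price per GB-month).
# A size of None means "everything above the previous tiers".
# These numbers are placeholders, not current AWS prices.
EXAMPLE_TIERS = [
    (50 * 1024, 0.03),
    (450 * 1024, 0.02),
    (None, 0.01),
]


def monthly_storage_cost(total_gb, tiers=EXAMPLE_TIERS):
    """Charge each slice of the stored volume at the rate of its tier."""
    remaining = total_gb
    cost = 0.0
    for size_gb, price in tiers:
        if remaining <= 0:
            break
        slice_gb = remaining if size_gb is None else min(remaining, size_gb)
        cost += slice_gb * price
        remaining -= slice_gb
    return cost


# 1,000 GB falls entirely inside the first tier.
print(monthly_storage_cost(1000))
# 1,000 GB beyond the first tier is charged at the second tier's rate.
print(monthly_storage_cost(50 * 1024 + 1000))
```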
Data Transfer
Data transfer means moving data:
- from the internet to Amazon S3
- from S3 to the internet
- from one AWS region to another region
Basically, when you transfer data from the internet to S3, you aren’t charged. However, if you transfer data from S3 to the internet or to another region, you are charged.
Select Query
When you scan data for processing with S3 Select, you are charged separately for it.
Transfer Acceleration
First of all, it helps to understand what Transfer Acceleration is. It is an option for when you want to speed up your data transfers. Imagine your customer is in Japan and would like to upload data to the Ireland region. S3 uses edge locations to reduce the upload time.
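Once acceleration is enabled on a bucket, clients upload through a dedicated accelerate endpoint instead of the regional one. The sketch below builds that endpoint URL. The helper name and the bucket name are hypothetical, and the code only constructs the URL; it does not enable acceleration or contact AWS.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Build the Transfer Acceleration endpoint URL for a bucket."""
    # Transfer Acceleration does not support bucket names containing periods.
    if "." in bucket:
        raise ValueError("bucket names with periods cannot use Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


# "my-example-bucket" is a made-up name used only for illustration.
print(accelerate_endpoint("my-example-bucket"))

# If you use boto3, the client can be pointed at this endpoint with
# botocore.config.Config(s3={"use_accelerate_endpoint": True})
# instead of building the URL by hand.
```

Keep in mind that accelerated transfers are billed on top of the normal data transfer charges, so it is worth enabling only where the speed-up matters.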
Athena and Redshift Queries
AWS has a different billing strategy for Athena and Redshift. In summary, Athena is used to query data in S3 via SQL.
Take a look at the example below:
Redshift is a data warehouse solution which can read data from S3 as well.
Billing is based on the amount of data scanned in each query, priced at $5 per TB scanned.
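As a rough sketch of the per-TB pricing above, the cost of a single query can be estimated from the number of bytes it scans. The function name is hypothetical, and treating 1 TB as 1024^4 bytes is an assumption; minimum charges and rounding rules are ignored here.

```python
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0     # the per-TB price quoted in the text


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of one query from the amount of data it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# A query that scans 200 GB of data.
print(query_cost_usd(200 * 1024 ** 3))
# A query that scans a full terabyte.
print(query_cost_usd(1024 ** 4))
```

Because the cost depends only on the data scanned, anything that lets a query read fewer bytes lowers the bill directly.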
Key Tips to Reduce Cost
Using Lifecycle Management
Lifecycle management is a feature that lets you configure rules such as deleting data or migrating it between S3 storage classes.
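A lifecycle rule could look like the following sketch, which moves objects to S3-IA, then to Glacier, and finally deletes them. The prefix, the day thresholds, and the rule ID are assumptions chosen for illustration. The dictionary follows the shape that boto3's put_bucket_lifecycle_configuration expects, but the code below only builds and prints it.

```python
import json


def build_lifecycle_config(prefix: str, ia_after_days: int,
                           glacier_after_days: int, expire_after_days: int) -> dict:
    """Build a lifecycle configuration that tiers objects down, then deletes them."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day thresholds must be strictly increasing")
    rule = {
        "ID": "tier-down-then-expire",  # hypothetical rule name
        "Filter": {"Prefix": prefix},   # apply only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }
    return {"Rules": [rule]}


# Example thresholds: S3-IA after 30 days, Glacier after 90, delete after 365.
config = build_lifecycle_config("logs/", 30, 90, 365)
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, this dictionary could be
# passed as the LifecycleConfiguration argument of
# put_bucket_lifecycle_configuration on an S3 client.
```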
Using Partitioning
Partitioning is the most important way to reduce both the query time and the query cost.
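One common way to partition data in S3 is to encode the partition values in the object key, using Hive-style key=value folders that query engines such as Athena can recognise. The sketch below builds such keys and shows, with made-up object sizes, how a query restricted to one partition touches fewer bytes. The table prefix, file names, and helper names are hypothetical.

```python
from datetime import date


def partitioned_key(table_prefix: str, day: date, filename: str) -> str:
    """Build an object key with Hive-style year/month/day partitions."""
    return (f"{table_prefix}/year={day.year:04d}/month={day.month:02d}/"
            f"day={day.day:02d}/{filename}")


def bytes_scanned(object_sizes: dict, prefix: str = "") -> int:
    """Sum the sizes of all objects whose key starts with the prefix."""
    return sum(size for key, size in object_sizes.items() if key.startswith(prefix))


# Three days of made-up data, 100 bytes per day.
objects = {
    partitioned_key("events", date(2020, 1, d), "part-0.csv"): 100
    for d in (14, 15, 16)
}

# Without a partition filter, every object is read.
print(bytes_scanned(objects))
# With a filter on a single day, only that partition is read.
print(bytes_scanned(objects, "events/year=2020/month=01/day=15/"))
```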
Using Parquet Format Rather Than CSV
Using Parquet or a similar data storage format could decrease your file size and speed up your queries.
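The main reason is that Parquet stores data column by column, so a query that needs one column does not have to read the others. The toy sketch below illustrates that idea with the standard library only. It is not Parquet itself, which also adds compression and encoding, and the sample data and function names are made up.

```python
import csv
import io


def row_layout_bytes(rows):
    """Size of the data when every row is written out in full, as in CSV."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return len(buffer.getvalue().encode("utf-8"))


def column_layout_bytes(rows, column):
    """Size of a single column when each column is stored separately."""
    values = "\n".join(str(row[column]) for row in rows)
    return len(values.encode("utf-8"))


# Made-up sample data: a small id column next to a wide text column.
sample = [{"user_id": i, "country": "IE", "comment": "x" * 50} for i in range(1000)]

# A CSV reader has to scan whole rows even if only user_id is needed.
full_scan = row_layout_bytes(sample)
# A column-oriented reader can fetch just the user_id values.
one_column = column_layout_bytes(sample, "user_id")
print(f"row layout, any query: {full_scan} bytes read")
print(f"column layout, user_id only: {one_column} bytes read")
```

Since query services bill by the data scanned, reading only the needed columns reduces the cost as well as the query time.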
Final Words
When you select the type of service, make sure you understand the volume of data as well as the access frequency. This will help you reduce the cost and speed up your queries.