Smart Strategies to Reduce Your AWS Athena Costs Efficiently
Understanding AWS Athena Costs
Are you considering AWS Athena, or looking for ways to cut what you already spend on it? This guide shows how to keep Athena both efficient and affordable from the outset. We cover how to track costs with Workgroups and CloudWatch metrics, and how to reduce them with columnar data formats and query result reuse.
Know Your Athena Expenses
Athena has a remarkably low entry cost, which creates the impression that running it will always be cheap. When the next bill arrives, however, AWS Cost Explorer may show a steadily rising Athena charge with no breakdown of which application or query is responsible. Fortunately, this is fixable.
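A quick way to see how the bill adds up is to estimate it from the bytes each query scans. Athena rounds each query's scanned data up to the nearest megabyte and bills at least 10 MB per query. The sketch below assumes the US East (N. Virginia) list price of $5.00 per TB and binary units; check both against the current Athena pricing page. The function names are illustrative, not part of any AWS SDK.

```python
from typing import Iterable

# Assumptions for this sketch (verify on the Athena pricing page for your Region):
PRICE_PER_TB_USD = 5.00   # list price per TB of data scanned
BYTES_PER_MB = 1024 ** 2  # binary units are an assumption of this sketch
BYTES_PER_TB = 1024 ** 4
MIN_BILLED_MB = 10        # each query is billed for at least 10 MB


def billed_bytes(bytes_scanned: int) -> int:
    """Round one query's scanned bytes up to whole MB, applying the 10 MB minimum."""
    megabytes = -(-bytes_scanned // BYTES_PER_MB)  # integer ceiling division
    return max(megabytes, MIN_BILLED_MB) * BYTES_PER_MB


def estimate_cost_usd(bytes_scanned_per_query: Iterable[int]) -> float:
    """Estimate the bill for a batch of queries from the bytes each one scanned."""
    total_billed = sum(billed_bytes(b) for b in bytes_scanned_per_query)
    return total_billed / BYTES_PER_TB * PRICE_PER_TB_USD


# 1,000 tiny queries: each is billed as 10 MB even though it scanned only 1 KB.
print(f"${estimate_cost_usd([1_000] * 1_000):.4f}")
```

Because of the 10 MB minimum, many small queries cost more than their scanned bytes suggest, which is one reason to watch the number of queries as well as the bytes scanned.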
Creating an Athena Workgroup for Each Application
Athena workgroups are an effective way to manage Athena spend. Athena charges by the number of bytes each query scans, and individual queries cannot be tagged, so create a dedicated workgroup for each application and tag the workgroup instead. Once those tags are activated as cost allocation tags in the AWS Billing console, Cost Explorer can break down Athena costs per application.
Here's how to create a workgroup and apply tags to it:
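```python
import aws_cdk as cdk

# Runs inside your Stack's __init__; `athena_bucket` is an S3 bucket defined elsewhere in the stack.
# Create an Athena workgroup and tag it so its costs can be attributed to this application.
athena_workgroup_configuration = cdk.aws_athena.CfnWorkGroup.WorkGroupConfigurationProperty(
    # Query results for this workgroup are written under this S3 prefix
    result_configuration=cdk.aws_athena.CfnWorkGroup.ResultConfigurationProperty(
        output_location=f's3://{athena_bucket.bucket_name}/athena',
    ),
    # Required for the CloudWatch alarms in the next section
    publish_cloud_watch_metrics_enabled=True,
)
athena_workgroup = cdk.aws_athena.CfnWorkGroup(
    self, 'my-app-athena-workgroup',
    name='dev.my-company.my-app',
    state='ENABLED',
    work_group_configuration=athena_workgroup_configuration,
    # Once activated as cost allocation tags, these appear as filters in Cost Explorer
    tags=[
        cdk.CfnTag(key='Environment', value='dev'),
        cdk.CfnTag(key='Company', value='my-company'),
        cdk.CfnTag(key='Product', value='my-app'),
    ],
)
# Keep the workgroup if the stack is deleted
athena_workgroup.apply_removal_policy(cdk.RemovalPolicy.RETAIN)
```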
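Tags on the workgroup only help if queries actually run in it. With boto3, the workgroup is chosen per query through the `WorkGroup` parameter of `start_query_execution`; queries submitted without it run in the default `primary` workgroup and are not attributed to your application. The helper below is hypothetical and only builds the keyword arguments, so it runs without AWS credentials; the query and database name are placeholders.

```python
from typing import Any


def build_query_request(sql: str, database: str, workgroup: str) -> dict[str, Any]:
    """Build kwargs for boto3's Athena start_query_execution (hypothetical helper).

    The WorkGroup field routes the query, and therefore its scanned bytes and
    cost, to the tagged workgroup.
    """
    if not workgroup:
        raise ValueError("workgroup must be set, otherwise Athena uses 'primary'")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
    }


request = build_query_request(
    "SELECT order_id FROM orders LIMIT 10",  # placeholder query
    database="my_database",                  # placeholder database name
    workgroup="dev.my-company.my-app",
)
# With credentials configured you would then call:
#   boto3.client("athena").start_query_execution(**request)
```

No `ResultConfiguration` is passed here because the workgroup already defines the S3 output location.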
Setting Up CloudWatch Alerts for Data Monitoring
Athena's cost depends on two factors: the amount of data each query scans and the number of queries you run. To keep costs under control, monitor both; CloudWatch lets you set an alarm on each. First, make sure your workgroup publishes metrics to CloudWatch by setting the publish_cloud_watch_metrics_enabled parameter to True, as in the workgroup configuration above.
Next, here's an example of how to create an alarm for monitoring the average processed bytes per query:
```python
import aws_cdk as cdk

# Monitor the average bytes scanned per query in this workgroup
average_processed_bytes = cdk.aws_cloudwatch.Metric(
    namespace='AWS/Athena',
    metric_name='ProcessedBytes',
    dimensions_map={'WorkGroup': athena_workgroup.name},
    period=cdk.Duration.minutes(1),
    statistic='Average',
)
average_processed_bytes_alarm = cdk.aws_cloudwatch.Alarm(
    self, 'my-app-component-athena-average-processed-bytes-alarm',
    metric=average_processed_bytes,
    threshold=10 * 1024 * 1024 * 1024,  # 10 GiB
    evaluation_periods=1,
    comparison_operator=cdk.aws_cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
    alarm_name='my-app-component-athena-average-processed-bytes-alarm',
    alarm_description='Alarm if average processed bytes per query exceed 10 GiB',
    # Notifications only go out once an action is attached, for example:
    # average_processed_bytes_alarm.add_alarm_action(cdk.aws_cloudwatch_actions.SnsAction(topic))
    actions_enabled=True,
)
```
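To put the 10 GiB threshold in money terms, here is a back-of-the-envelope check of what one query of that size costs. It assumes the $5.00-per-TB list price and binary units; verify both against current pricing for your Region.

```python
# Assumptions: $5.00 per TB list price and binary units; check current pricing for your Region.
PRICE_PER_TB_USD = 5.00
BYTES_PER_TB = 1024 ** 4

threshold_bytes = 10 * 1024 * 1024 * 1024  # the alarm threshold above: 10 GiB

cost_per_query_usd = threshold_bytes / BYTES_PER_TB * PRICE_PER_TB_USD
print(f"One query scanning 10 GiB costs about ${cost_per_query_usd:.3f}")
```

At roughly five cents per query, 1,000 such queries in a day would come to about $49, which is why the next alarm watches the query count.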
You can also set up an alarm to monitor the total number of queries executed in a day:
```python
# Same stack as above; `cdk` and `athena_workgroup` are already defined.
# Monitor the total number of queries per day: each query publishes one ProcessedBytes
# data point, so SampleCount over 24 hours approximates the daily query count.
number_of_queries_1day_metric = cdk.aws_cloudwatch.Metric(
    namespace='AWS/Athena',
    metric_name='ProcessedBytes',
    dimensions_map={'WorkGroup': athena_workgroup.name},
    period=cdk.Duration.hours(24),
    statistic='SampleCount',
)
total_queries_1day_alarm = cdk.aws_cloudwatch.Alarm(
    self, 'my-app-component-athena-total-queries-1day-alarm',
    metric=number_of_queries_1day_metric,
    threshold=1000,
    evaluation_periods=1,
    comparison_operator=cdk.aws_cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
    alarm_name='my-app-component-athena-total-queries-1day-alarm',
    alarm_description='Alarm if daily total queries exceed 1000',
    actions_enabled=True,  # attach an action (e.g. SNS) to receive notifications
)
```
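Both alarms use the ProcessedBytes metric, but with different statistics and periods: the first takes the Average over one-minute periods (bytes scanned per query), while the second takes the SampleCount over 24 hours, which counts data points and therefore queries.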
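The difference between the two statistics is easier to see with numbers. The sketch below reproduces CloudWatch's `Average` and `SampleCount` over a hypothetical list of per-query ProcessedBytes values within one period, assuming, as the second alarm does, one data point per query. It does not call CloudWatch; the values are made up for illustration.

```python
from statistics import mean

GIB = 1024 ** 3
AVERAGE_THRESHOLD = 10 * GIB   # first alarm: average bytes per query
QUERY_COUNT_THRESHOLD = 1000   # second alarm: queries per day

# Hypothetical ProcessedBytes data points, one per query, within one period.
processed_bytes = [2 * GIB, 15 * GIB, 1 * GIB, 30 * GIB]


def average(points: list[int]) -> float:
    """CloudWatch 'Average': mean of the data points in the period."""
    return mean(points)


def sample_count(points: list[int]) -> int:
    """CloudWatch 'SampleCount': number of data points, here one per query."""
    return len(points)


print(average(processed_bytes) / GIB)                          # 12.0
print(sample_count(processed_bytes))                           # 4
print(average(processed_bytes) > AVERAGE_THRESHOLD)            # True
print(sample_count(processed_bytes) > QUERY_COUNT_THRESHOLD)   # False
```

Because the first alarm uses Average, one very large query can be diluted by many small ones in the same minute; the CloudWatch `Maximum` statistic would flag individual outliers instead.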
Reduce Your Athena Costs
Your Athena expenses are heavily influenced by the amount of data scanned per query. Here are several strategies to help you minimize these costs:
- Use columnar file formats: Athena can query tables backed by ND-JSON and CSV files stored on S3, but these row-based formats force it to read every record in full. Columnar formats such as Parquet store each column separately, typically compressed, so Athena reads far less data. Consider this example:
  - Querying a dataset of one million records stored as ND-JSON.
  - Querying the same dataset stored as Parquet, which is significantly cheaper: up to eight times less costly.
- Avoid SELECT * in queries: with columnar formats such as Parquet or ORC, Athena scans only the columns a query references, so listing just the columns you need instead of using SELECT * cuts costs further. For instance, a query over a million Parquet records that selects only the required column costs up to 40 times less than one that selects all columns.
- Enable Athena query result reuse: the feature is free and also speeds up queries. When multiple clients run the same query against data that is refreshed daily, Athena can return a previous result instead of scanning the data again. The first query is billed as usual; identical queries within the reuse window (the maximum result age you configure) scan no data, so they are free and much faster.
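Result reuse is requested per query. With boto3, it is set through the `ResultReuseConfiguration` parameter of `start_query_execution`, and the `get_query_execution` response reports whether a previous result was reused. The sketch below only builds and inspects plain dictionaries, so it runs without AWS access; the helper names, the query, the 24-hour maximum age and the trimmed sample response are illustrative assumptions. Result reuse requires Athena engine version 3.

```python
from typing import Any


def build_reusable_query(sql: str, workgroup: str, max_age_minutes: int = 60) -> dict[str, Any]:
    """Kwargs for start_query_execution with result reuse enabled (hypothetical helper)."""
    if max_age_minutes <= 0:
        raise ValueError("max_age_minutes must be positive")
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


def reused_previous_result(response: dict[str, Any]) -> bool:
    """Read the reuse flag from a get_query_execution response."""
    stats = response["QueryExecution"].get("Statistics", {})
    return stats.get("ResultReuseInformation", {}).get("ReusedPreviousResult", False)


# Data is refreshed daily, so allow results up to 24 hours old to be reused.
request = build_reusable_query(
    "SELECT country, count(*) FROM daily_sales GROUP BY country",  # placeholder query
    workgroup="dev.my-company.my-app",
    max_age_minutes=24 * 60,
)

# Trimmed, hypothetical response shape for a query that hit a reused result.
sample_response = {
    "QueryExecution": {"Statistics": {"ResultReuseInformation": {"ReusedPreviousResult": True}}}
}
print(reused_previous_result(sample_response))
```

Athena does not check whether the underlying data changed since the cached result was produced, so pick a maximum age no longer than the staleness your clients can tolerate.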
Key Considerations
- Use Athena workgroups and CloudWatch alarms to track each application's spend and understand its query patterns.
- Store data in columnar formats like Parquet to achieve a cost reduction of six to eight times compared to ND-JSON and CSV formats.
- Avoid using SELECT * in your queries by specifying only the necessary columns.
- Leverage the Athena Query Reuse feature to save costs on frequently accessed data that rarely changes.