jkisolo.com

Smart Strategies to Reduce Your AWS Athena Costs Efficiently

Understanding AWS Athena Costs

Whether you are considering AWS Athena or looking for ways to reduce what you already spend on it, this guide shows how to keep Athena both efficient and affordable from the outset. We will cover techniques for controlling Athena costs, including Workgroups and CloudWatch metrics, columnar data formats, and query reuse.

Know Your Athena Expenses

Athena offers a remarkably low-cost entry point, creating the impression that operating it will always be inexpensive. However, as the next billing cycle approaches, you might be in for a shock if your AWS Cost Explorer reveals a steadily increasing charge without a clear breakdown. Don't fret; we can address this concern.

Creating an Athena Workgroup for Each Application

Utilizing Athena workgroups is an excellent strategy to manage your Athena expenses effectively. Since Athena charges based on the number of bytes scanned during queries and does not allow for tagging individual queries, setting up a dedicated workgroup for each application is beneficial. By tagging your workgroup, you can track costs associated with each application in the AWS Cost Explorer.

Here's how to create a workgroup and apply tags to it:

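# Create Athena WorkGroup and add tags to it
# Assumes this runs inside a CDK Stack (self) and that athena_bucket is an
# S3 bucket construct defined elsewhere in the stack.
import aws_cdk as cdk  # assumed import; the snippet relies on the cdk alias

athena_workgroup_configuration = cdk.aws_athena.CfnWorkGroup.WorkGroupConfigurationProperty(
    result_configuration=cdk.aws_athena.CfnWorkGroup.ResultConfigurationProperty(
        output_location=f's3://{athena_bucket.bucket_name}/athena',
    ),
    # Publish query metrics so CloudWatch alarms can be built on them
    publish_cloud_watch_metrics_enabled=True
)

athena_workgroup = cdk.aws_athena.CfnWorkGroup(
    self, 'my-app-athena-workgroup',
    name='dev.my-company.my-app',
    state='ENABLED',
    work_group_configuration=athena_workgroup_configuration,
    # Tags used to break down costs per application
    tags=[
        cdk.CfnTag(key='Environment', value='dev'),
        cdk.CfnTag(key='Company', value='my-company'),
        cdk.CfnTag(key='Product', value='my-app')
    ]
)

# Keep the workgroup if the stack is deleted
athena_workgroup.apply_removal_policy(cdk.RemovalPolicy.RETAIN)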
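
Once the workgroup is tagged, the per-application breakdown can also be pulled programmatically. The following is a minimal sketch, not the author's method, that uses the boto3 Cost Explorer call get_cost_and_usage. It assumes the Product tag has already been activated as a cost allocation tag in the Billing console (user-defined tags do not appear in Cost Explorer until they are activated), and that Athena is reported under the service name "Amazon Athena". The helper names are hypothetical.

```python
from datetime import date


def build_athena_cost_request(start: date, end: date, tag_key: str = "Product") -> dict:
    """Build get_cost_and_usage arguments for Athena costs grouped by a tag."""
    return {
        # End date is exclusive in Cost Explorer
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        # Assumption: Athena shows up under this SERVICE dimension value
        "Filter": {"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Athena"]}},
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }


def fetch_athena_cost_by_tag(request: dict) -> dict:
    """Call Cost Explorer and sum the cost per tag group across all periods."""
    import boto3  # imported lazily so the builder above works without AWS access

    cost_explorer = boto3.client("ce")
    response = cost_explorer.get_cost_and_usage(**request)

    costs = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            group_key = group["Keys"][0]  # typically of the form "Product$my-app"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            costs[group_key] = costs.get(group_key, 0.0) + amount
    return costs
```

Calling fetch_athena_cost_by_tag(build_athena_cost_request(date(2024, 1, 1), date(2024, 2, 1))) would return one entry per tag value for that month, provided the caller has Cost Explorer permissions.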

Setting Up CloudWatch Alerts for Data Monitoring

Athena charges for the data each query scans, so your bill depends on two things: how many bytes a typical query scans and how many queries you run. To keep costs under control, monitor both. CloudWatch lets you set an alarm on each. First, make sure your Workgroup publishes metrics to CloudWatch by setting the publish_cloud_watch_metrics_enabled parameter to True.

Next, here's an example of how to create an alarm for monitoring the average processed bytes per query:

# Monitor Average Bytes Scanned Per Query
average_processed_bytes = cdk.aws_cloudwatch.Metric(
    namespace='AWS/Athena',
    metric_name='ProcessedBytes',
    dimensions_map={
        'WorkGroup': athena_workgroup.name
    },
    period=cdk.Duration.minutes(1),
    statistic='Average'
)

average_processed_bytes_alarm = cdk.aws_cloudwatch.Alarm(
    self, 'my-app-component-athena-average-processed-bytes-alarm',
    metric=average_processed_bytes,
    threshold=10 * 1024 * 1024 * 1024,  # 10 GiB, expressed in bytes
    evaluation_periods=1,
    comparison_operator=cdk.aws_cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
    alarm_name='my-app-component-athena-average-processed-bytes-alarm',
    alarm_description='Alarm if average processed bytes exceed 10 GiB',
    actions_enabled=True
)

You can also set up an alarm to monitor the total number of queries executed in a day:

# Monitor Total Queries Per Day
# SampleCount counts the ProcessedBytes data points in the period; this
# approach treats that count as the number of queries executed.
number_of_queries_1day_sum_metric = cdk.aws_cloudwatch.Metric(
    namespace='AWS/Athena',
    metric_name='ProcessedBytes',
    dimensions_map={
        'WorkGroup': athena_workgroup.name
    },
    period=cdk.Duration.hours(24),
    statistic='SampleCount'
)

total_queries_1day_alarm = cdk.aws_cloudwatch.Alarm(
    self, 'my-app-component-athena-total-queries-1day-alarm',
    metric=number_of_queries_1day_sum_metric,
    threshold=1000,  # queries per 24-hour period
    evaluation_periods=1,
    comparison_operator=cdk.aws_cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
    alarm_name='my-app-component-athena-total-queries-1day-alarm',
    alarm_description='Alarm if daily total queries exceed 1000',
    actions_enabled=True
)

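Both alarms use the ProcessedBytes metric; they differ only in statistic and period. Average over one minute tracks the typical scan size per query, while SampleCount over 24 hours counts the data points published in a day, which serves here as the daily query count.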
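
To put the two alarm thresholds in dollar terms, the arithmetic can be sketched as follows. This is a rough estimate only. The price of $5 per TB scanned, the 10 MB minimum charge per query, and the use of binary terabytes are assumptions to verify against the Athena pricing page for your region, and the function names are hypothetical. Rounding of scanned bytes is ignored.

```python
# Assumed pricing inputs; check the Athena pricing page for your region.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 1024 ** 4             # assumes binary terabytes
MIN_BILLABLE_BYTES = 10 * 1024 ** 2  # assumed 10 MB minimum per query


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable_bytes = max(bytes_scanned, MIN_BILLABLE_BYTES)
    return billable_bytes / BYTES_PER_TB * price_per_tb


def estimate_daily_cost(queries_per_day: int, average_bytes_scanned: int) -> float:
    """Estimate the daily cost in USD from query count and average scan size."""
    return queries_per_day * estimate_query_cost(average_bytes_scanned)


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    print(f"One 10 GiB query: ${estimate_query_cost(ten_gib):.4f}")
    print(f"1000 such queries per day: ${estimate_daily_cost(1000, ten_gib):.2f}")
```

Under these assumptions, a single query at the 10 GiB threshold costs roughly $0.05, and 1,000 such queries in a day come to roughly $49, which is the level of spend at which both alarms would fire together.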

Reduce Your Athena Costs

Your Athena expenses are heavily influenced by the amount of data scanned per query. Here are several strategies to help you minimize these costs:

  1. Utilize Columnar File Formats: Athena can query tables backed by ND-JSON and CSV files stored on S3, but these row-based formats are best avoided. Consider one dataset of one million records queried in two formats:
    • Stored as ND-JSON, a query has to read every record in full.
    • Stored as Parquet, the same query is significantly cheaper, costing up to eight times less.
  2. Avoid Using 'SELECT *' in Queries: With a columnar format such as Parquet or ORC, Athena reads only the columns a query references, so naming just the columns you need instead of using SELECT * reduces the bytes scanned. The savings can be substantial: a query on a million Parquet records that selects only the required column can cost up to 40 times less than one that reads every column.
  3. Implement Athena Query Reuse: This feature is not only free but also expedites your queries. When multiple clients execute the same query on data that is refreshed daily, query reuse allows Athena to use previous results rather than scanning data again. The first query incurs a cost, while subsequent identical queries during the reuse period are free and much faster.
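
The three strategies above can be sketched with a few small helpers. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the helper names, table names, and S3 location are hypothetical. It uses an Athena CTAS statement to rewrite a table as Parquet and the ResultReuseConfiguration argument of the boto3 start_query_execution call, which requires a workgroup running an Athena engine version that supports result reuse.

```python
def build_parquet_ctas(source_table: str, target_table: str, target_location: str) -> str:
    """Strategy 1: CTAS statement that rewrites an existing table as Parquet."""
    # SELECT * is appropriate here: this one-off conversion must copy every column.
    return (
        f"CREATE TABLE {target_table} "
        f"WITH (format = 'PARQUET', external_location = '{target_location}') "
        f"AS SELECT * FROM {source_table}"
    )


def build_column_query(table: str, columns: list) -> str:
    """Strategy 2: name only the columns that are needed instead of SELECT *."""
    return f"SELECT {', '.join(columns)} FROM {table}"


def build_reuse_request(query: str, workgroup: str, max_age_minutes: int = 60) -> dict:
    """Strategy 3: start_query_execution arguments with result reuse enabled."""
    return {
        "QueryString": query,
        "WorkGroup": workgroup,  # the workgroup supplies the result location
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


def start_query(request: dict) -> str:
    """Submit the query to Athena and return its execution id."""
    import boto3  # imported lazily so the builders above work without AWS access

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    query = build_column_query("events_parquet", ["user_id", "event_time"])
    print(build_reuse_request(query, "dev.my-company.my-app", max_age_minutes=1440))
```

For data that is refreshed daily, a maximum age of 1440 minutes lets identical queries within the same day reuse the first result instead of scanning the data again.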

Key Considerations

  • Use Athena Workgroups and CloudWatch Alerts to track each application's expenses and understand its query traffic.
  • Store data in columnar formats like Parquet to achieve a cost reduction of six to eight times compared to ND-JSON and CSV formats.
  • Avoid using SELECT * in your queries by specifying only the necessary columns.
  • Leverage the Athena Query Reuse feature to save costs on frequently accessed data that rarely changes.

Final Thoughts

If you found this information helpful, please consider clapping a few times and following for more insights. For additional reading on AWS Serverless technologies, check out another article that has received acclaim from the AWS Serverless Hero community.
