
# Connect your Athena

To connect to Athena, Whaly needs AWS credentials. This guide details the necessary steps:

1. Create an IAM User and generate an Access Key (+secret)
2. Select your region & workgroup

## Prerequisites <a href="#prerequisites" id="prerequisites"></a>

To connect Athena to Whaly, you need the following:

* An AWS account
* Admin rights on the AWS account (to create an IAM user, a custom policy, and an S3 bucket)
* An S3 bucket to which query results can be written. [You can create one if needed.](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html)

{% hint style="info" %}
To save costs on the output bucket, you can configure [a bucket lifecycle rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) that deletes files after 1 day, since the results are no longer used once a query has completed.
{% endhint %}
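
As an illustration of the lifecycle rule mentioned above, here is a minimal sketch that builds an S3 lifecycle configuration expiring objects after one day. The function name and the rule ID are hypothetical; the dictionary keys follow the shape of the S3 lifecycle configuration.

```python
import json


def expire_after_days_rule(days: int = 1, prefix: str = "") -> dict:
    """Build an S3 lifecycle configuration that deletes objects after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                # Hypothetical rule name, chosen for this example only.
                "ID": f"expire-athena-results-after-{days}d",
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": days},
            }
        ]
    }


lifecycle = expire_after_days_rule(days=1)
print(json.dumps(lifecycle, indent=2))
```

The printed JSON can be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration`, or the dictionary can be passed as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`. Only apply it to a bucket dedicated to query results, since the rule deletes every matching object.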

## Create an IAM User and generate an Access Key (+secret)

To connect to Athena, Whaly needs an IAM user and its credentials (an Access Key). To create this user, [please follow this guide.](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html)

When asked which permissions and policies the user should have, [please create a custom policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) that grants the following permissions:

{% hint style="info" %}
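In the policy definition, you need to fill in the ARNs of the S3 buckets that Whaly will have access to. The `whaly-athena-output` and `whaly-athena-input` ARNs shown in the policy are examples; replace them with your own bucket names. List each bucket twice: once as the bucket ARN and once with a `/*` suffix for the objects inside it.

The Whaly user needs access to two kinds of S3 buckets:

* Input buckets: the buckets that hold the data queried by Athena
* A single output bucket: the bucket that Whaly uses to store query results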
  {% endhint %}

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "athena:GetTableMetadata",
                "athena:StartQueryExecution",
                "athena:ListDataCatalogs",
                "athena:GetQueryResults",
                "athena:GetDatabase",
                "athena:GetDataCatalog",
                "athena:ListWorkGroups",
                "athena:ListQueryExecutions",
                "athena:GetWorkGroup",
                "athena:StopQueryExecution",
                "athena:ListEngineVersions",
                "athena:GetQueryResultsStream",
                "athena:ListDatabases",
                "athena:GetQueryExecution",
                "athena:ListTableMetadata",
                "athena:BatchGetQueryExecution"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabases",
                "glue:GetDatabase",
                "glue:GetTables",
                "glue:GetTable",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:PutBucketPublicAccessBlock",
                "s3:AbortMultipartUpload",
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::whaly-athena-output",
                "arn:aws:s3:::whaly-athena-output/*",
                "arn:aws:s3:::whaly-athena-input",
                "arn:aws:s3:::whaly-athena-input/*"
            ]
        }
    ]
}
```
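
If you have several input buckets, the S3 `Resource` list can be generated rather than typed by hand. The sketch below assumes a hypothetical helper, `s3_policy_resources`, and reuses the example bucket names from the policy. For each bucket it emits the bucket ARN (used by bucket-level actions such as `s3:ListBucket`) and the `/*` ARN (used by object-level actions such as `s3:GetObject`).

```python
import json
from typing import List


def s3_policy_resources(output_bucket: str, input_buckets: List[str]) -> List[str]:
    """Return the bucket ARN and the object ARN (`/*`) for every bucket."""
    resources = []
    for bucket in [output_bucket, *input_buckets]:
        resources.append(f"arn:aws:s3:::{bucket}")      # the bucket itself
        resources.append(f"arn:aws:s3:::{bucket}/*")    # every object in the bucket
    return resources


# Example bucket names; replace them with your own.
resources = s3_policy_resources("whaly-athena-output", ["whaly-athena-input"])
print(json.dumps(resources, indent=4))
```

The printed list can be pasted as the value of `"Resource"` in the S3 statement of the policy.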

Once the IAM user has been created with the proper policy, [you can create an Access Key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) under it to retrieve the **Access Key Id** and the **Access Key Secret**.

## Select your region & Workgroup

To query your Athena data, Whaly needs to know which region to run queries in. It must be a valid [AWS Region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html) code (e.g. `us-east-1`). A good practice is to use the same region you use when running SQL queries in the Athena console:

<figure><img src="/files/1KhiXw1Yz6TsYV5BhzeC" alt=""><figcaption></figcaption></figure>
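
Region codes are easy to mistype (for example, dropping the hyphen before the number). The sketch below is a loose shape check only, using a hypothetical helper name. It does not confirm that the region exists or that Athena is available there.

```python
import re

# Loose shape check: two lowercase letters, one or more "-word" parts, then "-number".
REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def looks_like_aws_region(value: str) -> bool:
    """Return True if `value` has the general shape of an AWS Region code."""
    return REGION_PATTERN.fullmatch(value) is not None


for candidate in ["us-east-1", "eu-west-3", "us-east1"]:
    print(candidate, looks_like_aws_region(candidate))
```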

You'll also need to [select an existing workgroup or create one.](https://docs.aws.amazon.com/athena/latest/ug/workgroups-create-update-delete.html) In Whaly, enter the name of the workgroup you want to use when querying your data.
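
To show how the region, workgroup, and output bucket fit together, here is a minimal sketch of the parameters an Athena client sends when starting a query. This is not Whaly's implementation. The helper name and the `whaly-results/` prefix are assumptions; the keys match the Athena `StartQueryExecution` request.

```python
def build_query_request(sql: str, workgroup: str, output_bucket: str,
                        prefix: str = "whaly-results/") -> dict:
    """Build the parameters for an Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        "ResultConfiguration": {
            # Query results are written under this S3 location.
            "OutputLocation": f"s3://{output_bucket}/{prefix}",
        },
    }


# "primary" is the default Athena workgroup; the bucket name is an example.
request = build_query_request("SELECT 1", "primary", "whaly-athena-output")
print(request)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   athena = boto3.client("athena", region_name="us-east-1")
#   athena.start_query_execution(**request)
```

Note that a workgroup can be configured to enforce its own result location, in which case the `OutputLocation` supplied by the client is overridden.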

