Set up File Notification mode in AWS

Learn how to set up File Notification mode in AWS for Databricks.

Set up event notifications with the required permissions so that Databricks is aware of incoming raw files.
To set up File Notification mode:
  1. Create an event notification on the Staging bucket that sends messages to an SQS queue, and set the Suffix filter to .gz. For more information, see Walkthrough: Configuring a bucket for notifications.
  2. Update the policy attached to the role that Databricks assumes, adding permission to read from the SQS queue.
    For example:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::<stagingbucketname>/*"
                ]
            },
            {
                "Sid": "VisualEditor2",
                "Effect": "Allow",
                "Action": [
                    "sqs:DeleteMessage",
                    "sqs:ReceiveMessage"
                ],
                "Resource": [
                    "arn:aws:sqs:*:BUCKET_AWS_ACCOUNT_ID:<queuename>"
                ]
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject"
                ],
                "Resource": "arn:aws:s3:::<targetbucketname>/*"
            }
        ]
    }
  3. Update the Staging bucket policy, replacing the role name with the Databricks role name.
    For example:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::BUCKET_AWS_ACCOUNT_ID:role/<rolename>"
                },
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket"
                ],
                "Resource": "arn:aws:s3:::<stagingbucketname>"
            },
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::BUCKET_AWS_ACCOUNT_ID:role/<rolename>"
                },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<stagingbucketname>/<environment_name>/<tenant_id>/*"
            }
        ]
    }
  4. Update the Table bucket policy, replacing the role name with the Databricks role name.
    For example:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::BUCKET_AWS_ACCOUNT_ID:role/<rolename>"
                },
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket"
                ],
                "Resource": "arn:aws:s3:::<tablebucketname>"
            },
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::BUCKET_AWS_ACCOUNT_ID:role/<rolename>"
                },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::<tablebucketname>/*"
            }
        ]
    }
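
The event notification in step 1 can also be scripted instead of configured in the console. The following is a minimal Python sketch, not a definitive implementation. It builds the notification configuration in the shape accepted by the S3 PutBucketNotificationConfiguration API. The helper name build_queue_notification, the sample queue ARN, and the choice of the s3:ObjectCreated:* event type are assumptions made for illustration; only the SQS destination and the .gz suffix come from the steps above.

```python
import json


def build_queue_notification(queue_arn: str, suffix: str = ".gz") -> dict:
    """Build an S3 notification configuration that sends events for
    object keys ending in `suffix` to the given SQS queue.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                # Assumption: notify on every object-created event.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        # Matches the Suffix setting from step 1.
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder ARN; substitute the region, account ID, and queue name.
    config = build_queue_notification(
        "arn:aws:sqs:us-east-1:123456789012:staging-events"
    )
    print(json.dumps(config, indent=4))

    # To apply the configuration, assuming boto3 is installed and
    # credentials are configured:
    #   import boto3
    #   boto3.client("s3").put_bucket_notification_configuration(
    #       Bucket="<stagingbucketname>",
    #       NotificationConfiguration=config,
    #   )
```

Printing the configuration before applying it makes it easy to review the suffix filter and queue ARN. Note that applying a notification configuration replaces the bucket's existing one, so any notifications already defined on the Staging bucket need to be included in the same call.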