Implement AWS PrivateLink with Snowflake using Terraform
Implementation details for AWS PrivateLink utilised with Snowflake
Introduction
Snowflake is a fully managed SaaS (software as a service) that provides a single platform for data warehousing, data lakes, data engineering, data science, data application development, and secure sharing and consumption of real-time / shared data. Snowflake offers out-of-the-box features such as separation of storage and compute, on-the-fly scalable compute, data sharing, data cloning, and third-party tool support to handle the demanding needs of growing enterprises.
AWS PrivateLink is an AWS service for creating private VPC endpoints that allow direct, secure connectivity between your AWS VPCs and the Snowflake VPC without traversing the public Internet. The connectivity is for AWS VPCs in the same AWS region.
Terraform is an infrastructure as code tool that I use to define both cloud and on-prem resources in human-readable configuration files that I can version, reuse, and share. This is what I will use in this article to configure AWS PrivateLink for Snowflake.
For clarification, there are multiple Snowflake editions and you only get AWS PrivateLink capability on the more expensive ones. See below.
PrivateLink is only available on the Business Critical edition or Virtual Private Snowflake. If you have an account on a lower tier and require PrivateLink, you will need to contact Snowflake support and they will be able to perform an upgrade for you. Remember, such upgrades are not free :).
Architecture
AWS PrivateLink is a purpose-built technology that enables direct, secure connectivity among VPCs while keeping network traffic within the AWS network. Using PrivateLink, we can connect to Snowflake without going over the public Internet, and without requiring proxies to be set up between Snowflake and your network as a stand-in solution for egress traffic control. Instead, all communication between the customer VPC and Snowflake is performed within the AWS private network backbone.
Snowflake leverages PrivateLink by running its service behind a Network Load Balancer (NLB) and shares the endpoint with customers' VPCs. The Snowflake endpoint appears in the customer VPC, enabling direct connectivity to Snowflake via private IP addresses. Customers can then accept the endpoint and choose which of their VPCs and subnets should have access to Snowflake. This effectively allows Snowflake to function like a service that is hosted directly on the customer's private network. Additionally, customers can access PrivateLink endpoints from their on-premises network via AWS Direct Connect, allowing them to connect all their virtual and physical environments in a single, private network. As such, Direct Connect can be used in conjunction with PrivateLink to connect a customer's data centre to Snowflake. See the figure below.
Implementation
Enabling AWS PrivateLink
There are two options: the first is contacting Snowflake support with your AWS account ID, and the second is self-service.
Personally, I always contact support with my account ID, as self-service requires IAM user creation, which I try to avoid at all costs. In this article I will cover the steps for self-service.
Self-service steps
1. After you have exported AWS access keys for your IAM user, run the following AWS CLI STS command and save the output. The output will be used as the value for the federated_token argument in step 3.
aws sts get-federation-token --name sfsam
Note that get-federation-token requires either an AWS Identity and Access Management (IAM) user or the AWS account root user.
2. Extract the 12-digit number in the "FederatedUserId" value (truncated). For example, if your token contains:
{
...
"FederatedUser": {
"FederatedUserId": "111...:sfsam",
"Arn": "arn:aws:sts::111...:federated-user/sfsam"
},
"PackedPolicySize": 0
}
Extract 111.... This 12-digit number will be the value for the aws_id argument in the next step.
3. As a Snowflake account administrator (i.e. a user with the ACCOUNTADMIN system role), call the SYSTEM$AUTHORIZE_PRIVATELINK function to authorize AWS PrivateLink for your Snowflake account:
select SYSTEM$AUTHORIZE_PRIVATELINK('<aws_id>', '<federated_token>');
- aws_id: The 12-digit identifier that uniquely identifies your Amazon Web Services (AWS) account, as a string.
- federated_token: The federated token value that contains access credentials for a federated user as a string.
4. To verify your authorized configuration, call the SYSTEM$GET_PRIVATELINK function in your Snowflake account on AWS. It takes the same arguments as the authorize function. Snowflake returns "Account is authorized for PrivateLink." for a successful authorization.
If it is necessary to disable AWS PrivateLink in your Snowflake account, call the SYSTEM$REVOKE_PRIVATELINK function, using the same argument values.
5. Before we begin our Terraform configuration, make sure to get all the PrivateLink endpoint details from Snowflake. You can do this by running:
select SYSTEM$GET_PRIVATELINK_CONFIG()
Output:
{
"regionless-snowsight-privatelink-url": "<privatelink_org_snowsight_url>",
"privatelink-account-name": "<account_identifier>",
"snowsight-privatelink-url": "<privatelink_region_snowsight_url>",
"privatelink-internal-stage": "<privatelink_stage_endpoint>",
"privatelink-account-url": "<privatelink_account_url>",
"privatelink-connection-urls": "<privatelink_connection_url_list>",
"privatelink-ocsp-url": "<privatelink_ocsp_url>",
"privatelink-vpce-id": "<aws_vpce_id>"
}
Terraform Configuration
TF Locals
Starting with what we have, I added PrivateLink details into Terraform as locals:
locals {
  # PrivateLink details returned by SYSTEM$GET_PRIVATELINK_CONFIG() for my dev account
  privatelink_mc_dev = {
    "regionless-snowsight-privatelink-url" = "app-mc-dev.privatelink.snowflakecomputing.com"
    "privatelink-vpce-id"                  = "com.amazonaws.vpce.eu-west-2.vpce-svc-0000011111"
    "snowsight-privatelink-url"            = "app.eu-west-2.privatelink.snowflakecomputing.com"
    "privatelink-account-url"              = "mc12345.eu-west-2.privatelink.snowflakecomputing.com"
    "regionless-privatelink-account-url"   = "mc-dev.privatelink.snowflakecomputing.com"
    "privatelink_ocsp-url"                 = "ocsp.mc12345.eu-west-2.privatelink.snowflakecomputing.com"
    "privatelink_ocsp-url-account-name"    = "ocsp.mc-dev.privatelink.snowflakecomputing.com"
  }
}
The account name in my case is mc-dev, I run in the London region (eu-west-2), and the account ID generated by Snowflake is mc12345.
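For completeness, all of the snippets below assume an AWS provider pointed at that region. A minimal sketch (the provider version constraint is my assumption, and authentication is left to whatever you normally use):
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # assumption; pin to the version you have tested
    }
  }
}
provider "aws" {
  region = "eu-west-2" # London region used throughout this article
}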
Private Route53 Hosted Zone
resource "aws_route53_zone" "snowflake_private_link" {
name = "privatelink.snowflakecomputing.com"
vpc {
vpc_id = "vpc-1234567890"
}
lifecycle {
ignore_changes = [vpc]
}
}
Creating the private hosted zone for Snowflake is essential for your applications to resolve the Snowflake hostnames privately. Note the association with the VPC, which is also essential: only applications within the associated VPC will be able to communicate with Snowflake over PrivateLink.
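If other VPCs also need to resolve these private names, they can be associated with the same zone separately; this is also why the zone resource above ignores changes to its vpc block. A sketch, assuming a second, hypothetical VPC ID:
# Hypothetical association of an additional VPC with the private hosted zone
resource "aws_route53_zone_association" "snowflake_private_link_extra_vpc" {
  zone_id = aws_route53_zone.snowflake_private_link.zone_id
  vpc_id  = "vpc-0987654321" # placeholder for the extra VPC
}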
VPC Endpoints for S3 and Snowflake
data "aws_region" "current" {}
data "aws_vpc_endpoint_service" "s3" {
service_type = "Interface"
filter {
name = "service-name"
values = ["com.amazonaws.${data.aws_region.current.name}.s3"]
}
}
resource "aws_vpc_endpoint" "s3" {
vpc_id = "vpc-1234567890"
service_name = data.aws_vpc_endpoint_service.s3.service_name
vpc_endpoint_type = "Interface"
security_group_ids = [aws_security_group.vpc_endpoint.id]
subnet_ids = ["subnet-12345","subnet-67890"] # private subnet ids
private_dns_enabled = false
}
resource "aws_vpc_endpoint" "snowflake_privatelink" {
vpc_id = "vpc-1234567890"
# note service_name references a local variable set above
service_name = local.privatelink_mc_dev["privatelink-vpce-id"]
vpc_endpoint_type = "Interface"
security_group_ids = [aws_security_group.vpc_endpoint.id]
subnet_ids = ["subnet-12345","subnet-67890"] # private subnet ids
private_dns_enabled = false
}
resource "aws_security_group" "vpc_endpoint" {
name_prefix = "vpc-endpoint-sg-"
description = "Security Group used by VPC Endpoints."
vpc_id = "vpc-1234567890"
tags = {
"Name" = "vpc-endpoint-sg"
}
lifecycle {
create_before_destroy = true
}
}
resource "aws_security_group_rule" "vpc_endpoint_egress" {
description = "Allow all egress."
security_group_id = aws_security_group.vpc_endpoint.id
type = "egress"
protocol = "-1"
from_port = 0
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
}
resource "aws_security_group_rule" "vpc_endpoint_self_ingress" {
description = "Self-Ingress for all ports."
security_group_id = aws_security_group.vpc_endpoint.id
type = "ingress"
protocol = "-1"
from_port = 0
to_port = 0
source_security_group_id = aws_security_group.vpc_endpoint.id
}
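Depending on where your Snowflake clients run, you may also need to allow their traffic to reach the endpoints. The sketch below assumes the clients sit behind a separate, hypothetical application security group (Snowflake traffic over PrivateLink uses HTTPS on port 443):
# Hypothetical rule allowing HTTPS from an application security group;
# sg-0123456789abcdef0 is a placeholder for your clients' security group.
resource "aws_security_group_rule" "vpc_endpoint_app_ingress" {
  description              = "Allow HTTPS from Snowflake clients."
  security_group_id        = aws_security_group.vpc_endpoint.id
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 443
  to_port                  = 443
  source_security_group_id = "sg-0123456789abcdef0"
}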
These are the essential VPC endpoints to configure. It is worth noting that the S3 endpoint is required for Amazon S3 traffic from Snowflake clients to stay on the AWS backbone. The Snowflake clients (e.g. SnowSQL, the JDBC driver) require access to Amazon S3 to perform various runtime operations.
If your AWS VPC network does not allow access to the public internet, you can configure private connectivity to internal stages or add gateway endpoints for the Amazon S3 hostnames required by the Snowflake clients, as sketched below.
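A minimal sketch of such a gateway endpoint, assuming hypothetical private route table IDs (it can coexist with the interface endpoint defined above):
# Hypothetical S3 gateway endpoint; the route table IDs are placeholders
resource "aws_vpc_endpoint" "s3_gateway" {
  vpc_id            = "vpc-1234567890"
  service_name      = "com.amazonaws.${data.aws_region.current.name}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = ["rtb-12345", "rtb-67890"] # private route tables
}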
Private Route53 Records
resource "aws_route53_record" "snowflake_private_link_url_dev" {
zone_id = aws_route53_zone.snowflake_private_link.zone_id
name = local.privatelink_mc_dev["privatelink-account-url"]
type = "CNAME"
ttl = "300"
records = [aws_vpc_endpoint.snowflake_privatelink.dns_entry[0]["dns_name"]]
}
resource "aws_route53_record" "snowflake_private_link_ocsp_url_dev" {
zone_id = aws_route53_zone.snowflake_private_link.zone_id
name = local.privatelink_mc_dev["privatelink_ocsp-url"]
type = "CNAME"
ttl = "300"
records = [aws_vpc_endpoint.snowflake_privatelink.dns_entry[0]["dns_name"]]
}
resource "aws_route53_record" "snowflake_private_link_ocsp_url_account_name_dev" {
zone_id = aws_route53_zone.snowflake_private_link.zone_id
name = local.privatelink_mc_dev["privatelink_ocsp-url-account-name"]
type = "CNAME"
ttl = "300"
records = [aws_vpc_endpoint.snowflake_privatelink.dns_entry[0]["dns_name"]]
}
resource "aws_route53_record" "snowflake_private_link_regionless_account_url_dev" {
zone_id = aws_route53_zone.snowflake_private_link.zone_id
name = local.privatelink_mc_dev["regionless-privatelink-account-url"]
type = "CNAME"
ttl = "300"
records = [aws_vpc_endpoint.snowflake_privatelink.dns_entry[0]["dns_name"]]
}
resource "aws_route53_record" "snowflake_private_link_regionless_snowsight_url_dev" {
zone_id = aws_route53_zone.snowflake_private_link.zone_id
name = local.privatelink_mc_dev["regionless-snowsight-privatelink-url"]
type = "CNAME"
ttl = "300"
records = [aws_vpc_endpoint.snowflake_privatelink.dns_entry[0]["dns_name"]]
}
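The locals above also include the region-specific Snowsight URL (snowsight-privatelink-url); if your users rely on that address, a matching record follows exactly the same pattern:
resource "aws_route53_record" "snowflake_private_link_snowsight_url_dev" {
  zone_id = aws_route53_zone.snowflake_private_link.zone_id
  name    = local.privatelink_mc_dev["snowsight-privatelink-url"]
  type    = "CNAME"
  ttl     = "300"
  records = [aws_vpc_endpoint.snowflake_privatelink.dns_entry[0]["dns_name"]]
}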
This completes our setup where we created all the required components. They are:
- Route53 internal hosted zone
- Route53 records
- Route53 association with VPC
- VPC Endpoints for S3 and Snowflake
- VPC Endpoint security group
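As an optional convenience (my own addition, not required for the setup to work), exposing the endpoint DNS name as an output makes it easy to check what the CNAME records resolve to after terraform apply:
output "snowflake_privatelink_endpoint_dns" {
  description = "Regional DNS name of the Snowflake PrivateLink VPC endpoint."
  value       = aws_vpc_endpoint.snowflake_privatelink.dns_entry[0]["dns_name"]
}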
Conclusion
I hope this story speeds up your own process of integrating AWS PrivateLink with Snowflake. It is not the easiest implementation, especially when you consider the Snowflake documentation.
Note that I am deliberately not using the Snowflake Terraform provider, as it is flaky and risky to use. I have had many issues with it, including broken resources, resources removed from the state file, broken upgrades, etc. I would stay away from it until version 1.0.0 is released.
I highly recommend using AWS PrivateLink for Snowflake, since it allows all your traffic to flow privately. In my opinion, any production data or system should be utilising it.
Thanks for reading.
Sponsor Me
As with any other story on Medium written by me, I performed the tasks documented here. This is my own research and the issues I have encountered.
Thanks for reading everybody. Marcin Cuber