VPC Endpoint for Amazon S3

A VPC Endpoint enables you to reach selected AWS services privately from within your VPC. It has several advantages: it allows finer-grained access control to your resources and avoids routing traffic through the Internet (via a NAT Gateway or Internet Gateway), which you pay for.

In a typical web application, Amazon S3 is used to store static assets, such as images and CSS, to improve your site’s performance and modularity. It also lets you keep those assets on a highly durable and available object store (99.999999999% durability).

Types of VPC Endpoints

AWS provides two types of VPC Endpoints:

  • Interface Endpoints – create an ENI in your VPC with a private IP address. The service integrates with your VPC’s internal DNS resolution, which allows you to reach it from your subnets (see the CLI sketch below);
  • Gateway Endpoints – add a gateway target that you reference in your Route Tables. Gateway Endpoints are available for Amazon S3 and DynamoDB, and you may add them to each VPC that needs private access to those services.
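
For comparison, an Interface endpoint can be created with a single AWS CLI call. This is only a minimal sketch: the SQS service name and all resource IDs below are placeholders, not values from this setup.

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.eu-west-1.sqs \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled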

Test the new Setup

Before going live, you should add a new subnet to your current VPC, so that the scenario is tested before it reaches Production. (We are using Terraform 0.11 in these samples.)

We will start by creating a new subnet, prv-subnet-1, in an existing VPC (vpc_id):

resource "aws_subnet" "prv-subnet-1" {
  vpc_id                  = "${var.vpc_id}"
  cidr_block              = "172.31.60.0/24"
  availability_zone       = "eu-west-1a"
  map_public_ip_on_launch = false
  
  tags = {
    Name        = "prv-subnet-1"
    Terraform   = "true"
  }
}

Now, let’s add a new Route Table, using an existing NAT GW (natgw_id):

# create the route table
resource "aws_route_table" "test" {
  vpc_id = "${var.vpc_id}"

  # add default gw
  route {
    cidr_block      = "0.0.0.0/0"
    nat_gateway_id  = "${var.natgw_id}"
  }

  tags = {
    Name = "prv-eu-west-1a-rtb"
  }
}

# associate with subnet
resource "aws_route_table_association" "assoc" {
  subnet_id      = "${aws_subnet.prv-subnet-1.id}"
  route_table_id = "${aws_route_table.test.id}"
}

You can now add a Gateway VPC Endpoint (vpce) to the new Route Table, which is associated with prv-subnet-1.

resource "aws_vpc_endpoint" "vpce-s3" {
  vpc_id              = "${var.vpc_id}"
  vpc_endpoint_type   = "Gateway"
  service_name        = "com.amazonaws.eu-west-3.s3"
  route_table_ids     = ["${aws_route_table.test.id}"]
}

After a few minutes you will see a new route entry in the route table, with the S3 prefix list (pl-123456ab) as the destination and the Gateway VPC Endpoint as its target.
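
One way to confirm this is with the AWS CLI; the route table ID below is a placeholder for the one Terraform created:

# show the routes of the new route table
aws ec2 describe-route-tables \
  --route-table-ids rtb-0123456789abcdef0 \
  --query 'RouteTables[].Routes'

# inspect the S3 prefix list to see which CIDR blocks it covers
aws ec2 describe-prefix-lists \
  --filters Name=prefix-list-name,Values=com.amazonaws.eu-west-1.s3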

At this time, you should check that your access to S3 is still valid.
The first issue you might encounter is that bucket policies based on aws:SourceIp conditions with public IP addresses will no longer work. This is because you are now accessing S3 objects directly from your VPC, rather than through a public IP address.
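
A quick way to run that check is with the AWS CLI from an instance launched in prv-subnet-1; the bucket and object names below are placeholders:

aws s3 ls s3://examplebucket/ --region eu-west-1
aws s3 cp s3://examplebucket/some-object /tmp/some-object --region eu-west-1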

Let’s take a look at a sample bucket policy based on https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html, which uses an aws:SourceIp condition:

{
  "Version": "2012-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": "54.240.143.188/32"}
      }
    }
  ]
}

The obvious change here would be to set aws:SourceIp to the CIDR block of your VPC. However, that will not work! The AWS documentation states:

You cannot use an IAM policy or bucket policy to allow access from a VPC IPv4 CIDR range (the private IPv4 address range). VPC CIDR blocks can be overlapping or identical, which may lead to unexpected results. Therefore, you cannot use the aws:SourceIp condition in your IAM policies for requests to Amazon S3 through a VPC endpoint.

So, you’re left with two options to allow/restrict access:

  • Restrict your bucket policy to a specific VPC or Gateway Endpoint, using the aws:sourceVpc or aws:sourceVpce condition keys;
  • On the VPC side, associate the Gateway VPC Endpoint only with the route tables of the subnets that need access to S3.

Heads up: Even after raising the limits, you cannot have more than 255 gateway endpoints per VPC. (https://docs.aws.amazon.com/vpc/latest/userguide/amazon-vpc-limits.html#vpc-limits-endpoints).


Here are the changes you should make to the previous example:

{
  "Version": "2012-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "StringEquals": {
          "aws:sourceVpc": [
            "vpc-aabbccddeeff"
          ]
        }
      }
    }
  ]
}
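
To roll the updated policy out, one option is the AWS CLI; a minimal sketch, assuming the JSON above is saved as policy.json:

# apply the policy and then read it back to confirm
aws s3api put-bucket-policy --bucket examplebucket --policy file://policy.json
aws s3api get-bucket-policy --bucket examplebucket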


Going into Production

Before going into Production, review all your bucket policies to make sure they use the VPC-based conditions explained earlier. To keep your configuration clean and reusable, we’ve created a Terraform module:

# main.tf

resource "aws_vpc_endpoint" "s3" {
  vpc_id              = "${data.aws_vpc.selected.id}"
  vpc_endpoint_type   = "Gateway"
  service_name        = "com.amazonaws.${var.aws_region}.s3"
  route_table_ids     = ["${distinct(data.aws_route_table.selected.*.route_table_id)}"]
}
# data.tf

data "aws_subnet" "selected" {
  count      = "${length(var.subnets)}"
  cidr_block = "${var.subnets[count.index]}"
}

data "aws_route_table" "selected" {
  count     = "${length(var.subnets)}"
  subnet_id = "${data.aws_subnet.selected.*.id[count.index]}"
}

data "aws_vpc" "selected" {
  tags = "${map("Name",var.vpc_name)}"
}
# variables.tf

variable "aws_region" {
  description = "Region to attach S3 VPC Endpoint"
}

variable "tags" {
  description = "Tags to the resources"
  type = "map"
  default = {}
}

variable "vpc_name" {
  description = "VPC Name where to attach the S3 VPC Endpoint"
}

variable "subnets" {
  description = "List of Subnets to add the VPC Endpoint"
  type = "list"
}

Here is a sample usage of the module:

# s3-vpc-endpoint.tf

module "vpc-attach" {
  source        = "modules/terraform-aws-s3-vpc-gateway"
  aws_region    = "${var.aws_region}"
  vpc_name      = "${var.vpc_name}"
  subnets       = ["172.31.50.0/24", "172.31.51.0/24", "172.31.52.0/24"]
  tags = {
    Terraform   = "true"
    Environment = "production"
  }
}
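
A minimal sketch of the usual Terraform workflow to roll this out; nothing here is specific to the module beyond the plan file name, which is arbitrary:

# run from the directory that contains s3-vpc-endpoint.tf
terraform init                      # fetches the aws provider and the module
terraform plan -out=s3-vpce.plan    # review the endpoint and route changes
terraform apply s3-vpce.plan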

Select a maintenance window to apply this module, as the new endpoint will switch the network routes and, consequently, open TCP connections to S3 will be closed.

From now on, connections from your VPC to S3 buckets in the same Region will no longer go through the Internet, which reduces NAT Gateway and data transfer costs. And that is, apparently, important.

CI/CD in OpenShift with Gitlab and Terraform

We’re always searching for new ways of implementing CI/CD at Eurotux, and in this post I’ll describe one of them, leveraging three components that we use with our customers:

  • Gitlab
  • Terraform
  • OpenShift

Application

The application we wanted to deploy is Fess Enterprise Search Server (“Fess is an Elasticsearch-based search server”), and we use it to crawl our internal wiki server and give our teams a Google-like search engine for that wiki. Fess supports all sorts of targets (file servers, web sites, databases) and several authentication methods, such as BASIC/DIGEST/NTLM/FORM (keep this in mind for the next few minutes).

We use OpenShift, a container-based orchestration platform, so the first thing to do is to create a container image for the application. Fortunately, Fess already provides a base container image, which I’ll use as the base for the project and only improve on. The first step is to create a Dockerfile:

FROM docker.io/codelibs/fess:latest
# disable caching of crawled document content in the index
RUN perl -i -p -e "s/crawler.document.cache.enabled=true/crawler.document.cache.enabled=false/" /etc/fess/fess_config.properties
# replace the default logos and the OpenSearch descriptor with our own
ADD logo-head.png /usr/share/fess/app/images/logo.png
ADD osdd.xml /usr/share/fess/app/WEB-INF/orig/open-search/osdd.xml
ADD logo-head.png /usr/share/fess/app/images/logo-head.png
RUN apt-get update && apt-get -y install libjson-perl
# custom entrypoint: fetches the wiki credentials and triggers the crawl
COPY entrypoint.sh /usr/share/fess/run.sh

ADD insert.sh /insert.sh

# We are using oc 3.9 because the later ones require libcrypto.so.10 (see https://github.com/openshift/origin/issues/21061)
RUN wget https://mirror.openshift.com/pub/openshift-v3/clients/3.9.63/linux/oc.tar.gz && tar zxf oc.tar.gz -C /usr/bin && rm oc.tar.gz

ADD fess_config.properties /etc/fess/fess_config.properties

I did some customization (like changing the logo to our company’s), changed the entrypoint and added the oc (OpenShift client) command. As one can easily understand, our internal wiki is password protected. It uses form-based username/password authentication (you can see why it is great that Fess supports form-based authentication), so I only need to provide the Fess server with the username and password to access the wiki.

The entrypoint is changed so that, when the container starts, it gets the username and password from OpenShift secrets (that’s why the container installs the oc command), updates the Fess server configuration and starts indexing the wiki. As this is a stateless service, I don’t need to worry about saving state or using Persistent Volumes. If the container dies or gets redeployed, the search engine re-indexes the wiki. This keeps the project simpler and cleaner. Here is a snippet of the insert.sh script:

if [ -z "$WIKIUSER" ]; then
    export WIKIUSER="`oc get secret wikiuser --template='{{.data.username}}' | base64 -d`"
fi
if [ -z "$WIKIPASS" ]; then
    export WIKIPASS="`oc get secret wikiuser --template='{{.data.password}}' | base64 -d`"
fi

curl -XPOST "http://localhost:9200/.fess_config.web_authentication/web_authentication" -H 'Content-Type: application/json' -d "
{
           \"webConfigId\" : \"$CONFIGID\",
           \"updatedTime\" : 1509224726193,
           \"hostname\" : \"wiki.eurotux.com\",
           \"password\" : \"$WIKIPASS\",
           \"updatedBy\" : \"admin\",
           \"createdBy\" : \"admin\",
           \"createdTime\" : 1509224726193,
           \"protocolScheme\" : \"FORM\",
           \"username\" : \"$WIKIUSER\",
           \"parameters\" : \"encoding=UTF-8\\nlogin_method=POST\\nlogin_url=https://wiki.eurotux.com/Special:UserLogin\\nlogin_parameters=username=\${username}&password=\${password}&auth_id=1&deki_buttons%5Baction%5D%5Blogin%5D=login\"
}"

Terraform

We use Terraform to bootstrap the infrastructure required for the deployment of this application, which is responsible for the following:

  • OpenShift Project (Namespace)
  • Secrets (wiki username and password)
  • Granting permissions to the container default service account to access the secret (so that the container can fetch that info)
  • Granting the gitlab runner service account to edit this namespace objects (so that the deployment pipeline can deploy to this namespace)
  • Adding the anyuid SCC to the deployer service account. The Fess container runs several services (actually an anti-pattern in the container world) and requires running as root inside the container (later on it drops to another UID)

Unfortunately, the Terraform kubernetes provider is somewhat lacking in features compared to others (like the aws or azure providers). Because of that, I use a mix of native resources, like kubernetes_namespace, and null_resource as a wrapper around the oc command:

# Create namespace
resource "kubernetes_namespace" "search" {
  metadata {
    annotations {
      name = "search-engine"
    }

    labels {
      owner = "npf"
    }

    name = "${var.namespace}"
  }

  lifecycle {
    # because we are using openshift, we have to ignore the annotations as openshift does add some annotations
    ignore_changes = ["metadata.0.annotations"]
  }
}
# This container requires root, so we need to allow anyuserid
resource "null_resource" "add-scc-anyuid" {
  provisioner "local-exec" {
    command = "oc -n ${kubernetes_namespace.search.id} adm policy add-scc-to-user anyuid -z deployer"
  }

  provisioner "local-exec" {
    command = "oc -n ${kubernetes_namespace.search.id} adm policy remove-scc-from-user anyuid -z deployer"
    when    = "destroy"
  }
}
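
The code above only covers the namespace and the SCC. Here is a minimal sketch of how the remaining items from the list (the wikiuser secret and the role grants) could be done with oc; the namespace name, the runner’s namespace/service-account names and the credentials are assumptions:

# the wiki credentials as a secret (values are placeholders)
oc -n search-engine create secret generic wikiuser \
  --from-literal=username=someuser \
  --from-literal=password=somepass

# let the GitLab runner service account manage objects in this namespace
oc -n search-engine policy add-role-to-user edit \
  system:serviceaccount:gitlab:gitlab-runner

# let the pods' default service account read the secret; edit is broader than
# strictly needed, a dedicated Role scoped to secrets would be tighter
oc -n search-engine policy add-role-to-user edit -z default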

As you can see, I use local-exec to invoke the oc command whenever the kubernetes Terraform provider does not support a feature.
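
After a terraform apply, a quick check that the namespace exists and the SCC change took effect (the namespace name follows the annotation above and is an assumption):

oc get namespace search-engine
oc get scc anyuid -o yaml | grep -A 5 users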

Gitlab

At Eurotux we use an internal GitLab server to house all our projects, so we make extensive use of its CI/CD capabilities. To implement the CI/CD, I’ve created a .gitlab-ci.yml file to describe the pipeline:

image: $CI_REGISTRY/docker/base-builder

stages:
  - review
  - staging
  - production
  - cleanup

variables:
  OPENSHIFT_SERVER: https://oshift.install.etux:8443
  OPENSHIFT_DOMAIN: oshift.install.etux

.deploy: &deploy
  tags:
    - kubernetes
  before_script:
    - ci-bootstrap
  script:
    - "oc -n $CI_PROJECT_NAME get services $APP 2> /dev/null || oc -n $CI_PROJECT_NAME new-app fess --name=$APP --strategy=docker"
    - "oc -n $CI_PROJECT_NAME start-build $APP --from-dir=fess --follow || sleep 3s && oc -n $CI_PROJECT_NAME start-build $APP --from-dir=fess --follow"
    - "oc -n $CI_PROJECT_NAME get routes $APP 2> /dev/null || oc -n $CI_PROJECT_NAME create route edge --hostname=$APP_HOST --insecure-policy=Redirect --service=$APP"
......
......
staging:
  <<: *deploy
  stage: staging
  tags:
    - kubernetes
  variables:
    APP: staging
    APP_HOST: $CI_PROJECT_NAME-staging.$OPENSHIFT_DOMAIN
  environment:
    name: staging
    url: http://$CI_PROJECT_NAME-staging.$OPENSHIFT_DOMAIN
  only:
    - master

production:
  <<: *deploy
  stage: production
  tags:
    - kubernetes
  variables:
    APP: production
    APP_HOST: $CI_PROJECT_NAME.$OPENSHIFT_DOMAIN
  when: manual
  environment:
    name: production
    url: http://$CI_PROJECT_NAME.$OPENSHIFT_DOMAIN
  only:
    - master

The pipeline creates a review application when working on a git branch other than master, so that I can review and fix things. When a merge (or a commit, for that matter) lands on master, it deploys automatically to staging, and then I can press play to deploy to production.

After that, I can browse to https://search.oshift.install.etux/ and I’m presented with the search engine webpage.
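
A minimal sketch of a command-line reachability check (-k in case the edge route’s certificate is not in the local trust store):

curl -sk -o /dev/null -w '%{http_code}\n' https://search.oshift.install.etux/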

OpenShift

As you’ve figured out by now, all of this is running in our testing OpenShift cluster. We are using OpenShift 3.11, which ships with monitoring based on Prometheus and Grafana (later on, I will detail some other interesting features, such as the integration with Keycloak). OpenShift automatically provides some Grafana dashboards so that you can see the usage patterns.
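
On OpenShift 3.11 the monitoring stack lives in the openshift-monitoring namespace; a minimal sketch of finding the Grafana and Prometheus routes to open in a browser:

oc -n openshift-monitoring get routes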

One of the interesting things that these dashboards present is the lifecycle of the application (starting new containers and stopping the older ones).