Consul Connect with Envoy and Docker
For the past few weeks I have been exploring the HashiCorp suite of tools while working out a strategy for moving to a cloud-native architecture at my workplace. The tool that piqued my interest was Consul, especially its recently added service mesh feature, Consul Connect.
In this post, I want to share what I learned while setting up a Consul Connect service mesh with Envoy as the data plane and all services running as Docker containers.
I will follow the official guide on the HashiCorp website, https://learn.hashicorp.com/consul/developer-mesh/connect-envoy (go read it if you haven’t already), but with one key difference: the service that the service mesh protects with mTLS must be listening on the loopback interface. Why is this important? If the actual service listens on the primary NIC (and is therefore reachable from other machines on the same network), the mTLS identity protection offered by the service mesh is defeated. The service should be accessible to no one except its sidecar proxy.
As it turns out, achieving this with Docker is not as straightforward.
With Docker containers, the loopback interface (127.0.0.1) is unique to each container, so a request to a port on the loopback interface from within a container never leaves that container. So what’s the alternative? Remember, we cannot use the primary NIC interface, as that carries all the usual traffic from other machines and load balancers/proxies.
One great option I found was the Dummy Interface solution, which introduces a new interface with the address 169.254.1.1. Since this is a reserved link-local address, it is not accessible from outside the machine, but it is the same for every Docker container running on the machine. This is exactly what we need to make the service private to the machine while keeping it accessible to all Docker containers running on it.
The linked blog explains how to add a dummy interface on any machine running systemd (a rough sketch of the idea follows below). Once the interface is added, we are ready to proceed with the Consul setup.
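Purely as an illustration (the linked post has the authoritative, persistent systemd-based steps), the interface can be created transiently on a typical Linux host with the ip utility; the interface name dummy0 is just an example:

# transient setup, lost on reboot; run as root
ip link add dummy0 type dummy
ip addr add 169.254.1.1/32 dev dummy0
ip link set dummy0 up

After this, ip addr should show 169.254.1.1 on the new interface.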
Deploying Consul Server
The HashiCorp guide starts the Consul server in -dev mode, which automatically enables the Connect feature. Here we will start the Consul server in production mode (without -dev).
But first, we need a service configuration file envoy_demo.hcl as below:
connect {
  enabled = true
}

services {
  name = "client"
  port = 8080
  connect {
    sidecar_service {
      proxy {
        local_service_address = "169.254.1.1"
        upstreams {
          destination_name = "echo"
          local_bind_address = "169.254.1.1"
          local_bind_port = 9191
        }
      }
    }
  }
}

services {
  name = "echo"
  port = 9090
  connect {
    sidecar_service {
      proxy {
        local_service_address = "169.254.1.1"
      }
    }
  }
}
Note the connect block at the top: since we start Consul in production mode, Connect must be explicitly enabled. The local_service_address tells Consul where the sidecar proxy should find its service. By default it is 127.0.0.1, but with Docker containers that won’t work, so we set it to our dummy IP address 169.254.1.1. Similarly, local_bind_address binds the upstream proxy listener to the given IP instead of the default 127.0.0.1.
Let’s start the Consul server container with the above configuration.
docker run -d --name=consul-server --net=host \
-v $(pwd)/envoy_demo.hcl:/etc/consul/envoy_demo.hcl \
consul:latest agent -server \
-config-file /etc/consul/envoy_demo.hcl \
-grpc-port 8502 \
-client 169.254.1.1 \
-bind HOST_IP_ADDRESS \
-bootstrap-expect 1 -ui
Let’s break down the above command:
--net=host runs the container in the host’s network namespace, so Consul can bind directly to the host’s interfaces (including our dummy interface). This is the way HashiCorp recommends running the Consul container.
-server runs Consul in server mode.
-grpc-port tells Consul to start its gRPC server. This is needed because we want to use Envoy as the data plane, and Envoy is configured through Consul’s gRPC API.
-client makes Consul bind the HTTP, DNS and gRPC servers to this IP. If not provided, it defaults to 127.0.0.1, which would not allow other containers to reach the Consul endpoints. The REST endpoint will be available at http://169.254.1.1:8500 and the gRPC endpoint at 169.254.1.1:8502 (we will confirm this with a quick check below).
-bind makes Consul use this IP for all internal cluster communication.
-bootstrap-expect tells Consul how many server nodes to expect in the cluster, in this case 1.
-ui because we want to see the shiny UI.
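Once the server container is up, a quick sanity check (assuming curl is installed on the host) is to query the HTTP API on the dummy IP:

$ curl http://169.254.1.1:8500/v1/status/leader

If everything is wired up correctly, this returns the address of the elected leader (your host IP with the server RPC port) rather than an empty string.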
Running the Echo service
Next, let’s run the echo service.
docker run --rm -d --dns 169.254.1.1 --name echo-service \
--net=bridge \
-p 169.254.1.1:9090:9090 \
abrarov/tcp-echo \
--port 9090
Note that the container has been started on the Docker bridge network. This starts the echo service on port 9090 inside the container and publishes it on the host’s dummy IP at port 9090. This ensures that the echo service will not be accessible from any other machine on the network.
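To double-check the exposure, docker port should show the published port bound only to the dummy address, not 0.0.0.0:

$ docker port echo-service
9090/tcp -> 169.254.1.1:9090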
Running the Sidecar proxies
As described in the HashiCorp guide, we build a consul-envoy Docker image before running the sidecar proxies. Create the Dockerfile as below:
FROM consul:latest
FROM envoyproxy/envoy:v1.8.0
COPY --from=0 /bin/consul /bin/consul
ENTRYPOINT ["dumb-init", "consul", "connect", "envoy"]
Build the image with the following command:
docker build -t consul-envoy .
Let’s start the sidecar proxies:
docker run --rm -d --dns 169.254.1.1 --name echo-proxy \
--network host \
consul-envoy \
-sidecar-for echo \
-http-addr http://169.254.1.1:8500 \
-grpc-addr 169.254.1.1:8502 \
-admin-bind 127.0.0.1:0 \
-- -l trace
This starts the sidecar proxy for the echo service with the correct Consul HTTP and gRPC endpoints. The trailing -- -l trace makes Envoy log at trace level, which is useful for debugging issues.
docker run --rm --dns 169.254.1.1 --name client-proxy \
--network host \
consul-envoy \
-sidecar-for client \
-http-addr http://169.254.1.1:8500 \
-grpc-addr 169.254.1.1:8502 \
-admin-bind 127.0.0.1:0 \
-- -l trace
This starts the sidecar proxy for the client service. The Envoy admin API is bound to a random localhost port to avoid conflicting with the other sidecar proxy.
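At this point both services and their sidecar proxies should be registered with Consul. A quick way to confirm (again assuming curl on the host) is to list the catalog:

$ curl -s http://169.254.1.1:8500/v1/catalog/services

The response should include client, client-sidecar-proxy, echo and echo-sidecar-proxy alongside the built-in consul service; the same information is visible in the UI at http://169.254.1.1:8500/ui.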
Running the Client service
Let’s test the setup by connecting to the proxy upstream port 9191.
$ docker run -ti --rm --network host gophernet/netcat 169.254.1.1 9191
Hello World!
Hello World!
^C
This confirms that our setup works as described in the HashiCorp guide, while also ensuring that the real service endpoints are not reachable from outside the machine.
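As a final check, it is worth confirming from a different machine on the network that the echo service cannot be reached directly (assuming netcat is available there). Since 169.254.1.1 is link-local and nothing on the host’s primary interface listens on port 9090, a direct connection attempt should fail:

$ nc -vz -w 2 HOST_IP_ADDRESS 9090

Only traffic that goes through the sidecar proxies, and is therefore authenticated with mTLS, reaches the echo service.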