Jaeger Backend: Collector & Query

This article describes how EverQuote uses the Jaeger Operator to provision the Jaeger backend.
Disclaimer: Just as in the previous article, this particular setup should not be construed as an endorsement or recommendation. Use your own judgment before adopting anything covered by this article.
Assumptions
I will assume that we are using the ElasticSearch setup described in the previous article.
Furthermore, I will assume that the Jaeger Operator has already been deployed since this is something that must be tailored to your particular environment.
We use a combination of Helm, Kustomize, Flux, and GitHub Actions to deploy all of our Kubernetes cluster services. You can read more about that in a future article.
RBAC
The Jaeger Operator Helm chart creates RBAC manifests for all possible deployment strategies of Jaeger. This creates a bunch of unnecessary “noise”, which is why we opt to create the RBAC manifests ourselves.
Helm chart config:
rbac:
  create: false
Since we are installing Jaeger into its own namespace, we simply create a RoleBinding that makes the Jaeger Operator an admin for that namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jaeger-operator-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
  - kind: ServiceAccount
    name: jaeger-operator
⚠ Note: The ClusterRole “admin” is a built-in role that comes with every Kubernetes cluster. All of its permissions are scoped to the namespace of the RoleBinding that binds to it.
The Jaeger Operator also installs custom resource definitions (currently just one) under the API group jaegertracing.io.
In order for the operator to be allowed to manage the custom resources it is responsible for, we also have to create a Role and a RoleBinding (since we don’t want to extend the permissions of the built-in “admin” ClusterRole).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jaeger-operator
rules:
  - apiGroups:
      - jaegertracing.io
    resources:
      - '*'
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jaeger-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jaeger-operator
subjects:
  - kind: ServiceAccount
    name: jaeger-operator
⚠ Note: The above RoleBindings obviously assume that the Jaeger Operator is running as the ServiceAccount “jaeger-operator”.
Docker Hub Credentials
Ever since Docker Hub started rate limiting image pulls for the free tier, we have made sure to outfit all of our Kubernetes workloads (or their associated ServiceAccounts) with Docker Hub credentials for our paid subscription. Otherwise a worker node upgrade or an availability zone failure could propel large portions of our Kubernetes workloads into an ImagePullBackOff state.
We are therefore going to hand-create the ServiceAccounts used by Jaeger’s backend components and supply them with the required image pull secrets.
Helm chart config:
serviceAccount:
  create: false
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-collector
imagePullSecrets:
  - name: dockerhub-credentials
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-query
imagePullSecrets:
  - name: dockerhub-credentials
⚠ Note: We use Kubernetes External Secrets to project these Docker Hub credentials into all of our namespaces. You can read more about that in a future article.
Deployment
Before we spin up the Jaeger backend we have to create a Secret that contains the username and password for ElasticSearch.
$ kubectl create secret generic jaeger-es-credentials \
    --from-literal=ES_USERNAME=jaeger \
    --from-literal=ES_PASSWORD=0penSes@me
$ history -c # don’t leave sensitive information in your shell history
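If you prefer to keep this declarative (we project secrets with Kubernetes External Secrets, but a plain manifest works too), an equivalent Secret might look like the following sketch. The credentials shown are placeholders:

```yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jaeger-es-credentials
stringData:
  ES_USERNAME: jaeger       # placeholder: your real ElasticSearch user
  ES_PASSWORD: 0penSes@me   # placeholder: never commit real passwords to Git
```

If you do go the manifest route, keep the real values out of version control (e.g. via External Secrets or SOPS).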
Now we can use the Jaeger resource to spin up the Jaeger backend.
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-backend
spec:
  strategy: production
  collector:
    maxReplicas: 6
    resources:
      limits:
        cpu: 2000m
        memory: 16Gi
      requests:
        cpu: 1000m
        memory: 8Gi
    options:
      log-level: info
      es:
        use-aliases: true
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"
    serviceAccount: jaeger-collector
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.01 # 1%
  query:
    replicas: 3
    resources:
      limits:
        cpu: 1000m
        memory: 8Gi
      requests:
        cpu: 500m
        memory: 4Gi
    options:
      log-level: info
      es:
        use-aliases: true
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"
    serviceAccount: jaeger-query
  storage:
    type: elasticsearch
    options:
      es:
        create-index-templates: false
        server-urls: https://jaeger-es
    esIndexCleaner:
      enabled: false
    dependencies:
      enabled: false
    secretName: jaeger-es-credentials
  ingress:
    enabled: false
Unfortunately, the Helm chart installs a custom resource definition that is completely devoid of structure, which makes it hard to figure out how to create a valid Jaeger resource. But you can find the full CRD spec in the jaeger-operator repo on GitHub.
I won’t explain all the config options for the Jaeger resource, but there are a few things I’d like to point out.
1) The ElasticSearch config enables the use of index aliases and disables index initialization and lifecycle management. The reason for this has to do with our ElasticSearch setup. You can read about that setup here.
spec:
  collector:
    options:
      es:
        use-aliases: true
  query:
    options:
      es:
        use-aliases: true
  storage:
    options:
      es:
        create-index-templates: false
    esIndexCleaner:
      enabled: false
2) Even though we have Linkerd proxy sidecar injection enabled for the Jaeger namespace, we have to explicitly enable the injection here. That’s because the Jaeger Operator adds annotations to the Jaeger backend Deployments to prevent the injection of Istio and Linkerd proxy sidecars. The reason for this behavior is unclear and not documented.
spec:
  collector:
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"
  query:
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"
3) We want our Nginx ingress controllers to restrict access to the Jaeger backend and to use different authentication methods for the Query and Collector components. So we don’t want the Jaeger Operator to create an Ingress resource for us.
spec:
  ingress:
    enabled: false
4) We recommend that our users use the “remote” sampling strategy. This strategy defers the selection of the actual sampling strategy to the Jaeger backend, which allows us to steer the sampling strategy for all workloads from a central location in case we have to respond to load spikes.
spec:
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.01 # 1%
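For reference, the Collector can also serve finer-grained, per-service strategies to remote-sampling clients from a JSON strategies file (supplied via the Collector’s `--sampling.strategies-file` option). A rough sketch, with hypothetical service names:

```json
{
  "default_strategy": { "type": "probabilistic", "param": 0.01 },
  "service_strategies": [
    { "service": "checkout", "type": "probabilistic", "param": 0.1 },
    { "service": "healthcheck", "type": "probabilistic", "param": 0.001 }
  ]
}
```

Workloads that use the remote strategy will pick up changes to this central config without redeploying.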
Now the Jaeger backend is ready to rock. It’s time to expose it to the outside world so we can start using it.
Ingress
Let’s start by exposing the Jaeger Collector, which is the backend component that receives traces and, in our case, persists them to ElasticSearch.
As mentioned in the previous section, we want to restrict access to our Jaeger backend components.
For the Collector (which is a write-only endpoint) we mainly want to prevent operator error, e.g. traces from our prod environment ending up in the staging instance of Jaeger.
Since Jaeger traces are sent to the Collector via gRPC, there’s really only one officially supported way to do client authentication: TLS.
In the next article, I will show how to provision a private CA that will allow us to easily create TLS client certs for our Jaeger Agent containers. For now I will simply assume that such a CA already exists.
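To give a feel for what that CA will be doing, here is a rough sketch of how a TLS client cert for a Jaeger Agent could be issued with plain openssl. The throwaway CA, file names, and subjects below are all placeholders, not the real setup from the next article:

```shell
# Create a throwaway CA (stand-in for the private CA provisioned in the next article)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=Example Private CA" -keyout ca.key -out ca.crt

# Issue a client cert for a (hypothetical) Jaeger Agent
openssl genrsa -out client.key 2048
openssl req -new -key client.key -subj "/CN=jaeger-agent" -out client.csr
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 90 -out client.crt

# Sanity check: the client cert verifies against the CA
openssl verify -CAfile ca.crt client.crt   # prints: client.crt: OK
```

The resulting ca.crt is what goes into the trust Secret below; client.crt/client.key would be mounted into the Agent containers.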
Create a Secret that contains the certificate(s) of the CA(s) you want to trust.
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jaeger-auth-collector
stringData:
  ca.crt: |-
    -----BEGIN CERTIFICATE-----
    MIIF5jCCA86gAwIBAgIJALndCNnLu42PMA0GCSqGSIb3DQEBCwUAMDAxLjAsBgNV
    BAMMJU15IFByaXZhdGUgQ0EgQCBteS1rdWJlcm5ldGVzLWNsdXN0ZXIwHhcNMjEw
    NzE1MjM0OTA5WhcNMzEwNzEzMjM0OTA5WjAwMS4wLAYDVQQDDCVNeSBQcml2YXRl
    IENBIEAgbXkta3ViZXJuZXRlcy1jbHVzdGVyMIICIjANBgkqhkiG9w0BAQEFAAOC
    Ag8AMIICCgKCAgEApckDrPxGSbtRNrFN6k5vlhqcM42n3RdU0XXD1PkexUvdHpi3
    c8bIzjQhBLZQjmBbg3i8QhHbrl+A+lEWtUMKpNeQvCrYyLWY9786X2iM0tb6LkmS
    rrBQ2u1+olygKL9U368CsJBuA0rtK+xGaZyEhhh+F4No6DBkHAgrM8k5OpLG/AyF
    xdKtlyzF+Um6qUNMr1mNI5VD+LyjDAbzqRjZywnYTl30AWQLTx3UKwVD3CpTHW1q
    rRJYtZO+OscpkbFeMSA2TUIH+ek4oV+YXyh5NEhMkmS2rYwQl2fKyQ22qq+kcyxG
    kV+2IUMVergYCh8gBrfVzJ0SNdWmfe+Ojg5uMwbrRLPOoS7TA69Q7W+SMRmew8Za
    l1/fF5m/ADn1ixiMNsXz3pbn173c+NLIgKSDQ7KTz0797CqOXZRK4wI/lLUfQY+1
    QZf3owzTQEDRdd1i1X8XqyDbznHX5ohV0YqZQFQEX/CLXk3b2M0LIZCS0tv1qsQv
    r5Rc2Me1KdNxFByEnfVmbkwMvl+RWjmbko8cyW5AuATL3wCVZj8pnOwUaB4qzjyz
    0v2HGPh2eMfni6EWSqOqQJhFw1F/wKa0Jq9csuVEGxlKHA34yE+PXaaJXIWEsgc1
    2n+7//4ALUJky9gblJbR2AOCFKWdn92LtWgfltkZr/M17mxuslCvkJLt+y8CAwEA
    AaOCAQEwgf4wEgYDVR0TAQH/BAgwBgEB/wIBATAOBgNVHQ8BAf8EBAMCAQYwHQYD
    VR0OBBYEFFylrKtVHqCOoxzaQnxm7sUJ6l9+MGsGCWCGSAGG+EIBDQReFlxZb3Vy
    IGN1cmlvc2l0eSBzZXJ2ZXMgeW91IHdlbGwhIFdhbnQgdG8gd29yayBhdCBFdmVy
    UXVvdGU/IFNlbmQgbWFpbCB0byBoNHgwcnpAZXZlcnF1b3RlLmNvbTBMBg4rBgEE
    AYPBQQEBAYo5AQQ6FjggPj4gVTJsbmJpQjViM1Z5SUdWdFlXbHNJSGRwZEdnZ2RH
    aHBjeUJsYlc5cWFUb2c4Situb0FvPTANBgkqhkiG9w0BAQsFAAOCAgEAg5lL1bja
    /yGZQR6RocaS84u39AMzrxBQnQXp4Nx7jJCMhmMIDFTwJzP5mUtLTLA9fvv8FLmO
    rTJHHpKPRSce4cpDjsQkFcCanOSlYEfhJxBdi8YEQ+1CIvE9S56Ifa4t1UbaCi2e
    rQUW7EG/OoGQqB4hoIUcbYBq+mwvoYieJ1c1AmE9TlU/tXAXCUkjO2QTRHdRddnZ
    ixfg4DttWEkx7Cf6kU49T5BWjXtJVTK2Rn/gHNtCvavR/+JlG/kq/yDbubwoJ/xD
    RdedvP3LdvI6WGuzRJLdkaXtIX3hQ+8ah9uykJqSRfUrOPfdmt1Zohh+R4O6POYF
    2bLtdjpKTcD7Ni1gxPtHRJeVFI4RIFSlM0QKfHo78q5URzdnRWTFs9KnDE8r0UmJ
    Y1ZVtsAAgSbDZmZGb7L5JQcBJVqoq0NwKdfseFBfWrghjxd2cq+2qaU3IHlIIx5c
    jYiLhxtjKxqievWy7E8bgLcX4mx4nHgT3U4dhKUjFWxyrSuaANspm1lwEVVjmP/T
    bjqKOM++uzeSQtIFHTj3Z2HjMFbQ2W4gWcIeMNONNJPRGff3shW4I9XA/wMbUYwo
    8wmLecfpdOBi/ZRDL9MKhNnhlPh0Kd4kmZaHBwI6HNTkGiE4twJxEOSXV9nMTqs2
    1sdqe5UR6o/lo6G3mK0Wdxl96IVFeUnFaWU=
    -----END CERTIFICATE-----
⚠ Note: At EverQuote we actually create a separate CA for each Kubernetes cluster. Simply concatenate all the CA certs together and add them to the ca.crt key in the above Secret. For non-Kubernetes workloads you can use an Amazon-hosted ACM Private CA (but that’s outside the scope of this article).
Now we can create an Ingress that uses the above Secret for TLS client authentication.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: jaeger-collector
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-secret: jaeger/jaeger-auth-collector
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
spec:
  rules:
    - host: jaeger-collector
      http:
        paths:
          - path: /
            backend:
              serviceName: jaeger-backend-collector
              servicePort: 14250
  tls:
    - hosts:
        - jaeger-collector
      secretName: jaeger-collector-tls
⚠ Note: The nginx.ingress.kubernetes.io/auth-tls-secret annotation requires you to specify the above Secret in the format <namespace>/<secret>. I am using the namespace jaeger in the above example. Be sure to adjust this to match your setup.
⚠ Note: We use cert-manager to automatically provision TLS certs in all our Kubernetes clusters. So the Secret containing the TLS (server) cert for the above Ingress (jaeger-collector-tls) will automatically be populated with a valid cert.
⚠ Note: Just as in the previous article, I am using an abbreviated, unqualified hostname here as a stand-in for an actual hostname: jaeger-collector. Be sure to update this with the real DNS name for your Ingress.
That’s it! We can now send traces to the Jaeger Collector.
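For illustration, a Jaeger Agent sidecar could then be pointed at this Ingress with gRPC TLS enabled. This is a sketch, not our exact manifest; the image tag, host name, and cert paths are placeholders:

```yaml
containers:
  - name: jaeger-agent
    image: jaegertracing/jaeger-agent:1.24   # example tag, pin your own
    args:
      - --reporter.grpc.host-port=jaeger-collector:443
      - --reporter.grpc.tls.enabled=true
      - --reporter.grpc.tls.ca=/certs/ca.crt        # CA that signed the Ingress server cert
      - --reporter.grpc.tls.cert=/certs/client.crt  # client cert issued by our private CA
      - --reporter.grpc.tls.key=/certs/client.key
    volumeMounts:
      - name: jaeger-agent-certs
        mountPath: /certs
        readOnly: true
```

The client cert/key pair is what the Nginx auth-tls-* annotations above will verify before letting traffic through to the Collector.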
But before we call it a day, let’s also expose the Jaeger Query component to the outside world.
For the Query endpoint we will use HTTP Basic Auth to restrict access. And, again, we will get our Ingress to handle the authentication for us.
I recommend creating bcrypt hashes, which are resistant to brute-force attacks due to their inherent computational cost. However, unless you use a tool such as pwgen to generate strong, random passwords, these hashes should be considered sensitive.
$ htpasswd -nbB <username> <password>
Create a Secret that contains your auth file.
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jaeger-auth-query
stringData:
  auth: |-
    [email protected]:$2y$05$gibberish
    [email protected]:$2y$05$gibberish
    [email protected]:$2y$05$gibberish
⚠ Note: The Jaeger Query endpoint can be used as a Grafana data source. This enables you to integrate traces with logs and metrics for a fantastic observability experience. More on that in a future article.
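As a taste of that Grafana integration, a provisioned Jaeger data source could authenticate with the Basic Auth credentials above. A sketch using Grafana’s data source provisioning format; the URL, user, and password are placeholders:

```yaml
# Grafana data source provisioning file, e.g. provisioning/datasources/jaeger.yaml
apiVersion: 1
datasources:
  - name: Jaeger
    type: jaeger
    access: proxy
    url: https://jaeger              # the Query Ingress hostname
    basicAuth: true
    basicAuthUser: grafana@example.com   # placeholder user from the auth Secret
    secureJsonData:
      basicAuthPassword: changeme        # placeholder: inject the real password
```

You would add a dedicated Grafana user to the htpasswd file above rather than reusing a human’s credentials.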
Now we can create the Ingress for our Jaeger Query component.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: jaeger-query
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret-type: auth-file
    nginx.ingress.kubernetes.io/auth-secret: jaeger/jaeger-auth-query
    nginx.ingress.kubernetes.io/auth-realm: Jaeger Query
spec:
  rules:
    - host: jaeger
      http:
        paths:
          - path: /
            backend:
              serviceName: jaeger-backend-query
              servicePort: 16686
  tls:
    - hosts:
        - jaeger
      secretName: jaeger-tls
⚠ Note: The nginx.ingress.kubernetes.io/auth-secret annotation does not require you to specify the above Secret in the format <namespace>/<secret>. However, you can do so if you like. I am using this format here simply for consistency with the Jaeger Collector Ingress.
⚠ Note: As above, I am using an abbreviated, unqualified hostname here as a stand-in for an actual hostname: jaeger. Be sure to update this with the real DNS name for your Ingress.
Voila! Our Jaeger backend is ready for use.
Metrics
A final note about using Prometheus to monitor the Jaeger backend (which also allows you to create alerts using Prometheus alerting rules)...
There is currently some back-and-forth in the Jaeger project about whether the Jaeger Operator or its Helm chart should be in charge of creating ServiceMonitors to get Prometheus to scrape Jaeger metrics. ServiceMonitors are custom resources used by the Prometheus Operator to configure shared Prometheus instances in a Kubernetes cluster.
If you have the Prometheus Operator deployed in your Kubernetes cluster, I would recommend that you create your own Jaeger ServiceMonitors for now.
For the Jaeger Collector I would actually use a PodMonitor instead, because the Jaeger Operator currently creates two Services with identical labels for the Jaeger Collector:
jaeger-backend-collector
jaeger-backend-collector-headless
⚠ Note: The exact names of these Services are derived from the name of the Jaeger resource you created above.
Anyway, here’s what we’re using to get Prometheus to scrape Jaeger metrics.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: jaeger-operator
spec:
  podMetricsEndpoints:
    - port: metrics
      honorLabels: true
  namespaceSelector:
    matchNames:
      - jaeger
  selector:
    matchLabels:
      app.kubernetes.io/name: jaeger-operator
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: jaeger-collector
spec:
  podMetricsEndpoints:
    - port: admin-http
      honorLabels: true
  namespaceSelector:
    matchNames:
      - jaeger
  selector:
    matchLabels:
      app.kubernetes.io/component: collector
      app.kubernetes.io/instance: jaeger-backend
      app.kubernetes.io/managed-by: jaeger-operator
      app.kubernetes.io/name: jaeger-backend-collector
      app.kubernetes.io/part-of: jaeger
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jaeger-query
spec:
  endpoints:
    - targetPort: admin-http
      honorLabels: true
  namespaceSelector:
    matchNames:
      - jaeger
  selector:
    matchLabels:
      app.kubernetes.io/component: service-query
      app.kubernetes.io/instance: jaeger-backend
      app.kubernetes.io/managed-by: jaeger-operator
      app.kubernetes.io/name: jaeger-backend-query
      app.kubernetes.io/part-of: jaeger