Jaeger Backend: Collector & Query

This article describes how EverQuote uses the Jaeger Operator to provision the Jaeger backend.

Disclaimer: Just as in the previous article, this particular setup should not be construed as an endorsement or recommendation. Use your own judgment before adopting anything covered by this article.

Assumptions

I will assume that we are using the ElasticSearch setup described in the previous article.

Furthermore, I will assume that the Jaeger Operator has already been deployed since this is something that must be tailored to your particular environment.

We use a combination of Helm, Kustomize, Flux, and GitHub Actions to deploy all of our Kubernetes cluster services. You can read more about that in a future article.

RBAC

The Jaeger Operator Helm chart creates RBAC manifests for all possible deployment strategies of Jaeger. This creates a bunch of unnecessary “noise”, which is why we opt to create the RBAC manifests ourselves.

Helm chart config:

rbac:
  create: false

Since we are installing Jaeger into its own namespace, we simply create a RoleBinding that makes the Jaeger Operator an admin for that namespace.

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jaeger-operator-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: jaeger-operator

⚠ Note: The ClusterRole “admin” is a built-in role that comes with every Kubernetes cluster. Because we bind it with a RoleBinding (rather than a ClusterRoleBinding), its permissions apply only within the namespace of that RoleBinding.

The Jaeger Operator also installs custom resource definitions (currently just one actually) under the API group jaegertracing.io.

The built-in “admin” ClusterRole does not grant access to these custom resources by default. In order for the operator to be allowed to manage the custom resources that it is responsible for, we also have to create a Role and a RoleBinding for this API group (since we don’t want to extend the permissions of the “admin” ClusterRole).

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jaeger-operator
rules:
- apiGroups:
  - jaegertracing.io
  resources:
  - '*'
  verbs:
  - '*'

---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jaeger-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jaeger-operator
subjects:
- kind: ServiceAccount
  name: jaeger-operator

⚠ Note: The above RoleBindings obviously assume that the Jaeger Operator is running as the ServiceAccount “jaeger-operator”.
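
If you apply these manifests directly rather than through a tool that injects namespaces for you (such as Kustomize), it may help to spell the namespaces out. The following is a minimal sketch of the first RoleBinding, assuming that both the Jaeger Operator and the Jaeger backend live in a namespace called jaeger (the same namespace name used in the later examples of this article):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jaeger-operator-admin
  namespace: jaeger          # assumption: the namespace the permissions apply to
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: jaeger-operator
  namespace: jaeger          # assumption: the namespace the operator runs in
```

Stating the subject namespace explicitly becomes necessary if the operator runs in a different namespace than the one it manages.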

Docker Hub Credentials

Ever since Docker Hub started rate limiting image pulls for the free tier, we have been sure to outfit all of our Kubernetes workloads (or their associated ServiceAccounts) with Docker Hub credentials for our paid subscription.

Otherwise a worker node upgrade or an availability zone failure could propel large portions of our Kubernetes workloads into an ImagePullBackOff state.

We are therefore going to hand-create the ServiceAccounts used by Jaeger’s backend components and supply them with the required image pull secrets.

Helm chart config:

serviceAccount:
  create: false

ServiceAccounts:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-collector
imagePullSecrets:
- name: dockerhub-credentials

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-query
imagePullSecrets:
- name: dockerhub-credentials

⚠ Note: We use Kubernetes External Secrets to project these Docker Hub credentials into all of our namespaces. You can read more about that in a future article.
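
For reference, an image pull secret such as dockerhub-credentials generally has the shape sketched below. This is only an illustration of the Secret that ends up in the namespace; the username and token are placeholders, and in our case the Secret is produced by Kubernetes External Secrets rather than written by hand.

```yaml
apiVersion: v1
kind: Secret
type: kubernetes.io/dockerconfigjson
metadata:
  name: dockerhub-credentials
stringData:
  # Placeholders: substitute the credentials of your own Docker Hub account.
  .dockerconfigjson: |-
    {
      "auths": {
        "https://index.docker.io/v1/": {
          "username": "<dockerhub-username>",
          "password": "<dockerhub-access-token>"
        }
      }
    }
```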

Deployment

Before we spin up the Jaeger backend we have to create a Secret that contains the username and password for ElasticSearch.

$ kubectl create secret generic jaeger-es-credentials \
    --from-literal=ES_USERNAME=jaeger \
    --from-literal=ES_PASSWORD=0penSes@me

$ history -c    # don’t leave sensitive information in your shell history
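
If you prefer a declarative manifest over an imperative kubectl command, the same Secret could be sketched as follows. The password is a placeholder, and a manifest like this should not be committed to Git in plain text; a secrets management tool is the safer route.

```yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jaeger-es-credentials
stringData:
  # The key names match the ones used in the kubectl command above.
  ES_USERNAME: jaeger
  ES_PASSWORD: "<password>"   # placeholder
```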

Now we can use the Jaeger resource to spin up the Jaeger backend.

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger-backend
spec:
  strategy: production
  collector:
    maxReplicas: 6
    resources:
      limits:
        cpu: 2000m
        memory: 16Gi
      requests:
        cpu: 1000m
        memory: 8Gi
    options:
      log-level: info
      es:
        use-aliases: true
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"
    serviceAccount: jaeger-collector
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.01 # 1%
  query:
    replicas: 3
    resources:
      limits:
        cpu: 1000m
        memory: 8Gi
      requests:
        cpu: 500m
        memory: 4Gi
    options:
      log-level: info
      es:
        use-aliases: true
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"
    serviceAccount: jaeger-query
  storage:
    type: elasticsearch
    options:
      es:
        create-index-templates: false
        server-urls: https://jaeger-es
    esIndexCleaner:
      enabled: false
    dependencies:
      enabled: false
    secretName: jaeger-es-credentials
  ingress:
    enabled: false

Unfortunately, the Helm chart installs a custom resource definition that is completely devoid of structure (which makes it hard to figure out how to create a valid Jaeger resource). But you can find the full CRD spec in the jaeger-operator repo on GitHub.

I won’t explain all the config options for the Jaeger resource but there are a few things I’d like to point out.

1) The ElasticSearch config enables the use of index aliases and disables index initialization and lifecycle management. The reason for this has to do with our ElasticSearch setup, which is described in the previous article.

spec:
  collector:
    options:
      es:
        use-aliases: true
  query:
    options:
      es:
        use-aliases: true
  storage:
    options:
      es:
        create-index-templates: false
    esIndexCleaner:
      enabled: false

2) Despite the fact that we have Linkerd proxy sidecar injection enabled for the Jaeger namespace, we have to explicitly enable this injection here. That’s because the Jaeger Operator adds annotations to the Jaeger backend Deployments to prevent the injection of Istio and Linkerd proxy sidecars. The reason for this behavior is unclear and not documented.

spec:
  collector:
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"
  query:
    annotations:
      linkerd.io/inject: enabled
      config.linkerd.io/skip-outbound-ports: "443"

3) We want our Nginx ingress controllers to restrict access to the Jaeger backend, using different authentication methods for the Query and Collector components. So we don’t want the Jaeger Operator to create an Ingress resource for us; we create our own instead.

spec:
  ingress:
    enabled: false

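4) We recommend that our users configure their tracing clients with the “remote” sampling strategy. With this strategy, the client does not pick the actual sampling strategy itself but defers that choice to the Jaeger backend. This allows us to steer the sampling strategy for all workloads from a central location in case we have to respond to load spikes. The config below sets the default strategy that the backend hands out: probabilistic sampling at a rate of 1%.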

spec:
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.01 # 1%
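
Central steering also works per service. As a rough sketch, the sampling options accept a list of service_strategies next to the default strategy, following the format of Jaeger's sampling strategies file. The service name below is hypothetical and only serves as an illustration:

```yaml
spec:
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 0.01            # 1% for every service not listed below
      service_strategies:
      - service: checkout-api  # hypothetical service name
        type: probabilistic
        param: 0.001           # 0.1% for a particularly chatty service
```

Lowering the rate for a single noisy service this way leaves the sampling rate of all other workloads untouched.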

Now the Jaeger backend is ready to rock. It’s time to expose it to the outside world so we can start using it.

Ingress

Let’s start by exposing the Jaeger Collector which is the backend component that receives traces and persists them to ElasticSearch in our case.

As mentioned in the previous section, we want to restrict access to our Jaeger backend components.

For the Collector (which is a write-only endpoint), we mainly want to prevent operator error, e.g. traces from our prod environment ending up in the staging instance of Jaeger.

Since Jaeger traces are sent to the Collector via gRPC, there’s really only one officially supported way to do client authentication: TLS.

In the next article, I will show how to provision a private CA that will allow us to easily create TLS client certs for our Jaeger Agent containers. For now I will simply assume that such a CA already exists.

Create a Secret that contains the certificate(s) of the CA(s) you want to trust.

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jaeger-auth-collector
stringData:
  ca.crt: |-
    -----BEGIN CERTIFICATE-----
    MIIF5jCCA86gAwIBAgIJALndCNnLu42PMA0GCSqGSIb3DQEBCwUAMDAxLjAsBgNV
    BAMMJU15IFByaXZhdGUgQ0EgQCBteS1rdWJlcm5ldGVzLWNsdXN0ZXIwHhcNMjEw
    NzE1MjM0OTA5WhcNMzEwNzEzMjM0OTA5WjAwMS4wLAYDVQQDDCVNeSBQcml2YXRl
    IENBIEAgbXkta3ViZXJuZXRlcy1jbHVzdGVyMIICIjANBgkqhkiG9w0BAQEFAAOC
    Ag8AMIICCgKCAgEApckDrPxGSbtRNrFN6k5vlhqcM42n3RdU0XXD1PkexUvdHpi3
    c8bIzjQhBLZQjmBbg3i8QhHbrl+A+lEWtUMKpNeQvCrYyLWY9786X2iM0tb6LkmS
    rrBQ2u1+olygKL9U368CsJBuA0rtK+xGaZyEhhh+F4No6DBkHAgrM8k5OpLG/AyF
    xdKtlyzF+Um6qUNMr1mNI5VD+LyjDAbzqRjZywnYTl30AWQLTx3UKwVD3CpTHW1q
    rRJYtZO+OscpkbFeMSA2TUIH+ek4oV+YXyh5NEhMkmS2rYwQl2fKyQ22qq+kcyxG
    kV+2IUMVergYCh8gBrfVzJ0SNdWmfe+Ojg5uMwbrRLPOoS7TA69Q7W+SMRmew8Za
    l1/fF5m/ADn1ixiMNsXz3pbn173c+NLIgKSDQ7KTz0797CqOXZRK4wI/lLUfQY+1
    QZf3owzTQEDRdd1i1X8XqyDbznHX5ohV0YqZQFQEX/CLXk3b2M0LIZCS0tv1qsQv
    r5Rc2Me1KdNxFByEnfVmbkwMvl+RWjmbko8cyW5AuATL3wCVZj8pnOwUaB4qzjyz
    0v2HGPh2eMfni6EWSqOqQJhFw1F/wKa0Jq9csuVEGxlKHA34yE+PXaaJXIWEsgc1
    2n+7//4ALUJky9gblJbR2AOCFKWdn92LtWgfltkZr/M17mxuslCvkJLt+y8CAwEA
    AaOCAQEwgf4wEgYDVR0TAQH/BAgwBgEB/wIBATAOBgNVHQ8BAf8EBAMCAQYwHQYD
    VR0OBBYEFFylrKtVHqCOoxzaQnxm7sUJ6l9+MGsGCWCGSAGG+EIBDQReFlxZb3Vy
    IGN1cmlvc2l0eSBzZXJ2ZXMgeW91IHdlbGwhIFdhbnQgdG8gd29yayBhdCBFdmVy
    UXVvdGU/IFNlbmQgbWFpbCB0byBoNHgwcnpAZXZlcnF1b3RlLmNvbTBMBg4rBgEE
    AYPBQQEBAYo5AQQ6FjggPj4gVTJsbmJpQjViM1Z5SUdWdFlXbHNJSGRwZEdnZ2RH
    aHBjeUJsYlc5cWFUb2c4Situb0FvPTANBgkqhkiG9w0BAQsFAAOCAgEAg5lL1bja
    /yGZQR6RocaS84u39AMzrxBQnQXp4Nx7jJCMhmMIDFTwJzP5mUtLTLA9fvv8FLmO
    rTJHHpKPRSce4cpDjsQkFcCanOSlYEfhJxBdi8YEQ+1CIvE9S56Ifa4t1UbaCi2e
    rQUW7EG/OoGQqB4hoIUcbYBq+mwvoYieJ1c1AmE9TlU/tXAXCUkjO2QTRHdRddnZ
    ixfg4DttWEkx7Cf6kU49T5BWjXtJVTK2Rn/gHNtCvavR/+JlG/kq/yDbubwoJ/xD
    RdedvP3LdvI6WGuzRJLdkaXtIX3hQ+8ah9uykJqSRfUrOPfdmt1Zohh+R4O6POYF
    2bLtdjpKTcD7Ni1gxPtHRJeVFI4RIFSlM0QKfHo78q5URzdnRWTFs9KnDE8r0UmJ
    Y1ZVtsAAgSbDZmZGb7L5JQcBJVqoq0NwKdfseFBfWrghjxd2cq+2qaU3IHlIIx5c
    jYiLhxtjKxqievWy7E8bgLcX4mx4nHgT3U4dhKUjFWxyrSuaANspm1lwEVVjmP/T
    bjqKOM++uzeSQtIFHTj3Z2HjMFbQ2W4gWcIeMNONNJPRGff3shW4I9XA/wMbUYwo
    8wmLecfpdOBi/ZRDL9MKhNnhlPh0Kd4kmZaHBwI6HNTkGiE4twJxEOSXV9nMTqs2
    1sdqe5UR6o/lo6G3mK0Wdxl96IVFeUnFaWU=
    -----END CERTIFICATE-----

⚠ Note: At EverQuote we actually create a separate CA for each Kubernetes cluster. Simply concatenate all the CA certs together and add them to the ca.crt key in the above Secret. For non-Kubernetes workloads you can use an Amazon-hosted ACM Private CA (but that’s outside the scope of this article).
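
With more than one trusted CA, the Secret could look roughly like the sketch below. The certificate bodies are placeholders, and "cluster A" and "cluster B" are just illustrative labels:

```yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jaeger-auth-collector
stringData:
  # PEM blocks are simply placed one after another in the same key.
  ca.crt: |-
    -----BEGIN CERTIFICATE-----
    <base64 body of the CA cert for cluster A>
    -----END CERTIFICATE-----
    -----BEGIN CERTIFICATE-----
    <base64 body of the CA cert for cluster B>
    -----END CERTIFICATE-----
```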

Now we can create an Ingress that uses the above Secret for TLS client authentication.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: jaeger-collector
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-secret: jaeger/jaeger-auth-collector
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
spec:
  rules:
  - host: jaeger-collector
    http:
      paths:
      - path: /
        backend:
          serviceName: jaeger-backend-collector
          servicePort: 14250
  tls:
  - hosts:
    - jaeger-collector
    secretName: jaeger-collector-tls

⚠ Note: The nginx.ingress.kubernetes.io/auth-tls-secret annotation requires you to specify the above Secret in the format: <namespace>/<secret>. I am using the namespace jaeger in the above example. Be sure to adjust this to match your setup.

⚠ Note: We use cert-manager to automatically provision TLS certs in all our Kubernetes clusters. So the Secret containing the TLS (server) cert for the above Ingress (jaeger-collector-tls) will automatically be populated with a valid cert.

⚠ Note: Just as in the previous article, I am using an abbreviated, unqualified hostname here as a stand-in for an actual hostname: jaeger-collector. Be sure to update this with the real DNS name for your Ingress.
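
The networking.k8s.io/v1beta1 Ingress API used above was removed in Kubernetes 1.22. On newer clusters, the Collector Ingress could be expressed with networking.k8s.io/v1 roughly as sketched below. The sketch assumes that an IngressClass named nginx exists in your cluster; the hostnames and Secret names are the same stand-ins as above.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jaeger-collector
  annotations:
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-secret: jaeger/jaeger-auth-collector
    nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
spec:
  ingressClassName: nginx    # assumption: replaces the ingress.class annotation
  rules:
  - host: jaeger-collector
    http:
      paths:
      - path: /
        pathType: Prefix     # required in networking.k8s.io/v1
        backend:
          service:
            name: jaeger-backend-collector
            port:
              number: 14250
  tls:
  - hosts:
    - jaeger-collector
    secretName: jaeger-collector-tls
```

The main differences are the mandatory pathType field and the nested service backend, which replaces serviceName and servicePort.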

That’s it! We can now send traces to the Jaeger Collector.

But before we call it a day, let’s also expose the Jaeger Query component to the outside world.

For the Query endpoint we will use HTTP Basic Auth to restrict access. And, again, we will get our Ingress to handle the authentication for us.

I recommend creating bcrypt hashes, which are resistant to brute-force attacks because they are deliberately expensive to compute. However, unless you use a tool such as pwgen to generate strong, random passwords, these hashes should be considered sensitive.

$ htpasswd -nbB <username> <password>

Create a Secret that contains your auth file.

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jaeger-auth-query
stringData:
  auth: |-
    <username-1>:$2y$05$gibberish
    <username-2>:$2y$05$gibberish
    <username-3>:$2y$05$gibberish

⚠ Note: The Jaeger Query endpoint can be used as a Grafana data source. This enables you to integrate traces with logs and metrics for a fantastic observability experience. More on that in a future article.

Now we can create the Ingress for our Jaeger Query component.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: jaeger-query
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret-type: auth-file
    nginx.ingress.kubernetes.io/auth-secret: jaeger/jaeger-auth-query
    nginx.ingress.kubernetes.io/auth-realm: Jaeger Query
spec:
  rules:
  - host: jaeger
    http:
      paths:
      - path: /
        backend:
          serviceName: jaeger-backend-query
          servicePort: 16686
  tls:
  - hosts:
    - jaeger
    secretName: jaeger-tls

⚠ Note: The nginx.ingress.kubernetes.io/auth-secret annotation does not require you to specify the above Secret in the format: <namespace>/<secret>. However, you can do so if you like. I am using this format here simply for consistency with the Jaeger Collector Ingress.

⚠ Note: As above, I am using an abbreviated, unqualified hostname here as a stand-in for an actual hostname: jaeger. Be sure to update this with the real DNS name for your Ingress.

Voila! Our Jaeger backend is ready for use.

Metrics

A final note about using Prometheus to monitor the Jaeger backend (which allows you to create alerts using these Prometheus rules)...

There is currently some back-and-forth in the Jaeger project about whether the Jaeger Operator or its Helm chart should be in charge of creating ServiceMonitors to get Prometheus to scrape Jaeger metrics.

ServiceMonitors are custom resources used by Prometheus Operator to configure shared Prometheus instances in a Kubernetes cluster.

If you have Prometheus Operator deployed in your Kubernetes cluster, I would recommend that you create your own Jaeger ServiceMonitors for now.

For the Jaeger Collector I would actually use a PodMonitor instead because the Jaeger Operator currently creates two Services with identical labels for the Jaeger Collector:

  • jaeger-backend-collector
  • jaeger-backend-collector-headless

⚠ Note: The exact name for these Services is derived from the name of the Jaeger resource you create above.

Anyway, here’s what we’re using to get Prometheus to scrape Jaeger metrics.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: jaeger-operator
spec:
  podMetricsEndpoints:
  - port: metrics
    honorLabels: true
  namespaceSelector:
    matchNames:
    - jaeger
  selector:
    matchLabels:
      app.kubernetes.io/name: jaeger-operator

---

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: jaeger-collector
spec:
  podMetricsEndpoints:
  - port: admin-http
    honorLabels: true
  namespaceSelector:
    matchNames:
    - jaeger
  selector:
    matchLabels:
      app.kubernetes.io/component: collector
      app.kubernetes.io/instance: jaeger-backend
      app.kubernetes.io/managed-by: jaeger-operator
      app.kubernetes.io/name: jaeger-backend-collector
      app.kubernetes.io/part-of: jaeger

---

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jaeger-query
spec:
  endpoints:
  - targetPort: admin-http
    honorLabels: true
  namespaceSelector:
    matchNames:
    - jaeger
  selector:
    matchLabels:
      app.kubernetes.io/component: service-query
      app.kubernetes.io/instance: jaeger-backend
      app.kubernetes.io/managed-by: jaeger-operator
      app.kubernetes.io/name: jaeger-backend-query
      app.kubernetes.io/part-of: jaeger
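
Once the metrics are being scraped, alerts can be defined with a PrometheusRule resource. The following is only a sketch: the resource name, alert name, and thresholds are made up for illustration, and the metric names are an assumption that you should confirm against the metrics your Collector actually exposes on its admin port.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: jaeger-collector-alerts   # hypothetical name
spec:
  groups:
  - name: jaeger-collector
    rules:
    - alert: JaegerCollectorDroppingSpans   # hypothetical alert name
      # Assumed metric names; verify them before relying on this rule.
      expr: |
        100 * sum(rate(jaeger_collector_spans_dropped_total[5m]))
            / sum(rate(jaeger_collector_spans_received_total[5m])) > 1
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: "The Jaeger Collector is dropping more than 1% of received spans."
```

Depending on how your Prometheus Operator is configured, the PrometheusRule may also need labels that match the rule selector of your Prometheus instance.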