Metrics
It is best to use Prometheus to monitor M3DB, M3 Coordinator and M3 Query using the Grafana dashboards.
Logs
Logs are printed to process output in JSON by default for semi-structured log processing.
Tracing
M3DB is integrated with opentracing to provide insight into query performance and errors.
Jaeger
To enable Jaeger as the tracing backend, set tracing.backend to “jaeger” (see also our sample local config:
tracing:
backend: jaeger # enables jaeger with default configs
jaeger:
# optional configuration for jaeger -- see
# https://github.com/jaegertracing/jaeger-client-go/blob/master/config/config.go#L37
# for options
...
Jaeger can be run locally with docker as described in https://www.jaegertracing.io/docs/1.9/getting-started/.
The default configuration will report traces via udp to localhost:6831; using the all-in-one jaeger container, they will be accessible at
http://localhost:16686
N.B.: for production workloads, you will almost certainly want to use sampler.type=remote with adaptive sampling for Jaeger, as write volumes are likely orders of magnitude higher than read volumes in most timeseries systems.
LightStep
To use LightStep as the tracing backend, set tracing.backend
to "lightstep"
and configure necessary information for
your client under lightstep
. Any options exposed in lightstep-tracer-go
can be set in config.
Any environment variables may be interpolated. For example:
tracing:
serviceName: m3coordinator
backend: lightstep
lightstep:
access_token: ${LIGHTSTEP_ACCESS_TOKEN:""}
collector:
scheme: https
host: my-satellite-address.domain
port: 8181
Alternative backends
If you’d like additional backends, we’d love to support them!
File an issue against M3 and we can work with you on how best to add the backend. The first time’s going to be a little rough–opentracing unfortunately doesn’t support Go plugins (yet–see https://github.com/opentracing/opentracing-go/issues/133), and Go’s dependency model means that adding dependencies directly will update everything, which isn’t ideal for an isolated dependency change. These problems are all solvable though, and we’ll work with you to make it happen!
Use cases
Note: all URLs assume a local jaeger setup as described in Jaeger’s docs.
Finding slow queries
To find prom queries longer than minDuration >= <threshold>
on
operation="GET /api/v1/query_range"
.
Sample query: http://localhost:16686/search?end=1548876672544000&limit=20&lookback=1h&maxDuration&minDuration=1ms&operation=GET%20%2Fapi%2Fv1%2Fquery_range&service=m3query&start=1548873072544000
Finding queries with errors
Search for error=true
on operation="GET /api/v1/query_range"
http://localhost:16686/search?operation=GET%20%2Fapi%2Fv1%2Fquery_range&service=m3query&tags=%7B%22error%22%3A%22true%22%7D
Finding 500 (Internal Server Error) responses
Search for http.status_code=500
.
http://localhost:16686/search?limit=20&lookback=24h&maxDuration&minDuration&operation=GET%20%2Fapi%2Fv1%2Fquery_range&service=m3query&start=1548802430108000&tags=%7B"http.status_code"%3A"500"%7D