Docker & Kernel Configuration

This document lists the kernel tweaks M3DB needs to run well. If you are running on Kubernetes, you may use our sysctl-setter DaemonSet, which sets these values for you. Please read the comment in that manifest to understand the implications of applying it.

Running with Docker

When running M3DB inside Docker, we recommend adding the SYS_RESOURCE capability to the container (using the --cap-add argument to docker run) so that M3DB can raise its open file limit:

docker run --cap-add SYS_RESOURCE quay.io/m3/m3dbnode:latest

If M3DB is being run as a non-root user, M3’s setcap images are required:

docker run --cap-add SYS_RESOURCE -u 1000:1000 quay.io/m3/m3dbnode:latest-setcap

More information on Docker’s capability settings can be found in Docker’s documentation on runtime privilege and Linux capabilities.
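To confirm that the capability actually reached the container, you can inspect the effective capability mask (the CapEff line of /proc/self/status) from inside it. This is a minimal sketch that assumes a Linux container with a POSIX shell and awk available; CAP_SYS_RESOURCE is capability number 24 in the Linux capability list, and the helper name has_sys_resource is hypothetical, not part of M3.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given hex capability mask
# has CAP_SYS_RESOURCE (capability number 24) set.
has_sys_resource() {
  mask=$((0x$1))                      # CapEff is printed as a hex string
  [ $(( (mask >> 24) & 1 )) -eq 1 ]
}

# Read the effective capability set; awk inherits it from this shell.
capeff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)

if [ -z "$capeff" ]; then
  echo "could not read CapEff (not running on Linux?)"
elif has_sys_resource "$capeff"; then
  echo "CAP_SYS_RESOURCE is present (CapEff=$capeff)"
else
  echo "CAP_SYS_RESOURCE is missing (CapEff=$capeff)"
fi
```

Running this with and without --cap-add SYS_RESOURCE should show the bit appearing and disappearing.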

vm.max_map_count

M3DB uses a large number of mmap-ed files for performance; as a result, you might need to raise vm.max_map_count. We suggest setting this value to 3000000 so that you don’t have to come back and debug issues later.

On Linux, you can increase the limit by running the following command as root:

sysctl -w vm.max_map_count=3000000

To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf.
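To see how close a running process is to the limit, you can count its memory mappings: each line of /proc/<pid>/maps is one mapping, and that count is what vm.max_map_count caps. The following is a minimal sketch assuming a Linux host; map_usage is a hypothetical helper name, and in practice you would pass it the PID of your m3dbnode process (reading another process's maps requires root or the same user).

```shell
#!/bin/sh
# Hypothetical helper: print "<mappings> <limit>" for a PID, where <mappings>
# is the number of lines in /proc/<pid>/maps (one line per memory mapping)
# and <limit> is the current vm.max_map_count.
map_usage() {
  pid=$1
  # Arithmetic expansion strips any padding that wc may emit.
  mappings=$(( $(wc -l < "/proc/$pid/maps") ))
  limit=$(cat /proc/sys/vm/max_map_count)
  echo "$mappings $limit"
}

# Example: inspect this shell itself; substitute the m3dbnode PID in practice.
map_usage $$
```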

vm.swappiness

vm.swappiness controls how aggressively the virtual memory subsystem swaps memory pages to disk. By default, the kernel sets this value to 60 and will try to swap out pages even when plenty of RAM is available to the system.

We recommend sizing clusters so that M3DB runs on a substrate (hosts or containers) where no swapping is necessary, i.e. the process uses only 30-50% of the maximum available memory. We therefore recommend setting vm.swappiness to 1, which tells the kernel to swap as little as possible without disabling swapping altogether.

On Linux, you can configure this by running the following as root:

sysctl -w vm.swappiness=1

To set this value permanently, update the vm.swappiness setting in /etc/sysctl.conf.
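To check whether a running process stays within the 30-50% sizing guidance above, and whether any of it has already been swapped out, you can read VmRSS and VmSwap from /proc/<pid>/status and compare VmRSS with MemTotal from /proc/meminfo. This is a rough sketch with a hypothetical helper name, mem_report. Note that MemTotal reports host memory, so inside a memory-limited container the percentage is measured against the host rather than the container limit.

```shell
#!/bin/sh
# Hypothetical helper: print resident memory as an integer percentage of
# MemTotal, plus how much of the process is currently swapped out (in kB).
mem_report() {
  pid=$1
  rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
  swap_kb=$(awk '/^VmSwap:/ {print $2}' "/proc/$pid/status")
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # Target roughly 30-50% for M3DB; any nonzero swap_kb means pages were swapped.
  pct=$(( ${rss_kb:-0} * 100 / $total_kb ))
  echo "rss_pct=$pct swap_kb=${swap_kb:-0}"
}

# Example: inspect this shell itself; substitute the m3dbnode PID in practice.
mem_report $$
```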

rlimits

M3DB can also open a large number of files because of its per-partition fileset volumes, so we suggest setting a high maximum number of open files.

You may need to override the system-level and process-level limits set by the kernel. To check the existing values, run:

sysctl -n fs.file-max

and

sysctl -n fs.nr_open

to see the system-wide and per-process limits, respectively. If either value is less than 3000000 (our minimum recommended value), you can update them with the following commands:

sysctl -w fs.file-max=3000000
sysctl -w fs.nr_open=3000000

To set these values permanently, update the fs.file-max and fs.nr_open settings in /etc/sysctl.conf.
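The check-and-update steps above can be combined into a small script that reads both values and prints a sysctl command only for the settings that fall below the recommended minimum. This is a sketch, not an M3-provided tool: the function name check_limits and the PROC_SYS override (used here only so the logic can be exercised against a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Recommended minimum from this document.
MIN=3000000
# Hypothetical override for testing; defaults to the real /proc/sys tree.
PROC_SYS=${PROC_SYS:-/proc/sys}

# Print a "sysctl -w" command for every limit that is below $MIN.
check_limits() {
  for key in fs.file-max fs.nr_open; do
    # Map the sysctl key to its file, e.g. fs.file-max -> $PROC_SYS/fs/file-max
    path="$PROC_SYS/$(echo "$key" | tr . /)"
    value=$(cat "$path")
    if [ "$value" -lt "$MIN" ]; then
      echo "sysctl -w $key=$MIN"
    fi
  done
}

check_limits
```

Running the printed commands as root applies the change for the current boot; the /etc/sysctl.conf entries are still needed to make it permanent.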

Alternatively, if you want to run M3DB under systemd, you can use our service example, which sets sane defaults. Keep in mind that you’ll still need to configure the kernel and process limits: systemd will not allow a process to exceed them and will silently fall back to a default value, which could cause M3DB to crash when it hits the file descriptor limit.

Also note that systemd has system.conf and user.conf files that may contain limits which service-specific configuration files cannot override. Be sure to check that those files aren’t configured with values lower than the value you configure at the service level.
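If you manage the service-level limit yourself rather than through our service example, systemd’s LimitNOFILE= directive sets the open file limit for a unit. The sketch below writes a drop-in override file; the unit name m3dbnode.service and the helper name write_dropin are assumptions, and the target directory is a parameter so the output can be inspected in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that raises the open file limit.
# $1: drop-in directory (assumes the unit is named m3dbnode.service)
# $2: limit value (defaults to the recommended 3000000)
write_dropin() {
  dir=${1:-/etc/systemd/system/m3dbnode.service.d}
  limit=${2:-3000000}
  mkdir -p "$dir"
  cat > "$dir/limits.conf" <<EOF
[Service]
LimitNOFILE=$limit
EOF
}

# Try it in a scratch directory first. As root, call it with no arguments,
# then run: systemctl daemon-reload && systemctl restart m3dbnode
write_dropin "$(mktemp -d)/m3dbnode.service.d"
```

After restarting, systemctl show -p LimitNOFILE m3dbnode reports the value systemd applied to the unit.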

Make sure the limits are set before starting the process. If you run M3DB manually, you can raise the limit for the current shell session (and the processes it starts) with ulimit -n 3000000.
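For an unprivileged shell, ulimit -n 3000000 fails if the hard limit is lower. A slightly more defensive sketch raises only the soft limit and caps it at the hard limit; raise_nofile is a hypothetical helper name, and the hard limit itself still has to be raised by root (for example through the kernel or systemd configuration described above).

```shell
#!/bin/sh
# Hypothetical helper: raise the soft open file limit of this shell (and the
# processes it starts) to $1, capped at the current hard limit.
raise_nofile() {
  target=${1:-3000000}
  hard=$(ulimit -H -n)
  if [ "$hard" != unlimited ] && [ "$hard" -lt "$target" ]; then
    echo "hard limit $hard is below $target; using $hard" >&2
    target=$hard
  fi
  ulimit -S -n "$target"
}

raise_nofile 3000000
ulimit -S -n   # show the resulting soft limit
```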

Automatic Limit Raising

During startup, M3DB attempts to raise its open file limit to the current value of fs.nr_open. This is a benign operation; if it fails, M3DB simply emits a warning.
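To confirm which limit a running M3DB process actually ended up with after this startup step, you can read the "Max open files" row of /proc/<pid>/limits and compare it with fs.nr_open. This is a minimal sketch; open_files_limit is a hypothetical helper name, and in practice you would pass it the m3dbnode PID.

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard "Max open files" limits of a PID.
# A row in /proc/<pid>/limits looks like:
#   Max open files            1024                 1048576              files
open_files_limit() {
  awk '/^Max open files/ {print $4, $5}' "/proc/$1/limits"
}

# Example: inspect this shell; substitute the m3dbnode PID in practice.
echo "soft/hard: $(open_files_limit $$)"
echo "fs.nr_open: $(cat /proc/sys/fs/nr_open)"
```

If the soft limit matches fs.nr_open, the automatic raise succeeded; otherwise, look for the warning in the M3DB logs.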