Monitoring System Health

When you power up your DRAGEN system, a daemon (dragen_mond) starts and monitors the card for hardware issues. The daemon is also started when your DRAGEN system is installed or updated. Its main purpose is to monitor the DRAGEN Bio-IT Processor temperature and abort DRAGEN when the temperature exceeds a configured threshold.

To manually start, stop, or restart the monitor, run the following command with root privileges:

sudo service dragen_mond [stop|start|restart]

By default, the monitor polls for hardware issues once per minute and logs temperature once every hour.

The /etc/sysconfig/dragen_mond file specifies the command line options used to start dragen_mond when the service command is run. Edit DRAGEN_MOND_OPTS in this file to change the default options. For example, the following changes the poll time to 30 seconds and the log time to once every 2 hours:

DRAGEN_MOND_OPTS="-d -p 30 -l 7200"

The -d option is required to run the monitor as a daemon. 

The dragen_mond command line options are as follows:

Option              Description
-m --swmaxtemp <n>  Maximum software alarm temperature (Celsius). Default is 85.
-i --swmintemp <n>  Minimum software alarm temperature (Celsius). Default is 75.
-H --hwmaxtemp <n>  Maximum hardware alarm temperature (Celsius). Default is 100.
-p --polltime <n>   Time between polling chip status register (seconds). Default is 60.
-l --logtime <n>    Log FPGA temp every n seconds. Default is 3600. Must be a multiple of polltime.
-d --daemon         Detach and run as a daemon.
-h --help           Print help and exit.
-V --version        Print the version and exit.
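
Because logtime must be a multiple of polltime, it can help to check the two values before editing /etc/sysconfig/dragen_mond. The following is a minimal sketch, not part of DRAGEN: the function name build_mond_opts is hypothetical, and the rule that both values must be positive is an assumption. It only builds the DRAGEN_MOND_OPTS line from the -d, -p, and -l options described above.

```python
def build_mond_opts(polltime=60, logtime=3600, daemon=True):
    """Return a DRAGEN_MOND_OPTS line for the given intervals (seconds).

    Hypothetical helper: it only formats the string and does not
    touch /etc/sysconfig/dragen_mond or the running service.
    """
    # Assumption: zero or negative intervals are not meaningful.
    if polltime <= 0 or logtime <= 0:
        raise ValueError("polltime and logtime must be positive")
    # Documented constraint: logtime must be a multiple of polltime.
    if logtime % polltime != 0:
        raise ValueError(
            f"logtime ({logtime}) must be a multiple of polltime ({polltime})"
        )
    opts = []
    if daemon:
        opts.append("-d")  # required to run the monitor as a daemon
    opts += ["-p", str(polltime), "-l", str(logtime)]
    return 'DRAGEN_MOND_OPTS="{}"'.format(" ".join(opts))


# Poll every 30 seconds, log every 2 hours (7200 seconds).
print(build_mond_opts(30, 7200))  # prints DRAGEN_MOND_OPTS="-d -p 30 -l 7200"
```

A combination such as a poll time of 45 seconds with a log time of 100 seconds raises ValueError in this sketch, because 100 is not a multiple of 45.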

To display the current temperature of the DRAGEN Bio-IT Processor, use the dragen_info -t command. This command does not execute if dragen_mond is not running.

% dragen_info -t
FPGA Temperature: 42C  (Max Temp: 49C, Min Temp: 39C)
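
If you want to use these readings in a script, the output line can be parsed into numbers. The following is a minimal sketch that assumes the output always has the form shown above. The function name parse_fpga_temperature and the dictionary keys are hypothetical, and the real output may differ between DRAGEN versions.

```python
import re

# Pattern for the sample line shown above; the exact layout is an assumption.
TEMP_LINE = re.compile(
    r"FPGA Temperature:\s*(-?\d+)C\s*"
    r"\(Max Temp:\s*(-?\d+)C,\s*Min Temp:\s*(-?\d+)C\)"
)


def parse_fpga_temperature(text):
    """Extract current, max, and min temperatures (Celsius) from the text."""
    match = TEMP_LINE.search(text)
    if match is None:
        raise ValueError("no FPGA temperature line found")
    current, max_temp, min_temp = (int(group) for group in match.groups())
    return {"current": current, "max": max_temp, "min": min_temp}


sample = "FPGA Temperature: 42C  (Max Temp: 49C, Min Temp: 39C)"
reading = parse_fpga_temperature(sample)
print(reading)  # prints {'current': 42, 'max': 49, 'min': 39}
```

In practice the text would come from running dragen_info -t, for example through Python's subprocess.run with capture_output=True and text=True. The sample string is used here so that the sketch runs without DRAGEN hardware.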