掲題の件、少し探したらcheck_mysql_healthというプラグイン見つけた。
ConSol Labs - check_mysql_health
導入方法
以下のような方法で導入します。
cat <<EOF | /bin/bash -eu test ! -e /usr/local/nrpe && sudo mkdir /usr/local/nrpe cd /tmp wget https://labs.consol.de/assets/downloads/nagios/check_mysql_health-2.2.2.tar.gz tar xvf check_mysql_health-2.2.2.tar.gz cd check_mysql_health-2.2.2 ./configure \ --prefix=/usr/local/nrpe \ --with-nagios-user=nrpe \ --with-nagios-group=nrpe \ --with-mymodules-dir=/usr/local/nrpe/libexec \ --with-mymodules-dyn-dir=/usr/local/nrpe/libexec make sudo make install # NRPE Settings echo 'command[check_mysql_health]=/usr/local/nrpe/libexec/check_mysql_health --mode \$ARG1$ --warning \$ARG2$ --critical \$ARG3$' | sudo tee /etc/nrpe.d/check_mysql_health.cfg sudo systemctl restart nrpe EOF
Slow Queryの監視
slow queryの監視は slow-queries
modeを指定して行います。
前回の実行時から今までのslow queryの出現率に大して閾値を設定する形になります。
$ /usr/local/nrpe/libexec/check_mysql_health --mode slow-queries --warning 0.01 --critical 1 OK - 0 slow queries in 3 seconds (0.00/sec) | slow_queries_rate=0.00%;0.01;1 $ mysql -e 'select sleep(10);' +-----------+ | sleep(10) | +-----------+ | 0 | +-----------+ $ /usr/local/nrpe/libexec/check_mysql_health --mode slow-queries --warning 0.01 --critical 1 WARNING - 1 slow queries in 16 seconds (0.06/sec) | slow_queries_rate=0.06%;0;1
Nagios側設定
Slow Query出現率が0.01%超えたら警告、1%でCRITICALにする場合なら以下のような感じです。
確実に補足するためにも max_check_attempts
などを調整した方が良いかもしれないです。
define service{ ........ service_description MySQL SlowQuery check_command check_nrpe!check_mysql_health!slow-queries 0.01 1 max_check_attempts 1 }
他にも色々監視できます
Modeがあるくらいなので他にも色々と監視できます。
--mode the mode of the plugin. select one of the following keywords: connection-time (Time to connect to the server) uptime (Time the server is running) threads-connected (Number of currently open connections) threadcache-hitrate (Hit rate of the thread-cache) threads-created (Number of threads created per sec) threads-running (Number of currently running threads) threads-cached (Number of currently cached threads) connects-aborted (Number of aborted connections per sec) clients-aborted (Number of aborted connections (because the client died) per sec) slave-lag (Seconds behind master) slave-io-running (Slave io running: Yes) slave-sql-running (Slave sql running: Yes) qcache-hitrate (Query cache hitrate) qcache-lowmem-prunes (Query cache entries pruned because of low memory) keycache-hitrate (MyISAM key cache hitrate) bufferpool-hitrate (InnoDB buffer pool hitrate) bufferpool-wait-free (InnoDB buffer pool waits for clean page available) log-waits (InnoDB log waits because of a too small log buffer) tablecache-hitrate (Table cache hitrate) table-lock-contention (Table lock contention) index-usage (Usage of indices) tmp-disk-tables (Percent of temp tables created on disk) table-fragmentation (Show tables which should be optimized) open-files (Percent of opened files) slow-queries (Slow queries) long-running-procs (long running processes) cluster-ndbd-running (ndnd nodes are up and running) sql (any sql command returning a single number)