NagiosでMySQLのslow query監視しようとして見つけたcheck_mysql_healthが色々できる

掲題の件、少し探したらcheck_mysql_healthというプラグイン見つけた。

ConSol Labs - check_mysql_health

導入方法

以下のような方法で導入します。

cat <<EOF | /bin/bash -eu
test ! -e /usr/local/nrpe && sudo mkdir /usr/local/nrpe
cd /tmp
wget https://labs.consol.de/assets/downloads/nagios/check_mysql_health-2.2.2.tar.gz
tar xvf check_mysql_health-2.2.2.tar.gz
cd check_mysql_health-2.2.2
./configure \
--prefix=/usr/local/nrpe \
--with-nagios-user=nrpe \
--with-nagios-group=nrpe \
--with-mymodules-dir=/usr/local/nrpe/libexec \
--with-mymodules-dyn-dir=/usr/local/nrpe/libexec
make
sudo make install
# NRPE Settings
echo 'command[check_mysql_health]=/usr/local/nrpe/libexec/check_mysql_health --mode \$ARG1$ --warning \$ARG2$ --critical \$ARG3$' | sudo tee /etc/nrpe.d/check_mysql_health.cfg
sudo systemctl restart nrpe
EOF

Slow Queryの監視

slow queryの監視は slow-queries modeを指定して行います。

前回の実行時から今までのslow queryの出現率に大して閾値を設定する形になります。

$ /usr/local/nrpe/libexec/check_mysql_health  --mode slow-queries --warning 0.01 --critical 1
OK - 0 slow queries in 3 seconds (0.00/sec) | slow_queries_rate=0.00%;0.01;1

$ mysql -e 'select sleep(10);'
+-----------+
| sleep(10) |
+-----------+
|         0 |
+-----------+

$ /usr/local/nrpe/libexec/check_mysql_health  --mode slow-queries --warning 0.01 --critical 1
WARNING - 1 slow queries in 16 seconds (0.06/sec) | slow_queries_rate=0.06%;0;1

Nagios側設定

Slow Query出現率が0.01%超えたら警告、1%でCRITICALにする場合なら以下のような感じです。 確実に補足するためにも max_check_attempts などを調整した方が良いかもしれないです。

define service{
........
        service_description             MySQL SlowQuery
        check_command                   check_nrpe!check_mysql_health!slow-queries 0.01 1
        max_check_attempts              1
        }

他にも色々監視できます

Modeがあるくらいなので他にも色々と監視できます。

    --mode
       the mode of the plugin. select one of the following keywords:
       connection-time          (Time to connect to the server)
       uptime                   (Time the server is running)
       threads-connected        (Number of currently open connections)
       threadcache-hitrate      (Hit rate of the thread-cache)
       threads-created          (Number of threads created per sec)
       threads-running          (Number of currently running threads)
       threads-cached           (Number of currently cached threads)
       connects-aborted         (Number of aborted connections per sec)
       clients-aborted          (Number of aborted connections (because the client died) per sec)
       slave-lag                (Seconds behind master)
       slave-io-running         (Slave io running: Yes)
       slave-sql-running        (Slave sql running: Yes)
       qcache-hitrate           (Query cache hitrate)
       qcache-lowmem-prunes     (Query cache entries pruned because of low memory)
       keycache-hitrate         (MyISAM key cache hitrate)
       bufferpool-hitrate       (InnoDB buffer pool hitrate)
       bufferpool-wait-free     (InnoDB buffer pool waits for clean page available)
       log-waits                (InnoDB log waits because of a too small log buffer)
       tablecache-hitrate       (Table cache hitrate)
       table-lock-contention    (Table lock contention)
       index-usage              (Usage of indices)
       tmp-disk-tables          (Percent of temp tables created on disk)
       table-fragmentation      (Show tables which should be optimized)
       open-files               (Percent of opened files)
       slow-queries             (Slow queries)
       long-running-procs       (long running processes)
       cluster-ndbd-running     (ndnd nodes are up and running)
       sql                      (any sql command returning a single number)