A Script for Detecting NaN in a Program's Output

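A NaN appearing in a numerical program means the computation has effectively crashed. If nothing checks for it, the program keeps running and wastes CPU time; checking at every step inside the program, however, is too expensive. As a compromise, I wrote a separate monitoring script: it catches the failure promptly and requires no extra work from the original program, so performance is unaffected.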

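How the script works:

  1. Write the program's standard output to a log file, using shell redirection, tee, or similar;
  2. Every 10 s, the script inspects the latest output with tail; if it finds nan, it kills the program and exits.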
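
The check in step 2 can be sketched on its own. This is only an illustration, not the author's script: log_has_nan and the temporary log file are hypothetical names introduced here, and the check covers the last 10 lines, which is what tail prints by default.

```shell
#!/bin/bash
# Sketch of the core check: does the tail of a log file contain "nan"?
# log_has_nan is a hypothetical helper, not part of the original script.
log_has_nan() {
    local logfile=$1
    # tail prints the last 10 lines by default; grep -q exits 0 on a match,
    # and -i makes the match case-insensitive (nan, NaN, NAN).
    tail "$logfile" | grep -qi nan
}

# Demonstration with a temporary log file.
demo_log=$(mktemp)
printf 'step 1 residual 0.5\nstep 2 residual 0.25\n' > "$demo_log"
if log_has_nan "$demo_log"; then echo "nan found"; else echo "clean"; fi

printf 'step 3 residual NaN\n' >> "$demo_log"
if log_has_nan "$demo_log"; then echo "nan found"; else echo "clean"; fi

rm -f "$demo_log"
```

The first check prints "clean" and the second prints "nan found". In the real script this test runs inside a loop with sleep 10 between iterations.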

Usage:

  1. Run the program and send its screen output to a file, for example: nohup ./ttt > ~/log.txt 2>&1 &, or with tee: ./ttt | tee log.txt
  2. Find the program's pid with ps: ps aux | grep ttt | grep -v grep | awk '{print $2}'
  3. Start the monitor: ./checkNAN.sh pid log.txt
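
If the program is started in the background from the same shell, steps 1 and 2 can be combined, because bash stores the pid of the most recent background job in $!. The sketch below assumes that setup; sleep 30 is a placeholder standing in for ./ttt, and the temporary log file is hypothetical.

```shell
#!/bin/bash
# Start a stand-in job in the background; `sleep 30` is a placeholder for ./ttt.
log=$(mktemp)
nohup sleep 30 > "$log" 2>&1 &
pid=$!   # $! holds the pid of the most recent background job

echo "started pid $pid, logging to $log"
# At this point one would run: ./checkNAN.sh "$pid" "$log"

# Clean up the placeholder job and its log.
kill "$pid"
wait "$pid" 2>/dev/null || true
rm -f "$log"
```

This avoids the ps | grep pipeline, which can return more than one pid when several processes match the name.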

The script:

#!/bin/bash
# author: tlanyan
# link: <https://ssrvps.org/archives/4030>
set -e

usage() {
    echo "Usage: ./checkNAN.sh pid logfile"
}

argc=$#
if [ $argc -lt 2 ]
then
    usage
    exit 1
fi

PID=$1
LOGFILE=$2

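# Look up the command name by exact pid (grepping ps output can match unrelated lines).
# "|| true" keeps set -e from aborting when the pid does not exist.
COMMAND=$(ps -p "$PID" -o comm= || true)
if [ "$COMMAND" = "" ]; then
    echo "unknown pid: $PID"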
    exit 1
fi

if [ ! -e "$LOGFILE" ]; then
    echo "log file does not exist: $LOGFILE"
    exit 1
fi

echo "watch pid: $PID($COMMAND) for log file: $LOGFILE"

count=0
while true
do
    # Check that the watched pid is still alive; "|| true" protects against set -e.
    ret=$(ps -p "$PID" -o comm= || true)
    if [ "$ret" = "" ]; then
        echo "process quit!"
        exit 0
    fi
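    # Capture the matches once so detection and reporting see the same lines.
    # "|| true" stops set -e from aborting when grep finds nothing.
    matches=$(tail "$LOGFILE" | grep -i nan || true)
    if [[ -n "$matches" ]]; then
        echo "nan detected!"
        echo "$matches"
        echo "kill process"
        kill -9 "$PID"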
        echo "watch exit"
        exit 0
    fi

    count=$((count+1))
    if [[ $(($count%6)) -eq 0 ]]; then
        date=$(date +'%Y-%m-%d %H-%M-%S')
        echo "$date: no nan detected..."
    fi

    sleep 10
done
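
One caveat about the pattern: a plain substring match on nan also fires on words such as "nanoseconds" or "finance". If the log contains such text, a whole-word match may be a safer choice. The sketch below is a possible variation, not part of the original script; contains_nan_word is a hypothetical helper, and it relies on grep's -w option.

```shell
#!/bin/bash
# contains_nan_word is a hypothetical helper: it succeeds only when "nan"
# appears as a whole word (case-insensitive) on standard input.
contains_nan_word() {
    grep -qiw nan
}

if echo "elapsed: 120 nanoseconds" | contains_nan_word; then
    echo "match"
else
    echo "no match"
fi

if echo "residual = NaN" | contains_nan_word; then
    echo "match"
else
    echo "no match"
fi
```

The first line is not reported and the second is. Values such as -nan or nan, still match, because the surrounding characters are not word characters.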