When a Linux server reboots, it records data in a couple of different logs.
The first is the last log, which records every login, where each user logged in
from, and every restart of the system. The second is the messages log, which is
the main system log for the server. In this article, $USER stands in for the actual user name.
Checking the last log:
SSH to the server and run last | grep reboot. This outputs every reboot the
server has recorded, with the newest at the top of the list.
The output will look something like this:
[adminuser@AFS-SL-OSCAD01 ~]$ last | grep reboot
reboot system boot 2.6.32-220.7.1.e Thu May 17 13:43 - 04:59 (76+15:15)
reboot system boot 2.6.32-220.7.1.e Tue Apr 10 18:50 - 13:42 (36+18:52)
reboot system boot 2.6.32-220.7.1.e Thu Mar 22 13:42 - 18:50 (19+05:07)
reboot system boot 2.6.32-131.21.1. Tue Dec 20 15:42 - 13:42 (92+20:59)
reboot system boot 2.6.32-131.21.1. Tue Dec 20 00:48 - 15:34 (14:45)
reboot system boot 2.6.32-131.21.1. Tue Dec 20 00:17 - 00:27 (00:10)
reboot system boot 2.6.32-131.21.1. Mon Dec 19 23:44 - 00:08 (00:23)
reboot system boot 2.6.32-131.21.1. Mon Dec 19 23:35 - 23:42 (00:07)
reboot system boot 2.6.32-131.21.1. Mon Dec 19 23:24 - 23:27 (00:03)
reboot system boot 2.6.32-131.21.1. Thu Dec 15 17:53 - 23:23 (4+05:30)
reboot system boot 2.6.32-131.0.15. Thu Dec 15 17:32 - 17:52 (00:19)
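One caveat with grep reboot is that it matches the word anywhere on the line, including in a host name. Here is a minimal sketch of a stricter filter that only looks at the first column. The function name list_reboots and the alice login line are made up for illustration; the two reboot lines are taken from the output above.

```shell
#!/bin/sh
# Sketch: keep only the reboot records from `last`-style text on stdin.
# list_reboots is a hypothetical helper name, not a standard tool.
list_reboots() {
    # Compare the first field only, so a login from a host whose name
    # happens to contain "reboot" is not reported as a reboot.
    awk '$1 == "reboot"'
}

# Two reboot lines from the article plus one invented login line.
sample='reboot system boot 2.6.32-220.7.1.e Thu May 17 13:43 - 04:59 (76+15:15)
alice pts/0 reboot-lab.example Thu May 17 13:50 - 14:00 (00:10)
reboot system boot 2.6.32-220.7.1.e Tue Apr 10 18:50 - 13:42 (36+18:52)'

printf '%s\n' "$sample" | list_reboots

# On a live server the same filter would be used as:
#   last | list_reboots
```

Running last reboot should give a similar result without any filter, since reboot is treated as a pseudo user by last.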
Let's look at the May 17th reboot. Running last on its own outputs the entire
log, which is relatively short and easier to follow than most logs.
[adminuser@AFS-SL-OSCAD01 ~]$ last
mteng pts/1 xxx.xxx.xxx.xxx Thu May 17 23:05 - 23:37 (00:32)
$USER pts/0 xxx.xxx.xxx.xxx Thu May 17 13:43 - 13:45 (00:01)
reboot system boot 2.6.32-220.7.1.e Thu May 17 13:43 - 05:03 (76+15:19)
$USER pts/1 xxx.xxx.xxx.xxx Thu May 17 13:41 - down (00:00)
I trimmed the output to show only the data around the 17th. The reboot line and
the $USER line that ends in down are tied to the reboot. A session that ends in
down was still open when the system went down, which typically indicates that
user did the reboot; in this case it was $USER. You can also see the user logged
in from the IP xxx.xxx.xxx.xxx, which helps identify the person who did the
reboot. To confirm, you can check the user's bash history and see what command
they typed.
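On a longer last log, picking out those sessions by eye gets tedious. The sketch below pulls out the user, source address and end state for every session that ended in down or crash. The name sessions_at_shutdown is made up, and the sketch assumes remote logins, where the third column is the source address as in the output above.

```shell
#!/bin/sh
# Sketch: report sessions that were still open when the system went down.
# sessions_at_shutdown is a hypothetical helper name.
# Assumption: column 3 is the source address (true for the remote
# logins shown in the article, not for every kind of session).
sessions_at_shutdown() {
    awk '$1 != "reboot" {
        # Start at field 4 so a user or tty name is never mistaken
        # for the end state.
        for (i = 4; i <= NF; i++)
            if ($i == "down" || $i == "crash") {
                print $1, $3, $i
                break
            }
    }'
}

# Sample input: the trimmed output above, with plain hyphens.
sample='mteng pts/1 xxx.xxx.xxx.xxx Thu May 17 23:05 - 23:37 (00:32)
$USER pts/0 xxx.xxx.xxx.xxx Thu May 17 13:43 - 13:45 (00:01)
reboot system boot 2.6.32-220.7.1.e Thu May 17 13:43 - 05:03 (76+15:19)
$USER pts/1 xxx.xxx.xxx.xxx Thu May 17 13:41 - down (00:00)'

printf '%s\n' "$sample" | sessions_at_shutdown

# On a live server: last | sessions_at_shutdown
```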
To access the user's bash history, cd to /home/$USER (substituting the actual user name) and look for .bash_history:
[root@AFS-SL-OSCAD01 adminuser]# cd /home/$USER/
[root@AFS-SL-OSCAD01 $USER]# ls -al
total 298096
drwxr-xr-x. 16 $USER $USER 4096 Jul 26 19:15 .
drwxr-xr-x. 19 root root 4096 Apr 11 16:06 ..
dr-x------. 2 $USER $USER 4096 Jan 18 2012 akamai
drwxrwxr-x. 2 $USER $USER 4096 Mar 13 13:47 akamai-test
drwxrwxr-x. 2 $USER $USER 4096 Jul 10 11:42 akamai-test-backup
drwxrwxr-x. 6 $USER $USER 4096 Jul 26 19:19 backup
-rw-------. 1 $USER $USER 24865 Aug 1 18:26 .bash_history
-rw-r--r--. 1 $USER $USER 18 Dec 20 2011 .bash_logout
-rw-r--r--. 1 $USER $USER 253 Dec 20 2011 .bash_profile
As you can see, the history file has some size to it. You can page through it
with less .bash_history, or run cat .bash_history | grep boot to pull out only
the lines that contain the word boot:
[root@AFS-SL-OSCAD01 $USER]# cat .bash_history | grep boot
sudo reboot
You can see the user rebooted the server using the reboot command.
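Searching for boot alone misses other ways to restart a server and also matches unrelated words. A slightly broader search might look like the sketch below. The function name find_restart_commands, the list of commands and the sample history lines are my own assumptions, so adjust them for your environment.

```shell
#!/bin/sh
# Sketch: find restart-type commands in a bash history read from stdin.
# find_restart_commands is a hypothetical helper name, and the command
# list is an assumption about what users typically type.
find_restart_commands() {
    # -n prints the line number of each match, -w matches whole words
    # only (so "bootstrap" is ignored), -E enables the | alternation.
    grep -nwE 'reboot|shutdown|halt|poweroff|init 6'
}

# Invented sample history for illustration.
sample='cd /var/log
ls -al
sudo reboot
vi bootstrap.sh'

printf '%s\n' "$sample" | find_restart_commands

# On a live server:
#   find_restart_commands < /home/$USER/.bash_history
```

With the sample above, only the sudo reboot line is reported; the bootstrap.sh line, which a plain grep boot would also return, is skipped.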
However, if the server restarted as a result of a crash, you will see the following
in the last log:
$USER pts/1 xxx.xxx.xxx.xxx Thu May 17 13:41 - crash (00:00)
If you see that, you will need to go through the messages file and look at the
period just before the restart for errors in the OS layer that could indicate the
cause. The best method is to run less /var/log/messages and page through the file up to the reboot time. The grep commands below may shorten the search; the -i option makes the match case-insensitive:
grep -i error /var/log/messages
grep -i panic /var/log/messages
grep -i fail /var/log/messages
Any of these can surface possible causes of the issue. A common one is the server
running out of memory, for example because of a memory leak in a Java application.
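The separate searches can also be combined into one pass. This is only a sketch: scan_messages is a made-up name, the out of memory and oom-killer patterns are my assumption about how the kernel words those events, and the sample lines are invented for illustration rather than copied from a real log.

```shell
#!/bin/sh
# Sketch: one case-insensitive pass over syslog-style text on stdin.
# scan_messages is a hypothetical helper name; the pattern list is an
# assumption and will need tuning for a real server.
scan_messages() {
    # -i ignores case, -n shows line numbers, -E enables alternation.
    grep -inE 'error|panic|fail|out of memory|oom-killer'
}

# Invented sample lines, NOT real log output.
sample='May 17 13:39:58 host app: started worker
May 17 13:40:01 host kernel: Out of memory: Kill process 1234 (java)
May 17 13:40:02 host app: request FAILED'

printf '%s\n' "$sample" | scan_messages

# On a live server:
#   scan_messages < /var/log/messages | less
```

The line numbers in the output make it easier to jump to the same spot when you open the full file in less.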
If the last log does not show a reboot, you can confirm by running the uptime
command, which reports how long the server has been up since the last
restart:
[root@AFS-SL-OSCAD01 $USER]# uptime
05:37:51 up 76 days, 15:54, 1 user, load average: 0.00, 0.00, 0.00
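If you want the same figure in a script, the first field of /proc/uptime holds the uptime in seconds, and the conversion is simple arithmetic. The sketch below uses a made-up helper name, format_uptime, and a sample value chosen to match the 76 days, 15:54 shown above.

```shell
#!/bin/sh
# Sketch: turn an uptime in whole seconds into "D days, HH:MM".
# format_uptime is a hypothetical helper name.
format_uptime() {
    secs=$1
    days=$((secs / 86400))              # 86400 seconds per day
    hours=$(((secs % 86400) / 3600))    # hours left over after whole days
    mins=$(((secs % 3600) / 60))        # minutes left over after whole hours
    printf '%d days, %02d:%02d\n' "$days" "$hours" "$mins"
}

# 76 days + 15 hours + 54 minutes = 6623640 seconds.
format_uptime 6623640

# On a live Linux server:
#   format_uptime "$(cut -d. -f1 /proc/uptime)"
```

The who -b command is another option, as it prints the time of the last system boot directly.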
Check the last log to verify whether it was a crash or a manual restart.
If it was a manual restart, take the appropriate action.
I hope this helps you diagnose a crash in Linux. Let me know if it works for you.