I am looking for opinions on server reboots.
I have spoken with several people about the subject and have come to the conclusion that rebooting a server periodically is just about a must to keep the system clean. By clean, I mean running fsck, removing zombies and generally improving system performance.
Please tell me what you think about this and if you agree, what timeframe would be best.
Thanks,
Craig
Yes, it is something you should do.
Timeframe... well, I'd like to do it once a week on every box. I'm amazed if I can reboot once every couple of weeks, so I have learned to wait patiently (until something fails, that is) and try to reboot at least once a month.
I can't say I reboot on any periodic basis. My shop is 7x24 with LIMITED downtime, and downtime is regarded as anathema.
Depending on the applications running, I periodically search for and destroy orphan processes. I recently had a case (on 10.20) where I could not close or destroy a socket, and therefore could not reuse its port number until a reboot. Rather than reboot, it was easier to assign the application another port.
While I do look for opportunities to reboot (for reasons of patch application, etc.), I do not schedule periodic reboots.
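A minimal sketch of the orphan search mentioned above, assuming orphans show up with parent PID 1 and that processes owned by root (or by users you list in the hypothetical SKIP_USERS variable) are legitimate daemons. It only reports; deciding what to kill stays a manual step, since many normal daemons also have PPID 1.

```shell
#!/bin/sh
# Hypothetical helper: report processes whose parent is init (PPID 1),
# skipping users listed in SKIP_USERS (default: root).
# Input on stdin: "user pid ppid command" per line, for example from
#   UNIX95=1 ps -e -o user= -o pid= -o ppid= -o comm=
# (on HP-UX, ps -o generally needs UNIX95 set in the environment).
find_orphans() {
    awk -v skip="${SKIP_USERS:-root}" '
    BEGIN { n = split(skip, u, " "); for (i = 1; i <= n; i++) ignore[u[i]] = 1 }
    $3 == 1 && !($1 in ignore) { print $2, $1, $4 }'
}
```

Each output line is "pid user command", so the list can be reviewed before anything is killed.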
Most HP internal projects have a regular maintenance period once per quarter. So every 3 months we are allowed to have a server for a weekend to install, say, a new quarterly GR bundle, just reboot to clean things up, or do other maintenance.
I've seen servers that were cloned, and thus very tightly controlled, which have been up and running fine for over 3 years (10.20), so you certainly don't NEED to reboot regularly. But as HP releases patch bundles every quarter, rebooting quarterly to install them (which you should be doing to stay up to date) seems like a logical timeframe.
I have one weekend a month that I get the machines and do maintenance (patches, db work, etc.) and do at least one reboot. I have gone 2 months without a reboot, but I prefer to reboot the machines once a month whether they need it or not.
I would be happy if I could do monthly maintenance (...), but I can still dream. The only cases I have are:
System panic (very very rare)
New kernel
Patches
But when I say a machine needs rebooting after a patch install, the usual reply is: install the patch ONLY if we are in trouble.
So as you can see, if it's up to you, I'd say once a month followed by full diagnostics could be a good thing, and a simple weekly reboot would reinitialize all the logs and clean things up.
I reboot my servers once a month on a set schedule. I have negotiated with all my users for a half hour on a particular day (usually Sunday morning). I have 30 servers and stagger the reboots over the month -- 1 server reboots at 1:30 the 1st Sunday of the month; another server at 2:30 the 1st Sunday; another at 1:30 the 2nd Sunday, etc. It works out pretty well. Depending on your application, I find once a month is fine.
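Cron has no direct way to say "the second Sunday of the month": when both the day-of-month and day-of-week fields are restricted, cron runs the job if either one matches. One common workaround for a staggered schedule like the one above is to schedule the job every Sunday and let the script check which week it is. A minimal sketch; the function and script names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if day-of-month $1 and day-of-week $2
# (0 = Sunday, as printed by date +%w) fall on the Nth ($3) Sunday.
is_nth_sunday() {
    dom=${1#0} dow=$2 n=$3     # strip a leading zero so "08" is not read as octal
    [ "$dow" -eq 0 ] || return 1
    # Days 1-7 are the first week, 8-14 the second, and so on.
    [ $(( (dom - 1) / 7 + 1 )) -eq "$n" ]
}

# Example crontab entry for the server due at 1:30 on the second Sunday
# (script path hypothetical):
#   30 1 * * 0 /usr/local/bin/nth_sunday_reboot.sh 2
# where nth_sunday_reboot.sh does:
#   is_nth_sunday "$(date +%d)" "$(date +%w)" "$1" && /usr/sbin/shutdown -r -y now
```

Each server then gets its own hour and week number in its crontab, which reproduces the 1:30 / 2:30 stagger described above.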
Server reboots are something we all seem to like to do and as said before it gives the system a chance to clean itself up.
Being in a 24/7 environment means that I get approx. 2 hours/month to patch, fix and reboot, although last year one of my production servers was up and live for over 11 months.
We then had the Spanish Inquisition as to why I wanted to reboot it.
Most Sys Admins would love to have more time offline but we do not always get it.
That is fine, provided you are watching what is going on and controlling, trapping or killing processes where necessary.
Also, if you can use one machine to monitor another, then there is no need to panic.
Hey, we'd all love to have regular downtime on every server. It makes our jobs easier. It also prevents our users from working.
I have negotiated a monthly reboot & PMC window on most servers in my present environment. But if processing needs override it, the reboot doesn't happen. Some problems require a reboot to correct, many others are easier to correct with a reboot, but on a well-managed box you should not *need* to reboot frequently.
Of course, this assumes that your applications programmers are capable of writing highly-specialized code like "free()". Bad code ==> downtime. Always. There's only so much an admin can do.
I believe that if a system's uptime is high, the application and OS are functioning pretty well; it also depends on the service level agreement (i.e. 24x7 etc.). Therefore we should have relevant scripts running on the servers to monitor for rogue processes, disk usage etc., and reboot only when applying a patch that requires it, or when the internal predict modem is locked and the only way to unlock it is to reboot the server.
If we do have a problem on the server, the easiest option is to reboot, but I believe we must avoid this unless necessary and find the source of the problem (how many times do we reboot?).
We must monitor the system at all times and be proactive.
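As one example of the monitoring scripts suggested above, here is a minimal disk-usage check. It assumes POSIX `df -P` output (Filesystem, blocks, Used, Available, Capacity, Mounted on); on HP-UX, `bdf` prints the same columns but may wrap a long device name onto a second line, which this sketch does not handle. The function name, threshold and mail address are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read `df -P` style output on stdin and print
# "mountpoint capacity" for every filesystem at or above $1 percent.
disk_over() {
    awk -v t="$1" 'NR > 1 {
        pct = $5; sub(/%$/, "", pct)      # "95%" -> "95"
        if (pct + 0 >= t + 0) print $6, $5
    }'
}

# Possible cron use (address hypothetical); only mails when something is over:
#   out=$(df -P | disk_over 90)
#   [ -n "$out" ] && echo "$out" | mailx -s "disk usage on $(uname -n)" admin@example.com
```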
A well managed (read: a busy cron system) HP-UX system can run for years without a reboot. I have run a mail server with 500 users for about 7 years, the longest time between reboots was 3 years.
Zombies are caused by bad programming, so when they occur, I look to the owner of the processes to fix their code. Log files always have to be trimmed, runaway processes must be monitored for CPU time and/or RAM, and so on.
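Trimming a log from cron is a typical repetitive task of the kind described here. A minimal sketch, with a hypothetical function name: it keeps the last N lines and copies them back with `cat` rather than `mv`, so the file keeps its inode and a daemon that has it open in append mode keeps writing to the same file. Lines written between the `tail` and the copy are lost, so run it at a quiet time.

```shell
#!/bin/sh
# Hypothetical helper: keep only the last $2 lines of log file $1.
trim_log() {
    log=$1 keep=$2
    tmp=${TMPDIR:-/tmp}/trim_log.$$
    tail -n "$keep" "$log" > "$tmp" || { rm -f "$tmp"; return 1; }
    # cat > rewrites the same inode; mv would leave writers on the old file.
    cat "$tmp" > "$log"
    rm -f "$tmp"
}
```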
PC users have created more problems for Unix sysadmins because they tend to crash and leave orphaned processes (especially Xwindows) which consume CPU time *AND* LAN bandwidth. Unfortunately, there are no unique characteristics that identify an orphaned Xwindow process so I look for CPU accumulation over several time periods and filter out root and other special user processes.
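The CPU-accumulation approach above can be sketched as a comparison of two process snapshots taken some minutes apart; running it over several intervals matches the "several time periods" idea. Assumptions, all hypothetical: snapshots hold "user pid time" lines, for example from `ps -e -o user= -o pid= -o time=`, with time as [dd-][hh:]mm:ss; users listed in SKIP_USERS are ignored; and the threshold is in CPU seconds.

```shell
#!/bin/sh
# Hypothetical helper: compare two "user pid time" snapshots ($1 = earlier,
# $2 = later) and print "pid user seconds" for processes not owned by a
# user in SKIP_USERS (default: root) that used at least $3 CPU seconds
# in between.
cpu_growth() {
    awk -v t="$3" -v skip="${SKIP_USERS:-root}" '
    # Convert [dd-][hh:]mm:ss to seconds.
    function secs(x,   d, n, a, i, s) {
        d = 0
        if ((i = index(x, "-")) > 0) { d = substr(x, 1, i - 1); x = substr(x, i + 1) }
        n = split(x, a, ":"); s = 0
        for (i = 1; i <= n; i++) s = s * 60 + a[i]
        return d * 86400 + s
    }
    BEGIN { n = split(skip, u, " "); for (i = 1; i <= n; i++) ignore[u[i]] = 1 }
    # First file: remember each process, keyed by user and PID.
    NR == FNR { before[$1, $2] = secs($3); next }
    # Second file: report processes present in both snapshots.
    !($1 in ignore) && (($1, $2) in before) {
        used = secs($3) - before[$1, $2]
        if (used >= t + 0) print $2, $1, used
    }' "$1" "$2"
}
```

Keying on user and PID together avoids blaming a new process that happened to reuse an old PID under a different user.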
Essentially, I look for repetitive tasks that require intervention and write a cron script/program to take care of the problem. 24x7 is not that unusual but most of the machines have very limited tasks and logins. That makes sysadmin easier too.
I see that I am one of the fortunate few that has the luxury of a two hour window in the evening on a daily basis for system maint.
I have to give credit to my DBA for that!
As we are not a 24x7 shop (yet), I have decided to reboot the system monthly on the first Sunday. We run Informix and PeopleSoft, and this combination can take its toll on the system, so the periodic reboots should make my life a bit easier.
Man, did you get some varied responses to that question, or what? I just wanted to add my two cents and give my spin on the subject. I reboot my two K370s weekly on Saturday at 18:00 with a cron job. I have a text pager that I send a message to via mailx in the script when the machine comes down and then back up. This is also handy if the machine should reboot after hours for some reason. The cron entry looks like this:
1 18 * * 6 /usr/sbin/shutdown.sh >> /dev/null 2>&1
That is: minute 1, hour 18 (6 PM), any day of the month, any month, but only on Saturday (day 6). The redirection to /dev/null discards all output, including errors.
The script shutdown.sh contains
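#!/usr/bin/sh
# Work from / so this script's cwd doesn't keep a filesystem busy.
cd /
# Page first: once shutdown starts, anything after it may never run.
/usr/bin/mailx -s "ServerX is coming down" pager#@something.somewhere.com < /dev/null
/usr/sbin/shutdown -r -y now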
This runs shutdown with -r (reboot), -y (answer yes to all prompts, so it runs unattended) and now (no grace period). Then run a script during boot to send the message:
/usr/bin/mailx -s "ServerX is up" pager#@something.somewhere.com < /dev/null
This could be an email address as well. Hope this helps!
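For the "script during boot" step, HP-UX (10.x and later) runs start/stop scripts from /sbin/init.d through S and K links in the /sbin/rcN.d directories. A minimal sketch in that style follows; the script name pageup, the link name S990pageup and the MAILER/PAGER_ADDR variables are hypothetical, so check the sequence number against your own rc directories.

```shell
#!/bin/sh
# Hypothetical /sbin/init.d/pageup, linked as /sbin/rc3.d/S990pageup
# (HP-UX rc scripts conventionally use #!/sbin/sh).
# Sends a page once the system has come up to run level 3.
PATH=/sbin:/usr/sbin:/usr/bin:/bin
export PATH
MAILER=${MAILER:-/usr/bin/mailx}
PAGER_ADDR=${PAGER_ADDR:-pager#@something.somewhere.com}

rc_main() {
    case "$1" in
    start_msg) echo "Sending boot page" ;;
    stop_msg)  echo "Nothing to do for boot page" ;;
    start)     "$MAILER" -s "$(uname -n) is up" "$PAGER_ADDR" < /dev/null ;;
    stop)      return 0 ;;
    *)         echo "usage: $0 {start|stop|start_msg|stop_msg}"; return 1 ;;
    esac
}

# The rc framework always passes an argument; only dispatch when one is given.
if [ "$#" -gt 0 ]; then
    rc_main "$1"
    exit $?
fi
```

Using `uname -n` in the subject means the same script can be dropped onto every box without editing the server name.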