Good administrators know their systems require regular maintenance to avoid costly or embarrassing downtime. But what’s the worst-case scenario if the maintenance doesn’t get done? Might people die? In the following case, it almost happened…
On 9/15/04, the Associated Press reported that a critical FAA radio facility providing control services for California, Arizona, and Nevada airspace had shut down. The result was grounded planes and inconvenienced travelers all over the West Coast and, perhaps more frightening, several near collisions.
“Where were the backup systems?” you might rightly ask. According to the FAA, they weren’t configured properly to take over in the event of a primary system failure. The primary system failed because a 30-day scheduled maintenance task wasn’t performed, and the system is configured to shut down automatically if the maintenance interval passes!
Granted, if the systems most of us manage fail, no one is likely to suffer physical harm. However, a failure could gravely affect thousands of people and perhaps endanger your job.
Ask yourself:
1. Do your primary systems have backups? Have you tested them – ever?
2. Do you have a policy that dictates how often they are tested? Do you log these tests?
3. Do you use daily/weekly/monthly run sheets to ensure continuity?
4. How often do you patch your systems, and how often do you review and rotate system and security logs?
5. Do you keep system change logs?
6. Have you incorporated remote system health monitoring with proactive alerts into your network planning?
7. Where are your areas of weakness, and have you planned to address them?
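A run sheet like the one in question 3 can start as a very small script. The following is a minimal sketch in Python, assuming each task has a fixed interval in days and a recorded last-completion date. The task names, the `Task` class, and the three-day warning window are hypothetical, chosen only for illustration. It flags anything overdue or coming due, so a lapsed 30-day task shows up on a report before it shows up as an outage.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Task:
    """One line of a hypothetical maintenance run sheet."""
    name: str
    interval_days: int   # how often the task must be performed
    last_done: date      # when it was last completed

    def due_date(self) -> date:
        return self.last_done + timedelta(days=self.interval_days)


def classify(task: Task, today: date, warn_days: int = 3) -> str:
    """Return 'overdue', 'due soon', or 'ok' for a task as of `today`."""
    days_left = (task.due_date() - today).days
    if days_left < 0:
        return "overdue"
    if days_left <= warn_days:
        return "due soon"
    return "ok"


def run_sheet_report(tasks, today, warn_days=3):
    """Return (task name, status) for every task that needs attention."""
    report = []
    for task in tasks:
        status = classify(task, today, warn_days)
        if status != "ok":
            report.append((task.name, status))
    return report


if __name__ == "__main__":
    # Example data only; a real run sheet would load this from a file or database.
    today = date(2024, 2, 2)
    tasks = [
        Task("Monthly server maintenance", 30, date(2024, 1, 1)),
        Task("Test restore from backup", 7, date(2024, 1, 28)),
        Task("Rotate security logs", 30, date(2024, 1, 20)),
    ]
    for name, status in run_sheet_report(tasks, today):
        print(f"{name}: {status}")
```

With the example data, the monthly maintenance task is reported as overdue and the restore test as due soon, while the log rotation is left off the report. Scheduling the script to run daily and mailing its output is one simple way to turn it into a proactive alert.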
Every network, no matter how small, has room for improvement. Don’t have the budget for some of these tasks? Invest in your own career and learn some basic scripting in WSH, Python or Perl and build the tools you need, or turn to the open source community for help. Regular testing of backup systems, maintenance and restore plans has become a primary focus at WESCO Net.
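As an example of the kind of tool a little scripting makes possible, here is a minimal Python sketch of a restore test. It assumes you have already restored a backup into a scratch directory by whatever means your backup software provides; the function names and directory layout are hypothetical. The script only compares the restored files against the originals by SHA-256 checksum and lists anything missing or different.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Return a list of problems found in the restored copy (empty if none)."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"{rel.as_posix()}: missing")
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"{rel.as_posix()}: differs")
    return problems
```

Calling `verify_restore("/data", "/restore-test/data")` (both paths are placeholders) returns an empty list when every source file was restored intact. Files that changed after the backup was taken will also be reported as different, so the comparison is most meaningful against a snapshot or a quiet data set. Writing the result to a dated log file gives you the test record that question 2 asks about.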
Remember, there’s always room for improvement, and it could be argued the journey never ends. But it’s your responsibility as a network administrator to get the ball rolling. Build the plan, then execute on it. One step in the right direction is one step further than you are right now, and your company and career will thank you.