Shouldn't Happen to a Dog

There is a saying in Washington about Washington: If you want a friend, get a dog. Ha! We system managers should be so lucky. We can't even be our own best friend. It's sad but true: we system managers won't cut ourselves any slack. We repeatedly put ourselves in jeopardy, often making the same mistakes time after time. We even break all the rules we impose on others. Don't believe me? See if you recognize any of these examples.

1. Hand crafted system management

Ah yes, the good old days. Peace, love and tear gas (I never inhaled). But here's a news flash, sunshine: for system managers, the '60s are dead. Predictable, repeatable tasks can and should be automated. If you can script it, you can schedule it. And if you can schedule it, you can automate it. So what are you waiting for? Do you like (take your pick): streaming jobs by hand; adjusting fences and priorities by hand; reading $STDLISTs; staring at the console waiting for that one important message? For this you went to college?

And yet, we (or our management) come up with lots of lame excuses for running a stone-age operation. Can't afford the automation products, don't trust automation, can't trap every error, blah blah blah. Those excuses may fly when you're small, but suddenly you have more systems, bigger systems, and manual management turns your shop into burn-out central. Now there's turnover costs, downtime costs, opportunity costs. Oh, and by the way, it's much more expensive to implement automated management in a large, busy environment than it is to grow automated management from a smaller environment. Perhaps some of us are just adrenaline junkies, or we fear not being needed. Get over it and automate already.

2. The disappearing act

A close personal friend of mine (okay, it was me) once made a change to Security/3000's SECURCON file, then left for an all-day meeting (SIG-SYSMAN) about 40 miles away. Guess what? None of the application users could log on after my change.
My pager almost vibrated off my belt from that one. And it made for some interesting meetings when I got back.

I have seen lots of cases where a system manager made a configuration change, installed a patch, or fussed with SYSSTART or UDCs, then immediately went home. Big mistake. If you're lucky, you live near your data center and can zip right back to repair the carnage that was discovered right away. If you're not lucky, first you don't discover your mistake until the worst possible moment (say, around the heaviest usage period the next day) and then you're forced to take the system down to fix the problem. Ouch.

3. A lack of planning on my part does constitute an emergency on your part

A variation on No. 1. We are the eternal optimists. No matter how invasive the procedure, everything will work out perfectly, right? How many PowerPatches must we install before we realize we must leave adequate time for testing the patched system and perhaps backing that sucka out? No really, this time HP (or your favorite vendor) has learned from past mistakes and has a bullet-proof update. No need to leave a cushion for collateral damage. Right.

Every decent system administration book offers the same advice: Don't do anything you can't undo. Make a backup copy of whatever you're changing. Keep track of the steps you followed. Be prepared to back out whatever you're doing. Because that contingency time can inflate your update schedule by hours, it's unlikely you can safely make a system change at any time other than weekends or holidays. So what do you call an HP 3000 system manager who insists on doing system maintenance in the middle of the work week? An MCSE.

4. I've got a secret

You make changes but don't tell anyone about them. Let's be charitable and say your changes worked as planned. Unfortunately, nobody knew you were going to make the change.
I have seen a change as innocuous as modifying the system prompt have unintended consequences (Reflection scripts looked for the old prompt and now wouldn't work). The term "system" implies interrelationships. Anything we do has a ripple effect. When we don't tell others that we're about to make a change ("they wouldn't let me do it if I told them!") we don't do ourselves any favors. I would love to hear other war stories under this category (hint, hint).

5. Trust no one

This probably explains all the peripherals you've bought that don't work with your HP 3000. But isn't the HP 3000 the most open system in the universe? A disk drive is a disk drive, right? The vendor told me the printer would work (and it costs much less than that HP printer). We do love our work, don't we? And we do get excited by all the possibilities of the technology. But sometimes (most times?) when the opportunity looks too good to be true, it is. And what a hassle it is when we're stuck with a device, bought and paid for, that we must get to work with our system. Now. Because we're out of space. Because the CFO doesn't like spending $25K for a big paperweight.

Another aspect of this issue arises with replacement parts. No names please, but I have seen systems with non-certified disk drives. Sure, they work, until there's a power failure. The customer didn't know they had this exposure because their maintenance company didn't think it was worth mentioning. Do your homework, and watch out for little green men with maintenance kits.

And last, but not least, is taking expert information at face value. My first experience on the HP rack (running a Series 70) was with an SE who told me how to shortcut an OS update. Sounded good, I could use the extra time because I was updating on a Wednesday night (see No. 3). Before I knew it, I was staring at this message on the console: Volume table destroyed, must reload.
After that, I dropped SE support, figuring I was quite capable of destroying my system without high priced assistance. These days, a reasonableness check should be applied to any advice from the HP Response Center. If you don't feel confident about what you've been told, post to the 3000-L Internet newsgroup (comp.sys.hp.mpe) and see what your peers have to say.

6. The odd couple

For every system management Oscar Madison, leaving old files around to clog up and slow down his system or creating his own collection of foo, temp, K or Q files, there is a Felix Unger counterpart out there, obsessively tidying up. Both personality types have been known to shoot themselves in the foot.

The slobs make their lives miserable by never archiving files, which eventually bites them when they run out of space and the backup takes ever longer. They also suffer from having multiple versions of all kinds of things on disk, running the risk of executing the wrong version or accessing the wrong file. And of course there are performance and security penalties for a messy system.

But the fastidious system manager also has issues. For one thing, being too diligent about cleaning up can result in missing files. Here is a case where automation can be a negative. Jobs that run every so often, archiving files that haven't been accessed for a certain amount of time, can wind up archiving a file just before you need it. Or, in my case, I once archived a file in the VESOFT account that hadn't been accessed in years, only to discover it was some kind of special file that had to be there, even though it was never accessed (go figure). Yes, it's still good to be conscientious about keeping your system tidy. Just don't overdo it.

You deserve a break today

If we can just step back and catch ourselves in dysfunctional behavior, we can start giving ourselves a break. We should not need to carry a pager, cell phone and laptop with us on vacation (for those brave enough to take a vacation, that is).
We should not spend most of our time at HP World on the phone explaining how to recover our systems or where critical files are hidden. We should not expect to get raises when we spend so much of our professional time performing tasks that an entry-level employee can handle. By cleaning up our acts, we can stop reacting to self-inflicted busy work, which will free up time for more important tasks like reading the NewsWire.
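
A postscript to No. 3: the advice to make a backup copy, keep track of the steps and be prepared to back out can be sketched in a few lines. This is only an illustration in Python on an ordinary file system, not MPE command syntax; the function name, the .bak suffix, the log list and the verify callback are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def change_with_backout(target: Path, new_text: str,
                        verify: Callable[[Path], bool],
                        log: List[str]) -> bool:
    """Apply a change to `target`, backing it out if `verify` fails.

    Returns True if the change was kept, False if it was backed out.
    All names here are illustrative assumptions, not a real tool.
    """
    backup = target.with_name(target.name + ".bak")

    # Step 1: make a backup copy of whatever you are changing.
    shutil.copy2(target, backup)
    log.append(f"backed up {target.name} to {backup.name}")

    # Step 2: make the change, keeping track of what was done.
    target.write_text(new_text)
    log.append(f"wrote new contents to {target.name}")

    # Step 3: test the result; verify() is a caller-supplied check.
    if verify(target):
        log.append("verification passed; change kept")
        return True

    # Step 4: back out by restoring the saved copy.
    shutil.copy2(backup, target)
    log.append("verification failed; restored backup")
    return False
```

The verify step is the part that costs schedule time. It could be as simple as a test logon, the kind of check that would have caught the SECURCON problem in No. 2 before anyone left the building.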
Copyright The 3000 NewsWire. All rights reserved.