Free Republic
Browse · Search
General/Chat
Topics · Post Article

A sysadmin's top ten tales of woe
The Register ^ | 14 June 2011 | Trevor Pott

Posted on 06/16/2011 11:45:11 AM PDT by ShadowAce

To: kevkrom
An unrelated story had to do with the mechanical folks trying to figure out how much the software weighed...

LOL! That's just awesome.

21 posted on 06/16/2011 12:38:31 PM PDT by ShadowAce (Linux -- The Ultimate Windows Service Pack)
[ Post Reply | Private Reply | To 20 | View Replies]

To: proxy_user; roamer_1; ShadowAce
The first time we ran the program in production, it overwrote the root filesystem, making our powerful Sun box with 28 processors and 28 gigs of memory worthless.

Was the application running under an account with root privileges, or was the root file system open to accounts with non-root privileges?

We immediately went into disaster recovery mode, and brought up production on the UAT server. Of course, the first thing they did was run the same program, which wiped out that machine as well.

ROFL

22 posted on 06/16/2011 12:48:02 PM PDT by rabscuttle385 (Live Free or Die)
[ Post Reply | Private Reply | To 6 | View Replies]

To: proxy_user

What, you didn’t get the memo from management? The software testing budget has been cut by 50%; blame the programmers.


23 posted on 06/16/2011 12:58:32 PM PDT by ImJustAnotherOkie (zerogottago)
[ Post Reply | Private Reply | To 9 | View Replies]

To: ShadowAce
"We just acquired an implementation company that also did installs of our largest competitor's software....

...

"...what do you mean the acquired company's guys were using their inside knowledge to access our competitor's confidential information??"

(actually, this is now more of a tale of woe for Legal)

24 posted on 06/16/2011 1:00:34 PM PDT by martin_fierro (< |:)~)
[ Post Reply | Private Reply | To 1 | View Replies]

To: ShadowAce
It just makes my stomach hurt, and I found myself reaching for aspirin, Rolaids, and whiskey.

/johnny

25 posted on 06/16/2011 1:09:53 PM PDT by JRandomFreeper (Gone Galt)
[ Post Reply | Private Reply | To 4 | View Replies]

To: ShadowAce

A buddy and his programming staff were given two weeks' notice after meeting a deadline with code that met the specification. They had been expecting an ‘atta boy’ or a ‘congrats’, not a pink slip. Management takes the code to customers, who love the new software but ask, ‘could you make it do X & Z too!?’

Management goes to my buddy with the request, and he tells them it would take little effort to make those enhancements, but all the new programmers they would have to hire would need a few months to get up to speed on the code before they could tackle the changes. ‘What about you and your staff?’ ‘Sorry, but we all have new jobs and all of us leave tomorrow. Why did you get rid of us all?’

They confessed that they wanted to get rid of all those expensive programmers to save money and look smart to their managers.


26 posted on 06/16/2011 1:29:31 PM PDT by pikachu (After Monday and Tuesday, even the calender goes W T F !)
[ Post Reply | Private Reply | To 1 | View Replies]

To: rabscuttle385

No, it was a bug in Solaris. It just produced a core dump bigger than 2 gigs when it failed, so Solaris interpreted the size as a negative number and wrote it backwards in the filesystem.

The Sun guys said, "Oh, you should have installed the OS patch for that."
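
For anyone wondering how a size just over 2 gigs can turn negative, here is a minimal sketch. It assumes the size was held in a signed 32-bit field, which the story implies but does not confirm, and the helper name as_signed_32 is made up for the illustration. It shows only the arithmetic, not what Solaris actually did internally.

```python
import struct


def as_signed_32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed two's-complement integer."""
    # Pack as an unsigned 32-bit value, then unpack the same bytes as signed.
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]


TWO_GIB = 2 * 1024**3  # 2147483648 bytes, one past the largest signed 32-bit value

# Just under 2 GiB still fits and stays positive.
print(as_signed_32(TWO_GIB - 1))

# At 2 GiB and above, the top bit is set and the value reads as negative.
print(as_signed_32(TWO_GIB))
print(as_signed_32(3 * 1024**3))
```

Any code that later treats such a value as a length or an offset is working with a negative number, which fits the "wrote it backwards" description above.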


27 posted on 06/16/2011 1:37:29 PM PDT by proxy_user
[ Post Reply | Private Reply | To 22 | View Replies]

To: ShadowAce

bookmark


28 posted on 06/16/2011 1:53:21 PM PDT by FourPeas ("Maladjusted and wigging out is no way to go through life, son." -hg)
[ Post Reply | Private Reply | To 1 | View Replies]

To: ShadowAce

I heard of a case back in the bad old days of removable platter drives where the admin got a call at home in the middle of the night to inform him that the primary copy of their data had failed. Not too concerned, he asked if they had mounted the backup copy, and was told that they had done so, only to find out that the problem was in the drive, when it destroyed the backup. Don’t know if they also had tape for a second layer of backup.


29 posted on 06/16/2011 2:00:46 PM PDT by Still Thinking (Freedom is NOT a loophole!)
[ Post Reply | Private Reply | To 1 | View Replies]

To: kevkrom
An unrelated story had to do with the mechanical folks trying to figure out how much the software weighed...

That one smells like an urban legend to me!

30 posted on 06/16/2011 2:04:52 PM PDT by Still Thinking (Freedom is NOT a loophole!)
[ Post Reply | Private Reply | To 20 | View Replies]

To: ShadowAce

Electrical company comes in to swing over facility power from an aging UPS to a new 50kW unit, big monster. This is in the middle of June in west central Florida a few years ago, so inevitably, the afternoon thunderstorms start to pop up around 6 PM and last sometimes late into the night, depending on the atmosphere.

Well apparently the electrical company tech didn’t want to be working on an indoor UPS, unplugged, at 10 PM during a lightning storm, and one Hell of a lightning storm it was! I lost power at my home, and I’m 25 miles from the DC. My pager starts going off at 11 PM, and I call in to our DR incident command center to find out our entire DC is black.

I rush over in the pouring rain to find out that the electrical “engineer” left the neutral and the ground unconnected on the new UPS and when a lightning strike hit directly to our ground looped rod, millions of volts of electricity streamed through the live wire, blew up 51 batteries in UPSes chained to the new one, and melted every single transformer in the building.

Needless to say, we stopped doing business with that company. It took us 28 hours to bring the entire DC back online, and we found out that not only were our tape backups not functioning due to magnetic interference during the storm, many of the servers deployed for our finance department were on RAID0 and over 8 years old (talking dishwasher Compaq 5500s here); you know the rest of the story.

Good list.


31 posted on 06/16/2011 2:08:36 PM PDT by rarestia (It's time to water the Tree of Liberty.)
[ Post Reply | Private Reply | To 4 | View Replies]

To: ShadowAce

Reminds me of the time our data center overheated at a company I worked for. Everything shut down at once. No warning, nothing.

Of course, the datacenter overheated after hours. Yours truly was on call.

The company was too cheap to install temperature sensors and alarms.

After the data center was down for over 24 hours, the company decided to spend a few dollars on appropriate monitoring and alarms.

Not only that, but the company had also fired the company that maintained and monitored our mainframe shortly before the disaster.

$200 million a year company and they wouldn’t spend a few thousand to maintain data integrity.

I left as soon as I could. I got tired of dealing with that kind of crap.


32 posted on 06/16/2011 2:12:32 PM PDT by stylin_geek (Never underestimate the power of government to distort markets)
[ Post Reply | Private Reply | To 12 | View Replies]

To: Jack Hammer

Most of these boil down to the software equivalent of a spare tire nobody’s checked the air level on since ever. Backup systems are nice, but you need to make sure they actually work. The other 10% are about making sure that whatever killed the primary doesn’t daisy chain to the backup. Basically, don’t change tires in the middle of the patch of stuff that popped the first one.


33 posted on 06/16/2011 2:12:47 PM PDT by discostu (Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn)
[ Post Reply | Private Reply | To 3 | View Replies]

To: ShadowAce

Forgot to add, I warned the company there were issues with various systems, but they wouldn’t listen.

I got tired of getting blamed for stuff I’d warned them about.


34 posted on 06/16/2011 2:18:15 PM PDT by stylin_geek (Never underestimate the power of government to distort markets)
[ Post Reply | Private Reply | To 12 | View Replies]

To: ShadowAce
The cleaner unplugged it

Pay attention to log files. More than once I have seen perfectly planned and executed offsite failovers felled because nobody realised the cleaner at the backup site was liable to unplug the servers, for example to charge an iPod. This is not an urban legend.

Then there was the manager of the building containing the mission-critical mainframe processing real time test data. He conducted a tour of his facility for some visitors and at one point in the tour he pointed out the main power switch to the mainframe - and cycled the switch off and back on!!! Scratch one expensive test, and scratch (quite literally) all the big, expensive hard disks supporting the operation.

Sigh . . .

On a smaller scale, there was the large computer which would go crazy every now and then.

Who knew that the steel wool pad on the floor cleaning machine would put iron filings in the air, or that they would randomly short out whatever printed circuit they settled on? Certainly not the janitor!


35 posted on 06/16/2011 2:53:01 PM PDT by conservatism_IS_compassion (DRAFT PALIN)
[ Post Reply | Private Reply | To 1 | View Replies]

To: ShadowAce

Hurricane Katrina made landfall directly over our manufacturing plant along the Gulf Coast in Mississippi. The computer center there was flooded to almost ceiling level. Our Dell storage array network with all the local servers, disk drives, etc. was completely submerged in a stinking, muddy mess.

Come to find out, our fancy ‘distributed’ document management system is a combination of central and local storage. Whenever someone ‘local’ would access a blueprint file in edit mode, the system would move the file from ‘central’ storage to ‘local’ storage - this to improve the speed of accessing the file.

The ‘local’ files were in a RAID5 configuration with weekly full and nightly incremental disk-to-disk backup... plus tape backups stored in the datacenter - now all ruined.

The off-site, month-old backup was in a local bank deposit box. But the bank did not open for about a month after Katrina. The bank-located backups recovered just fine to a sister plant located in TN. But within the lost month, some of the company’s blueprint files had been moved to local storage. In all, a few dozen critical blueprints from across the company existed only on the muck encrusted data disks.

Luckily, a company specializing in recovering data from damaged disks was able to retrieve all the lost engineering files. But not before several weeks had passed and over $100k had been spent...


36 posted on 06/16/2011 4:11:39 PM PDT by cheee (Good, Fast, Cheap ... you can only pick two...)
[ Post Reply | Private Reply | To 1 | View Replies]

To: ShadowAce

Between stupid users and self-inflicted pain, my horror stories are so numerous I just don’t know where to begin. Lol


37 posted on 06/16/2011 4:17:48 PM PDT by KoRn (Department of Homeland Security, Certified - "Right Wing Extremist")
[ Post Reply | Private Reply | To 1 | View Replies]

To: Lonesome in Massachussets; Matchett-PI; MinuteGal; mcmuffin; Bob Ireland
I feel your pain, pal.
I owned nine computer systems in my Corporation before those PEOPLE talked me into buying one of these evil things ...

It's been a long road, bud, but I made some good friends

38 posted on 06/16/2011 4:44:40 PM PDT by gonzo ( Buy more ammo, dammit! You should already have the firearms .................. FRegards)
[ Post Reply | Private Reply | To 5 | View Replies]

To: proxy_user
However, if the database becomes corrupt, and you are using physical mirroring, you now have two copies of a corrupt database in two data centers.

That depends on the software and array you are using. The company I work for uses proprietary protocols for several mirroring software systems that prevent that sort of thing. Well, not so much prevent as make it easily recoverable to the point in time when the corruption happened.

39 posted on 06/16/2011 7:05:00 PM PDT by Bloody Sam Roberts (If you think it's time to bury your weapons.....it's time to dig them up.)
[ Post Reply | Private Reply | To 9 | View Replies]

To: taxcontrol
RTO and RPO were both less than 2 hrs.

The company I work for has a customer who doesn't care what the cost is to protect the data for their enterprise. Their RTO and RPO are both zero. They demand it. They'll want an operational overview, but the details aren't relevant, nor is the cost. Whatever it takes to accomplish that, do it. Any number above zero costs them millions per minute.

This account has some very stressed but very wealthy Account Sales Reps.

40 posted on 06/16/2011 7:14:16 PM PDT by Bloody Sam Roberts (If you think it's time to bury your weapons.....it's time to dig them up.)
[ Post Reply | Private Reply | To 16 | View Replies]


Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson