Monday was my first day back from a week off, so I expected to be busy, and I was, but nothing major. I spent the day handling all the little crap that normally goes on, and at the end of the day I uttered the infamous words of doom: "Wow, that wasn't bad. I expected much worse." Oops.
I get home around 7 p.m. and start seeing emails that all of our servers are dropping offline. That isn't good: it means either we lost power long enough for everything to shut down, or we lost internet. I was hoping and praying it was the internet, but I could still ping the router, so I knew we were screwed. There wasn't much I could do at that point, so I just hung out and waited for notification that power was restored.
Around 10 p.m. I got word that it was back on, so I headed in to simply (I thought) boot everything back up and go home. I get there and start with both domain controllers, and while they boot I move on to the others. Right off I notice our content filter won't boot, which is bad but easy to bypass if it comes down to it. Then the file server comes up with two drives bad. I quickly realize this wasn't a simple power outage.
I go back to the DCs, and our primary comes up with several errors. Oh shit. I log in and move to the backup. It won't come up at all. OH SHIT. Back to the primary to start looking around: AD is still installed, but the data is completely wiped out. DNS is gone, same with DHCP. Now I realize just how screwed I am, but I want to know what on earth happened. From the event viewer it looks like power was restored and everything started to power on by itself (they're supposed to, but we still go in after an outage just to make sure nothing screwed up while booting, thus my trip). Halfway through booting we took a direct lightning strike to the building, which caused the UPSes to shut completely off rather than risk letting something through. Normally that would be good, but not in the middle of a boot. So it turns out I have lost both DCs, the content filter, one of our two terminal servers, and a day's worth of edits on our file server.
By now it is 1 a.m. and I have to have a DC back online by 7 a.m., so I get to work. I try a system state restore from backup, but the management server and tape library are farked as well. I happened to have a spare virtual server already built with nothing installed, so I decide to go with that as a temporary fix. I add the roles and promote it. Seems fine, except everything is taking longer than it should. I start replication and start building DHCP. It won't activate.
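For anyone following along at home, the promote-and-verify dance on a 2003/2008-era box looks roughly like this. This is a from-memory sketch, not a recipe, and the domain name is made up:

    :: promote the spare VM to a DC (launches the AD DS wizard)
    dcpromo
    :: after the reboot, run the default health tests, errors only
    dcdiag /q
    :: per-partner replication summary
    repadmin /replsummary
    :: confirm the box is advertising as a DC (example.local is a made-up domain)
    nltest /dsgetdc:example.local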
I spend an hour trying everything I know and everything my google-fu can turn up. Nothing. I decide to forget it for a minute and check out replication. Five errors, and the service has stopped. At this point I start debating whether this job is really worth the amount of work ahead of me, decide it is, and keep on. I finally get replication healthy enough that it sort of works, and DHCP activates but for some reason keeps deciding to deactivate. 7 a.m. rolls around and I don't have a true domain controller because it won't replicate, DHCP doesn't work right, and all the clients have only the dead primary and backup DCs for DNS, so they have no network access. They can get to our DR site because it is defined in the router, so they can work from there, but no one has used it in three years.
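My best guess in hindsight on the activate/deactivate dance: a domain-joined Windows DHCP server has to be authorized in Active Directory before it will hand out leases, and it periodically re-checks that authorization, so with AD half-broken the check probably kept failing and the service kept shutting itself down. The flailing looked roughly like this (the server name and IP here are made up):

    :: show inbound replication partners and their last errors
    repadmin /showrepl
    :: verbose replication test on this DC
    dcdiag /test:replications /v
    :: list the DHCP servers authorized in the directory
    netsh dhcp show server
    :: (re)authorize this server, then kick the service back on
    :: (dhcp-temp.example.local / 10.0.0.5 are placeholders)
    netsh dhcp add server dhcp-temp.example.local 10.0.0.5
    net start dhcpserver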
The CIO shows up, and I am expecting an 'attaboy for working all farking night, but instead I get a "WTF, WHY ISN'T THIS UP! THIS IS A NIGHTMARE! WE ARE F*&%ED! WHY DIDN'T YOU GET THIS WORKING!" The thoughts that ran through my head are not appropriate, so I will not share them, but needless to say I wasn't happy. I worked for two more hours with everyone and their dog stopping by to tell me the internet was down. Sorry you can't get to Facebook right now; f*&% off.
The boss tells me to go home, sleep, and come back ASAP. I go home, sleep for two hours, and come back. I start by showing every person how to log in to the DR site and how to work from there (click the friggin icon like you do on your own computer, moron). Around 5 p.m. I am finally able to start working on our shitstorm of data corruption.
Over the next three days I had to rebuild our terminal server, restore the file server, restore the management server for the backups (really fun when you are trying to restore, from backup, the server that manages the backups), and build a new secondary DC that we turned into the primary because the other one refused to work. By Friday I had everything online, held together with duct tape and wishful thinking. This week has blown, to say the least, and I am not done with it all yet. Ugh. I need several drinks.
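For the curious: "turning it into the primary" really means getting the FSMO roles onto the new DC, and since the old one refused to cooperate, that means seizing them instead of transferring them. Roughly this, from an elevated prompt on the new DC (the server name is made up, and the exact wording of the seize commands varies a bit between Windows versions):

    :: interactive ntdsutil session; indented lines are typed inside the tool
    ntdsutil
      roles
      connections
      connect to server dc-new.example.local
      quit
      seize pdc
      seize rid master
      seize infrastructure master
      seize schema master
      seize naming master
      quit
      quit

    :: afterwards, verify who holds the five roles
    netdom query fsmo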