Everything went down (long)

  • Everything went down (long)

    I actually submitted this big wall of text to TDWTF, but I'll post it here too for you.

    I just had an awesome day at work. The sarcastic, grey-haired kind of awesome.

    A little background: We are what's called a "Family Entertainment Center". Basically we have attractions like go-karts, laser tag, etc. built around a small kitchen and a gameroom (ticket machines, usually some arcade games). Our entire facility is built around a system I'll call Impale. At the heart of this system is a 20-some port 10/100 hub and an excessive-overkill Dell rackmount enterprise server that looks very out of place on its homemade wooden rack. This server is the backend for just about everything in the facility, but most importantly, it does all the heavy lifting in POS transactions and directly runs credit cards using PC Charge, which needs Internet access. As for the games and attractions, they're all fitted with cardswipe units. You get a card, swipe it at a machine, and it debits your card and starts the game. This all depends on a central infrastructure I'll get into shortly.

    The server has four NICs, only two of which are actually used. One goes to our "office" network, where our print server (read: $200-new eMachine with a USB inkjet attached to it) and business-y computers reside (mostly employee-owned laptops on a private Wi-Fi network). The other NIC goes out to the Impale network, which, from what I understand, is supposed to be sealed off from the office network to protect the machines therein (so they don't have to deal with Windows updates, I suspect). The problem is, the server is bridging the two -- the office network most definitely can see the Impale network, and vice versa. This becomes important later.

    So the topology goes something like this:
    The Internet line comes into the Main Router.
    Main Router is connected to a simple 8-port switch, and distributed between the switch and the router's ports are: 2 office machines, the security DVR, the Impale server, and Public Router.
    Public Router is just for our customers to access the Internet over our connection via Wi-Fi. That's all it does. There's nothing connected to it other than the line to Main Router.
    And of course, on the other side of Impale server is the Impale network, a pretty grand and mysterious contraption in and of itself. It's enough to say that Impale server and its lackey, known as Game Comm, run the whole show. Without them, NOTHING happens.

    We had been planning a move to ISP-provided VoIP phones, combined with replacing the schizophrenic RF mess we had been using for Internet access with a dedicated T1 right to the building. (We're far enough out of town to not even get cable TV, and a prohibitive distance from any telephone COs. It's dedicated lines or ugly wireless kludges. Or satellite.) The installation and changeover were to happen today. The ISP's installers were responsible for the phones, but as for our Internet access, they were to give me a single Cat5 running from the T1 cabinet to our main router, and I was to do the rest, being our one-and-only IT guy. Simple, just swap one cable out and swap in another, then change the router settings, right?

    Well, as it turns out, Main Router wasn't ours. It belonged to the ISP, and was part of their wireless package -- that is, now that we weren't using their radiowaves and instead had a direct line, we had to give it back and use our own router. One of my bosses, I call him Young Boss, decided we'd just pick up another router at Best Buy. I didn't like their offerings, but I did like the router we were using for the public Wi-Fi, so I suggested buying a cheap router for the customers and using the nice one we already had for ourselves. We went with that.

    So I get back, unhook the public router, get it configured on the new network. Everything's going swimmingly; it took the static IP, no problem, Internet access is there, speedtest.net shows typical T1 speeds (about 1.4 Mbit/s after overhead, in both directions). So, I make the switch, pulling everything out of the ISP's router and plugging it into ours. Everything looks right, though there are now a lot of blinking lights that weren't blinking before. In particular, the big Impale switch is abuzz with activity, every light flickering menacingly. Bizarre, but I'll look into it later, I thought. Bad move.

    As I'm going over to the server to make sure that it can see out, one of the managers, I'll call her Preggers, walks in. "Our registers are frozen! What did you do?" Alarm bells. I check out the server, it seems fine. I go out to see what the registers are doing, and sure enough the POS program is timing out, giving errors, freezing, doing anything but being useful. Not good: we now don't have registers. Period.

    I go back into the office and immediately call Impale support. Preggers is sitting at the server console, trying to do something, and she complains that it's unresponsive. Like, completely unresponsive. I stick to my call, relay what's going on with the registers, and then it happens.

    The Impale server, normally almost silent, kicked its fans into full gear. You might expect that kind of clatter in a datacenter, but in a quiet office like ours... it's like an alarm going off. About the right decibel level too. The server is now sitting on a frozen screen, fans screaming. The front LCD that normally shows off its service tag has now gone BLANK. I'm in a full-on panic: Our pretty $5000 server, the one piece of kit most critical to the business, has just gone completely apeshit, and I haven't a damned clue why. My first impulse was to push the power button, hoping a good shutdown jolt would snap it out of its haze, and it does manage to shut down more-or-less cleanly.

    It gets worse, though. When the server went AWOL, and later shut down, Game Comm lost its shit. All five zones were completely bone-up stopped. Every cardswipe across the entire building shut down. Our entire business, from the top all the way down, ground to a complete halt, in two minutes flat.

    Desperate to get some semblance of functionality back into the facility, I unhooked Impale server from the office network completely and did my best to restart everything that went down. Impale server hesitated, but it came up and got back to work. Game Comm got goofed up enough that there was a process in the way that wouldn't die, so it got the full reboot treatment too... and booted right up like nothing ever happened. And that's when I noticed it: the lights on the Impale hub had calmed down. Lots of steady lights, little activity. The link began to form in my head... I look over to the office routers.

    Sure enough, the activity indicators are going absolutely berserk. Something over there is flooding the network with... something... I gotta find out what it is. I start by inventorying everything that's connected to the router and switch, and that's when I notice it.

    There's two cables going from the router to the switch!

    Yes, such an elementary and trivial mistake brought our entire network to its knees! I have no idea how, but it was the culprit. After pruning the extra link and putting everything back together, the setup went perfectly smoothly. Impale server saw outside, PC Charge started flowing again. Impale support dialed into the server, did some diagnostic checks, spoke briefly with Dell and told us the server should be undamaged. Thank goodness for that.
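
On the "how": a likely explanation is a switching loop. Assuming the router's LAN ports behave as a small built-in switch and that neither device runs spanning tree (both are assumptions; the post doesn't say), a broadcast frame entering one switch gets flooded down both cables, and the other switch floods each copy back up the opposite cable. Ethernet frames carry no hop limit, so the copies never expire. The toy model below is only a sketch of that idea; `simulate_broadcast` and its parameters are made up for illustration and do not model real hardware timing.

```python
def simulate_broadcast(num_links, hops):
    """Count copies of ONE broadcast frame crossing the inter-switch links.

    Toy model (illustrative assumption): two switches joined by `num_links`
    parallel cables, no spanning tree. A switch floods a broadcast out of
    every inter-switch link except the one the frame arrived on.
    """
    # A host on switch 0 sends one broadcast; switch 0 floods it on every link.
    # Each frame in flight is (switch it is heading to, link it travels on).
    in_flight = [(1, link) for link in range(num_links)]
    counts = [len(in_flight)]

    for _ in range(hops - 1):
        next_flight = []
        for switch, ingress in in_flight:
            other = 1 - switch
            for link in range(num_links):
                if link != ingress:  # never send back out the ingress port
                    next_flight.append((other, link))
        in_flight = next_flight
        counts.append(len(in_flight))
    return counts


if __name__ == "__main__":
    print("1 cable: ", simulate_broadcast(1, 5))
    print("2 cables:", simulate_broadcast(2, 5))
```

With one cable the frame crosses once and is gone. With two cables, two copies of every broadcast circulate indefinitely, and each new broadcast (ARP, DHCP and so on) adds two more, which would be consistent with the frantic link lights. Because the server was bridging the office and Impale networks, such a storm would presumably have reached the Impale side as well.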

    That was one well-deserved lunch break. I took a leisurely stroll up the street to a family-oriented eatery; they have a really delicious pizza buffet.

  • #2
    I'd have had a HEART ATTACK. *offers drinks after work* Holy hell. Glad the system isn't spazzing anymore, but good grief I'd have panicked much worse than you did. LOL
