The School Tech

  • The School Tech

    I thought this would be the place to put this. I'll be adding on as I go, because this is going to be lengthy. Mods, if there are any issues, please let me know, and if this needs to be moved, please do so. I'm also keeping this as obfuscated as possible, to keep anyone from where I work from finding it.

    Hi all! It's been a long time...between family and work, I've been forced to be mostly a lurker. I figure it's beyond time I participated, and I need to vent a bit.

    Part 1 - Background

    March of last year, I graduated with my Bachelor's in ITS. At the time, I had 2 jobs - student tutor at my college, and overseer of a community center for a small village, providing computers and Internet access for people in the community. Graduating from college meant the tutoring job went away, and the community center job was ending at the end of June - that's when the contract would end.

    Yes, I was under contract. It was so limiting - it controlled how many hours I could work, and when and for how long the center could be open. That, and the village not finding another person to run it, finally made them close it down late last year. Anyway, I did make a promise that I'd never get into that type of situation again.

    I figured I had a few months of looking for employment, which, being in the Thumb State, was going to be difficult. With the college's help, I was hoping to find something...when I received an e-mail from one of my old college professors. He was the head of IT for one of the well-off public schools in the state, and essentially asked me if I was interested in a job working "37-40 hours" at the school. I couldn't wait to start - just excited that I had a job after school, and I knew my new boss well. I've also spent half my working life in education - 6+ years working at a University in the Thumb State, in a city that was one of the top 2 cities for homicides (and it's not Detroit), 2 1/2 years of student tutoring, and now a public school.

    I started the last 2 weeks of June, which were great - I spent them helping strengthen the wireless network in their buildings, and finding out the details of the job. The school was entering into a contract with 2 other schools - at the beginning of July, I'd be at, let's call it 'S' school, and would handle the day-to-day operations, keeping the computers and the network going and sending updates to my boss (OTD - Old Tech Director, for reasons that I'll get to later). He'd stop by once a week to take care of the financials and other paperwork, and to meet with me to start implementing changes.

    It sounded great - I'd essentially be in charge of a whole school district's network, with someone who I could learn a lot from. If I only knew what I'd be getting into...

    Part 2 coming tomorrow!

  • #2
    Wow. Cliffhanger City! Don't know whether congrats are in order or not. Guess I'll wait for the other shoe to drop, although I'm hopeful, given that this is GWC, not MiM.

    • #3
      "If only I knew what |I was getting into" does not inspire confidence. I'm guessing at a minimum, you have clueless staff on the school districts' side of things.

      • #4
        Hang on, hang on... Let me nuke some popcorn...

        • #5
          Part 2 - Where the network says "Hi!"

          ...Well, not so much says "Hi!" as gives me a wedgie and takes my lunch money (keeping with the school theme!)

          My first day starts out well - OTD takes me over to 'S' school to meet the office staff, including the superintendent. He's actually a great guy, doesn't stress over any problems, which is great for me.

          With that, OTD directs me to what will be my office, where he shows me what he knows from the previous tech person. Servers are all virtualized, VoIP phones and website are internal, proxy server for Internet, internal e-mail. For my first task, he has me go through and straighten things out in the office while he gets equipment ordered for a new wireless network.

          The second day is when everything goes south. I arrive to find that the VoIP phones aren't working, and there's no network/Internet connection. I'm able to get onto the virtualization program, where it shows all servers as Unknown. I go to the server room, and everything checks out physically...except for an amber blinking light on one of the server's hard drives.

          I call OTD, because I've never worked with virtualization before, not like this. We go through and try a few things to get the network at least working, but aren't successful. We decide to meet with the superintendent, where I find a few details out:

          1) The previous tech spent quite an amount of cash implementing this virtualization,
          2) Which he actually didn't implement: he had a company come in and install, and
          3) The 3 servers making up the virtualization work as a cluster.

          A bit of explaining: the servers working as a cluster is like individuals working as a team; when they work together, they work well, but if one piece fails (especially with no fallback, as this cluster has... ) then the whole thing goes down.
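          To put some rough numbers on that, here's a quick sketch comparing a cluster that needs every node up against one that can survive losing a single node. The 2% per-day failure chance and the 3-node count are just made-up illustration values, not our actual hardware or failure rates:

```python
# Toy model: availability of a 3-node cluster, with and without failover.
# The failure probability below is made up purely for illustration.
import random

NODES = 3
P_NODE_DOWN = 0.02    # hypothetical chance a given node is down on a given day
DAYS = 100_000        # number of simulated days

random.seed(42)
up_no_failover = 0    # cluster only works if ALL nodes are healthy
up_with_failover = 0  # cluster survives the loss of any single node

for _ in range(DAYS):
    healthy = sum(random.random() > P_NODE_DOWN for _ in range(NODES))
    if healthy == NODES:
        up_no_failover += 1
    if healthy >= NODES - 1:
        up_with_failover += 1

print(f"No failover:   up {up_no_failover / DAYS:.2%} of days")
print(f"With failover: up {up_with_failover / DAYS:.2%} of days")
```

          With those made-up numbers, the 'needs every node' setup is down roughly 6% of the time, while the one that can lose a node is down well under 1% - which is exactly what a cluster is supposed to buy you in the first place.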

          Another note from the above: not being told information that I need is going to be a running theme throughout these posts.

          The decision is made: call the virtualization company, pay thousands more for 1 year support, and get the cluster up again. With assistance from the company's tech support person, I'm able to get the cluster working and the servers recognized, back up, and running. Luckily, I find out from the virtualization company that the servers still had a few months left on their warranties; I contact the manufacturer to get the failing hard drive replaced, and the replacement comes a couple of days later.

          These drives are hot-swappable (you don't have to shut down the server to replace one), so I take the bad drive out of the server and its chassis, and put the replacement into the chassis and the server. I wait, and...the array isn't rebuilding.

          I have to shut down the server, which with the cluster means I have to shut down the whole thing: go through the virtualization program and shut down as many virtual servers as possible, then shut down the physical servers. I start the server with the replacement drive and boot into the drive controller, which isn't showing the replacement drive.

          Thankfully, after I remove and reinstall the replacement drive, it's recognized and the array rebuilds. Of course, when I get the cluster going, the servers show as Unknown again... I did take notes, but there's a complication where I have to call the company's tech support again to get it going.
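          (Side note: the rebuild here happens inside the server's hardware RAID controller, so you watch its progress from the controller's own utility. If this were plain Linux software RAID instead - which it isn't - you could keep an eye on a rebuild with a little loop like the sketch below. Treat it as an illustration, not what I actually ran.)

```python
# Minimal sketch: poll /proc/mdstat and print any resync/recovery progress lines.
# Only applies to Linux md software RAID, not a hardware RAID controller.
import time

def rebuild_status():
    """Return the mdstat lines that show a rebuild/resync in progress."""
    with open("/proc/mdstat") as f:
        return [line.strip() for line in f
                if "recovery" in line or "resync" in line]

while True:
    lines = rebuild_status()
    if not lines:
        print("No rebuild in progress.")
        break
    for line in lines:
        print(line)      # e.g. "[==>...]  recovery = 12.6% ... finish=43.0min"
    time.sleep(60)       # check again in a minute
```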

          After all of this, I remember to revise my notes on how to get the cluster going, in case I get into this situation again (hint: I'll be using them).

          So, first disaster averted: the phones and network were down for a few hours, but at least it's not during the school year, right? Believe me, that'll change...

          Part 3 coming as soon as I can!

          • #6
            First, an update on Part 2: I'd have to use those instructions 5 more times over the last school year - 2 for power outages (one of our UPS systems wasn't holding any juice), and 4 more hard drive failures. The hard drive failures cost us between 1 and 4 days of outage each. At least with the 1-year support, the drives were covered by the virtualization company. No outages this year, knock on wood.

            Part 3 - Terms of Employment

            So, the network is back up and running, and the office is straightened out. OTD has plans to get laptops to replace the teachers' desktops; we can use those desktops to replace the older systems that are used in labs and classrooms. We're also implementing a new e-mail system - like other schools, we went with Google Apps for Education. This way, if the network went down, the teachers would still have access to e-mail and documents.

            I'm working 8 hours/day, 5 days a week. For the first few weeks, I've heard no complaints. After almost a month, the OTD tells me that I have to cut my hours due to the contract. I go to the superintendent, and it's confirmed.

            As part of the tech contract between the schools, there's an addendum which dictates my hours, and how long and when I can work - sounds familiar, don't it? The situation I most did not want to get into again...and this affects me more. It's going to limit what I can do as far as implementing all of these changes.

            Did I mention the superintendent is a great guy? He sees the amount of work that has to be done, and allows me to work as long as it takes to get it done. Still, it's a bit aggravating that I get told at least "37-40 hours" and find out different (remember the 'not being told information I need').

            Now, to implement the laptops, wireless networks, and Google Apps...

            • #7
              Quoth RichS View Post
              3) The 3 servers making up the virtualization work as a cluster.

              A bit of explaining: the servers working as a cluster is like individuals working as a team; when they work together, they work well, but if one piece fails (especially with no fallback, as this cluster has... ) then the whole thing goes down.
              In other words, when someone says "These servers are configured as a cluster", they've censored themselves to only say what can be said in polite company (with the other - military - option for polite company being "These servers are configured as a Charlie Foxtrot").
              Any fool can piss on the floor. It takes a talented SC to shit on the ceiling.

              • #8
                Quoth RichS View Post
                and 4 more hard drive failures.
                I assume you know at this point that HDs from the same production run often suffer similar faults. Time to check those serial numbers (see the sketch at the end of this post).

                Quoth wolfie View Post
                "These servers are configured as a cluster", they've censored themselves to only say what can be said in polite company ..."
                Maybe the story's apocryphal, but wasn't there a harassment lawsuit brought on because of one of the possible meanings of "F" in snafu? So maybe this is a CYA CYH CYB "hedge your bets" sort of thing.
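                In case it helps with that serial-number check: on a Linux box with smartmontools installed, a rough sketch like the one below would pull the model and serial for each drive. The device names are just examples, and drives sitting behind a hardware RAID controller may need extra smartctl options that I'm not going to guess at here.

```python
# Rough sketch: grab model and serial number for a few drives via smartctl.
# Assumes smartmontools is installed and the drives show up as /dev/sd*.
import subprocess

DRIVES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]  # example device names

for dev in DRIVES:
    out = subprocess.run(["smartctl", "-i", dev],
                         capture_output=True, text=True).stdout
    info = {}
    for line in out.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
    print(dev,
          info.get("Device Model", info.get("Model Number", "unknown")),
          info.get("Serial Number", "unknown"))
```

                If the serials come back sequential or nearly so, odds are the drives came out of the same batch.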

                • #9
                  It sounds as though whoever implemented that system had no sodding clue. Not only should a virtualised system be able to migrate services from a dead or degraded machine to a live one when required (thus keeping the system as a whole afloat), but a single HD failure should only put one machine into degraded mode, not kill the whole system.

                  Also, it's generally a good idea to keep at least one spare drive of the correct type around, to reduce downtime when (NOT if) a failure occurs.

                  • #10
                    Quoth Chromatix View Post
                    It sounds as though whoever implemented that system had no sodding clue.
                    My cynical side says the answer to that is:

                    Quoth RichS View Post
                    The decision is made: call the virtualization company, pay thousands more for 1 year support, and get the cluster up again.

                    • #11
                      Quoth wolfie View Post
                      In other words, when someone says "These servers are configured as a cluster", they've censored themselves to only say what can be said in polite company (with the other - military - option for polite company being "These servers are configured as a Charlie Foxtrot").
                      The OTD referred to it as "the cluster". I've been promised an extra drive this year, but it hasn't materialized yet...

                      Since I've been battling phones all afternoon, I'm going to skip ahead a bit temporarily, and this is pretty short.

                      Part 4 - You probably can't hear me now...

                      So, VoIP phones: great in theory, and probably in practice with some sort of backup solution...

                      Here? Our VoIP is open-source, not supported anymore, and if the cluster goes down, everything goes, including the phones.

                      Our backup plan is to call the SIP provider and forward the phones to fax lines until the problem is fixed. Of course, I didn't find out about this until the third time the network went down.

                      Not only is this aggravating, it's a serious security issue - and it hasn't been fully taken care of yet. I believe this last outage was the proverbial straw. We're now looking at setting up separate physical servers with redundancy, running a still-open-source but fully supported program, with assistance in configuration and training, to be completed ASAP.

                      The sooner, the better, if you ask me. One virtual server gone soon, several to go...

                      • #12
                        I'm confused about your cluster setup?? The whole idea of a cluster is fault tolerance. If one server goes down then it should fail over to another server. Are all three nodes active?

                        Is it just being used to distribute the workload to speed up user access to the applications?

                        • #13
                          Quoth hjaye View Post
                          I'm confused about your cluster setup?? The whole idea of a cluster is fault tolerance. If one server goes down then it should fail over to another server. Are all three nodes active?
                          Yes, all 3 are active, and I've seen the fault tolerance, when it works. It just doesn't seem to work when it's needed, like when a drive is failing on one of the nodes. With power outages, all 3 are going down sooner or later anyway, and one goes down right away - both of its power supplies are plugged into a UPS with spent batteries, while the other 2 nodes are half and half: one power supply plugged into a good UPS, one into the bad. It's one thing I'm going to fix when we redo this server config. With the phone edict, I'm hoping that's pretty soon - if not, then I'll do it the next extended weekend.

                          Sorry for the delay, I've been doing a lot of catch-up at work.

                          Part 5 - Implementation, part 1: Imaging and wireless

                          Now that the office is straightened out, the next task OTD has me on is an imaging server. He wanted it to be the same as the main school's - they use a free open-source imaging solution. He gives me a spare older rack server and sets me loose. I really did like setting this up, and I learned to do it all on my own.

                          Of course, it took me 3 tries to perfect it. It's run great since, and it's been running for almost a year now. I just need to get it out of my office...

                          The new wireless equipment came next. I spent a lot of time in the ceilings, stretching wire and getting the access points mounted. I did have help from the main school for a few of them, mainly because of height issues. The OTD took care of the initial configuration of the controller, and when he was done he showed me how everything worked and how to administer it.

                          At this point in time, the district was made up of 3 buildings: High School, Middle School, and Elementary. After the last school year, the Elementary was closed and consolidated into the other 2. So, the new equipment went into the HS and MS, with the idea of the old wireless network going into the Elementary. With all of the server problems going on during the year, however, the Elementary wireless network didn't happen.

                          With the old equipment, wireless access was spotty, and it was administered a little differently; instead of an internal server-based controller, the access points were administered via a cloud-based web controller. That would be a good solution, but they were definitely sending out a weaker signal than what we replaced them with, and there were fewer of them - how the previous tech determined that 4-6 access points in each building would provide adequate coverage is beyond me. I've put in almost 40 of the new access points in the 2 buildings now; there are only a couple of weak spots I have to attend to.

                          The only real obstacle in installing the access points was the abundance of VLANs - Virtual Local Area Networks. Typically you would use a few on the switches, but the previous tech used over 15. There was a problem with the MS access points communicating with the controller because of that mess. The OTD and I had to modify the MS switches and install the same VLAN that the controller was on at the HS; then we had to configure each port to use that VLAN.
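                          For anyone curious what that port-by-port cleanup looks like, here's a rough sketch that just generates IOS-style commands for an example management VLAN and a handful of example ports. The VLAN ID, name, and interface names are made up, and our actual switches may use different syntax, so treat it purely as an illustration:

```python
# Rough sketch: generate switch commands to create a management VLAN and
# put a list of AP-facing ports on it. The VLAN ID, name, and interface
# names below are made-up examples, not the real config.
MGMT_VLAN = 30
VLAN_NAME = "AP-MGMT"
AP_PORTS = ["GigabitEthernet1/0/5", "GigabitEthernet1/0/6", "GigabitEthernet1/0/7"]

lines = [f"vlan {MGMT_VLAN}", f" name {VLAN_NAME}"]
for port in AP_PORTS:
    lines += [
        f"interface {port}",
        " switchport mode access",
        f" switchport access vlan {MGMT_VLAN}",
    ]

# Paste the output into the switch config (or feed it to whatever tool you use).
print("\n".join(lines))
```

                          The real work, of course, was figuring out which of the 15+ VLANs actually mattered before touching anything.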

                          So, the wireless network is in place, now to get the laptops out...
