 

The non-railway and non-modelling social zone. Please ensure forum rules are adhered to in this area too!

Major worldwide IT outages


AY Mod

4 hours ago, kevinlms said:

But no one would be silly enough to unplug the server - would they?

Something like a server should not be supplied via a plug & socket - it should be hard wired & any switches should be key operated (circuit breakers upstream should be suitably marked/locked on).

Absolutely no excuse.

  • Like 2
  • Agree 1
  • Funny 2

  • RMweb Premium
15 minutes ago, GrumpyPenguin said:

Something like a server should not be supplied via a plug & socket - it should be hard wired & any switches should be key operated (circuit breakers upstream should be suitably marked/locked on).

Absolutely no excuse.

...and I would expect any server worthy of the name to have at least two PSUs, with separate mains cabling. Things were somewhat different 20-30 years ago though, in non-safety-critical environments anyway.

  • Like 3
  • Agree 1

3 hours ago, kevinlms said:

Indeed it was dead, because the UPS had been unplugged from the power point and the battery had gone flat!

I wonder how long they put up with the beeping from the UPS, before everything stopped?

 

Ah yes, fun with UPS.

 

Another job I had was with a certain High Street bank's insurance department, on one floor of an office block in Bristol. Just to protect the guilty party, let's call it the "Not Willing" bank, because t'management was notoriously tight fisted on equipment. One time, there was a major power cut that affected the whole office block the "Not Willing" bank was sharing. Which set off fire alarms, and the whole building was evacuated. Fortunately t'was a nice sunny day, so we didn't mind standing outside. All the desktop computers had crashed, so someone wondered aloud if the servers would have crashed as well. One of t'management pompously said "Oh no, they've got a UPS". So we waited ... and waited ... and then wondered why we could see multiple fire engines arriving, and fire crews rushing into the building seriously quickly, not like they thought it was a false alarm. Well, it wasn't a false alarm, there was a real fire, and it was in the server room of the "Not Willing" bank! It turned out t'management had refused the IT Dept's request for a UPS for each server. So some bright spark (sic) had wired about ten separate servers into one poor little UPS. When the mains power went down, the surge load on the one poor little UPS was so great it overheated and burst into flames, setting off the sprinklers and another call to the Fire Brigade.

 

  • Like 11
  • Round of applause 2

  • RMweb Premium

When working for BT as an installer for business customers, I and a little, twitchy Cockney mate ended up in the server room of a major design and control company (at the time, they did design work for the automotive and petrochemical industries).

We were there to install some cables for some new telephone system they were having. I was the 'Youff' so was doing what I was told, and Mr Twitch was doing the easy parts of the job (like pushing the rods where they needed to go, and telling me to cleat things to the walls). In the server room we had to get across the ceiling (which was suspended), so Mr Twitch pushes up the ceiling tiles and thrusts across the rods. I tape on the heavy multi-pair cable and then proceed to feed it up into the ceiling. While this was going on there is a sudden click followed by silence. 

It must have been about 12 seconds before the server room door slams open (against a data rack of all things!) and there were words along the lines of 'have either of you kind gentlemen happened to switch anything off?'.

I looked at Twitchy, and noticed behind him that there was what looked like an MK cooker switch, underneath which was a mark on the wall made by the ends of his rods. I think my face told Twitchy all he needed to know and he turned round, flicked the switch back on, and said 'That's a bit daft having your server fed on one switch', and then carried on pulling the cable in.

When the manager turned up, I was told to beat a retreat and go to the exchange.....

When he arrived at the exchange, his twitches were somewhat larger than normal. We didn't go back to finish up.

 

Andy G

  • Like 2
  • Funny 5
  • Friendly/supportive 1

  • RMweb Premium
58 minutes ago, KeithMacdonald said:

 

Ah yes, fun with UPS.

 

Another job I had was with a certain High Street bank's insurance department, on one floor of an office block in Bristol. Just to protect the guilty party, let's call it the "Not Willing" bank, because t'management was notoriously tight fisted on equipment. One time, there was a major power cut that affected the whole office block the "Not Willing" bank was sharing. Which set off fire alarms, and the whole building was evacuated. Fortunately t'was a nice sunny day, so we didn't mind standing outside. All the desktop computers had crashed, so someone wondered aloud if the servers would have crashed as well. One of t'management pompously said "Oh no, they've got a UPS". So we waited ... and waited ... and then wondered why we could see multiple fire engines arriving, and fire crews rushing into the building seriously quickly, not like they thought it was a false alarm. Well, it wasn't a false alarm, there was a real fire, and it was in the server room of the "Not Willing" bank! It turned out t'management had refused the IT Dept's request for a UPS for each server. So some bright spark (sic) had wired about ten separate servers into one poor little UPS. When the mains power went down, the surge load on the one poor little UPS was so great it overheated and burst into flames, setting off the sprinklers and another call to the Fire Brigade.

 

 

That reminds me of a similar incident when I worked for an insurance company at their newly opened office in Manchester back in the late 1980s.

 

They had rented a couple of floors in an open plan city centre office block, and so when we arrived, all the equipment was still being installed, including the computer servers.  Local management decided to build a server room around the servers.

 

Shortly after that, the servers started shutting down at random causing systems to shut down.  After a few days of this, the IT guys from head office turned up, walked in to the server room and said 'Where's the air conditioning?'  To which the local management said 'What air conditioning?' 

 

They hadn't taken into account that building a non-ventilated room full of computer servers would get very warm very quickly, and it was overheating that was causing the servers to shut down.  Needless to say, air conditioning was installed in the server room pretty sharpish.

  • Like 8
  • Funny 3

  • RMweb Gold
2 hours ago, spamcan61 said:

...and I would expect any server worthy of the name to have at least two PSUs, with separate mains cabling. Things were somewhat different 20-30 years ago though, in non-safety-critical environments anyway.

And it is a requirement of DORA also…

It doesn’t just cover software… power resilience is just as important.
I’ve heard today that a facility with over 500 Linux servers went down because the power monitoring system, which was based on a Windows server, went down.

  • Interesting/Thought-provoking 2

I read that CrowdStrike’s president tweeted that the incident had been caused by a “defect found in a single content update for Windows hosts” (which we all know by now) and added: “This is not a security incident”. He's presumably never heard of the "CIA Triad" of Information Security; CIA being an abbreviation for Confidentiality, Integrity and Availability.

Edited by ejstubbs
  • Like 3

  • RMweb Premium
10 hours ago, Coldgunner said:

Exactly this, some things are simply rushed to get them out the door without rigorous testing and limited rollouts beforehand.

Do you want it now, or do you want it right?

  • Like 2

  • RMweb Premium
41 minutes ago, adb968008 said:

So my trip to DC went out of the window this morning. It took 2 hours to get rebooked to London from Newark.

 

The App keeps crashing, customer service online wasn’t working and the two girls on the premier desk were struggling to process anyone.

 

Airside security isn’t much better..

IMG_6381.jpeg.02b6668fbd25635c40fa3dff86b85f91.jpeg

 

And gate calling is all manual (indeed it’s refreshing not to be hearing the automated announcements) but you are reliant on yourself to be at the right gate..

 

IMG_6383.jpeg.814e519b482776506aacd474a3e6ca51.jpeg

 

early finish now for me without the IAD trip I’ve a day off the schedule.

 

corona..IMG_6384.jpeg.294d5a734f7992305e18d3b04c8d8c87.jpeg wasn’t that a virus ?

 

  • Funny 4

M'Lady has taken this as an excellent lesson in the perils of the drive towards a "cashless society", which has just proved how fragile it all can be. She is a great believer in holding onto wodges of cash. Shopping is naturally directed towards those still willing to accept legal tender in cash form. Especially the market stalls for meat, veg, and cheese. Where the prices are cheaper anyway.

  • Like 4
  • Round of applause 1

  • RMweb Gold

We had a railway related IT problem. My ex employer had a training center where people brought their own laptops and logged onto a public guest WiFi which we had set up. But about every 10 minutes a lot of their PCs lost their connection. We checked everything and could not find the cause. Eventually I was sent over to observe onsite. The location was very convenient as it was close to a local metro station. I had my own laptop on the network and sure enough lots of people lost their connection, but my laptop had no problem at all. Then the penny dropped: every time a train stopped at the station just outside the window, all their laptops hopped on to its free WiFi. When the trains pulled out, they lost their connection.

  • Interesting/Thought-provoking 1
  • Funny 10

  • RMweb Premium
24 minutes ago, KeithMacdonald said:

M'Lady has taken this as an excellent lesson in the perils of the drive towards a "cashless society", which has just proved how fragile it all can be. She is a great believer in holding onto wodges of cash. Shopping is naturally directed towards those still willing to accept legal tender in cash form. Especially the market stalls for meat, veg, and cheese. Where the prices are cheaper anyway.

Not necessarily. If the registers at supermarkets and the like are down, they will close, because the system can't calculate the bill and re-order replacement stock. So it's easier to close for the duration. If it's only a matter of hours, there's not enough time to worry about alternative means, as by the time you do that, the problem will be resolved.

  • Like 1
  • Interesting/Thought-provoking 2
  • Friendly/supportive 1

13 hours ago, rab said:

Do you want it now, or do you want it right?

You give 3 options on anything you are offering:-

cost (how cheap), speed (how quick), quality (how good)

then tell the customer to pick 2, because you can't have all 3

  • Like 2

  • RMweb Gold
20 hours ago, adb968008 said:

With nothing else to do for 8 hours I thought I’d take the shuttle to Newark Airport and have a look.. Departure board seems to have bought it…


IMG_6376.jpeg.ae25065cb49404c5d2f859453ce74ed7.jpeg

the desks themselves are very quiet but that’s because the Police are restricting road traffic coming in.

IMG_6375.jpeg.a6870e6f2cc51e5f7892625304a97428.jpeg


Few canceled, just delayed.. I suspect they aren't able to cancel the flights, as there's really not much chance of this lot taking off at the rate things are moving.

 

IMG_6377.jpeg.090d6fd2b26e6d0065296559123e187c.jpeg

even the checkin desks are blue screened.. you can only checkin if you’ve mobile checked in.. but that means little as virtually nowt is flying.

 

IMG_6379.jpeg.f466b61eb3ec5efc53f4dff489bb6f3e.jpeg

 

went to the hotel breakfast and well seems the guests have disappeared too..

IMG_6366.jpeg.12c32a7c88eb9913a88ed8f2a12792f0.jpeg

 

it’s beginning to feel like Sept 11th again at the airport.. like a shopping mall after midnight not a Friday afternoon at one of the US’s busiest airports

 

 

Maybe the airline should change its name to UNTIED?

  • Round of applause 1

  • RMweb Gold
18 hours ago, PMP said:

I suspect very few will need to be positioned. The best option is to bin the day's flying program, or start from a specific sector once the systems are serviceable. As many domestic routes (and regional) are out and back, you ‘kill’ those sectors you know you can’t operate, and then fly those you can. That then puts the aircraft back into the correct location for the following day's flying program. The crew positioning will be more complex, but a similar rationale will be applied.

How very sensible - exactly what I used to do with trains when I went into emergency service planning and recovery mode.

I only wish that GWR would do it that way nowadays.

  • Like 2

21 hours ago, adb968008 said:

 


IMG_6376.jpeg.ae25065cb49404c5d2f859453ce74ed7.jpeg

the desks themselves are very quiet but that’s because the Police are restricting road traffic coming in.


Few canceled, just delayed.. I suspect they aren't able to cancel the flights, as there's really not much chance of this lot taking off at the rate things are moving.

 

IMG_6377.jpeg.090d6fd2b26e6d0065296559123e187c.jpeg

even the checkin desks are blue screened.. you can only checkin if you’ve mobile checked in.. but that means little as virtually nowt is flying.

 

According to the TV news India seems to have managed better. Perhaps they are more used to systems going down.

In Delhi as they couldn't print boarding cards they apparently just gave everybody blank boarding cards and told the punters to fill them in themselves. As the departure boards weren't working they wrote the gate numbers on a white board. The main problem with that was that people kept touching the white board and accidentally erasing the info so it had to be written up again.

  • Like 3
  • Informative/Useful 1

41 minutes ago, Michael Hodgson said:

According to the TV news India seems to have managed better. Perhaps they are more used to systems going down.

 

Indian colleagues (that I work with daily) say this may be because a lot of IT systems in India are based on Linux servers, not Microsoft. They were indeed more used to systems going down, in the past when Indian mains electricity was less reliable, but that's rapidly changing for the better. They also say Indian IT systems are built to be more resilient, as they are built with failure in mind.

Edited by KeithMacdonald
Typo fixes
  • Like 3
  • Informative/Useful 1
  • Interesting/Thought-provoking 1

4 hours ago, Vistisen said:

We had a railway related IT problem. My ex employer had a training center where people brought their own laptops and logged onto a public guest WiFi which we had set up. But about every 10 minutes a lot of their PCs lost their connection. We checked everything and could not find the cause. Eventually I was sent over to observe onsite. The location was very convenient as it was close to a local metro station. I had my own laptop on the network and sure enough lots of people lost their connection, but my laptop had no problem at all. Then the penny dropped: every time a train stopped at the station just outside the window, all their laptops hopped on to its free WiFi. When the trains pulled out, they lost their connection.

A similar issue can arise on buses in the UK. In our area most First Bus buses have wifi. If you log on at a bus station there's a very high probability you lose your connection when the bus departs, as you will have logged onto the system on another bus.


  • RMweb Gold

I understand the root cause of the problem was that the updated definition file automatically sent out to all systems was just full of zeroes, and the software couldn't cope with that.  Looks like someone messed up big time!

CS.png.3a0f8b8f89363402f8c3fe908ebc8d95.png
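
If the report above is right that the file was all zeroes, the kind of check that would catch it before loading can be sketched roughly as below. This is only an illustration in Python, not how the vendor's software works: the function names and the `DEFS` header marker are made up for the example.

```python
def is_plausible_update(data: bytes, expected_magic: bytes = b"DEFS") -> bool:
    """Cheap sanity checks on an update file before anything tries to parse it.

    `expected_magic` is a hypothetical header marker invented for this sketch.
    """
    if len(data) == 0:
        return False  # empty file
    if not any(data):
        return False  # every byte is zero
    return data.startswith(expected_magic)


def choose_definitions(new_data: bytes, current_data: bytes) -> bytes:
    """Keep the definitions already in use if the new file fails the checks."""
    return new_data if is_plausible_update(new_data) else current_data
```

The point of the second function is simply that a bad file gets ignored and the last known good one stays in use, rather than the whole machine falling over.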

 

 

 

  • Informative/Useful 4

28 minutes ago, RFS said:

I understand the root cause of the problem was that the updated definition file automatically sent out to all systems was just full of zeroes, and the software couldn't cope with that.  Looks like someone messed up big time!

CS.png.3a0f8b8f89363402f8c3fe908ebc8d95.png

 

 

 

Yes, it's the old story: the big fuss was all about a lot of nothing.

  • Like 1
  • Round of applause 2
  • Funny 10

  • RMweb Gold
Posted (edited)
4 hours ago, The Stationmaster said:

Maybe the airline should change its name to UNTIED?

https://en.wikipedia.org/wiki/Untied.com
 

site existed for many years.

 

This made it famous.. when a band's guitar was trashed, United didn't care so he wrote a song that went viral, called “United Breaks Guitars” and got 25mn hits, became an iTunes no. 1 hit, whilst even dipping the share price at one point..

you can see his broken guitar in some scenes

 

https://en.wikipedia.org/wiki/United_Breaks_Guitars

 

 

Edited by adb968008
  • Like 3

  • RMweb Gold
6 hours ago, KeithMacdonald said:

M'Lady has taken this as an excellent lesson in the perils of the drive towards a "cashless society", which has just proved how fragile it all can be. She is a great believer in holding onto wodges of cash. Shopping is naturally directed towards those still willing to accept legal tender in cash form. Especially the market stalls for meat, veg, and cheese. Where the prices are cheaper anyway.

I had no problems paying with plastic at Tesco and Amazon was happy as well. Tbh apart from the headlines I saw no evidence of any problems. Business as usual.

  • Agree 1
  • Informative/Useful 1

  • RMweb Gold
2 hours ago, RFS said:

I understand the root cause of the problem was that the updated definition file automatically sent out to all systems was just full of zeroes, and the software couldn't cope with that.  Looks like someone messed up big time!

CS.png.3a0f8b8f89363402f8c3fe908ebc8d95.png

 

 

 

Probably a test file that wasn't replaced with the payload file after testing.

  • Agree 1
  • Interesting/Thought-provoking 1
