Our data centre provider was carrying out planned maintenance between 20:00 and 00:00. During this time they encountered a brief, unplanned network disruption. This interrupted one of our services, which did not recover on its own as we would usually expect. As a result, sites from the systems listed below stopped processing alarms.

  • WebWayOne UK Host 1

  • WebWayOne UK Host 2

  • CSL Pro - all clusters

Sites from all other systems were unaffected.

On the morning of 3rd Feb we reset the service and began processing the alarms we had missed, to ensure there were no gaps in the event logs.

By 10:30am the system had returned to normal operation.

Conclusions and remedial action

We have identified the cause of the issue and will implement a solution to ensure all services are able to automatically recover following any network issues.

We will also expand our monitoring of the system, as our out-of-hours notification mechanism failed to trigger.

Further Information

For more information please contact
