External endpoints not available

Incident Report for Gini

Postmortem

Gini operates its services from two datacenters in Germany run by the same provider. We currently use one of them to accept external traffic, while processing is spread across both locations.

The outage on the 14th of February was caused by another customer in our ingress datacenter who has a special setup and deployed an erroneous network configuration. Our provider's infrastructure has dedicated network equipment and safety mechanisms in place to isolate misbehaving network segments and protect the core infrastructure. Unfortunately, these mechanisms took effect too late, and the errors propagated to the core infrastructure, which also affected Gini's external access points.

Once the problem was identified, our provider took these immediate countermeasures:

The customer responsible for the erroneous network configuration was shut down immediately and was re-enabled only after their systems had been isolated behind an additional dedicated firewall.
The firewall is managed by our provider, who will ensure that a similar problem cannot happen again.

After these measures had been taken, the network was operational again and Gini's backend recovered from the outage. These are the next steps our provider has initiated to prevent further events of this kind:

Other customers with similar setups are being identified and will be isolated as soon as possible.
The special setup operated by the other customer is no longer allowed and new requests will be denied.
Our provider is in contact with the vendor of the core network components to find out why the safety mechanisms did not take effect in time.

Gini and our datacenter provider are both deeply sorry about the outage and the impact it had on Gini's customers. We strive to offer the best products and services, and in this case we have not lived up to that goal. Feel free to contact us at technical-support@gini.net if you have any questions about Gini or this outage.

Posted Feb 17, 2017 - 12:21 CET

Resolved

Everything works as expected. We will share details about the root cause in a postmortem as soon as we get additional information from our provider.

Posted Feb 14, 2017 - 15:27 CET

Monitoring

Our provider's networking team has implemented a fix and we're monitoring the situation closely. The API and usercenter are working as expected again.

Posted Feb 14, 2017 - 15:01 CET

Identified

We're still waiting for details from our datacenter provider, but it seems to be a problem in their internal network stack. We will post updates as soon as we get more information. We're deeply sorry for any inconvenience caused by this outage.

Posted Feb 14, 2017 - 14:02 CET

Investigating

We are currently investigating this issue.

Posted Feb 14, 2017 - 13:52 CET