Earlier this week, a large-scale internet outage made sites including Amazon, Reddit, the New York Times and the UK government’s website inaccessible to many users.
The technical issue was reportedly caused by an unexplained configuration error at a single infrastructure provider, which handles 10 per cent of the world’s internet traffic.
The outage shows that organisations in Australia and globally need a unified view across their IT estates, enriched with the context of the business and IT metrics that are relevant to them. By embedding technologies that provide a broad overview of the organisation, IT teams can detect, respond to, and resolve business-impacting incidents early and proactively.
Addressing major outages demands attention from both IT operations and security teams. Organisations running applications on multi-cloud and hybrid cloud architectures benefit tremendously from a unified view spanning those estates, one that consolidates telemetry from applications, infrastructure, and security systems. Doing so is critical to proactive detection, faster response, and ultimately the resolution of incidents.
Many organisations still operate IT technologies in silos, which leaves critical blind spots. An observability stack, for example, can help by continuously analysing the digital exhaust from IT systems and surfacing configuration errors and failures to IT teams early. Furthermore, enriching those views with sources like IT change and release management, social media, and business transaction data helps organisations understand service delivery trends in the context of business and IT changes.