Time is running out: LDeX Connect CTO shares his thoughts on the Internet Routing Table

Simon Chamberlain, CTO of LDeX Connect, gives his thoughts on the Internet Routing Table.

Recently, a somewhat anticipated problem somehow managed (ironically) to creep up on the Internet Service Provider community, causing panic among network engineers all over the world. Outdated infrastructure, which is still widely relied upon, ran out of memory to store the full Internet routing table. This resulted in a spike in outages that left users unable to access Internet resources.

Despite having had several years to prepare, network engineers rushed to make last-minute configuration changes and perform hardware upgrades as routers and switches fell over left, right and centre. The event attracted widespread attention from the global media, as it was the third time this century that the Internet had broken through such a threshold. The number of routes surpassed 128,000 around 2003 and 256,000 in 2008, each time causing disruption for those who had failed to update their networking equipment.

As the CTO of an ISP myself, I find it interesting to observe how this ‘ticking time bomb’ of a problem could have taken the Internet community by surprise. It’s almost a self-imposed denial of service attack. Perhaps we should spend at least as much effort keeping infrastructure up to date as we do patching software bugs and vulnerabilities.

The Internet Routing Table

To give some insight, the Internet is made up of prefixes allocated to organisations, which are then advertised across the Internet by routers and switches running Border Gateway Protocol (BGP). Recently, the number of prefixes in the full routing table passed the 512K mark (roughly 512,000 routes), following the 128,000 and 256,000 milestones of earlier years. This caused a problem for many legacy routers that only had enough memory set aside to hold 512K routes.

What took many ISPs by surprise was the fact that the Sup-720, for many years Cisco’s flagship supervisor engine, was affected even with 1GB of RAM installed. No doubt many network managers presumed that this would be plenty and failed to give the issue a second thought. Most network support staff, on the other hand, were too busy with the usual firefighting, I suspect.

So, what happened?

By default (out of the box), Cisco only allocates enough of the Sup-720’s route memory to hold 512K IPv4 prefixes. Space for a further 256K is allocated to IPv6 and the rest is left unallocated. In my view, it’s a simple process to reallocate memory to IPv4 with a few lines of configuration, but here’s the gotcha: the system needs rebooting. Cisco has of course advised network operators to reassign some of the memory in their routers and switches and reboot. However, when it’s 10am and your router is about to fall over, the last thing you want to do is reboot it and suffer a total outage.
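
To illustrate why it pays to keep an eye on this well before the 10am panic, here is a minimal Python sketch of a headroom check. It counts the limit in routes, in line with the 128,000 and 256,000 route milestones mentioned earlier. The function names, the 90% warning level and the route counts are all assumptions invented for the example, not anything taken from Cisco or from our own tooling, and the real limit depends on the platform and on how its memory has been allocated.

```python
# Illustrative figure only: the IPv4 route allocation discussed above.
# Check the real value for your own platform and configuration.
ASSUMED_IPV4_ROUTE_LIMIT = 512_000


def ipv4_headroom(current_routes: int, allocated_routes: int) -> float:
    """Return the fraction of the IPv4 route allocation that is still free."""
    if allocated_routes <= 0:
        raise ValueError("allocated_routes must be positive")
    return 1.0 - current_routes / allocated_routes


def needs_attention(current_routes: int, allocated_routes: int,
                    warn_at: float = 0.90) -> bool:
    """Return True once the table has reached warn_at of the allocation.

    warn_at is a hypothetical threshold; pick one that leaves enough time
    to schedule a reboot in a maintenance window.
    """
    return current_routes >= warn_at * allocated_routes


if __name__ == "__main__":
    # Hypothetical route counts, chosen purely for demonstration.
    for routes in (300_000, 470_000, 500_000):
        free = ipv4_headroom(routes, ASSUMED_IPV4_ROUTE_LIMIT)
        flag = needs_attention(routes, ASSUMED_IPV4_ROUTE_LIMIT)
        print(f"{routes} routes: {free:.1%} free, needs attention: {flag}")
```

The point of warning early is that the fix needs a reboot, so the check is only useful if it fires while there is still time to plan one.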

Luckily for us at LDeX Connect, we reconfigured our Sup-720s’ memory allocation in anticipation when they were installed, but to be fair I can’t criticise any ISPs who were caught out.

It’s my belief that the Internet is a living, breathing monster of a creation and very difficult to tame. Every so often, something you thought you were completely prepared for still manages to catch you by surprise. Unfortunately, it’s not a view often shared by Management!
