Hunting down the stuck BGP routes (posted 2021-04-22)
Ben Cox (Benjojo) has an interesting post about stuck BGP routes and a flaw in many BGP implementations that makes them hang when a neighbor stops accepting data over TCP: Hunting down the stuck BGP routes
A stuck BGP route is one where a prefix was advertised at some point and later withdrawn, but the withdrawal got lost somewhere along the way, so part of the internet still sees the withdrawn route.
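To make the idea concrete, here is a toy model, not real BGP: a hypothetical chain of four ASes passes an announcement and then a withdrawal downstream one hop at a time, and we assume the withdrawal is lost on one link. The AS names, the propagate() helper and the lost link are all made up for illustration; 192.0.2.0/24 is a documentation prefix.

```python
# Toy model (hypothetical, for illustration only): a linear chain of ASes that
# pass announcements and withdrawals downstream, one hop at a time.

def propagate(chain, prefix, lost_withdrawal_link=None):
    """Announce `prefix` at chain[0], then withdraw it.

    `lost_withdrawal_link` is an (upstream, downstream) pair on which the
    withdrawal is assumed to go missing; every AS behind it keeps the route.
    Returns a dict mapping each AS name to the set of prefixes it still holds.
    """
    tables = {asn: set() for asn in chain}
    # The announcement reaches every AS in the chain.
    for asn in chain:
        tables[asn].add(prefix)
    # The origin withdraws, and the withdrawal travels hop by hop.
    tables[chain[0]].discard(prefix)
    for upstream, downstream in zip(chain, chain[1:]):
        if (upstream, downstream) == lost_withdrawal_link:
            # The withdrawal never arrives, so this AS (and everyone behind
            # it, who learned the route through it) keeps the route.
            break
        tables[downstream].discard(prefix)
    return tables


tables = propagate(["AS1", "AS2", "AS3", "AS4"], "192.0.2.0/24",
                   lost_withdrawal_link=("AS2", "AS3"))
stuck = [asn for asn, routes in tables.items() if routes]
print(stuck)  # ['AS3', 'AS4']
```

AS3 never hears the withdrawal, so it keeps the route and keeps passing it on to AS4: the part of the internet "behind" the lost message still sees a route that no longer exists at the origin.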
There doesn't seem to be an obvious way to get these prefixes unstuck, although I would try advertising them again and then withdrawing them again. The AS in the middle where the prefixes are stuck should probably clear (reset) its BGP sessions until the problem is fixed, although I would be tempted to reboot the whole router just to be safe.
The hanging issue happens when a router falls behind on processing incoming BGP updates. At some point its TCP receive buffer fills up, so it advertises a TCP window size of 0. Then the send buffer on the neighboring BGP router starts to fill up with updates and keepalives. When that buffer is full, the sending BGP process can't write data to the TCP socket anymore. If the implementation doesn't handle that situation, the BGP process hangs... and bad things start to happen.
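The zero-window situation is easy to reproduce on a single machine. This sketch assumes Python's standard socket module and a loopback TCP connection rather than a real BGP session: one end never calls recv(), so its receive buffer fills and it advertises a zero window, after which the other end's send buffer fills in turn. The function name, the 32 KB receive buffer and the safety limit are arbitrary choices for illustration.

```python
import socket


def bytes_until_send_blocks(chunk_size=16384, limit=256 * 1024 * 1024):
    """Write to a TCP peer that never reads, until send() would block.

    Returns the number of bytes the kernel accepted before both the
    receiver's buffer (zero window) and the sender's buffer were full.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Small receive buffer so the window closes quickly (the kernel may
    # round or double this value).
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 32768)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()  # the "slow" router: never calls recv()
    # Non-blocking, so a full buffer raises BlockingIOError instead of hanging.
    sender.setblocking(False)
    total = 0
    try:
        while total < limit:
            try:
                total += sender.send(b"\x00" * chunk_size)
            except BlockingIOError:
                return total  # nothing more can be written: we're stuck
        raise RuntimeError("send never blocked; buffers larger than expected")
    finally:
        sender.close()
        receiver.close()
        listener.close()


print(bytes_until_send_blocks())
```

With a blocking socket, that last send() would simply wait forever instead of raising an error, which is one way a BGP process that writes its updates and keepalives from a single thread could end up hanging. The exact byte count depends on the operating system's buffer settings.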
Ben feels that the BGP spec needs to be updated to address this issue. In my opinion, however, this isn't an interoperability matter that needs to be standardized, but simply an implementation issue that each vendor can fix on their own.
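As an illustration of what such a vendor-side fix could look like, here is a minimal sketch, not how any particular implementation actually does it. A hypothetical PeerWriter class never blocks in send(). It keeps a per-neighbor output queue and remembers the last time the neighbor accepted any data. If nothing gets through for stall_timeout seconds, it reports the session as stalled so the caller can reset it. The class name, the 90-second default and the injectable write/clock functions are all assumptions made for this example.

```python
import time
from collections import deque


class PeerWriter:
    """Hypothetical per-neighbor output queue for a BGP speaker.

    Encoded messages are queued and flushed whenever the socket accepts data.
    If the neighbor accepts nothing for `stall_timeout` seconds (for example
    because it keeps advertising a zero TCP window), flush() returns False so
    the caller can tear down the session instead of hanging.
    """

    def __init__(self, write, stall_timeout=90.0, clock=time.monotonic):
        self.write = write              # e.g. a non-blocking socket's send()
        self.stall_timeout = stall_timeout
        self.clock = clock
        self.queue = deque()
        self.last_progress = clock()

    def enqueue(self, message: bytes):
        if message:                     # ignore empty messages
            self.queue.append(message)

    def flush(self) -> bool:
        """Send as much as possible; return False if the session is stalled."""
        while self.queue:
            msg = self.queue[0]
            try:
                sent = self.write(msg)
            except BlockingIOError:
                sent = 0
            if sent == 0:
                break                   # buffer full: try again later
            self.last_progress = self.clock()
            if sent < len(msg):
                self.queue[0] = msg[sent:]  # keep the unsent remainder
                break
            self.queue.popleft()
        if not self.queue:
            # Nothing waiting to be sent, so the neighbor isn't holding us up.
            self.last_progress = self.clock()
        return self.clock() - self.last_progress < self.stall_timeout
```

The point of the design is that the decision is purely local: the neighbor doesn't need to know or support anything new, which is why I think this is something vendors can fix on their own.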