Broadband Outage Statement by Town Manager Stephen Crane
To the Concord Community:
The Concord Municipal Light Plant (CMLP), which operates the Town Broadband, experienced a prolonged and intermittent outage that affected many of our customers. For those who were impacted, I share your frustration at the situation and, as the person ultimately responsible for the performance of the Town departments, offer my sincere apologies for the breakdown of our service and the poor communication throughout the outage.
The broadband problems were a cascading series of failures that made it extremely difficult to determine what went wrong, let alone communicate it accurately to our subscribers. The CMLP Broadband team is a small group supported by Town customer service representatives and a third-party call center. The rapidly changing circumstances of the outage demanded the team's full attention, and its members worked countless hours without rest to restore service. Nevertheless, we should have provided more updates on the outage, even when we were not completely sure what to report at any given moment. Fortunately, outages are rare; unfortunately, we did not have an emergency notification system in place for our subscribers.
We are working on a plan to ensure that future updates are timely and share the information we have in a clearer and more useful way.
As a demonstration of our regret over the outage and our commitment to our customers, we will be issuing a credit to all subscribers for 25% of their regular monthly bill.
What follows is a brief overview of the series of issues that kept CMLP staff working around the clock to restore service:
The Town Network consisted of two core routers and two core switches that were at end of life and in need of replacement. Since February, the Broadband team has been changing the architecture of the equipment that provides internet service to customers and preparing new servers that manage individual customer connections. As part of the effort to remove the failing infrastructure, one device was taken out of service a few weeks ago to improve security and eliminate a point of failure for Concord's broadband service. The Broadband team then focused on building replacements for the aging servers responsible for assigning individual customers' IP addresses.
On Friday, June 25, a physical break in the fiber necessitated a repair that changed the way internet traffic is routed and required networking equipment to be reconfigured to accommodate the new data paths. This change inadvertently affected customers with a particular class of service (static IP addresses) and knocked a portion of those customers offline. According to the third-party consultants we brought in to help restore service, troubleshooting was complicated both by the recent changes and by original devices that had never been configured in line with best practices. The changes needed to restore these customers grew complex, and the new servers were deployed before they were fully tested and ready.
We attempted fixes that restored service to some customers, but bringing the new servers online caused problems for customers whose service ran through the old servers. When the new servers that distribute IP addresses were used in an attempt to restore service, they caused conflicts that had to be resolved almost entirely by hand, and the transition between old and new equipment had several hiccups, causing some customers who had service to suddenly lose it. Because customers are managed individually or in small groups, it took about two days to ensure that all customers could be managed by the new servers and that traffic flowed successfully through all equipment.
Because of further complications, the Broadband team, with the support of our trusted vendors, had to sort through the system and correlate several reports, then remove duplicate IP addresses one at a time to restore proper function. This was a very time-consuming process.
The system is working well now, but we will continue to monitor operations so that we catch any issues early and send out notifications should we identify any concerns. We also believe that internet speeds are back to expected levels for all customers. Given the unusual nature of the outage, we will also investigate the incident to determine whether the known failures were exacerbated by unknown outside influences.
While the failure of our broadband was a letdown to our community and our customers, we believe we will become a better operation because of the changes we have now made and lessons we have learned. Now that service is fully restored, we will dedicate every effort to replacing all hardware and improving the resilience of the networks.
Thank you for your understanding. If you would like to share comments on how we can improve and better serve you in the future, please contact firstname.lastname@example.org.