Coronavirus crisis as the ultimate network stress-test
As the coronavirus crisis continues to expand its reach across the world, network operators have played a critical role in supporting how we work, learn and care for each other. Never before has a crisis highlighted how critical telecom infrastructure has become to every aspect of our lives. In these unprecedented circumstances, telecom services and the applications they support allow millions of people to work, collaborate and study remotely while staying in touch with family and friends. We can now attend telehealth visits with our doctors, keep up with the news, and enjoy endless entertainment options without leaving our homes. Connectivity allows every level of government, from small towns to counties, states and the federal government, to keep functioning without putting people at risk. It also keeps hospitals and other medical organizations connected, collecting and sharing the critical data that has become vital for saving lives, advancing possible treatments and making progress on vaccine development. So, what has made all this possible? How are telecom providers coping with this surge in network traffic, and what impact should we expect on telco operations once the COVID-19 crisis is over?
Network strain comes both from traffic growth and from new, and sometimes unexpected, traffic patterns. Figures reported globally by various types of network operators show an overall traffic increase of 25-30 percent, with occasional spikes well beyond that. Looking more closely, we see a significant increase in upload usage, driven primarily by video conferencing, a large spike in gaming, and an expected surge in video streaming. At the same time, mobile data usage has noticeably declined in many areas. With people staying home and relying increasingly on their Wi-Fi and wireline broadband connections, “mobile” connectivity suddenly became somewhat less relevant. The pandemic has therefore become an unplanned “stress test” for the entire telecom and cloud infrastructure, and a more thorough one than anyone could have designed. So far, from a network capacity perspective, most network operators have passed this “test” and have met the unprecedented spike in demand since stay-at-home orders became commonplace.
Good networks are planned for peak traffic, not for average traffic, and operators typically plan their networks to handle demand a year or two into the future, assuming around 40% annual traffic growth. In a sense, they’ve seen an entire year of traffic growth happen in only two weeks – but thankfully most were ready for it, largely due to the investments made over the past decade through improvements in the fiber infrastructure and upgrades to the network core.
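The headroom argument above can be checked with a little arithmetic. The sketch below is a hypothetical illustration, not an operator's planning tool: it assumes capacity is provisioned as today's peak multiplied by (1 + g)^years for an annual growth rate g, and treats the pandemic surge as a one-off jump in peak load. The function names are made up for this example; the 40 percent growth rate and the 25-30 percent surge come from the text.

```python
import math


def planned_capacity(annual_growth: float, years_ahead: float) -> float:
    """Capacity as a multiple of today's peak, provisioned for compound growth (assumed model)."""
    return (1.0 + annual_growth) ** years_ahead


def headroom_after_surge(annual_growth: float, years_ahead: float, surge: float) -> float:
    """Spare capacity left after a sudden surge, as a fraction of the new peak."""
    new_peak = 1.0 + surge
    return planned_capacity(annual_growth, years_ahead) / new_peak - 1.0


def years_of_growth_equivalent(surge: float, annual_growth: float) -> float:
    """How many years of compound organic growth a one-off surge corresponds to."""
    return math.log(1.0 + surge) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    growth = 0.40  # ~40% annual traffic growth, the planning assumption from the text
    for surge in (0.25, 0.30):  # 25-30% overall increase reported by operators
        for years in (1, 2):
            spare = headroom_after_surge(growth, years, surge)
            print(f"surge {surge:.0%}, planned {years}y ahead: {spare:.1%} headroom left")
        equiv = years_of_growth_equivalent(surge, growth)
        print(f"surge {surge:.0%} ~ {equiv:.2f} years of organic growth")
```

Under these assumptions, a network planned only one year ahead still has some spare capacity after a 30 percent jump (1.4 / 1.3 is about 1.08), and such a jump equals roughly three-quarters of a year of growth at 40 percent per year. The model deliberately ignores regional and time-of-day peaks, which is where real spikes "well beyond" the average show up.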
But for all the unquestionable importance of the underlying network infrastructure, all eyes are now on how user applications are performing. As people work from home and rely on video conferencing services, many of these applications have seen a meteoric rise in demand. Zoom, for example, reported more than 200 million daily meeting participants in March, up from a previous high of about 10 million. So how could application providers support all this overnight growth without compromising application performance? For that, thank G-d (well, and Jeff Bezos) for the cloud infrastructure, which has truly been a game changer in the current situation. There is no way that a legacy data center hosting environment could have absorbed the same increase in application-layer load with the agility, dynamic scaling and automation that companies like AWS, Microsoft and Google enable. If anyone still doubted the benefits of the public cloud, that debate is now essentially over.
From an operational perspective, it’s safe to say that things are largely under control. Network operators’ investments in analytics and automation are paying off during this crisis, where visibility is key. The surge in Zoom and other video-conferencing applications, for example, highlights the scale and the unexpected speed at which application use can grow. It also highlights the need for network operators to quickly determine whether reported application performance issues are specific to an application or the result of an underlying network issue. This is where many operations teams are focusing right now: understanding network behavior, identifying anomalies (and there are plenty of those), forecasting emerging traffic patterns, many of which are far from the norm, and responding quickly to network faults. They must do all of this with a reduced workforce that is working from home wherever possible. Suffice it to say, network analytics teams are very busy these days. Everyone wants answers, and quickly.
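Anomaly detection on traffic counters is one concrete piece of that workload. The sketch below is a generic, hypothetical example, not TEOCO's or any operator's actual method: it flags a sample when it lies more than `threshold` standard deviations from the mean of a trailing window (a rolling z-score). The function name `flag_anomalies`, the 24-sample window and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(samples: list[float], window: int = 24, threshold: float = 3.0) -> list[int]:
    """Return indices of samples that deviate from the trailing-window mean
    by more than `threshold` sample standard deviations (rolling z-score)."""
    if window < 2:
        raise ValueError("window must hold at least two samples for stdev()")
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]  # trailing baseline, excludes sample i itself
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat baseline: any change at all counts as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
        elif abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical hourly upload volumes (arbitrary units) with one sudden spike.
    baseline = [100, 102, 98, 101, 99, 100, 101, 99] * 3
    print(flag_anomalies(baseline + [160, 100]))
```

Because the baseline is a trailing window, a sustained shift such as a new stay-at-home traffic level is flagged for its first few samples and then absorbed into the baseline. That is convenient for noisy counters, but it also shows why simple baselines need revisiting when traffic patterns change as abruptly as they have during the pandemic.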
There are plenty of high-quality data sets and many very talented people, both within the operators and on the vendor side (TEOCO included), who are committed to getting us successfully through this crisis. Service providers are leveraging this talent and analyzing this data to produce insights on the spot. Many operators have re-purposed their analytics teams, moving them from long-term projects to providing ad hoc insights into the day’s most pressing problems. We provide our customers with the tools and expertise they need at every step of this journey.
But if the current situation continues much longer, we recognize there will be challenges ahead. First, network operators are not immune to COVID-19, both in terms of people unfortunately getting ill and in terms of changes to the way they operate. For example, both Verizon and Comcast have reported that tens of thousands of their employees are now working from home, a completely different way of managing network operations. While all the large network operators have deployed their business continuity plans, no business of this scale was built to operate entirely remotely, and the strain will start to show as the current situation lingers. Second, not everything can be fixed remotely. Field technicians, for example, are called “field” for a reason. As fewer of them are available, with more employees and their families affected by the crisis, the resulting workforce shortage will impact operators’ ability to address network failures or install new infrastructure. Greater investment in automating existing network operations can further reduce manual processes and ease, or even eliminate, such operational bottlenecks.
If we fast-forward to the 5G era, there will be a lot to learn from the current crisis. Network capacity will become less of an issue, and application performance will come to the fore, because the stakes will be higher. With 4G, video applications are typically used for things like virtual team meetings and staying in touch with family; with 5G, video will be used for more critical applications, such as remote surgery or operating a manufacturing facility from the other side of the world. This means network operators will need to determine, faster and more effectively than ever before, where service performance issues are occurring: at the application layer, at the network slice level, or within the network core.
With 5G, it is widely accepted that automation in service assurance will quickly become table stakes. Without advanced automation tools, assurance teams simply cannot keep pace with the growth in 5G network data volumes and complexity well enough to detect, prioritize and resolve network and service issues. This view is already reflected in 5G industry spending projections. But we believe this isn’t enough. Network operators should also reconsider their stance on automation investments for their existing networks, as there are ample opportunities and efficiencies to be gained now, particularly since 4G/LTE networks will co-exist with 5G for many years to come.
With zero-touch automation, operators will be able to manage the end-to-end service assurance needs of their customers more effectively, today and tomorrow. Combined with advanced analytics and AI/machine learning algorithms, automation will enable faster detection and root-cause diagnosis of service faults. It will also enable operators to put relevant data in context, helping operations teams prioritize their activities according to the services their networks support.
Coming out of this pandemic, we expect to see increased and accelerated investment in network automation. It will not only make telecom operators’ day-to-day operations more efficient; it will also better prepare them for the next crisis.
About the Author
Dima Alkin has more than 15 years of experience in the telecommunications industry and has held leading roles spanning network operations, IT architecture, sales engineering and product innovation. In his current role as VP of Service Assurance Solutions at TEOCO, he focuses on serving operators’ evolving needs by helping them face OSS/BSS challenges with innovative solutions and new approaches. His primary interest is the growing use of advanced analytics, machine learning and AI to solve real operational issues driven by NFV/SDN advances and 5G evolution.
This article was first published on The Fast Mode.