
Leveraging ML-Based Anomaly Detection for 4G Networks Optimization

How the latest technology is helping cellular providers to improve their services.

By Oleksandr Stefanovskyi · Jan. 14, 23 · Analysis


Artificial intelligence and machine learning already have impressive use cases in industries like retail, banking, and transportation. While the technology is far from perfect, advancements in ML allow other industries to benefit as well. In this article, we will look at our own research on making the operations of internet providers more effective.

Improving 4G Networks Traffic Distribution With Anomaly Detection

Previous generations of cellular networks were not very efficient at distributing network resources: they provided coverage evenly across all territories at all times. As an example, envision a massive area containing big cities, little towns, and forests that span for miles. All of these areas receive the same amount of coverage, although cities and towns need far more internet traffic and forests require very little.

Given the much higher traffic of modern 4G networks, cellular providers can achieve impressive energy savings and improve the customer experience by optimizing the utilization of frequency resources.

Machine learning-based anomaly detection makes it possible to predict traffic demand in various parts of the network, helping operators distribute resources more appropriately. This article is based on our analysis of publicly available information and on our implementation of ML algorithms as one possible approach to solving this problem effectively.

There are multiple solutions for this particular problem. The most interesting include:

  • Anomaly detection and classification in cellular networks using an automatic labeling technique to apply supervised learning, suitable for 2G/3G/4G/5G networks.

  • CellPAD, a unified framework for detecting performance anomalies in cellular networks via regression analysis.

Data Overview

The research was done with information extracted from an actual LTE network. The dataset included 14 features in total: 12 numerical and 2 categorical. We had almost 40,000 rows of data records with no missing values. The data analytics team divided the records into two labeled classes:

  • Normal (0): the data doesn't need any reconfiguration or redistribution

  • Unusual (1): reconfiguration is required due to abnormal activity

The labeling was done manually, based on the amount of traffic in particular parts of the network. However, automated data labeling that leverages neural networks is also an option; see Amazon SageMaker Ground Truth or the Data Labeling Service in Google's AI Platform.

Data Analysis Results

Analysis of the labeled data showed that the dataset was imbalanced: we had 26,271 normal (class 0) and 10,183 unusual (class 1) records.

From the dataset, a Pearson correlation matrix was built:

[Figure: 4G network utilization features correlation plot (Pearson)]

As you can see, a large number of the features are heavily correlated. Correlation shows how different attributes in the dataset are connected to each other. It serves as a basic quantity for many modeling techniques and can sometimes help us discover a causal relationship and predict one attribute based on another.

In our case, we have perfectly positively and negatively correlated attributes, which may lead to the multicollinearity problem and degrade a model's performance. Multicollinearity occurs when one predictor variable in a multiple regression model can be predicted linearly from the other variables with high accuracy.

Luckily for us, decision trees and boosted trees solve this issue by picking a single feature from a group of perfectly correlated ones at the moment of splitting. When using other models, such as logistic regression or linear regression, remember that they do suffer from this problem and need extra adjustment before training. Other methods of dealing with multicollinearity include principal component analysis (PCA) and the deletion of perfectly correlated features. The best option for us was tree-based algorithms because they don't need any adjustments to deal with this problem.
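For illustration, here is a minimal sketch of the deletion approach using pandas, assuming the dataset lives in a DataFrame df (the name and the 0.999 threshold are our own choices, not part of the original analysis):

```python
import numpy as np
import pandas as pd

def drop_perfectly_correlated(df: pd.DataFrame, threshold: float = 0.999) -> pd.DataFrame:
    """Drop one feature from every pair whose |Pearson r| exceeds the threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair is examined once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```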

Basic accuracy is one of the key classification metrics: the ratio of correct predictions to the total number of samples in the dataset. As mentioned previously, our classes are imbalanced, which means basic accuracy can be misleading: the metric stays high while revealing nothing about the prediction capacity for the minority class.

We can have accuracy close to 100% and still have low prediction capacity for a particular class, because anomalies are the rarest samples in the dataset. Instead of accuracy, we decided to use the F1 metric, the harmonic mean of precision and recall, which is well suited to imbalanced classification. The F1 score ranges from 0 to 1, where 0 is a total failure and 1 is a perfect classification.

Each sample falls into one of four categories:

  • True Positive, TP — a positive label and a positive classification
  • True Negative, TN — a negative label and a negative classification
  • False Positive, FP — a negative label and a positive classification
  • False Negative, FN — a positive label and a negative classification

Here is what the metrics look like for the imbalanced classes:

  • True Positive Rate, Recall, or Sensitivity: TPR = TP / (TP + FN)
  • False Positive Rate or Fall-out: FPR = FP / (FP + TN)
  • Precision: TP / (TP + FP)
  • True Negative Rate or Specificity: TNR = TN / (TN + FP)

The formula of the F1-score metric is:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
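To make these formulas concrete, here is a small scikit-learn sketch with made-up labels (ours, not the project's data). It shows how a lazy model that always predicts "normal" gets a respectable accuracy but a zero F1 score:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy labels for illustration only: 1 = unusual (anomaly), 0 = normal
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_lazy = [0] * 10                        # a "model" that always predicts normal
y_pred = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]  # a model that actually tries

print(accuracy_score(y_true, y_lazy))             # 0.7 -- looks acceptable
print(f1_score(y_true, y_lazy, zero_division=0))  # 0.0 -- exposes the failure

# The real model: TP = 2, FP = 1, FN = 1
print(precision_score(y_true, y_pred))  # 2 / 3
print(recall_score(y_true, y_pred))     # 2 / 3
print(f1_score(y_true, y_pred))         # 2 / 3
```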

Algorithms of Our Choice

DecisionTreeClassifier was a great place to begin: it gave us a 94% F1 score on the test set without any additional tweaks. To improve the results, we moved to BaggingClassifier, another tree-based algorithm, which reached 96% according to the F1-score metric. We also tried the RandomForestClassifier and GradientBoostingClassifier algorithms, which gave us 91% and 93%, respectively.
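Here is a minimal sketch of that comparison. The feature matrix X, label vector y, and split settings are our assumptions; the original training pipeline is not published:

```python
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X, y: features and 0/1 labels prepared from the LTE dataset (assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: F1 = {f1_score(y_test, model.predict(X_test)):.2f}")
```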

Feature Engineering Step

We achieved great results thanks to tree-based algorithms, but there was still room to grow, so we decided to improve the scores further. While processing the data, we added time features (minutes and hours), extracted the part of the day from the "time" parameter, and tried time-lag features; these moves didn't help much. What did improve the model's results was the use of upsampling techniques, which transformed the features and balanced the data.
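One simple upsampling technique (our illustration; the article does not say which method was used) is to resample the minority class with replacement until the classes are balanced, applied to the training split only so that duplicated rows never leak into the test set:

```python
import pandas as pd
from sklearn.utils import resample

# train: assumed training DataFrame with a 0/1 "label" column
majority = train[train["label"] == 0]
minority = train[train["label"] == 1]

# Duplicate minority rows with replacement until both classes are the same size
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)

# Recombine and shuffle
train_balanced = pd.concat([majority, minority_upsampled]).sample(
    frac=1, random_state=42)
print(train_balanced["label"].value_counts())
```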

Parameters Tuning Step

All of the out-of-the-box algorithms showed results above 90%, which was great, but with the grid search technique (GridSearchCV in scikit-learn) it was possible to improve them even further. Among the four algorithms, grid search was most effective for GradientBoostingClassifier and helped us reach 99% on the F1 metric, completing our initial goal.
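A sketch of such tuning follows; the parameter grid below is a typical starting point of our own choosing, not the grid from the original experiments:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200, 400],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3, 5],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="f1",  # optimize the metric we actually report
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)  # split from the earlier sketch
print(search.best_params_, search.best_score_)
```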

Conclusion

The issue we highlighted in this article is common among mobile internet providers that offer 3G or 4G coverage, and it can be addressed to improve the user experience. In this scenario, an "anomaly" represents wasted internet traffic. Machine learning models can judge how effectively resources are distributed based on input data. The described use of GradientBoostingClassifier tuned with grid search can help companies evaluate the efficiency of traffic distribution and advise them on which parameters need to change to provide the best user experience.

Ineffective traffic utilization is not the only problem that data science can solve in the telecom industry. Solutions like fraud detection, predictive analytics, customer segmentation, churn prevention, and lifetime value prediction are also possible with the right team of developers.  

Anomaly detection, Data science, Machine learning, Network

Published at DZone with permission of Oleksandr Stefanovskyi. See the original article here.

Opinions expressed by DZone contributors are their own.
