What Is the Fastest Way to Solve a Machine Learning Problem?

By Stylianos Kampakis · Dec. 02, 22 · Tutorial
Solving a machine learning problem can be daunting for beginner data scientists. There are so many algorithms to choose from that a single visit to scikit-learn's documentation can be overwhelming. One of the main challenges is that when you get poor performance, you can't be sure whether the fault lies with your approach or whether the dataset is simply not good enough.

Over years of practice, I have developed a process that I use to quickly figure out whether the data is of good quality.

In machine learning, algorithms can be placed on a continuum of "power," meaning their capacity to fit complex patterns, from the least powerful to the most powerful. Naive Bayes, for example, is a very simple classifier. Deep neural networks and random forests, on the other hand, are very powerful models. For regression, linear regression is probably the simplest algorithm in existence.
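To make the two ends of this continuum concrete, here is a minimal sketch that scores a simple classifier (Gaussian naive Bayes) and a powerful one (a random forest) on the same data. The synthetic dataset from `make_classification`, the 5-fold cross-validation, and the choice of accuracy as the metric are all assumptions made for illustration; none of them come from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (an assumption); replace with your own X and y.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)

# One model from each end of the "power" continuum.
models = {
    "naive_bayes (simple)": GaussianNB(),
    "random_forest (powerful)": RandomForestClassifier(
        n_estimators=100, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

for name, accuracy in scores.items():
    print(f"{name}: accuracy = {accuracy:.3f}")
```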

Now, let’s see how we can use this to quickly determine whether there is something wrong with the dataset or something wrong with our approach.

[Figure: scikit-learn data science cheatsheet]

A Fast Process for Machine Learning Problems

The trick is as follows:

  1. Use 1-2 very simple models. Record the results.
  2. Use 1-2 very complicated models. Record the results.
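The two steps above can be sketched for a regression task roughly as follows. This is only an illustration under stated assumptions: the helper `cv_rmse` is a hypothetical name, the data comes from scikit-learn's synthetic `make_friedman1` generator, and the particular models and hyperparameters are placeholders for whatever you would normally try.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score


def cv_rmse(model, X, y, folds=5):
    """Return the mean cross-validated RMSE of `model` (lower is better)."""
    neg_rmse = cross_val_score(
        model, X, y, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return -neg_rmse.mean()


# Synthetic stand-in data (an assumption); replace with your own X and y.
X, y = make_friedman1(n_samples=300, n_features=10, noise=1.0, random_state=0)

# Step 1: one or two very simple models.
simple_models = {
    "linear_regression": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
}
# Step 2: one or two very complicated models.
complex_models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Record the results of both groups side by side.
results = {}
for name, model in {**simple_models, **complex_models}.items():
    results[name] = cv_rmse(model, X, y)
    print(f"{name}: RMSE = {results[name]:.3f}")
```

Scoring every model with the same cross-validation folds and the same metric keeps the comparison fair, which matters more here than squeezing the best score out of any single model.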

If the results are very similar, the more powerful models are struggling to extract more information from the dataset than the simple ones. This most likely means that there is simply not enough useful information in the dataset.

For example, say you use linear regression for a regression task on a dataset of 50 features and get an RMSE of 2.5. You then fit a random forest with a large number of trees, e.g., 500. If its RMSE is something like 2.34, the random forest is finding it difficult to extract more information than a simple linear model.
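One way to put a number on "very similar" is the relative improvement in RMSE. The sketch below applies it to the figures from the example (2.5 and 2.34). The function name `relative_improvement` and the 10% cut-off are assumptions chosen for illustration; the article does not prescribe a threshold, and a sensible value depends on the problem.

```python
def relative_improvement(simple_rmse, complex_rmse):
    """Fraction by which the complex model lowers RMSE relative to the simple one."""
    return (simple_rmse - complex_rmse) / simple_rmse


# Figures from the example: linear regression vs. a 500-tree random forest.
gain = relative_improvement(2.5, 2.34)
print(f"Relative RMSE improvement: {gain:.1%}")  # prints 6.4%

# Assumed cut-off: treat anything under 10% as "very similar".
SIMILARITY_THRESHOLD = 0.10
looks_similar = gain < SIMILARITY_THRESHOLD
print("Models perform similarly:", looks_similar)
```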

[Figure: machine learning black box]

How to Use This Process for Machine Learning Problems

The law of parsimony states that you should use the simplest model that works well for a given problem. In other words, make sure you are not using a more complicated model than needed.
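As a rough illustration of parsimony in code, the sketch below picks the simplest model whose RMSE is within a tolerance of the best one. The function `pick_simplest`, the ordering of candidates from simplest to most complex, and the tolerance values are all hypothetical choices, not something the article specifies.

```python
def pick_simplest(results, tolerance=0.10):
    """Pick the simplest model whose RMSE is within `tolerance` of the best.

    `results` is a list of (name, rmse) pairs ordered from the simplest
    model to the most complex one.
    """
    best_rmse = min(rmse for _, rmse in results)
    threshold = best_rmse * (1 + tolerance)
    for name, rmse in results:
        # The first match is the simplest acceptable model.
        if rmse <= threshold:
            return name
    raise ValueError("results must not be empty")


# RMSE values taken from the linear regression vs. random forest example.
candidates = [("linear_regression", 2.5), ("random_forest", 2.34)]

print(pick_simplest(candidates, tolerance=0.10))  # prints linear_regression
print(pick_simplest(candidates, tolerance=0.05))  # prints random_forest
```

With a 10% tolerance the linear model is judged good enough, while a stricter 5% tolerance favors the random forest, so the tolerance encodes how much accuracy you are willing to trade for simplicity.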

The simple process I outlined helps you do exactly that. If the simple and complicated models perform similarly, the next steps are:

  1. Re-examine the quality of the data.
  2. Understand whether you can collect more data.
  3. Think of potential features which can be extracted from the dataset in order to further improve performance.
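For the second point, a learning curve is one common way to judge whether more data might help. It is a technique brought in here for illustration rather than something the article describes. The sketch below uses scikit-learn's `learning_curve` on synthetic data; the dataset, the model, and the training-set fractions are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic stand-in data (an assumption); replace with your own X and y.
X, y = make_friedman1(n_samples=300, n_features=10, noise=1.0, random_state=0)

# Score the model on growing fractions of the training data.
train_sizes, _, valid_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
    scoring="neg_root_mean_squared_error",
)

# Scores are negated RMSE values; flip the sign and average over folds.
valid_rmse = -valid_scores.mean(axis=1)

for n_rows, rmse in zip(train_sizes, valid_rmse):
    print(f"{int(n_rows):4d} training rows -> validation RMSE {rmse:.2f}")
```

If the validation RMSE is still falling at the largest training size, collecting more data may be worthwhile; if it has flattened out, better features or cleaner data are more likely to help.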

That being said, if you keep seeing that simple models perform about as well as complicated ones, you can be fairly sure that adding even more complexity is unlikely to benefit you much. Good data with average algorithms will usually outperform bad data with excellent algorithms. So, if you are a beginner in data science, focus on approaching problems holistically instead of simply trying models until you find something that works.

Published at DZone with permission of Stylianos Kampakis. See the original article here.

Opinions expressed by DZone contributors are their own.
