
# Model Selection in Machine Learning

Model selection refers to choosing the best statistical machine learning model for a particular problem. To do so, we need to compare the relative performance of candidate models. Therefore, the **loss function**, and the metric that represents it, becomes fundamental for selecting the right, non-overfitted model.

We can state a supervised machine learning problem with the following equation:

$$y = f(x) + \varepsilon$$

This equation is composed of the **x** matrix, which contains the predictors' factors *x1, x2, x3, …, xp*. These factors can be the lagged prices/returns of a time series or other factors such as volume, foreign exchange rates, etc. *y* is the response vector, which depends on the function *f* and the predictors *x*.
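Lagged returns, one of the predictor types mentioned above, can be built from a price series in a few lines. This is a minimal sketch assuming NumPy and simple (not log) returns. The helper name `lagged_return_features`, its `n_lags` parameter and the price values are hypothetical, chosen only for illustration.

```python
import numpy as np


def lagged_return_features(prices, n_lags):
    """Hypothetical helper: build a lagged-return predictor matrix x and response y.

    Row j of x holds the returns at lags 1..n_lags of y[j]; simple returns are assumed.
    """
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0  # simple returns r_t = P_t / P_{t-1} - 1
    m = len(returns)
    # Column k-1 is the return series lagged by k periods, aligned with y below
    x = np.column_stack([returns[n_lags - k : m - k] for k in range(1, n_lags + 1)])
    y = returns[n_lags:]  # response: the return to be predicted
    return x, y


prices = [100.0, 101.0, 99.0, 102.0, 104.0, 103.0]  # made-up price series
x, y = lagged_return_features(prices, n_lags=2)
print(x.shape, y.shape)  # (3, 2) (3,)
```

The first `n_lags` returns are consumed as history, so the six prices (five returns) leave three rows with two lagged predictors each.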

**f** contains the underlying relationship between the **x** features and the **y** response. It can be modeled with a *linear regression* if the underlying relationship is linear, or with a *Random Forest* or *Support Vector Machine* algorithm if the underlying relationship is non-linear. *ε* represents the error term, which is usually assumed to have mean zero and constant variance.
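To make the equation concrete, the sketch below simulates data from y = f(x) + ε with a hypothetical linear *f* and recovers it by ordinary least squares, the estimator behind linear regression. It assumes NumPy and uses `np.linalg.lstsq` rather than a scikit-learn model to stay dependency-light. The coefficients, noise level and sample size are arbitrary assumptions. A non-linear *f* would instead call for one of the non-linear models named above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility

# Hypothetical "true" relationship: f is linear in two predictors
n, p = 500, 2
x = rng.normal(size=(n, p))                    # predictor matrix: n observations, p factors
true_beta = np.array([1.5, -0.7])
true_intercept = 0.3
eps = rng.normal(loc=0.0, scale=0.5, size=n)   # error term with mean zero
y = true_intercept + x @ true_beta + eps       # y = f(x) + ε

# Linear regression by ordinary least squares: prepend a column of ones for the intercept
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # approximately [0.3, 1.5, -0.7]; the noise prevents an exact match
```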

Once we fit a particular model to a certain dataset, we need to define the **loss function** that we will use to assess model performance. Many measures can be used for the *loss function*. Some common measures are the *Absolute Error* and the *Squared Error* between predicted values and real values:

$$\text{Absolute Error}_i = |y_i - \hat{y}_i|$$

$$\text{Squared Error}_i = (y_i - \hat{y}_i)^2$$

Both choices are non-negative, so the best value for the loss function is zero. The **Absolute Error** and *Squared Error* above compute the difference between the true value (*y*) and the prediction (*ŷ*) for each observation of the dataset.
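Computing both losses for a handful of observations shows that they are per-observation quantities. The values below are made up for illustration, and the sketch assumes NumPy arrays for *y* and *ŷ*.

```python
import numpy as np

# Made-up true values and model predictions for n = 4 observations
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

residual = y_true - y_pred
absolute_error = np.abs(residual)  # |y_i - ŷ_i|, one entry per observation
squared_error = residual ** 2      # (y_i - ŷ_i)^2, one entry per observation

print(absolute_error.tolist())  # [0.5, 0.5, 0.0, 1.0]
print(squared_error.tolist())   # [0.25, 0.25, 0.0, 1.0]
```

Each result has one entry per observation. Both are non-negative and equal zero only where the prediction is exact (the third observation).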

Both the **Absolute Error** and *Squared Error* are vectors (arrays) of dimension *n × 1*, holding the error term for each observation. To aggregate a model's error across all the predicted and real values of a variable, a popular measure is the *Mean Squared Error* (MSE), which is simply the average of the squared loss:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

where **n** is the number of observations.
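The MSE formula translates directly to code, and it can then drive model selection. The sketch below assumes NumPy and defines a hypothetical `mean_squared_error` helper. scikit-learn ships a function with the same name in `sklearn.metrics`, but it is not used here. The sketch applies the helper to a small example and then compares candidate models by MSE on held-out data. The candidates are polynomial fits of degree 1, 3 and 9 (`np.polyfit`), chosen as stand-ins for competing models. The cubic data-generating function is also an assumption.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Hypothetical helper: average of the squared loss, (1/n) * sum_i (y_i - ŷ_i)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Aggregating per-observation squared errors [0.25, 0.25, 0.0, 1.0]
print(mean_squared_error([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # 0.375

# Model selection sketch on simulated data with a hypothetical non-linear f
rng = np.random.default_rng(seed=0)
x = rng.uniform(-3.0, 3.0, size=120)
y = 0.5 * x**3 - x + rng.normal(scale=1.0, size=x.size)

# x is drawn at random, so the first 80 points form a random training split
x_train, y_train = x[:80], y[:80]  # used to fit each candidate
x_test, y_test = x[80:], y[80:]    # held out, used only for scoring

test_mse = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    test_mse[degree] = mean_squared_error(y_test, np.polyval(coeffs, x_test))

best_degree = min(test_mse, key=test_mse.get)  # lowest held-out MSE wins
print(test_mse, "-> selected degree:", best_degree)
```

Scoring on held-out data rather than on the training data guards against choosing an overfitted model. Among these nested polynomial fits, the degree-9 model always has the lowest training MSE, but not necessarily the lowest test MSE.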
