• Home
  • AI News
  • Bookmarks
  • Contact US
Reading: Multinomial Logistic Regression
Share
Notification
Aa
  • Inspiration
  • Thinking
  • Learning
  • Attitude
  • Creative Insight
  • Innovation
Search
  • Home
  • Categories
    • Creative Insight
    • Thinking
    • Innovation
    • Inspiration
    • Learning
  • Bookmarks
    • My Bookmarks
  • More Foxiz
    • Blog Index
    • Sitemap
Have an existing account? Sign In
Follow US
© Foxiz News Network. Ruby Design Company. All Rights Reserved.
> Blog > AI News > Multinomial Logistic Regression
AI News

Multinomial Logistic Regression

admin
Last updated: 2023/05/26 at 9:19 PM
admin
Share
33 Min Read

Introduction

AdvertisementsLogistic regression is one of the most commonly used statistical methods for solving classification problems and its aim is to estimate the probability of an event occurring based on the independent variables. It analyses the relationship between a binary dependent variable i.e only two categories and one or more independent variables. However, there may be situations where the dependent variable has three or more categories. In such situations multinomial logistic regression is used. The main difference between logistic regression and multinomial logistic regression is the number of categories in the dependent variable and similar to multiple linear regression, the multinomial logistic regression does predictive analysis.

Contents
IntroductionWhat is Multinomial Logistic Regression?Data PreparationModel SpecificationModel EstimationModel Evaluation ProcedureInterpretation of CoefficientsPredictions and InferenceModel ValidationReporting and InterpretationDependent VariableAssumptionsIndependence of observationsLinearity of the logitAbsence of multicollinearityNo outliersLarge sample sizeSetup in SPSS StatisticsTest Procedure in SPSS StatisticsGoodness-of-Fit tableModel Fitting Information TablePseudo R-Square TableLikelihood Ratio Tests Table:Parameter Estimates TableSolution ApproachesK models for K classesSimultaneous ModelsAdvantagesHandling multiple classesInterpretable coefficientsModel flexibilityProbabilistic predictionsEasy implementationAccount for co-variatesChallengesOverfittingMulticollinearityClass imbalanceNonlinearityCategorical predictor variablesModel complexityReal World ApplicationsNatural language processingHealthcareFinanceConclusionReferencesShare this:

What is Multinomial Logistic Regression?

Multinomial Logistic Regression is a statistical analysis model used when the dependent variable is categorical and has more than two outcomes. It extends binary logistic regression, which is applicable for dichotomous outcomes, allowing for multiple classes. It provides probabilities for each outcome category and determines the impact of predictor variables on these probabilities. This makes it useful for understanding and predicting the categorial outcome in complex data sets.

Multinomial logistic regression is a classification algorithm used to model and analyze relationships between a dependent categorical variable with more than two categories (i.e., a multinomial variable) and one or more independent variables or explanatory variables. The number of pairs of categories that can be formed depends on the total number of categories present in the dependent variable. In this type of regression, the dependent variable is categorical and should be either an ordinal variable or a nominal variable. Ordinal variable can be ordered or ranked. The independent variables can be a continuous variable, categorical variable, or a combination of both. The ordinal model, specifically ordinal logistic regression, is used to analyze outcomes with ordered categories.

When conducting a multinomial logistic regression, one category is chosen as the reference or baseline category, against which the other categories are compared. The reference category is typically selected based on theoretical or practical considerations, and the interpretation of the model’s coefficients is based on the comparisons with this reference category. The response categories represent the different possible outcomes or groups that the observations can be classified into. For example, if we are studying the factors influencing a person’s choice of transportation mode (car, bus, or bicycle), the response variable would have three categories: car, bus, and bicycle.

- Advertisement -
Ad imageAd image

Also, two or more independent variables can be multiplied to create additional independent variables called interaction variables. They capture the combined effect of the interacting variables on the dependent variable. This allows you to examine whether the relationship between the independent variables and the outcome categories differs across different levels of the interaction variable. For example, suppose you have a multinomial logistic regression model predicting the likelihood of different car types  (categories: sedan, SUV, and hatchback) based on variables such as age and income. If you include an interaction term between age and income, you can assess whether the effect of income on car type varies across different age groups.

It’s important to note that the interpretation of odds ratios in multinomial regression can be more complex than in binary logistic regression, as they involve comparing multiple outcome categories. Additionally, in multinomial regression, the concept of “adjusted odds ratios” is not as commonly used as it is in binary logistic regression, where it refers to accounting for the influence of other variables. For example, if the reference category chosen is “car,” the model estimates the probabilities of choosing the bus or bicycle relative to choosing the car. The estimated coefficients represent the log-odds ratios of each category compared to the reference category. These coefficients indicate the direction and magnitude of the association between the explanatory variables and the log-odds of belonging to a particular category relative to the reference category. It’s important to note that standard deviation is not directly used in multinomial logistic regression, as it is typically employed for continuous variables rather than categorical outcomes.

Source: YouTube

Test Procedure in SPSS Statistics

Step 1 – Click Analyze > Regression > Multinominal Logistic on the main menu. This will open a Multinomial Logistic Regression dialogue box

Step 2 – Transfer the dependent variable, car purchased, into the Dependent: box and the covariate variables, income, age and gender, into the Covariate(s): box

Step 3 – Click on the Statistics button. You will be presented with the Multinomial Logistic Regression: Statistics dialogue box

Step 4 – Click the Cell probabilities, Classification table and Goodness-of-fit checkboxes.

Step 5 – Click on the Continue button and you will be returned to the Multinomial Logistic Regression dialogue box.

Step 6 – Click on the OK button. This will generate the results.

The results generate the following tables:

Goodness-of-Fit table

It provides two measures, Pearson and Deviance, that can be used to assess how well the model fits the data. Pearson presents the Pearson chi-square statistic. Large chi-square values indicate a poor fit for the model. Deviance presents the Deviance chi-square statistic. These two measures of goodness-of-fit might not always give the same result.

Model Fitting Information Table

It provides an overall measure of your model. The “Final” row presents information on whether all the coefficients of the model are zero i.e., whether any of the coefficients are statistically significant.

Pseudo R-Square Table

It provides the Cox and Snell, Nagelkerke and McFadden pseudo R2 measures

Likelihood Ratio Tests Table:

It shows which of your independent variables are statistically significant. This table is mostly useful for nominal independent variables because it is the only table that considers the overall effect of a nominal variable.

Parameter Estimates Table

It presents the parameter estimates also known as the coefficients of the model

Solution Approaches

There are two solution approaches – K models for K classes and Simultaneous Models

K models for K classes

In Multinomial Logistic Regression, we use K-1 logistic regression models to predict K classes. Let’s assume that we have K classes labeled 1, 2, …, K. We fit K-1 binary logistic regression models, where each model compares the probability of being in class i with the probability of being in class K for i=1,2,…,K-1.

For example, suppose we have three classes labeled 1, 2, and 3. We would fit two logistic regression models: one to compare the probability of being in class 1 to the probability of being in class 3, and another to compare the probability of being in class 2 to the probability of being in class 3.

Then, for a new observation, we would apply all K-1 logistic regression models to calculate the predicted probabilities of belonging to classes 1, 2, …, K-1. The probability of belonging to class K is then obtained by subtracting the sum of the predicted probabilities for classes 1, 2, …, K-1 from 1.

Simultaneous Models

Simultaneous models in Multinomial Logistic Regression refer to fitting all K classes into one model with a single set of parameters. In contrast to fitting K-1 binary logistic regression models, simultaneous models estimate all K class probabilities at once using a softmax function. Softmax function maps a K-dimensional vector of real numbers to a K-dimensional vector of probabilities that sum up to one. Specifically, for an observation with predictor variables X, the K-class probabilities are computed as follows:

P(Y=i|X) = e^(βi0 + βi1X1 + βi2X2 + … + βipXp) / (e^(β10 + β11X1 + β12X2 + … + β1pXp) + e^(β20 + β21X1 + β22X2 + … + β2pXp) + … + e^(βK-10 + βK-11X1 + βK-12X2 + … + βK-1pXp))

where

βij – jth coefficient estimate for the ith class
βi0 – intercept for the ith class
Xj – value of the jth predictor variable for the observation
Y – random variable representing the class label

The simultaneous model estimates a single set of parameters, which simplifies the interpretation of the model and makes it more computationally efficient. However, it assumes that the relationships between the predictor variables and the class probabilities are the same for all K classes. This may not be a reasonable assumption in some cases, which is why K-1 binary logistic regression models are often used instead.

Advantages

Multinomial logistic regression has several advantages, including:

Handling multiple classes

Multinomial logistic regression can handle situations where the response variable has more than two categories, making it a useful tool for multi-class classification problems.

Interpretable coefficients

Multinomial logistic regression produces coefficients that are interpretable and can be used to understand the relationship between the predictor variables and the class probabilities.

Model flexibility

Multinomial logistic regression can handle both linear and nonlinear relationships between the predictor variables and the class probabilities, allowing for a more flexible model than some other classification methods.

Probabilistic predictions

Multinomial logistic regression produces probabilistic predictions, which can be useful in situations where it is important to know the likelihood of an observation belonging to each class.

Easy implementation

Multinomial logistic regression is widely available in statistical software packages and is relatively easy to implement, making it a practical choice for many applications.

Account for co-variates

Multinomial logistic regression can account for the effect of covariates on the class probabilities, which can be useful in situations where the relationship between the predictor variables and the response variable may differ across different groups of observations.

Challenges

Multinomial logistic regression, like any statistical method, has several challenges that can affect its performance and interpretation. Some of these challenges include:

Overfitting

Multinomial logistic regression models can be prone to overfitting, especially when the number of predictor variables is large relative to the number of observations. Overfitting can result in a model that performs well on the training data but poorly on new data.

Multicollinearity

Multinomial logistic regression assumes that the predictor variables are independent, but in practice, there may be correlations or multicollinearity among the predictor variables. This can lead to unstable and unreliable coefficient estimates.

Class imbalance

When the number of observations in each class is not balanced, the model may be biased towards the majority class, leading to poor performance for the minority classes.

Nonlinearity

Multinomial logistic regression assumes a linear relationship between the predictor variables and the log odds of each class, but in practice, this assumption may not hold, leading to poor model performance.

Categorical predictor variables

Multinomial logistic regression requires that all predictor variables are continuous, but in practice, there may be categorical predictor variables. Categorical predictor variables can be converted into dummy variables, but this can lead to issues with multicollinearity and overfitting.

Model complexity

Multinomial logit models can become complex when there are many predictor variables, interactions, and higher-order terms. This can make the model difficult to interpret and prone to overfitting.

Real World Applications

Real-world predictive models play a crucial role in various domains and multinomial logistic regression has a wide range of real-world applications, particularly in fields where categorical outcomes with more than two classes are common. Some examples include:

Natural language processing

Multinomial LR classifiers can be used in text classification problems where the outcome variable has multiple categories, such as sentiment analysis of customer reviews.The learning process can be slower compared to Naive Bayes classifier as the maximum entropy classifier requires the iterative learning of weights.

Healthcare

Multinomial logistic regression can be used to predict the probability of a patient belonging to different disease severity classes based on their symptoms and medical history.

Finance

Multinomial LR classifiers can be used to predict the probability of a company being assigned different credit ratings based on their financial statements and market performance.

Conclusion

The multinomial regression model has proven to be a valuable tool for analyzing categorical outcomes with more than two unordered categories. By estimating the coefficients associated with the predictor variables, this model allows us to understand the relationship between the predictors and the probabilities of each outcome category, while considering a reference category. The coefficients provide insights into the direction and magnitude of the effects, indicating how changes in the predictors influence the likelihood of belonging to different categories. Additionally, the model’s ability to handle multiple outcome categories simultaneously enhances its versatility in various research fields. However, interpretation and understanding of the coefficients should be done with caution, considering the reference category and the specific context of the study. Overall, multinomial logistic regression provides a robust framework for investigating complex categorical outcomes and contributes to our understanding of the factors influencing categorical responses.

Logistic and Multinomial Regressions by Example: Hands on approach using R

References

Multinominal Logistic Regression | Stata Data Analysis Examples https://stats.oarc.ucla.edu/stata/dae/multinomiallogistic-regression/. Accessed 14 May. 2023.

Multinomial logistic regression https://en.wikipedia.org/wiki/Multinomial_logistic_regression. Accessed 14 May. 2023.

Multinomial Logistic Regression By Great Learning Team Updated on Sep 9, 2022 https://www.mygreatlearning.com/blog/multinomial-logistic-regression/. Accessed 15 May. 2023.

Multinomial Logistic Regression using SPSS Statistics https://statistics.laerd.com/spss-tutorials/multinomial-logistic-regression-using-spss-statistics.php. Accessed 15 May. 2023.

Share this:

Advertisements

admin Mai 26, 2023 Mai 26, 2023
Share this Article
Facebook Twitter Email Copy Link Print
Leave a comment Leave a comment

Schreibe einen Kommentar Antworten abbrechen

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert

Follow US

Find US on Social Medias
Facebook Like
Twitter Follow
Youtube Subscribe
Telegram Follow
newsletter featurednewsletter featured

Subscribe Newsletter

Subscribe to our newsletter to get our newest articles instantly!

[mc4wp_form]

Popular News

Averaged One-Dependence (AODE) Algorithm and its Use in Machine Learning
Oktober 2, 2022
Putting a Real Face on Deepfake Porn
Oktober 20, 2023
How Humanity Can Avoid an AI Takeover
Mai 17, 2023
Met Gala Deepfakes Are Flooding Social Media
Mai 7, 2024

Quick Links

  • Home
  • AI News
  • My Bookmarks
  • Privacy Policy
  • Contact
Facebook Like
Twitter Follow

© All Rights Reserved.

Welcome Back!

Sign in to your account

Lost your password?