
How to Use Pandas Melt – pd.melt() for AI and Machine Learning

admin
Last updated: 2022/09/19 at 1:31 PM

What is Pandas Melt?

Pandas Melt is a flexible function for reshaping Pandas data frames. It converts a data frame from a wide format to a long format, which is the layout many data science tools expect. In a wide format, each subject appears once in the first column and every measurement has its own column. In a long format, each subject repeats in the first column, once for every measurement. In code, it is called as “pd.melt()”.


The function accepts seven parameters: frame, id_vars, value_vars, var_name, value_name, col_level, and ignore_index. The only required parameter is “frame”, the data frame you want to reshape (the examples in this article pass a variable named df). id_vars names the columns to keep as identifier variables. value_vars names the columns that will be melted; if it is omitted, every column not listed in id_vars is melted. var_name names the variable column in the output. value_name names the value column in the output. col_level selects which level to melt when the columns are a MultiIndex. Finally, ignore_index controls whether the original index is discarded (True, the default) or retained (False).

All of these parameters can be used at once, and the code would look like “pd.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)”. When var_name is left as None, the variable column is named 'variable' unless the columns index has a name of its own.
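
To make the parameters above concrete, here is a minimal sketch. The data frame "scores" and its column names are hypothetical and exist only for illustration; the calls themselves use the documented pd.melt parameters, and ignore_index assumes pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the parameters.
scores = pd.DataFrame(
    {"student": ["Ann", "Ben"], "math": [90, 75], "art": [80, 85]},
    index=["r1", "r2"],
)

# Only the frame is given: every column is melted and the two output
# columns get the default names "variable" and "value".
all_melted = pd.melt(scores)

# id_vars keeps "student" as an identifier, value_vars picks the columns
# to unpivot, and var_name / value_name rename the two output columns.
named = pd.melt(
    scores,
    id_vars=["student"],
    value_vars=["math", "art"],
    var_name="subject",
    value_name="score",
)

# ignore_index=False keeps the original row labels, so each label
# repeats once per melted column.
kept_index = pd.melt(scores, id_vars=["student"], ignore_index=False)

print(all_melted.columns.tolist())
print(named)
print(kept_index.index.tolist())
```

Passing lists for id_vars and value_vars, even for a single column, keeps the call uniform when more columns are added later.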


Long data frame vs Wide data frame

We talked about long data frames vs wide data frames above, but it is easier to understand the concept when you can see it visually. Keep in mind that wide data frames will have many columns which can become difficult to manage. Meanwhile, a long data frame will make it easier to perform machine learning on the data. Below is an example of how a wide data frame may look:

Wide Data Frame:

Person   Age   Weight   Height
------   ---   ------   ------
Bob       32      168      180
Alice     24      150      175
Steve     64      144      165

In this example we have four columns. By using the melt function, we can transform this data efficiently into a long data frame as shown below:

Long Data Frame:

Person   Variable   Value
------   --------   -----
Bob      Age           32
Bob      Weight       168
Bob      Height       180
Alice    Age           24
Alice    Weight       150
Alice    Height       175
Steve    Age           64
Steve    Weight       144
Steve    Height       165
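
The Person table above can be reproduced with a short sketch. The variable names "people" and "long_people" are illustrative, not from the original example. By default melt stacks all Age rows first, then all Weight rows, and so on; the extra sort below is one assumed way to regroup the rows by person so they match the table.

```python
import pandas as pd

# The wide data frame from the Person example.
people = pd.DataFrame(
    {
        "Person": ["Bob", "Alice", "Steve"],
        "Age": [32, 24, 64],
        "Weight": [168, 150, 144],
        "Height": [180, 175, 165],
    }
)

long_people = (
    pd.melt(
        people,
        id_vars=["Person"],
        var_name="Variable",
        value_name="Value",
        ignore_index=False,  # keep 0, 1, 2 so rows can be regrouped by person
    )
    .sort_index(kind="mergesort")  # stable sort keeps Age, Weight, Height order
    .reset_index(drop=True)
)

print(long_people)
```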

Now the columns have shrunk from four to three. Now let us look at how to change a wide data frame into a long data frame using Python code. First, we need to create a wide data frame. The code to do this is shown below:

# Creating sample data
import pandas as pd

# creating a dataframe
df = pd.DataFrame(
    {
        'Item': ['Cereals', 'Dairy', 'Frozen', 'Meat'],
        'Price': [100, 50, 200, 250],
        'Hour_1': [5, 5, 3, 8],
        'Hour_2': [8, 8, 2, 1],
        'Hour_3': [7, 7, 8, 2],
    }
)
print(df)

The data frame contains the items Cereals, Dairy, Frozen, and Meat. It has five columns named Item, Price, Hour_1, Hour_2, and Hour_3. This layout is easy for humans to read, but harder to aggregate and model programmatically. Because of that, we need to reshape it into a long data frame. Below is how the wide data frame looks when printed:

Item      Price   Hour_1   Hour_2   Hour_3
-------   -----   ------   ------   ------
Cereals     100        5        8        7
Dairy        50        5        8        7
Frozen      200        3        2        8
Meat        250        8        1        2

Now let’s use Python to reshape this data frame into a long format. We will have one column containing item, one column containing hour, and one column containing sales. Below is the code on how to do that:

melt_df = pd.melt(
    df,
    id_vars=['Item'],
    value_vars=['Hour_1', 'Hour_2', 'Hour_3'],
    var_name='Hour',
    value_name='Sales',
    col_level=None
)
melt_df

The output of this code can be seen below:

Item      Hour     Sales
-------   ------   -----
Cereals   Hour_1       5
Dairy     Hour_1       5
Frozen    Hour_1       3
Meat      Hour_1       8
Cereals   Hour_2       8
Dairy     Hour_2       8
Frozen    Hour_2       2
Meat      Hour_2       1
Cereals   Hour_3       7
Dairy     Hour_3       7
Frozen    Hour_3       8
Meat      Hour_3       2

The data has now gone from five columns to three (Price was left out of this melt), which makes it easier to aggregate and to feed into machine learning code. For example, we can group the data by item and sum the sales using the “groupby” function. groupby is a Pandas method that groups rows sharing the same value in a chosen column. This gives us the total sales per item. It can be done in one line of code by typing “melt_df.groupby('Item')['Sales'].sum()”. The output of this code is shown below:

Item      Sales
-------   -----
Cereals      20
Dairy        20
Frozen       13
Meat         11

This tells us how many units of each item were sold. We can also group by hour to see how many sales occurred in each hour. The code for this is “melt_df.groupby('Hour')['Sales'].sum()”. The output can be seen below:

Hour     Sales
------   -----
Hour_1      21
Hour_2      19
Hour_3      24
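
Both totals can be checked with a self-contained sketch that rebuilds the sample data, melts it, and runs the two aggregations. The names "sales_by_item" and "sales_by_hour" are illustrative additions.

```python
import pandas as pd

# Sample data from the article.
df = pd.DataFrame(
    {
        "Item": ["Cereals", "Dairy", "Frozen", "Meat"],
        "Price": [100, 50, 200, 250],
        "Hour_1": [5, 5, 3, 8],
        "Hour_2": [8, 8, 2, 1],
        "Hour_3": [7, 7, 8, 2],
    }
)

# Wide to long: one row per (Item, Hour) pair.
melt_df = pd.melt(
    df,
    id_vars=["Item"],
    value_vars=["Hour_1", "Hour_2", "Hour_3"],
    var_name="Hour",
    value_name="Sales",
)

# Total units sold per item and per hour.
sales_by_item = melt_df.groupby("Item")["Sales"].sum()
sales_by_hour = melt_df.groupby("Hour")["Sales"].sum()

print(sales_by_item)
print(sales_by_hour)
```

In the wide layout the per-item total would need a row-wise sum over three named columns; in the long layout the same groupby call works no matter how many hour columns the original table had.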

As you can start to see, having data in long form makes it much easier to work with. The melt can also be adjusted easily. Let us keep the Price column in the output by adding it to id_vars. Below is the code needed to accomplish this:

melt_df = pd.melt(
    df,
    id_vars=['Item', 'Price'],
    value_vars=['Hour_1', 'Hour_2', 'Hour_3'],
    var_name='Hour',
    value_name='Sales',
    col_level=None
)
melt_df

Our long format data frame now looks like this:

Item      Price   Hour     Sales
-------   -----   ------   -----
Cereals     100   Hour_1       5
Dairy        50   Hour_1       5
Frozen      200   Hour_1       3
Meat        250   Hour_1       8
Cereals     100   Hour_2       8
Dairy        50   Hour_2       8
Frozen      200   Hour_2       2
Meat        250   Hour_2       1
Cereals     100   Hour_3       7
Dairy        50   Hour_3       7
Frozen      200   Hour_3       8
Meat        250   Hour_3       2

As you can see, the Price column was carried into the long data frame alongside Item. With a Price column we can now calculate things like total revenue, or revenue by item or by hour. These can all be done with the groupby function, and the code is very similar to what is shown above.
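
A sketch of the revenue calculations might look like the following. It assumes that Price is a per-unit price and Sales is a count of units sold, which the article implies but does not state; the "Revenue" column and the result variable names are illustrative additions.

```python
import pandas as pd

# Sample data from the article.
df = pd.DataFrame(
    {
        "Item": ["Cereals", "Dairy", "Frozen", "Meat"],
        "Price": [100, 50, 200, 250],
        "Hour_1": [5, 5, 3, 8],
        "Hour_2": [8, 8, 2, 1],
        "Hour_3": [7, 7, 8, 2],
    }
)

# Keep both Item and Price as identifier columns.
melt_df = pd.melt(
    df,
    id_vars=["Item", "Price"],
    value_vars=["Hour_1", "Hour_2", "Hour_3"],
    var_name="Hour",
    value_name="Sales",
)

# Assumption: revenue per row = unit price * units sold.
melt_df["Revenue"] = melt_df["Price"] * melt_df["Sales"]

total_revenue = melt_df["Revenue"].sum()
revenue_by_item = melt_df.groupby("Item")["Revenue"].sum()
revenue_by_hour = melt_df.groupby("Hour")["Revenue"].sum()

print(total_revenue)
print(revenue_by_item)
print(revenue_by_hour)
```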


Reversing Pandas Melt 

The Pandas Melt function can also be reversed, which allows us to go from a long data frame back to a wide data frame. This can be done using the pivot function, which recovers the original wide layout. The documentation for the pivot function can be found at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html. To reverse Pandas Melt, the index argument of pivot must list the same columns that were passed as id_vars when melting. The columns argument takes the name of the variable column (here 'Hour'), and the values argument takes the name of the value column (here 'Sales'). After pivoting, the identifier columns form the index; calling reset_index() turns them back into ordinary columns. The code to do this can be seen below:

df_unmelted = melt_df.pivot(
    index=['Item', 'Price'], columns='Hour', values='Sales'
).reset_index()  # turn Item and Price back into ordinary columns
df_unmelted

By doing this the data frame is now back to a wide format as seen below:

Item      Price   Hour_1   Hour_2   Hour_3
-------   -----   ------   ------   ------
Cereals     100        5        8        7
Dairy        50        5        8        7
Frozen      200        3        2        8
Meat        250        8        1        2
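
The round trip can be verified with a small sketch. It uses DataFrame.pivot with a list for index, which assumes pandas 1.1 or newer; the name "restored" is illustrative. Note that pivot raises an error if an index and column combination appears more than once, whereas pivot_table would aggregate such duplicates (by mean unless told otherwise), so pivot is the stricter choice for undoing a melt.

```python
import pandas as pd

# Sample data from the article.
df = pd.DataFrame(
    {
        "Item": ["Cereals", "Dairy", "Frozen", "Meat"],
        "Price": [100, 50, 200, 250],
        "Hour_1": [5, 5, 3, 8],
        "Hour_2": [8, 8, 2, 1],
        "Hour_3": [7, 7, 8, 2],
    }
)

# Wide to long.
melt_df = pd.melt(
    df,
    id_vars=["Item", "Price"],
    value_vars=["Hour_1", "Hour_2", "Hour_3"],
    var_name="Hour",
    value_name="Sales",
)

# Long back to wide: index mirrors id_vars, columns mirrors var_name,
# values mirrors value_name.
restored = melt_df.pivot(
    index=["Item", "Price"], columns="Hour", values="Sales"
).reset_index()

# pivot leaves the label "Hour" on the columns axis; clear it so the
# result has the same column metadata as the original.
restored.columns.name = None

# Compare the restored frame with the original.
print(restored.equals(df))
```

The rows come back sorted by the index columns, which happens to match the original order here because the items were already in alphabetical order.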


Conclusion

I hope this article has shown you the importance of Pandas Melt in the context of data science. Converting a wide data frame into a long one gives machine learning pipelines data that is easier to group, aggregate, and transform. Thank you for reading this article.

The pd.melt() function in Pandas is a valuable tool for data preprocessing, a crucial step in any AI or machine learning workflow. By transforming the dataset from a wide format to a long format, pd.melt() simplifies grouping, aggregation, and feature preparation. Well-structured input does not by itself guarantee more accurate predictions, but it removes a common source of errors before modeling begins.

pd.melt() provides a high level of flexibility, allowing data scientists to specify which columns to keep unchanged and which to unpivot. This granular control makes it possible to tailor the data transformation process to the specific requirements of each AI or machine learning project. Given its versatility and effectiveness, pd.melt() is a must-have tool in any data scientist’s toolkit, facilitating the development of robust and efficient AI and machine learning solutions.

