Pandas Pivot: A Guide with Examples

Alex DeBrie

June 29, 2019

Why to pivot your data
How to use the Pandas pivot method
When to use pivot vs pivot_table in Pandas
How to use the Pandas pivot_table method
Conclusion

Python’s Pandas library is one of the most popular tools in the data scientist’s toolbelt. Data scientists use Pandas to explore, clean, and understand datasets.

In a previous article, we saw how to work with the Pandas DataFrame, the core data structure in the Pandas library. If you’re new to Pandas, that post is a great way to get started.

In this post, we’ll learn how to reshape your DataFrames by using the Pivot method. Pivot allows you to twist your data into a different shape for easier analysis.

We’ll cover:

Why to pivot your data;
How to use the pivot method;
When to use pivot method vs pivot_table method;
How to use the pivot_table method.

Let’s get started.

Why to `pivot` your data

Before we get into details how to pivot, it’s important to know why you want to pivot.

Pivoting your data allows you to reshape it in a way that makes it easier to understand or analyze. Often you’ll use a pivot to demonstrate the relationship between two columns that can be difficult to reason about before the pivot.

If you’re an Excel wizard who has spent a lot of time in spreadsheets, the idea of a pivot may be easy for you. If not, it can be hard to understand without an example. Let’s see how it works.

In the example below, I’ll create a Pandas DataFrame from some stock trading data that I’ve used in the previous Pandas articles.

>>> import pandas as pd

>>> stocks = pd.read_csv('https://gist.githubusercontent.com/alexdebrie/b3f40efc3dd7664df5a20f5eee85e854/raw/ee3e6feccba2464cbbc2e185fb17961c53d2a7f5/stocks.csv')

>>> stocks

          date symbol     open     high      low    close    volume

0   2019-03-01   AMZN  1655.13  1674.26  1651.00  1671.73   4974877

1   2019-03-04   AMZN  1685.00  1709.43  1674.36  1696.17   6167358

2   2019-03-05   AMZN  1702.95  1707.80  1689.01  1692.43   3681522

3   2019-03-06   AMZN  1695.97  1697.75  1668.28  1668.95   3996001

4   2019-03-07   AMZN  1667.37  1669.75  1620.51  1625.95   4957017

5   2019-03-01   AAPL   174.28   175.15   172.89   174.97  25886167

6   2019-03-04   AAPL   175.69   177.75   173.97   175.85  27436203

7   2019-03-05   AAPL   175.94   176.00   174.54   175.53  19737419

8   2019-03-06   AAPL   174.67   175.49   173.94   174.52  20810384

9   2019-03-07   AAPL   173.87   174.44   172.02   172.50  24796374

10  2019-03-01   GOOG  1124.90  1142.97  1124.75  1140.99   1450316

11  2019-03-04   GOOG  1146.99  1158.28  1130.69  1147.80   1446047

12  2019-03-05   GOOG  1150.06  1169.61  1146.19  1162.03   1443174

13  2019-03-06   GOOG  1162.49  1167.57  1155.49  1157.86   1099289

14  2019-03-07   GOOG  1155.72  1156.76  1134.91  1143.30   1166559

When we print out the code, you can see that the data has a number of columns and that the rows are organized by trading date and stock symbol.

That organization may be helpful for some analysis, but it can be hard to glean information about trading volume across dates and stock symbols. Let’s reshape our data to look closer at volume.

>>> stocks.pivot(index='symbol', columns='date', values='volume')

date    2019-03-01  2019-03-04  2019-03-05  2019-03-06  2019-03-07

symbol

AAPL      25886167    27436203    19737419    20810384    24796374

AMZN       4974877     6167358     3681522     3996001     4957017

GOOG       1450316     1446047     1443174     1099289     1166559

In the example above, I use the pivot method to reshape the data so that the rows are indexed by stock symbol and the columns are trading dates. The value in each cell is the volume on that day. This makes it easy to compare the volume for a stock over time, by reading horizontally, or to compare volume across stocks on a particular day, by reading vertically.

This reshaping power of pivot makes it much easier to understand relationships in your datasets.

How to use the Pandas `pivot method`

To use the pivot method in Pandas, you need to specify three parameters:

Index: Which column should be used to identify and order your rows vertically
Columns: Which column should be used to create the new columns in our reshaped DataFrame. Each unique value in the column stated here will create a column in our new DataFrame.
Values: Which column(s) should be used to fill the values in the cells of our DataFrame.

Let’s walk through these in an example with our stock trading data. In the example below, I use pivot to examine the closing trading price for each stock symbol over our trading window.

>>> stocks

          date symbol     open     high      low    close    volume

0   2019-03-01   AMZN  1655.13  1674.26  1651.00  1671.73   4974877

1   2019-03-04   AMZN  1685.00  1709.43  1674.36  1696.17   6167358

2   2019-03-05   AMZN  1702.95  1707.80  1689.01  1692.43   3681522

3   2019-03-06   AMZN  1695.97  1697.75  1668.28  1668.95   3996001

4   2019-03-07   AMZN  1667.37  1669.75  1620.51  1625.95   4957017

5   2019-03-01   AAPL   174.28   175.15   172.89   174.97  25886167

6   2019-03-04   AAPL   175.69   177.75   173.97   175.85  27436203

7   2019-03-05   AAPL   175.94   176.00   174.54   175.53  19737419

8   2019-03-06   AAPL   174.67   175.49   173.94   174.52  20810384

9   2019-03-07   AAPL   173.87   174.44   172.02   172.50  24796374

10  2019-03-01   GOOG  1124.90  1142.97  1124.75  1140.99   1450316

11  2019-03-04   GOOG  1146.99  1158.28  1130.69  1147.80   1446047

12  2019-03-05   GOOG  1150.06  1169.61  1146.19  1162.03   1443174

13  2019-03-06   GOOG  1162.49  1167.57  1155.49  1157.86   1099289

14  2019-03-07   GOOG  1155.72  1156.76  1134.91  1143.30   1166559

>>> stocks.pivot(index='symbol', columns='date', values='close')

date    2019-03-01  2019-03-04  2019-03-05  2019-03-06  2019-03-07

symbol

AAPL        174.97      175.85      175.53      174.52      172.50

AMZN       1671.73     1696.17     1692.43     1668.95     1625.95

GOOG       1140.99     1147.80     1162.03     1157.86     1143.30

First, I printed out our DataFrame to see how it is organized by default. Then, I use the pivot method to change the data.

Notice that for the index parameter, I used symbol. As a result, each unique value for the symbol column — AAPL, AMZN, GOOG — is used as the index, the leftmost column in our DataFrame.

Want to Code Faster?

Kite is a plugin for PyCharm, Atom, Vim, VSCode, Sublime Text, and IntelliJ that uses machine learning to provide you with code completions in real time sorted by relevance. Start coding faster today.

Download Kite Free

I used date for the column parameter. This resulted in five non-index columns across the top of our DataFrame, one for each unique value in the date column of our initial DataFrame.

Finally, I used close as the values parameter. This specifies which value should be placed in each column. I’m interested in the closing price for each stock across the trading days, so I use the close column.

Note that the index and column parameters are interchangeable. If you want to reorganize so that the dates are used as the index and the stock symbols are my columns, you can just flip the parameters:

>>> stocks.pivot(index='date', columns='symbol', values='close')

symbol        AAPL     AMZN     GOOG

date

2019-03-01  174.97  1671.73  1140.99

2019-03-04  175.85  1696.17  1147.80

2019-03-05  175.53  1692.43  1162.03

2019-03-06  174.52  1668.95  1157.86

2019-03-07  172.50  1625.95  1143.30

When to use `pivot` vs `pivot_table` in Pandas

So far we’ve only been using the term ‘pivot’ broadly, but there are actually two Pandas methods for pivoting. The first is the pivot method, which we reviewed in this section. The second is the pivot_table method, which we’ll learn about in the next section.

When deciding between using the pivot or pivot_table method, you need to ask yourself one question:

Will the results of my pivot have more than one entry in any index + column?

If the answer to this question is “yes”, you must use the pivot_table method. If the answer to this question is “no”, you may use the pivot method.

Note that any use of pivot can be switched to pivot_table, but the reverse is not true. If you try to use the pivot method where there would be more than one entry in any index + column combination, it will throw a ValueError.

In the next section, we’ll take a look at how the pivot_table method works in practice.

How to use the Pandas `pivot_table` method

For those familiar with Excel or other spreadsheet tools, the pivot table is more familiar as an aggregation tool. Pandas pivot tables are used to group similar columns to find totals, averages, or other aggregations.

You can accomplish this same functionality in Pandas with the pivot_table method. For example, imagine we wanted to find the mean trading volume for each stock symbol in our DataFrame. You could do so with the following use of pivot_table:

>>> stocks.pivot_table(index='symbol', values='volume')

            volume

symbol

AAPL    23733309.4

AMZN     4755355.0

GOOG     1321077.0

Like with pivot, we specify the index we want to to use for our data as well as the column to use to fill in the values.

Notice that each stock symbol in our index will have five values for the volume column as there are five trading days for each stock. The pivot_table method aggregates these values and applies an aggregate function to reduce it to a single value.

By default, the aggregate function is the mean function from NumPy, but you can pass in a custom aggregation function. If I want to combine my values into a total, I could use NumPy’s sum function:

>>> import numpy as np

>>> stocks.pivot_table(index='symbol', values='volume', aggfunc=np.sum)

           volume

symbol

AAPL    118666547

AMZN     23776775

GOOG      6605385

The pivot table method is really powerful when using it with a MultiIndex, which allows you to have hierarchies in your index. For example, imagine you had a larger stock trading dataset that included training data over an entire year. You could use a MultiIndex to create a pivot table where values were grouped by stock symbol and month, allowing you to quickly explore how trading volume and other statistics changed on a month-over-month basis for particular stocks.

Want to Code Faster?

Download Kite Free

Conclusion

In this post, we learned about pivoting your DataFrames in Pandas with the pivot and pivot_table functions. We saw why you would want to pivot your data as well as walkthroughs of using both pivot and pivot_table.

Just like Pandas makes it easy to work with data, the Kite plugin for your IDE makes it easy to work with Python. Kite gives you an AI-powered autocomplete in the editor, which saves you keystrokes and helps you code faster on the fly. It also has inline documentation for popular libraries so you don’t have to break your flow. Download it today!

View Article Comments

Pandas Pivot: A Guide with Examples

Table of Contents

Why to `pivot` your data

How to use the Pandas `pivot method`

When to use `pivot` vs `pivot_table` in Pandas

How to use the Pandas `pivot_table` method

Conclusion

Company

Product

Resources

Stay in touch

Pandas Pivot: A Guide with Examples

Table of Contents

Why to pivot your data

How to use the Pandas pivot method

When to use pivot vs pivot_table in Pandas

How to use the Pandas pivot_table method

Conclusion

Company

Product

Resources

Stay in touch

Why to `pivot` your data

How to use the Pandas `pivot method`

When to use `pivot` vs `pivot_table` in Pandas

How to use the Pandas `pivot_table` method