Pandas Pivot: A Guide with Examples
Table of Contents
- Why to pivot your data
- How to use the Pandas pivot method
- When to use pivot vs pivot_table in Pandas
- How to use the Pandas pivot_table method
- Conclusion
Python’s Pandas library is one of the most popular tools in the data scientist’s toolbelt. Data scientists use Pandas to explore, clean, and understand datasets.
In a previous article, we saw how to work with the Pandas DataFrame, the core data structure in the Pandas library. If you’re new to Pandas, that post is a great way to get started.
In this post, we’ll learn how to reshape your DataFrames by using the Pivot method. Pivot allows you to twist your data into a different shape for easier analysis.
We’ll cover:
- Why to pivot your data;
- How to use the
pivot
method; - When to use
pivot
method vspivot_table
method; - How to use the
pivot_table
method.
Let’s get started.
Why to pivot
your data
Before we get into details how to pivot, it’s important to know why you want to pivot.
Pivoting your data allows you to reshape it in a way that makes it easier to understand or analyze. Often you’ll use a pivot to demonstrate the relationship between two columns that can be difficult to reason about before the pivot.
If you’re an Excel wizard who has spent a lot of time in spreadsheets, the idea of a pivot may be easy for you. If not, it can be hard to understand without an example. Let’s see how it works.
In the example below, I’ll create a Pandas DataFrame from some stock trading data that I’ve used in the previous Pandas articles.
>>> import pandas as pd
>>> stocks = pd.read_csv('https://gist.githubusercontent.com/alexdebrie/b3f40efc3dd7664df5a20f5eee85e854/raw/ee3e6feccba2464cbbc2e185fb17961c53d2a7f5/stocks.csv')
>>> stocks
date symbol open high low close volume
0 2019-03-01 AMZN 1655.13 1674.26 1651.00 1671.73 4974877
1 2019-03-04 AMZN 1685.00 1709.43 1674.36 1696.17 6167358
2 2019-03-05 AMZN 1702.95 1707.80 1689.01 1692.43 3681522
3 2019-03-06 AMZN 1695.97 1697.75 1668.28 1668.95 3996001
4 2019-03-07 AMZN 1667.37 1669.75 1620.51 1625.95 4957017
5 2019-03-01 AAPL 174.28 175.15 172.89 174.97 25886167
6 2019-03-04 AAPL 175.69 177.75 173.97 175.85 27436203
7 2019-03-05 AAPL 175.94 176.00 174.54 175.53 19737419
8 2019-03-06 AAPL 174.67 175.49 173.94 174.52 20810384
9 2019-03-07 AAPL 173.87 174.44 172.02 172.50 24796374
10 2019-03-01 GOOG 1124.90 1142.97 1124.75 1140.99 1450316
11 2019-03-04 GOOG 1146.99 1158.28 1130.69 1147.80 1446047
12 2019-03-05 GOOG 1150.06 1169.61 1146.19 1162.03 1443174
13 2019-03-06 GOOG 1162.49 1167.57 1155.49 1157.86 1099289
14 2019-03-07 GOOG 1155.72 1156.76 1134.91 1143.30 1166559
When we print out the code, you can see that the data has a number of columns and that the rows are organized by trading date and stock symbol.
That organization may be helpful for some analysis, but it can be hard to glean information about trading volume across dates and stock symbols. Let’s reshape our data to look closer at volume.
>>> stocks.pivot(index='symbol', columns='date', values='volume')
date 2019-03-01 2019-03-04 2019-03-05 2019-03-06 2019-03-07
symbol
AAPL 25886167 27436203 19737419 20810384 24796374
AMZN 4974877 6167358 3681522 3996001 4957017
GOOG 1450316 1446047 1443174 1099289 1166559
In the example above, I use the pivot method to reshape the data so that the rows are indexed by stock symbol and the columns are trading dates. The value in each cell is the volume on that day. This makes it easy to compare the volume for a stock over time, by reading horizontally, or to compare volume across stocks on a particular day, by reading vertically.
This reshaping power of pivot makes it much easier to understand relationships in your datasets.
How to use the Pandas pivot method
To use the pivot method in Pandas, you need to specify three parameters:
- Index: Which column should be used to identify and order your rows vertically
- Columns: Which column should be used to create the new columns in our reshaped DataFrame. Each unique value in the column stated here will create a column in our new DataFrame.
- Values: Which column(s) should be used to fill the values in the cells of our DataFrame.
Let’s walk through these in an example with our stock trading data. In the example below, I use pivot to examine the closing trading price for each stock symbol over our trading window.
>>> stocks
date symbol open high low close volume
0 2019-03-01 AMZN 1655.13 1674.26 1651.00 1671.73 4974877
1 2019-03-04 AMZN 1685.00 1709.43 1674.36 1696.17 6167358
2 2019-03-05 AMZN 1702.95 1707.80 1689.01 1692.43 3681522
3 2019-03-06 AMZN 1695.97 1697.75 1668.28 1668.95 3996001
4 2019-03-07 AMZN 1667.37 1669.75 1620.51 1625.95 4957017
5 2019-03-01 AAPL 174.28 175.15 172.89 174.97 25886167
6 2019-03-04 AAPL 175.69 177.75 173.97 175.85 27436203
7 2019-03-05 AAPL 175.94 176.00 174.54 175.53 19737419
8 2019-03-06 AAPL 174.67 175.49 173.94 174.52 20810384
9 2019-03-07 AAPL 173.87 174.44 172.02 172.50 24796374
10 2019-03-01 GOOG 1124.90 1142.97 1124.75 1140.99 1450316
11 2019-03-04 GOOG 1146.99 1158.28 1130.69 1147.80 1446047
12 2019-03-05 GOOG 1150.06 1169.61 1146.19 1162.03 1443174
13 2019-03-06 GOOG 1162.49 1167.57 1155.49 1157.86 1099289
14 2019-03-07 GOOG 1155.72 1156.76 1134.91 1143.30 1166559
>>> stocks.pivot(index='symbol', columns='date', values='close')
date 2019-03-01 2019-03-04 2019-03-05 2019-03-06 2019-03-07
symbol
AAPL 174.97 175.85 175.53 174.52 172.50
AMZN 1671.73 1696.17 1692.43 1668.95 1625.95
GOOG 1140.99 1147.80 1162.03 1157.86 1143.30
First, I printed out our DataFrame to see how it is organized by default. Then, I use the pivot method to change the data.
Notice that for the index parameter, I used symbol
. As a result, each unique value for the symbol column — AAPL, AMZN, GOOG — is used as the index, the leftmost column in our DataFrame.
I used date
for the column parameter. This resulted in five non-index columns across the top of our DataFrame, one for each unique value in the date
column of our initial DataFrame.
Finally, I used close
as the values parameter. This specifies which value should be placed in each column. I’m interested in the closing price for each stock across the trading days, so I use the close
column.
Note that the index
and column
parameters are interchangeable. If you want to reorganize so that the dates are used as the index and the stock symbols are my columns, you can just flip the parameters:
>>> stocks.pivot(index='date', columns='symbol', values='close')
symbol AAPL AMZN GOOG
date
2019-03-01 174.97 1671.73 1140.99
2019-03-04 175.85 1696.17 1147.80
2019-03-05 175.53 1692.43 1162.03
2019-03-06 174.52 1668.95 1157.86
2019-03-07 172.50 1625.95 1143.30
When to use pivot
vs pivot_table
in Pandas
So far we’ve only been using the term ‘pivot’ broadly, but there are actually two Pandas methods for pivoting. The first is the pivot
method, which we reviewed in this section. The second is the pivot_table
method, which we’ll learn about in the next section.
When deciding between using the pivot
or pivot_table
method, you need to ask yourself one question:
Will the results of my pivot have more than one entry in any index + column?
If the answer to this question is “yes”, you must use the pivot_table
method. If the answer to this question is “no”, you may use the pivot
method.
Note that any use of pivot can be switched to pivot_table
, but the reverse is not true. If you try to use the pivot method where there would be more than one entry in any index + column combination, it will throw a ValueError
.
In the next section, we’ll take a look at how the pivot_table
method works in practice.
How to use the Pandas pivot_table
method
For those familiar with Excel or other spreadsheet tools, the pivot table is more familiar as an aggregation tool. Pandas pivot tables are used to group similar columns to find totals, averages, or other aggregations.
You can accomplish this same functionality in Pandas with the pivot_table
method. For example, imagine we wanted to find the mean trading volume for each stock symbol in our DataFrame. You could do so with the following use of pivot_table
:
>>> stocks.pivot_table(index='symbol', values='volume')
volume
symbol
AAPL 23733309.4
AMZN 4755355.0
GOOG 1321077.0
Like with pivot, we specify the index we want to to use for our data as well as the column to use to fill in the values.
Notice that each stock symbol in our index will have five values for the volume column as there are five trading days for each stock. The pivot_table
method aggregates these values and applies an aggregate function to reduce it to a single value.
By default, the aggregate function is the mean function from NumPy, but you can pass in a custom aggregation function. If I want to combine my values into a total, I could use NumPy’s sum function:
>>> import numpy as np
>>> stocks.pivot_table(index='symbol', values='volume', aggfunc=np.sum)
volume
symbol
AAPL 118666547
AMZN 23776775
GOOG 6605385
The pivot table method is really powerful when using it with a MultiIndex, which allows you to have hierarchies in your index. For example, imagine you had a larger stock trading dataset that included training data over an entire year. You could use a MultiIndex to create a pivot table where values were grouped by stock symbol and month, allowing you to quickly explore how trading volume and other statistics changed on a month-over-month basis for particular stocks.
Conclusion
In this post, we learned about pivoting your DataFrames in Pandas with the pivot and pivot_table functions. We saw why you would want to pivot your data as well as walkthroughs of using both pivot
and pivot_table
.
Just like Pandas makes it easy to work with data, the Kite plugin for your IDE makes it easy to work with Python. Kite gives you an AI-powered autocomplete in the editor, which saves you keystrokes and helps you code faster on the fly. It also has inline documentation for popular libraries so you don’t have to break your flow. Download it today!