樞紐分析表¶
- .pivot(): 選擇index及column以及表格內欲填入值的欄位,做快速的交叉比對。
- .pivot_table(): 針對指定的index跟column做交叉比對,可以針對內容應用聚合函式做運算。
- .crosstab(): 針對指定index及column做交叉比對,主要是計算頻率或次數,也可以應用聚合函式做運算。
.pivot()做交叉比對時,欄列的值不能有重複,不然會出錯。.pivot_table()對於重複的值,可以使用指定函式做聚合,如果沒有指定函式預設會使用numpy.mean。crosstab()對於重複值也可以使用指定函式做聚合,如果沒有指定函式,則預設會計算次數。
In [1]:
Copied!
import numpy as np
import pandas as pd
import numpy as np
import pandas as pd
pivot()¶
In [2]:
Copied!
np.random.seed(987)
rows = np.random.choice(['row1', 'row2', 'row3', 'row4', 'row5'], 10)
cols = np.random.choice(['col1', 'col2', 'col3', 'col4', 'col5'], 10)
values = np.random.randint(1, 11, 10)
np.random.seed(987)
rows = np.random.choice(['row1', 'row2', 'row3', 'row4', 'row5'], 10)
cols = np.random.choice(['col1', 'col2', 'col3', 'col4', 'col5'], 10)
values = np.random.randint(1, 11, 10)
In [3]:
Copied!
df = pd.DataFrame({'rows':rows, 'cols':cols, 'values':values})
df = pd.DataFrame({'rows':rows, 'cols':cols, 'values':values})
In [4]:
Copied!
df
df
Out[4]:
rows | cols | values | |
---|---|---|---|
0 | row4 | col4 | 9 |
1 | row2 | col2 | 7 |
2 | row4 | col4 | 10 |
3 | row3 | col3 | 4 |
4 | row3 | col1 | 3 |
5 | row4 | col3 | 8 |
6 | row3 | col2 | 3 |
7 | row5 | col2 | 3 |
8 | row3 | col3 | 9 |
9 | row5 | col2 | 4 |
In [5]:
Copied!
df.pivot(index='rows', columns='cols', values='values')
df.pivot(index='rows', columns='cols', values='values')
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-5-1a39f561f155> in <cell line: 1>() ----> 1 df.pivot(index='rows', columns='cols', values='values') /usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs) 329 stacklevel=find_stack_level(), 330 ) --> 331 return func(*args, **kwargs) 332 333 # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no /usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in pivot(self, index, columns, values) 8565 from pandas.core.reshape.pivot import pivot 8566 -> 8567 return pivot(self, index=index, columns=columns, values=values) 8568 8569 _shared_docs[ /usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs) 329 stacklevel=find_stack_level(), 330 ) --> 331 return func(*args, **kwargs) 332 333 # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no /usr/local/lib/python3.10/dist-packages/pandas/core/reshape/pivot.py in pivot(data, index, columns, values) 538 # [List[Any], ExtensionArray, ndarray[Any, Any], Index, Series]"; expected 539 # "Hashable" --> 540 return indexed.unstack(columns_listlike) # type: ignore[arg-type] 541 542 /usr/local/lib/python3.10/dist-packages/pandas/core/series.py in unstack(self, level, fill_value) 4453 from pandas.core.reshape.reshape import unstack 4454 -> 4455 return unstack(self, level, fill_value) 4456 4457 # ---------------------------------------------------------------------- /usr/local/lib/python3.10/dist-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value) 487 if is_1d_only_ea_dtype(obj.dtype): 488 return _unstack_extension_series(obj, level, fill_value) --> 489 unstacker = _Unstacker( 490 obj.index, level=level, constructor=obj._constructor_expanddim 491 ) /usr/local/lib/python3.10/dist-packages/pandas/core/reshape/reshape.py in __init__(self, index, level, constructor) 135 ) 136 --> 137 self._make_selectors() 138 139 @cache_readonly /usr/local/lib/python3.10/dist-packages/pandas/core/reshape/reshape.py in _make_selectors(self) 187 188 if mask.sum() < len(self.index): --> 189 raise ValueError("Index contains duplicate entries, cannot reshape") 190 191 self.group_index = comp_index ValueError: Index contains duplicate entries, cannot reshape
In [7]:
Copied!
df.drop_duplicates(subset=['rows']).pivot(index='rows', columns='cols', values='values')
df.drop_duplicates(subset=['rows']).pivot(index='rows', columns='cols', values='values')
Out[7]:
cols | col2 | col3 | col4 |
---|---|---|---|
rows | |||
row2 | 7.0 | NaN | NaN |
row3 | NaN | 4.0 | NaN |
row4 | NaN | NaN | 9.0 |
row5 | 3.0 | NaN | NaN |
pivot_table()¶
In [8]:
Copied!
pd.pivot_table(df, index='rows', columns='cols', values='values')
pd.pivot_table(df, index='rows', columns='cols', values='values')
Out[8]:
cols | col1 | col2 | col3 | col4 |
---|---|---|---|---|
rows | ||||
row2 | NaN | 7.0 | NaN | NaN |
row3 | 3.0 | 3.0 | 6.5 | NaN |
row4 | NaN | NaN | 8.0 | 9.5 |
row5 | NaN | 3.5 | NaN | NaN |
In [9]:
Copied!
pd.pivot_table(df, index='rows', columns='cols', values='values', aggfunc=sum)
pd.pivot_table(df, index='rows', columns='cols', values='values', aggfunc=sum)
Out[9]:
cols | col1 | col2 | col3 | col4 |
---|---|---|---|---|
rows | ||||
row2 | NaN | 7.0 | NaN | NaN |
row3 | 3.0 | 3.0 | 13.0 | NaN |
row4 | NaN | NaN | 8.0 | 19.0 |
row5 | NaN | 7.0 | NaN | NaN |
crosstab()¶
In [10]:
Copied!
pd.crosstab(df['rows'], df['cols'])
pd.crosstab(df['rows'], df['cols'])
Out[10]:
cols | col1 | col2 | col3 | col4 |
---|---|---|---|---|
rows | ||||
row2 | 0 | 1 | 0 | 0 |
row3 | 1 | 1 | 2 | 0 |
row4 | 0 | 0 | 1 | 2 |
row5 | 0 | 2 | 0 | 0 |
In [11]:
Copied!
pd.crosstab(df['rows'], df['cols'], values=df['values'], aggfunc=sum)
pd.crosstab(df['rows'], df['cols'], values=df['values'], aggfunc=sum)
Out[11]:
cols | col1 | col2 | col3 | col4 |
---|---|---|---|---|
rows | ||||
row2 | NaN | 7.0 | NaN | NaN |
row3 | 3.0 | 3.0 | 13.0 | NaN |
row4 | NaN | NaN | 8.0 | 19.0 |
row5 | NaN | 7.0 | NaN | NaN |