python | Study notes on DataFrame in Python pandas

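This article collects my personal notes from studying the DataFrame part of pandas.
Note: this summary is not complete. DataFrame supports many more operations, and this article only covers what I studied over the past few days.

DataFrame
A DataFrame is a tabular data structure. Its data is stored as one or more two-dimensional blocks, and higher-dimensional data can still be represented through hierarchical indexing. A DataFrame has both a row index and a column index. It holds an ordered collection of columns, each of which can have a different type (numeric, string, boolean), and it can be thought of as a dictionary of Series.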

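The DataFrame constructor accepts the following kinds of input data:

| Type | Description |
| --- | --- |
| 2D ndarray | A matrix of data; row and column labels can also be passed |
| Dict of arrays, lists, or tuples | Each sequence becomes a column of the DataFrame; all sequences must have the same length |
| NumPy structured/record array | Treated like the "dict of arrays" case |
| Dict of Series | Each Series becomes a column; if no index is passed explicitly, the indexes of the Series are unioned (values are aligned by label) to form the row index of the result |
| Dict of dicts | Each inner dict becomes a column; the keys of the inner dicts are unioned to form the row index of the result |
| List of dicts or Series | Each item becomes a row of the DataFrame; the union of the dict keys or Series indexes becomes the column labels |
| List of lists or tuples | Treated like the "2D ndarray" case |
| Another DataFrame | The DataFrame's indexes are reused unless different ones are passed explicitly |
| NumPy MaskedArray | Like the "2D ndarray" case, except that masked values become NA/missing in the resulting DataFrame |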
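
A few rows of the table above can be tried out directly. The following is only a minimal sketch: the variable names and sample values are made up for illustration and do not come from the original notes.

```python
import pandas as pd

# Dict of Series: the Series indexes are unioned into the row index,
# and positions that a Series does not cover become NaN.
from_series = pd.DataFrame({
    'x': pd.Series([1, 2], index=['a', 'b']),
    'y': pd.Series([10, 20], index=['b', 'c']),
})

# Dict of dicts: outer keys become columns, inner keys become row labels.
from_dicts = pd.DataFrame({'x': {'a': 1, 'b': 2}, 'y': {'a': 3, 'b': 4}})

# List of dicts: each item becomes one row, and the union of the keys
# becomes the column labels.
from_rows = pd.DataFrame([{'x': 1, 'y': 2}, {'x': 3, 'z': 4}])

print(from_series)
print(from_dicts)
print(from_rows)
```

Printing the three frames side by side makes it easy to see which part of the input ends up as row labels and which as column labels.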
Creating a DataFrame

    import pandas as pd

    df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})
    print(df)
    # Output:
    #   name  pay
    # 0   aa  100
    # 1   bb  200
    # 2   cc  300

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=list('abcd'))
    print(df)
    # Output:
    #    a  b   c   d
    # a  0  1   2   3
    # b  4  5   6   7
    # c  8  9  10  11

    import pandas as pd

    df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})
    df['add'] = [1, 2, 3]  # add a new column
    print(df)
    # Output:
    #   name  pay  add
    # 0   aa  100    1
    # 1   bb  200    2
    # 2   cc  300    3

    # Data alignment: the extra entries in the Series assigned to add1 are dropped,
    # and the rows missing from the Series assigned to add2 are filled with NaN.
    import pandas as pd

    df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})
    value = pd.Series([1, 2, 3, 4, 5], index=[0, 1, 2, 3, 4])
    value1 = pd.Series([10, 12], index=[0, 1])
    df['add'] = [1, 2, 3]
    df['add1'] = value
    df['add2'] = value1
    print(df)
    # Output:
    #   name  pay  add  add1  add2
    # 0   aa  100    1     1  10.0
    # 1   bb  200    2     2  12.0
    # 2   cc  300    3     3   NaN
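
The alignment in the example above works on index labels, not on position. The sketch below illustrates this with a hypothetical `bonus` Series (the name and values are invented for this example) whose labels are deliberately out of order.

```python
import pandas as pd

df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})

# The Series lists its labels in reverse order and has one extra label (3).
bonus = pd.Series([30, 20, 10, 99], index=[2, 1, 0, 3])

# Assignment aligns on labels: row 0 receives 10, row 2 receives 30,
# and the value for label 3 is dropped because df has no such row.
df['bonus'] = bonus
print(df)
```

If positional assignment is what is wanted instead, passing a plain list (as with `df['add']` above) avoids the alignment step.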

DataFrame slicing and indexing

    import pandas as pd

    df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})
    value = pd.Series([1, 2, 3, 4, 5], index=[0, 1, 2, 3, 4])
    value1 = pd.Series([10, 12], index=[0, 1])
    df['add'] = [1, 2, 3]
    df['add1'] = value
    df['add2'] = value1
    print(df[1:3])  # a slice selects rows
    # Output:
    #   name  pay  add  add1  add2
    # 1   bb  200    2     2  12.0
    # 2   cc  300    3     3   NaN

    import pandas as pd

    df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})
    value = pd.Series([1, 2, 3, 4, 5], index=[0, 1, 2, 3, 4])
    value1 = pd.Series([10, 12], index=[0, 1])
    df['add'] = [1, 2, 3]
    df['add1'] = value
    df['add2'] = value1
    s = df['name']  # a single column label returns a Series
    print(s)
    # Output:
    # 0    aa
    # 1    bb
    # 2    cc
    # Name: name, dtype: object

    import pandas as pd

    df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})
    value = pd.Series([1, 2, 3, 4, 5], index=[0, 1, 2, 3, 4])
    value1 = pd.Series([10, 12], index=[0, 1])
    df['add'] = [1, 2, 3]
    df['add1'] = value
    df['add2'] = value1
    print(df[['name', 'add']])  # a list of column labels returns a DataFrame
    # Output:
    #   name  add
    # 0   aa    1
    # 1   bb    2
    # 2   cc    3
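
The three examples above use the same `[]` operator with different kinds of keys. As a rough summary sketch (the variable names `rows`, `column`, and `subset` are only illustrative), the key type decides what comes back:

```python
import pandas as pd

df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})

rows = df[1:3]         # slice -> the rows at positions 1 and 2
column = df['name']    # single label -> one column, returned as a Series
subset = df[['name']]  # list of labels -> a DataFrame, even for one column

print(type(rows).__name__, type(column).__name__, type(subset).__name__)
```

Keeping this distinction in mind helps when a later step expects a DataFrame rather than a Series.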

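    # Swap the values of the first and second columns
    import pandas as pd

    df = pd.DataFrame({'name': ['aa', 'bb', 'cc'], 'pay': [100, 200, 300]})
    value = pd.Series([1, 2, 3, 4, 5], index=[0, 1, 2, 3, 4])
    value1 = pd.Series([10, 12], index=[0, 1])
    df['add'] = [1, 2, 3]
    df['add1'] = value
    df['add2'] = value1
    # Plain column assignment is used here instead of df.loc[:, [...]] = ...:
    # it replaces both columns outright, so putting strings into the former
    # integer column does not trigger dtype-compatibility warnings or errors
    # in recent pandas versions.
    df[['pay', 'name']] = df[['name', 'pay']].values
    print(df[['name', 'pay']])
    # Output:
    #   name pay
    # 0  100  aa
    # 1  200  bb
    # 2  300  cc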

Accessing data with the loc attribute:

    # Basic example and its output.
    # Note: the values are generated randomly, so every example below prints different numbers.
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df)
    # Output:
    #           A         B         C         D
    # a -0.109559  0.029060  1.293133 -0.436300
    # b -0.419417 -1.467318 -0.748838  0.518618
    # c -1.762349  0.263882 -0.070989  1.153927
    # d  0.380648 -0.172981  0.737856 -0.039851
    # e -0.585820 -0.502932 -0.858317 -1.223934
    # f  1.058826 -0.275914  0.607841  0.407702
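
Because the values change on every run, the printed numbers cannot be compared between examples. One possible way to get repeatable output is to seed the random number generator. The sketch below assumes NumPy's `default_rng` interface; the helper name `make_frame` and the seed value are invented for illustration.

```python
import numpy as np
import pandas as pd

def make_frame(seed=0):
    """Build the 6x4 example frame from a seeded generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.standard_normal((6, 4)),
                        index=list('abcdef'), columns=list('ABCD'))

df1 = make_frame()
df2 = make_frame()

# Both frames come from the same seed, so they hold identical values.
print(df1.equals(df2))
```

The examples in these notes keep `np.random.randn` as written, so their outputs are still expected to differ from run to run.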

Selecting by label (loc):

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.loc[['a', 'b', 'c'], :])
    # Output:
    #           A         B         C         D
    # a  1.001830  0.216401  1.625705 -0.169794
    # b  0.023062 -0.994016 -0.309449  1.272813
    # c  1.004900  0.825738  0.767043 -0.107114

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.loc['d':, 'A':'C'])
    # Output:
    #           A         B         C
    # d -0.512204 -1.049916  1.340404
    # e  1.151300  0.826390  0.630450
    # f  0.950106  0.068678 -0.254349

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.loc['a'])
    # Output:
    # A   -0.677992
    # B    0.040744
    # C    0.833783
    # D   -0.138422
    # Name: a, dtype: float64

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.loc['a'] > 0)
    # Output:
    # A    False
    # B     True
    # C    False
    # D    False
    # Name: a, dtype: bool
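
A boolean Series like the one printed above can itself be passed back to `loc` as a selector. The following sketch uses fixed values instead of random ones so that the result is predictable; the frame contents and the name `mask` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1.0, -2.0, 3.0, -4.0],
                   [5.0, 6.0, 7.0, 8.0]],
                  index=['a', 'b'], columns=list('ABCD'))

# True for every column whose value in row 'a' is positive.
mask = df.loc['a'] > 0

# Use the mask on the column axis: keeps only columns A and C.
positive_cols = df.loc[:, mask]
print(positive_cols)
```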

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.loc['a', 'A'])
    # Output:
    # -0.5031521904958901

Selecting by position (iloc):

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.iloc[:3])
    # Output:
    #           A         B         C         D
    # a  0.143855 -1.240150 -0.426173  0.145956
    # b -1.681027  1.327813 -0.060675 -0.722033
    # c -0.318719  1.037630  0.229185 -0.328399

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.iloc[1:4, 2:4])
    # Output:
    #           C         D
    # b -0.799387 -0.995885
    # c  0.912693 -0.492005
    # d -0.358651  0.963182

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.iloc[[1, 3, 5], [1, 3]])
    # Output:
    #           B         D
    # b -1.028732  0.386360
    # d  1.507547  0.108043
    # f -0.677190  2.200727

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
    print(df.iloc[1:3, :])
    print(df.iloc[2, 1])
    # Output:
    #           A         B         C         D
    # b -0.159791 -0.105850  0.761435  0.778722
    # c -1.294700 -0.956018  0.038091 -1.121679
    # -0.9560181811847951
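
One difference between the two accessors is easy to miss in the outputs above: label slices with `loc` include both endpoints, while position slices with `iloc` exclude the stop position, as ordinary Python slices do. The sketch below uses a frame of consecutive integers (chosen here only for illustration) to make this visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  index=list('abcdef'), columns=list('ABCD'))

# Label slicing: 'b' through 'd' and 'A' through 'C', endpoints included.
by_label = df.loc['b':'d', 'A':'C']

# Position slicing: rows 1 and 2, columns 0 and 1; the stop is excluded.
by_position = df.iloc[1:3, 0:2]

print(by_label.shape, by_position.shape)
```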

Beyond selection, a number of other operations can be performed on a DataFrame.
