The Road to Deep Learning 3: A Pandas Usage Summary

What Is Pandas

Pandas is a Python library that provides better data structures and a large collection of data analysis tools. Its advantages include:

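  • A fast and efficient DataFrame object for data manipulation with integrated indexing;
  • Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
  • Intelligent data alignment and integrated handling of missing data: automatic label-based alignment in computations, and easy reshaping of messy data into an orderly form;
  • Flexible reshaping and pivoting of data sets;
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
  • Columns can be inserted into and deleted from data structures, so their size is mutable;
  • A powerful group-by engine for aggregating or transforming data, allowing split-apply-combine operations on data sets;
  • High-performance merging and joining of data sets;
  • Hierarchical axis indexing, an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
  • Time series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, and even creating domain-specific time offsets and joining time series without losing data;
  • Highly optimized for performance, with critical code paths written in Cython or C;
  • Python with Pandas is used in a wide variety of academic and commercial domains, including finance, neuroscience, economics, statistics, advertising, web analytics, and more.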
-- [Pandas Chinese documentation](https://www.pypandas.cn/intro/home.html)
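
Two of the features listed above, split-apply-combine and joining data sets, can be sketched in a few lines. This is only a minimal illustration: the `sales` and `regions` tables and all of their values are invented for the example and do not come from the documentation quoted above.

```python
import pandas as pd

# Hypothetical sales records, invented for this illustration.
sales = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Beijing", "Shanghai"],
    "amount": [10, 20, 30, 40],
})

# Split the rows by city, apply sum to each group, combine into one Series.
totals = sales.groupby("city")["amount"].sum()

# Join a second (also hypothetical) table on the shared "city" column.
regions = pd.DataFrame({"city": ["Beijing", "Shanghai"],
                        "region": ["North", "East"]})
merged = sales.merge(regions, on="city", how="left")

print(totals)
print(merged)
```

A left merge keeps every row of `sales` and attaches the matching `region`, which is usually what you want when enriching a fact table with lookup data.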

Pandas Series

How to Create One

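A Series is the one-dimensional data structure in Pandas, similar to a Python list or a NumPy ndarray. The differences are that a Series is always one-dimensional, can store data of different types, and carries a set of index labels, one for each element.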

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
s1=pd.Series([1,3,4,5,6,np.nan,345])
s1
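
As a rough sketch of what "different types" and "index" mean in practice, the snippet below inspects the dtype and index of two Series. The `mixed` Series is a made-up example; `s1` repeats the definition above.

```python
import numpy as np
import pandas as pd

# A list of mixed Python types is stored with the generic "object" dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)

# An integer list that contains NaN is upcast to float64, as with s1 above.
s1 = pd.Series([1, 3, 4, 5, 6, np.nan, 345])
print(s1.dtype)

# The index is a separate object that pairs one label with each value.
print(list(s1.index))
```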

You can also define your own index labels:

s2=pd.Series([1,2,8,4],index=['a','b','c','d'])

The advantage of this data structure is that you can select data by label:

s2['a'] # returns the same value as the line below
s2.iloc[0]
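
The difference between label-based and position-based selection can be sketched as follows, using the explicit `.loc` and `.iloc` accessors. The values of `s2` are the ones used in the appendix.

```python
import pandas as pd

s2 = pd.Series([1, 2, 8, 4], index=["a", "b", "c", "d"])

by_label = s2.loc["a"]        # explicit label-based lookup, same as s2["a"]
by_position = s2.iloc[0]      # explicit position-based lookup

# Label slices include both endpoints; position slices exclude the stop.
label_slice = s2.loc["a":"c"]     # labels a, b, c
position_slice = s2.iloc[0:2]     # positions 0 and 1

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index labels are themselves integers.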

Common Methods

Below are some commonly used methods (see the code output in the appendix for concrete usage):

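| Function | Description | Example | Notes |
|---|---|---|---|
| Series(array, index=array) | Create a Pandas Series | s1=pd.Series([1,3,4,5,6,np.nan,345]) | index is an optional collection of custom labels |
| iloc[position] | Get a value by position | s1.ilo | |
| size | Number of elements | s1.size | |
| head(num) | Return the first num elements | s1.head(2) | |
| describe() | Return summary statistics of the Series | s1.describe() | Includes the maximum, minimum, standard deviation, mean, and so on |
| sort_values(ascending, na_position) | Sort by value | s1.sort_values(ascending=False, na_position="first") | ascending=False sorts in descending order; na_position="first" puts NaN values at the top |
| plot | Plot directly | s2.plot.bar() | bar: bar chart; pie: pie chart; area: area chart; density: density curve |
| to_dict() | Convert the Series to a Python dict | sdic=s1.to_dict() | |
| drop_duplicates() | Remove duplicate values from the Series | s1.drop_duplicates() | |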
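
Several of these methods can be tried together on a small Series. The values below are made up so that the Series contains both a repeated value and a missing one; they are not the `s1` from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one repeat (3) and one gap (NaN).
s = pd.Series([3, 1, 3, np.nan, 2])

print(s.size)        # counts every element, NaN included
print(s.head(2))     # the first two elements

# Descending order, with the missing value placed first.
ordered = s.sort_values(ascending=False, na_position="first")

# Keeps the first occurrence of each value and drops later repeats.
unique = s.drop_duplicates()

# A dict keyed by the index labels of the remaining elements.
as_dict = unique.to_dict()

print(ordered)
print(unique)
print(as_dict)
```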

Pandas DataFrame

How to Create One

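We can create one directly with the DataFrame constructor (the index labels below are the school subjects Chinese, Math, English, PE, and Politics):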

s3=pd.DataFrame({"studens_1":[67,45,89,98,99],"studens_2":[67,75,79,98,100]},index=["语文","数学","英语","体育","政治"])

Compared with a Series, a DataFrame has columns in addition to index and values, and most of the Series methods above also apply to a DataFrame.
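
A short sketch of those three attributes, and of reusing Series methods on a DataFrame, might look like this. It rebuilds the same `s3` as above; the choice of methods shown is only illustrative.

```python
import pandas as pd

s3 = pd.DataFrame(
    {"studens_1": [67, 45, 89, 98, 99], "studens_2": [67, 75, 79, 98, 100]},
    index=["语文", "数学", "英语", "体育", "政治"],
)

print(list(s3.columns))      # column labels
print(list(s3.index))        # row labels
print(s3.values.shape)       # shape of the underlying 2-D array

# Each column is itself a Series, so Series methods work on it directly.
column = s3["studens_1"]
top_two = column.sort_values(ascending=False).head(2)
print(top_two)

# Many of the same methods also exist on the DataFrame as a whole.
print(s3.describe())
```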

The assign Operation and lambda

To operate on the data inside a DataFrame, you can use assign:

s3.assign(studens_1= lambda x:x['studens_1']*2,studens_2=lambda x:x['studens_2']+2)
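
One detail worth seeing in code is that assign returns a new DataFrame and leaves the original untouched. The sketch below uses a shortened version of `s3`; the column name `total` is a made-up name for this example.

```python
import pandas as pd

s3 = pd.DataFrame(
    {"studens_1": [67, 45, 89], "studens_2": [67, 75, 79]},
    index=["语文", "数学", "英语"],
)

# assign returns a new DataFrame; s3 itself is left unchanged.
doubled = s3.assign(studens_1=lambda x: x["studens_1"] * 2)

# A keyword that is not an existing column adds a new column
# ("total" is a hypothetical name chosen for this illustration).
with_total = s3.assign(total=lambda x: x["studens_1"] + x["studens_2"])

print(doubled)
print(with_total)
print(s3)
```

To keep the result, bind it to a name (possibly the same one, as in `s3 = s3.assign(...)`).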

Appendix

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
s1=pd.Series([1,3,4,5,6,np.nan,345])
s1
0      1.0
1      3.0
2      4.0
3      5.0
4      6.0
5      NaN
6    345.0
dtype: float64
# a Series with custom string labels
s2=pd.Series([1,2,8,4],index=['a','b','c','d'])
s2
a    1
b    2
c    8
d    4
dtype: int64
# look up the value stored under index label 3
s1[3]
5.0
# first element of s2 by position
s2.iloc[0]
1
# number of elements, NaN included
s1.size
7
# the first two elements
s2.head(2)
a    1
b    2
dtype: int64
# summary statistics rounded to two decimals
np.round(s2.describe(),2)
count    4.00
mean     3.75
std      3.10
min      1.00
25%      1.75
50%      3.00
75%      5.00
max      8.00
dtype: float64
# sort in descending order with NaN placed first
s1.sort_values(ascending=False,na_position ="first")
5      NaN
6    345.0
4      6.0
3      5.0
2      4.0
1      3.0
0      1.0
dtype: float64
# draw s2 as a pie chart
s2.plot.pie()
<matplotlib.axes._subplots.AxesSubplot at 0x26cee511f98>

[Figure: pie chart of s2]

# convert the Series to a Python dict keyed by label
s2.to_dict()
{'a': 1, 'b': 2, 'c': 8, 'd': 4}
# drop repeated values (s1 has none, so the result is unchanged)
s1.drop_duplicates()
0      1.0
1      3.0
2      4.0
3      5.0
4      6.0
5      NaN
6    345.0
dtype: float64
# scores of two students, indexed by subject (Chinese, Math, English, PE, Politics)
s3=pd.DataFrame({"studens_1":[67,45,89,98,99],"studens_2":[67,75,79,98,100]},index=["语文","数学","英语","体育","政治"])
s3

| | studens_1 | studens_2 |
|---|---|---|
| 语文 | 67 | 67 |
| 数学 | 45 | 75 |
| 英语 | 89 | 79 |
| 体育 | 98 | 98 |
| 政治 | 99 | 100 |

# first row, then second column, both by position
s3.iloc[0].iloc[1]
67
# double studens_1 and add 2 to studens_2; the result is a new DataFrame
s3.assign(studens_1= lambda x:x['studens_1']*2,studens_2=lambda x:x['studens_2']+2)

| | studens_1 | studens_2 |
|---|---|---|
| 语文 | 134 | 69 |
| 数学 | 90 | 77 |
| 英语 | 178 | 81 |
| 体育 | 196 | 100 |
| 政治 | 198 | 102 |

# load the training data (Titanic passenger records) from a local CSV file
data=pd.read_csv("train.csv")
data

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
| 10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S |
| 11 | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S |
| 12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S |
| 13 | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.2750 | NaN | S |
| 14 | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 | 0 | 0 | 350406 | 7.8542 | NaN | S |
| 15 | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NaN | S |
| 16 | 17 | 0 | 3 | Rice, Master. Eugene | male | 2.0 | 4 | 1 | 382652 | 29.1250 | NaN | Q |
| 17 | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | NaN | 0 | 0 | 244373 | 13.0000 | NaN | S |
| 18 | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 | 1 | 0 | 345763 | 18.0000 | NaN | S |
| 19 | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | NaN | 0 | 0 | 2649 | 7.2250 | NaN | C |
| 20 | 21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35.0 | 0 | 0 | 239865 | 26.0000 | NaN | S |
| 21 | 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34.0 | 0 | 0 | 248698 | 13.0000 | D56 | S |
| 22 | 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15.0 | 0 | 0 | 330923 | 8.0292 | NaN | Q |
| 23 | 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28.0 | 0 | 0 | 113788 | 35.5000 | A6 | S |
| 24 | 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 25 | 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | female | 38.0 | 1 | 5 | 347077 | 31.3875 | NaN | S |
| 26 | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | NaN | 0 | 0 | 2631 | 7.2250 | NaN | C |
| 27 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S |
| 28 | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | NaN | 0 | 0 | 330959 | 7.8792 | NaN | Q |
| 29 | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 861 | 862 | 0 | 2 | Giles, Mr. Frederick Edward | male | 21.0 | 1 | 0 | 28134 | 11.5000 | NaN | S |
| 862 | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48.0 | 0 | 0 | 17466 | 25.9292 | D17 | S |
| 863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S |
| 865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S |
| 866 | 867 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27.0 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NaN | C |
| 867 | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31.0 | 0 | 0 | PC 17590 | 50.4958 | A24 | S |
| 868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
| 869 | 870 | 1 | 3 | Johnson, Master. Harold Theodor | male | 4.0 | 1 | 1 | 347742 | 11.1333 | NaN | S |
| 870 | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.0 | 0 | 0 | 349248 | 7.8958 | NaN | S |
| 871 | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47.0 | 1 | 1 | 11751 | 52.5542 | D35 | S |
| 872 | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33.0 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S |
| 873 | 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47.0 | 0 | 0 | 345765 | 9.0000 | NaN | S |
| 874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 875 | 876 | 1 | 3 | Najib, Miss. Adele Kiamie "Jane" | female | 15.0 | 0 | 0 | 2667 | 7.2250 | NaN | C |
| 876 | 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20.0 | 0 | 0 | 7534 | 9.8458 | NaN | S |
| 877 | 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19.0 | 0 | 0 | 349212 | 7.8958 | NaN | S |
| 878 | 879 | 0 | 3 | Laleff, Mr. Kristo | male | NaN | 0 | 0 | 349217 | 7.8958 | NaN | S |
| 879 | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56.0 | 0 | 1 | 11767 | 83.1583 | C50 | C |
| 880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NaN | S |
| 881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NaN | S |
| 882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NaN | S |
| 883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S |
| 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns


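# clean the data: drop or fill the missing values
data=data.drop('Cabin',axis=1)      # Cabin is mostly empty, so drop the column
data['Age']=data['Age'].fillna(20)  # fill missing ages with a constant
# fill missing embarkation ports with the most frequent port
data['Embarked']=data['Embarked'].fillna(data['Embarked'].mode()[0])
data.isnull().sum()                 # count the remaining missing values per column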
PassengerId    0
Survived       0
Pclass         0
Name           0
Sex            0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Embarked       0
dtype: int64
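
The same kind of clean-up can be tried without the CSV file. The sketch below uses a tiny made-up frame standing in for train.csv (its values are not real passenger data), and filling `Embarked` with the most frequent value is just one possible choice assumed here for illustration.

```python
import numpy as np
import pandas as pd

# A tiny hypothetical frame standing in for train.csv.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Cabin": [np.nan, "C85", np.nan],
    "Embarked": ["S", None, "S"],
})

before = df.isnull().sum()            # missing values per column

df = df.drop("Cabin", axis=1)         # drop a column that is mostly empty
df["Age"] = df["Age"].fillna(20)      # replace missing ages with a constant
# Assumption for this sketch: fill gaps with the most frequent value.
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

after = df.isnull().sum()
print(before)
print(after)
```

Comparing `before` and `after` is a quick way to confirm that every gap was handled.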
# rows labelled 2 through 5 (inclusive), Age column only
data.loc[2:5,["Age"]]

| | Age |
|---|---|
| 2 | 26.0 |
| 3 | 35.0 |
| 4 | 35.0 |
| 5 | 20.0 |

# select the rows of passengers who survived
data.loc[data["Survived"]==1]
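
Row selection with a boolean condition can be sketched on a small frame. The rows below are made up for illustration and are not taken from train.csv.

```python
import pandas as pd

# Hypothetical rows for illustration only.
df = pd.DataFrame({"Survived": [0, 1, 1, 0], "Pclass": [3, 1, 3, 2]})

mask = df["Survived"] == 1            # boolean Series, one flag per row
survivors = df.loc[mask]              # keep only the rows flagged True

# Conditions combine with & and |, each wrapped in parentheses.
first_class_survivors = df.loc[(df["Survived"] == 1) & (df["Pclass"] == 1)]

print(survivors)
print(first_class_survivors)
```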
# Pearson correlation between survival and passenger class
data['Survived'].corr(data["Pclass"])
-0.33848103596101475
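
By default `Series.corr` computes the Pearson correlation coefficient. As a rough check of what that means, the sketch below recomputes it from the definition on made-up numbers (not values from train.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical numbers chosen so that y falls as x rises.
x = pd.Series([1.0, 2.0, 3.0, 4.0])
y = pd.Series([8.0, 6.0, 5.0, 1.0])

r_pandas = x.corr(y)                  # Pearson correlation by default

# The same quantity from its definition: the sum of products of the
# deviations, divided by the square root of the product of the sums
# of squared deviations.
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_pandas, r_manual)
```

A negative value, as in the output above, means that one variable tends to decrease as the other increases.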
# a DataFrame indexed by four consecutive dates
pd.DataFrame([1,2,3,4],index=pd.date_range("20190206",periods=4))

| | 0 |
|---|---|
| 2019-02-06 | 1 |
| 2019-02-07 | 2 |
| 2019-02-08 | 3 |
| 2019-02-09 | 4 |
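
A date index like this one is what enables the time series features mentioned in the introduction. The following is a minimal sketch on the same four dates; the window size and the two-day frequency are arbitrary choices made for the example.

```python
import pandas as pd

idx = pd.date_range("20190206", periods=4)       # daily frequency by default
ts = pd.Series([1, 2, 3, 4], index=idx)

on_day = ts.loc[pd.Timestamp("2019-02-07")]      # look up a value by date
rolling_mean = ts.rolling(window=2).mean()       # moving-window statistic
lagged = ts.shift(1)                             # previous day's value
every_two_days = ts.resample("2D").sum()         # frequency conversion

print(on_day)
print(rolling_mean)
print(lagged)
print(every_two_days)
```

The first entries of `rolling_mean` and `lagged` are NaN because there is no earlier observation to draw on.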



Copyright © 2016 - 2020 Life-long Learning All Rights Reserved.
