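1. Viewing a Subset of the Data

We can use the head() and tail() methods to view a small portion of a Series or DataFrame object. By default they show 5 elements: head() shows the first 5 and tail() shows the last 5. The number of elements to show can also be specified. When a Series or DataFrame holds a lot of data, head() and tail() are a convenient way to inspect its structure.

1.1 Series Objects

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))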
print(my_series.head())

Because the number of elements was not specified explicitly, the code above outputs the first 5 elements of the Series by default.

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))
print(my_series.head(3))

Because the number of elements was set to 3, the code above outputs the first 3 elements of the Series.

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))
print(my_series.tail())

Because the number of elements was not specified explicitly, the code above outputs the last 5 elements of the Series by default.

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))
print(my_series.tail(3))

Because the number of elements was set to 3, the code above outputs the last 3 elements of the Series.

1.2 DataFrame Objects

import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145, 144], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.head())

Because the number of rows was not specified explicitly, the code above outputs the first 5 rows of the DataFrame by default.

import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145, 144], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.head(3))

Because the number of rows was set to 3, the code above outputs the first 3 rows of the DataFrame.

import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145, 144], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.tail())

Because the number of rows was not specified explicitly, the code above outputs the last 5 rows of the DataFrame by default.

import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145, 144], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.tail(3))

Because the number of rows was set to 3, the code above outputs the last 3 rows of the DataFrame.
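Two edge cases of head() and tail() are not covered above and are worth a quick check: an n larger than the object, and a negative n. The following is a minimal sketch on a small made-up Series (the values are illustrative only, not the author's data).

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70])

# An n larger than the length simply returns everything.
all_rows = s.head(100)

# A negative n means "all but": head(-2) drops the last 2 elements,
# tail(-2) drops the first 2 elements.
without_last_two = s.head(-2)
without_first_two = s.tail(-2)

print(len(all_rows))               # 7
print(without_last_two.tolist())   # [10, 20, 30, 40, 50]
print(without_first_two.tolist())  # [30, 40, 50, 60, 70]
```

The same rules apply to the rows of a DataFrame.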

2. Viewing Attributes and Data

2.1 Attributes

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))
print(my_series.index)
print(my_series.shape)

The code above shows the index and shape of the Series object my_series.

import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145, 144], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.index)
print(df.columns)
print(df.shape)

The code above shows the row index, column index, and shape of the DataFrame object df.
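To make the shape attribute concrete, here is a small sketch on toy objects (the names and values are made up for illustration). It shows that a Series has a one-element shape tuple, while a DataFrame reports rows first and columns second.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5])
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

print(s.shape)           # (3,): a Series is one-dimensional
print(df.shape)          # (3, 2): (number of rows, number of columns)
print(list(s.index))     # [0, 1, 2], the default integer index
print(list(df.columns))  # ['x', 'y']
```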

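2.2 Data

In the past, pandas recommended using Series.values or DataFrame.values to extract the data from a Series or DataFrame. The current recommendation is to avoid .values and use .to_numpy() instead, because .values has some drawbacks: depending on the dtype, it may return either a NumPy ndarray or a pandas extension array, whereas .to_numpy() always returns an ndarray.

2.2.1 Series Objects

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))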
print(my_series.to_numpy())
print(type(my_series.to_numpy()))

The code above retrieves the data of the Series object; its type is ndarray.

2.2.2 DataFrame Objects

import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, 142, 144, 143, 145, 144], index = ['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.to_numpy())
print(type(df.to_numpy()))

The code above retrieves the data of the DataFrame object; its type is ndarray.
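The difference between .values and .to_numpy() only becomes visible with a pandas extension dtype. The sketch below assumes the nullable "Int64" dtype as an example (it does not appear in the text above) and also shows the dtype and na_value parameters of to_numpy().

```python
import numpy as np
import pandas as pd

# A Series with pandas' nullable integer dtype (an extension dtype).
s = pd.Series([1, 2, None], dtype="Int64")

values_result = s.values     # a pandas extension array, not an ndarray
numpy_result = s.to_numpy()  # always a NumPy ndarray

print(type(values_result).__name__)
print(type(numpy_result).__name__)

# to_numpy() can also convert the dtype and fill missing values on the way.
as_float = s.to_numpy(dtype="float64", na_value=np.nan)
print(as_float)
```

For the plain numeric data used in this article, both approaches return the same ndarray; the distinction matters once extension dtypes are involved.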

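3. Viewing Statistics

pandas provides many methods for computing descriptive statistics on Series and DataFrame objects. Most of them are aggregations that reduce the data to a single value, such as sum() and mean(); others, such as cumsum() and cumprod(), return an object of the same size. For a Series, these methods do not need an axis argument. For a DataFrame, they accept an axis argument, given either as a name or as a number, which defaults to 0. When it is "index" or 0, the statistic is computed down each column; when it is "columns" or 1, it is computed across each row.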
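The axis convention is easy to mix up, so here is a minimal sketch on a 2x2 DataFrame whose sums can be checked by hand (the frame is made up for illustration).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# axis=0 / "index": collapse the rows, giving one result per column.
col_sums = df.sum(axis=0)
# axis=1 / "columns": collapse the columns, giving one result per row.
row_sums = df.sum(axis="columns")

print(col_sums.to_dict())  # {'a': 3, 'b': 7}
print(row_sums.to_dict())  # {0: 4, 1: 6}
```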

3.1 Series Objects

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))
print(my_series.sum())
print(my_series.mean())
print(my_series.cumsum())
print(my_series.cumprod())

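sum() computes the sum of all elements and mean() computes their average, in both cases excluding np.nan. cumsum() and cumprod() compute the cumulative sum and cumulative product; they skip np.nan when accumulating, but the result still contains NaN at the position of the missing value. When called without arguments, all of these functions ignore np.nan by default. If you do not want np.nan to be ignored, set the skipna parameter. For example:

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))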
print(my_series.sum(skipna=False))
print(my_series.mean(skipna=False))
print(my_series.cumsum(skipna=False))
print(my_series.cumprod(skipna=False))

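In the code above, np.nan is not ignored during the computation, so sum() and mean() return NaN, and the cumulative results are NaN from the missing value onward.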
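The effect of skipna is easiest to see on a three-element Series. This is a small sketch with made-up values, chosen so the results can be verified by hand.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0])

total_skip = s.sum()               # NaN skipped: 1.0 + 2.0
total_keep = s.sum(skipna=False)   # NaN propagates to the result
cum_skip = s.cumsum()              # NaN stays in place, the sum continues
cum_keep = s.cumsum(skipna=False)  # everything after the NaN is NaN

print(total_skip)         # 3.0
print(total_keep)         # nan
print(cum_skip.tolist())  # [1.0, nan, 3.0]
print(cum_keep.tolist())  # [1.0, nan, nan]
```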

3.2 DataFrame Objects

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, np.nan, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
# Compute down each column, ignoring np.nan
print(df.sum(axis=0))
print(df.sum(axis='index'))
print(df.mean(axis=0))
print(df.mean(axis='index'))
print(df.cumsum(axis=0))
print(df.cumsum(axis='index'))
print(df.cumprod(axis=0))
print(df.cumprod(axis='index'))
# Compute across each row, ignoring np.nan
print(df.sum(axis=1))
print(df.sum(axis='columns'))
print(df.mean(axis=1))
print(df.mean(axis='columns'))
print(df.cumsum(axis=1))
print(df.cumsum(axis='columns'))
print(df.cumprod(axis=1))
print(df.cumprod(axis='columns'))

The code above applies sum(), mean(), cumsum(), and cumprod() by column and by row, ignoring np.nan in the computation. If you do not want np.nan to be ignored, set the skipna parameter. For example:

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, np.nan, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
# Compute down each column, without ignoring np.nan
print(df.sum(axis=0, skipna=False))
print(df.sum(axis='index', skipna=False))
print(df.mean(axis=0, skipna=False))
print(df.mean(axis='index', skipna=False))
print(df.cumsum(axis=0, skipna=False))
print(df.cumsum(axis='index', skipna=False))
print(df.cumprod(axis=0, skipna=False))
print(df.cumprod(axis='index', skipna=False))
# Compute across each row, without ignoring np.nan
print(df.sum(axis=1, skipna=False))
print(df.sum(axis='columns', skipna=False))
print(df.mean(axis=1, skipna=False))
print(df.mean(axis='columns', skipna=False))
print(df.cumsum(axis=1, skipna=False))
print(df.cumsum(axis='columns', skipna=False))
print(df.cumprod(axis=1, skipna=False))
print(df.cumprod(axis='columns', skipna=False))

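In addition to the functions above, there are several others, listed in the following table:

[Table image: How to view the data of Series and DataFrame objects]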
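The table itself is not reproduced here, so the following sketch only illustrates a few other descriptive-statistics methods that pandas provides. The selection (count, median, min, max, var, std, quantile, idxmax) and the sample values are assumptions chosen for illustration, not necessarily the entries of the original table.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])

n = s.count()            # number of non-missing values
med = s.median()         # middle value of 1, 2, 3, 4
lo, hi = s.min(), s.max()
variance = s.var()       # sample variance (ddof=1 by default)
std_dev = s.std()        # sample standard deviation
q1 = s.quantile(0.25)    # first quartile, linear interpolation
pos = s.idxmax()         # index label of the maximum

print(n, med, lo, hi)    # 4 2.5 1.0 4.0
print(round(variance, 4), round(std_dev, 4))  # 1.6667 1.291
print(q1, pos)           # 1.75 3
```

Like sum() and mean(), these methods skip missing values by default.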

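4. The describe() Function

pandas also has a convenient function, describe(), which produces several summary statistics at once and excludes np.nan when computing them. For example:

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9]))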
print(my_series.describe())

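In the output of the code above, count is the number of non-missing elements; mean is the average; std is the standard deviation; min is the minimum; 25%, 50%, and 75% are the first, second, and third quartiles; and max is the maximum. When describe() is applied to a DataFrame, it summarizes each column. For example: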

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, np.nan, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.describe())

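The code above outputs the statistics for each column of the DataFrame. When describe() is called without arguments, it outputs the first, second, and third quartiles by default. We can also set the percentiles parameter to choose which percentiles are output. For example: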

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, np.nan, 142, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-02', '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
print(df.describe(percentiles=[0.05, 0.25, 0.75, 0.95]))

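0.5 is not listed in percentiles because the median is included by default. The elements of the Series above are numeric. When the elements are non-numeric, describe() instead reports the number of distinct values and the most frequent value together with its frequency. For example:

import numpy as np
import pandas as pd
s = pd.Series(["a", "a", "b", "b", "a", "a", np.nan, "c", "d", "a"])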
print(s.describe())

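In the output of the code above, count is the number of non-missing elements; unique is the number of distinct elements; top is the most frequent element; and freq is the number of times it occurs. When a DataFrame has both numeric and non-numeric columns, describe() without arguments summarizes only the numeric columns. For example: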

import numpy as np
import pandas as pd
frame = pd.DataFrame({"a": ["Yes", "Yes", "No", "No"], "b": range(4)})
print(frame.describe())

The output of the code above covers only column b. We can also control which column types are summarized through a parameter. For example:

import numpy as np
import pandas as pd
frame = pd.DataFrame({"a": ["Yes", "Yes", "No", "No"], "b": range(4)})
print(frame.describe(include=['object']))

When include is ['object'], only the non-numeric (object) columns are summarized.

import numpy as np
import pandas as pd
frame = pd.DataFrame({"a": ["Yes", "Yes", "No", "No"], "b": range(4)})
print(frame.describe(include=['number']))

When include is ['number'], only the numeric columns are summarized.

import numpy as np
import pandas as pd
frame = pd.DataFrame({"a": ["Yes", "Yes", "No", "No"], "b": range(4)})
print(frame.describe(include='all'))

When include is 'all', every column is summarized: numeric columns follow the numeric rules, and non-numeric columns follow the non-numeric rules.
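As a cross-check of what the describe() entries mean, the sketch below compares them with the corresponding standalone methods, using the same numeric and string data as the examples in this section. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([4, -7, 6, -5, 3, 2, np.nan, 8, 1, -9])
summary = numbers.describe()

# Each entry matches the standalone method, with NaN excluded.
print(list(summary.index))
print(summary["count"] == numbers.count())  # True
print(summary["mean"] == numbers.mean())    # True
print(summary["50%"] == numbers.median())   # True

letters = pd.Series(["a", "a", "b", "b", "a", "a", np.nan, "c", "d", "a"])
text_summary = letters.describe()

# 9 non-missing values, 4 distinct ones, "a" is the most frequent (5 times).
print(text_summary["count"], text_summary["unique"],
      text_summary["top"], text_summary["freq"])
```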

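5. Sorting

5.1 Sorting by Index

Sorting a dataset by some criterion is a very common operation. To sort by the row or column index (in lexicographic order), use the sort_index method, which returns a new, sorted object. For example: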

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2]), index=["b", "e", "c", "d", "a", "f"])
# Before sorting
print(my_series)
# After sorting
print(my_series.sort_index())

In the code above, calling sort_index() sorts the index in ascending lexicographic order. To sort in descending order, just pass ascending=False. For example:

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, -5, 3, 2]), index=["b", "e", "c", "d", "a", "f"])
# Before sorting
print(my_series)
# After sorting
print(my_series.sort_index(ascending=False))

A DataFrame can be sorted by the index on either axis. For example:

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, np.nan, 142, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
# Before sorting
print(df)
# After sorting
print(df.sort_index())

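The code above sorts the DataFrame by its row index in ascending order. With no arguments, sort_index() sorts by the row index in ascending order by default; we can also sort by the column index by specifying a parameter. For example: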

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, np.nan, 142, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
# Before sorting
print(df)
# After sorting
print(df.sort_index(axis=1))

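The code above sorts by the column index in ascending order. All the examples so far sort in ascending order; as with a Series, we can pass a parameter to sort in descending order. For example: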

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, np.nan, 142, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
# Before sorting
print(df)
# After sorting
# Sort by the column index in descending order
print(df.sort_index(axis=1, ascending=False))
# Sort by the row index in descending order
print(df.sort_index(ascending=False))
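Two properties of sort_index() used in this section can be checked directly: it returns a new object and leaves the original untouched, and ISO-style date strings sort chronologically even though the comparison is lexicographic. A minimal sketch with made-up data:

```python
import pandas as pd

s = pd.Series([4, -7, 6], index=["b", "c", "a"])
sorted_s = s.sort_index()

print(list(sorted_s.index))  # ['a', 'b', 'c']
print(list(s.index))         # ['b', 'c', 'a'], the original is unchanged

# Date strings in "YYYY-MM-DD" form sort chronologically because the
# most significant part (the year) comes first.
dates = pd.Series([1, 2, 3], index=["2021-07-06", "2021-07-01", "2021-07-02"])
sorted_dates = dates.sort_index()
print(list(sorted_dates.index))  # ['2021-07-01', '2021-07-02', '2021-07-06']
```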

5.2 Sorting by Value

The examples above sort by index. To sort by value, use the sort_values method. For example:

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, np.nan, 3, 2]), index=["b", "e", "c", "d", "a", "f"])
# Before sorting
print(my_series)
# After sorting
print(my_series.sort_values())

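The code above sorts the values in ascending order. When sorting, any missing values are placed at the end by default. Likewise, we can specify the sort order so that the values are sorted in descending order. For example:

import numpy as np
import pandas as pd
my_series = pd.Series(np.array([4, -7, 6, np.nan, 3, 2]), index=["b", "e", "c", "d", "a", "f"])
# Before sorting
print(my_series)
# After sorting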
print(my_series.sort_values(ascending=False))

The code above sorts the values in descending order.
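Missing values stay at the end in both sort directions. If they should come first instead, sort_values() accepts a na_position parameter. The sketch below uses a shortened, made-up version of the Series above.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, -7, np.nan, 3], index=["b", "e", "d", "a"])

ascending = s.sort_values()
descending = s.sort_values(ascending=False)
nan_first = s.sort_values(na_position="first")

print(list(ascending.index))   # ['e', 'a', 'b', 'd']
print(list(descending.index))  # ['b', 'a', 'e', 'd']
print(list(nan_first.index))   # ['d', 'e', 'a', 'b']
```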

When sorting a DataFrame, you may want to sort by the values in one or more columns. To do so, pass one or more column names to the by option of sort_values. For example:

import numpy as np
import pandas as pd
d = {
    "Open": pd.Series([136, 137, 140, 143, 141, 142, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "High": pd.Series([137, 140, 143, 144, 144, 145, 146], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Low": pd.Series([135, 137, 140, 142, 140, 142, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12']),
    "Close": pd.Series([137, 139, np.nan, 144, 143, 145, 144], index=['2021-07-01', '2021-07-06', '2021-07-02', '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-12'])
}
df = pd.DataFrame(d)
# Before sorting
print(df)
# Sort by the values in the Open column
print(df.sort_values(by=['Open']))
# Sort by the High column, then by the Low column
print(df.sort_values(by=['High', 'Low']))
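Sorting by several columns only differs from sorting by one when the first column has ties. The following sketch uses a small made-up frame (row labels r1 to r4 are hypothetical) in which two rows share the same High value, and also shows that ascending can be a list with one entry per column.

```python
import pandas as pd

df = pd.DataFrame(
    {"High": [144, 140, 144, 137], "Low": [142, 137, 140, 135]},
    index=["r1", "r2", "r3", "r4"],
)

# Rows r1 and r3 tie on High, so Low decides their relative order.
by_high_low = df.sort_values(by=["High", "Low"])
# A list for ascending sets the direction per column:
# High ascending, Low descending.
mixed = df.sort_values(by=["High", "Low"], ascending=[True, False])

print(list(by_high_low.index))  # ['r4', 'r2', 'r3', 'r1']
print(list(mixed.index))        # ['r4', 'r2', 'r1', 'r3']
```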