
Python pandas: grouping by time, advanced pandas examples

Source: the Internet  Date: 2023-04-08


demo.py (grouping with groupby(), and the aggregation functions applied after grouping):

import pandas as pd

my_list = [{"name": "zhangsan", "age": 18, "province": "jiangsu"},
           {"name": "lisi", "age": 19, "province": "henan"},
           {"name": "xiaohong", "age": 18, "province": "henan"},
           {"name": "wangwu", "age": 18, "province": "jiangsu"}]
df = pd.DataFrame(my_list)
print(df)
'''
   age      name province
0   18  zhangsan  jiangsu
1   19      lisi    henan
2   18  xiaohong    henan
3   18    wangwu  jiangsu
'''
# Older pandas versions sorted the columns alphabetically, as shown above;
# newer versions keep the dict key order (name, age, province).

# Grouping
grouped = df.groupby(by="province")  # group by the "province" column
print(type(grouped))  # <class 'pandas.core.groupby.generic.DataFrameGroupBy'>, which can be iterated over
for i, j in grouped:
    print(i)  # the group key: henan, then jiangsu
    print(j)  # the rows of that group, as a DataFrame
    print("-" * 30)
'''
henan
   age      name province
1   19      lisi    henan
2   18  xiaohong    henan
------------------------------
jiangsu
   age      name province
0   18  zhangsan  jiangsu
3   18    wangwu  jiangsu
------------------------------
'''

# Boolean indexing can also be used to pick out the rows of one group.
print(df[df["province"] == "henan"])
'''
   age      name province
1   19      lisi    henan
2   18  xiaohong    henan
'''

# Aggregation: count() gives the number of (non-missing) values in each column of each group
print(grouped.count())  # DataFrame
'''
          age  name
province
henan       2     2
jiangsu     2     2
'''

# Count only a specific column after grouping
print(grouped["name"].count())  # Series
'''
province
henan      2
jiangsu    2
Name: name, dtype: int64
'''
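The comment above says count() reports the number of values per column. A minimal sketch, assuming a hypothetical variant of the same data in which lisi's age is missing, shows where that differs from size(), and uses get_group() as an alternative to the boolean-index filter:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data above: lisi's age is missing (NaN)
df = pd.DataFrame({
    "name": ["zhangsan", "lisi", "xiaohong", "wangwu"],
    "age": [18, np.nan, 18, 18],
    "province": ["jiangsu", "henan", "henan", "jiangsu"],
})
grouped = df.groupby(by="province")

# get_group() selects one group: the same rows as df[df["province"] == "henan"]
henan_rows = grouped.get_group("henan")
print(henan_rows)

# count() counts the non-missing values in each column of each group...
counts = grouped.count()
print(counts)  # henan: name 2, age 1; jiangsu: name 2, age 2

# ...while size() counts the rows of each group, whether or not they contain NaN
sizes = grouped.size()
print(sizes)  # henan 2, jiangsu 2
```

When the question is "how many rows per group", size() is the safer choice; count() is useful for checking how complete each column is.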


Aggregation functions available on a DataFrameGroupBy object:
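As a minimal sketch, assuming the same four-row data as in the first demo, several commonly used aggregation methods (count, sum, mean, median, min, max) can be called directly on a grouped column or combined through agg():

```python
import pandas as pd

my_list = [{"name": "zhangsan", "age": 18, "province": "jiangsu"},
           {"name": "lisi", "age": 19, "province": "henan"},
           {"name": "xiaohong", "age": 18, "province": "henan"},
           {"name": "wangwu", "age": 18, "province": "jiangsu"}]
df = pd.DataFrame(my_list)
grouped = df.groupby(by="province")

# Each aggregation can be called directly on a selected column
print(grouped["age"].mean())  # henan 18.5, jiangsu 18.0
print(grouped["age"].max())   # henan 19, jiangsu 18

# agg() applies several functions at once and returns a DataFrame
# with one column per function name
summary = grouped["age"].agg(["count", "sum", "mean", "median", "min", "max"])
print(summary)
```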


demo.py (grouping by multiple columns with groupby()):

import pandas as pd

my_list = [{"name": "zhangsan", "age": 18, "province": "jiangsu"},
           {"name": "lisi", "age": 19, "province": "henan"},
           {"name": "xiaohong", "age": 18, "province": "henan"},
           {"name": "wangwu", "age": 18, "province": "jiangsu"}]
df = pd.DataFrame(my_list)
print(df)
'''
   age      name province
0   18  zhangsan  jiangsu
1   19      lisi    henan
2   18  xiaohong    henan
3   18    wangwu  jiangsu
'''

# Group by multiple columns
grouped = df.groupby(by=["province", "age"])  # group by both "province" and "age"
for i, j in grouped:
    print(i)  # ('henan', 18), ('henan', 19), ('jiangsu', 18)
    print(j)  # DataFrame
    print("-" * 30)
'''
('henan', 18)
   age      name province
2   18  xiaohong    henan
------------------------------
('henan', 19)
   age  name province
1   19  lisi    henan
------------------------------
('jiangsu', 18)
   age      name province
0   18  zhangsan  jiangsu
3   18    wangwu  jiangsu
------------------------------
'''

# Group a single column by multiple columns
# grouped = df["name"].groupby(by=["province", "age"])  # Wrong: df["name"] is a Series and has no "province" or "age" column.
grouped2 = df["name"].groupby(by=[df["province"], df["age"]])  # Correct: by=[df["province"], df["age"]] passes df's "province" and "age" columns themselves
print(grouped2.count())  # Series: "province" and "age" together form the index (a MultiIndex); the data itself is a single column
'''
province  age
henan     18     1
          19     1
jiangsu   18     2
Name: name, dtype: int64
'''

# To get a DataFrame instead of a Series, use df[["name"]] in place of df["name"].
# The "name" column can also be selected after grouping:
grouped3 = df.groupby(by=[df["province"], df["age"]])["name"]
print(grouped3.count())
'''
province  age
henan     18     1
          19     1
jiangsu   18     2
Name: name, dtype: int64
'''
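The comments above mention that df[["name"]] yields a DataFrame rather than a Series. A minimal sketch, assuming the same data, shows that, along with the as_index=False option of groupby(), which turns the group keys into ordinary columns instead of a MultiIndex:

```python
import pandas as pd

my_list = [{"name": "zhangsan", "age": 18, "province": "jiangsu"},
           {"name": "lisi", "age": 19, "province": "henan"},
           {"name": "xiaohong", "age": 18, "province": "henan"},
           {"name": "wangwu", "age": 18, "province": "jiangsu"}]
df = pd.DataFrame(my_list)

# Double brackets select a one-column DataFrame, so the aggregated result is a DataFrame too
counts_df = df[["name"]].groupby(by=[df["province"], df["age"]]).count()
print(type(counts_df))  # <class 'pandas.core.frame.DataFrame'>
print(counts_df)

# as_index=False keeps "province" and "age" as regular columns
flat = df.groupby(by=["province", "age"], as_index=False)["name"].count()
print(flat)
```

The flat form is often easier to merge with other tables or export, while the MultiIndex form is convenient for further reshaping such as unstack().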


demo.py (cross tabulation with crosstab(), a special grouping tool):

import pandas as pd

# Simulated table (data) of products purchased by users
my_list = [{"user_id": 11, "goods": "apple"},
           {"user_id": 11, "goods": "apple"},
           {"user_id": 11, "goods": "banana"},
           {"user_id": 22, "goods": "apple"},
           {"user_id": 22, "goods": "banana"},
           {"user_id": 22, "goods": "banana"},
           {"user_id": 33, "goods": "pear"},
           {"user_id": 33, "goods": "banana"},
           {"user_id": 33, "goods": "apple"}]
df = pd.DataFrame(my_list)
print(df)  # newer pandas versions print user_id first (dict key order)
'''
    goods  user_id
0   apple       11
1   apple       11
2  banana       11
3   apple       22
4  banana       22
5  banana       22
6    pear       33
7  banana       33
8   apple       33
'''

# Cross tabulation (a special grouping tool): count how many of each product every user bought
cross_tb = pd.crosstab(df["user_id"], df["goods"])
print(cross_tb)
'''
goods    apple  banana  pear
user_id
11           2       1     0
22           1       2     0
33           1       1     1
'''
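To make the "special grouping tool" description concrete, here is a minimal sketch, assuming the same purchase records with the fruit names written in English, that builds the same table with groupby(): group by both columns, count the rows of each pair with size(), then pivot "goods" into columns with unstack():

```python
import pandas as pd

my_list = [{"user_id": 11, "goods": "apple"}, {"user_id": 11, "goods": "apple"},
           {"user_id": 11, "goods": "banana"}, {"user_id": 22, "goods": "apple"},
           {"user_id": 22, "goods": "banana"}, {"user_id": 22, "goods": "banana"},
           {"user_id": 33, "goods": "pear"}, {"user_id": 33, "goods": "banana"},
           {"user_id": 33, "goods": "apple"}]
df = pd.DataFrame(my_list)

cross_tb = pd.crosstab(df["user_id"], df["goods"])

# Count rows per (user_id, goods) pair, then move "goods" into the columns;
# fill_value=0 fills the pairs that never occur, as crosstab does
via_groupby = df.groupby(["user_id", "goods"]).size().unstack(fill_value=0)
print(via_groupby)
```

crosstab() is the more compact spelling; the groupby() version is handy when the counting step needs to become another aggregation, such as a sum of quantities.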
