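[Compared with Python] Conditional grouping with overlapping groups
Task: Segment employees by their years of service with the company, then count the male and female employees in each segment.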
Python

    import datetime
    import pandas as pd

    def eval_g(dd: dict, ss: str):
        # Evaluate a condition string against the variables in dd
        return eval(ss, dd)

    emp_file = 'E:\\txt\\employee.txt'
    emp_info = pd.read_csv(emp_file, sep='\t')
    # Segment labels and their conditions; the last two segments overlap
    employed_list = ['Within five years', 'Five to ten years', 'More than ten years', 'Over fifteen years']
    employed_str_list = ["(s<5)", "(s>=5) & (s<10)", "(s>=10)", "(s>=15)"]
    # Years of service = current year - hire year
    today = datetime.datetime.today().year
    arr = pd.to_datetime(emp_info['HIREDATE'])
    employed = today - arr.dt.year
    emp_info['EMPLOYED'] = employed
    dd = {'s': emp_info['EMPLOYED']}
    group_cond = []
    for n in range(len(employed_str_list)):
        # Group by the boolean result of the n-th condition
        emp_g = emp_info.groupby(eval_g(dd, employed_str_list[n]))
        emp_g_index = [index for index in emp_g.size().index]
        if True not in emp_g_index:
            # No employee satisfies this condition
            female_emp = 0
            male_emp = 0
        else:
            group = emp_g.get_group(True)
            sum_emp = len(group)
            female_emp = len(group[group['GENDER'] == 'F'])
            male_emp = sum_emp - female_emp
        group_cond.append([employed_list[n], male_emp, female_emp])
    group_df = pd.DataFrame(group_cond, columns=['EMPLOYED', 'MALE', 'FEMALE'])
    print(group_df)

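Pandas has no built-in function for conditional grouping in which the conditions overlap, so the code has to regroup the data once per condition and keep only the group that satisfies it.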
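The "one pass per condition" idea can also be sketched with plain boolean masks, which avoids `eval` and `groupby` altogether. This is only an illustrative variant, not the listing above: the small in-memory table is hypothetical sample data standing in for `employee.txt`, and `EMPLOYED` is assumed to already hold the years of service.

```python
import pandas as pd

# Hypothetical sample data; EMPLOYED holds years of service
emp = pd.DataFrame({
    "GENDER":   ["F", "M", "M", "F", "M", "F"],
    "EMPLOYED": [2, 7, 12, 16, 20, 4],
})

s = emp["EMPLOYED"]
# One boolean mask per segment; segments are allowed to overlap
segments = {
    "Within five years":   s < 5,
    "Five to ten years":   (s >= 5) & (s < 10),
    "More than ten years": s >= 10,
    "Over fifteen years":  s >= 15,
}

rows = []
for label, mask in segments.items():
    # Select the GENDER values of the rows matching this segment
    gender = emp.loc[mask, "GENDER"]
    rows.append([label, int((gender == "M").sum()), int((gender == "F").sum())])

group_df = pd.DataFrame(rows, columns=["EMPLOYED", "MALE", "FEMALE"])
print(group_df)
```

An employee with 16 years of service is counted in both "More than ten years" and "Over fifteen years", which is exactly the overlap the task asks for.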
esProc
| | A | B |
| 1 | ?<5 | Within five years |
| 2 | ?>=5 && ?<10 | Five to ten years |
| 3 | ?>=10 | More than ten years |
| 4 | ?>=15 | Over fifteen years |
| 5 | E:\\txt\\employee.txt | |
| 6 | =[A1:A4] | =A6.concat@c() |
| 7 | =file(A5).import@t() | =A7.derive(age@y(HIREDATE):EMPLOYED) |
| 8 | =B7.enum@r(A6,EMPLOYED) | =[B1:B4] |
| 9 | =A8.new(B8(#):EMPLOYED,~.count(GENDER=="M"):MALE,~.count(GENDER=="F"):FEMALE) | |
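esProc provides enumerated grouping: with the @r option, enum places a record in every group whose condition it satisfies, so conditional grouping with overlapping groups takes a single call.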
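For readers more familiar with Python, the semantics of enumerated grouping with repeats can be sketched roughly as follows. This illustrates the idea only and is not how esProc implements it: `enum_group`, its `repeat` parameter, and the sample records are all hypothetical names and data introduced here, and the `repeat=False` branch is included purely for contrast.

```python
from typing import Callable, Dict, List

def enum_group(records: List[dict],
               conditions: Dict[str, Callable[[dict], bool]],
               repeat: bool = True) -> Dict[str, List[dict]]:
    """Put each record into the groups whose condition it satisfies.

    repeat=True : a record joins every matching group (overlap allowed)
    repeat=False: a record joins only the first matching group
    """
    groups: Dict[str, List[dict]] = {label: [] for label in conditions}
    for rec in records:
        for label, cond in conditions.items():
            if cond(rec):
                groups[label].append(rec)
                if not repeat:
                    break
    return groups

# Hypothetical sample records; EMPLOYED holds years of service
employees = [
    {"GENDER": "F", "EMPLOYED": 2},
    {"GENDER": "M", "EMPLOYED": 7},
    {"GENDER": "M", "EMPLOYED": 12},
    {"GENDER": "F", "EMPLOYED": 16},
    {"GENDER": "M", "EMPLOYED": 20},
    {"GENDER": "F", "EMPLOYED": 4},
]

conditions = {
    "Within five years":   lambda r: r["EMPLOYED"] < 5,
    "Five to ten years":   lambda r: 5 <= r["EMPLOYED"] < 10,
    "More than ten years": lambda r: r["EMPLOYED"] >= 10,
    "Over fifteen years":  lambda r: r["EMPLOYED"] >= 15,
}

groups = enum_group(employees, conditions)
# One (label, male count, female count) tuple per segment
summary = [
    (label,
     sum(r["GENDER"] == "M" for r in members),
     sum(r["GENDER"] == "F" for r in members))
    for label, members in groups.items()
]
print(summary)
```

With `repeat=False`, the "Over fifteen years" group in this sample stays empty, because every such record is already claimed by "More than ten years"; allowing repeats is what makes the overlapping segments meaningful.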
