[Comparison with Python] Log Processing 3
Task: each log record spans a variable number of lines, and every record begins with a fixed marker.
Python
    import pandas as pd

    log_file = 'E://txt//Indefinite _info2.txt'
    # Read each physical line as one string in column 0
    # (assumes the lines contain no commas, since read_csv splits on them).
    log_info = pd.read_csv(log_file, header=None)

    # Flag lines whose first tab-separated field is "userid"; the running sum
    # of the flags gives every line of a record the same group number.
    group_cond = log_info[0].apply(
        lambda x: 1 if x.split("\t")[0].split(":")[0] == "userid" else 0
    ).cumsum()
    log_g = log_info.groupby(group_cond, sort=False)

    columns = ["userid", "gender", "age", "salary", "province", "musicid", "watch_time", "time"]
    df_dic = {c: [] for c in columns}

    for index, group in log_g:
        # Merge all lines of the record and split them into "key:value" fields.
        rec = '\t'.join(group.values.flatten()).split("\t")
        rec_dic = {}
        for r in rec:
            # Split on the first colon only, so values that contain ":" stay intact.
            v = r.split(":", 1)
            rec_dic[v[0]] = v[1]
        # Fields missing from this record become None.
        for col in columns:
            df_dic[col].append(rec_dic.get(col))

    df = pd.DataFrame(df_dic)
    print(df)
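Pandas has no built-in way to start a new group whenever a condition becomes true, so the code first has to construct a grouping key array (here, the cumulative sum of a start-of-record flag) and then group by it.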
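To make the grouping key concrete, here is a minimal sketch of the cumulative-sum trick on a small in-memory sample. The sample lines and their values are hypothetical and only imitate the `key:value` layout used by the script; they are not taken from the real log file.

```python
import pandas as pd

# Hypothetical sample: two records, each starting with a "userid:" line.
lines = pd.Series([
    "userid:1\tgender:F",
    "age:30\tsalary:5000",
    "userid:2\tgender:M",
    "province:Hebei",
    "musicid:42",
])

# Flag the first line of every record, then take the running total:
# all lines of one record end up with the same group number.
is_start = lines.str.startswith("userid:")
group_key = is_start.cumsum()

# Group the lines by that key, keeping the original record order.
groups = [g.tolist() for _, g in lines.groupby(group_key, sort=False)]
```

Here `group_key` is the "grouping array": it carries no information of its own and exists only so that `groupby` has something to group on.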
esProc
  | A |
1 | E://txt//Indefinite _info2.txt |
2 | [userid,gender,age,salary,province,musicid,watch_time,time] |
3 | =file(A1).import@s() |
4 | =A3.group@i(_1.array("\t")(1).array("\:")(1)=="userid") |
5 | =A4.(~.(_1.array("\t")).conj().align(A2,~.array("\:")(1)).(~.array("\:")(2))).conj() |
6 | =create(${A2.concat@c()}).record(A5) |
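esProc's grouping functions (here the conditional grouping group@i in A4, which opens a new group whenever the condition is true) and its loop functions keep the code short and clear.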
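For readers who think in Python, the idea of conditional grouping can be sketched in plain Python as follows. This is only a rough analogue under stated assumptions, not how esProc implements it: `group_on_start` and `to_record` are hypothetical helper names, and the sample lines are made up.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def group_on_start(items: Iterable[str],
                   is_start: Callable[[str], bool]) -> Iterator[List[str]]:
    """Yield consecutive chunks, opening a new chunk whenever is_start(item) is true."""
    chunk: List[str] = []
    for item in items:
        if is_start(item) and chunk:
            yield chunk
            chunk = []
        chunk.append(item)
    if chunk:
        # Emit the last chunk. Lines that precede the first marker
        # would also come out as a chunk of their own.
        yield chunk


def to_record(chunk: List[str], columns: List[str]) -> List[Optional[str]]:
    """Turn the lines of one record into values ordered by columns (None if missing)."""
    fields = {}
    for line in chunk:
        for pair in line.split("\t"):
            key, _, value = pair.partition(":")  # split on the first colon only
            fields[key] = value
    return [fields.get(c) for c in columns]


# Hypothetical sample data, for illustration only.
columns = ["userid", "gender", "age"]
sample = ["userid:1\tgender:F", "age:30", "userid:2", "age:41"]
records = [
    to_record(chunk, columns)
    for chunk in group_on_start(sample, lambda s: s.startswith("userid:"))
]
```

The generator needs no auxiliary key array: the condition itself decides where each group starts, which is the behavior the pandas version has to emulate with a cumulative sum.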