非专业人员可以学会的
程序语言
桌面数据处理和分析的
工具集
操作简单、函数丰富
批量和重复性任务很和繁琐
对复杂的运算支持不足
使用简单、界面流畅美观
计算功能单一,只能做死板的多维分析
完整的编程能力,天然内置于Excel
对表格计算支持太差,简单任务也要大段代码
支持表格计算,交互性好,培训班遍地
表格计算不符合自然思维,太多种类表格需要掌握,难度超出非专业人员能力
df = pd.read_csv("../login_data.csv") df["ts"] = pd.to_datetime(df["ts"]).dt.date grouped = df.groupby("userid") aligned_dates = pd.date_range(start=df["ts"].min(), end=df["ts"].max(), freq='D') user_date_wether_con3days = [] for uid, group in grouped: group = group.drop_duplicates('ts') aligned_group = group.set_index("ts").reindex(aligned_dates) consecutive_logins = aligned_group.rolling(window=7) n = 0 date_wether_con3days = [] for r in consecutive_logins: n += 1 if n<7: continue else: ds = r['userid'].isna().cumsum() cont_login_times = r.groupby(ds).userid.count().max() wether_cont3days = 1 if cont_login_times>=3 else 0 date_wether_con3days.append(wether_cont3days) user_date_wether_con3days.append(date_wether_con3days) arr = np.array(user_date_wether_con3days) day7_cont3num = np.sum(arr,axis=0) result = pd.DataFrame({'dt':aligned_dates[6:],'cont3_num':day7_cont3num})
真正的记录集合,有交互性
环境复杂,不能直接处理桌面文件,非专业人员难以使用
过程式运算复杂度太高,非专业人员难以掌握
WITH all_dates AS ( SELECT DISTINCT TRUNC(ts) AS login_date FROM login_data), user_login_counts AS ( SELECT userid, TRUNC(ts) AS login_date, (CASE WHEN COUNT(*)>=1 THEN 1 ELSE 0 END) AS login_count FROM login_data GROUP BY userid, TRUNC(ts)), whether_login AS ( SELECT u.userid, ad.login_date, NVL(ulc.login_count, 0) AS login_count FROM all_dates ad CROSS JOIN ( SELECT DISTINCT userid FROM login_data) u LEFT JOIN user_login_counts ulc ON u.userid = ulc.userid AND ad.login_date = ulc.login_date ORDER BY u.userid, ad.login_date), whether_login_rn AS ( SELECT userid,login_date,login_count,ROWNUM AS rn FROM whether_login), whether_eq AS( SELECT userid,login_date,login_count,rn, (CASE WHEN LAG(login_count,1) OVER (ORDER BY rn)= login_count AND login_count =1 AND LAG(userid,1) OVER (ORDER BY rn)=userid THEN 0 ELSE 1 END) AS wether_e FROM whether_login_rn ), numbered_sequence AS ( SELECT userid,login_date,login_count,rn, wether_e, SUM(wether_e) OVER (ORDER BY rn) AS lab FROM whether_eq), consecutive_logins_num AS ( SELECT userid,login_date,login_count,rn, wether_e,lab, (SELECT (CASE WHEN max(COUNT(*))<3 THEN 0 ELSE 1 END) FROM numbered_sequence b WHERE b.rn BETWEEN a.rn - 6 AND a.rn AND b.userid=a.userid GROUP BY b. lab) AS cnt FROM numbered_sequence a) SELECT login_date,SUM(cnt) AS cont3_num FROM consecutive_logins_num WHERE login_date>=(SELECT MIN(login_date) FROM all_dates)+6 GROUP BY login_date ORDER BY login_date;
A | |
1 | =file("login_data.csv").import@tc() |
2 | =periods(date(A1.ts),date(A1.m(-1).ts)) |
3 | =A1.group(userid).(~.align(A2,date(ts)).(if(#<7,null,(cnt=~[-6:0].group@i(!~).max(count(~)),if(cnt>=3,1,0))))) |
4 | =msum(A3).~.new(A2(#):dt,int(~):cont3_num).to(7,) |
1. 按车型、月份、年份分组并计数
2. 计算同期比:if(Model==Model[-1] && Month==Month[-1],SalesVol/SalesVol[-1],null)
3. 重新按车型、年、月排序
4. 计算比上期: if(Model==Model[-1] ,SalesVol/SalesVol[-1],null)
1. 读入3个表格
2. 用standard关联absent和performance
3. 按照公式计算工资
A | B | C | |
1 | =file("data.xlsx").xlsimport@t() | ||
2 | for A1.fname() | =A1.field(A2) | |
3 | =B2.sum() | =B2.sum(int(~)) | |
4 | =if(B3==C3,B2.mode(),B2.avg()) | ||
5 | =B2.(if(~,~,B4)) | >A1.field(A2,B5) | |
6 | >file("dataNew.xlsx").xlsexport@t(A1) |
A | B | C | |
1 | [ID,Name,Sex,Postion,Birthday,Phone,Address,PostCode] | ||
2 | [C1,C2,F2,C3,C4,D5,C7,C8] | ||
3 | =directory@p("data/*.xlsx") | ||
4 | for A3 | =file(A4).xlsopen() | =B2.(B4.xlscell(~)) |
5 | =@|C4 | ||
6 | =create(${A1.concat@c()}).record(B5) | ||
7 | >file("all.xlsx").xlsexport@t(A6) |