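Recently I have been processing data in the lab. A single Didi data file holds roughly 40 million records, so I picked up some Python along the way. My approach is strictly take-what-you-need: I learn whatever the task calls for and note it down here.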
1. Data Input and Output
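    import pandas as pd

    # Column names for the headerless GPS file
    names = ['ID1', 'ID2', 'time', 'Longitude', 'Latitude']

    # Read a whole CSV file into a DataFrame
    f = open('F:/迅雷下载/已上传/g.csv')
    df = pd.read_csv(f, names=names)
    f.close()

    # Read a file line by line
    with open('F:/迅雷下载/已上传/order.csv') as f:
        for line in f:
            pass  # process each line here

    # Open a file for reading (used in the data processing step)
    f_average = open('G:/DidiPython/driverData.txt')

    # Write text to a file; 'output.txt' is a placeholder, the original path is not given
    fw = open('output.txt', 'w')
    fw.write('Hello World!')
    fw.close()  # note the parentheses: a bare fw.close does not close the file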
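With around 40 million rows, loading everything at once may not always fit in memory. One possible alternative is to stream the file row by row and keep only a running summary. The sketch below is a minimal illustration using the standard library; the helper name `count_rows_per_driver`, the sample values, and the assumption that column 0 holds `ID1` are mine, not part of the original code.

```python
import csv
import io
from collections import Counter


def count_rows_per_driver(lines):
    """Stream CSV rows and count how many GPS points each driver has."""
    counts = Counter()
    for row in csv.reader(lines):
        if not row:          # skip blank lines
            continue
        counts[row[0]] += 1  # column 0 is assumed to be ID1 (driver ID)
    return counts


# Tiny in-memory sample standing in for the real file (values are made up).
sample = io.StringIO(
    "d1,o1,1477969147,104.07513,30.72724\n"
    "d1,o1,1477969150,104.07513,30.72702\n"
    "d2,o2,1477969154,104.07504,30.72672\n"
)
counts = count_rows_per_driver(sample)
print(counts)
```

For a real file, the same function should work with an open file object, for example `with open(path, newline='') as f: count_rows_per_driver(f)`, since only one row is held in memory at a time.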
2. Data Processing
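    import pandas as pd
    import numpy as np

    # df_order and df_gps are DataFrames loaded with pd.read_csv as shown earlier.
    # Boolean mask: True for orders whose ID2 also appears in the GPS data
    df_order['ID2'].isin(df_gps['ID2'])

    # Load the per-driver statistics file
    names = ['ID1', 'average', 'var']
    df_average = pd.read_csv(f_average, names=names)

    # Convert a column to a list, then to a NumPy array
    a_list = df_average['average'].values.tolist()
    al_array = np.array(a_list)
    print(al_array.sum() / al_array.size)  # mean
    print('Standard deviation: %0.2f' % al_array.std())

    # Keep one row per driver
    d_driver = df_gps.drop_duplicates(['ID1'])

    # Iterate over rows as named tuples
    for row_gps in d_driver.itertuples(index=True, name='Pandas'):
        print(getattr(row_gps, 'ID1'))
        print('(%.05f' % getattr(row_gps, 'Longitude'), end='')
        print(',%.05f),' % getattr(row_gps, 'Latitude'))

    # Iterate by index label and print the first two values of each row
    for indexs in df.index:
        print(df.loc[indexs].values[0:2])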
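To see what these calls return, here is a small self-contained sketch on made-up data. The column names follow the ones used above; the sample values and the `groupby` step (a vectorized way to get per-driver averages without an explicit loop) are my own additions for illustration, not the original workflow.

```python
import pandas as pd

# Made-up sample data; only the column names come from the post.
df_gps = pd.DataFrame({
    'ID1': ['d1', 'd1', 'd2', 'd3'],
    'ID2': ['o1', 'o1', 'o2', 'o9'],
    'Longitude': [104.07513, 104.07520, 104.08000, 104.09000],
    'Latitude': [30.72724, 30.72702, 30.73000, 30.74000],
})
df_order = pd.DataFrame({'ID2': ['o1', 'o2', 'o3']})

# Boolean mask: True where the order ID also appears in the GPS data
mask = df_order['ID2'].isin(df_gps['ID2'])
matched_orders = df_order[mask]

# One row per driver (the first occurrence is kept)
d_driver = df_gps.drop_duplicates(['ID1'])

# Per-driver mean position, computed without a Python-level loop
mean_pos = df_gps.groupby('ID1')[['Longitude', 'Latitude']].mean()

print(matched_orders['ID2'].tolist())
print(d_driver['ID1'].tolist())
print(mean_pos)
```

On tens of millions of rows, vectorized operations such as `isin` and `groupby` are usually much faster than iterating with `itertuples`, so the loop is probably best kept for printing or debugging.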
3. Reconstructing Trajectories on a Map
Here I used gmplot, an open-source library hosted on GitHub; see the project page for detailed usage.
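As a rough idea of what trajectory reconstruction with gmplot can look like, the sketch below picks out one driver's points, orders them by timestamp, and draws them as a polyline. The helper names `track_points` and `draw_track`, the sample rows, the zoom level, and the output file name are all hypothetical; the actual project code may be organized differently.

```python
def track_points(rows, driver_id):
    """Return (lats, lngs) for one driver, ordered by timestamp.

    rows: iterable of (ID1, ID2, time, Longitude, Latitude) tuples.
    """
    pts = sorted((r for r in rows if r[0] == driver_id), key=lambda r: r[2])
    lats = [r[4] for r in pts]
    lngs = [r[3] for r in pts]
    return lats, lngs


def draw_track(lats, lngs, out_html):
    """Plot the points as a polyline with gmplot and save an HTML map."""
    import gmplot  # third-party; imported here so track_points works without it
    gmap = gmplot.GoogleMapPlotter(lats[0], lngs[0], 14)  # center lat, center lng, zoom
    gmap.plot(lats, lngs, 'red', edge_width=3)
    gmap.draw(out_html)


# Made-up rows, deliberately out of time order.
rows = [
    ('d1', 'o1', 3, 104.3, 30.3),
    ('d2', 'o2', 1, 104.9, 30.9),
    ('d1', 'o1', 1, 104.1, 30.1),
    ('d1', 'o1', 2, 104.2, 30.2),
]
lats, lngs = track_points(rows, 'd1')
print(lats, lngs)
# draw_track(lats, lngs, 'track.html')  # uncomment once gmplot is installed
```

Sorting by time before plotting matters because the polyline connects points in list order; unsorted GPS records would produce a zigzag rather than the route actually driven.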
For usage instructions, the project code, and anything not covered here, see GAIARead.