lithbitren
2020-06-23 20:50:54 +08:00
import collections
import random

# One million synthetic students spread over 2000 classes.
students = [
    {
        'class': random.randrange(2000),
        'sex': random.randint(0, 1),  # 1 = male, 0 = female
        'height': random.randrange(150, 190)
    }
    for _ in range(1_000_000)
]

# Per-class running totals, created on first access.
collect = collections.defaultdict(lambda: {
    'maleSum': 0,
    'maleCount': 0,
    'femaleSum': 0,
    'femaleCount': 0
})
for student in students:
    if student['sex']:
        collect[student['class']]['maleSum'] += student['height']
        collect[student['class']]['maleCount'] += 1
    else:
        collect[student['class']]['femaleSum'] += student['height']
        collect[student['class']]['femaleCount'] += 1

# Mean male height minus mean female height for each class.
# Note: this divides by zero if a class has no students of one sex.
result = [
    Class['maleSum'] / Class['maleCount'] - Class['femaleSum'] / Class['femaleCount']
    for Class in collect.values()
]
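I timed it: this query over a million records definitely takes no more than half a second, and that is with keyed dicts. If you swap the temporary dicts for arrays, I'd guess it gets several times faster; split the fields into typed numpy arrays and turn on numba, and I'd guess it gets several times faster again. I can't believe you actually sat and waited tens of minutes...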
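
As a rough sketch of the "swap the temporary dicts for arrays" idea, the per-class totals can live in plain lists indexed by class id. The names height_gaps and num_classes are made up for this example, and it assumes class ids are integers in range(num_classes) and sex is 0 or 1, as in the generated data above. How much faster it is was not measured here, so time it yourself.

```python
import random
import time


def height_gaps(students, num_classes):
    """Return {class_id: mean male height - mean female height}.

    Hypothetical helper: assumes 'class' is an int in range(num_classes)
    and 'sex' is 0 (female) or 1 (male).
    """
    # sums[sex][class_id] and counts[sex][class_id] replace the keyed dicts.
    sums = [[0] * num_classes, [0] * num_classes]
    counts = [[0] * num_classes, [0] * num_classes]
    for s in students:
        sex, cls = s['sex'], s['class']
        sums[sex][cls] += s['height']
        counts[sex][cls] += 1

    gaps = {}
    for cls in range(num_classes):
        # Skip classes missing either sex to avoid dividing by zero.
        if counts[0][cls] and counts[1][cls]:
            gaps[cls] = (sums[1][cls] / counts[1][cls]
                         - sums[0][cls] / counts[0][cls])
    return gaps


if __name__ == "__main__":
    students = [
        {
            'class': random.randrange(2000),
            'sex': random.randint(0, 1),
            'height': random.randrange(150, 190),
        }
        for _ in range(1_000_000)
    ]
    start = time.perf_counter()
    gaps = height_gaps(students, 2000)
    elapsed = time.perf_counter() - start
    print(f"{len(gaps)} classes aggregated in {elapsed:.3f}s")
```

The same layout (one flat array per field, indexed by class id) is also what you would hand to numpy or numba if you want to go further.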