The M1 now has native numpy and scipy

2020-12-09 15:37:56 +08:00
 YUX

https://github.com/conda-forge/miniforge

First, download the matching Miniforge3 installer ====> OS X arm64 (Apple Silicon)

Once it's installed you have conda, and numpy, scipy, etc. installed through conda are all native arm64 builds.

The performance gain is big, whether compared against Rosetta 2 or against an Intel i9.
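As a quick sanity check (just a sketch, not part of the Miniforge instructions), you can confirm the interpreter and numpy are really running as native arm64 rather than under Rosetta 2:

```python
# Sanity check: confirm the Python interpreter is an arm64 build and inspect
# which BLAS/LAPACK libraries numpy was linked against in this environment.
import platform

import numpy as np

print(platform.machine())  # 'arm64' for a native build, 'x86_64' under Rosetta 2
print(np.__version__)
np.show_config()           # prints numpy's BLAS/LAPACK build configuration
```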

7594 views
Node: macOS
42 replies
pb941129
2020-12-09 15:39:45 +08:00
I'd like to know how much of an improvement this is over MKL numpy on an Intel i9...
NoobX
2020-12-09 16:42:16 +08:00
But RAM tops out at 16 GB...
Goldilocks
2020-12-09 16:45:04 +08:00
Looking forward to benchmarks; I'd guess it gets crushed by AVX-512.
felixcode
2020-12-09 19:43:51 +08:00
A GPU's VRAM can be bigger than your RAM.
YUX
2020-12-09 19:49:07 +08:00
@pb941129
@NoobX
@Goldilocks
@felixcode



Found a numpy benchmark script and ran it: https://gist.github.com/markus-beuckelmann/8bc25531b11158431a5b09a45abd6276

```
Dotted two 4096x4096 matrices in 0.53 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.59 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 4.74 s.

This was obtained using the following Numpy configuration:
blas_info:
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    include_dirs = ['/Users/yux/miniforge3/envs/maths/include']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    include_dirs = ['/Users/yux/miniforge3/envs/maths/include']
    language = c
lapack_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/yux/miniforge3/envs/maths/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/yux/miniforge3/envs/maths/include']
```




P.S. Python 3.9.1, arm64. All background apps were closed during the run.
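For reference, the gist essentially just times a handful of dense linear-algebra kernels; a minimal sketch of the same idea (matrix sizes match the printout above, the exact timing code in the gist differs slightly):

```python
# Minimal sketch of the kind of measurement the gist performs: time a dense
# matrix product and an eigendecomposition. Absolute numbers depend on the
# BLAS backend numpy is linked against and on the machine.
from time import time

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4096, 4096))
B = rng.standard_normal((4096, 4096))

t0 = time()
A @ B
print(f"Dotted two 4096x4096 matrices in {time() - t0:.2f} s.")

C = rng.standard_normal((2048, 2048))
t0 = time()
np.linalg.eig(C)
print(f"Eigendecomposition of a 2048x2048 matrix in {time() - t0:.2f} s.")
```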
pb941129
2020-12-09 19:58:15 +08:00
@YUX Thx. Here are the results from my 16-inch MBP (i9 model). Background apps were not closed; environment is Anaconda with Python 3.8. It still looks a bit faster than the M1 (otherwise Intel would really have to cry).

```
Dotted two 4096x4096 matrices in 0.45 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.32 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 3.53 s.

This was obtained using the following Numpy configuration:
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/Users/xxx/anaconda/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/xxx/anaconda/include']

```
changepc90
2020-12-09 20:12:20 +08:00
M1: Dotted two vectors of length 524288 in 0.25 ms.
MBP16: Dotted two vectors of length 524288 in 0.05 ms.
That one differs by a lot.
YUX
2020-12-09 20:13:27 +08:00
@pb941129 Not bad, the i9 is still stronger 😂 Were all 8 cores / 16 threads maxed out during the run?
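(Side note, a sketch rather than anything from the thread: the BLAS thread count can be pinned via environment variables set before numpy is imported, which makes single-thread and all-core runs easier to compare.)

```python
# Sketch: pin the BLAS thread count before importing numpy so runs can be
# compared at a fixed number of threads. The variables below cover the
# common backends (MKL, OpenBLAS, generic OpenMP).
import os

os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np  # must be imported after the variables are set

A = np.random.random((2048, 2048))
B = np.random.random((2048, 2048))
A @ B  # this matmul should now stay on a single core
```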
YUX
2020-12-09 20:15:42 +08:00
@changepc90 That's probably down to the difference in instruction sets.
Aspector
2020-12-09 20:19:41 +08:00
i7-8550U in a T480s, using the mkl_rt library:

Dotted two 4096x4096 matrices in 1.07 s.
Dotted two vectors of length 524288 in 0.13 ms.
SVD of a 2048x1024 matrix in 0.53 s.
Cholesky decomposition of a 2048x2048 matrix in 0.15 s.
Eigendecomposition of a 2048x2048 matrix in 5.07 s.

HWMonitor reads the 8550U's real-time power draw at roughly 40-45 W; the M1 is probably only around 20 W (sad).
YUX
2020-12-09 20:21:59 +08:00
Sharing a friend's results from a 16-inch MBP with a 2.6 GHz 6-core Intel Core i7:

Dotted two 4096x4096 matrices in 0.49 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.32 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 3.16 s.
YUX
2020-12-09 20:24:36 +08:00
@Aspector The M1 in the Air is capped at 10 W 😂
pb941129
2020-12-09 20:25:33 +08:00
@YUX I didn't check the task monitor, but knowing how numpy usually behaves, probably not. Once lightgbm is ported we could compare CPU runs as well (a small hyperparameter-search project once kept my whole 8700K busy for three hours).
rock_cloud
2020-12-09 20:25:53 +08:00
2017 iMac, 3.4 GHz Intel i5
Dotted two 4096x4096 matrices in 1.04 s.
Dotted two vectors of length 524288 in 0.17 ms.
SVD of a 2048x1024 matrix in 0.58 s.
Cholesky decomposition of a 2048x2048 matrix in 0.12 s.
Eigendecomposition of a 2048x2048 matrix in 5.37 s.
No background apps were closed.
YUX
2020-12-09 20:26:54 +08:00
@pb941129 Roasting the chip for three hours? Can I run the test inside the fridge 😂 With no fan I'm afraid it'll get cooked.
sxd96
2020-12-09 20:31:25 +08:00
2018 13-inch MBP, i5-8259U

Dotted two 4096x4096 matrices in 0.80 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.35 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 3.39 s.
sxd96
2020-12-09 20:35:06 +08:00
@sxd96 Feeling a bit better about it now. Also ran without closing background apps, MKL backend. I do notice a bit of coil whine on the MBP when all cores are fully loaded. ARM may be slightly behind here for now, but measured by performance per watt it's probably not worse, and for mobile devices I think efficiency is what matters most.
Gandum
2020-12-09 20:35:15 +08:00
It's still an early version. But it's winter now, so there's no rush and the fans aren't too noisy. I'll buy one next summer.
IgniteWhite
2020-12-09 20:35:29 +08:00
Haha, I already posted about this five months ago: /t/688402
rock_cloud
2020-12-09 20:36:02 +08:00
Intel Xeon Silver 4114, 2.2 GHz
Dotted two 4096x4096 matrices in 0.60 s.
Dotted two vectors of length 524288 in 0.04 ms.
SVD of a 2048x1024 matrix in 0.66 s.
Cholesky decomposition of a 2048x2048 matrix in 0.26 s.
Eigendecomposition of a 2048x2048 matrix in 6.67 s.
