Native numpy and scipy are now available for M1

2020-12-09 15:37:56 +08:00
 YUX

https://github.com/conda-forge/miniforge

First download the matching Miniforge3 build: OS X arm64 (Apple Silicon).

Once it is installed you have conda, and numpy, scipy and the like installed through that conda are all native builds.
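One quick way to confirm that the conda Python is running natively rather than under Rosetta 2 is to look at the architecture the interpreter reports. A minimal sketch, assuming a native Apple Silicon build reports `arm64` and a translated one reports `x86_64`; the helper names are made up for illustration:

```python
import platform
import sys


def describe_interpreter():
    """Collect the facts that show which build of Python is running."""
    return {
        "machine": platform.machine(),        # CPU architecture seen by Python
        "python": platform.python_version(),
        "executable": sys.executable,         # should point into the Miniforge install
    }


def looks_native_arm64(machine):
    """Hypothetical helper: True when the reported architecture is 64-bit ARM."""
    return machine.lower() in {"arm64", "aarch64"}


info = describe_interpreter()
print(info)
print("native arm64:", looks_native_arm64(info["machine"]))
```

If the last line prints False on an M1, the interpreter is probably an x86_64 build, for example one installed before switching to Miniforge.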

The performance gain is large, whether compared with Rosetta 2 or with an Intel i9.
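The thread does not include the benchmark script itself. The result lines quoted in the replies name five tests, so what follows is a rough reconstruction from those lines, not the original script. The function names, the random inputs, the best-of-N timing policy and the way the positive-definite matrix is built for Cholesky are all assumptions:

```python
import time

import numpy as np


def best_time(fn, repeats=3):
    """Best wall-clock time in seconds over `repeats` calls (assumed policy)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmark(n=4096, vec_len=524288, seed=0):
    """Time the five operations named in the thread; returns seconds per test."""
    rng = np.random.default_rng(seed)
    half = n // 2
    a, b = rng.random((n, n)), rng.random((n, n))
    x, y = rng.random(vec_len), rng.random(vec_len)
    tall = rng.random((half, half // 2))     # 2048x1024 when n=4096
    square = rng.random((half, half))        # 2048x2048 when n=4096
    # Cholesky needs a symmetric positive-definite input (assumed construction).
    spd = square @ square.T + half * np.eye(half)
    return {
        "matmul": best_time(lambda: a @ b),
        "vecdot": best_time(lambda: x @ y),
        "svd": best_time(lambda: np.linalg.svd(tall, full_matrices=False)),
        "cholesky": best_time(lambda: np.linalg.cholesky(spd)),
        "eig": best_time(lambda: np.linalg.eig(square)),
    }


def report(t, n=4096, vec_len=524288):
    """Format timings in the same style as the lines posted in the thread."""
    half = n // 2
    return [
        f"Dotted two {n}x{n} matrices in {t['matmul']:.2f} s.",
        f"Dotted two vectors of length {vec_len} in {t['vecdot'] * 1e3:.2f} ms.",
        f"SVD of a {half}x{half // 2} matrix in {t['svd']:.2f} s.",
        f"Cholesky decomposition of a {half}x{half} matrix in {t['cholesky']:.2f} s.",
        f"Eigendecomposition of a {half}x{half} matrix in {t['eig']:.2f} s.",
    ]


# Small sizes so the demo finishes quickly; the defaults match the thread's sizes.
small = run_benchmark(n=256, vec_len=4096)
for line in report(small, n=256, vec_len=4096):
    print(line)
```

Calling `run_benchmark()` with its defaults uses the matrix and vector sizes quoted in the replies. Numbers from this sketch are not directly comparable with the posted ones, since the original timing method is unknown.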

7584 views
Node: macOS
42 replies
YUX
2020-12-09 20:38:09 +08:00
@IgniteWhite You were way ahead of the curve 😂 It really is a good thing.
Tilie
2020-12-09 20:54:48 +08:00
8th-gen i7 Mac mini
Dotted two 4096x4096 matrices in 0.76 s.
Dotted two vectors of length 524288 in 0.09 ms.
SVD of a 2048x1024 matrix in 0.56 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 5.20 s.
YUX
2020-12-09 21:03:39 +08:00
Google Colab - 2 Intel(R) Xeon(R) CPU @ 2.20GHz

Dotted two 4096x4096 matrices in 4.16 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 1.49 s.
Cholesky decomposition of a 2048x2048 matrix in 0.23 s.
Eigendecomposition of a 2048x2048 matrix in 13.11 s.
zr86
2020-12-09 21:14:01 +08:00
M1 Mac mini

Dotted two 4096x4096 matrices in 0.69 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.68 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 4.82 s.
wydinhk
2020-12-09 22:21:48 +08:00
M1 MacBook Pro

Dotted two 4096x4096 matrices in 0.68 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.71 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 5.03 s.

I also measured power draw with powermetrics while it ran: about 26 W for the first two tests and about 16 W for the last three.
lovestudykid
2020-12-10 03:17:17 +08:00
This benchmark doesn't really separate the machines.
MF839: only about twice as slow as the OP's M1.
Dotted two 4096x4096 matrices in 2.33 s.
Dotted two vectors of length 524288 in 0.54 ms.
SVD of a 2048x1024 matrix in 1.05 s.
Cholesky decomposition of a 2048x2048 matrix in 0.20 s.
Eigendecomposition of a 2048x2048 matrix in 8.38 s.


Intel(R) Xeon(R) Gold 6134
Dotted two 4096x4096 matrices in 0.32 s.
Dotted two vectors of length 524288 in 0.05 ms.
SVD of a 2048x1024 matrix in 0.89 s.
Cholesky decomposition of a 2048x2048 matrix in 0.15 s.
Eigendecomposition of a 2048x2048 matrix in 8.19 s.
The numpy that Anaconda installed by default here isn't using MKL and doesn't have AVX-512 enabled, so this CPU is being wasted.
pubby
2020-12-10 10:01:09 +08:00
3700X Hackintosh

Dotted two 4096x4096 matrices in 0.46 s.
Dotted two vectors of length 524288 in 0.08 ms.
SVD of a 2048x1024 matrix in 7.37 s.
Cholesky decomposition of a 2048x2048 matrix in 0.82 s.
Eigendecomposition of a 2048x2048 matrix in 49.05 s.

This was obtained using the following Numpy configuration:
atlas_threads_info:
NOT AVAILABLE
blas_opt_info:
extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
extra_compile_args = ['-msse3', '-I/AppleInternal/BuildRoot/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.Internal.sdk/System/Library/Frameworks/vecLib.framework/Headers']
define_macros = [('NO_ATLAS_INFO', 3)]
atlas_blas_threads_info:
NOT AVAILABLE
openblas_info:
NOT AVAILABLE
lapack_opt_info:
extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
extra_compile_args = ['-msse3']
define_macros = [('NO_ATLAS_INFO', 3)]
atlas_info:
NOT AVAILABLE
lapack_mkl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
atlas_blas_info:
NOT AVAILABLE
mkl_info:
NOT AVAILABLE


Looks like I'm not using it quite right....
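A configuration dump like the one above can be read mechanically to see which backend sections are populated. A small sketch, assuming the flattened layout shown in this post (a section name ending in a colon, followed by either NOT AVAILABLE or key = value lines); the function name and the shortened sample are made up for illustration:

```python
def available_sections(dump_text):
    """Return {section: lines} for sections that are not marked NOT AVAILABLE."""
    sections = {}
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            current = line[:-1]               # start of a new section
            sections[current] = []
        elif current is not None:
            sections[current].append(line)    # body line of the current section
    return {
        name: body
        for name, body in sections.items()
        if body and body != ["NOT AVAILABLE"]
    }


# Shortened excerpt of the dump posted above.
sample = """
blas_opt_info:
extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
define_macros = [('NO_ATLAS_INFO', 3)]
openblas_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
"""

found = available_sections(sample)
print(sorted(found))
print(any("Accelerate" in line for line in found["blas_opt_info"]))
```

On a live install, `numpy.show_config()` prints the configuration of the current build; newer numpy versions may format it differently from the dump above, so this parser is tied to the layout shown here.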
bnuliujing
2020-12-10 10:18:09 +08:00
Results for the i7-6950X

Dotted two 4096x4096 matrices in 0.35 s.
Dotted two vectors of length 524288 in 0.03 ms.
SVD of a 2048x1024 matrix in 0.27 s.
Cholesky decomposition of a 2048x2048 matrix in 0.10 s.
Eigendecomposition of a 2048x2048 matrix in 3.39 s.
NoobX
2020-12-10 11:05:02 +08:00
Results for the i5 Mac mini

Dotted two 4096x4096 matrices in 0.58 s.
Dotted two vectors of length 524288 in 0.08 ms.
SVD of a 2048x1024 matrix in 0.32 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 3.30 s.

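The M1 results aren't all that impressive either...
That said, 16 GB of RAM is still a big problem: the system itself usually eats about 4 GB, so only 12 GB of the 16 GB is left for the dataset, which honestly isn't enough for me.
A somewhat slower CPU isn't a big deal, but once swap fills up, the speed is a real nightmare.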
MisakaTian
2020-12-10 11:58:25 +08:00
Speaking as a data grunt: I'll jump in as soon as Anaconda supports it.
Goldilocks
2020-12-10 12:06:11 +08:00
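Processor Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz, 3600 MHz, 4 cores

Dotted two 4096x4096 matrices in 0.33 s, twice as fast as the M1. But the M1 has 8 cores. So at the same clock speed and the same core count, Intel is still roughly 3-4x faster than the M1, and this is a three-year-old product.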
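To make the per-core comparison concrete, the posted timings can be turned into throughput. A rough sketch, assuming the usual 2n^3 operation count for a dense n x n product and dividing evenly by the core counts stated in the thread. Treating all eight M1 cores as equal is a simplification, and the M1's clock speed is not given in the thread, so no per-clock figure is attempted:

```python
def matmul_gflops(n, seconds):
    """Achieved GFLOP/s for an n x n by n x n product, counting 2*n**3 operations."""
    return 2 * n**3 / seconds / 1e9


w2123 = matmul_gflops(4096, 0.33)   # Xeon W-2123 time from this reply
m1 = matmul_gflops(4096, 0.69)      # M1 Mac mini time posted earlier in the thread

w2123_per_core = w2123 / 4          # 4 cores, as stated in this reply
m1_per_core = m1 / 8                # 8 cores, as stated in this reply

print(f"W-2123: {w2123:.0f} GFLOP/s total, {w2123_per_core:.0f} per core")
print(f"M1:     {m1:.0f} GFLOP/s total, {m1_per_core:.0f} per core")
print(f"per-core ratio: {w2123_per_core / m1_per_core:.1f}x")
```

This only restates the two posted matmul timings; it says nothing about the other four tests or about which BLAS library each machine was using.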
YUX
2020-12-10 12:12:50 +08:00
@MisakaTian Just use mamba.
Goldilocks
2020-12-10 12:18:45 +08:00
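It's 2020. If Intel released a 2-core 3.6 GHz CPU, you certainly wouldn't think much of its performance. What you should have in mind is Intel with 10 or 20 cores. AMD is about to release a 64-core desktop CPU, and Apple is still stuck at the 2-core level.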
meloyang05
2020-12-10 13:35:48 +08:00
@Goldilocks

“8th-gen i7 Mac mini
Dotted two 4096x4096 matrices in 0.76 s.
Dotted two vectors of length 524288 in 0.09 ms.
SVD of a 2048x1024 matrix in 0.56 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 5.20 s.

M1 Mac mini

Dotted two 4096x4096 matrices in 0.69 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.68 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 4.82 s.”

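Are you selectively ignoring the other results? Timings at the millisecond level can have large errors to begin with, and it could also be that numpy for M1 currently has a bug. What does singling out the vector result prove?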
Goldilocks
2020-12-10 13:38:09 +08:00
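The error won't be large; it is generally within 1%. That's because matrix multiplication is bound by only two limits:

1. CPU FLOPS
2. Memory bandwidth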
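The two limits can be put side by side by counting operations and bytes for the tests in this thread. A back-of-the-envelope sketch, assuming float64 inputs (8 bytes per element) and assuming each operand is moved exactly once; neither assumption is confirmed anywhere in the thread:

```python
def dot_intensity(bytes_per_elem=8):
    """FLOPs per byte for a vector dot product: 2n operations over 2n elements."""
    return 2 / (2 * bytes_per_elem)


def matmul_intensity(n, bytes_per_elem=8):
    """FLOPs per byte for an n x n product if all three matrices move once."""
    return 2 * n**3 / (3 * n**2 * bytes_per_elem)


def dot_bandwidth_gbs(vec_len, millis, bytes_per_elem=8):
    """Effective GB/s implied by a dot-product timing in milliseconds."""
    total_bytes = 2 * vec_len * bytes_per_elem
    return total_bytes / (millis * 1e-3) / 1e9


print(f"dot product:      {dot_intensity():.3f} FLOP/byte")
print(f"4096x4096 matmul: {matmul_intensity(4096):.0f} FLOP/byte")
print(f"0.25 ms dot: {dot_bandwidth_gbs(524288, 0.25):.1f} GB/s")
print(f"0.09 ms dot: {dot_bandwidth_gbs(524288, 0.09):.1f} GB/s")
```

Under these assumptions the dot product does very little arithmetic per byte, so its timing mostly reflects data movement, while the large matrix product is dominated by arithmetic. The two vectors total 8 MiB, which may sit in cache on some machines, so the implied figures are not necessarily main-memory bandwidth.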
Goldilocks
2020-12-10 13:45:33 +08:00
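Numerical computation like matrix multiplication is a very mature field that has been studied thoroughly. See: https://en.wikichip.org/wiki/flops

Assuming memory bandwidth can keep up with the CPU, the only ways to run faster are:
1. Add more cores
2. Increase the SIMD width

For example, Skylake can reach 64 FLOPs/cycle, while AMD CPUs of the same generation manage only 16 FLOPs/cycle. Clock speeds are all about the same, so this 4x factor accounts for most of the gap. And a gap like this is very hard to close; you could say there's no hope of closing it in a lifetime.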
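The per-cycle figures come from multiplying SIMD lanes, the two operations in a fused multiply-add, and the number of FMA units per core. A small sketch of that arithmetic, assuming single precision; the unit counts below are illustrative combinations that reproduce the 64 and 16 figures, not a claim about any particular chip, and the 4-core 3.6 GHz part is hypothetical:

```python
def flops_per_cycle(simd_bits, fma_units, dtype_bits=32):
    """Per-core FLOPs per cycle: lanes x 2 (multiply + add) x FMA units."""
    lanes = simd_bits // dtype_bits
    return lanes * 2 * fma_units


def peak_gflops(cores, ghz, per_cycle):
    """Theoretical peak in GFLOP/s: cores x clock (GHz) x FLOPs per cycle."""
    return cores * ghz * per_cycle


wide = flops_per_cycle(simd_bits=512, fma_units=2)     # one way to reach 64
narrow = flops_per_cycle(simd_bits=128, fma_units=2)   # one way to reach 16

print(wide, narrow, wide // narrow)
print(peak_gflops(cores=4, ghz=3.6, per_cycle=wide))
```

Both of the levers listed above show up directly in the formula: more cores scale `peak_gflops` linearly, and wider SIMD raises `flops_per_cycle`. Real throughput also depends on sustained clock speed under vector load, which this sketch ignores.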
Harry1993
2020-12-10 14:08:58 +08:00
I tried it with Apple's numpy (https://github.com/apple/tensorflow_macos):

Dotted two 4096x4096 matrices in 0.84 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.54 s.
Cholesky decomposition of a 2048x2048 matrix in 0.06 s.
Eigendecomposition of a 2048x2048 matrix in 6.29 s.
IgniteWhite
2020-12-10 23:07:30 +08:00
@MisakaTian Isn't miniforge's package manager just conda? The only difference is that the default channel is conda-forge.
lly0514
2020-12-11 15:35:01 +08:00
@Goldilocks In practice the variation is very large. In my own tests, the performance gap between MKL and OpenBLAS is more than a factor of two.
Richardyyz
2020-12-13 09:58:14 +08:00
@Goldilocks Zen 2 is already at 32 FLOPs/cycle. Is your lifetime really that short? AVX-512, with its heavy downclocking, doesn't have much of an advantage over Zen 3.

https://www.v2ex.com/t/733777
