tensorflow_macos speed test

    sharpy · 2020-11-22 11:17:31 +08:00 · 2251 views

    Test machine: 16-inch MacBook Pro, i9-9880H, Radeon Pro 5500M 8 GB

    #!/usr/bin/env python
    # coding: utf-8
    import tensorflow.compat.v2 as tf
    import tensorflow_datasets as tfds
    
    tf.enable_v2_behavior()
    
    from tensorflow.python.framework.ops import disable_eager_execution
    
    disable_eager_execution()
    
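    from tensorflow.python.compiler.mlcompute import mlcompute
    
    # Select the ML Compute device: 'cpu', 'gpu', or 'any'.
    # Shown here set to 'cpu'; presumably switched to 'gpu' for the GPU run below.
    mlcompute.set_mlc_device(device_name='cpu')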
    
    (ds_train, ds_test), ds_info = tfds.load(
        'mnist',
        split=['train', 'test'],
        shuffle_files=True,
        as_supervised=True,
        with_info=True,
    )
    
    
    def normalize_img(image, label):
        """Normalizes images: `uint8` -> `float32`."""
        return tf.cast(image, tf.float32) / 255., label
    
    
    # Training pipeline: normalize, cache, shuffle the full set, batch, prefetch.
    ds_train = ds_train.map(
        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_train = ds_train.cache()
    ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
    ds_train = ds_train.batch(128)
    ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)
    
    # Test pipeline: no shuffling needed, so batch before caching.
    # Note: ds_test is not passed to model.fit below, so no validation is run.
    ds_test = ds_test.map(
        normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds_test = ds_test.batch(128)
    ds_test = ds_test.cache()
    ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
    
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=tf.keras.optimizers.Adam(0.001),
        metrics=['accuracy'],
    )
    
    model.fit(
        ds_train,
        epochs=10,
    )
    

    GPU speed

    Epoch 1/10 469/469 [==============================] - 10s 14ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.3598 - accuracy: 0.9028

    Epoch 2/10 469/469 [==============================] - 9s 14ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1623 - accuracy: 0.9535

    Epoch 3/10 469/469 [==============================] - 9s 14ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1182 - accuracy: 0.9664

    Epoch 4/10 469/469 [==============================] - 9s 14ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0911 - accuracy: 0.9735

    Epoch 5/10 469/469 [==============================] - 9s 14ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.0732 - accuracy: 0.9786

    CPU speed

    Epoch 1/10 469/469 [==============================] - 3s 1ms/step - batch: 234.0000 - size: 1.0000 - loss: nan - accuracy: 0.0987

    Epoch 2/10 469/469 [==============================] - 3s 1ms/step - batch: 234.0000 - size: 1.0000 - loss: nan - accuracy: 0.0987

    Epoch 3/10 469/469 [==============================] - 3s 1ms/step - batch: 234.0000 - size: 1.0000 - loss: nan - accuracy: 0.0987

    Epoch 4/10 469/469 [==============================] - 3s 1ms/step - batch: 234.0000 - size: 1.0000 - loss: nan - accuracy: 0.0987

    Epoch 5/10 469/469 [==============================] - 3s 1ms/step - batch: 234.0000 - size: 1.0000 - loss: nan - accuracy: 0.0987
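
    To compare the two runs without reading the numbers off by eye, the log lines can be parsed directly. The following is only a rough sketch: the names `parse_epoch` and `summarize` are made up for illustration, and the regular expression assumes the flattened one-line-per-epoch format shown above. It also flags runs whose loss is `nan`, which matters here because the CPU log reports `loss: nan` with accuracy stuck at 0.0987, so its timings may not reflect a valid training run.

```python
import math
import re

# Matches one flattened Keras progress line as pasted in this post, e.g.
# "Epoch 1/10 469/469 [====] - 10s 14ms/step - ... - loss: 0.3598 - accuracy: 0.9028"
LINE_RE = re.compile(
    r"Epoch (?P<epoch>\d+)/\d+\s+(?P<steps>\d+)/\d+ .*?"
    r"- (?P<secs>\d+)s (?P<ms>\d+)ms/step.*?"
    r"loss: (?P<loss>\S+) - accuracy: (?P<acc>\S+)"
)


def parse_epoch(line):
    """Return a dict of the fields in one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m["epoch"]),
        "steps": int(m["steps"]),
        "epoch_seconds": int(m["secs"]),
        "ms_per_step": int(m["ms"]),
        "loss": float(m["loss"]),  # float("nan") parses without error
        "accuracy": float(m["acc"]),
    }


def summarize(lines):
    """Average the per-epoch wall time and report whether any loss was NaN."""
    rows = [r for r in map(parse_epoch, lines) if r is not None]
    mean_seconds = sum(r["epoch_seconds"] for r in rows) / len(rows)
    diverged = any(math.isnan(r["loss"]) for r in rows)
    return {
        "epochs": len(rows),
        "mean_epoch_seconds": mean_seconds,
        "diverged": diverged,
    }


# First two epochs of each run, copied from the logs above.
GPU_LOG = [
    "Epoch 1/10 469/469 [==============================] - 10s 14ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.3598 - accuracy: 0.9028",
    "Epoch 2/10 469/469 [==============================] - 9s 14ms/step - batch: 234.0000 - size: 1.0000 - loss: 0.1623 - accuracy: 0.9535",
]
CPU_LOG = [
    "Epoch 1/10 469/469 [==============================] - 3s 1ms/step - batch: 234.0000 - size: 1.0000 - loss: nan - accuracy: 0.0987",
    "Epoch 2/10 469/469 [==============================] - 3s 1ms/step - batch: 234.0000 - size: 1.0000 - loss: nan - accuracy: 0.0987",
]

if __name__ == "__main__":
    print("GPU:", summarize(GPU_LOG))
    print("CPU:", summarize(CPU_LOG))
```

    A run whose summary has `diverged` set is better treated as broken than as fast, so any speed comparison should wait until both devices train to a finite loss.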

    6 replies    2020-11-23 11:00:39 +08:00
    tzm41 · #1 · 2020-11-22 11:33:59 +08:00 via iPhone
    For a shallow, narrow dense net like this, the GPU probably won't give much of a speedup…
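
    A back-of-the-envelope estimate gives a feel for how little arithmetic this model needs per step. This is only a sketch: the layer sizes and batch size come from the posted script, but counting the backward pass as roughly twice the forward pass is a common rule of thumb assumed here, not something measured, and the 14 ms figure is simply the GPU step time from the log.

```python
# Layer shapes from the posted model: Flatten(28*28*1) -> Dense(128) -> Dense(10)
layers = [(784, 128), (128, 10)]  # (inputs, outputs) per Dense layer
batch_size = 128                  # from ds_train.batch(128)

# Trainable parameters: weights plus biases for each Dense layer.
params = sum(n_in * n_out + n_out for n_in, n_out in layers)

# One multiply-accumulate per weight per example; count each as 2 FLOPs.
macs_per_example = sum(n_in * n_out for n_in, n_out in layers)
forward_flops = 2 * macs_per_example * batch_size

# ASSUMPTION: backward pass costs about 2x the forward pass (rule of thumb).
train_flops = 3 * forward_flops

# Effective throughput implied by the 14 ms/step seen in the GPU log.
gpu_step_seconds = 0.014
effective_gflops = train_flops / gpu_step_seconds / 1e9

if __name__ == "__main__":
    print(f"parameters:             {params}")
    print(f"forward FLOPs per step: {forward_flops}")
    print(f"approx. training FLOPs: {train_flops}")
    print(f"effective GFLOP/s:      {effective_gflops:.1f}")
```

    Under these assumptions each step is only tens of megaFLOPs of work, so fixed per-step overhead (dispatch, data transfer) plausibly dominates the step time, which would be consistent with the point above.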
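    RichardSun · #2 · 2020-11-22 11:50:59 +08:00 via iPhone
    This reminds me of a backend I tried a while ago, PlaidML I think it was called. In a quick test, its GPU mode was even slower than the regular backend running on the CPU 🤦🏻‍♂️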
    ZRS · #3 · 2020-11-22 17:57:12 +08:00 via iPhone
    Try ResNet50.
    tianshilei1992 · #4 · 2020-11-22 20:57:40 +08:00 via iPhone
    I've long wanted to write a Metal plugin for OpenMP offloading, but the Metal compiler isn't open source, so I can't work out the CodeGen part…
    sharpy (OP) · #5 · 2020-11-23 10:33:55 +08:00
    @tianshilei1992 #4 Take a look at https://github.com/a2flo/floor.git; it might give you some ideas. The project modifies the clang source so that it can generate code for various backends. According to its description, it “compiles compute/graphics C++ code to CUDA/PTX, Metal/AIR, OpenCL/SPIR/SPIR-V, Vulkan/SPIR-V code/binaries”.
    tianshilei1992 · #6 · 2020-11-23 11:00:39 +08:00
    @sharpy 👍 Thanks! After a quick look through the code, it turns out Metal's AIR is essentially a hacked-up version of SPIR-V… apart from some differences in the data layout…