CIFAR-10 教程
如果您还没有,我们建议您在逐步完成本教程之前,先阅读入门指南。
在本教程中,我们将向 CIFAR-10 模型添加 DeepSpeed,CIFAR-10 是一个小型图像分类模型。
首先,我们将介绍如何运行原始 CIFAR-10 模型。然后,我们将逐步介绍如何启用此模型以使用 DeepSpeed 运行。
运行原始 CIFAR-10
来自CIFAR-10 教程的原始模型代码,我们已将此仓库复制到DeepSpeedExamples/training/cifar/并将其作为子模块提供。要下载,请执行
git submodule update --init --recursive
要安装 CIFAR-10 模型的要求
cd DeepSpeedExamples/cifar
pip install -r requirements.txt
运行python cifar10_tutorial.py
,它在第一次运行时下载训练数据集。
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
170500096it [00:02, 61124868.24it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
cat frog frog frog
[1, 2000] loss: 2.170
[1, 4000] loss: 1.879
[1, 6000] loss: 1.690
[1, 8000] loss: 1.591
[1, 10000] loss: 1.545
[1, 12000] loss: 1.467
[2, 2000] loss: 1.377
[2, 4000] loss: 1.374
[2, 6000] loss: 1.363
[2, 8000] loss: 1.322
[2, 10000] loss: 1.295
[2, 12000] loss: 1.287
Finished Training
GroundTruth: cat ship ship plane
Predicted: cat ship plane plane
Accuracy of the network on the 10000 test images: 53 %
Accuracy of plane : 69 %
Accuracy of car : 59 %
Accuracy of bird : 56 %
Accuracy of cat : 36 %
Accuracy of deer : 37 %
Accuracy of dog : 26 %
Accuracy of frog : 70 %
Accuracy of horse : 61 %
Accuracy of ship : 51 %
Accuracy of truck : 63 %
cuda:0
启用 DeepSpeed
参数解析
应用 DeepSpeed 的第一步是向 CIFAR-10 模型添加 DeepSpeed 参数,使用deepspeed.add_config_arguments()
函数,如下所示。
import argparse
import deepspeed
def add_argument():
parser=argparse.ArgumentParser(description='CIFAR')
# Data.
# Cuda.
parser.add_argument('--with_cuda', default=False, action='store_true',
help='use CPU in case there\'s no GPU support')
parser.add_argument('--use_ema', default=False, action='store_true',
help='whether use exponential moving average')
# Train.
parser.add_argument('-b', '--batch_size', default=32, type=int,
help='mini-batch size (default: 32)')
parser.add_argument('-e', '--epochs', default=30, type=int,
help='number of total epochs (default: 30)')
parser.add_argument('--local_rank', type=int, default=-1,
help='local rank passed from distributed launcher')
# Include DeepSpeed configuration arguments.
parser = deepspeed.add_config_arguments(parser)
args=parser.parse_args()
return args
初始化
我们使用deepspeed.initialize
创建model_engine
、optimizer
和trainloader
,其定义如下
def initialize(args,
model,
optimizer=None,
model_params=None,
training_data=None,
lr_scheduler=None,
mpu=None,
dist_init_required=True,
collate_fn=None):
在这里,我们使用 CIFAR-10 模型(net
)、args
、parameters
和trainset
初始化 DeepSpeed。
parameters = filter(lambda p: p.requires_grad, net.parameters())
args=add_argument()
# Initialize DeepSpeed to use the following features
# 1) Distributed model.
# 2) Distributed data loader.
# 3) DeepSpeed optimizer.
model_engine, optimizer, trainloader, _ = deepspeed.initialize(args=args, model=net, model_parameters=parameters, training_data=trainset)
初始化 DeepSpeed 后,原始的device
和optimizer
将被移除。
#from deepspeed.accelerator import get_accelerator
#device = torch.device(get_accelerator().device_name(0) if get_accelerator().is_available() else "cpu")
#net.to(device)
#optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
训练 API
由deepspeed.initialize
返回的model
是我们将用于使用前向、后向和步长 API 训练模型的DeepSpeed 模型引擎。
for i, data in enumerate(trainloader):
# Get the inputs; data is a list of [inputs, labels].
inputs = data[0].to(model_engine.device)
labels = data[1].to(model_engine.device)
outputs = model_engine(inputs)
loss = criterion(outputs, labels)
model_engine.backward(loss)
model_engine.step()
在使用小批量更新权重后,DeepSpeed 会自动处理梯度的清零。
配置
使用 DeepSpeed 的下一步是创建一个配置文件 (ds_config.json)。此文件提供用户定义的 DeepSpeed 特定参数,例如批次大小、优化器、调度程序和其他参数。
{
"train_batch_size": 4,
"steps_per_print": 2000,
"optimizer": {
"type": "Adam",
"params": {
"lr": 0.001,
"betas": [
0.8,
0.999
],
"eps": 1e-8,
"weight_decay": 3e-7
}
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 0.001,
"warmup_num_steps": 1000
}
},
"wall_clock_breakdown": false
}
运行启用 DeepSpeed 的 CIFAR-10 模型
要开始训练应用了 DeepSpeed 的 CIFAR-10 模型,请执行以下命令,它将默认使用所有检测到的 GPU。
deepspeed cifar10_deepspeed.py --deepspeed_config ds_config.json
DeepSpeed 通常会打印更多训练详细信息供用户监控,包括训练设置、性能统计信息和损失趋势。
deepspeed.pt cifar10_deepspeed.py --deepspeed_config ds_config.json
Warning: Permanently added '[192.168.0.22]:42227' (ECDSA) to the list of known hosts.
cmd=['pdsh', '-w', 'worker-0', 'export NCCL_VERSION=2.4.2; ', 'cd /data/users/deepscale/test/ds_v2/examples/cifar;', '/usr/bin/python', '-u', '-m', 'deepspeed.pt.deepspeed_launch', '--world_info=eyJ3b3JrZXItMCI6IFswXX0=', '--node_rank=%n', '--master_addr=192.168.0.22', '--master_port=29500', 'cifar10_deepspeed.py', '--deepspeed', '--deepspeed_config', 'ds_config.json']
worker-0: Warning: Permanently added '[192.168.0.22]:42227' (ECDSA) to the list of known hosts.
worker-0: 0 NCCL_VERSION 2.4.2
worker-0: WORLD INFO DICT: {'worker-0': [0]}
worker-0: nnodes=1, num_local_procs=1, node_rank=0
worker-0: global_rank_mapping=defaultdict(<class 'list'>, {'worker-0': [0]})
worker-0: dist_world_size=1
worker-0: Setting CUDA_VISIBLE_DEVICES=0
worker-0: Files already downloaded and verified
worker-0: Files already downloaded and verified
worker-0: bird car horse ship
worker-0: DeepSpeed info: version=2.1, git-hash=fa937e7, git-branch=master
worker-0: [INFO 2020-02-06 19:53:49] Set device to local rank 0 within node.
worker-0: 1 1
worker-0: [INFO 2020-02-06 19:53:56] Using DeepSpeed Optimizer param name adam as basic optimizer
worker-0: [INFO 2020-02-06 19:53:56] DeepSpeed Basic Optimizer = FusedAdam (
worker-0: Parameter Group 0
worker-0: betas: [0.8, 0.999]
worker-0: bias_correction: True
worker-0: eps: 1e-08
worker-0: lr: 0.001
worker-0: max_grad_norm: 0.0
worker-0: weight_decay: 3e-07
worker-0: )
worker-0: [INFO 2020-02-06 19:53:56] DeepSpeed using configured LR scheduler = WarmupLR
worker-0: [INFO 2020-02-06 19:53:56] DeepSpeed LR Scheduler = <deepspeed.pt.deepspeed_lr_schedules.WarmupLR object at 0x7f64c4c09c18>
worker-0: [INFO 2020-02-06 19:53:56] rank:0 step=0, skipped=0, lr=[0.001], mom=[[0.8, 0.999]]
worker-0: DeepSpeedLight configuration:
worker-0: allgather_size ............... 500000000
worker-0: allreduce_always_fp32 ........ False
worker-0: disable_allgather ............ False
worker-0: dump_state ................... False
worker-0: dynamic_loss_scale_args ...... None
worker-0: fp16_enabled ................. False
worker-0: global_rank .................. 0
worker-0: gradient_accumulation_steps .. 1
worker-0: gradient_clipping ............ 0.0
worker-0: initial_dynamic_scale ........ 4294967296
worker-0: loss_scale ................... 0
worker-0: optimizer_name ............... adam
worker-0: optimizer_params ............. {'lr': 0.001, 'betas': [0.8, 0.999], 'eps': 1e-08, 'weight_decay': 3e-07}
worker-0: prescale_gradients ........... False
worker-0: scheduler_name ............... WarmupLR
worker-0: scheduler_params ............. {'warmup_min_lr': 0, 'warmup_max_lr': 0.001, 'warmup_num_steps': 1000}
worker-0: sparse_gradients_enabled ..... False
worker-0: steps_per_print .............. 2000
worker-0: tensorboard_enabled .......... False
worker-0: tensorboard_job_name ......... DeepSpeedJobName
worker-0: tensorboard_output_path ......
worker-0: train_batch_size ............. 4
worker-0: train_micro_batch_size_per_gpu 4
worker-0: wall_clock_breakdown ......... False
worker-0: world_size ................... 1
worker-0: zero_enabled ................. False
worker-0: json = {
worker-0: "optimizer":{
worker-0: "params":{
worker-0: "betas":[
worker-0: 0.8,
worker-0: 0.999
worker-0: ],
worker-0: "eps":1e-08,
worker-0: "lr":0.001,
worker-0: "weight_decay":3e-07
worker-0: },
worker-0: "type":"Adam"
worker-0: },
worker-0: "scheduler":{
worker-0: "params":{
worker-0: "warmup_max_lr":0.001,
worker-0: "warmup_min_lr":0,
worker-0: "warmup_num_steps":1000
worker-0: },
worker-0: "type":"WarmupLR"
worker-0: },
worker-0: "steps_per_print":2000,
worker-0: "train_batch_size":4,
worker-0: "wall_clock_breakdown":false
worker-0: }
worker-0: [INFO 2020-02-06 19:53:56] 0/50, SamplesPerSec=1292.6411179579866
worker-0: [INFO 2020-02-06 19:53:56] 0/100, SamplesPerSec=1303.6726433398537
worker-0: [INFO 2020-02-06 19:53:56] 0/150, SamplesPerSec=1304.4251022567403
......
worker-0: [2, 12000] loss: 1.247
worker-0: [INFO 2020-02-06 20:35:23] 0/24550, SamplesPerSec=1284.4954513975558
worker-0: [INFO 2020-02-06 20:35:23] 0/24600, SamplesPerSec=1284.384033658866
worker-0: [INFO 2020-02-06 20:35:23] 0/24650, SamplesPerSec=1284.4433482972925
worker-0: [INFO 2020-02-06 20:35:23] 0/24700, SamplesPerSec=1284.4664449792422
worker-0: [INFO 2020-02-06 20:35:23] 0/24750, SamplesPerSec=1284.4950124403447
worker-0: [INFO 2020-02-06 20:35:23] 0/24800, SamplesPerSec=1284.4756105952233
worker-0: [INFO 2020-02-06 20:35:24] 0/24850, SamplesPerSec=1284.5251526215386
worker-0: [INFO 2020-02-06 20:35:24] 0/24900, SamplesPerSec=1284.531217073863
worker-0: [INFO 2020-02-06 20:35:24] 0/24950, SamplesPerSec=1284.5125323220368
worker-0: [INFO 2020-02-06 20:35:24] 0/25000, SamplesPerSec=1284.5698818883018
worker-0: Finished Training
worker-0: GroundTruth: cat ship ship plane
worker-0: Predicted: cat car car plane
worker-0: Accuracy of the network on the 10000 test images: 57 %
worker-0: Accuracy of plane : 61 %
worker-0: Accuracy of car : 74 %
worker-0: Accuracy of bird : 49 %
worker-0: Accuracy of cat : 36 %
worker-0: Accuracy of deer : 44 %
worker-0: Accuracy of dog : 52 %
worker-0: Accuracy of frog : 67 %
worker-0: Accuracy of horse : 58 %
worker-0: Accuracy of ship : 70 %
worker-0: Accuracy of truck : 59 %