PytorchZeroToAll(2)

안녕하세요 이번 포스트에서는 지난번에 이어 김성훈 교수님의 PytorchToAll 강의를 게시하도록 하겠습니다. 오늘은 logistic regression부터 다루며 pytorch를 활용한 neural network 구조와 실험을 위한 데이터 세팅법을 살펴보겠습니다. 본 포스트에서 사용된 코드는 Sung Kim님의 강의에서 가져왔음을 밝힙니다.

logistic regression

앞서 다룬 linear 모델과의 차이점은 activation function인 sigmoid function을 사용하고, loss function으로 cross-entropy를 사용한다는 점입니다.

코드로 살펴본다면 y_pred와 criterion 부분이 달라졌음을 확인할 수 있습니다.

import torch
from torch.autograd import Variable
import torch.nn.functional as F

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0], [4.0]]))
y_data = Variable(torch.Tensor([[0.], [0.], [1.], [1.]]))


class Model(torch.nn.Module):

    def __init__(self):
        """
        In the constructor we instantiate nn.Linear module
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data.
        """
        y_pred = F.sigmoid(self.linear(x))
        return y_pred

model = Model()

criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Wide & Deep

위의 logistic regression을 더 넓게(Wide), 그리고 깊게(Deep) 만들어서 사용해봅시다. 코드에서 나온것처럼 차원에 맞게 여러 층을 추가해 간단하게 사용할 수 있습니다. 단순 로지스틱과의 차이는 input 차원이 1->8로 변화하였고 사용하는 layer도 l1,l2,l3로 깊어졌습니다.

class Model(torch.nn.Module):

    def __init__(self):
        """
        In the constructor we instantiate two nn.Linear module
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)

        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred

# our model
model = Model()


# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

DataLoader

데이터 수가 많을 경우, 전체를 한번에 학습하는 것은 굉장히 비효율적입니다. 따라서 데이터를 여러 배치로 나누어서 학습을 진행합니다. Pytorch를 사용한다면 데이터를 배치 단위로 나누는 과정을 쉽게 해결할 수 있습니다.

주어진 데이터를 다루는 Dataloader는 세 단계로 구성됩니다. 코드와 함께 설명하겠습니다.

import torch
import numpy as np
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader


class DiabetesDataset(Dataset):
    """ Diabetes dataset."""

    # Initialize your data, download, etc.
    def __init__(self):
        xy = np.loadtxt('./data/diabetes.csv.gz',
                        delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len


dataset = DiabetesDataset()

1: download, read data, etc.

데이터를 가져오고, 읽는 단계입니다. numpy array를 torch로 변경하는 과정이 수반됩니다.

2: return one item on the index

주어진 index에 해당하는 데이터를 가져오는 기능입니다.

3: return the data length

데이터의 length를 반환하는 인자입니다.

이렇게 Dataset을 설정한 후 DataLoader함수를 사용하여 배치 사이즈에 맞게 데이터를 가져올 수 있습니다. Pytorch는 MNIST, COCO와 같은 유명한 데이터셋을 따로 지원하기에 개별적으로 다운로드하거나 custom할 필요 없이 데이터를 가져올 수 있습니다. 아래 코드에서 num_workers는 병렬 프로세싱을 위한 옵션인데 현재(2019-01-13)까지는 윈도우 운영체제에서는 작동하지 않는것 같습니다. 따라서 윈도우 OS에서는 0으로 두시는걸 권장합니다.

dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=32,
                          shuffle=True,
                          num_workers=2)

Softmax Classfier

Softmax를 사용하는 이유는 softmax를 통해서 output의 probability를 구할 수 있기 때문입니다. 물론 softmax가 절대적인 정답은 아니겠지만 하나의 clssifier로써 굉장히 좋은 성능을 자랑합니다. 주로 softmax를 적용한 아웃풋과 실제 정답 타겟과의 cross-entropy loss function을 사용하며 Multi label prediction task에서 활용합니다.

Pytorch에서는 cross-entropy loss를 간단히 사용할 수 있습니다. 주의해야할 점은 실제 타겟값 Y가 one-hot 값이 아닌 class 값으로 들어가야 한다는 점과 예측값 y_pred가 softmax를 적용한 결과물이 아닌 logit값 그 자체가 들어가야 한다는점입니다. nn.CrossEntropyLoss()함수에 이미 softmax가 내장되어있기 때문입니다. 여러모로 사용자 친화적인 결과물이라고 볼 수 있겠습니다.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable


# Softmax + CrossEntropy (logSoftmax + NLLLoss)
loss = nn.CrossEntropyLoss()

n 사이즈의 배치 또한 한번에 사용할 수 있습니다. 사이즈에 맞게 Y와 y_pred가 주어지면 됩니다.

Y = Variable(torch.LongTensor([2, 0, 1]), requires_grad=False)

# input is of size nBatch x nClasses = 2 x 4
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[0.1, 0.2, 0.9],
                                 [1.1, 0.1, 0.2],
                                 [0.2, 2.1, 0.1]]))


Y_pred2 = Variable(torch.Tensor([[0.8, 0.2, 0.3],
                                 [0.2, 0.3, 0.5],
                                 [0.2, 0.2, 0.5]]))

l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)

아래의 코드는 MNIST dataset을 사용하여 4개의 hidden layer로 구축한 뉴럴 네트워크 예시입니다. activation function으로는 relu를 사용했습니다. Train data와 Test data를 따로 불러와 최종 Accuracy를 구할 수 있습니다.

# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 64

# MNIST Dataset
train_dataset = datasets.MNIST(root='./mnist_data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./mnist_data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 520)
        self.l2 = nn.Linear(520, 320)
        self.l3 = nn.Linear(320, 240)
        self.l4 = nn.Linear(240, 120)
        self.l5 = nn.Linear(120, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the data (n, 1, 28, 28)-> (n, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)


model = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)