Deep Learning with PyTorch

Classical machine learning relies on statistics to determine relationships between features and labels, and it can be very effective for creating predictive models. However, massive growth in the availability of data, coupled with advances in the computing technology required to process it, has led to the emergence of new machine learning techniques that mimic the way the brain processes information in a structure called an artificial neural network.

PyTorch is a framework for creating machine learning models, including deep neural networks (DNNs). In this example, we'll use PyTorch to create a simple neural network that classifies penguins into species based on the length and depth of their culmen (bill), their flipper length, and their body mass.

Citation: The penguins dataset used in this exercise is a subset of data collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

Explore the Dataset

Before we start using PyTorch to create a model, let's load the data we need from the Palmer Penguins dataset, which contains observations of three different species of penguin.

Note: In reality, you could solve the penguin classification problem easily using classical machine learning techniques without the need for a deep learning model, but it's a useful, easy-to-understand dataset with which to demonstrate the principles of neural networks in this notebook.

import pandas as pd

# load the training dataset (excluding rows with null values)
penguins = pd.read_csv('../ml-basics/data/penguins.csv').dropna()

# Deep Learning models work best when features are on similar scales
# In a real solution, we'd implement some custom normalization for each feature, but to keep things simple
# we'll just rescale the FlipperLength and BodyMass so they're on a similar scale to the bill measurements
penguins['FlipperLength'] = penguins['FlipperLength']/10
penguins['BodyMass'] = penguins['BodyMass']/100

# The dataset is too small to be useful for deep learning
# So we'll oversample it to increase its size
for i in range(1,3):
    penguins = pd.concat([penguins, penguins])

# Display a random sample of 10 observations
sample = penguins.sample(10)
sample
CulmenLength CulmenDepth FlipperLength BodyMass Species
328 45.7 17.3 19.3 36.00 2
212 45.3 13.8 20.8 42.00 1
109 43.2 19.0 19.7 47.75 0
273 50.4 15.7 22.2 57.50 1
94 36.2 17.3 18.7 33.00 0
62 37.6 17.0 18.5 36.00 0
275 49.9 16.1 21.3 54.00 1
173 45.1 14.5 21.5 50.00 1
120 36.2 17.2 18.7 31.50 0
323 49.0 19.6 21.2 43.00 2

The Species column is the label our model will predict. Each label value represents a class of penguin species, encoded as 0, 1, or 2. The following code shows the actual species to which these class labels correspond.

penguin_classes = ['Adelie', 'Gentoo', 'Chinstrap']
print(sample.columns[0:5].values, 'SpeciesName')
for index, row in penguins.sample(10).iterrows():
    print('[', row.iloc[0], row.iloc[1], row.iloc[2], row.iloc[3], int(row.iloc[4]), ']', penguin_classes[int(row.iloc[4])])
['CulmenLength' 'CulmenDepth' 'FlipperLength' 'BodyMass' 'Species'] SpeciesName
[ 37.8 17.3 18.0 37.0 0 ] Adelie
[ 37.8 17.1 18.6 33.0 0 ] Adelie
[ 38.6 21.2 19.1 38.0 0 ] Adelie
[ 38.2 18.1 18.5 39.5 0 ] Adelie
[ 34.1 18.1 19.3 34.75 0 ] Adelie
[ 48.1 16.4 19.9 33.25 2 ] Chinstrap
[ 48.4 16.3 22.0 54.0 1 ] Gentoo
[ 46.5 14.5 21.3 44.0 1 ] Gentoo
[ 45.3 13.7 21.0 43.0 1 ] Gentoo
[ 48.5 17.5 19.1 34.0 2 ] Chinstrap

As is common in a supervised learning problem, we'll split the dataset into a set of records with which to train the model, and a smaller set with which to validate the trained model.

from sklearn.model_selection import train_test_split

features = ['CulmenLength','CulmenDepth','FlipperLength','BodyMass']
label = 'Species'
   
# Split data 70%-30% into training set and test set
x_train, x_test, y_train, y_test = train_test_split(penguins[features].values,
                                                    penguins[label].values,
                                                    test_size=0.30,
                                                    random_state=0)

print ('Training Set: %d, Test Set: %d \n' % (len(x_train), len(x_test)))
print("Sample of features and labels:")

# Take a look at the first 24 training features and corresponding labels
for n in range(0,24):
    print(x_train[n], y_train[n], '(' + penguin_classes[y_train[n]] + ')')
Training Set: 957, Test Set: 411 

Sample of features and labels:
[51.1 16.5 22.5 52.5] 1 (Gentoo)
[50.7 19.7 20.3 40.5] 2 (Chinstrap)
[49.5 16.2 22.9 58. ] 1 (Gentoo)
[39.3 20.6 19.  36.5] 0 (Adelie)
[42.5 20.7 19.7 45. ] 0 (Adelie)
[50.  15.3 22.  55.5] 1 (Gentoo)
[50.2  18.7  19.8  37.75] 2 (Chinstrap)
[50.7 19.7 20.3 40.5] 2 (Chinstrap)
[49.1  14.5  21.2  46.25] 1 (Gentoo)
[43.2 16.6 18.7 29. ] 2 (Chinstrap)
[38.8  17.6  19.1  32.75] 0 (Adelie)
[37.8 17.1 18.6 33. ] 0 (Adelie)
[45.8 14.2 21.9 47. ] 1 (Gentoo)
[43.8 13.9 20.8 43. ] 1 (Gentoo)
[36.  17.1 18.7 37. ] 0 (Adelie)
[43.3 13.4 20.9 44. ] 1 (Gentoo)
[36.  18.5 18.6 31. ] 0 (Adelie)
[41.1  19.   18.2  34.25] 0 (Adelie)
[33.1 16.1 17.8 29. ] 0 (Adelie)
[40.9 13.7 21.4 46.5] 1 (Gentoo)
[45.2 17.8 19.8 39.5] 2 (Chinstrap)
[48.4 14.6 21.3 58.5] 1 (Gentoo)
[43.6 13.9 21.7 49. ] 1 (Gentoo)
[38.5  17.9  19.   33.25] 0 (Adelie)

The features are the measurements for each penguin observation, and the label is a numeric value that indicates the species of penguin that the observation represents (Adelie, Gentoo, or Chinstrap).

Install and import the PyTorch libraries

Since we plan to use PyTorch to create our penguin classifier, we'll need to run the following two cells to install and import the PyTorch libraries we intend to use. The specific installation of PyTorch depends on your operating system and whether your computer has graphics processing units (GPUs) that can be used for high-performance processing via CUDA. You can find detailed instructions at https://pytorch.org/get-started/locally/.

!pip install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
import torch
import torch.nn as nn
import torch.utils.data as td

# Set random seed for reproducibility
torch.manual_seed(0)

print("Libraries imported - ready to use PyTorch", torch.__version__)
Libraries imported - ready to use PyTorch 1.12.1
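
As an optional check, you can confirm whether a CUDA-capable GPU is available to PyTorch. The rest of this notebook assumes CPU-only processing, so either answer is fine:

# Check whether a CUDA-capable GPU is available for PyTorch to use
print('CUDA available:', torch.cuda.is_available())
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)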

Prepare the data for PyTorch

PyTorch makes use of data loaders to load training and validation data in batches. We've already loaded the data into numpy arrays, but we need to wrap those in PyTorch datasets (in which the data is converted to PyTorch tensor objects) and create loaders to read batches from those datasets.

train_x = torch.Tensor(x_train).float()
train_y = torch.Tensor(y_train).long()
train_ds = td.TensorDataset(train_x,train_y)
train_loader = td.DataLoader(train_ds, batch_size=20,
    shuffle=False, num_workers=1)

# Create a dataset and loader for the test data and labels
test_x = torch.Tensor(x_test).float()
test_y = torch.Tensor(y_test).long()
test_ds = td.TensorDataset(test_x,test_y)
test_loader = td.DataLoader(test_ds, batch_size=20,
    shuffle=False, num_workers=1)
print('Ready to load data')
Ready to load data

Define a neural network

Now we're ready to define our neural network. In this case, we'll create a network that consists of 3 fully-connected layers:

  • An input layer that receives an input value for each feature (in this case, the four penguin measurements) and applies a ReLU activation function.
  • A hidden layer that receives ten inputs and applies a ReLU activation function.
  • An output layer that generates a non-negative numeric output for each penguin species (which a loss function will translate into classification probabilities for each of the three possible penguin species).
hl = 10

# Define the neural network
class PenguinNet(nn.Module):
    def __init__(self):
        super(PenguinNet, self).__init__()
        self.fc1 = nn.Linear(len(features), hl)
        self.fc2 = nn.Linear(hl, hl)
        self.fc3 = nn.Linear(hl, len(penguin_classes))

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        return x

# Create a model instance from the network
model = PenguinNet()
print(model)
PenguinNet(
  (fc1): Linear(in_features=4, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=10, bias=True)
  (fc3): Linear(in_features=10, out_features=3, bias=True)
)

Train the model

To train the model, we need to repeatedly feed the training values forward through the network, use a loss function to calculate the loss, use an optimizer to backpropagate the weight and bias value adjustments, and validate the model using the test data we withheld.

To do this, we'll create a function to train and optimize the model, and a function to test the model. Then we'll call these functions iteratively over 50 epochs, logging the loss and accuracy statistics for each epoch.

def train(model, data_loader, optimizer):
    # Set the model to training mode
    model.train()
    train_loss = 0
    
    for batch, tensor in enumerate(data_loader):
        data, target = tensor
        # Reset the gradients, then feed the features forward through the network
        optimizer.zero_grad()
        out = model(data)
        loss = loss_criteria(out, target)
        train_loss += loss.item()

        # backpropagate
        loss.backward()
        optimizer.step()

    #Return average loss
    avg_loss = train_loss / (batch+1)
    print('Training set: Average loss: {:.6f}'.format(avg_loss))
    return avg_loss
           
            
def test(model, data_loader):
    # Switch the model to evaluation mode (so we don't backpropagate)
    model.eval()
    test_loss = 0
    correct = 0

    with torch.no_grad():
        batch_count = 0
        for batch, tensor in enumerate(data_loader):
            batch_count += 1
            data, target = tensor
            # Get the predictions
            out = model(data)

            # calculate the loss
            test_loss += loss_criteria(out, target).item()

            # Calculate the accuracy
            _, predicted = torch.max(out.data, 1)
            correct += torch.sum(target==predicted).item()
            
    # Calculate the average loss and total accuracy for this epoch
    avg_loss = test_loss/batch_count
    print('Validation set: Average loss: {:.6f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        avg_loss, correct, len(data_loader.dataset),
        100. * correct / len(data_loader.dataset)))
    
    # return average loss for the epoch
    return avg_loss

# Specify the loss criteria (we'll use CrossEntropyLoss for multi-class classification)
loss_criteria = nn.CrossEntropyLoss()

# Use an "Adam" optimizer to adjust weights
# (see https://pytorch.org/docs/stable/optim.html#algorithms for details of supported algorithms)
learning_rate = 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
optimizer.zero_grad()

# We'll track metrics for each epoch in these arrays
epoch_nums = []
training_loss = []
validation_loss = []

# Train over 50 epochs
epochs = 50
for epoch in range(1, epochs + 1):

    # print the epoch number
    print('Epoch: {}'.format(epoch))
    
    # Feed training data into the model to optimize the weights
    train_loss = train(model, train_loader, optimizer)
    
    # Feed the test data into the model to check its performance
    test_loss = test(model, test_loader)
    
    # Log the metrics for this epoch
    epoch_nums.append(epoch)
    training_loss.append(train_loss)
    validation_loss.append(test_loss)
Epoch: 1
Training set: Average loss: 1.118814
Validation set: Average loss: 1.023595, Accuracy: 148/411 (36%)

Epoch: 2
Training set: Average loss: 1.010274
Validation set: Average loss: 0.983460, Accuracy: 163/411 (40%)

Epoch: 3
Training set: Average loss: 0.965314
Validation set: Average loss: 0.934165, Accuracy: 191/411 (46%)

Epoch: 4
Training set: Average loss: 0.911513
Validation set: Average loss: 0.867269, Accuracy: 250/411 (61%)

Epoch: 5
Training set: Average loss: 0.817720
Validation set: Average loss: 0.742112, Accuracy: 272/411 (66%)

Epoch: 6
Training set: Average loss: 0.733329
Validation set: Average loss: 0.691639, Accuracy: 302/411 (73%)

Epoch: 7
Training set: Average loss: 0.696301
Validation set: Average loss: 0.661350, Accuracy: 312/411 (76%)

Epoch: 8
Training set: Average loss: 0.671731
Validation set: Average loss: 0.640087, Accuracy: 327/411 (80%)

Epoch: 9
Training set: Average loss: 0.653092
Validation set: Average loss: 0.624311, Accuracy: 338/411 (82%)

Epoch: 10
Training set: Average loss: 0.638097
Validation set: Average loss: 0.610605, Accuracy: 345/411 (84%)

Epoch: 11
Training set: Average loss: 0.625696
Validation set: Average loss: 0.598022, Accuracy: 345/411 (84%)

Epoch: 12
Training set: Average loss: 0.614685
Validation set: Average loss: 0.588183, Accuracy: 353/411 (86%)

Epoch: 13
Training set: Average loss: 0.605506
Validation set: Average loss: 0.578678, Accuracy: 358/411 (87%)

Epoch: 14
Training set: Average loss: 0.597361
Validation set: Average loss: 0.569911, Accuracy: 361/411 (88%)

Epoch: 15
Training set: Average loss: 0.590228
Validation set: Average loss: 0.562248, Accuracy: 361/411 (88%)

Epoch: 16
Training set: Average loss: 0.583250
Validation set: Average loss: 0.556146, Accuracy: 372/411 (91%)

Epoch: 17
Training set: Average loss: 0.576846
Validation set: Average loss: 0.549725, Accuracy: 375/411 (91%)

Epoch: 18
Training set: Average loss: 0.571098
Validation set: Average loss: 0.544390, Accuracy: 382/411 (93%)

Epoch: 19
Training set: Average loss: 0.565975
Validation set: Average loss: 0.540335, Accuracy: 384/411 (93%)

Epoch: 20
Training set: Average loss: 0.561476
Validation set: Average loss: 0.536972, Accuracy: 389/411 (95%)

Epoch: 21
Training set: Average loss: 0.557517
Validation set: Average loss: 0.532509, Accuracy: 390/411 (95%)

Epoch: 22
Training set: Average loss: 0.553931
Validation set: Average loss: 0.529417, Accuracy: 396/411 (96%)

Epoch: 23
Training set: Average loss: 0.550773
Validation set: Average loss: 0.528216, Accuracy: 397/411 (97%)

Epoch: 24
Training set: Average loss: 0.547976
Validation set: Average loss: 0.523656, Accuracy: 397/411 (97%)

Epoch: 25
Training set: Average loss: 0.545466
Validation set: Average loss: 0.521025, Accuracy: 397/411 (97%)

Epoch: 26
Training set: Average loss: 0.543647
Validation set: Average loss: 0.519855, Accuracy: 400/411 (97%)

Epoch: 27
Training set: Average loss: 0.542047
Validation set: Average loss: 0.517385, Accuracy: 398/411 (97%)

Epoch: 28
Training set: Average loss: 0.540234
Validation set: Average loss: 0.515388, Accuracy: 400/411 (97%)

Epoch: 29
Training set: Average loss: 0.538976
Validation set: Average loss: 0.512899, Accuracy: 401/411 (98%)

Epoch: 30
Training set: Average loss: 0.537303
Validation set: Average loss: 0.512066, Accuracy: 404/411 (98%)

Epoch: 31
Training set: Average loss: 0.536062
Validation set: Average loss: 0.511284, Accuracy: 404/411 (98%)

Epoch: 32
Training set: Average loss: 0.534580
Validation set: Average loss: 0.508444, Accuracy: 404/411 (98%)

Epoch: 33
Training set: Average loss: 0.533200
Validation set: Average loss: 0.507806, Accuracy: 404/411 (98%)

Epoch: 34
Training set: Average loss: 0.532376
Validation set: Average loss: 0.505557, Accuracy: 404/411 (98%)

Epoch: 35
Training set: Average loss: 0.531220
Validation set: Average loss: 0.503028, Accuracy: 404/411 (98%)

Epoch: 36
Training set: Average loss: 0.529759
Validation set: Average loss: 0.502396, Accuracy: 404/411 (98%)

Epoch: 37
Training set: Average loss: 0.528576
Validation set: Average loss: 0.501712, Accuracy: 404/411 (98%)

Epoch: 38
Training set: Average loss: 0.527694
Validation set: Average loss: 0.499238, Accuracy: 404/411 (98%)

Epoch: 39
Training set: Average loss: 0.526515
Validation set: Average loss: 0.498586, Accuracy: 404/411 (98%)

Epoch: 40
Training set: Average loss: 0.525752
Validation set: Average loss: 0.496938, Accuracy: 404/411 (98%)

Epoch: 41
Training set: Average loss: 0.524745
Validation set: Average loss: 0.496314, Accuracy: 405/411 (99%)

Epoch: 42
Training set: Average loss: 0.524034
Validation set: Average loss: 0.494481, Accuracy: 404/411 (98%)

Epoch: 43
Training set: Average loss: 0.523150
Validation set: Average loss: 0.492949, Accuracy: 404/411 (98%)

Epoch: 44
Training set: Average loss: 0.522167
Validation set: Average loss: 0.492328, Accuracy: 404/411 (98%)

Epoch: 45
Training set: Average loss: 0.521537
Validation set: Average loss: 0.490820, Accuracy: 401/411 (98%)

Epoch: 46
Training set: Average loss: 0.521010
Validation set: Average loss: 0.489736, Accuracy: 401/411 (98%)

Epoch: 47
Training set: Average loss: 0.520252
Validation set: Average loss: 0.489686, Accuracy: 404/411 (98%)

Epoch: 48
Training set: Average loss: 0.519929
Validation set: Average loss: 0.488752, Accuracy: 401/411 (98%)

Epoch: 49
Training set: Average loss: 0.519249
Validation set: Average loss: 0.488609, Accuracy: 405/411 (99%)

Epoch: 50
Training set: Average loss: 0.518899
Validation set: Average loss: 0.487255, Accuracy: 401/411 (98%)

While the training process is running, let's try to understand what's happening:

  1. In each epoch, the full set of training data is passed forward through the network. There are four features for each observation, and four corresponding nodes in the input layer - so the features for each observation are passed as a vector of four values to that layer. However, for efficiency, the feature vectors are grouped into batches; so actually a matrix of multiple feature vectors is fed in each time.
  2. The matrix of feature values is processed by a function that performs a weighted sum using initialized weights and bias values. The result of this function is then processed by the activation function for the input layer to constrain the values passed to the nodes in the next layer.
  3. The weighted sum and activation functions are repeated in each layer. Note that the functions operate on vectors and matrices rather than individual scalar values. In other words, the forward pass is essentially a series of nested linear algebra functions. This is the reason data scientists prefer to use computers with graphics processing units (GPUs), since these are optimized for matrix and vector calculations.
  4. In the final layer of the network, the output vectors contain a calculated value for each possible class (in this case, classes 0, 1, and 2). This vector is processed by a loss function that converts these values to probabilities and determines how far they are from the expected values based on the actual classes - so for example, suppose the output for a Gentoo penguin (class 1) observation is [0.3, 0.4, 0.3]. The correct prediction would be [0.0, 1.0, 0.0], so the variance between the predicted and actual values (how far away each predicted value is from what it should be) is [0.3, 0.6, 0.3]. This variance is aggregated for each batch and maintained as a running aggregate to calculate the overall level of error (loss) incurred by the training data for the epoch (the short code sketch after this list walks through this calculation for the example values).
  5. At the end of each epoch, the validation data is passed through the network, and its loss and accuracy (proportion of correct predictions based on the highest probability value in the output vector) are also calculated. It's important to do this because it enables us to compare the performance of the model using data on which it was not trained, helping us determine if it will generalize well for new data or if it's overfitted to the training data.
  6. After all the data has been passed forward through the network, the output of the loss function for the training data (but not the validation data) is passed to the optimizer. The precise details of how the optimizer processes the loss vary depending on the specific optimization algorithm being used; but fundamentally you can think of the entire network, from the input layer to the loss function, as one big nested (composite) function. The optimizer applies some differential calculus to calculate partial derivatives for the function with respect to each weight and bias value that was used in the network. It's possible to do this efficiently for a nested function due to something called the chain rule, which enables you to determine the derivative of a composite function from the derivatives of its inner and outer functions. You don't really need to worry about the details of the math here (the optimizer does it for you), but the end result is that the partial derivatives tell us about the slope (or gradient) of the loss function with respect to each weight and bias value - in other words, we can determine whether to increase or decrease the weight and bias values in order to decrease the loss.
  7. Having determined in which direction to adjust the weights and biases, the optimizer uses the learning rate to determine by how much to adjust them; and then works backwards through the network in a process called backpropagation to assign new values to the weights and biases in each layer.
  8. Now the next epoch repeats the whole training, validation, and backpropagation process starting with the revised weights and biases from the previous epoch - which hopefully will result in a lower level of loss.
  9. The process continues like this for 50 epochs.
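
To make steps 4, 6, and 7 a little more concrete, here's a minimal standalone sketch, separate from the training code above, that uses the example values from step 4. It converts the network's output to probabilities with a softmax, calculates the cross-entropy loss against the actual class, uses backpropagation to obtain the gradients, and then applies a single gradient-descent adjustment scaled by a learning rate. The specific numbers are purely illustrative.

import torch
import torch.nn as nn

# Example network output from step 4 for a Gentoo (class 1) observation
output = torch.tensor([[0.3, 0.4, 0.3]], requires_grad=True)
actual_class = torch.tensor([1])

# The loss function applies a softmax internally to convert the output values to probabilities
probabilities = torch.softmax(output, dim=1)
print('Probabilities:', probabilities.detach().numpy())

# Cross-entropy measures how far those probabilities are from the ideal [0.0, 1.0, 0.0]
loss = nn.CrossEntropyLoss()(output, actual_class)
print('Loss:', loss.item())

# Backpropagation (via the chain rule) gives the gradient of the loss with respect to each output value
loss.backward()
print('Gradients:', output.grad.numpy())

# A single gradient-descent step: move against the gradient, scaled by the learning rate
learning_rate = 0.001
with torch.no_grad():
    adjusted = output - learning_rate * output.grad
print('Adjusted values:', adjusted.numpy())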

Review training and validation loss

After training is complete, we can examine the loss metrics we recorded while training and validating the model. We're really looking for two things:

  • The loss should decrease with each epoch, showing that the model is learning the right weights and biases to predict the correct labels.
  • The training loss and validation loss should follow a similar trend, showing that the model is not overfitting to the training data.

Let's plot the loss metrics and see:

%matplotlib inline
from matplotlib import pyplot as plt

plt.plot(epoch_nums, training_loss)
plt.plot(epoch_nums, validation_loss)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend(['training', 'validation'], loc='upper right')
plt.show()

View the learned weights and biases

The trained model consists of the final weights and biases that were determined by the optimizer during training. Based on our network architecture, we should expect the following values for each layer:

  • Layer 1: There are four input values going to ten output nodes, so there should be 10 x 4 weights and 10 bias values.
  • Layer 2: There are ten input values going to ten output nodes, so there should be 10 x 10 weights and 10 bias values.
  • Layer 3: There are ten input values going to three output nodes, so there should be 3 x 10 weights and 3 bias values.
for param_tensor in model.state_dict():
    print(param_tensor, "\n", model.state_dict()[param_tensor].numpy())
fc1.weight 
 [[-0.00374341  0.2682218  -0.41152257 -0.3679695 ]
 [-0.17916068 -0.08960585  0.11843112  0.51802725]
 [-0.04437202  0.13230628 -0.15110654 -0.09828269]
 [-0.47767425 -0.33114105 -0.20611155  0.01852179]
 [ 0.2208657   0.57115114 -0.40086344 -0.1869742 ]
 [ 0.3158045   0.24776892 -0.20200168  0.39890498]
 [-0.08059168  0.05290705  0.4527381  -0.46383518]
 [-0.3545517  -0.15797205 -0.23337851  0.39141223]
 [-0.32408983 -0.23016644 -0.34932023 -0.4682805 ]
 [-0.4734978   0.80028427  0.3018041   0.1544414 ]]
fc1.bias 
 [ 0.02629578 -0.20744474  0.08459234 -0.46684736 -0.35585785 -0.45410076
  0.31546897  0.2572897  -0.22174752  0.24439514]
fc2.weight 
 [[ 0.20224687  0.3143725   0.12550515  0.04272011  0.21202639 -0.18619564
   0.05892715 -0.24517313 -0.21917307 -0.16335806]
 [ 0.14308453  0.08098823 -0.18731831  0.09553465  0.74755687 -0.01170833
   0.01207405  0.03671876  0.19618031  0.7177287 ]
 [-0.24369258 -0.09593     0.12428063  0.2620103   0.44033977  0.32761893
   0.06293392 -0.24256472  0.02909058 -0.6438863 ]
 [-0.29470977  0.4369506   0.2404469  -0.31544605 -0.6518737  -0.03367813
  -0.05203882 -0.09720273  0.12160733 -0.44795   ]
 [ 0.11592636  0.15991893  0.22637847  0.11824107 -0.31298175 -0.20513597
   0.15789726  0.0661869  -0.24668422 -0.1820901 ]
 [ 0.29749104  0.3398366  -0.13788326 -0.07958971 -1.0037646   0.04011778
  -0.23813814 -0.21048178 -0.01742402 -0.21410413]
 [-0.12950484  0.18764248 -0.19243696  0.2869356   0.21671084 -0.26666948
  -0.07870413  0.01426902  0.04613796  0.07500109]
 [ 0.12409672  0.01894209 -0.15429662  0.1496355  -0.30334112 -0.1874303
  -0.07916126 -0.15403877 -0.11062703 -0.25918713]
 [-0.06726643  0.1659871  -0.20601156 -0.01622862 -0.10633212 -0.07815903
   0.00878868  0.00450951  0.06399861  0.46543378]
 [ 0.29954556  0.20082232  0.3002309  -0.02287012 -0.2840742  -0.14991638
   0.21532115 -0.00204995 -0.15717986 -0.24232906]]
fc2.bias 
 [-0.2959424  -0.09140166 -0.24091294  0.11557584  0.17096573 -0.32246786
  0.19725719 -0.24745122  0.03521878 -0.1282217 ]
fc3.weight 
 [[-0.06091028 -0.06208903 -0.28376698 -0.27304304 -0.04948315  0.0040895
  -0.14365433  0.11912274 -0.28462344 -0.02134135]
 [ 0.27809682 -0.4130026   0.27310097  0.7309681  -0.2853832   0.65255636
  -0.03649095 -0.14116624 -0.00454545 -0.25554216]
 [ 0.03393281 -0.19290859  0.71934223 -0.31080094  0.15194914 -0.33142653
  -0.07604478 -0.06650442 -1.1165307   0.17134616]]
fc3.bias 
 [ 0.25107792  0.10447465 -0.24180874]
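
As a quick cross-check of the expected shapes listed above, this optional sketch prints the shape of each parameter tensor and totals the number of learnable values (it assumes the model instance is still in memory):

# Print each parameter's shape and count the total number of learnable values
total_params = 0
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
    total_params += param.numel()
print('Total learnable parameters:', total_params)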

Evaluate model performance

So, is the model any good? The raw accuracy reported from the validation data would seem to indicate that it predicts pretty well; but it's typically useful to dig a little deeper and compare the predictions for each possible class. A common way to visualize the performance of a classification model is to create a confusion matrix that shows a crosstab of correct and incorrect predictions for each class.

from sklearn.metrics import confusion_matrix
import numpy as np

# Set the model to evaluate mode
model.eval()

# Get predictions for the test data
x = torch.Tensor(x_test).float()
_, predicted = torch.max(model(x).data, 1)

# Plot the confusion matrix
cm = confusion_matrix(y_test, predicted.numpy())
plt.imshow(cm, interpolation="nearest", cmap=plt.cm.Blues)
plt.colorbar()
tick_marks = np.arange(len(penguin_classes))
plt.xticks(tick_marks, penguin_classes, rotation=45)
plt.yticks(tick_marks, penguin_classes)
plt.xlabel("Predicted Species")
plt.ylabel("Actual Species")
plt.show()

The confusion matrix should show a strong diagonal line indicating that there are more correct than incorrect predictions for each class.
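
If you want to dig deeper still, an optional addition is scikit-learn's classification_report, which summarizes per-class precision, recall, and F1 score from the same test-set predictions:

from sklearn.metrics import classification_report

# Summarize precision, recall, and F1 score for each penguin class
print(classification_report(y_test, predicted.numpy(), target_names=penguin_classes))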

Save the trained model

Now that we have a model we believe is reasonably accurate, we can save its trained weights for use later.

model_file = 'penguin_classifier.pt'
torch.save(model.state_dict(), model_file)
del model
print('model saved as', model_file)
model saved as penguin_classifier.pt

Use the trained model

When we have a new penguin observation, we can use the model to predict the species.

x_new = [[50.4,15.3,20,50]]
print ('New sample: {}'.format(x_new))

# Create a new model class and load weights
model = PenguinNet()
model.load_state_dict(torch.load(model_file))

# Set model to evaluation mode
model.eval()

# Get a prediction for the new data sample
x = torch.Tensor(x_new).float()
_, predicted = torch.max(model(x).data, 1)

print('Prediction:',penguin_classes[predicted.item()])
New sample: [[50.4, 15.3, 20, 50]]
Prediction: Gentoo
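
The model returns a raw score for each class, and torch.max simply selects the highest one. If you also want an indication of how confident that prediction is, you could optionally apply a softmax to convert the scores into probabilities:

# Convert the raw class scores for the new sample into probabilities
with torch.no_grad():
    probabilities = torch.softmax(model(x), dim=1)

for species, probability in zip(penguin_classes, probabilities[0]):
    print('{}: {:.3f}'.format(species, probability.item()))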

Learn more

This notebook was designed to help you understand the basic concepts and principles involved in deep neural networks, using a simple PyTorch example. To learn more about PyTorch, take a look at the tutorials on the PyTorch website.