0x- Monster GANs: generate monster art
In this blog, we are going to generate monsters using a Generative Adversarial Network (GAN), a type of generative model that learns the features and traits of its input data and produces novel outputs with similar characteristics. Generated monsters are a great starting point for further artwork.
The hobgoblins under my bed thank you for sharing.
We will turn random colored noise (64x64 pixels):
Into monsters like these (64x64 pixels):
We will produce a cool animation at the end, so read on!
Here are the steps we’ll take:
1 — Collect and prepare the data.
2 — Create a GAN.
3 — Train the GAN.
4 — Generate monsters!
1 — Collect and prepare the data
We need data that look like monsters. I started by scrounging up pixel art on Pinterest and other places, but the problem with most of it is that it's not all at the same resolution or in the same style. You can deal with both of these issues, but I wanted to keep things simple, so I settled on some open-source Pokemon art found on two different Pokemon community forums. Overall, the dataset contains 14,774 JPG images of varying sizes. I'm calling it the mgan ('monster GAN') dataset.
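One practical note if you build your own dataset: the torchvision `ImageFolder` loader used in the next section expects images to live inside at least one class subdirectory, not in a flat folder of JPGs. A minimal sketch of the expected layout (the `monsters` subfolder name here is just an example; any name works, since we never use the labels):

```python
import os
import tempfile

# ImageFolder expects root/<class_name>/<image>.jpg; a flat folder
# of images directly under root will raise an error at load time.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'monsters'))
# images would go in root/monsters/0001.jpg, root/monsters/0002.jpg, ...
print(os.listdir(root))
```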
2 — Setup and GAN Creation
Now we'll start laying down some code. We're going to use PyTorch to create our model and train it. I do not recommend running this without a powerful GPU. If you need one, check out FloydHub; you can pay by the hour there, and it's pretty affordable. In a GAN, our generator model will learn how to make monsters by trying to make some, then asking another model, the discriminator, 'does my monster look like a real monster?' The discriminator will tell the generator how bad the generator's monster was, and the generator will use that feedback to update its parameters and make better monsters. We repeat this process for 100–200 epochs and eventually get a generator that can create novel monsters.
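Before the code, it helps to see the two competing objectives as numbers. Here's a minimal sketch of the binary cross-entropy losses involved; the 0.9 and 0.2 scores are made-up discriminator outputs for illustration, not values from the real model:

```python
import math

def bce(pred, target):
    # binary cross-entropy for a single prediction, the same math as nn.BCELoss
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

# the discriminator wants real monsters scored near 1 and fakes near 0
d_loss_real = bce(0.9, 1.0)  # discriminator scoring a real monster
d_loss_fake = bce(0.2, 0.0)  # discriminator scoring a generated monster

# the generator is judged on that same fake score,
# but with the target flipped: it wants the discriminator to say 1
g_loss = bce(0.2, 1.0)

print(d_loss_real, d_loss_fake, g_loss)
```

Note how the same discriminator output (0.2 on a fake) yields a low loss for the discriminator but a high loss for the generator; that tension is the whole game.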
Let’s load some data:
```python
import os
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
from torch.utils import data
from torchvision.datasets import ImageFolder
from torchvision import transforms, datasets
from torchvision.utils import make_grid

# the number of images to process in one go
batch_size = 64

# the path where our images are
image_path = os.path.join('/scratch', 'yns207', 'imgs', 'mgan')

# check that we have access to a GPU, the pytorch version, and the number of gpus
print('do we have gpu access? {} \n what is torch version? {} \n how many gpus? {}'.format(
    torch.cuda.is_available(),
    torch.__version__,
    torch.cuda.device_count()))

# this loads the monster data,
# scales it to be 64x64, converts
# it to a torch tensor and then
# normalizes the input to be between
# -1 and 1; we also shuffle the dataset
monster_transform = transforms.Compose([
    transforms.Scale(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# this loads the actual files and applies the above transformation
monster_dataset = ImageFolder(image_path, monster_transform)

# this describes how to load that data: whether to shuffle,
# how many cpus (workers) to use, etc. it serves up batches of
# data, for example grabbing 64 images at once
monster_loader = data.DataLoader(dataset=monster_dataset,
                                 batch_size=batch_size,
                                 shuffle=True,
                                 num_workers=1)

# check the shape of one batch of images;
# this seems right: we have 64 images and
# they are sized 64x64x3
for img in monster_loader:
    # renormalize a single image
    single_image = img[0][0]
    single_image = (single_image * 0.5) + 0.5
    single_image = single_image.clamp(0, 1)
    single_image = single_image.numpy()
    # move the dimensions around to get them right
    single_image = np.transpose(single_image, (1, 2, 0))
    # plot the image
    print('image size: ', single_image.shape)
    plt.imshow(single_image)
    plt.axis('off')
    plt.show()
    break
```
This code should output a Pokemon; mine looks like this:
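A quick aside on that renormalize step: `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` computes (x - 0.5) / 0.5 per channel, mapping pixel values from [0, 1] to [-1, 1] (matching the tanh output range the generator will use), and the `(x * 0.5) + 0.5` in the plotting code simply inverts it. In plain Python:

```python
def normalize(px):
    # (x - mean) / std with mean = std = 0.5 maps [0, 1] onto [-1, 1]
    return (px - 0.5) / 0.5

def denormalize(x):
    # inverse of the above; this is the (x * 0.5) + 0.5 used when plotting
    return (x * 0.5) + 0.5

print(normalize(0.0), normalize(1.0), denormalize(normalize(0.25)))
```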
I wanted to check that we could read and output one image from the mgan dataset. Now we'll get to the good stuff: the GAN, the part that generates new monsters:
```python
# parameters for various parts of the model
n_epochs = 125
lr = 0.0002
label_smooth = 0.9
pokemon_models = os.path.join('/scratch', 'yns207', 'pokemon_models')
noise_dim = 100
d_filter_depth_in = 3

# create our generator network.
# this network will take in
# random noise and output a
# monster.
class Generator(nn.Module):
    # define the model. it has 5 transpose
    # convolutions and uses relu activations.
    # it has a tanh activation on the last
    # layer
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(noise_dim,
                               512,
                               kernel_size=4,
                               stride=1,
                               padding=0,
                               bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.ConvTranspose2d(512,
                               256,
                               kernel_size=4,
                               stride=2,
                               padding=1,
                               bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.ConvTranspose2d(256,
                               128,
                               kernel_size=4,
                               stride=2,
                               padding=1,
                               bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.ConvTranspose2d(128,
                               64,
                               kernel_size=4,
                               stride=2,
                               padding=1,
                               bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.ConvTranspose2d(64,
                               d_filter_depth_in,
                               kernel_size=4,
                               stride=2,
                               padding=1,
                               bias=False),
            nn.Tanh()
        )

    # define how to propagate
    # through this network
    def forward(self, inputs):
        output = self.main(inputs)
        return output

# create the model that will evaluate
# the generated monsters
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_channels=d_filter_depth_in,
                      out_channels=64,
                      kernel_size=4,
                      stride=2,
                      padding=1,
                      bias=False),
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_channels=64,
                      out_channels=128,
                      kernel_size=3,
                      stride=2,
                      padding=1,
                      bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_channels=128,
                      out_channels=256,
                      kernel_size=4,
                      stride=2,
                      padding=1,
                      bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_channels=256,
                      out_channels=512,
                      kernel_size=4,
                      stride=2,
                      padding=1,
                      bias=False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2),
            nn.Conv2d(in_channels=512,
                      out_channels=1,
                      kernel_size=4,
                      stride=1,
                      padding=0,
                      bias=False),
            nn.Sigmoid()
        )

    # define forward propagation
    # through that model
    def forward(self, inputs):
        output = self.main(inputs)
        return output.view(-1, 1).squeeze(1)

# utility functions.
# this initializes the parameters
# to good random values; you can
# do more research on your own.
# note: we match on 'Conv' so that both
# Conv2d and ConvTranspose2d layers get initialized
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm2d') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

# this converts any pytorch tensor,
# an n-dimensional array, to a
# variable and puts it on a gpu if
# one is available
def to_variable(x):
    '''
    convert a tensor to a variable
    with gradient tracking
    '''
    if torch.cuda.is_available():
        x = x.cuda()
    return Variable(x)

# we're going to normalize our images
# to make training the generator easier.
# this de-normalizes the images coming out
# of the generator so they look intelligible
def denorm_monsters(x):
    renorm = (x * 0.5) + 0.5
    return renorm.clamp(0, 1)

# this plots a bunch of pokemon
# at the end of each training round so
# we can get a sense for how our network
# is doing
def plot_figure(fixed_noise):
    plt.figure()
    fixed_imgs = generator(fixed_noise)
    result = denorm_monsters(fixed_imgs.cpu().data)
    result = make_grid(result)
    result = transforms.Compose([transforms.ToPILImage()])(result)
    plt.imshow(result)
    plt.axis('off')
    plt.show()

# create a generator and
# initialize its weights
generator = Generator()
generator = generator.apply(weights_init)

# create a discriminator and
# initialize its weights
discriminator = Discriminator()
discriminator = discriminator.apply(weights_init)

# create a loss object and optimizers
loss_func = nn.BCELoss()
d_optimizer = optim.Adam(discriminator.parameters(), lr=lr, betas=(0.5, 0.99))
g_optimizer = optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.99))

# if a gpu is available, move all
# the models and the loss function
# to the gpu (more performant)
if torch.cuda.is_available():
    generator.cuda()
    discriminator.cuda()
    loss_func.cuda()

# create a fixed_noise variable so we can evaluate results
# consistently. if we don't do this we'll get different monsters
# every time we re-run and it will be hard to evaluate our generator
fixed_noise = to_variable(torch.randn(batch_size, noise_dim, 1, 1))
```
This looks long, but if you read the comments, it’s not too bad. It describes a generator model and a discriminator model and then creates one of each.
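If you want to sanity-check those layer stacks without running them, the output sizes follow directly from the standard conv / transposed-conv size formulas. The (kernel, stride, padding) lists below are copied from the two networks above:

```python
def conv_out(size, kernel, stride, padding):
    # standard Conv2d output size (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1

def convtranspose_out(size, kernel, stride, padding):
    # standard ConvTranspose2d output size (dilation 1, no output_padding)
    return (size - 1) * stride - 2 * padding + kernel

# generator: 1x1 noise -> 4 -> 8 -> 16 -> 32 -> 64x64 image
gen_size = 1
for kernel, stride, padding in [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]:
    gen_size = convtranspose_out(gen_size, kernel, stride, padding)
print('generator output:', gen_size)

# discriminator: 64x64 image -> 32 -> 16 -> 8 -> 4 -> a single 1x1 score
dis_size = 64
for kernel, stride, padding in [(4, 2, 1), (3, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]:
    dis_size = conv_out(dis_size, kernel, stride, padding)
print('discriminator output:', dis_size)
```

Notice the odd kernel_size=3 in the discriminator's second layer still lands on a 16x16 map thanks to the floor division, so the two networks mirror each other: noise in, 64x64 image out, single probability back.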
3 — Training the GAN
```python
# keep track of losses
# from the generator and
# discriminator
generator_losses = []
discriminator_losses = []

# for some number of rounds (epochs)
for epoch in range(n_epochs):
    # track losses for this epoch
    gen_loss_epoch = 0
    dis_loss_epoch_fake = 0
    dis_loss_epoch_real = 0
    batches_processed = 0
    # for every batch of images
    for i, image_batch in enumerate(monster_loader):
        # track the number of batches processed
        batches_processed += 1
        # get the batch size of the
        # current batch
        image_batch = image_batch[0]
        batch_size = image_batch.shape[0]

        # --- train discriminator ---
        # clear gradients
        discriminator.zero_grad()
        # train discriminator on real images
        real_images = to_variable(image_batch)
        real_outputs = discriminator(real_images)
        real_loss = loss_func(real_outputs, to_variable(torch.ones(real_outputs.data.shape)) * label_smooth)
        real_loss.backward()
        dis_loss_epoch_real += torch.mean(real_loss.data)
        # train discriminator on generated images
        noise = to_variable(torch.randn(batch_size, noise_dim, 1, 1))
        fake_images = generator(noise)
        fake_outputs = discriminator(fake_images)
        fake_loss = loss_func(fake_outputs, to_variable(torch.zeros(fake_outputs.data.shape)))
        fake_loss.backward()
        dis_loss_epoch_fake += torch.mean(fake_loss.data)
        # update discriminator params
        d_optimizer.step()

        # --- train generator ---
        generator.zero_grad()
        # generate noise and feed it to the generator
        # to make an image
        noise = to_variable(torch.randn(batch_size, noise_dim, 1, 1))
        fake_images = generator(noise)
        dis_outputs = discriminator(fake_images)
        gen_loss = loss_func(dis_outputs, to_variable(torch.ones(dis_outputs.data.shape)))
        gen_loss.backward()
        gen_loss_epoch += torch.mean(gen_loss.data)
        # update generator params
        g_optimizer.step()

    discriminator_losses.append([dis_loss_epoch_real / batches_processed, dis_loss_epoch_fake / batches_processed])
    generator_losses.append(gen_loss_epoch / batches_processed)
    print('epoch {}'.format(epoch))
    print('generator loss: {:0.2f}, discriminator loss real: {:0.2f}, discriminator loss fake: {:0.2f}'.format(
        generator_losses[-1], discriminator_losses[-1][0], discriminator_losses[-1][1]))
    plot_figure(fixed_noise)
    # save this epoch's models
    torch.save(generator.state_dict(), os.path.join(pokemon_models, 'generator_ep_%d' % epoch))
    torch.save(discriminator.state_dict(), os.path.join(pokemon_models, 'discriminator_ep_%d' % epoch))
```
During training, this should output lines like so:
Each line shows the epoch and how badly the generator did at fooling the discriminator in that round. The discriminator losses are shown as well; they reflect how good or bad the discriminator is at its job of criticizing the generated pieces, though these numbers are not always easy to interpret. After each epoch's line, the code also plots a grid of 64 sample monsters created by the generator from the fixed_noise value at that point in training; the quality of these monsters should increase steadily. Every epoch's generator is saved to disk with the epoch number tacked onto the filename, so if you see one that generates particularly realistic monsters, or a style you prefer, you can load and use the generator from that epoch.
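One training detail worth calling out is `label_smooth = 0.9` in the discriminator's real-image loss. Instead of asking the discriminator to output exactly 1.0 on real monsters, we ask for 0.9, which penalizes over-confidence and tends to stabilize GAN training (one-sided label smoothing). A small numeric sketch of the effect on the loss, with made-up discriminator outputs:

```python
import math

def bce(pred, target):
    # same math as nn.BCELoss for a single prediction
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

# with a smoothed target of 0.9, the loss is minimized when the
# discriminator outputs 0.9, not when it saturates toward 1.0
at_smooth_target = bce(0.90, 0.9)
overconfident = bce(0.99, 0.9)
print(at_smooth_target < overconfident)  # True
```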
4 — Generating Monsters!
```python
# create new noise to pass to the generator
noise = to_variable(torch.randn(batch_size, noise_dim, 1, 1))

# load the generator from epoch 90 of training
load_model = os.path.join(pokemon_models, 'generator_ep_%d' % 90)
generator_final = Generator()
generator_final.load_state_dict(torch.load(load_model))
generator_final.cuda()

# generate new monsters
fixed_imgs = generator_final(noise)
result = denorm_monsters(fixed_imgs.cpu().data)
result = make_grid(result)
result = transforms.Compose([transforms.ToPILImage()])(result)

# plot those monsters
plt.figure(figsize=(20, 15))
plt.imshow(result)
plt.axis('off')
_ = plt.show()
```
I trained for 125 generator epochs and decided that the generator from epoch 90 was the best one (choosing is more art than science). After generating the monsters, you may notice that most of them look like blobs or Cronenberg monsters. This doesn't matter: we can generate an unlimited number of monsters by passing in new random noise, so as long as a few per batch are decent, we can keep generating until we have enough good ones. Let's generate more!
```python
# create new noise to pass to the generator
noise = to_variable(torch.randn(batch_size, noise_dim, 1, 1))

# load the generator from epoch 90 of training
load_model = os.path.join(pokemon_models, 'generator_ep_%d' % 90)
generator_final = Generator()
generator_final.load_state_dict(torch.load(load_model))
generator_final.cuda()

# generate new monsters
fixed_imgs = generator_final(noise)
result = denorm_monsters(fixed_imgs.cpu().data)
result = make_grid(result)
result = transforms.Compose([transforms.ToPILImage()])(result)

# plot those monsters
plt.figure(figsize=(20, 15))
plt.imshow(result)
plt.axis('off')
_ = plt.show()
```
This could output something like this:
Most of these look like blobs, but some are cool. These two are my favorites from this batch:
If you re-run the `noise = ...` line in the code above, you'll keep getting new monsters. You should be able to build up a pretty decent collection of unique monsters.
Conclusion and GIF
We walked through how to use a GAN to generate monsters. You can use them in your games/game prototypes. You can use them as a starting point for more developed artwork. You can use them to get ideas. GANs are a really great creative tool. One warning, though: it is possible to spend a week messing with a GAN and have it not work. Stick to the numbers here, and you should be ok.
Room for improvement:
removing colored backgrounds from the inputs that have them would probably improve our generator; dark backgrounds were awful
more experimentation with leaky ReLU activations in the generator
trying a Wasserstein GAN loss function
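On that last point, the Wasserstein formulation drops the sigmoid and BCE entirely: the critic (as the discriminator is called there) outputs an unbounded score, and the losses are just means of those scores. A sketch of the loss terms only, with made-up critic scores; a full WGAN also needs weight clipping or a gradient penalty, and usually several critic steps per generator step:

```python
def mean(xs):
    return sum(xs) / len(xs)

def critic_loss(real_scores, fake_scores):
    # the critic maximizes the gap between real and fake scores,
    # so we minimize the negative of that gap
    return -(mean(real_scores) - mean(fake_scores))

def wgan_generator_loss(fake_scores):
    # the generator pushes its fakes' scores up
    return -mean(fake_scores)

# made-up critic scores for illustration
print(critic_loss([2.0, 3.0], [-1.0, 0.0]))  # -3.0
print(wgan_generator_loss([-1.0, 0.0]))      # 0.5
```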
As promised, below is an animation that illustrates the training of the generator; each frame is a snapshot of its progress as it goes from grainy to crisp and detailed:
If you like this post share it with a like-minded friend.