Effectively train neural networks on GPU

Piotr Niewiński
2 min read · Oct 17, 2021

How many deep learning neural networks can you effectively run on a single GPU?

First of all, there is no simple answer to this question. Some big models may not fit on an average GPU with 11–12GB of RAM even on their own. However, suppose your model is relatively small, e.g. a classifier with 1e5–1e6 trainable variables. Such a model uses very little GPU RAM and does not utilize the full GPU computing power, even while training.
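To get a feel for that scale, you can estimate the variable count of a small dense classifier by hand: each fully connected layer contributes a weight matrix plus a bias vector. A minimal sketch (the layer sizes below are illustrative assumptions, not the model from this post):

```python
def count_dense_params(layer_sizes):
    """Sum trainable variables (weights + biases) of a fully connected net.

    layer_sizes: e.g. [input_dim, hidden1, ..., output_dim]
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix of this layer
        total += n_out         # bias vector of this layer
    return total

# An assumed 256 -> 512 -> 128 -> 10 classifier:
print(count_dense_params([256, 512, 128, 10]))  # 198538, i.e. ~2e5
```

A network of this size sits comfortably in the 1e5–1e6 range discussed above.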

Below you can find an example of a small NN model with 3.2e5 trainable variables training on a single GTX 1080 GPU with 11GB of RAM. As you can see, the model uses 343MB of GPU memory and only about 12% of the GPU computing power.

One small model on a GPU

Oftentimes you need to run multiple NN training sessions; a hyperparameter search procedure is a good example. The question is how to run many NNs on a single GPU. My advice is to run each model in a separate subprocess. The quickest way is to embed the code (initialization, training, etc.) in a single Python function and run that function in a subprocess. In this post I have shown a simple method of running a function in a subprocess using a Python decorator. To run many trainings simultaneously, start such a subprocess many times, one after another.
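A minimal sketch of that decorator idea (the names `in_subprocess` and `train_model` are my illustrations, not the post's actual implementation):

```python
import multiprocessing as mp

def in_subprocess(fn):
    """Wrap fn so that each call runs in its own child process.

    The "fork" start method is used here for simplicity (Unix only, no
    pickling of fn needed). Frameworks that hold a CUDA context typically
    need the "spawn" method with a picklable top-level target instead.
    """
    def wrapper(*args, **kwargs):
        ctx = mp.get_context("fork")
        p = ctx.Process(target=fn, args=args, kwargs=kwargs)
        p.start()
        return p  # caller can p.join() later, or keep launching more
    return wrapper

@in_subprocess
def train_model(model_id):
    # placeholder for model construction + the training loop
    print(f"training model {model_id}")

if __name__ == "__main__":
    # start three trainings simultaneously, one subprocess each
    procs = [train_model(i) for i in range(3)]
    for p in procs:
        p.join()
```

Because each model lives in its own process, its GPU memory is released when the process exits, and one crashing training cannot take the others down.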

How much is too much?

If your GPU computing power is already 100% utilized, running more models at the same time usually will not slow the task down, but it will not speed it up either. It is also possible that your GPU RAM becomes fully loaded, in which case any newly started model process is very likely to crash. You then need to be careful and implement a subprocess crash management system, which I will write about in my following posts.
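One simple guard is to cap how many trainings run at once with a process pool, so GPU RAM is never oversubscribed; as soon as one model finishes, the next queued configuration starts. A sketch (the limit and `train_one` body are assumptions for illustration):

```python
import multiprocessing as mp

# Assumed limit -- in practice, derive it from measured per-model GPU RAM.
MAX_PARALLEL = 4

def train_one(config_id):
    # placeholder for a full training run; return a result to collect
    return config_id * config_id

if __name__ == "__main__":
    configs = list(range(20))  # e.g. 20 hyperparameter configurations
    # At most MAX_PARALLEL workers are alive at any moment; the pool
    # starts the next configuration the moment a worker frees up.
    with mp.Pool(processes=MAX_PARALLEL) as pool:
        results = pool.map(train_one, configs)
    print(results)
```

This keeps the GPU busy without letting the number of simultaneous models grow unbounded.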

Below is an example of a hyperparameter optimization framework from my ptools repo running 20 NN models simultaneously on two GPUs. When one model finishes, the next one starts right after it. The models utilize about 9GB of GPU RAM and 98% of core computing power.

20 NN models running simultaneously on two GPUs while optimizing hyperparameters
