Low NVIDIA GPU Usage with Keras and Tensorflow


I'm running a CNN with keras-gpu and tensorflow-gpu on an NVIDIA GeForce RTX 2080 Ti on Windows 10. My computer has an Intel Xeon E5-2683 v4 CPU (2.1 GHz). I'm running my code through Jupyter (most recent Anaconda distribution). The output in the command terminal shows that the GPU is being utilized, but the script I'm running takes longer than I expect to train/test on the data, and when I open the Task Manager the GPU utilization looks very low. Here's an image:

Note that the CPU isn't being utilized, and nothing else in the Task Manager suggests anything is fully utilized. I don't have an ethernet connection and am connected to WiFi (I don't think this affects anything, but I'm not sure with Jupyter since it runs through the web browser). I'm training on a lot of data (~128 GB), which is all loaded into RAM (512 GB). The model I'm running is a fully convolutional neural network (basically a U-Net architecture) with 566,290 trainable parameters. Things I tried so far (see the sketch below for the kind of fit() call involved):

1. Increasing the batch size from 20 to 10,000 (increases GPU usage from ~3-4% to ~6-7%, greatly decreases training time as expected).
2. Setting use_multiprocessing to True and increasing the number of workers in model.fit (no effect).
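For reference, a minimal sketch of the kind of fit() call described in points 1 and 2 above (the model, shapes, and data here are placeholders just to make it runnable, not my actual code):

import numpy as np
import tensorflow as tf

# Placeholder data and model, standing in for the real U-Net and the real 128 GB dataset
x_train = np.random.rand(20000, 64, 64, 1).astype("float32")
y_train = np.random.rand(20000, 64, 64, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu", input_shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(1, 1, padding="same"),
])
model.compile(optimizer="adam", loss="mse")

model.fit(
    x_train, y_train,
    batch_size=10000,          # raised from 20 (point 1)
    epochs=10,
    use_multiprocessing=True,  # point 2; in tf.keras these two arguments only apply to
    workers=8,                 # generator/Sequence inputs, not to in-memory arrays
)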

I followed the installation steps on this website: https://www.pugetsystems.com/labs/hpc/The-Best-Way-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-1187/#look-at-the-job-run-with-tensorboard

Note that this installation specifically DOESN'T install cuDNN or CUDA. I've had trouble in the past getting tensorflow-gpu running with CUDA (although I haven't tried in over 2 years, so maybe it's easier with the latest versions), which is why I used this installation method.

Is this most likely the reason why the GPU isn't being fully utilized (no CuDNN/CUDA)? Does it have something to do with the dedicated GPU memory usage being a bottleneck? Or maybe something to do with the network architecture I'm using (number of parameters, etc.)?

Please let me know if you need any more information about my system or the code/data I'm running on to help diagnose. Thanks in advance!

EDIT: I noticed something interesting in the Task Manager. An epoch with a batch size of 10,000 takes around 200 s. For the last ~5 s of each epoch, GPU usage increases to ~15-17% (up from ~6-7% for the first 195 s of each epoch). Not sure if this helps or indicates there's a bottleneck somewhere besides the GPU.


Everything works as expected; your dedicated memory usage is nearly maxed, and neither TensorFlow nor CUDA can use shared memory -- see this answer.

If your GPU runs OOM, the only remedies are to get a GPU with more dedicated memory, decrease the model size, or use the script below to prevent TensorFlow from assigning redundant resources to the GPU (which it does tend to do):

## LIMIT GPU USAGE (TF 1.x / standalone Keras)
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # don't pre-allocate memory; allocate as needed
config.gpu_options.per_process_gpu_memory_fraction = 0.95  # cap the fraction of GPU memory that can be allocated
K.tensorflow_backend.set_session(tf.Session(config=config))  # create a session with the above settings
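If you're on TensorFlow 2.x, where ConfigProto and Session only exist under tf.compat.v1, the rough equivalent of the memory-growth setting above, as far as I know, is the tf.config API:

import tensorflow as tf

# TF 2.x: allocate GPU memory on demand instead of grabbing it all up front
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)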

The unusual increase in usage you observe may be shared memory being accessed temporarily because other resources are exhausted, especially with use_multiprocessing=True -- but I'm not sure; it could have other causes.



I would first start by running one of the short "tests" to ensure TensorFlow is utilizing the GPU. For example, I prefer @Salvador Dali's answer in that linked question:

import tensorflow as tf

# Pin a small matrix multiplication to the first GPU
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))  # prints the 2x2 result if the op could be placed on the GPU

If TensorFlow is indeed using your GPU, you should see the result of the matrix multiplication printed. Otherwise you will get a fairly long stack trace stating that "gpu:0" cannot be found.
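On TensorFlow 2.x, where sessions are gone, a comparable quick check (to the best of my knowledge) is:

import tensorflow as tf

# Should list at least one GPU if TensorFlow can see it
print(tf.config.list_physical_devices('GPU'))

# Log where each op runs; the matmul should be placed on the GPU
tf.debugging.set_log_device_placement(True)
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
print(tf.matmul(a, b))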


If this all works well, then I would recommend utilizing NVIDIA's nvidia-smi utility. It is available on both Windows and Linux and, AFAIK, installs with the NVIDIA driver. On a Windows system it is located at

C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe

Open a Windows command prompt and navigate to that directory. Then run

nvidia-smi.exe -l 3

This will show you a screen like so, that updates every three seconds.

Here we can see various information about the state of the GPUs and what they are doing. Of specific interest in this case are the "Pwr: Usage/Cap" and "Volatile GPU-Util" columns. If your model is indeed using the/a GPU, these columns should increase "instantaneously" once you start training the model.
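If you want just those numbers rather than the full screen, nvidia-smi also accepts a query form (these are standard nvidia-smi flags, nothing specific to this setup):

nvidia-smi.exe --query-gpu=utilization.gpu,utilization.memory,memory.used,memory.total --format=csv -l 3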

You will most likely see an increase in fan speed and temperature unless you have a very good cooling solution. At the bottom of the printout you should also see a process with a name akin to "python" or "Jupyter" running.


If this fails to provide an answer as to the slow training times, then I would surmise the issue lies with the model and code itself. And I think that is actually the case here. Specifically, the Windows Task Manager's listing for "Dedicated GPU memory usage" is pegged at basically maximum.



Read the following two pages; you will get an idea of how to properly set things up with the GPU: https://medium.com/@kegui/how-do-i-know-i-am-running-keras-model-on-gpu-a9cdcc24f986

https://datascience.stackexchange.com/questions/41956/how-to-make-my-neural-netwok-run-on-gpu-instead-of-cpu



If you have tried @KDecker's and @OverLordGoldDragon's solutions and low GPU usage is still there, I would suggest first investigating your data pipeline. The following two figures are from the TensorFlow official guide on data performance; they illustrate well how the data pipeline affects GPU efficiency.

As you can see, preparing data in parallel with training increases GPU usage. In this situation, CPU processing becomes the bottleneck. You need to find a mechanism to hide the latency of preprocessing, such as changing the number of processes, the size of the buffer, etc. The throughput of the CPU side should match that of the GPU; that way, the GPU will be maximally utilized.
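As a rough illustration of that overlap (the arrays and the preprocess function below are placeholders, not the question's actual pipeline), the tf.data API exposes exactly these knobs:

import numpy as np
import tensorflow as tf

# Dummy in-memory arrays standing in for the real data
images = np.random.rand(1000, 64, 64, 1).astype("float32")
labels = np.random.rand(1000, 64, 64, 1).astype("float32")

AUTOTUNE = tf.data.experimental.AUTOTUNE  # let tf.data pick parallelism/buffer sizes

def preprocess(x, y):
    # Stand-in for whatever CPU-side preprocessing the real pipeline does
    return tf.image.per_image_standardization(x), y

dataset = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(preprocess, num_parallel_calls=AUTOTUNE)  # run preprocessing on several CPU threads
    .batch(256)
    .prefetch(AUTOTUNE)  # prepare the next batches while the GPU is busy training
)

# model.fit(dataset, epochs=10)   # `model` would be the U-Net from the question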

Take a look at Tensorpack; it has detailed tutorials on how to speed up your input data pipeline.



There seems to have been a change to the installation method you referenced: https://www.pugetsystems.com/labs/hpc/The-Best-Way-to-Install-TensorFlow-with-GPU-Support-on-Windows-10-Without-Installing-CUDA-1187 It is now much easier and should eliminate the problems you are experiencing.

Important Edit: You don't seem to be looking at the actual compute of the GPU; look at the attached image:
