Fixing TensorFlow Filling Up All Memory

Problem description

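In one project, code built on TensorFlow and Keras filled up the memory as soon as it started on a Jetson board and made the system sluggish, something that never showed up on a desktop PC. Profiling with the memory_profiler package showed that the code itself does not use much memory: on the PC the total was 943.44 MB, of which the model accounted for 556.2 MB and the image-testing part for 103.7 MB. On the board, the code itself used only a little over 2,200 MB. This pointed to TensorFlow's own behaviour as the cause.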

Problem analysis

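After some research, I found that TensorFlow's newer API supports enabling memory growth on physical GPUs. However, with memory growth alone the process can still end up using all of the memory.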

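By default, TensorFlow maps nearly all of the memory of every GPU visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to reduce memory fragmentation and make more efficient use of the relatively precious GPU memory on the device. On a Jetson the GPU shares the board's physical memory with the CPU, so this default reservation shows up as system memory filling up. In some cases it is preferable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two ways to control this.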

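The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth. With this option the runtime allocates only as much GPU memory as it needs: it starts out allocating very little, and as the program runs and needs more GPU memory, the GPU memory region given to the TensorFlow process is extended.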

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
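
The same memory-growth behaviour can also be requested without touching the model code, through the TF_FORCE_GPU_ALLOW_GROWTH environment variable that TensorFlow documents as a platform-specific alternative. The following is a minimal sketch, assuming the variable is set before TensorFlow is imported so that it is in place before the GPU is initialized; whether it is honoured on a particular Jetson build should be checked on the device.

```python
import os

# Request memory growth via the environment instead of the Python API.
# This must happen before TensorFlow initializes the GPU, so set it
# ahead of the TensorFlow import.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable is set
```

Setting the variable in the shell or service file that launches the program has the same effect and keeps the source unchanged.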

The second option is to configure a virtual GPU device with tf.config.experimental.set_virtual_device_configuration and set a hard limit on the total memory that can be allocated on the GPU.

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Solution

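According to the memory_profiler analysis, the project's code actually uses less than 3 GB of memory, so the second method is used here to cap the allocation. Add the following code: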

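import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Cap TensorFlow at 4 GB on the first GPU; must run before the GPU is initialized
  tf.config.experimental.set_virtual_device_configuration(
      gpus[0],
      [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])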

This solved the problem. To leave some headroom, the allocation limit is set to 4 GB.
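
The step from "under 3 GB measured" to "4 GB limit" can be written down as a small calculation. The sketch below is only an illustration: pick_memory_limit_mb is a hypothetical helper, and the 25% headroom and 1024 MB rounding step are assumptions rather than values from the project.

```python
import math


def pick_memory_limit_mb(peak_mb, headroom=0.25, step_mb=1024):
    """Round peak usage plus headroom up to the next multiple of step_mb.

    Hypothetical helper: the 25% headroom and 1024 MB step are assumptions.
    """
    if peak_mb <= 0:
        raise ValueError("peak_mb must be positive")
    padded = peak_mb * (1.0 + headroom)
    return int(math.ceil(padded / step_mb) * step_mb)


# A measured peak just under 3 GB lands on a 4096 MB limit.
limit_mb = pick_memory_limit_mb(3000)
print(limit_mb)
```

The result can be passed as memory_limit to VirtualDeviceConfiguration in place of the hard-coded value.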