Author: 8023pxeb_256 | Source: Internet | 2023-09-25 21:54
While writing code today, I connected to a server to use its GPUs and ran into a strange error:
InternalError (see above for traceback): cudnn PoolForward launch failed
[[Node: MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 1, 20], padding="VALID", strides=[1, 1, 1, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"](Relu)]]
[[Node: Neg/_5 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_244_Neg", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
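The bizarre part is that when I use
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
to select GPU 1, which turns out to be the 1080 Ti, no error occurs,
whereas using
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
to select GPU 0, which turns out to be the 2080 Ti, triggers the error.
Yet according to nvidia-smi,
the 1080 Ti is GPU 0 and the 2080 Ti is GPU 1.
That makes no sense.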
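A likely explanation for the swapped indices: by default the CUDA runtime enumerates devices in "fastest first" order, while nvidia-smi lists them by PCI bus ID, so the two numberings can disagree on a machine with mixed cards. The sketch below sets the standard CUDA_DEVICE_ORDER environment variable so both numberings should match. The helper name select_gpu_by_pci_order is hypothetical, and whether this is the cause on this particular server is an assumption, not something the post confirms.

```python
import os


def select_gpu_by_pci_order(index):
    """Expose one GPU to CUDA, numbered the way nvidia-smi numbers it.

    Hypothetical helper for illustration. It must run before TensorFlow
    (or any other CUDA-using library) is imported, because the CUDA
    runtime reads these variables when it initializes.
    """
    # Enumerate devices by PCI bus ID instead of the default FASTEST_FIRST.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Make only the requested device visible to the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_DEVICE_ORDER"], os.environ["CUDA_VISIBLE_DEVICES"]


# With PCI ordering, index 0 should refer to whichever card nvidia-smi
# lists as GPU 0 (the 1080 Ti, according to the post).
select_gpu_by_pci_order(0)
# import tensorflow as tf  # import only after the variables are set
```

Note that this only makes the numbering consistent. It does not by itself explain why the pooling op fails on the 2080 Ti.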
The log output in PyCharm includes:
2020-09-30 12:49:41.147085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10230 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:03:00.0, compute capability: 7.5)
In principle both cards have plenty of memory, so this should not be an out-of-memory problem...
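To check which physical card a run actually used, and how much memory TensorFlow allocated on it, the "Created TensorFlow device" log line can be parsed and its PCI bus ID compared with the nvidia-smi listing. The following is a minimal sketch: the pattern and the function name parse_device_line are hypothetical and assume the log format shown in this post, which may differ between TensorFlow versions.

```python
import re

# Pattern written against the log line quoted in this post (an assumption
# about the format, not a documented TensorFlow interface).
DEVICE_LINE = re.compile(
    r"device:GPU:(?P<tf_index>\d+) with (?P<memory_mb>\d+) MB memory\) -> "
    r"physical GPU \(device: (?P<cuda_index>\d+), name: (?P<name>[^,]+), "
    r"pci bus id: (?P<pci_bus_id>[0-9a-fA-F:.]+)"
)


def parse_device_line(line):
    """Return the device details from one log line, or None if it does not match."""
    match = DEVICE_LINE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields so they can be compared directly.
    for key in ("tf_index", "memory_mb", "cuda_index"):
        info[key] = int(info[key])
    return info


log_line = (
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 "
    "with 10230 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, "
    "pci bus id: 0000:03:00.0, compute capability: 7.5)"
)
device = parse_device_line(log_line)
print(device["name"], device["pci_bus_id"], device["memory_mb"])
```

The bus ID can then be matched against the output of `nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv`. The two tools may pad the PCI domain with a different number of leading zeros, so compare the trailing `03:00.0` part.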