Author: 愛情伈語gg | Source: Internet | 2023-08-16 17:14
TensorBoard 1.14 -- on Ubuntu 18.04 LTS. This is a production training server, so apps
cannot be upgraded here.
It makes the BOINC client (daemon) unresponsive and dead.
root-107:~# boinccmd --read_global_prefs_override
Operation failed: read() failed
...and after restarting TensorBoard -- boinc suddenly works again! (for a few minutes... and then it fails again)
Problem is very much reproducible on my server.
---
steps to reproduce (BOINC side):
# apt-get install boinc-client
cd /var/lib/boinc-client/
boinccmd --project_attach http://www.worldcommunitygrid.org/ $KEY
boinccmd --set_network_mode always
boinccmd --set_run_mode always
boinccmd --set_gpu_mode never
root-107:~# boinccmd --read_global_prefs_override
root-107:~#
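Because the failure comes and goes within minutes, it can help to log exactly when the client stops answering. The following is a minimal sketch, not part of the original report: it re-runs the same boinccmd call on a timer and prints a timestamped ok/FAIL line. The names PROBE, probe_once and poll, and the 60-second default interval, are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: poll the BOINC client and log when it stops answering.
# Assumption: boinccmd is on PATH and is run from a directory where it can
# authenticate (the report uses /var/lib/boinc-client/).

# Probe command; defaults to the call that failed in the report.
# Override PROBE to exercise the loop without BOINC installed.
PROBE="${PROBE:-boinccmd --read_global_prefs_override}"

# probe_once: run the probe, print a timestamped result, return its status.
probe_once() {
    # $PROBE is left unquoted on purpose so it splits into command + arguments.
    if $PROBE >/dev/null 2>&1; then
        echo "$(date '+%F %T') ok"
        return 0
    else
        echo "$(date '+%F %T') FAIL"
        return 1
    fi
}

# poll COUNT [INTERVAL]: probe COUNT times, sleeping INTERVAL seconds between probes.
poll() {
    count="$1"
    interval="${2:-60}"
    i=0
    while [ "$i" -lt "$count" ]; do
        probe_once || true          # keep polling after a failure
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For example, `poll 30 60` would probe once a minute for half an hour. Comparing the FAIL timestamps with the times TensorBoard was started or restarted may show whether the two are correlated.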
...
this is on a server that was running BOINC for an entire *month* with zero issues,
before tensorflow and tensorboard were installed.
...
this server has enough RAM and disk space, so those issues can be ruled out:
root-107:~# uptime
00:31:18 up 44 days, 2:37, 16 users, load average: 57.85, 59.09, 57.01
root-107:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           125G         36G        882M        1.0G         88G         87G
Swap:          8.0G        100M        7.9G
root-107:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G     0   63G   0% /dev
tmpfs            13G  2.8M   13G   1% /run
/dev/sda2       916G  358G  512G  42% /
tmpfs            63G  100K   63G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/loop2       90M   90M     0 100% /snap/core/8039
tmpfs            13G     0   13G   0% /run/user/1000
/dev/loop0       90M   90M     0 100% /snap/core/8213
tmpfs            13G     0   13G   0% /run/user/0
This question comes from the open-source project: tensorflow/tensorboard
Thanks for investigating and checking back in! The memory usage
is a known issue generally (#766), unless you’re seeing differentially
high memory usage with BOINC in play. The X session problem sounds like a bug in
BOINC: it’s true X servers frequently use 60xx ports, but BOINC
shouldn’t be assuming that behavior just because one of those ports
happens to be open.
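To see which listeners on the machine fall in the 60xx range, the output of `ss -ltn` can be filtered by local port. This is a sketch under assumptions: "60xx" is read as ports 6000 to 6099, the helper name filter_60xx is hypothetical, and the column layout is assumed to be that of `ss` from iproute2 with the local address in the fourth column.

```shell
#!/bin/sh
# Sketch: list listening TCP sockets whose local port is in 6000-6099.
# Assumption: input lines look like `ss -ltn` output, with the local
# address:port in the fourth whitespace-separated field.

filter_60xx() {
    # Take the text after the last ":" in field 4 as the port number.
    # Header lines yield a non-numeric value, which awk treats as 0.
    awk '{
        n = split($4, parts, ":")
        port = parts[n] + 0
        if (port >= 6000 && port <= 6099) print $4
    }'
}

# Only query the system when ss is actually available.
if command -v ss >/dev/null 2>&1; then
    ss -ltn | filter_60xx
fi
```

TensorBoard listens on port 6006 by default, so on a machine where it runs with default settings it would be expected to show up in this list.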
Perhaps as a workaround for the port issue you could ask TensorBoard to
run on a different port, by passing
or similar?
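One way to apply that workaround is sketched below. TensorBoard listens on port 6006 by default and accepts a `--port` flag. The helper names, the fallback port 16006, the `./logs` path, and reading "60xx" as 6000 to 6099 are assumptions for illustration only. The script prints the command it would run instead of launching anything.

```shell
#!/bin/sh
# Sketch: pick a TensorBoard port outside the 60xx range.
# Assumptions: "60xx" means 6000-6099; 16006 is an arbitrary fallback port.

# in_60xx PORT: succeed if PORT lies in 6000-6099.
in_60xx() {
    [ "$1" -ge 6000 ] && [ "$1" -le 6099 ]
}

# tb_port [PORT]: echo PORT unchanged, or the fallback if PORT is in 60xx.
tb_port() {
    port="${1:-6006}"
    if in_60xx "$port"; then
        echo 16006
    else
        echo "$port"
    fi
}

# Print the launch command; ./logs is a placeholder log directory.
echo "tensorboard --logdir ./logs --port $(tb_port 6006)"
```

If BOINC keeps working once TensorBoard has moved off the 60xx range, that would support the explanation given in the reply above.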
I’m going to close this as it doesn’t look like there’s anything
actionable on our side. Thanks again for the report, and feel free to
follow up if there’s more that we can do for you.