March 20th, 2017 13:00
ECS 3 CE single node services unavailable / root file system full
I have seen mentions of out-of-space issues as well as an unresponsive web UI and services. I exec'd into the container and found that the root file system was full.
ecs0:/tmp # ls -lsa
total 52
0 drwxrwxrwt 5 root root 192 Mar 20 19:45 .
4 drwxr-xr-x 28 root root 4096 Feb 16 19:14 ..
0 -rw-r--r-- 1 storageos storageos 0 Mar 20 19:17 FPVMhealthcheck.heartbeat
0 -rw-r--r-- 1 root root 0 Mar 20 19:17 certtool.lock
0 drwxr-xr-x 2 root root 19 Mar 20 19:52 hsperfdata_root
0 drwxr-xr-x 2 storageos storageos 32 Mar 20 19:52 hsperfdata_storageos
0 drwx------ 2 root root 94 Feb 16 19:30 run-crons.6yRf5Z
48 -rwxr-xr-x 1 storageos storageos 48432 Feb 16 19:55 snappy-1.0.5-libsnappyjava.so
0 -rw-r--r-- 1 root root 0 Mar 20 19:17 systool.lock
ecs0:/tmp # cd run-crons.6yRf5Z/
ecs0:/tmp/run-crons.6yRf5Z # ls -la
total 8898796
drwx------ 2 root root 94 Feb 16 19:30 .
drwxrwxrwt 5 root root 192 Mar 20 19:45 ..
-rw-r--r-- 1 root root 9112358912 Mar 20 19:17 run-crons.hourly.16288
-rw-r--r-- 1 root root 32 Feb 16 19:30 run-crons_mail.16288
-rw-r--r-- 1 root root 0 Feb 16 19:30 run-crons_output.16288
ecs0:/tmp/run-crons.6yRf5Z # file run-crons.hourly.16288
run-crons.hourly.16288: ASCII text
ecs0:/tmp/run-crons.6yRf5Z # head run-crons.hourly.16288
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
ecs0:/tmp/run-crons.6yRf5Z # tail run-crons.hourly.16288
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'timeout --help' for more information.
timeout: invalid time interval 'find'
Try 'tim
ecs0:/tmp/run-crons.6yRf5Z # rm run-crons.hourly.16288
Right or wrong, I removed the file rather than truncating it. Is this a bug to address, expected behavior, or a combination of the two? After exiting the container, I ran a docker restart on it, and all the services began to function normally.
Thoughts?
Thanks!! Mike
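On the remove-versus-truncate question above, here is a minimal sketch of the truncating route. The helper name `truncate_if_large` and the size limit are hypothetical, not part of ECS. Truncating in place frees the blocks immediately even if the cron job still has the file open, while `rm` on an open file frees nothing until the writer exits (a container restart takes care of that).

```shell
#!/bin/sh
# Hypothetical helper (not part of ECS): empty a file in place when it
# exceeds a size limit given in bytes.
truncate_if_large() {
    file=$1
    limit=$2
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding some wc implementations emit.
    size=$(($(wc -c < "$file")))
    if [ "$size" -gt "$limit" ]; then
        : > "$file"    # truncate to zero length without unlinking the file
        echo "truncated $file ($size bytes)"
    fi
}
```

Inside the container this could be called as `truncate_if_large /tmp/run-crons.6yRf5Z/run-crons.hourly.16288 1073741824`, where the 1 GiB limit is an arbitrary choice.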
JasonCwik
March 21st, 2017 08:00
That's an interesting one. How long was your system up? What base OS are you installed on?
Mike_Phelps
March 21st, 2017 12:00
CentOS 7.3.1611 created on 2017 Feb 16
Docker Engine 1.13.1
emccorp/ecs-software-3.0.0:latest image id e3022d56bf25
4 vCPU, 32GB RAM, 80GB OS
4 1TB LUNs for ECS
JasonCwik
April 11th, 2017 10:00
Chris, actually the opposite. The setting does exist in real ECS (7200) and our patch inadvertently removed it.
travis_wichert
July 21st, 2017 12:00
Yes, that time in real ECS is two hours (7200s).
Also, this issue should be resolved in ECS CE since we are no longer overwriting vnest.object.properties when building the CE image from the upstream ECS release artifact.
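For readers wondering how a missing setting turns into `timeout: invalid time interval 'find'`, the sketch below reproduces that message under one assumption: that the hourly job passes the configured value to `timeout` unquoted. The variable name `CLEANUP_TIMEOUT` and both function names are made up for illustration; the actual ECS script is not shown in this thread.

```shell
#!/bin/sh
# Assumption: CLEANUP_TIMEOUT stands in for the value the hourly job reads
# from its properties file. The real name is not shown in this thread.

run_cleanup_unsafe() {
    # Unquoted and empty, the variable vanishes from the command line,
    # so timeout parses "find" as its duration argument and fails.
    timeout $CLEANUP_TIMEOUT find "$1" -maxdepth 0
}

run_cleanup_safe() {
    # Fall back to 7200 s (the value quoted in this thread) when unset or empty.
    timeout "${CLEANUP_TIMEOUT:-7200}" find "$1" -maxdepth 0
}

CLEANUP_TIMEOUT=""
run_cleanup_unsafe /tmp || echo "unsafe variant failed with status $?"
run_cleanup_safe /tmp
```

With the setting empty, the first call should fail with the same `invalid time interval` complaint seen in the log, while the second falls back to the two-hour default.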