As an SA, it is not uncommon to regularly receive questions about big
differences between the du and df outputs on a UFS file system. (For
ZFS-specific considerations, please see the ZFS FAQ.)
The du utility reports the sum of space allocated to all files in the
file hierarchy rooted in the directory plus the space allocated to the
directory itself. The df utility reports the amount of disk space
occupied by a mounted file system.
When a file is removed from the file system, i.e. is unlinked (the hard
link count goes to zero), the space belonging to this file is no longer
counted by the du tool, which only sees files that still have a name, but
it is still reported as used by the df utility until all references to it
(open file descriptors) are closed. To find the guilty process, one can
follow the information found in the SunManagers Frequently Asked
Questions. Here is an example of such a search, using a slightly
different method to get the process currently holding the open
descriptor to the deleted file.
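Before hunting on a live system, the effect itself can be reproduced in a scratch directory. The following is only a minimal sketch, assuming a POSIX shell with mktemp(1) available; the directory, the demo.log file name and the use of descriptor 3 are made up for the illustration and have nothing to do with the real case below.

```shell
#!/bin/sh
# Sketch only: reproduce an unlinked-but-still-open file in a scratch directory.
# Assumes a POSIX shell plus mktemp(1); all names below are made up for the demo.
dir=$(mktemp -d)

exec 3>"$dir/demo.log"                                 # open the file on descriptor 3
dd if=/dev/urandom bs=1024 count=1024 >&3 2>/dev/null  # write 1 MB through it

before=$(du -sk "$dir" | awk '{print $1}')             # file still has a name
rm "$dir/demo.log"                                     # unlink it; fd 3 stays open
after=$(du -sk "$dir" | awk '{print $1}')              # du no longer sees it

echo "du before unlink: ${before} KB"
echo "du after unlink:  ${after} KB"

exec 3>&-                                              # closing fd 3 frees the blocks
rmdir "$dir"
```

The du figure should drop as soon as the file is unlinked, while the blocks stay allocated on the file system until descriptor 3 is closed.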
Find the unlinked file through the procfs interface:
# find /proc/*/fd \( -type f -a ! -size 0 -a -links 0 \) -print | xargs \ls -li
415975 --w------- 0 user group 2125803025 Oct 15 23:59 /proc/1252/fd/3
Optionally, get more detail about the process holding it:
# pargs -c 1252
1252: rvd.basic -reliability 5 -listen tcp:9876 -logfile /path/to/log/rvd_9876.l
argv[0]: rvd.basic
argv[1]: -reliability
argv[2]: 5
argv[3]: -listen
argv[4]: tcp:9876
argv[5]: -logfile
argv[6]: /path/to/log/rvd_9876.log
Check whether the content of the unlinked file tells you what it is:
# tail /proc/1252/fd/3
-------------------------------------------------------------------------------
2008-10-15 23:59:32.002116 - [MSG] BBG_Transmitter_class.cc, line 792 (thread 25087:4)
[4060] Sent a heartbeat
-------------------------------------------------------------------------------
BBG_Transmitter_class.cc: [4111] No activity detected. Send a Heartbeat message
-------------------------------------------------------------------------------
2008-10-15 23:59:32.134829 - [MSG] BBG_Transmitter_class.cc, line 1138 (thread 25087:4)
[4065] Heartbeat acknowledged by Bloomberg
You can correlate the size of the removed, but still referenced, file
with the gap between the space reported by the du and df tools:
# df -k /path/to
Filesystem kbytes used avail capacity Mounted on
/dev/md/dsk/d5 6017990 5874592 83219 99% /path/to
# du -sk /path/to
3791632 /path/to
# echo "(5874616-3791632)*1024" | bc
2132975616
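The same check can be written with shell arithmetic, which also shows how much of the gap the unlinked file explains. This is just a sketch, assuming a POSIX shell with 64-bit `$(( ))` arithmetic; the variable names are mine, and the figures are copied from the bc expression and the ls output above.

```shell
#!/bin/sh
# Sketch only: compare the df/du gap with the size of the unlinked file.
# The figures come from the example above; the variable names are hypothetical.
df_used_kb=5874616       # "used" value as entered in the bc expression
du_kb=3791632            # total reported by du -sk
file_bytes=2125803025    # size of /proc/1252/fd/3 as shown by ls

gap_bytes=$(( (df_used_kb - du_kb) * 1024 ))
remainder=$(( gap_bytes - file_bytes ))

echo "df/du gap:         $gap_bytes bytes"
echo "unlinked file:     $file_bytes bytes"
echo "not accounted for: $remainder bytes"
```

The unlinked file covers all but about 7 MB of the gap, a remainder small enough to leave the log file as the obvious culprit.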
So, we have now found the ~2GB log file which is still held open (in use) by a process. There are two ways to get the space back:
- Truncate the unlinked file (quick workaround).
- Simply restart the corresponding program properly (better option).
Use whichever solution best fits the needs of your environment.
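The truncation workaround can be tried safely on a throwaway file first. The following is a minimal sketch, assuming a /proc file system that exposes /proc/<pid>/fd/<n> and a POSIX shell; the sleep process standing in for the daemon, the demo.log file and descriptor 3 are all made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: truncate an unlinked file through /proc while it is still open.
# Assumes a /proc that exposes /proc/<pid>/fd/<n> and a POSIX shell; the
# "holder" process and file names are made up for the demo.
dir=$(mktemp -d)

exec 3>>"$dir/demo.log"          # open in append mode, as most loggers do
echo "some log data" >&3
sleep 30 &                       # stand-in for the daemon; it inherits fd 3
holder=$!
exec 3>&-                        # only the holder keeps the file open now
rm "$dir/demo.log"               # unlinked, but still referenced by the holder

fd_path=/proc/$holder/fd/3
size_before=$(wc -c < "$fd_path" | tr -d ' ')
: > "$fd_path"                   # truncate to zero length
size_after=$(wc -c < "$fd_path" | tr -d ' ')

echo "size before: $size_before bytes"
echo "size after:  $size_after bytes"

kill "$holder"
wait "$holder" 2>/dev/null || true
rmdir "$dir"
```

Applied to the case above, the equivalent would be `: > /proc/1252/fd/3`, run with sufficient privileges on the process and the file. One caveat: if the program did not open its log in append mode, its write offset is not reset by the truncation, so its next write yields a sparse file with the old apparent size.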
