System administrators are regularly asked about large differences
between the du and df outputs on a UFS file system.
(For ZFS specific considerations, please see the ZFS FAQ.)
The du utility reports the sum of space allocated to all files
in the file hierarchy rooted in the directory plus the space allocated to the
directory itself. The df utility reports the amount of disk space
occupied by a mounted file system.
When a file is removed from the file system, i.e. is unlinked (the hard link
count goes to zero), the space belonging to this file is no longer counted by
the du tool, but df keeps reporting it as used until all references to it
(open file descriptors) are closed. To find the process responsible, one can
follow the information found in the SunManagers Frequently Asked Questions.
Here is an example of such a search, using a slightly different method to
identify the process currently holding the open descriptor to the deleted file.
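The behaviour described above can be reproduced on a scratch file. The
following is only a minimal sketch: the function name demo_unlinked and the
use of mktemp for the scratch file are assumptions made for illustration,
not part of the original procedure.

```shell
#!/bin/sh
# Sketch: an unlinked file stays alive as long as a descriptor is open on it.
# demo_unlinked and the mktemp scratch file are illustrative assumptions.
demo_unlinked() {
    f=$(mktemp) || return 1
    echo "still here" > "$f"
    exec 3< "$f"          # hold a read descriptor on the file
    rm -f "$f"            # unlink: the name disappears from the directory
    if [ -e "$f" ]; then
        echo "name still present"
    else
        echo "name removed"
    fi
    cat <&3               # the data is still readable through descriptor 3
    exec 3<&-             # closing the last reference lets the blocks be freed
}

demo_unlinked
```

Between the rm and the final close, du can no longer reach the file through
any directory, while the file system still has its blocks allocated.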
Use the procfs interface to find the file which has been unlinked:
# find /proc/*/fd \( -type f -a ! -size 0 -a -links 0 \) -print | xargs \ls -li
415975 --w------- 0 user group 2125803025 Oct 15 23:59 /proc/1252/fd/3
Optionally, get more detail about the process:
# pargs -c 1252
1252: rvd.basic -reliability 5 -listen tcp:9876 -logfile /path/to/log/rvd_9876.l
argv[0]: rvd.basic
argv[1]: -reliability
argv[2]: 5
argv[3]: -listen
argv[4]: tcp:9876
argv[5]: -logfile
argv[6]: /path/to/log/rvd_9876.log
Check whether you can make sense of the content of the unlinked file:
# tail /proc/1252/fd/3
-------------------------------------------------------------------------------
2008-10-15 23:59:32.002116 - [MSG] BBG_Transmitter_class.cc, line 792 (thread 25087:4)
[4060] Sent a heartbeat
-------------------------------------------------------------------------------
BBG_Transmitter_class.cc: [4111] No activity detected. Send a Heartbeat message
-------------------------------------------------------------------------------
2008-10-15 23:59:32.134829 - [MSG] BBG_Transmitter_class.cc, line 1138 (thread 25087:4)
[4065] Heartbeat acknowledged by Bloomberg
You can correlate the size of the removed, but still referenced, file with
the difference in space reported by the du and df tools:
# df -k /path/to
Filesystem kbytes used avail capacity Mounted on
/dev/md/dsk/d5 6017990 5874592 83219 99% /path/to
# du -sk /path/to
3791632 /path/to
# echo "(5874616-3791632)*1024" | bc
2132975616
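The same subtraction can be wrapped in a small helper for reuse. This is a
sketch only: the function name gap_bytes is hypothetical, and it assumes a
shell whose arithmetic is wide enough for multi-gigabyte values (the bc
pipeline above avoids that assumption). The commented pipeline assumes a df
that accepts -P for single-line output.

```shell
#!/bin/sh
# gap_bytes USED_KB DU_KB
# Print the difference between the df "used" figure and the du total, in
# bytes. Both arguments are in kilobytes, as printed by df -k and du -sk.
# (gap_bytes is a hypothetical helper name, not an existing utility.)
gap_bytes() {
    echo $(( ($1 - $2) * 1024 ))
}

# Same figures as in the bc calculation above:
gap_bytes 5874616 3791632

# Possible live usage (assumes df supports -P):
# gap_bytes "$(df -Pk /path/to | awk 'NR==2 {print $3}')" \
#           "$(du -sk /path/to | awk '{print $1}')"
```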
So, we have now found the ~2GB log file which was still held open (in use) by a
process. There are two ways to reclaim the space:
- Truncate the unlinked file (quick workaround).
- Restart the corresponding program cleanly (better option).
Use whichever solution best fits your environment.
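For the truncation workaround, one possible approach is to reopen the
descriptor for writing through procfs. This is a hedged sketch: the helper
name truncate_open_fd is hypothetical, and it assumes that the system allows
/proc/PID/fd/N to be opened for writing by a sufficiently privileged user.
If the process does not write in append mode, it keeps writing at its old
offset and the file becomes sparse rather than staying small.

```shell
#!/bin/sh
# truncate_open_fd PID FD
# Empty the file behind /proc/PID/fd/FD in place. The process keeps its
# descriptor; only the file's data blocks are released.
# (truncate_open_fd is a hypothetical helper name.)
truncate_open_fd() {
    : > "/proc/$1/fd/$2"
}

# With the PID and descriptor found earlier, run as root or as the owner:
# truncate_open_fd 1252 3
```

Restarting the program remains the cleaner fix, since it also lets the
program reopen its log file under a visible name.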