Article summary
Applications that work with files on disk can encounter incomplete or corrupted files if a target file is actively being written to disk by another process. Typically, this happens when two different systems or processes are interacting with the same file independently.
For example, if a delivery system (e.g. SFTP) is writing a file, and an independent application (e.g. a file importer) expects to read and process the file, issues may arise if the processing application reads the file before all data has been transferred and written to disk. While write() calls are generally atomic, the process writing to the file may use multiple write() calls, and a read() could be interleaved between them.
In order to prevent incomplete or corrupted file issues, the application that is reading the file from disk must make sure that no other processes are writing to the target file. There are a few different strategies to coordinate access, including: lock files, sentinel files, monitoring filesystem events, and monitoring open file handles.
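As a point of comparison, the sentinel-file strategy can be sketched in a few lines. This is only an illustration under an assumed convention: the sender uploads the data file and then, as its last step, an empty marker with a ".done" suffix. The suffix and the function name delivery_complete are hypothetical and do not come from the scenario described in this article.

```shell
#!/bin/bash
# Hypothetical convention: the sender uploads "data.txt" and then, as a
# final step, an empty marker file named "data.txt.done".
SENTINEL_SUFFIX=".done"

# Returns 0 only when both the data file and its sentinel exist.
delivery_complete() {
    local file="$1"
    [ -f "${file}" ] && [ -f "${file}${SENTINEL_SUFFIX}" ]
}

# Example use (commented out so the sketch has no side effects):
# if delivery_complete /opt/sftp/data.txt; then
#     file_importer.rb /opt/sftp/data.txt
# fi
```

The catch is that this only works when the sender cooperates by writing the marker, which is not always something the receiving side can require.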
While each approach has strengths and weaknesses, circumstances may force the use of one particular method. In a recent case, I found that only two of them were viable: subscribing to filesystem events and monitoring open file handles. Below are the scripts I used for each approach to coordinate access to a file that was being delivered via SFTP.
Scenario
An SFTP server is accepting large text files for processing. When files are received, a separate process needs to read the files and manipulate the data. The files may arrive periodically throughout the day, and they must be processed immediately after delivery.
Monitoring filesystem events with inotify
inotifywait is used to track files and exit when specific filesystem events are received (in this case, close_write). This indicates that the file is no longer being transferred (written), and it may be safely read by the import process.
Generally, if the event is not received, inotifywait will block indefinitely. In case inotifywait starts tracking a file after the close_write event (perhaps it was transferred very fast), a timeout is provided, after which time inotifywait exits. When inotifywait exits, it is assumed the file transferred successfully:
inotifywait -e close_write -t 120 /path/to/file.txt
Context in script:
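#!/bin/bash
# Poll the SFTP drop directory once per second.
while true
do sleep 1s
    # Ignore hidden files (names starting with a dot).
    for file in $(find /opt/sftp -type f -not -name "\.*")
    do
        # Block until the writer closes the file or 120 seconds elapse.
        # Either way, the file is assumed to be complete afterwards.
        inotifywait -e close_write -t 120 "${file}" > /dev/null 2>&1
        echo "Processing: ${file}..."
        file_importer.rb "${file}"
        mv "${file}" /tmp
    done
    # Clean up processed files older than seven days.
    find /tmp -type f -ctime +7 -exec rm {} \+
done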
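The script above treats every exit of inotifywait the same way. As far as I can tell from its man page, inotifywait exits with 0 when a watched event occurred, 2 when the -t timeout expired without one, and 1 on an error. A minimal sketch of telling those cases apart follows; the function name describe_inotifywait_status is hypothetical and was not part of the script I ran.

```shell
#!/bin/bash
# Hypothetical helper: map an inotifywait exit status to a short label.
describe_inotifywait_status() {
    case "$1" in
        0) echo "closed" ;;   # a watched event (here close_write) was received
        2) echo "timeout" ;;  # -t expired before any watched event occurred
        *) echo "error" ;;    # 1 or anything else: inotifywait reported a failure
    esac
}

# Example use (commented out so the sketch has no side effects):
# inotifywait -e close_write -t 120 "${file}" > /dev/null 2>&1
# status=$(describe_inotifywait_status $?)
# echo "${file}: ${status}"
```

Logging the label would at least show how often files are processed after a timeout rather than after a real close_write event.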
Monitoring file handles with lsof
lsof will show information about file descriptors opened by processes, including regular files, directories, block special files, character special files, libraries, streams, and network files (e.g. sockets). lsof will exit with an exit code of zero if there are any processes that have an open file handle to the target file, and a non-zero exit code if there are no processes that have an open file handle to the target file.
If lsof does not find any processes that have file handles on the target file, then it indicates that the file is no longer being transferred (written), and it may be safely read by the import process:
lsof /path/to/file.txt
Context in the script:
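#!/bin/bash
# Poll the SFTP drop directory once per second.
while true
do sleep 1s
    # Ignore hidden files (names starting with a dot).
    for file in $(find /opt/sftp -type f -not -name "\.*")
    do
        # lsof exits 0 while some process still has the file open.
        if lsof "${file}" > /dev/null 2>&1
        then echo "Waiting on ${file}..."
            continue
        fi
        echo "Processing: ${file}..."
        file_importer.rb "${file}"
        mv "${file}" /tmp
    done
    # Clean up processed files older than seven days.
    find /tmp -type f -ctime +7 -exec rm {} \+
done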
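The same exit-code check can be wrapped in a small helper that waits for one file and gives up after a bounded number of attempts. The sketch below is illustrative only: the names OPEN_CHECK, file_is_open, and wait_until_closed are hypothetical, and the probe command is held in a variable purely so that it can be swapped out when testing. Note that a missing lsof binary also produces a non-zero exit code, so it would look the same as a closed file.

```shell
#!/bin/bash
# Command used to probe for open handles (assumption of this sketch).
# lsof exits 0 when at least one process has the file open.
OPEN_CHECK="${OPEN_CHECK:-lsof}"

# Returns 0 while some process still holds the file open.
file_is_open() {
    "${OPEN_CHECK}" "$1" > /dev/null 2>&1
}

# Poll until the file is closed, giving up after max_tries probes.
# Returns 0 once the file is closed, 1 if it is still open at the end.
wait_until_closed() {
    local file="$1" max_tries="${2:-60}" delay="${3:-1}" tries=0
    while file_is_open "${file}"; do
        tries=$((tries + 1))
        if [ "${tries}" -ge "${max_tries}" ]; then
            return 1
        fi
        sleep "${delay}"
    done
    return 0
}

# Example use (commented out so the sketch has no side effects):
# if wait_until_closed /opt/sftp/data.txt 120 1; then
#     file_importer.rb /opt/sftp/data.txt
# fi
```

Unlike the inotifywait timeout, giving up here does not imply the file is complete; the caller can simply try again on the next pass.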
Conclusion
In the end, I chose to use lsof to monitor open file handles as it avoided the possibility of inotifywait hitting its timeout before a file was completely transferred via SFTP. While files should always have been transferred before the two-minute timeout, it seemed possible that an unexpectedly slow network connection could be an edge case. Furthermore, if inotifywait missed the close_write event because a file was transferred extremely quickly, processing the file would be delayed by at least two minutes (waiting for the timeout to be reached).
Hope these examples are helpful to you.