Java: Watching a directory to move large files


I have been writing a program that watches a directory and, when files are created in it, renames them and moves them to a new directory. In my first implementation I used Java's Watch Service API, which worked fine when I was testing with 1 KB files. The problem is that in reality the files being created are anywhere from 50 to 300 MB. In that case the watcher API would find the file right away but could not move it because it was still being written. I tried putting the watcher in a loop (which generated exceptions until the file could be moved), but this seemed pretty inefficient.

Since that didn't work, I ended up using a timer that checks the folder every 10 seconds and moves files when it can. This is the approach I went with.

Question: Is there any way to get a signal when a file is done being written, without relying on an exception check or continually comparing the size? I like the idea of using the Watcher API just once for each file instead of continually checking with a timer (and running into exceptions).

All responses are greatly appreciated!

nt

Write another file as an indication that the original file is complete. E.g., while 'fileorg.dat' is still growing there is no marker; once it is done, the writer creates a file 'fileorg.done', and the watcher checks only for 'fileorg.done'.

With clever naming conventions you should not have problems.
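A minimal sketch of the watcher side of this idea. It assumes the writer creates 'name.done' next to 'name.dat' once the data file is complete. The class name, the suffixes and the `onCreated` helper are made up for illustration, and the method is meant to be called for each ENTRY_CREATE event:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class DoneMarkerMover {

    // Hypothetical naming convention: "name.dat" is complete once "name.done" exists.
    static final String MARKER_SUFFIX = ".done";
    static final String DATA_SUFFIX = ".dat";

    /**
     * Call this for every ENTRY_CREATE event. Returns the new location of the
     * data file if the created entry was a marker with a matching data file,
     * or null if the entry was ignored.
     */
    static Path onCreated(Path created, Path targetDir) throws IOException {
        String name = created.getFileName().toString();
        if (!name.endsWith(MARKER_SUFFIX)) {
            return null; // the data file itself, or something unrelated: ignore
        }
        String base = name.substring(0, name.length() - MARKER_SUFFIX.length());
        Path data = created.resolveSibling(base + DATA_SUFFIX);
        if (!Files.exists(data)) {
            return null; // marker without a data file: nothing to move
        }
        Path moved = Files.move(data, targetDir.resolve(data.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        Files.delete(created); // clean up the marker file
        return moved;
    }
}
```

The data file's own create and modify events are simply ignored, so the watcher only acts once per file.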


I ran into the same problem today. In my use case a small delay before the file is actually imported was not a big problem, and I still wanted to use the NIO2 API. The solution I chose was to wait until a file has not been modified for 10 seconds before performing any operations on it.

The important part of the implementation is as follows. The program waits until the wait time expires or a new event occurs. The expiration time is reset every time a file is modified. If a file is deleted before the wait time expires, it is removed from the list. I use the poll method with a timeout equal to the time remaining until the earliest expiration, that is, (lastModified + waitTime) - currentTime.

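// Maps each pending file to the time (epoch millis) at which it counts as complete
private final Map<Path, Long> expirationTimes = new HashMap<>();
// Quiet period in milliseconds a file must go unmodified before it is processed
private Long newFileWait = 10000L;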

public void run() {
    try {
        for (;;) {
            // Retrieves and removes the next watch key, waiting if none are present.
            WatchKey k = watchService.take();

            for (;;) {
                long currentTime = System.currentTimeMillis();

                if (k != null)
                    handleWatchEvents(k);

                handleExpiredWaitTimes(currentTime);

                // If there are no files left, stop polling and block on take() again
                if (expirationTimes.isEmpty())
                    break;

                long minExpiration = Collections.min(expirationTimes.values());
                long timeout = minExpiration - currentTime;
                logger.debug("timeout: " + timeout);
                // Wait for the next event, but no longer than the earliest expiration
                k = watchService.poll(timeout, TimeUnit.MILLISECONDS);
            }
        }
    } catch (InterruptedException e) {
        // take() and poll() are interruptible: restore the flag and stop watching
        Thread.currentThread().interrupt();
    }
}

private void handleExpiredWaitTimes(Long currentTime) {
    // Start the import for files whose expiration time has passed.
    // An explicit iterator is used so entries can be removed safely while iterating.
    Iterator<Entry<Path, Long>> it = expirationTimes.entrySet().iterator();
    while (it.hasNext()) {
        Entry<Path, Long> entry = it.next();
        if (entry.getValue() <= currentTime) {
            logger.debug("expired " + entry);
            // do something with the file
            it.remove();
        }
    }
}

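private void handleWatchEvents(WatchKey k) {
    List<WatchEvent<?>> events = k.pollEvents();
    for (WatchEvent<?> event : events) {
        try {
            handleWatchEvent(event, keys.get(k));
        } catch (IOException e) {
            // e.g. the file vanished between the event and the attribute read
            logger.debug("could not handle event: " + e);
        }
    }
    // reset watch key to allow the key to be reported again by the watch service
    k.reset();
}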

private void handleWatchEvent(WatchEvent<?> event, Path dir) throws IOException {
    Kind<?> kind = event.kind();

    WatchEvent<Path> ev = cast(event);
    Path name = ev.context();
    Path child = dir.resolve(name);

    if (kind == ENTRY_MODIFY || kind == ENTRY_CREATE) {
        // Reset the expiration time based on the file's last-modified time
        FileTime lastModified = Files.getLastModifiedTime(child, NOFOLLOW_LINKS);
        // Key by the resolved path so it matches the remove() below
        expirationTimes.put(child, lastModified.toMillis() + newFileWait);
    }

    if (kind == ENTRY_DELETE) {
        expirationTimes.remove(child);
    }
}


Two solutions:

The first is a slight variation of the answer by stacker:

Use a unique suffix (or prefix) for incomplete files. Something like myhugefile.zip.inc instead of myhugefile.zip. Rename the files when the upload / creation is finished. Exclude .inc files from the watch.

The second is to use a different folder on the same drive to create / upload / write the files and move them to the watched folder once they are ready. Moving should be an atomic action if they are on the same drive (file system dependent, I guess).

Either way, the clients that create the files will have to do some extra work.
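For the second option, the writing side could look roughly like the sketch below. It assumes the staging folder and the watched folder are on the same file system. `StagedWriter` and `publish` are illustrative names, and `Files.write` stands in for whatever slow upload or copy actually produces the file:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StagedWriter {

    /**
     * Writes the content into stagingDir first, then moves the finished file
     * into watchedDir in a single step, so a watcher should never see a partial file.
     * ATOMIC_MOVE throws AtomicMoveNotSupportedException if the move cannot be
     * done atomically, typically when the folders are on different file systems.
     */
    static Path publish(Path stagingDir, Path watchedDir, String fileName, byte[] content)
            throws IOException {
        Path staged = stagingDir.resolve(fileName);
        Files.write(staged, content); // the slow part happens outside the watched folder
        Path target = watchedDir.resolve(fileName);
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this arrangement the watcher can act on the first ENTRY_CREATE it sees, since the file only appears in the watched folder once it is complete.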


I know it's an old question but maybe it can help somebody.

I had the same issue, so what I did was the following:

if (kind == ENTRY_CREATE) {
    System.out.println("Creating file: " + child);

    long previousSize;
    long currentSize = child.toFile().length();

    // Sample the file size once per second until it stops growing
    do {
        previousSize = currentSize;
        Thread.sleep(1000);
        currentSize = child.toFile().length();
    } while (previousSize < currentSize);

    System.out.println("Finished creating file!");
}

While the file is being written, it keeps getting bigger. So I compare two size readings taken one second apart, and the app stays in the loop until both readings are the same.


Looks like Apache Camel handles the file-not-done-uploading problem by trying to rename the file (java.io.File.renameTo). If the rename fails, the read lock has not been acquired, so it sleeps and tries again until a timeout is reached. When the rename succeeds, it renames the file back and then proceeds with the intended processing.

See operations.renameFile below. Here are the links to the Apache Camel source: GenericFileRenameExclusiveReadLockStrategy.java and FileUtil.java

public boolean acquireExclusiveReadLock( ... ) throws Exception {
   LOG.trace("Waiting for exclusive read lock to file: {}", file);

   // the trick is to try to rename the file, if we can rename then we have exclusive read
   // since its a Generic file we cannot use java.nio to get a RW lock
   String newName = file.getFileName() + ".camelExclusiveReadLock";

   // make a copy as result and change its file name
   GenericFile<T> newFile = file.copyFrom(file);
   newFile.changeFileName(newName);
   StopWatch watch = new StopWatch();

   boolean exclusive = false;
   while (!exclusive) {
        // timeout check
        if (timeout > 0) {
            long delta = watch.taken();
            if (delta > timeout) {
                CamelLogger.log(LOG, readLockLoggingLevel,
                        "Cannot acquire read lock within " + timeout + " millis. Will skip the file: " + file);
                // we could not get the lock within the timeout period, so return false
                return false;
            }
        }

        exclusive = operations.renameFile(file.getAbsoluteFilePath(), newFile.getAbsoluteFilePath());
        if (exclusive) {
            LOG.trace("Acquired exclusive read lock to file: {}", file);
            // rename it back so we can read it
            operations.renameFile(newFile.getAbsoluteFilePath(), file.getAbsoluteFilePath());
        } else {
            boolean interrupted = sleep();
            if (interrupted) {
                // we were interrupted while sleeping, we are likely being shutdown so return false
                return false;
            }
        }
   }

   return true;
}
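The same trick can be sketched with plain java.nio. This is not Camel's code, just an illustration of the idea. `RenameProbe`, `canRename` and the ".probe" suffix are made-up names, and the check only means something on platforms where a file that is still open for writing cannot be renamed (often the case on Windows, generally not on UNIX-like systems):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RenameProbe {

    /**
     * Tries to rename the file to a temporary name and back again.
     * Returns true only if both renames succeed.
     * ASSUMPTION: the platform refuses to rename a file that another
     * process is still writing; where it does not, this always returns true.
     */
    static boolean canRename(Path file) {
        // Hypothetical temporary name next to the original file
        Path probe = file.resolveSibling(file.getFileName() + ".probe");
        try {
            Files.move(file, probe);
        } catch (IOException e) {
            return false; // still in use, missing, or the probe name is taken
        }
        try {
            Files.move(probe, file); // rename it back so it can be processed
            return true;
        } catch (IOException e) {
            return false; // the file is now left under the probe name
        }
    }
}
```

A caller would retry `canRename` with a sleep and an overall timeout, as the Camel loop above does.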


Comments
  • "I tried putting the watcher in a loop (which generated exceptions until the file could be moved) but this seemed pretty inefficient." Yes, this is an awful solution. Exceptions are not made for managing control flow.
  • Sadly @ntmp, from what I've tested so far, looking for exceptions was the best way to tell that the OS was still "writing" or "copying" the file. But I agree with @Sean Patrick Floyd that it is a terrible way to make it work. Personally I wish the check was part of the java.io.File API. Not sure why it wasn't. Would be left up to the JVM guys to implement and make it easier for us developers....
  • The "check for exception" approach won't even work on UNIX, since UNIX filesystems do not lock files that are being written. On UNIX, java will happily move the partially written file, resulting in corrupted data.
  • Best answer in this thread - it's 2013 and have they already fixed this in Java, or is it still necessary to use code like this?
  • Method handleExpiredWaitTimes is removing entry while iterating so iterator should be used.
  • The problem is I have very little control over the client creating the files. I cannot add a unique prefix. I can specify the folder the files are written to, but I can't tell the client to move them to another folder when they are done writing.
  • @ntmp Did you find a solution to this problem? Please share it with me, as I am facing the same kind of issue.
  • Not sure if this will work on Win7 because, when copying a file, Win7 allocates all the necessary space in the hard disk and then "fills" it with the file's bytes.
  • I've tried a quick test with one thread writing to the file while other thread checks the canWrite() method but it always returns true.
  • Actually, I believe canWrite() just checks with the OS whether you have permission to write. You may have permission from a security standpoint, but that says nothing about whether another process has finished writing the file.