performance - Why does the java DirectoryStream perform so slow? -


i've done testing streams in special directorystreams of nio-package. try list of files in directory sorted last modified date , size.

the javadoc of old file.listfiles() stated note method in files:

note files class defines newdirectorystream method open directory , iterate on names of files in directory. may use less resources when working large directories.

i run code down below lot of times (first 3 times below):

first-run:

run time of arrays.sort: 1516 run time of stream.sorted array: 2912 run time of stream.sorted list: 2875 

second-run:

run time of arrays.sort: 1557 run time of stream.sorted array: 2978 run time of stream.sorted list: 2937 

third-run:

run time of arrays.sort: 1563 run time of stream.sorted array: 2919 run time of stream.sorted list: 2896 

my question is: why streams perform bad?

import java.io.file; import java.io.ioexception; import java.io.uncheckedioexception; import java.nio.file.files; import java.nio.file.path; import java.nio.file.paths; import java.nio.file.attribute.filetime; import java.util.arrays; import java.util.comparator; import java.util.list; import java.util.stream.collectors; import java.util.stream.stream;  public class filesorter {    // sorts old young , big small   comparator<path> timesizecomparator = (path o1, path o2) -> {     int sorter = 0;     try {       filetime lm1 = files.getlastmodifiedtime(o1);       filetime lm2 = files.getlastmodifiedtime(o2);       if (lm2.compareto(lm1) == 0) {         long s1 = files.size(o1);         long s2 = files.size(o2);         sorter = s2.compareto(s1);       } else {         sorter = lm1.compareto(lm2);       }     } catch (ioexception ex) {       throw new uncheckedioexception(ex);     }     return sorter;   };    public string[] getsortedfilelistasarray(path dir) throws ioexception {     stream<path> stream = files.list(dir);     return stream.sorted(timesizecomparator).             map(path::getfilename).map(path::tostring).toarray(string[]::new);   }    public list<string> getsortedfilelistaslist(path dir) throws ioexception {     stream<path> stream = files.list(dir);     return stream.sorted(timesizecomparator).             map(path::getfilename).map(path::tostring).collect(collectors.             tolist());   }    public string[] sortbydateandsize(file[] filelist) {     arrays.sort(filelist, (file o1, file o2) -> {       int r = long.compare(o1.lastmodified(), o2.lastmodified());       if (r != 0) {         return r;       }       return long.compare(o1.length(), o2.length());     });     string[] filenames = new string[filelist.length];     (int = 0; < filenames.length; i++) {       filenames[i] = filelist[i].getname();     }     return filenames;   }    public static void main(string[] args) throws ioexception {     // file (io package)     file f = new file("c:\\windows\\system32");     // path (nio package)     path dir = paths.get("c:\\windows\\system32");      filesorter fs = new filesorter();      long before = system.currenttimemillis();     string[] names = fs.sortbydateandsize(f.listfiles());     long after = system.currenttimemillis();     system.out.println("run time of arrays.sort: " + ((after - before)));      long before2 = system.currenttimemillis();     string[] names2 = fs.getsortedfilelistasarray(dir);     long after2 = system.currenttimemillis();     system.out.             println("run time of stream.sorted array: " + ((after2 - before2)));      long before3 = system.currenttimemillis();     list<string> names3 = fs.getsortedfilelistaslist(dir);     long after3 = system.currenttimemillis();     system.out.             println("run time of stream.sorted list: " + ((after3 - before3)));   } } 

update

after applying code peter have results:

run time of arrays.sort: 1615 run time of stream.sorted array: 3116 run time of stream.sorted list: 3059 run time of stream.sorted list caching: 378 

update 2

after doing research on solution of peter, can say, reading file attributes ex. files.getlastmodified must heavy crunch. changing part in comparator to:

  comparator<path> timesizecomparator = (path o1, path o2) -> {     file f1 = o1.tofile();     file f2 = o2.tofile();     long lm1 = f1.lastmodified();     long lm2 = f2.lastmodified();     int cmp = long.compare(lm2, lm1);     if (cmp == 0) {       cmp = long.compare(f2.length(), f1.length());     }     return cmp;   }; 

gets better result on computer:

run time of arrays.sort: 1968 run time of stream.sorted array: 1999 run time of stream.sorted list: 1975 run time of stream.sorted list caching: 488 

but can see, caching object best way. , jtahlborn mentioned, kind of stable sort.

update 3 (best solution i've found)

after bit more research, i've seen, methods files.lastmodified , files.size, both huge job on same thing: attributes. made 3 versions of pathinfo class test:

  1. peters version described down below
  2. an old style file version, path.tofile() once in constructor , values file f.lastmodified , f.length
  3. an version of peters solution, read attribute object files.readattributes(path,basicfileattributes.class) , done things on this.

putting in loop doing 100 times each, came these results:

after doing hundred times mean performance of peters solution: 432.26 mean performance of old file solution: 343.11 mean performance of read attribute object once solution: 255.66 

code in constructor of pathinfo best solution:

public pathinfo(path path) {   try {     // read whole attributes once     basicfileattributes bfa = files.readattributes(path, basicfileattributes.class);     filename = path.getfilename().tostring();     modified = bfa.lastmodifiedtime().tomillis();     size = bfa.size();   } catch (ioexception ex) {     throw new uncheckedioexception(ex);   } } 

my result: never read attributes twice , caching in object bursting performance.

files.list() o(n) operation whereas sorting o(n log n). far more operations inside sorting matter. given comparisons don't same thing, explanation. there lot of files same modification date under c:/windows/system32 meaning size checked quite often.

to show of time not spent in files.list(dir) stream, have optimise comparison data file obtained once per file.

import java.io.file; import java.io.ioexception; import java.io.uncheckedioexception; import java.nio.file.files; import java.nio.file.path; import java.nio.file.paths; import java.nio.file.attribute.filetime; import java.util.arrays; import java.util.comparator; import java.util.list; import java.util.stream.collectors; import java.util.stream.stream;  public class filesorter {      // sorts old young , big small     comparator<path> timesizecomparator = (path o1, path o2) -> {         int sorter = 0;         try {             filetime lm1 = files.getlastmodifiedtime(o1);             filetime lm2 = files.getlastmodifiedtime(o2);             if (lm2.compareto(lm1) == 0) {                 long s1 = files.size(o1);                 long s2 = files.size(o2);                 sorter = s2.compareto(s1);             } else {                 sorter = lm1.compareto(lm2);             }         } catch (ioexception ex) {             throw new uncheckedioexception(ex);         }         return sorter;     };      public string[] getsortedfilelistasarray(path dir) throws ioexception {         stream<path> stream = files.list(dir);         return stream.sorted(timesizecomparator).                 map(path::getfilename).map(path::tostring).toarray(string[]::new);     }      public list<string> getsortedfilelistaslist(path dir) throws ioexception {         stream<path> stream = files.list(dir);         return stream.sorted(timesizecomparator).                 map(path::getfilename).map(path::tostring).collect(collectors.                 tolist());     }      public string[] sortbydateandsize(file[] filelist) {         arrays.sort(filelist, (file o1, file o2) -> {             int r = long.compare(o1.lastmodified(), o2.lastmodified());             if (r != 0) {                 return r;             }             return long.compare(o1.length(), o2.length());         });         string[] filenames = new string[filelist.length];         (int = 0; < filenames.length; i++) {             filenames[i] = filelist[i].getname();         }         return filenames;     }      public list<string> getsortedfile(path dir) throws ioexception {         return files.list(dir).map(pathinfo::new).sorted().map(p -> p.getfilename()).collect(collectors.tolist());     }      static class pathinfo implements comparable<pathinfo> {         private final string filename;         private final long modified;         private final long size;          public pathinfo(path path) {             try {                 filename = path.getfilename().tostring();                 modified = files.getlastmodifiedtime(path).tomillis();                 size = files.size(path);             } catch (ioexception ex) {                 throw new uncheckedioexception(ex);             }         }          @override         public int compareto(pathinfo o) {             int cmp = long.compare(modified, o.modified);             if (cmp == 0)                 cmp = long.compare(size, o.size);             return cmp;         }          public string getfilename() {             return filename;         }     }      public static void main(string[] args) throws ioexception {         // file (io package)         file f = new file("c:\\windows\\system32");         // path (nio package)         path dir = paths.get("c:\\windows\\system32");          filesorter fs = new filesorter();          long before = system.currenttimemillis();         string[] names = fs.sortbydateandsize(f.listfiles());         long after = system.currenttimemillis();         system.out.println("run time of arrays.sort: " + ((after - before)));          long before2 = system.currenttimemillis();         string[] names2 = fs.getsortedfilelistasarray(dir);         long after2 = system.currenttimemillis();         system.out.println("run time of stream.sorted array: " + ((after2 - before2)));          long before3 = system.currenttimemillis();         list<string> names3 = fs.getsortedfilelistaslist(dir);         long after3 = system.currenttimemillis();         system.out.println("run time of stream.sorted list: " + ((after3 - before3)));         long before4 = system.currenttimemillis();         list<string> names4 = fs.getsortedfile(dir);         long after4 = system.currenttimemillis();         system.out.println("run time of stream.sorted list caching: " + ((after4 - before4)));     } } 

this prints on laptop.

run time of arrays.sort: 1980 run time of stream.sorted array: 1295 run time of stream.sorted list: 1228 run time of stream.sorted list caching: 185 

as can see, 85% of time spent obtaining modification date , size of files repeatedly.


Comments

Popular posts from this blog

PHP DOM loadHTML() method unusual warning -

python - How to create jsonb index using GIN on SQLAlchemy? -

c# - TransactionScope not rolling back although no complete() is called -