node.js - Fast folder hashing in Windows Node -


I'm building a node-webkit app that keeps a local directory in sync with a remote FTP server. To build the initial index when the app is run for the first time, I download an index file from the remote server containing hashes for all files and folders. I then run through this list and find matches in the user's local folder.

The total size of the remote/local folder can be over 10GB. As you can imagine, scanning 10GB worth of individual files can be pretty slow, especially on a normal HDD (not an SSD).

Is there a way in Node to efficiently get a hash of a folder without looping through and hashing every individual file inside? That way, if the folder hash differs, I can choose whether or not to do the expensive individual file checking (which is how I do it once I have a local index to compare against the remote one).

You could iteratively walk the directories, stat each directory and the files it contains (not following links), and produce a hash from the results. Here's an example:

    'use strict';

    // npm install siphash
    var siphash = require('siphash');
    // npm install walk
    var walk = require('walk');

    var key = siphash.string16_to_key('0123456789abcdef');
    var walker = walk.walk('/tmp', {followLinks: false});

    walker.on('directories', directoryHandler);
    walker.on('file', fileHandler);
    walker.on('errors', errorsHandler); // plural
    walker.on('end', endHandler);

    var directories = {};
    var directoryHashes = [];

    // Collect the stats reported by the walker under their root directory.
    function addRootDirectory(name, stats) {
        directories[name] = directories[name] || {
            fileStats: []
        };

        if (stats.file) directories[name].fileStats.push(stats.file);
        else if (stats.dir) directories[name].dirStats = stats.dir;
    }

    function directoryHandler(root, dirStatsArray, next) {
        addRootDirectory(root, {dir: dirStatsArray});
        next();
    }

    function fileHandler(root, fileStat, next) {
        addRootDirectory(root, {file: fileStat});
        next();
    }

    function errorsHandler(root, nodeStatsArray, next) {
        nodeStatsArray.forEach(function (n) {
            console.error('[error] ' + n.name);
            console.error(n.error.message || (n.error.code + ': ' + n.error.path));
        });
        next();
    }

    function endHandler() {
        Object.keys(directories).forEach(function (dir) {
            // Hash the collected stats for the directory, not just its name.
            var hash = siphash.hash_hex(key, JSON.stringify(directories[dir]));
            directoryHashes.push({
                dir: dir,
                hash: hash
            });
        });

        console.log(directoryHashes);
    }

You would of course want to turn this into some kind of command-line app that takes arguments, and double-check that the files are returned in the same order every time (maybe sort the file stats by file name prior to hashing!) so that siphash returns the same hash every time.
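To illustrate the ordering point, here is a minimal sketch of hashing a list of stat entries in a fixed order. It uses Node's built-in `crypto` module rather than siphash so that it runs without installing anything. The helper name `hashEntries` and the entry shape (`name`, `size`, `mtimeMs`) are assumptions made for the example, not part of the walk or siphash APIs.

```javascript
'use strict';

var crypto = require('crypto');

// Hypothetical helper: build a fingerprint from an array of stat-like
// entries ({name, size, mtimeMs}). Access times are deliberately left out,
// because they can change without the content changing.
function hashEntries(entries) {
    // Sort a copy by name with a plain code-unit comparison, so the order
    // does not depend on the locale or on the order the walker reported.
    var sorted = entries.slice().sort(function (a, b) {
        return a.name < b.name ? -1 : (a.name > b.name ? 1 : 0);
    });

    var hash = crypto.createHash('sha256');
    sorted.forEach(function (entry) {
        hash.update(JSON.stringify([entry.name, entry.size, entry.mtimeMs]));
    });
    return hash.digest('hex');
}

var a = hashEntries([
    {name: 'b.txt', size: 2, mtimeMs: 2000},
    {name: 'a.txt', size: 1, mtimeMs: 1000}
]);
var b = hashEntries([
    {name: 'a.txt', size: 1, mtimeMs: 1000},
    {name: 'b.txt', size: 2, mtimeMs: 2000}
]);

console.log(a === b); // true: same entries, different input order
```

Picking only a few fields also keeps the hash from changing when unrelated metadata differs between runs.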

This is not tested code. It is only meant to provide an example of where I'd start with this sort of thing.

Edit: To reduce dependencies, you could use Node's crypto lib instead of siphash if you want (require('crypto');), and walk/stat the directories and files yourself if you'd like, of course.
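A dependency-free version might look roughly like the sketch below. It is only one possible reading of the suggestion: it is synchronous and recursive for brevity, the helper name `hashDirectory` is made up for the example, and it fingerprints names, sizes and modification times only, never file contents.

```javascript
'use strict';

var crypto = require('crypto');
var fs = require('fs');
var path = require('path');

// Hypothetical helper: fingerprint a directory tree from metadata only.
function hashDirectory(dir) {
    var hash = crypto.createHash('sha256');
    var names = fs.readdirSync(dir).sort(); // fixed order for a stable hash

    names.forEach(function (name) {
        var fullPath = path.join(dir, name);
        var stats = fs.lstatSync(fullPath); // lstat: do not follow links

        if (stats.isDirectory()) {
            // Fold the subdirectory's fingerprint into its parent's.
            hash.update('d:' + name + ':' + hashDirectory(fullPath) + '\n');
        } else {
            hash.update('f:' + name + ':' + stats.size + ':' +
                stats.mtime.getTime() + '\n');
        }
    });

    return hash.digest('hex');
}

// Usage: node hashdir.js <directory>
if (process.argv[2]) {
    console.log(hashDirectory(process.argv[2]));
}
```

Because each directory's fingerprint includes those of its subdirectories, a mismatch at the top level can be narrowed down folder by folder before falling back to per-file checks. This only helps if the remote index is computed from the same metadata, and modification times in particular may not survive a transfer.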

