optimization - How to list directories faster?
I have a few situations in which I need to list files recursively, but my implementations have been slow. I have a directory structure with 92784 files. `find` lists the files in less than 0.5 seconds, but my Haskell implementation is a lot slower.

My first implementation took a bit over 9 seconds to complete, the next version a bit over 5 seconds, and I'm now down to a bit less than 2 seconds.
```haskell
import Control.Monad (forM)
import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))

listFilesR :: FilePath -> IO [FilePath]
listFilesR path =
  let isDODD "."  = False
      isDODD ".." = False
      isDODD _    = True
  in do
    allfiles <- getDirectoryContents path
    dirs <- forM allfiles $ \d ->
      if isDODD d
        then do
          let p = path </> d
          isDir <- doesDirectoryExist p
          if isDir
            then listFilesR p
            else return [d]
        else return []
    return (concat dirs)
```
The test takes 100 megabytes of memory (`+RTS -s`), and the program spends around 40% of its time in GC.
I was thinking of doing the listing in a WriterT monad, with a sequence as the monoid, to avoid the concats and list creation. Does that help? What else should I do?
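For reference, a minimal sketch of that WriterT-with-Seq idea (the name `listFilesW` and the exact monad stack are my own choices, and I have not benchmarked it):

```haskell
-- Sketch: accumulate result paths in a strict WriterT over Data.Sequence,
-- so each file is appended in O(1) instead of being concatenated into lists.
import Control.Monad (forM_)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Writer.Strict (WriterT, execWriterT, tell)
import Data.Foldable (toList)
import qualified Data.Sequence as Seq
import System.Directory (createDirectoryIfMissing, doesDirectoryExist,
                         getDirectoryContents)
import System.FilePath ((</>))

listFilesW :: FilePath -> IO [FilePath]
listFilesW root = toList <$> execWriterT (go root)
  where
    go :: FilePath -> WriterT (Seq.Seq FilePath) IO ()
    go path = do
      entries <- lift (getDirectoryContents path)
      forM_ entries $ \d -> case d of
        "."  -> return ()
        ".." -> return ()
        _    -> do
          let p = path </> d
          isDir <- lift (doesDirectoryExist p)
          if isDir
            then go p                    -- recurse into the subdirectory
            else tell (Seq.singleton p)  -- record one full path in the Seq
```

Data.Sequence gives O(1) appends, so the final `toList` is the only linear pass; whether this actually beats the concat version here would need measuring.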
Edit: I have since edited the function to use readDirStream, which helps keep memory down. There's still some allocation happening, but the productivity rate is >95% and it runs in less than a second.
This is the current version:
```haskell
import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)

list :: FilePath -> IO ()
list path = do
    de <- openDirStream path
    readDirStream de >>= go de
    closeDirStream de
  where
    go d []   = return ()
    go d "."  = readDirStream d >>= go d
    go d ".." = readDirStream d >>= go d
    go d x    =
      let newpath = path </> x
      in do
        e <- doesDirectoryExist newpath
        if e
          then list newpath   >> readDirStream d >>= go d
          else putStrLn newpath >> readDirStream d >>= go d
```
I think System.Directory.getDirectoryContents constructs the whole list at once and therefore uses a lot of memory. How about using System.Posix.Directory instead? System.Posix.Directory.readDirStream returns the entries one by one.
Also, the FileManip library might be useful, although I have never used it.
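For completeness, here is what I believe that would look like with FileManip's System.FilePath.Find module; since I haven't used the library, treat the exact predicate names as assumptions, and the name `listFilesF` is mine:

```haskell
-- Assumed API of the "filemanip" package (module System.FilePath.Find):
-- find takes a recursion predicate, a filter predicate, and a root path.
import System.FilePath.Find (FileType (RegularFile), always, fileType,
                             find, (==?))
import System.Directory (createDirectoryIfMissing)  -- used in the usage example
import System.FilePath ((</>))                      -- used in the usage example

-- List every regular file under the given root, recursing everywhere.
listFilesF :: FilePath -> IO [FilePath]
listFilesF = find always (fileType ==? RegularFile)
```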