Python - How to traverse a filesystem directory
Every so often you will find yourself needing to write code that traverses a directory. In my experience these tend to be one-off scripts or cleanup scripts that run from cron. Python provides several very useful methods for walking a directory structure. We cover the best of them.
Test directory structure
Here is my test filesystem tree. The root is /test.
~] tree -a /test
/test
├── A
│ ├── AA
│ │ └── aa.png
│ ├── a.png
│ └── a.txt
├── B
│ ├── BB
│ └── b.txt
├── broken_symlink -> /aaa
├── symlink -> /etc
├── .test
├── test.png
└── test.txt
python os.walk()
os.walk()
os.walk(top, topdown=True, onerror=None, followlinks=False)
- Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple
(dirpath, dirnames, filenames)
- dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name). Whether or not the lists are sorted depends on the file system. If a file is removed from or added to the dirpath directory during generating the lists, whether a name for that file be included is unspecified.
- If optional argument topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up). No matter the value of topdown, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated.
- By default, walk() will not walk down into symbolic links that resolve to directories. Set followlinks to True to visit directories pointed to by symlinks, on systems that support them.
Be aware that setting followlinks to True can lead to infinite recursion if a link points to a parent directory of itself. walk() does not keep track of the directories it visited already.
If you pass a relative pathname, don’t change the current working directory between resumptions of walk(). walk() never changes the current directory, and assumes that its caller doesn’t either.
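To make the topdown argument described above more concrete, here is a minimal sketch that records the order in which os.walk() visits directories. The helper names walk_order and build_sample_tree are made up for this illustration, and the sample tree is created in a temporary directory rather than in /test.

```python
import os
import tempfile


def walk_order(top, topdown=True):
    """Return visited directories (relative to top) in the order os.walk() yields them."""
    order = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=topdown):
        # Sorting in place makes the top-down order deterministic.
        # With topdown=False this has no effect on the traversal.
        dirnames.sort()
        order.append(os.path.relpath(dirpath, top))
    return order


def build_sample_tree(top):
    """Create A/AA and B/BB under top (hypothetical sample layout)."""
    os.makedirs(os.path.join(top, "A", "AA"))
    os.makedirs(os.path.join(top, "B", "BB"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        build_sample_tree(top)
        print("top-down: ", walk_order(top, topdown=True))
        print("bottom-up:", walk_order(top, topdown=False))
```

With topdown=True each directory is reported before its subdirectories. With topdown=False the subdirectories come first and the top directory is reported last, which is handy when a cleanup script needs to remove empty directories from the leaves upward.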
os.walk() example 1
import os
# Print the 3-tuple that os.walk() yields for every directory under /test
for root, subfolders, filenames in os.walk("/test"):
    print(root, subfolders, filenames)
# output:
/test ['A', 'B', 'symlink'] ['.test', 'test.png', 'test.txt', 'broken_symlink']
/test/A ['AA'] ['a.txt', 'a.png']
/test/A/AA [] ['aa.png']
/test/B ['BB'] ['b.txt']
/test/B/BB [] []
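Be aware: os.walk() reports a broken symlink as a file, but a symlink to an existing directory as a directory. See the first line of the output of os.walk() example 1: symlink is listed among the subfolders, while broken_symlink is listed among the filenames. Because followlinks defaults to False, the walk does not descend into symlink (/etc).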
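If a script needs to tell these cases apart, one possible approach is to check each name with os.path.islink() and os.path.exists(). The sketch below is only an illustration: the function name classify_links and the labels it returns are my own choices, and it assumes a system where symlinks are supported.

```python
import os
import tempfile


def classify_links(top):
    """Map every symlink found under top to 'dir', 'file' or 'broken'."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(top):
        # Symlinks to directories show up in dirnames, all other links in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            if not os.path.exists(path):  # exists() follows the link
                result[path] = "broken"
            elif os.path.isdir(path):
                result[path] = "dir"
            else:
                result[path] = "file"
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "real"))
        os.symlink(os.path.join(top, "real"), os.path.join(top, "symlink"))
        os.symlink(os.path.join(top, "missing"), os.path.join(top, "broken_symlink"))
        for path, kind in sorted(classify_links(top).items()):
            print(kind, path)
```

os.path.exists() returns False for a broken symlink because it follows the link, while os.path.islink() inspects the link itself, so the combination of the two identifies dangling links.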
os.walk() example 2
import os
# Print the full path of every non-directory entry under /test
for root, subfolders, filenames in os.walk("/test"):
    for file in filenames:
        print(os.path.join(root, file))
# output
/test/.test
/test/test.png
/test/test.txt
/test/broken_symlink
/test/A/a.txt
/test/A/a.png
/test/A/AA/aa.png
/test/B/b.txt
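Note: symlink does not appear in the output. It points to an existing directory, so os.walk() puts it in the list of subdirectories rather than in filenames.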
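If the listing should also include symlinks that point to directories, one option is to look through the subfolders list as well. This is a rough sketch under that assumption; iter_paths is a hypothetical helper name, not part of the os module.

```python
import os
import tempfile


def iter_paths(top):
    """Yield full paths of all files, plus symlinks that point to directories."""
    for root, subfolders, filenames in os.walk(top):
        for name in filenames:
            yield os.path.join(root, name)
        for name in subfolders:
            path = os.path.join(root, name)
            # Real directories are visited by the walk itself; only report links.
            if os.path.islink(path):
                yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "A"))
        open(os.path.join(top, "A", "a.txt"), "w").close()
        os.symlink(os.path.join(top, "A"), os.path.join(top, "symlink"))
        for path in sorted(iter_paths(top)):
            print(path)
```

Since followlinks is left at its default of False, the linked directory is reported once as a path and its contents are not walked a second time.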