Python List Files

Summary: in this tutorial, you’ll learn how to list files in a directory using the Python os.walk() function.

Sometimes, you may want to list all files in a directory for processing. For example, you might want to find all images in a directory and resize each of them. To list all files in a directory, you can use the os.walk() function.

The os.walk() function generates file names in a directory by walking the tree either top-down or bottom-up. The os.walk() function yields a tuple with three fields (dirpath, dirnames, and filenames) for each directory in the directory tree.
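
To make the three fields concrete, here is a minimal sketch that builds a small throwaway tree in a temporary directory and collects what os.walk() yields for it, first top-down and then bottom-up. The walk_records() helper and the names sub, a.txt, and b.txt are made up for illustration.

```python
import os
import tempfile


def walk_records(root, topdown=True):
    """Return what os.walk() yields, with dirpath made relative to root."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=topdown):
        rel = os.path.relpath(dirpath, root)
        records.append((rel, sorted(dirnames), sorted(filenames)))
    return records


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: tmp/a.txt and tmp/sub/b.txt
    os.mkdir(os.path.join(tmp, 'sub'))
    for name in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(tmp, name), 'w') as f:
            f.write('')

    top_down = walk_records(tmp)
    bottom_up = walk_records(tmp, topdown=False)

print(top_down)   # the parent directory comes first
print(bottom_up)  # the subdirectory comes first
```

Each record is one (dirpath, dirnames, filenames) tuple. The only difference between the two modes in this sketch is the order in which the directories are reported.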

Note that the os.walk() function examines the whole directory tree. Therefore, you can use it to get all files in a root directory and all of its subdirectories.
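
If you do not want the whole tree, the os.walk() documentation describes one way to limit the walk: in top-down mode, removing names from dirnames in place stops os.walk() from descending into those directories. The sketch below is only an illustration of that behavior. The walk_skipping() helper and the directory names blog and cache are assumptions, not part of this tutorial's example.

```python
import os
import tempfile


def walk_skipping(root, skip):
    """Yield file paths under root, ignoring directories named in skip."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Slice assignment edits the very list os.walk() uses to decide
        # where to go next, so pruned directories are never visited.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for filename in filenames:
            yield os.path.join(dirpath, filename)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: tmp/blog/page.html and tmp/cache/page.html
    for sub in ('blog', 'cache'):
        os.mkdir(os.path.join(tmp, sub))
        open(os.path.join(tmp, sub, 'page.html'), 'w').close()

    found = sorted(os.path.relpath(p, tmp) for p in walk_skipping(tmp, {'cache'}))

print(found)  # only the file under blog is reported
```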

Python list file example

Suppose you have a folder D:\web with the following directories and files:

D:\web
├── assets
|  ├── css
|  |  └── style.css
|  └── js
|     └── app.js
├── blog
|  ├── read-file.html
|  └── write-file.html
├── about.html
├── contact.html
└── index.html
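
If you don't have a D:\web folder, or you are not on Windows, the following sketch recreates the same layout with empty files inside a temporary directory so that you can try os.walk() against it. The make_sample_tree() helper and the SAMPLE_FILES list are hypothetical names introduced here for illustration.

```python
import os
import tempfile

# Relative paths of the files shown in the tree above.
SAMPLE_FILES = [
    'assets/css/style.css',
    'assets/js/app.js',
    'blog/read-file.html',
    'blog/write-file.html',
    'about.html',
    'contact.html',
    'index.html',
]


def make_sample_tree(root):
    """Create the sample files (empty) under root and return root."""
    for rel in SAMPLE_FILES:
        full = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, 'w'):
            pass  # an empty file is enough for listing
    return root


with tempfile.TemporaryDirectory() as tmp:
    path = make_sample_tree(tmp)
    # Count the files by walking the tree that was just created.
    created = sum(len(filenames) for _, _, filenames in os.walk(path))

print(created)  # number of files found in the sample tree
```

The temporary directory is deleted when the with block ends, so any code that reads the tree has to run inside that block.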

The following example shows how to use the os.walk() function to list all HTML files from the D:\web directory:

import os


path = 'D:\\web'

html_files = []

for dirpath, dirnames, filenames in os.walk(path):
    for filename in filenames:
        if filename.endswith('.html'):
            html_files.append(os.path.join(dirpath, filename))

for html_file in html_files:
    print(html_file)

Output:

D:\web\about.html
D:\web\contact.html
D:\web\index.html
D:\web\blog\read-file.html
D:\web\blog\write-file.html

How it works.

First, initialize a list to store the paths of the HTML files:

html_files = []

Second, call the os.walk() function to walk through the directories under the D:\web folder:

for dirpath, dirnames, filenames in os.walk(path):

In each iteration, dirpath holds the path of the current directory, dirnames holds the names of its subdirectories, and filenames holds the names of the files in it.

Third, loop over the filenames and add them to the html_files list if their extensions are .html:

# ...
    for filename in filenames:
        if filename.endswith('.html'):
            html_files.append(os.path.join(dirpath, filename))

Note that the os.path.join() function returns the full path of the file by joining dirpath with filename.

Finally, print the file paths in the html_files list:

for html_file in html_files:
    print(html_file)
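
As a quick illustration of the os.path.join() call used in the third step, this sketch joins a directory path and a file name. The separator depends on the operating system, so the check uses os.sep instead of hard-coding a backslash. The relative path web/blog is an assumption chosen so that the snippet behaves the same way outside Windows.

```python
import os

dirpath = os.path.join('web', 'blog')
filename = 'read-file.html'

# Join the directory and the file name with the platform's separator.
full_path = os.path.join(dirpath, filename)

# os.sep is '\\' on Windows and '/' on Linux and macOS.
expected = os.sep.join(['web', 'blog', 'read-file.html'])

print(full_path)
print(full_path == expected)
```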

Defining a reusable list files function

By using the os.walk() function, we can define a reusable list_files() function like this:

import os


def list_files(path, extensions=None):
    """ List all files in a directory specified by path
    Args:
        path - the root directory path
        extensions - an iterable of file extensions to include, pass None to get all files.
    Returns:
        A list of paths of the files that match the extensions
    """
    if extensions is not None:
        # str.endswith() accepts a tuple of suffixes
        extensions = tuple(extensions)

    filepaths = []
    for root, _, files in os.walk(path):
        for file in files:
            # a single endswith() check adds each file at most once
            if extensions is None or file.endswith(extensions):
                filepaths.append(os.path.join(root, file))

    return filepaths


if __name__ == '__main__':
    filepaths = list_files(r'D:\web', ('.html', '.css'))
    for filepath in filepaths:
        print(filepath)

Output:

D:\web\about.html
D:\web\contact.html
D:\web\index.html
D:\web\assets\css\style.css
D:\web\blog\read-file.html
D:\web\blog\write-file.html

Making the list files function more memory efficient

If the number of files is small, the list_files() function works fine. However, when the number of files is large, returning a large list of files is not memory efficient.

To resolve this, you can use a generator to yield one file at a time instead of returning a list:

import os


def list_files(path, extensions=None):
    """ List all files in a directory specified by path
    Args:
        path - the root directory path
        extensions - an iterable of file extensions to include, pass None to get all files.
    Yields:
        The path of each file that matches the extensions
    """
    if extensions is not None:
        # str.endswith() accepts a tuple of suffixes
        extensions = tuple(extensions)

    for root, _, files in os.walk(path):
        for file in files:
            # a single endswith() check yields each file at most once
            if extensions is None or file.endswith(extensions):
                yield os.path.join(root, file)


if __name__ == '__main__':
    filepaths = list_files(r'D:\web', ('.html', '.css'))
    for filepath in filepaths:
        print(filepath)
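
One practical consequence of the generator version is that the caller can stop early and resume later, because paths are produced only when they are requested. The sketch below uses a simplified stand-in generator named iter_files() (a hypothetical name, not the list_files() function above) together with itertools.islice() to take only the first two paths from a temporary directory of made-up files.

```python
import itertools
import os
import tempfile


def iter_files(path):
    """Yield the path of every file under path, one at a time."""
    for root, _, files in os.walk(path):
        for file in files:
            yield os.path.join(root, file)


with tempfile.TemporaryDirectory() as tmp:
    # Five hypothetical files: page0.html ... page4.html
    for i in range(5):
        open(os.path.join(tmp, f'page{i}.html'), 'w').close()

    gen = iter_files(tmp)
    first_two = list(itertools.islice(gen, 2))  # pulls two paths only
    remaining = list(gen)                       # resumes where it stopped

print(len(first_two), len(remaining))  # 2 3
```

Keep in mind that os.walk() still reads one directory at a time, so the memory saving comes from not holding the paths of the whole tree in a single list.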

Summary

  • Use the os.walk() function to list files in a directory recursively.
  • Define a reusable function for listing files in a directory using the os.walk() function.