Let's develop a data representation for file systems using the method of iterative refinement. The first decision we need to make is what to focus on and what to ignore.
Consider the directory tree in figure
and let's imagine how it
is created. When a user first creates a directory, it is empty. As time goes by,
the user adds files and directories. In general, a user refers to files by names
but thinks of directories as containers of other things.
Model 1: Our thought experiment suggests that our first and most primitive model should focus on files as atomic entities, say, a symbol that represents a file's name, and on the directories' nature as containers. More concretely, we should think of a directory as just a list that contains files and directories.
All of this suggests the following two data definitions:
A file is a symbol.
A directory (short: dir) is either
- empty;
- (cons f d) where f is a file and d is a dir; or
- (cons d1 d2) where d1 and d2 are dirs.
The first data definition says that files are represented by their names. The second one captures how a directory is gradually constructed by adding files and directories.
A closer look at the second data definition shows that the class of
directories is the class of Web pages of section
. Hence we
can reuse the template for Web-page processing functions to process
directory trees. If we were to write a function that consumes a directory
(tree) and counts how many files are contained, it would be identical to a
function that counts the number of words in a Web tree.
Translate the file system in figure
into a Scheme
representation according to model 1. Solution
Exercise 16.2.2
Develop the function how-many, which consumes a dir and
produces the number of files in the dir
tree. Solution
Model 2: While the first data definition is familiar to us and easy to use, it obscures the nature of directories. In particular, it hides the fact that a directory is not just a collection of files and directories but has several interesting attributes. To model directories in a more faithful manner, we must introduce a structure that collects all relevant properties of a directory. Here is a minimal structure definition:
(define-struct dir (name content))It suggests that a directory has a name and a content; other attributes can now be added as needed.
The intention of the new definition is that a directory has two attributes: a name, which is a symbol, and a content, which is a list of files and directories. This, in turn, suggests the following data definitions:
A directory (short: dir) is a structure:
(make-dir n c) where n is a symbol and c is a list of files and directories.
A list-of-files-and-directories (short: LOFD) is either
- empty;
- (cons f d) where f is a file and d is a LOFD; or
- (cons d1 d2) where d1 is a dir and d2 is a LOFD.
Since the data definition for dir refers to the definition for LOFDs, and the definition for LOFDs refers back to that of dirs, the two are mutually recursive definitions and must be introduced together.
Roughly speaking, the two definitions are related like those of parent and
list-of-children in section
. This, in turn, means
that the design recipe for programming from section
directly applies to dirs and LOFDs. More concretely, to design a
function that processes dirs, we must develop templates for
dir-processing functions and LOFD-processing functions
in parallel.
Show how to model a directory with two more attributes: a size and a systems attribute. The former measures how much space the directory itself (as opposed to its files and subdirectories) consumes; the latter specifies whether the directory is recognized by the operating system. Solution
Exercise 16.2.4
Translate the file system in figure
into a Scheme
representation according to model 2. Solution
Exercise 16.2.5
Develop the function how-many, which consumes a dir according to
model 2 and produces the number of files in the dir
tree. Solution
Model 3: The second data definition refined the first one with the introduction of attributes for directories. Files also have attributes. To model those, we proceed just as above. First, we define a structure for files:
(define-struct file (name size content))Second, we provide a data definition:
A file is a structure:
(make-file n s x) where n is a symbol, s is a number, and x is some Scheme value.
For now, we think of the content field of a file as set to empty. Later, we will discuss how to get access to the data in a file.
Finally, let's split the content field of dirs into two pieces: one for a list of files and one for a list of subdirectories. The data definition for a list of files is straightforward and relies on nothing but the definition for files:
A list-of-files is either
- empty, or
- (cons s lof) where s is a file and lof is a list of files.
In contrast, the data definitions for dirs and its list of subdirectories still refer to each other and must therefore be introduced together. Of course, we first need a structure definition for dirs that has a field for files and another one for subdirectories:
(define-struct dir (name dirs files))Here are the data definitions:
A dir is a structure:
(make-dir n ds fs) where n is a symbol, ds is a list of directories, and fs is a list of files.
A list-of-directories is either
- empty or
- (cons s lod) where s is a dir and lod is a list of directories.
This third model (or data representation) of a directory
hierarchy captures the nature of a file system as a user typically
perceives it. With two structure definitions and four data definitions, it
is, however, far more complicated than the first model. But, by starting
with a the simple representation of the first model and refining it step by
step, we have gained a good understanding of how to work with this complex
web of classes. It is now our job to use the design recipe from
section
for developing functions on this set of data
definitions. Otherwise, we cannot hope to understand our functions at all.