Many objects in our world are related to each other in a circular manner. We have parents; our parents have children. A computer may connect to another computer, which in turn may connect to the first. And we have seen data definitions that refer to each other.
Since data represents information about real-world objects, we will encounter
situations that call for the design of a class of structures with a circular
relationship. In the past, we have skirted the issue, or we used a trick to
represent such collections. The trick is to use an indirection. For example, in
section
, we associated each structure with a symbol,
kept a table of symbols and structures around, and placed symbols into
structures. Then, when we needed to find out whether some structure refers to
another, we extracted the relevant symbol and looked in the table to find the
structure for the symbol. While this use of indirection allows us to represent
structures with mutual references or structures in a cyclic relationship, it
also leads to awkward data representations and programs. This section
demonstrates that we can simplify the representation of collections with
structure mutation.
To make this idea concrete, we discuss two examples: family trees and simple graphs. Consider the case of family trees. Thus far, we have used two kinds of family trees to record family relationships. The first is the ancestor tree; it relates people to their parents, grandparents, and so on. The second is the descendant tree; it relates people to their children, grandchildren, and so on. In other words, we have avoided the step of combining the two family trees into one, the way it is done in the real world. The reason for skirting the joint representation is also clear. Translated into our data language, a joint tree requires that a structure for a father should contain the structures for his children, and each of the child structures should contain the father structure. In the past, we couldn't create such collections of structures. With structure mutations, we can now create them.
Here is structure definition that makes this discussion concrete:
(define-struct person (name social father mother children))The goal is to create family trees that consist of person structures. A person structure has five fields. The content of each is specified by the following data definition:
An family-tree-node (short: ftn) is either A person is a structure:
- false or
- a person.
(make-person n s f m c) where n is a symbol, s is number, f and m are ftns, and c is a (listof person).
As usual, the false in the definition of family tree nodes represents missing information about a portion of the family tree.
Using make-person alone, we cannot establish the mutual reference between a family tree node for a father and his child. Suppose we follow an ancestral tree strategy, that is, we create the structure for the father first. Then we can't add any child to the children field, because, by assumption, the corresponding structure doesn't exist yet. Conversely, if we follow a descendant tree strategy, we first create a structure for all of a father's children, but those structures can't contain any information about the father yet.
What this suggests is that a simple constructor for this kind of data isn't really enough. Instead, we should define a GENERALIZED CONSTRUCTOR that not only creates a person structure but also initializes it properly when possible. To develop this function, it is best to follow the real world, where upon the birth of a child, we create a new entry in the family tree, record the child's parents, and record in the existing parents' entries that they have a newborn. Here is the specification for just such a function:
;; add-child! : symbol number person person -> person ;; to construct a person structure for a newborn ;; effect: to add the new structure to the children of father and mother (define (add-child! name soc-sec father mother) ...)Its task is to create a new structure for a newborn child and to add the structure to an existing family tree. The function consumes the child's name, social security number, and the structures representing the father and the mother.
The first step of the design of add-child! is to create the new structure for the child:
(define (add-child! name soc-sec father mother)
(local ((define the-child
(make-person name soc-sec father mother empty)))
...))
This covers the first part of the contract. By naming the structure in a
local-expression we can mutate it in the body of the expression.
The second step of the design of add-child! is to add a body to the local-expression that performs the desired effects:
(define (add-child! name soc-sec father mother)
(local ((define the-child
(make-person name soc-sec father mother empty)))
(begin
(set-person-children! father
(cons the-child (person-children father)))
(set-person-children! mother
(cons the-child (person-children mother)))
the-child)))
Since there are two specified effects and since the purpose statement
also specifies a result, the body of the local-expression is a begin-expression with
three subexpressions. The first mutates father, adding
the-child to the list of children. The second mutates
mother in an analogous manner. The last one produces the desired
result.
Figure
illustrates the evaluation of an
application of add-child!:
(add-child! 'Ludwig 3
(make-person 'Adam ... ... ...)
(make-person 'Eve ... ... ...))
The top-half shows the new structure for 'Ludwig and how it refers
to the father and mother structures. Just as in
section With add-child! we can create family trees, one child at a time. What we need to learn is how to design functions that process this new class of family trees. In this case, we can almost always pick one of the two views that we used before: the ancestor family tree or the descendant family tree. Either view just ignores certain fields in the structures. Once we have chosen a view, we design the desired functions following the known recipes. Even if we decide to use the bi-directional relations in the new family tree representation, designing a function is usually simply a matter of formulating those auxiliary functions that correspond to the real-world family relationships and to compose them properly. The following few exercises demonstrate these principles.
Exercise 43.2.1
Modify add-child! so that it has the following contract:
;; add-child! : symbol number ftn ftn -> personThe function otherwise behaves just like the original version.
Once we have the modified function, there is no need for make-person any more. We can create all forms of person structures with add-child! directly.
Transliterate the family tree in figure
into the new
representation; use the new modified add-child! function
exclusively. Solution
Exercise 43.2.2
Develop the function how-many-ancestors, which consumes a family tree node and determines how many ancestors there are. The node itself counts as an ancestor. Solution
Exercise 43.2.3
Develop how-many-descendants, which consumes a family tree node and determines how many descendants there are. The node itself counts as a descendant. Solution
Exercise 43.2.4
Develop names-of-cousins. The function consumes a person and produces the names of the cousins.
Hints: (1) Don't forget to use Scheme's built-in functions for processing
lists. (2) Use a sufficiently large portion of your own family tree to test
the functions. (3) For the testing step,
compare the names of the results of the auxiliary functions with the
expected results. Because the structures are mutually referential, it is
difficult to compare them automatically. Alternatively, use eq?,
Scheme's intensional equality predicate, to compare two structures. Why
does this work? Solution
In sections
and
, we
encountered the problem of representing and traversing graphs. Recall that
a graph is a collection of nodes and connections between nodes. The graph
traversal problem is to determine whether there is a route from a node
labeled orig to one called dest. In a simple graph, each
node has exactly one one-way connection to another node.
Originally, we represented a graph as a list of named nodes. If one node
was connected to another, the corresponding structure for the first node
contained the name of the second node, not the node itself.
Exercise
introduced a vector-based
representation. Still, all of our representations used the indirection
trick, so that if we wanted to move from one node to another, we first had
to look up the connection in a table.
Using structure mutation, we can eliminate this indirection and create
structures for nodes that contain each other, even if the graph contains a
cycle. To understand how this works in a concrete manner, let's discuss how
to model simple graphs such as those in figure
and how
to design programs that find routes through such graphs. First, we need a
structure definition for nodes:
(define-struct node (name to))The name field records the name of the node, and the to field specifies to which other node it is connected. Second, we need a data definition:
A simple-graph-node (node) is a structure:
(make-node n t) where n is a symbol and t is a node.
The data definition is unusual in that it is self-referential, but it doesn't consist of several clauses. This immediately raises the question of how we can construct a node that complies with this definition. Clearly, applying make-node doesn't work; instead, we need to define a generalized constructor that immediately sets the to field of a node.
The generalized constructor consumes the atomic data for a node structure and constructs a legal node structure from there:
;; create-node : symbol -> node ;; to create a simple legal graph node with a-name in the name field (define (create-node a-name) (local ((define the-node (make-node a-name false))) ...))The natural candidate to place into the to field is the node itself. In other words, the generalized constructor creates a node that contains itself:
;; create-node : symbol -> node
;; to create a simple graph node that contains a-name and itself
(define (create-node a-name)
(local ((define the-node (make-node a-name false)))
(begin
(set-node-to! the-node the-node)
the-node)))
The generalized constructor makes the node using the ordinary constructor,
initializing the name field properly and putting false
into the to field. Although the latter is an improper action
according to our data definition, it is acceptable because it is
immediately corrected in the local-expression's body. Hence an application of
create-node produces a node as promised.
With create-node we can create the nodes in a graph, but we can't establish the connections between them. To connect two nodes, we must modify the to field of one of the structures so that it contains the other. While this suggestion is generally on target, it raises the problem of how to identify the nodes. The family tree example suggests one solution, namely, to introduce one variable definition per node. Another comes from our orginal work with graphs, where we represented graphs as lists of symbolic pairs of connections or lists of nodes or vectors of nodes. Here we pursue the second option:
A simple-graph is a (listof node).
Assuming we have a list of all nodes, say the-graph, and a function for looking up the node with a given name, say lookup-node, we can create a connection from one node to the other with a structure mutation:
(set-node-to! (lookup-node from-name the-graph)
(lookup-node to-name the-graph))
We can make connecting two nodes more convenient than that with an
auxiliary function:
;; connect-nodes : symbol symbol graph -> void
;; effect: to mutate the to field in the structure with
;; from-name in the name field so that it contains
;; the structure with to-name in the name field
(define (connect-nodes from-name to-name a-graph)
(set-node-to! (lookup-node from-name a-graph)
(lookup-node to-name a-graph)))
Defining lookup-node is an exercise in structural function design,
though it is best done using Scheme's assf function, which
abstracts this situation.
;; create-node : symbol -> node ;; to create a simple graph node that contains itself (define (create-node name) (local ((define the-node (make-node name false))) (begin (set-node-to! the-node the-node) the-node)));; connect-nodes : symbol symbol graph -> void ;; effect: to mutate the to field in the structure named ;; from-name so that it contains the structure named to-name (define (connect-nodes from-name to-name a-graph) (set-node-to! (lookup-node from-name a-graph) (lookup-node to-name a-graph)))
;; lookup-node : symbol graph -> node or false ;; to lookup up the node named x in a-graph (define (lookup-node x a-graph) ...)
;; the-graph : graph ;; the list of all available nodes (define the-graph (list (create-node 'A) (create-node 'B) (create-node 'C) (create-node 'D) (create-node 'E) (create-node 'F)))
;; setting up the graph: (begin (connect-nodes 'A 'B the-graph) (connect-nodes 'B 'C the-graph) (connect-nodes 'C 'E the-graph) (connect-nodes 'D 'E the-graph) (connect-nodes 'E 'B the-graph))
Now we can transliterate simple graphs into a Scheme
representation. Suppose we start with the graph in
figure
, which is reproduced here in a tabular format:
The first step is to create a list of all the nodes and to name it. The second
step is to establish the connections according to this
table. Figure
shows the corresponding Scheme expressions.
They are straight transliterations of the columns in the tabular representation
of the graph. There is no need to reconnect the 'F node because it is
already connected to itself.
Exercise 43.2.5
Draw a picture of (create-node 'A) using the boxes-in-boxes
approach from part
and the boxes-and-arrow approach
from part
. Solution
Exercise 43.2.6
Transliterate the given simple graph without creating a list of all nodes. Solution
Exercise 43.2.7
Develop the function symbolic-graph-to-structures. It consumes a list of pairs and creates a graph.
Example:
(define the-graph (symbolic-graph-to-structures '((A B) (B C) (C E) (D E) (E B) (F F))))Evaluating this definition is equivalent to evaluating the definitions in figure
Once we have a method for representing simple graphs, we can turn our
attention to the problem of finding a route from one node in the graph to
another. Recall the original specification from
section
:
;; route-exists? : node node simple-graph -> boolean ;; to determine whether there is a route from orig to dest in sg (define (route-exists? orig dest sg) ...)Of course, we must reinterpret the names for our data classes in the new context, but otherwise the specification is perfectly fine.
The development of the original function demonstrated two new ideas. First, the function uses generative recursion. Once it is known that orig and dest are distinct nodes, the search resumes from the node to which orig is connected. Second, the function requires an accumulator to remember which nodes have been visited. Without the accumulator, the function may revisit the same node over and over again.
So, let's start from the template for generative recursion:
(define (route-exists? orig dest sg)
(cond
[(eq-node? orig dest) true]
[else
(route-exists? ...
... dest sg)]))
The function eq-node? determines whether the two nodes are the same;
this may just use eq?, Scheme's intentional equality predicate, or
it may compare the names of the nodes, assuming they are unique. If the
nodes are the same, a route exists. If not, we can generate a new,
potentially useful problem by moving to the node to which orig is
connected. In the graph representation of section
,
this requires looking in sg. In our new graph representation, the
connection is a part of the node representation. Hence we can use
node-to instead of looking in sg:
(define (route-exists? orig dest sg)
(cond
[(eq-node? orig dest) true]
[else (route-exists? (node-to orig) dest sg)]))
The function definition shows that, so far, sg is useless. Because a node in the new graph representation contains its neighbors, and the neighbor contains its neighbor, and so on, there is no need to use the table.
The termination argument for this function fails, just as for the original
one in section
. To see why our new function may
fail to terminate, take a look at its definition. It doesn't contain
false, and the function cannot possibly produce
false--even though we know that our sample graph, for example,
doesn't contain a path from 'F to 'A or anywhere else.
If we inspect what happens with
(route-exists? (lookup-node the-graph 'F)
(lookup-node the-graph 'A))
we see that route-exists? repeatedly visits the node 'F. In short, it forgets what it has processed so far.
We know that equipping route-exists? with an accumulator overcomes this lack of knowledge, but that requires another table lookup. We can do better than that with a structure mutation that records a visit by the route-exists? function. To do that, the node structures need an addtional field; we call it visited:
(define-struct node (name visited to))
Initially the field contains false. As route-exists? visits a node, it puts true into the field:
(define (route-exists? orig dest sg)
(cond
[(eq-node? orig dest) true]
[(node-visited orig) false]
[else
(begin
(set-node-visited! orig true)
(route-exists? (node-to orig) dest sg))]))
To exploit this new knowledge, the function checks the new structure field
as one of the new termination conditions. If orig has been visited
before, there is no route because the function has discovered a cycle in
the graph.
The second structure mutation of this example illustrates two ideas. First, structure mutation can replace a table-based accumulator. In general, though, it is best to study a table-based version and to add structure mutations based on a solid understanding of the accumulated knowledge. Second, structure mutations can play a role in termination tests for generative recursion. After all, state change is motivated by the desire to remember things across function applications, and termination tests must discover whether things have changed. While the combination is rare, it is useful, and it appears time and again in the study of algorithms.
Exercise 43.2.8
The function route-exists? assumes that the visited fields of all the nodes are initially false. A single use of the function, however, sets some of the fields in a graph to true. This implies that the function cannot be used twice in a row.
Develop a revised version of route-exists?, with the same specification, that sets all visited fields to false before it searches for a route between the given nodes.
Determine the abstract running time of the new function, assuming the graph has N nodes. Solution
Exercise 43.2.9
Develop the function reachable. It consumes a node in a simple graph. Its effect is to place true into the visited fields of all those nodes that are reachable from the given node and to ensure that the visited fields of all other nodes are false. Solution
Exercise 43.2.10
Develop make-simple-graph, a function that manages the state of a locally defined graph. The function accepts a simple graph in the form of lists of pairs of symbols: (listof (list symbol symbol)). It supports four services:
Hint: Instead of using a list, the manager should use a node sequence,
which is analogous to the hand structure from
section
. A node sequence relies on the following
structure:
(define-struct sequence (node next))A sequence is similar to a list, but it supports structure mutations. Solution
The discussion of this section confirms the usefulness of the design recipes,
even for collections of structures that refer to each other. The most
important lesson is that such situations call for a generalized constructor, a
function that creates a structure and immediately establishes the necessary
connections. Generalized constructors correspond to the initializers of
section
; we have also seen the idea in
section
where we created a hand from a single card.
In some cases, such as the one for simple graphs, we may also want to introduce
auxiliary functions for mutating the structures a second time. Once we have
those functions, we can use the standard recipes, including those for
introducing additional structure fields.