[previous] [up] [next]     [index]
Next: Extended Exercise: Evaluating Scheme Up: More Self-referential Data Definitions Previous: Extended Exercise: Binary Search

Lists in Lists

The World Wide Web, or just ``the Web,'' has become the most interesting part of the Internet, a global network of computers. Roughly speaking, the Web is a collection of Web pages. Each Web page is a sequence of words, pictures, movies, audio messages, and many more things. Most important, Web pages also contain links to other Web pages.

A Web browser enables people to view Web pages. It presents a Web page as a sequence of words, images, and so on. Some of the words on a page may be underlined. Clicking on underlined words leads to a new Web page. Most modern browsers also provide a Web page composer. These are tools that help people create collections of Web pages. A composer can, among other things, search for words or replace one word with another. In short, Web pages are things that we should be able to represent on computers, and there are many functions that process Web pages.

To simplify our problem, we consider only Web pages of words and nested Web pages. One way of understanding such a page is as a sequence of words and Web pages. This informal description suggests a natural representation of Web pages as lists of symbols, which represent words, and Web pages, which represent nested Web pages. After all, we have emphasized before that a list may contain different kinds of things. Still, when we spell out this idea as a data definition, we get something rather unusual:

A Web-page (short: WP) is either
  1. empty;
  2. (cons s wp)
    where s is a symbol and wp is a Web page; or
  3. (cons ewp wp)
    where both ewp and wp are Web pages.

This data definition differs from that of a list of symbols in that it has three clauses instead of two and that it has three self-references instead of one. Of these self-references, the one at the beginning of a constructed list is the most unusual. We refer to such Web pages as immediately embedded Web pages.

Because the data definition is unusual, we construct some examples of Web pages before we continue. Here is a plain page:

'(The TeachScheme! Project aims to improve the 
  problem-solving and organization skills of high 
  school students. It provides software and lecture 
  notes as well as exercises and solutions for teachers.) 
It contains nothing but words. Here is a complex page:
'(The TeachScheme Web Page
  Here you can find: 
  (LectureNotes for Teachers) 
  (Guidance for (DrScheme: a Scheme programming environment)) 
  (Exercise Sets) 
  (Solutions for Exercises) 
  For further information: write to scheme@cs) 
The immediately embedded pages start with parentheses and the symbols 'LectureNotes, 'Guidance, 'Exercises, and 'Solutions. The second embedded Web page contains another embedded page, which starts with the word 'DrScheme. We say this page is embedded with respect to the entire page.

Let's develop the function size, which consumes a Web page and produces the number of words that it and all of its embedded pages contain:

;; size : WP -> number
;; to count the number of symbols that occur in a-wp 
(define (size a-wp) ...) 

The two Web pages above suggest two good examples, but they are too complex. Here are three examples, one per subclass of data:

(= (size empty)
   0) 
(= (size (cons 'One empty))
   1) 
(= (size (cons (cons 'One empty) empty))
   1) 
The first two examples are obvious. The third one deserves a short explanation. It is a Web page that contains one immediately embedded Web page, and nothing else. The embedded Web page is the one of the second example, and it contains the one and only symbol of the third example.

To develop the template for size, let's carefully step through the design recipe. The shape of the data definition suggests that we need three cond-clauses: one for the empty page, one for a page that starts with a symbol, and one for a page that starts with an embedded Web page. While the first condition is the familiar test for empty, the second and third need closer inspection because both clauses in the data definition use cons, and a simple cons? won't distinguish between the two forms of data.

If the page is not empty, it is certainly constructed, and the distinguishing feature is the first item on the list. In other words, the second condition must use a predicate that tests the first item on a-wp:

;; size : WP -> number
;; to count the number of symbols that occur in a-wp 
(define (size a-wp) 
  (cond 
    [(empty? a-wp) ...] 
    [(symbol? (first a-wp)) ... (first a-wp) ... (size (rest a-wp)) ...] 
    [else ... (size (first a-wp)) ... (size (rest a-wp)) ...])) 
The rest of the template is as usual. The second and third cond clauses contain selector expressions for the first item and the rest of the list. Because (rest a-wp) is always a Web page and because (first a-wp) is one in the third case, we also add a recursive call to size for these selector expressions.

Using the examples and the template, we are ready to design size: see figure [cross-reference]. The differences between the definition and the template are minimal, which shows again how much of a function we can design by merely thinking systematically about the data definition for its inputs.


;; size : WP -> number
;; to count the number of symbols that occur in a-wp 
(define (size a-wp) 
  (cond 
    [(empty? a-wp) 0] 
    [(symbol? (first a-wp)) (+ 1 (size (rest a-wp)))] 
    [else (+ (size (first a-wp)) (size (rest a-wp)))])) 
Figure: The definition of size for Web pages


Exercises

Exercise 14.3.1

Briefly explain how to define size using its template and the examples. Test size using the examples from above.

Exercise 14.3.2

Develop the function occurs1. The function consumes a Web page and a symbol. It produces the number of times the symbol occurs in the Web page, ignoring the nested Web pages.

Develop the function occurs2. It is like occurs1, but counts all occurrences of the symbol, including in embedded Web pages. Solution

Exercise 14.3.3

Develop the function replace. The function consumes two symbols, new and old, and a Web page, a-wp. It produces a page that is structurally identical to a-wp but with all occurrences of old replaced by newSolution

Exercise 14.3.4

People do not like deep Web trees because they require too many page switches to reach useful information. For that reason a Web page designer may also want to measure the depth of a page. A page containing only symbols has depth 0. A page with an immediately embedded page has the depth of the embedded page plus 1. If a page has several immediately embedded Web pages, its depth is the maximum of the depths of embedded Web pages plus 1. Develop depth, which consumes a Web page and computes its depth. Solution



[previous] [up] [next]     [index]
Next: Extended Exercise: Evaluating Scheme Up: More Self-referential Data Definitions Previous: Extended Exercise: Binary Search

PLT