This transition can rarely be expressed easily in words. Communication depends more on the ability of the teacher to demonstrate to the student that interpretation involves more than playing the notes exactly as they are written. Such demonstration requires cultivation of the student's ability to perceive that what the teacher does is really different from what the student is doing, and it also involves the teacher's ability to perceive what the student is or is not doing by way of appropriate interpretation. This is a very subtle process which has been examined at length by Donald Schön [Sch87] as a problem of educating talent in designing. Schön has analyzed several case studies of master classes in musical performance, which he interprets as dealing with "designing performance."
The greatest impediment to the teacher in this task is time. The music only exists while it is being played; so the teacher's ability to perceive the student goes as soon as it comes, so to speak. A teacher with a well-developed memory can generally manage under such temporal constraints; but less-experienced teachers are likely to find it more difficult for their perceptions to "keep up with the music."
One of the major virtues of MIDI [Loy89] technology is that it is now very easy to make a faithful recording, in digital form, of a performance on an electric piano whose keys are weighted to provide the same physical sensations as an acoustic piano [1]. Once these data have been recorded, they may be analyzed in a variety of different ways in the service of piano education. One of the most systematic approaches to this application has been the Piano Tutor system developed by Roger Dannenberg and his colleagues at Carnegie Mellon University [D+90]. As illustrated in Figure 1, Piano Tutor draws upon an expert system to coordinate the presentation of lessons from a database and to analyze errors in the performance of each lesson. Error analysis is based on a score following module which is responsible for matching the MIDI events of a piano performance with the actual notes in the music being performed. The error analysis, in turn, drives which lessons are next to be presented to the student on the basis of a model of what the student has learned. The multimedia elements of graphical display, demonstrations prerecorded on videodisc, and performances through a synthesizer all serve to supplement the basic content of each lesson.
Figure 1. Block diagram of Piano Tutor [D+90].
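As a rough illustration of what matching the MIDI events of a performance against the notes of a score involves, the following sketch aligns two pitch sequences after the fact using Python's standard difflib module. It is not the score following module of Piano Tutor; the function name match_performance and the reduction of each note to a bare MIDI pitch number are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher

def match_performance(score_pitches, played_pitches):
    """Align played MIDI pitches with score pitches (offline, pitch only).

    Returns (matched, missed, extra):
      matched: list of (score_index, played_index) pairs
      missed:  score indices with no corresponding played note
      extra:   played indices with no corresponding score note
    """
    matcher = SequenceMatcher(a=score_pitches, b=played_pitches, autojunk=False)
    matched, missed, extra = [], [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched.extend(zip(range(i1, i2), range(j1, j2)))
        else:
            # "delete", "insert" and "replace" all reduce to score notes
            # left unmatched and/or played notes left unmatched.
            missed.extend(range(i1, i2))
            extra.extend(range(j1, j2))
    return matched, missed, extra

# Hypothetical data: C D E F G in the score; the student plays F# instead of F.
score = [60, 62, 64, 65, 67]
played = [60, 62, 64, 66, 67]
matched, missed, extra = match_performance(score, played)
```

In this example the fourth score note is reported as missed and the fourth played note as extra, which is the kind of pairing one would expect a wrong note to produce. Timing, chords, and repeated passages would all need further treatment in any practical matcher.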
Piano Tutor was designed to be a system for beginning students who still have to acquire notation literacy, learning all the foundations concerned with what the different constructs of music notation represent. It assumes that this is knowledge that the student can acquire strictly through interaction with a computer, without a piano teacher explicitly being part of the educational process. Because about the only way notation literacy can be acquired is through extensive practice and exposure to the notation, this is probably a valid assumption. Good teachers can play a significant role in motivating their students, but the only issue which really matters in this case is how much time the student puts into becoming familiar with the notation. If software can be designed attractively enough to encourage the student to invest that time, there is no reason why it cannot substitute for a music teacher.
Indeed, while Piano Tutor demands rather extensive hardware and software, the basic principles behind its design have been simplified for the sake of commercialization. The Miracle system is perhaps the most successful of these commercial endeavors. Like Piano Tutor it introduces the student to the constructs of the notation through a series of graded lessons. However, its score following is far more simplistic than Piano Tutor's; and there is no expert system which plans the next lesson to give to the student. While there are no demonstration videos, added diversion is provided in the form of games which familiarize the student with the notation in a slightly more subversive manner. There are also facilities, like a "recording studio," which encourage the student to "just play around" with the instrument, a practice which is absolutely vital to developing a love for the music itself, as well as the proper technique for its performance.
Reading music notation, however, is not the goal of piano education. One takes piano lessons to learn how to perform music, and learning music notation is usually simply the first subgoal to be satisfied in the course of that educational process. Once that subgoal has been achieved, the role of the piano teacher becomes more significant.
If the computer is to play a role in this second, more critical, phase of piano education, we should not try to envisage that role as one of automating the teacher; rather, we should ask how the computer can facilitate what is clearly a very difficult communication process [Cla89]. One of the conclusions we may draw from the studies reported by Schön is that the role of the teacher is at least as important as that of the "information being acquired" by the student. This is because one of the most critical difficulties lies in whether the student has really grasped what the teacher is trying to say (and in whether the teacher can establish that the student has grasped it). This matter cannot be established through words. A student who knows how to say the right thing may not necessarily do it when the time comes to bring the fingers to the keyboard.
The problem thus has far more to do with perception than with the conventional paradigm of "knowledge acquisition" for expert systems [Cla89]; and, as far as music education is concerned, what most needs to be acquired is the skill of listening. The teacher can only assess the success of communication with the student upon hearing how the student is actually performing, and the student's own knowledge ultimately depends on an ability to hear a performance the same way the teacher does. A teacher who tries to correct a performance is usually trying to discuss certain features which the student does not yet hear. Improvement will only come when the student finally masters the ability to hear those same features.
One of the key limitations to effective teacher-student communication is memory. Because the music only exists while it is being played, communication requires reference to past experiences. Therefore, the first step towards facilitating communication is to make the past more accessible; and this may be achieved through the ability to record and play back performances and, more specifically, selected portions whose extent may best be specified by events in the score being performed. This need for selective playback is beyond the capability of the ordinary tape recorder, particularly if one wishes to make selections with respect to a musical score, rather than in terms of minutes and seconds from the beginning of the recording. There is also a need for selective playback of the same material from several different recorded performances, allowing the student performance to be compared against those of one or more teachers or against earlier recordings of the same student.
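Selecting playback by score position rather than by clock time can be sketched as follows. The sketch assumes that every recorded note has already been matched to a position in the score, expressed here as a (measure, beat) pair; the PerformedNote record and the functions select_passage and rebase are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class PerformedNote:
    measure: int      # score measure the note was matched to (1-based)
    beat: float       # beat within that measure
    onset_s: float    # recorded onset time in seconds
    pitch: int        # MIDI note number
    velocity: int     # MIDI key velocity (1-127)

def select_passage(notes, start, end):
    """Return the notes whose score position lies in [start, end].

    start and end are (measure, beat) tuples, so the selection is made
    in terms of the score rather than minutes and seconds.
    """
    return [n for n in notes if start <= (n.measure, n.beat) <= end]

def rebase(notes):
    """Shift onsets so that playback of the excerpt starts at time zero."""
    if not notes:
        return []
    t0 = min(n.onset_s for n in notes)
    return [(n.onset_s - t0, n.pitch, n.velocity) for n in notes]

# Two hypothetical recordings of the same piece, e.g. teacher and student.
teacher = [PerformedNote(1, 1.0, 0.00, 60, 70),
           PerformedNote(2, 1.0, 2.00, 64, 72),
           PerformedNote(3, 1.0, 4.00, 67, 75)]
student = [PerformedNote(1, 1.0, 0.00, 60, 90),
           PerformedNote(2, 1.0, 2.40, 64, 50),
           PerformedNote(3, 1.0, 4.50, 67, 95)]

# The same score excerpt (measures 2 through 3) taken from both recordings.
excerpts = [rebase(select_passage(p, (2, 1.0), (3, 1.0)))
            for p in (teacher, student)]
```

Because the excerpt is specified by score position, the same request retrieves corresponding material from every recording, even though the passage begins at a different clock time in each one.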
Given that the listening memory of both teacher and student may be enhanced, it is next necessary to ask the question of what one is listening for when such an experience is recalled (or, for that matter, while it is taking place). On the basis of interviewing piano teachers about the observations they tend to make during a student performance, this question may be restructured as a collection of related questions. We have identified four of these questions as having the greatest importance, and we shall now review them.
Are the dynamics being properly interpreted? This is one of the best examples of how interpretation must be more than simply "decoding" the notation. In the work of many composers, such as Wolfgang Amadeus Mozart, dynamic markings are few, if they are present at all [3], while they are far more abundant in compositions by, for example, Franz Liszt. However, even when they figure heavily in the notation, a precise rendering of all dynamic markings is not necessarily an accurate interpretation; or, put another way, dynamic markings indicate how a performance could sound, which need not also be a specification for how it should be executed. Furthermore, even when it is not explicitly notated, variation in dynamic level tends to be preferable to uniformity. Often very specific variations in dynamics serve as cues to delimit phrasing. These variations are not random but rather are patterns which have emerged from years of piano performance experience. Thus, while one expects dynamic levels to be relatively even, rather than wildly varying, one also does not want them to be strictly uniform.
Are the onset times of the notes being interpreted correctly? Absolutely precise interpretation is rarely desirable in performance. We all know the experience of tapping the foot to a "beat" defined by these onset times; but we are probably not aware of subtle variations in the rate at which the foot is tapped. Indeed, variation from a uniform tempo is as important as variation from uniform dynamics, often for the same reason: as an indication of "inflection" which sets off phrasing [DH92].
Is the articulation correct? This is basically the percentage of the time until the next note onset during which the current note sounds. Notes which are notated as staccato tend to sound for only about 50% of the time between successive onsets, while the sound of a slurred note tends to endure right up to the onset of its successor. In fact keyboard instruments, such as the piano, allow for the possibility that a note may sound beyond the onset of its successor, either because the key is still down or because the damper pedal is down. Articulation tends to define the character of a phrase, just as dynamics and duration contribute to its shape; so observation of a student's articulation technique is very important.
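The articulation measure described above can be written as the ratio of a note's sounding time to the interval until the next onset. The following is a minimal sketch, assuming the notes of a single voice are given as (onset, release) pairs in seconds taken from key presses alone; the function name is illustrative, and the effect of the damper pedal is deliberately ignored.

```python
def articulation_ratios(notes):
    """notes: list of (onset_s, release_s) for successive notes of one voice.

    Returns, for every note except the last, the fraction of the interval
    to the next onset during which the key is held down. Values near 0.5
    suggest staccato, values near 1.0 a slurred note, and values above
    1.0 an overlap with the following note.
    """
    ratios = []
    for (onset, release), (next_onset, _) in zip(notes, notes[1:]):
        ioi = next_onset - onset          # inter-onset interval
        if ioi <= 0:
            raise ValueError("onsets must be strictly increasing")
        ratios.append((release - onset) / ioi)
    return ratios

# Hypothetical voice: a staccato note, a slurred note, an overlapping note,
# and a final note that has no successor and therefore no ratio.
voice = [(0.0, 0.25), (0.5, 1.0), (1.0, 1.6), (1.5, 2.0)]
ratios = articulation_ratios(voice)
```

Here the three ratios come out at roughly 0.5, 1.0, and 1.2, corresponding to the staccato, slurred, and overlapping cases respectively.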
Are notes which are supposed to be played together properly synchronized? Keyboard music allows for the possibility of notes performed as either successions or simultaneities. Successions of notes tend to be called voices, and simultaneities are usually called chords. In general the simultaneities of chords are expected to be honored (unless some sort of arpeggiation is specified). Thus, the final aspect of performance which teachers listen for is the proper synchronization of notes which are supposed to sound together and for the even execution of arpeggiated chords.
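Both aspects of synchronization can be given simple numerical form. The sketch below is one possible formulation, not a description of any particular system: chord_spread measures how far apart in time the notes of a notated simultaneity actually began, and arpeggio_unevenness measures how far the gaps within a rolled chord depart from being equal. Both function names and the sample onset times are assumptions.

```python
def chord_spread(onsets_s):
    """Time between the earliest and latest onset of notes notated together."""
    return max(onsets_s) - min(onsets_s)

def arpeggio_unevenness(onsets_s):
    """Largest deviation of any gap between successive onsets from the mean gap.

    Zero means the arpeggiated chord was rolled perfectly evenly.
    """
    ordered = sorted(onsets_s)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    mean_gap = sum(gaps) / len(gaps)
    return max(abs(g - mean_gap) for g in gaps)

# Hypothetical onsets in seconds.
spread = chord_spread([1.000, 1.012, 1.030])                 # about 30 ms
unevenness = arpeggio_unevenness([2.00, 2.10, 2.20, 2.40])   # last gap is long
```

How large a spread is tolerable is a musical judgment that the sketch does not attempt to make; it only supplies the number on which such a judgment could be based.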
There are actually psychological problems which surround the concept of average loudness. Most important is the fact that perceived loudness is a non-linear function of frequency [Han89]. We have not tried to take this relationship into account in our representation. We recognize that the arithmetic mean of MIDI key velocities may turn out to be only a very rough approximation to what is perceived as average loudness. However, this approximation appears to have been suitable for the displays being presented.
The dynamic level of each note is coded by coloring the head of that note. A color coding technique was chosen as an easily understood means of annotating each note individually. Red indicates loud, and blue indicates soft. The degree of the dynamic level is indicated by the saturation of the color. An example of such a display is given in Figure 2. Mezzo-piano and mezzo-forte dynamics correspond to gray coloration, while piano, pianissimo, forte, and fortissimo are represented by their respective shadings of blue and red. (Unfortunately, this image had to be reproduced with gray levels; but color images may be obtained through the authors. Also, black indicates a note which was not played, while an X represents performance of a note not in the score. The combination of these two annotations is generally a sign of performance of a wrong note.) Clearly, it is not difficult to be aware of the overall dynamic level of a performance without any graphic assistance; but Figure 2 is most useful in providing a more detailed account of lack of evenness in dynamic control, which is manifest through abrupt changes in coloration.
Figure 2. A display of the dynamics in a performance.
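A color coding of this kind might be computed along the following lines. This is only a sketch under stated assumptions: the text does not give the actual mapping, so the reference velocity of 64, the full-scale range of 63, and the use of HSV saturation are all illustrative choices, as are the function names.

```python
import colorsys

def velocity_to_rgb(velocity, reference=64.0, full_scale=63.0):
    """Map a MIDI key velocity to an (r, g, b) triple with components in 0..1.

    Velocities above `reference` shade toward red, those below toward blue;
    saturation grows with the distance from `reference`, so velocities near
    the reference come out gray. Both parameters are illustrative choices.
    """
    deviation = (velocity - reference) / full_scale
    saturation = min(abs(deviation), 1.0)
    hue = 0.0 if deviation >= 0 else 2.0 / 3.0   # red or blue
    value = 0.5 + 0.5 * saturation               # mid-gray at 0, full color at 1
    return colorsys.hsv_to_rgb(hue, saturation, value)

def mean_velocity(velocities):
    """Arithmetic mean of key velocities: a rough proxy for average loudness."""
    return sum(velocities) / len(velocities)

gray = velocity_to_rgb(64)     # moderate dynamic level
loud = velocity_to_rgb(127)    # strongest red
soft = velocity_to_rgb(1)      # strongest blue
```

One variation would be to pass the mean velocity of the passage as the reference, so that coloration shows departures from the performer's own average level rather than from a fixed midpoint.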
The resulting values are plotted on a "speedometer graph," an example of which is illustrated in Figure 3; and they provide an annotation which represents the tempo of a performance. The horizontal axis of this graph corresponds to time and is aligned with the notes in the musical score. The vertical axis represents the speed values as the inverses of the normalized duration intervals, and it is scaled to be linear in these inverse duration values. If a metronome marking is given, then it is displayed as a baseline, a horizontal dashed line which runs through the entire score. Thus, the baseline in Figure 3 corresponds to a speed of 92 dotted quarter notes per minute (as specified in the score excerpt displayed in Figure 2). A speed of 95 dotted quarter notes per minute would correspond to a displacement above this baseline, and the linear scaling implies that a speed of 89 dotted quarter notes per minute will correspond to an equal displacement below the baseline.
Figure 3. A display of the "speedometer graph" annotation.
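One way to read "the inverses of the normalized duration intervals" is sketched below: each performed interval between onsets is divided by the notated length of the note, giving seconds per beat, and the result is inverted and scaled to beats per minute. This reading, the function names, and the sample data are assumptions; the beat unit is whatever the metronome marking uses (a dotted quarter note in the example discussed above).

```python
def speed_values(onsets_s, notated_beats):
    """Local tempo, in beats per minute, for each inter-onset interval.

    onsets_s:      performed onset times in seconds (n + 1 values)
    notated_beats: notated duration, in beats, of each of the first n notes

    Each interval is normalized by its notated length (seconds per beat)
    and then inverted, so a faster performance gives a larger value.
    """
    speeds = []
    for i, beats in enumerate(notated_beats):
        interval = onsets_s[i + 1] - onsets_s[i]
        if interval <= 0 or beats <= 0:
            raise ValueError("intervals and notated durations must be positive")
        seconds_per_beat = interval / beats
        speeds.append(60.0 / seconds_per_beat)
    return speeds

def displacement(speed_bpm, baseline_bpm):
    """Signed vertical offset from the metronome baseline on a linear axis."""
    return speed_bpm - baseline_bpm

# Hypothetical steady performance: two one-beat notes, then a two-beat note.
speeds = speed_values([0.0, 0.5, 1.0, 2.0], [1, 1, 2])

# Relative to a baseline of 92, speeds of 95 and 89 are equally displaced.
above = displacement(95, 92)
below = displacement(89, 92)
```

Because normalization removes the notated length, the long note in the example yields the same speed as the short ones, which is what allows notes of different values to share a single graph.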
Speeds of events in the right and left hands are computed separately and plotted on the graph in different colors. In this example note how the variation in the right hand graph indicates an uneven performance of the sixteenth notes against the more steady eighth note tempo in the left hand. The graph also indicates that the student is not slowing down the tempo to indicate the end of section at the double bar.
Clearly, there are some significant elements of performance which the speedometer graph does not capture. There is no attempt to represent speed for individual notes when there are multiple notes in the right or left hand. In such cases the graph is generally driven by the durations of the longest notes, which effectively provides an averaging process. Similarly, no attempts are made to alter the baseline to account for explicitly required changes in tempo. It is up to the teacher to combine listening with examination of the speedometer graph to assess whether the student is executing such alterations in tempo properly. Finally, the speedometer graph gives no explicit representation of whether or not the student is playing on the beat, although sharp variations in the graph often indicate uncertainty in honoring that beat.
It is also important to note that the speedometer graph is not a new construct. In [DH92] it is called a "tempo curve;" and other researchers of musical performance have probably coined other names for it. [DH92] questions the value of tempo curves on the grounds that they do not capture adequate information about how timing contributes to performance. For purposes of reproducing or synthesizing a performance from a score, this may be true; but, in our case, we are concerned with how a teacher may best examine the details of a student's timing. The speedometer graph has been seen to be useful in this respect.
Figure 4. A display of the articulation annotation.
It is also important to observe that this display indicates how articulations are executed. Whether or not they are perceived as they are executed depends on whether or not the dampers are raised by the damper pedal. Therefore, the articulation display is further annotated by a line with triangles indicating the raising and lowering of the dampers. This is illustrated in Figure 5. The articulation triangles then provide information about the student's fingering technique which may not be audible due to the effect of the damper pedal; and they also display how pedal use contributes to the perceived articulation [Lhe72]. The teacher may thus determine whether the student is using the damper pedal as a crutch to compensate for poor articulation technique or whether the pedal actually enhances the expression of articulation.
Figure 5. A display of use of the damper pedal.
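The distinction between executed and perceived articulation can be illustrated with a small calculation. The sketch assumes a simple on/off model of the damper pedal (in MIDI data the damper pedal is normally carried by controller 64) and ignores half-pedaling and the natural decay of the string; the function name and the sample times are hypothetical.

```python
def sounding_end(key_release_s, pedal_intervals):
    """Time at which the dampers actually stop a note.

    key_release_s:   time the key is released
    pedal_intervals: list of (down_s, up_s) spans during which the damper
                     pedal holds the dampers off the strings

    If the key is released while the pedal is down, the note keeps
    sounding until the pedal comes up; otherwise it ends at key release.
    """
    for down, up in pedal_intervals:
        if down <= key_release_s < up:
            return up
    return key_release_s

# Hypothetical note: onset 0.0 s, key released at 0.25 s, next onset at 0.5 s,
# with the pedal held down from 0.1 s to 0.6 s.
onset, release, next_onset = 0.0, 0.25, 0.5
pedal = [(0.1, 0.6)]

executed = (release - onset) / (next_onset - onset)
perceived = (sounding_end(release, pedal) - onset) / (next_onset - onset)
```

In this example the fingers execute a staccato (a ratio of 0.5), but with the pedal down the note is heard to overlap its successor (a ratio of about 1.2), which is the kind of discrepancy the combined display is meant to expose.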
Figure 6. A display of the synchronization annotation.
We are currently running usability tests at a local music school. Several teachers have been selected to participate, some with computer experience, some without, some generally favorable to the introduction of new technology into the classroom, some less so. The teachers also represent a wide variety of teaching experience. The students selected are taken from those preparing for the Grade 3 examination of the Associated Board of the Royal Schools of Music (ABRSM). Evaluation will take place over the next several months.
While the only evidence we have accumulated thus far has been anecdotal, many of the anecdotes have been very encouraging. Most important has been an eagerness of the students to work with the pianoFORTE environment. While many of the teachers were, at first, reluctant to work with the system, often due to personal insecurity regarding any form of computer usage, they were quickly encouraged by the eagerness of many of the students, who tended to approach interaction with the computer with the same enthusiasm they generally brought to computer games. One positive outcome was that students would occasionally assist the teacher in remembering the various aspects of operation instructions. Indeed, one external observer of the project commented that, by using pianoFORTE, two students could work together, alternating the roles of student and teacher between them. We feel there is a lot of potential in this approach. Students who are occasionally allowed to play the role of the teacher often feel a greater personal commitment to their work, not to mention a greater appreciation for just what the teacher has to do.
However, it is also important to observe that this new approach to communication will demand a new kind of skill from the teachers, who are not currently used to the idea that, for purposes of a critique, "viewing" a performance can be a valuable supplement to listening to it. The very perception of music may change through the use of these displays; but, ultimately, their primary function is to cultivate listening skills. With sufficient exposure and practice, the attentive student should eventually learn to hear those features which are first presented visually; so, even without the aid of computer displays, that student will then be better equipped to discuss problems of performance with any teacher, even one who has not had experience with the pianoFORTE system.
The graphic displays presented in Figures 2 through 6 all tend to concentrate on properties of individual notes. We like to think of pianoFORTE as providing the teacher with a metaphorical magnifying glass for certain significant aspects of student performances. However, one should not look at the world only through a magnifying glass. Performances need to be assessed at larger time scales as well. To some extent this may be achieved through the displays by detecting overall trends, but this is another example where listening must supplement seeing. "Stepping back" from an entire performance and attempting to assess it "as a whole" cannot be achieved with a magnifying glass. Once again, listening must come first; and pianoFORTE is not so much concerned with assuming the role of listening as with providing additional information, should that information turn out to be relevant.
If one of the key contributions of pianoFORTE has been to establish a set of concrete foundations for communication, then in the not-too-distant future it may no longer be necessary for teacher and student to occupy the same physical space. Thus, the possibility of remote teaching becomes viable for piano education. Because the bandwidth of MIDI data is comparatively low, relatively simple network facilities will allow a student to work at home and communicate with his teacher who is at another site. Because the MIDI data will be no different from that collected when the student is in the teacher's own classroom, the teacher will be able to observe the student's performances in exactly the same way and use the same displays to discuss progress with the student. Such communication could take place either in "real time," with the teacher observing and commenting from a remote site, or in "delayed time," where the teacher collects examples of a performance, examines them, and subsequently provides the student with comments. This will allow for greater flexibility in scheduling lessons. It will also increase the accessibility to music teachers for students and probably enable any given music school to provide quality education to more students. Nevertheless, there are definite disadvantages to any "delayed time" interactions. Often, the most valuable thing a teacher can do is demonstrate to the student while the student's performance is still fresh in both their minds. Such a comparative approach tends to lose its impact if the teacher's contribution is delayed.
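The observation that the bandwidth of MIDI data is comparatively low can be checked with a back-of-envelope estimate. The figures for the MIDI serial link (31,250 bits per second, ten bits on the wire per byte, three bytes per note message) come from the MIDI 1.0 specification; the rate of 20 notes per second is a hypothetical, deliberately generous figure for a dense passage, and pedal messages and any time stamps a network protocol would add are ignored.

```python
MIDI_BAUD = 31_250            # bits per second on a standard MIDI cable
BITS_PER_BYTE_ON_WIRE = 10    # 8 data bits plus start and stop bits
BYTES_PER_NOTE_EVENT = 3      # status, key number, velocity (no running status)

def performance_bytes_per_second(notes_per_second):
    """Bytes per second needed for the note-on and note-off of every note."""
    return notes_per_second * 2 * BYTES_PER_NOTE_EVENT

cable_capacity = MIDI_BAUD // BITS_PER_BYTE_ON_WIRE     # 3125 bytes per second
dense_passage = performance_bytes_per_second(20)        # hypothetical 20 notes/s
fraction_used = dense_passage / cable_capacity
```

Under these assumptions even a dense passage needs on the order of a hundred bytes per second, a small fraction of what a single MIDI cable carries.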
Because comparison is so important, one of our current activities involves developing a suitable "split screen" approach to presenting graphic displays. Rather than displaying two lines of piano music on the screen, a single line is displayed in both the upper and lower half of the screen, each with its own annotations. This makes it easier for two performances of the same passage to be compared. We have elected to pursue this split screen approach because the approaches we have been taking to these displays do not facilitate overlaying annotations of different performances.
One of the biggest dangers of computer music as a technology is that it detracts from the behavioral aspects of music as an art form. Music is neither the notes on a printed page nor the motor skills required for the proper technical execution of those notes. Rather, it is a far more elaborate complex of behavior in which the making of sounds is tightly coupled with their perception [Smo94]. Whether one is composing music, performing it, or just improvising, listening is still the paramount skill. Any music technology which does not account for listening runs the risk of short-changing its users. The ultimate goal of pianoFORTE is to make us all better listeners. This skill will be equally important whether we make our music at a piano keyboard or at a computer workstation.
One potential problem which should be acknowledged concerns the availability of repertoire. Usability testing is being performed with students who are preparing for a standardized examination procedure involving a limited number of pieces to prepare. (An ABRSM examination book consists of six compositions grouped into two lists for the lower grades. The student must prepare the first and second compositions on either list, as long as both are from the same list, and then select the third composition from either list. Alternatives may also be provided, but students tend to stick with what is available in the examination book.) Standardization has made it easy to begin to build a library of digitally-represented scores which correspond to ABRSM needs. Fortunately, once a composition has been encoded, it may be saved indefinitely; so the plan is to accumulate a library of encoded compositions, driven initially by ABRSM demands and subsequently by further requests from teachers and students [4]. Encoding is currently a difficult process, as the computer display is intended to reproduce the same page layout that the student sees in the examination book. However, recent progress in OCR technology is beginning to show signs that it will eventually be possible to automate this process [KI90].
[2] This behavior is potentially very rich, as has been observed by David Lewin [Lew86].
[3] Sometimes they are inserted by an editor who wishes to indicate a particular approach to performance.
[4] The content of any such library, however, would do well to honor the aforementioned limitations of electric pianos.