The importance of reading skills is grossly underestimated, or simply ignored, in computer science education. Early computer science courses tend to focus on activities we typically consider constructive (the result is a tangible product). The explicit goal is to have students producing programs written in a high-level language as quickly as possible. The justification for this approach is fairly straightforward: students must demonstrate their competence in language constructs, and program writing provides the opportunity to do so. Hence the focus is on language acquisition and on the use of the language to solve a variety of problems. Skill at code reading is, at best, an incidental outcome learned in the course of these activities.
At first glance this claim may seem hard to sustain, since all programming books contain source code as examples, and students are expected to "read" these in order to understand an algorithm or data structure. Furthermore, students are expected to learn new languages, and the language constructs are typically presented as code fragments (showing how a looping or selection statement works). Students read code fragments to understand the behavior of a construct. The focus, and hence the student's attention, is on the language construct, not on understanding how an assembly of statements solves a particular piece of the problem. This instructional regime, combined with the constructive approach described in the previous paragraph, leaves program comprehension an accidental outcome of computer science education. Systematic or thoughtful approaches to reading technical material of this kind are not an explicit part of the educational program, and any claim that students learn to read these products is probably exaggerated. As Deimel and Naveda (Deimel90) write, "the available evidence suggests that our current neglect of the topic cannot be justified by the argument that adequate program reading skills develop naturally and without special encouragement in students otherwise well prepared to enter professional practice" (p. 6).
The lack of attention to program comprehension is somewhat puzzling. Code reading, program understanding, and program comprehension appear to be essential both for learning computer science and programming and for the day-to-day work of the practitioner. Understanding an algorithm certainly requires considerable effort. If practitioners or students cannot fully comprehend the material presented, they are left to invent their own strategies, an admittedly less than optimal situation.
The world of software development abounds with occasions that require engaging coded passages to understand the material or the intent of the writer. Reading competence is necessary for any activity in which the reader must understand an artifact well enough to accomplish a task; before acting, the individual must gain a substantial understanding of the existing program by interacting with the code and with the program documentation, if it exists. The software maintainer, for example, must comprehend a given program sufficiently well to plan, design, and implement modifications (to extend, adapt, or correct an artifact without undue harm to its original integrity or structure). Whether or not adequate supporting documentation exists, the programmer is bound inextricably to the code to determine how the current program performs its tasks. It follows that one's ability to "make sense" of a legacy artifact is an economic issue for the industry, and hence a formidable teaching and training issue for those who work with novice practitioners. The problems of program understanding faced by the maintenance programmer are magnified by unstructured or poorly structured legacy code. Researchers in reverse engineering (Müller94, Storey96) are tackling these problems by building tools to support the comprehension activity.
Program comprehension is important for other industrial practices too. Numerous activities during development require an individual to take an unfamiliar artifact and gain enough from reading it to accomplish a task. Reviews, walk-throughs, and inspections require individuals not acquainted with particular artifacts to become sufficiently versed in their operation to identify problems and defects, or simply to conduct an intelligent conversation about how they work. The reader attempts to understand the product well enough to identify defects. Although these techniques are usually considered an effective mechanism for defect removal, Rifkin and Deimel (Rifkin94) hypothesized that they were sometimes unsuccessful because reviewers were unable to deal with the work product. Interestingly, organizations recognize the importance of thorough training in conducting inspections, but provide no guidance on how to look for defects or how to understand programs. What Rifkin and Deimel were able to demonstrate was that program reading and understanding are "teachable/learnable," and that training programs that help individuals learn these skills benefit the inspection/review task.
Aside from the importance of comprehension for industrial practice, improvement in code reading should support learning in computer science and software engineering classes. Students are typically given coded examples illustrative of a particular point. The effectiveness of the example is dependent on the ease with which the appropriate information can be extracted by the learner. A learner without strategies may not gain what was expected from the example or may do so only through the expenditure of undue effort. By improving the ability of students to interact with code-based material, instructional practices in these courses may be improved.
Program comprehension through reading appears to be an essential skill for software developers. As such, it seems reasonable to devote more explicit attention to helping students develop that skill during their studies. Skilled reading behavior would be built by engaging students early and often in guided, deliberate instructional activities that require acquiring information from code or code fragments. For our purpose, the term reading skills refers to the activities one engages in to make "sense" of software-related products (e.g., code, specifications, designs, test cases). The activity is guided and deliberate in that principles from research on program comprehension, software maintenance, and reverse engineering are integrated into the learning activities.
Programs are not read like novels. The flow of control in a program depends on the nature of the input and on the structure of the program, so programs have more of a hypertext flavor for the reader: when the reader encounters an item (a procedure call, say), a decision must be made whether to transfer attention to that region of the program or to continue reading the current passage. Yet, as with a novel, the programmer must uncover the plot(s) involved in the program and the roles the various constructs play in them.
Contrary to intuition, program understanding is a constructive process: the reader builds a model of how the program works by interacting with the artifact. Knowledge of the program is acquired from the code, comments, and associated documentation, and the reader's existing knowledge structures are particularly important in supplying specific information for the comprehension task. Exactly what knowledge is required to understand a program completely remains an open question. Brooks suggests that comprehension requires a problem domain model, a model for the problem, generic design templates, and knowledge of programming-language-specific characteristics. Letovsky suggests that a reader has a complete understanding of a program when she understands the goals of the program, how the goals and subgoals are achieved through program characteristics, and the role of the various program components in achieving those goals. The reader has a great effect on how successfully a particular program is read. Comprehension models suggest that the reader's knowledge of programming, of the programming language, and of the application domain, as well as his reading strategy, are important variables.
In a top-down strategy, the reader begins by gaining an understanding of the overall goals or purpose of the program. Each component of the program is then viewed in terms of how it contributes to that purpose.
The strategy employed in this activity is hypothesis formation and evaluation (Brooks83). For each component (procedure, function, loop, fragment), the reader conjectures its role, typically as a purely mental act, and then attempts to confirm or reject that hypothesis through closer examination. Rejection forces the reader to reformulate the hypothesis and continue the process, which goes on until the reader is satisfied with her understanding of the program.
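The conjecture-and-check cycle can be made concrete with a small, purely illustrative Python sketch. The fragment `f` and the helper `evaluate_hypothesis` are hypothetical and are not taken from Brooks. Running a candidate against random inputs merely stands in for the closer examination a reader usually performs by mental tracing. Agreement on sampled inputs can only fail to reject a hypothesis; it never proves one.

```python
import math
import random


# A hypothetical fragment a reader might meet with no helpful name or comment.
def f(a, b):
    while b:
        a, b = b, a % b
    return a


def evaluate_hypothesis(candidate, conjecture, trials=1000, seed=0):
    """Compare `candidate` with the reader's conjectured behavior on random
    non-negative integer pairs. Return False at the first mismatch (rejection),
    True if no counterexample is found (provisional acceptance)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randint(0, 10_000), rng.randint(0, 10_000)
        if candidate(a, b) != conjecture(a, b):
            return False  # rejected: the reader must reformulate the hypothesis
    return True  # not rejected on the sampled inputs


# Hypothesis 1: "f returns the larger of its two arguments"
print(evaluate_hypothesis(f, max))        # False
# Hypothesis 2 (reformulated): "f returns the greatest common divisor"
print(evaluate_hypothesis(f, math.gcd))   # True
```

The first hypothesis is rejected by a counterexample (for instance, `f(12, 8)` is 4, not 12), which forces a reformulation. The second survives, so the reader provisionally accepts it and moves on to the next component.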
A bottom-up strategy proceeds in the opposite direction. The reader begins with the details, understanding small fragments, which are then composed into larger aggregates whose purpose is constructed from their constituent parts. This process continues until the function of the program is discovered. The bottom-up approach is sometimes referred to as stepwise abstraction (Linger79). During code reading, the reader examines critical subroutines in the program and determines their function. Once a function is determined, that behavior can be used to describe the block of code in place of the code itself (abstraction). The reader works up the program hierarchy in this manner, assembling abstractions to describe higher-level components until the function of the program is determined. The strategy demands an understanding of the code itself and requires the reader to map that code onto the problem-domain activity it suggests.
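Here is a minimal sketch of stepwise abstraction, built around a small hypothetical Python function `summarize` (not drawn from Linger79). The comments record the abstraction a reader might assign to each fragment. Once a loop is recognized as computing a sum or a minimum, the reader reasons with that abstraction rather than with the loop. The final step then composes those abstractions into a description of the whole function.

```python
# A hypothetical function read bottom-up. Each comment records the abstraction
# a reader might assign to a fragment once its behavior is understood.
def summarize(scores):
    # Fragment 1: accumulate every element -> abstraction: total = sum(scores)
    total = 0
    for s in scores:
        total += s

    # Fragment 2: track the smallest element -> abstraction: low = min(scores)
    low = None
    for s in scores:
        if low is None or s < low:
            low = s

    # Fragment 3: reason with the two abstractions, not the raw loops
    # -> abstraction: mean of the scores after dropping the lowest one
    if len(scores) < 2:
        return None
    return (total - low) / (len(scores) - 1)


# Top of the hierarchy: the whole function is now described by the composed
# abstraction "average score with the lowest score dropped".
print(summarize([70, 80, 90, 50]))  # 80.0
```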
The literature (vonMayrhauser95) indicates that programmers use some combination of the two strategies. Moreover, the approach to reading and comprehension seems to be influenced by the nature of the task required of the individual (e.g., review, maintenance, testing). Given the variety of settings in which code reading is useful, this raises important concerns and may suggest the need for increased attention across those task areas. Above all, it indicates a clear need for increased attention to the nature of the code reading activity in the educational setting.
Researchers (Deimel90) at the Software Engineering Institute, Carnegie Mellon University, offer a general set of guidelines as a starting point: