An assessment of reliability and performance quality analyses for direct behavioral observation




Analyses of the performance of the data-gathering system in a continuous time-sampling, direct behavioral observation method were performed in order (a) to determine whether high levels of interobserver-intercoder performance could be maintained under routine data-gathering conditions, (b) to document the characteristics of the observation system so that newer, more efficient versions could be tested, and (c) to develop a means of analysis that could be used for performance feedback to the observers and coders. The chosen method of analysis was paired-comparison assessment of independently processed observational records. An average of ten percent of all observations scheduled across seven patients was randomly selected for performance evaluation. A narrative and a form version of the observational methodology were analyzed, using three different modes of comparison. For all paired comparisons, two observers simultaneously, but independently, observed the target patient. These records were independently coded by three different coders, one observer's version being duplicated to provide a pure intercoder comparison. The agreement analyses were based upon a second-by-second structured comparison of time according to recorded behavior. Agreement rates were calculated both for the entire observation and for activity-related behaviors. Analysis was also performed on the frequency distributions of behaviors over time to determine whether there were any differential rates of agreement or disagreement with respect to behavior code. A series of measures was developed from these analyses to describe observer-coder team performance in terms related to actual observer and coder performance. An evaluation method for comparing performance with a predicted agreement rate based on past performance was developed, as were proportional measures that gave meaning to overall and non-idle agreement rates.
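The second-by-second agreement calculation described above can be illustrated with a minimal sketch. The function name, the list-of-codes record representation, and the "idle" code are assumptions for illustration; the abstract does not specify the data format used in the study.

```python
def agreement_rates(record_a, record_b, idle_code="idle"):
    """Compare two independently coded records second by second.

    record_a, record_b: lists of behavior codes, one entry per second
    of the observation (a hypothetical representation).
    Returns (overall_agreement, non_idle_agreement) as proportions:
    overall counts every second; non-idle restricts the comparison to
    seconds where at least one record shows an active behavior.
    """
    assert len(record_a) == len(record_b), "records must span the same interval"
    pairs = list(zip(record_a, record_b))
    overall = sum(a == b for a, b in pairs) / len(pairs)
    # Non-idle agreement: ignore seconds where both records coded idle,
    # so long idle stretches cannot inflate the agreement rate.
    active = [(a, b) for a, b in pairs if a != idle_code or b != idle_code]
    non_idle = sum(a == b for a, b in active) / len(active) if active else 1.0
    return overall, non_idle

# Example: five seconds of observation, one disagreement.
a = ["idle", "idle", "eat", "eat", "walk"]
b = ["idle", "eat", "eat", "eat", "walk"]
overall, non_idle = agreement_rates(a, b)
```

The proportional, non-idle measure corresponds to the "activity-related" agreement rate mentioned above: restricting the denominator to active seconds is what gives the overall and non-idle rates their distinct meanings.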
Observer-coder team disagreements were also classified by the type of difference that was recorded. These performance analyses showed that this particular observational method could be used with high rates of agreement (M = 93%) over seven patients and two versions of the observational method. The narrative and the form versions of observing were shown to be variations of the same method. There were few differences in the rates of agreement or disagreement according to behavior code, and those that existed were shown to be due to differences in the way time was treated in the form and narrative versions. This study demonstrates that a series of related measures, presented as a constellation of performance rates, provides a much clearer picture of observer-coder team performance than does a single-number index of reliability. The developed constellation of measures was shown to have value as performance feedback for observers and coders and allows for a detailed evaluation of the actual performance capabilities of the observational method.
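Classifying disagreements by type, as described above, can be sketched as a per-second tally. The two-way typology used here (boundary/timing differences versus outright code differences) is an illustrative assumption, not the study's actual classification scheme, and the "idle" code and record format are likewise hypothetical.

```python
from collections import Counter

def classify_disagreements(record_a, record_b, idle_code="idle"):
    """Tally disagreeing seconds under a hypothetical two-way typology:

    'onset/offset' - one record is idle while the other shows activity,
                     i.e. a timing difference at a behavior boundary;
    'code'         - both records show activity, but the codes differ.

    record_a, record_b: lists of behavior codes, one entry per second.
    Returns a Counter mapping disagreement type to number of seconds.
    """
    tally = Counter()
    for a, b in zip(record_a, record_b):
        if a == b:
            continue  # agreement: not tallied
        if a == idle_code or b == idle_code:
            tally["onset/offset"] += 1
        else:
            tally["code"] += 1
    return tally

# Example: one boundary difference, one code difference.
types = classify_disagreements(["idle", "eat", "walk"],
                               ["idle", "idle", "eat"])
```

A breakdown of this kind is one way the abstract's "constellation of performance rates" could be assembled: rather than a single reliability index, each disagreement type is reported separately so observers and coders can see where their records diverge.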