School performance feedback systems: Design and implementation issues
Goedele Verhaeghe
Promotor: Prof. Dr. Martin Valcke
Proefschrift ingediend tot het behalen van de academische graad van Doctor in de Pedagogische Wetenschappen [Dissertation submitted to obtain the academic degree of Doctor in Educational Sciences]
2011
This Ph.D. research project has been funded by: The agency for Innovation by Science and Technology (IWT) GRANT NUMBER SBO 50194 (SCHOOL FEEDBACK PROJECT)
VOORWOORD [PREFACE]

Those that can, do research. Those that cannot, teach. Those that cannot teach, teach teachers. Those that cannot teach teachers, do educational research. (Anonymous, source: www.vob-ond.be)

This quotation, which a natural scientist once sent me with the best of intentions, will raise eyebrows among many colleagues. I cannot deny that I have had my doubts about this statement during my doctoral trajectory. What exactly does doing research entail? For me it covered many things. That I have learned a great deal from this quest myself is certain. In addition, I hope to have contributed to the research literature as well. A word of thanks is therefore in order for the opportunities I was given in this learning process. First of all, I thank my promotor Prof. Dr. Martin Valcke and the project coordinator and my office mate Dr. Jean Pierre Verhaeghe for the necessary support and appreciation. I also thank the faculty, the university and the IWT for the opportunities they offer to young people. A further word of thanks goes to everyone who made this doctoral research practically possible. My colleague from Antwerp, Prof. Dr. Jan Vanhoof, deserves more than an honourable mention here. There are, of course, also the students and the school leaders from whom I obtained my data. Without their cooperation there would be nothing to investigate. I want to thank the members of my guidance committee (Prof. Dr. Peter Van Petegem, Prof. Dr. Patrick Onghena, Jean Pierre and Martin) and the journal reviewers for lifting my work to a higher level with their constructive comments. Furthermore, I would like to thank all my colleagues for the pleasant atmosphere in the department. For every state of mind there was a listening ear. Although everyone was busy enough with his or her own work, there was always room for a pleasant and interesting conversation. Finally, I would like to dedicate this dissertation to three important men in my life. To you, dad, because you impressed upon us: “If you do something, do it well”. I hope I have succeeded in that. Collin, thank you for “everything”, a word that encompasses more than most people ever receive from anyone in their life. And little Remi, ma vraie joie de vivre, if anyone can conjure up a smile, it is you.

Goedele
Ghent, December 2010
TABLE OF CONTENTS

CHAPTER 1: GENERAL INTRODUCTION
1. Introduction: Moving forward by looking backward
2. Conceptual framework for School Performance Feedback Systems
3. Research context: Each school its own mirror
4. Problem statement
5. Dissertation overview: Purpose, research questions and research design
References

CHAPTER 2: CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS
Abstract
1. Introduction
2. Conceptual framework
3. Method
4. Results: Application of the framework
5. Discussion
6. Conclusion
References

CHAPTER 3: PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL PERFORMANCE FEEDBACK USE
Abstract
1. Introduction
2. Theoretical framework
3. Research questions
4. Research context
5. Research design
6. Findings and discussion
7. Implications, limitations and conclusion
References

CHAPTER 4: VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL FEEDBACK INFORMATION
Abstract
1. Introduction
2. Method
3. Results and discussion
4. General discussion and conclusion
References

CHAPTER 5: THE INFLUENCE OF COMPETENCES AND SUPPORT ON SCHOOL PERFORMANCE FEEDBACK USE
Abstract
1. Introduction and research questions
2. Theoretical framework
3. Methodology: research design, procedure and research instruments
4. Results
5. Conclusion and discussion
References

CHAPTER 6: EFFECTEN VAN ONDERSTEUNING BIJ SCHOOLFEEDBACKGEBRUIK [EFFECTS OF SUPPORT ON SCHOOL FEEDBACK USE]
Abstract
Samenvatting
1. Probleemstelling
2. Conceptueel kader
3. Methode
4. Resultaten
5. Discussie en conclusie
Literatuur

CHAPTER 7: GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK
1. Introduction
2. Overview of research objectives and main findings
3. General discussion: “Mirror, mirror on the wall”
4. Limitations of the studies and directions for future research
5. Implications of the results
6. Final conclusion
References

NEDERLANDSTALIGE SAMENVATTING [SUMMARY IN DUTCH]
1. Inleiding
2. Conceptueel kader
3. Het Schoolfeedbackproject: Een spiegel voor elke school
4. Onderzoeksdoelstellingen en -opzet
5. Voornaamste bevindingen
6. Conclusie
Literatuur

RESEARCH VALORISATION: PUBLICATIONS
CHAPTER 1 GENERAL INTRODUCTION
CHAPTER 1: GENERAL INTRODUCTION∗

1. Introduction: Moving forward by looking backward

“There was a time in education when decisions were based on the best judgements of the people in authority. It was assumed that school and district leaders, as professionals in the field, had both the responsibility and the right to make decisions about students, schools and even about education more broadly. They did so using a combination of intimate and privileged knowledge of the context, political savvy, experience and logical analysis. Data played almost no part in decisions. Instead, leaders relied on their tacit knowledge to formulate and execute plans. In the past several decades, a great deal has changed. Accountability has become the watchword of education and data hold a central place in the current wave of large-scale reform. At the same time, school leaders find themselves faced with challenges that are ill structured with more than a single, right answer that demand reflective judgements (King & Kitchener, 1994); judgements that require them to have knowledge and understanding in relationship to context and evidence. School leaders are caught in the nexus of accountability and improvement, trying to make sense of the role that data can and should play in school leadership.” (Earl & Fullan, 2003, p 383)

In recent years, the trend of decentralizing educational systems has spurred researchers to focus on school-based management and internal evaluation. Because schools are granted autonomy, governmental bodies expect them to be accountable for continuously monitoring their internal quality policy and improving their functioning (Hofman, Dijkstra, & Hofman, 2009; Leithwood, Aitken, & Jantzi, 2006; Nevo, 2002). As public institutions, schools are required to account for the resources invested in them. Besides this external drive, schools as learning organizations are supposed to
∗
Based on: Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188. Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in press). The influence of competences and support on school performance feedback use. Educational Studies. Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van ondersteuning bij schoolfeedbackgebruik [Effects of support on school feedback use]. Manuscript submitted for publication in Pedagogische Studiën. Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.
systematically gather data on their functioning for self-evaluation purposes. The underlying idea is that schools, as continuously developing organisms, need to adapt to and interact with their constantly changing environment (Earl & Fullan, 2003). “Moving forward by looking backward” characterizes this cyclic process. In this context, the current and past performance level of a school serves as a starting point for developing future plans and educational targets. Related buzzwords in current educational jargon are data-driven decision making, school accountability, school improvement and value added. Many of these terms are derived from the management literature, which stresses the function of schools as professional organizations. In order to make sound decisions, schools need to be informed about their functioning. Besides the experiences, intuitions and impressions of school staff, several data sources feed into the school’s self-evaluation process. Not only the school’s own class tests, school questionnaires, class lists, inspection reports and the like are used, but also school performance feedback provided by dedicated systems. These so-called school performance feedback systems are specifically designed to provide schools with confidential information on their functioning. They follow the trend of data-driven school improvement by fulfilling schools’ need for accessible, information-rich environments. Several local initiatives have been developed and implemented worldwide. However, little is yet known about the impact of these systems on schools’ functioning and performance (Coe & Visscher, 2002; Schildkamp, 2007; Schildkamp, Visscher, & Luyten, 2009; Visscher & Coe, 2003). The impact of these feedback interventions is therefore an interesting niche for educational research. It is worthwhile to consider not only possible school improvement effects, but also intended, unintended, desired and undesired outcomes. Furthermore, before looking at the final outcomes, a closer look is warranted at the process of feedback use, including the influencing (f)actors.
2. Conceptual framework for School Performance Feedback Systems

2.1. Data-driven school improvement

Data-driven decision making

Data-driven/-based decision making or data-driven school improvement can be defined as “systematically analyzing existing data sources within the
school, applying outcomes of analyses to innovate teaching, curricula, and school performance, and, implementing (e.g. genuine improvement actions) and evaluating these innovations” (Schildkamp & Kuiper, 2010, p 482). Gathering data in order to continuously improve school actions is the central goal of data-driven school improvement. This continuous movement is characterized by a cyclic process, illustrated by several models described in the educational management literature. A generic model is the Shewhart or Deming cycle (Deming, 1986), often applied in educational contexts: a Planning phase is followed by Doing, Checking and Acting (PDCA). Specific to data-driven decision making, these elements recur in several data-use models in both the practical and the research literature (Abbott, 2008; Learning Point Associates, 2004; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010; Mandinach, Honey, Light, & Brunner, 2008; Zupanc, Urank, & Bren, 2009). In these models, data are used to inform about the functioning of a school, to set goals and make sound decisions for improvement, and to evaluate the outcomes of these improvement actions.

School accountability and improvement

Most of the literature on data use stems from studies conducted in the United States, and thus from contexts in which school accountability has traditionally been stressed (e.g., Teddlie, Kochan, & Taylor, 2002). Recent studies are often situated within an educational context in which setting high standards and establishing measurable goals is believed to improve individual outcomes in education, as illustrated by the No Child Left Behind Act (e.g., Schildkamp & Teddlie, 2008). Therefore, most of these studies report on assessment data only. In recent years, however, more studies on data-driven decision making for school improvement have been published (e.g., Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009). In these studies, data-based decision making is conceptualized in a broader sense and does not merely focus on improving student outcomes and assessment data. Several data sources are integrated to base decisions on, such as self-evaluation data, results of school and student surveys, school inspection data, etc. (Schildkamp & Kuiper, 2010). Systems providing data for the purpose of school accountability are referred to as official accountability systems (Tymms, 1999). In a context in which schools are held accountable for publicly funded activities, external agencies generate data on schools’ functioning to inform diverse stakeholders about the return on investment. As opposed to these data systems, professional monitoring systems generate data for voluntary and
internal use by schools (Tymms, 1999). These monitoring systems are therefore more in accordance with data-driven school improvement. At first sight, the two motives for data use appear to be opposites. However, several studies illustrate the complementary and interacting position of school accountability and improvement (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009). For example, assessment data can be used in public rankings for accountability, while the same data are also used internally within the school after secondary analyses have been performed on them (e.g., adjustment for pupil background characteristics, calculation of value added). The resulting improvement actions are then considered to contribute to better pupil performances, which will be measured in the following assessment. This can be considered as internal evaluation in the service of external evaluation (Vanhoof & Van Petegem, 2007). Conversely, systems specifically designed to provide schools with confidential information can be of interest to school inspectorates seeking insight into a school’s functioning. If these inspectors act as critical friends, supporting the school as a learning organization, this external evaluation functions in the service of internal evaluation (Vanhoof & Van Petegem, 2007).

2.2. Performance indicator systems and school performance feedback systems

Performance indicators

To collect data on schools’ performance and functioning, official accountability and professional monitoring systems make use of performance indicators. Goldstein and Spiegelhalter define a performance indicator as “a summary statistical measurement on an institution or system which is intended to be related to the ‘quality’ of its functioning” (1996, p 385). Following Rowe and Lievesley, these performance indicators serve as “data indices of information by which the functional quality of institutions or systems may be measured and evaluated” (2002, p 1). Fitz-Gibbon and Tymms (2002) emphasize the systematic character of using performance indicators, mentioning that these indicators are collected at regular intervals to monitor a system’s performance. The content of these performance indicators covers not only schools’ output results, but also input, process and context information. These can include indicators on resource provision and funding, participation rates of pupils,
repetition rates, class sizes, factors affecting students’ progress rates, etc. (Rowe & Lievesley, 2002). To successfully serve schools in their data-driven school improvement, these indicators have to meet certain requirements. First, feedback needs to be relevant and useful (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002). Relevant feedback corresponds to the actual information needs of the users (Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). Furthermore, feedback needs to be accurate, which refers to the reliability and validity of the data gathered (Fitz-Gibbon, 1996; Heck, 2006; Rowe & Lievesley, 2002). Next, the cost-effectiveness of the indicator system is an important consideration (Fitz-Gibbon, 1996; Rowe & Lievesley, 2002). Related to this utility perspective, the performance indicators should be delivered in a timely manner, which refers to both the currency and the punctuality of the delivered feedback (Fitz-Gibbon, 1996; Heck, 2006; Rowe & Lievesley, 2002; Visscher, 2002). Furthermore, users need to accept the performance indicators and consider them to be fair. This fairness refers not only to striving towards unbiased results (Heck, 2006), but also to the interpretability, reliability, stability and incorruptibility of the reported performance indicators (Fitz-Gibbon, 1996). Lastly, performance indicators should strive towards beneficial effects and should avoid unwarranted harm (Fitz-Gibbon, 1996; Fitz-Gibbon & Tymms, 2002; Goldstein & Myers, 1996).

School Performance Feedback Systems (SPFSs)

School Performance Feedback Systems (SPFSs) are a particular type of performance indicator system; they “are information systems external to schools that provide them with confidential information on their performance and functioning as a basis for school self-evaluation” (Visscher & Coe, 2002, p xi). SPFSs primarily aim at supporting school improvement and internal quality policy. The different components of this definition require some explanation.
• The systemic organization of the feedback initiative: The feedback providers are bound to an organization and produce school performance feedback not as a one-shot activity but on a systematic basis. In line with the definition of performance indicators by Fitz-Gibbon and Tymms (2002), data are collected at regular intervals to monitor a system’s performance.
• The external component: SPFSs are external systems that offer their services to schools. These services include data gathering, analysis and reporting, and sometimes support in using the feedback provided.
• The goal of school improvement: This implies that SPFS developers provide the school with performance feedback on a confidential basis, in contrast with information made public for accountability reasons. By generating data for voluntary use by schools, SPFSs are considered professional monitoring systems (Tymms, 1999).
• The unit level of information: SPFSs offer feedback on the school’s functioning, which means they contain school-level information. Therefore, aggregation of individual pupil results is required.
• The content of the feedback: The content refers to the school’s performance and functioning. This functioning encompasses more than merely output results; it also refers to context, input and process related indicators.
• The confidential character of the data: The development and discussion of SPFSs receive increasing attention as more studies are published on the confidential use of information within schools. As a reaction to the negative consequences of making results public (e.g., measure fixation, misinterpretation; Smith, 1995), educational governments support research initiatives that promote low-stakes testing. Instead of competition and external drives for school improvement, the inherent need to monitor the quality of school functioning is stimulated by providing confidential information.
• The focus on feedback: In accordance with feedback intervention theories, the data delivered by SPFSs are aimed at reducing the gap between the intended and actual performance of actors (Black & Wiliam, 1998; Hattie & Timperley, 2007). For feedback to be effective, certain conditions regarding the tasks performed, the feedback itself and the situation need to be fulfilled (Kluger & DeNisi, 1996). Applied to SPFSs, this implies that the outcomes of feedback use will be determined by characteristics of the feedback reports, the educational context and the users. This will be briefly discussed in the next paragraph.
The research domain of school performance feedback systems is recent and rather unexplored. Studies on the actual use of these feedback systems and their impact on schools’ functioning are especially scarce. Therefore, a firm overall theoretical framework for describing and evaluating school performance feedback use and its effects is lacking.

2.3. Factors influencing school performance feedback utilization

Differences in the use of school feedback can be attributed to a variety of factors. The most commonly used framework is that of Visscher (2002;
Visscher & Coe, 2003). It has been applied in several studies on data use (e.g., Maier, 2010; Schildkamp & Teddlie, 2008; Schildkamp & Visscher, 2009; Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2009). This framework discerns four sets of factors influencing the use of performance feedback: the design process, the features of the underlying SPFS, the implementation process and the school organizational features. This framework served as a basis for the studies conducted in this dissertation, although some adaptations were made. Visscher and Coe embed the process of feedback use in the broader school environment, which we define as context-related factors. Furthermore, we distinguish support-related factors as a separate set instead of placing them within the implementation process and the characteristics of the feedback system. As a result, the following sets of influential factors are outlined: factors related to the educational context, to the school and its users, to the SPFS, and to support. As this framework will be described in more detail later in this dissertation, we limit ourselves here to the main components and ideas.

The educational context of SPFSs

Context-related factors that impact feedback use include the school policy strategies at the regional and/or governmental level (Sun, Creemers, & De Jong, 2007; Visscher, 2002). For instance, policies can contain clear expectations that schools make use of feedback information. Educational governments can stimulate feedback use through pressure and/or support (Visscher, 2002). Furthermore, feedback will be used differently depending on the context in which accountability and/or improvement play a role (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Visscher, 2002; Zupanc, Urank, & Bren, 2009). Moreover, data use will depend on the accessibility of data sources. For example, in Flanders no central examination system is available. This means there is no public reporting on school examination results and almost no high-stakes testing, in contrast to the educational context in the UK. Therefore, the data culture and the related data sources in English schools will clearly differ from those in Flemish schools. Educational inspectorates, in their role as guardians of quality and critical friends, may also promote the use of data (Vanhoof & Van Petegem, 2007). For example, Flemish schools are encouraged to inform the inspectorate about their functioning by means of output results. Depending on the prescriptions and expectations of these inspections, certain types of data use will be promoted.
Users of SPFSs

School- and user-related characteristics are also key variables explaining differences in school feedback use. Schildkamp and Kuiper (2010) mention the following school characteristics as having an influence on data use: the style of school leadership, the degree of teacher collaboration, the shared vision, norms and goals for data use, the available time to use data, the training provided for data management and use, the presence of a designated data expert in the school, and the pressure and support for using data. Furthermore, school performance levels also influence feedback use (Visscher, 2002; Visscher & Coe, 2003). Schools receiving positive feedback (large value added) will discuss the results differently compared to schools receiving a less positive picture (Schildkamp, 2007). In line with control theory, participants receiving negative feedback are more likely to make an effort to reduce the discrepancy between the negative feedback and the expected standards (Kluger & DeNisi, 1996). This will result in different policy implications. However, this theory does not hold in all cases; it is not unusual for school principals to withhold feedback information that does not fit the current policy plan (Van Petegem & Vanhoof, 2004). Considering personal characteristics of the feedback users, we refer first to the motivation and attitudes towards using an SPFS. Motivation varies from internal quality development or external accountability to policy preparation (Liket, 1992; van Aanholt & Buis, 1990). A negative attitude towards SPF is, according to Bosker, Branderhorst, and Visscher (2007), one of the main obstacles to the use of feedback information. Attitude is the most significant aspect determining a person’s willingness to invest time and energy in dealing with information (Williams & Coles, 2007) and the users’ belief that they need the data in order to improve education (Schildkamp & Kuiper, 2010). Furthermore, previous experiences with feedback use, general experience with school-related data, and the statistical knowledge and skills needed to interpret feedback reports will also influence feedback use. This data literacy “encompasses the strategies, skills and knowledge needed to define information needs, and to locate, evaluate, synthesize, organize, present and/or communicate information as needed” (Williams & Coles, 2007, p 188). Whereas most teachers have experience with school test data, pupil monitoring systems, and self-evaluations, in several studies school staff report lacking the skills and confidence to use data for school policy purposes (Earl & Fullan, 2003; Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Williams & Coles, 2007). Data literacy is a condition for being able to convert data into valuable and usable information (Earl & Fullan, 2003).
The current lack of know-how on making use of the information is an important obstacle (Kerr et al., 2006; Saunders, 2000; Van Petegem & Vanhoof, 2004; Williams & Coles, 2007). In addition to a lack of the capacities needed to interpret data, there is often a lack of well-developed research skills, such as formulating research questions and hypotheses (Earl & Fullan, 2003; Herman & Gribbons, 2001; Kerr et al., 2006).

Characteristics of feedback reports and the underlying SPFS

It is not the characteristics of the feedback (system) as such, but the users’ perception of these characteristics that mainly determines how feedback will be used (Visscher, 2002). Therefore, we refer to the quality characteristics of performance indicators outlined before (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). Consistent with our definition of SPFSs, feedback systems for school improvement should guarantee confidentiality and anonymity to the subjects and schools. At the level of content, feedback should be perceived as relevant, non-threatening, and corresponding to the actual informational needs (Schildkamp & Teddlie, 2008; Van Petegem & Vanhoof, 2007; Visscher, 2002). Information should also be up-to-date, reliable, and valid (Schildkamp & Teddlie, 2008; Visscher, 2002; Visscher & Coe, 2003). In terms of ethical issues, feedback should at least do no harm (Fitz-Gibbon & Tymms, 2002). For example, in some cases feedback can be threatening to the recipients’ self-esteem, particularly in a system of accountability (Visscher & Coe, 2003). Moreover, feedback should not harm subjects or schools on the basis of misleading information (Goldstein & Myers, 1996). Features of both the feedback reports and the underlying feedback system influence the outcomes of feedback use. No detailed frameworks or descriptions of these different components have been published, and research is lacking on the variety in school performance feedback systems. The publication by Visscher and Coe (2002) provides a first overview of SPFSs worldwide, but no detailed comparative study has been performed. Furthermore, the question of why feedback systems have been developed in a certain way remains unanswered. More information on the rationales of feedback designers for opting for certain features is required.
Support in using SPFSs

Considering the lack of data literacy skills, school feedback users request support, not only for interpreting the data but also for the further steps in data use. Accordingly, numerous studies stress the importance of providing feedback support (Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2007; Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009). This support can be provided by school staff within the school but also by external parties (e.g., educational support services or feedback suppliers), organized either formally or informally, through one-shot or long-term interventions, and involving school principals or (parts of) the school team. Furthermore, these support initiatives can be organized within or outside the school, which can be considered as onservice and inservice education and training (Gardner, 1995). School staff who are involved in SPFS training are more likely to read the feedback reports and adopt a more positive attitude (Tymms, 1995). However, research on the impact of support initiatives related to the use of SPF is scarce, as current support initiatives often lack empirical verification (Zupanc, Urank, & Bren, 2009).

2.4. School performance feedback use: Types, phases and effects

Types of school performance feedback use

School feedback can be used in several ways, depending on what feedback users aspire to. Rossi, Lipsey, and Freeman (2004) made a classification of types of evaluation use: instrumental, conceptual and symbolic/convincing use. This classification has been applied in studies on SPF use (Schildkamp, Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Visscher & Coe, 2003; Weiss, 1998). An instrumental use of feedback serves as a starting point for immediate policy-making decisions. For example, new reading methods are introduced because the previous method led to disappointing results. A conceptual use of feedback does not result in concrete actions but influences the decision-making process, which indirectly affects action. An example of conceptual use is an altered way of thinking about grade repetition when confronted with remarkably high repetition numbers for the school. Even if feedback does not influence one’s conceptualizations, it can affect the policy-making process in a symbolic way. This means feedback results serve to convince others of existing opinions and to support viewpoints in discussions (Visscher, 2002). Visscher and Coe (2003) added a fourth type of data use: strategic use. Feedback can be used in a strategic way for
accountability purposes, although this is not in line with a school improvement discourse. These four types of feedback use can be considered as intermediate results of feedback use that will eventually contribute to school improvement. For example, a conceptual use results in an altered way of thinking about pupil performances. This intermediate result can in the end lead to effects of feedback use, such as a stronger achievement orientation. In addition, feedback can also be used as a means to motivate or stimulate school staff to improve (Verhaeghe et al., 2010; Schildkamp & Kuiper, 2010). Finally, a pupil-directed use of data is observed when pupil-level data stimulate supporting individual pupils in their learning process (Verhaeghe et al., 2010).

Phases in school performance feedback use

In the framework of Visscher (2002; Visscher & Coe, 2003), SPFS usage is described only in terms of types of use. In addition, phases in use can also be discerned (Verhaeghe et al., 2010). In analogy with the definition of data-driven decision making by Schildkamp and Kuiper (2010), SPFS use encompasses the following stages: analyzing the data, applying the outcomes of these analyses, implementing innovations, and evaluating these innovations. Learning Point Associates (2004) also describes data use in phases: analyzing data patterns, generating hypotheses, developing goal-setting guidelines, designing specific strategies, defining evaluation criteria, and making the commitment with school staff to implement and evaluate these actions. Specific to school performance feedback use, the following successive stages in feedback use can be discerned (Verhaeghe et al., 2010; Verhaeghe et al., 2010):
• Receiving the feedback at the school
• Reading and discussing
• Interpretation
• Diagnosis
• Planning of improvement actions
• Implementation of improvement actions
• Evaluation of both the improvement actions and the process of feedback use.
Receiving SPF has turned out to be a necessary yet insufficient step, as both the schools and the feedback systems have to meet certain requirements before the feedback is actually used in practice (Verhaeghe et al., 2010; Visscher & Coe, 2003). One of the major phases in which school staff get stuck is the interpretation phase, due to a lack of the data literacy competences needed to process the information. Although several studies
report that school staff often struggle with data interpretation, an examination of existing SPF systems and their related literature reveals that research on user comprehension is scarce (Schildkamp & Teddlie, 2008). Few studies have examined the effectiveness of the various modes of explaining and representing data in school feedback reports. This is problematic considering the fact that SPF reports use complex concepts and graphical representations, whilst SPF users (i.e., school staff) are often not statistically skilled (Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000; Williams & Coles, 2007).

Effects of school performance feedback use

Feedback use should eventually lead to school improvement effects such as improved student outcomes, professional development, improved didactical approaches, a stronger achievement orientation of staff, etc. (Schildkamp & Teddlie, 2008). This positive feedback impact has been observed in several studies (Hammond & Yeshanew, 2007; Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009). However, as a result of the difficulties in data interpretation and use and the limited use of information, current research often reports disappointing results from school feedback use (Coe, 2002; Saunders & Rudd, 1999; Schildkamp, Visscher, & Luyten, 2009; Tymms, 1995; Van Petegem & Vanhoof, 2004; Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2009). Several studies show that the actual use of school performance feedback within schools is often limited, which may (partly) have been caused by the characteristics of these SPFSs (Earl & Fullan, 2003; Schildkamp & Kuiper, 2010; Schildkamp & Visscher, 2009; Verhaeghe et al., 2010; Coe & Visscher, 2002). In contrast to the intended effects, some literature findings refer to unintended and undesired effects of data use. For example, the (administrative) workload of teachers and principals may increase as a result of using an SPFS (Fitz-Gibbon & Tymms, 2002; Schildkamp & Teddlie, 2008). Moreover, participants may feel threatened by the evaluation, and evaluations may evoke defensiveness (Fitz-Gibbon & Tymms, 2002). Finally, using an SPFS may have a demotivating impact on teachers, especially in poorly performing schools (Van Petegem, Vanhoof, Daems, & Mahieu, 2005).
3. Research context: Each school its own mirror

Until now, only a limited number of initiatives to develop data systems have been undertaken in Flanders. The Flemish dislike of central examinations and the resulting lack of systematic data collection on the performance of pupils are in part responsible for this (Van Petegem et al., 2005). However, schools are required by law to monitor and improve their own quality in a systematic manner. How they do so is a matter for the individual school and is part of the autonomy which schools are granted in Flanders. Deregulation and decentralization are therefore a continuing part of the educational policy implemented in Flanders. Schools are becoming increasingly autonomous and are achieving a greater degree of self-direction. The Flemish government does not impose any formal systematic obligation upon schools to carry out self-evaluation, nor does it compel them to collect output data. Policy with regard to school feedback use is therefore primarily one of encouragement rather than strong pressure. When carrying out inspections, the education inspectorate is primarily concerned with schools’ output (in relation to their context, input and process), and this is not without consequences for the way in which schools look at their own functioning in general and their output in particular. Within this context of autonomy and absence of central examination data, several data initiatives have been taken. However, an SPFS accessible to all Flemish schools was nonexistent. Therefore, researchers from three Flemish universities (Katholieke Universiteit Leuven, Ghent University and University of Antwerp) shared their expertise in developing an SPFS for Flemish schools, named the School Feedback Project “Each school its own mirror”1. The main objective of the School Feedback Project is to provide schools with confidential information on their functioning in order to encourage data-driven school improvement. The feedback project uses data from the SiBO research project (Schoolloopbanen in het BasisOnderwijs [School Trajectories in Primary Education]), a longitudinal study that was set up to investigate the school careers of 6,000 children from a representative sample of Flemish schools, from the time they entered kindergarten until the end of primary education. Data are collected by means of standardized tests, surveys and observational data on child characteristics, family background, class characteristics, classroom
1
This research was supported by the agency for Innovation by Science and Technology (IWT), Grant number SBO 50194 (School Feedback Project). IWT is a Flemish government agency stimulating and supporting innovation by providing financial support to research institutes.
practices, teacher attitudes and subjective theory, and school characteristics (Verachtert, Van Damme, Onghena, & Ghesquiere, 2009; Verhaeghe, Maes, Gombeir, & Peeters, 2002). The tests focus on language learning (orthography, reading fluency, reading comprehension) and mathematics. Item response theory (IRT) based techniques are used to construct the test scores, which makes it possible to estimate growth curves. So far, the SPF project has been able to deliver trial versions of school feedback reports to the 195 participating primary school principals. The resulting trial feedback reports were delivered to the schools on a yearly basis. These individualized reports informed each school about the performance of its cohort under study. Results were reported for mathematics, reading fluency, and orthography, supplemented with information about pupil characteristics (child factors, home factors, and Dutch language skills at the start of Grade 1). The school-specific results were compared to the Flemish reference group. The central concepts in these reports (learning gain, value added, and adjusted scores) were explained in such a way that no prior statistical knowledge was required. The data were supported with graphical representations (i.e., boxplots, bar graphs, pie graphs, growth curves, and cross tables). The text of each report was standardized. The school principals were required to interpret the results for their school, based on the general information made available. The studies conducted in this dissertation depart from this research and development feedback project in order to contribute both to the further development of this SPFS and to scientific research on SPF use.
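To make the central reporting concepts more concrete, the sketch below shows, under simplifying assumptions, how adjusted scores and a school-level value-added indicator can be derived from pupil test data. It is a minimal illustration in Python with simulated data and a single-level regression; the School Feedback Project itself works with IRT-scaled scores and more elaborate (multilevel) growth models, and all variable names and parameters here are assumptions made purely for the example.

```python
# Minimal sketch (not the School Feedback Project's actual procedure):
# derive learning gain, adjusted scores and school-level value added
# from simulated pupil test data using a single-level OLS adjustment.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_schools, pupils_per_school = 50, 20
school = np.repeat(np.arange(n_schools), pupils_per_school)
school_effect = rng.normal(0, 2, n_schools)[school]      # simulated "true" value added
ses = rng.normal(0, 1, school.size)                      # pupil background characteristic
pre = 50 + 5 * ses + rng.normal(0, 8, school.size)       # score at the start of Grade 1
post = pre + 10 + 3 * ses + school_effect + rng.normal(0, 8, school.size)

df = pd.DataFrame({"school": school, "pre": pre, "post": post, "ses": ses})
df["learning_gain"] = df["post"] - df["pre"]              # raw gain per pupil

# Adjusted score: the part of the post-test that remains after accounting
# for prior attainment and a background characteristic.
model = smf.ols("post ~ pre + ses", data=df).fit()
df["adjusted"] = df["post"] - model.predict(df)

# Value added: the school mean of the adjusted scores, i.e. how much better
# (or worse) pupils perform than expected given their intake.
value_added = df.groupby("school")["adjusted"].mean().sort_values()
print(value_added.head())
```

In this toy setting the ranking in value_added roughly recovers the simulated school effects, which is the kind of signal the feedback reports aim to communicate to schools without requiring statistical expertise.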
4. Problem statement

The research literature on SPFSs reveals some limitations that require further examination. First, there is a lack of a firm theoretical framework for SPF use. Neither the different components nor the relations between the variables of the framework developed by Visscher (2002; Visscher & Coe, 2003) have been empirically validated as an overall structure. Further examination of the factors influencing school performance feedback use is required. In addition, there is a lack of detailed studies on the use and impact of existing school performance feedback initiatives (Coe & Visscher, 2002;
2
The number of the sample of SiBO schools receiving feedback reports from the School Feedback Project might slightly differ from study to study, due to school fusions or drop out.
Goldstein & Spiegelhalter, 1996; Schildkamp, 2007; Schildkamp, Visscher, & Luyten, 2009; Visscher & Coe, 2003). Evaluation research on the functioning and impact of SPFSs is warranted in order to evaluate the strengths and weaknesses of these types of feedback interventions. Several studies have reported on the limited data literacy skills of school staff in relation to data use. However, no detailed studies on SPFS user comprehension have been performed. This research topic is interesting from both a scientific and a practical point of view. As a consequence of the limited capacity of school staff to interpret and handle data, there is a large need for support initiatives. Not only is there a need to set up more support initiatives, but evaluation of current support is also warranted, as these initiatives often lack empirical verification (Zupanc, Urank, & Bren, 2009).
5. Dissertation overview: Purpose, research questions and research design

In the following chapters, five studies will be reported and discussed. In the next chapter, we provide a general introduction to the characteristics of SPFSs. A framework for SPFS characteristics will be applied to five SPFSs worldwide. This descriptive and analytic study illustrates the wide variety in features and also provides a discussion of the rationales for making choices in feedback design. Building on this framework of SPFS characteristics, Chapter 3 is devoted to a framework for SPFS use. Parts of this framework will be used in the studies described in the successive chapters. Based on the Visscher framework, influencing factors, SPF use and the resulting effects will be analyzed in the context of the School Feedback Project by examining users’ perceptions. Prompted by the call for research on feedback interpretability, the fourth chapter focuses on the representation and interpretation of central SPF concepts. Alternative representation modes for value added and learning gain are examined, integrating the literature on graphical data representation. Particular attention will be paid to misconceptions and interpretation difficulties. Chapters 5 and 6 tackle two crucial variables in SPF use: data literacy competences and support in using SPF. By reporting the results of both a quantitative (Chapter 5) and a qualitative (Chapter 6) study of a field experiment with participants of the School Feedback Project, these chapters result in recommendations for effective support in using SPF.
A final chapter enumerates the key findings from all studies by answering the research questions. A complementary overall discussion and general conclusion close the dissertation.
Figure 1. Chapter overview
An overview of the research objectives, the central research questions, methods, data analysis and participants for each of the five studies is provided in Table 1.

Table 1. Dissertation overview (mapped to Chapters 2-6)

Research objectives
RO 1: Exploring the characteristics of SPFSs
RO 2: Developing a framework for SPF use, including influencing factors and effects
RO 3: Exploring data literacy competences
RO 4: Exploring effects of alternative data representation modes on feedback interpretation abilities
RO 5: Exploring effects of support on SPF use

Research questions
RQ1: What variety in SPFS characteristics can be observed?
RQ2: What are the rationales behind choosing certain SPFS characteristics?
RQ3: What phases can be observed in practice when schools use SPF?
RQ4: What is/are the result(s) of using SPF?
RQ5: How can differences in the interpretation and use of SPF in different school contexts be explained?
RQ7: What is the differential impact of alternative explanations and representations of value added on the conceptual and procedural understanding of non-statistically skilled users?
RQ8: To what extent are variations in SPF use influenced by data literacy competences?
RQ9: To what extent does specific SPF support have an impact on the development of SPF competences, actual SPF use and resulting SPF effects?
RQ9.1: To what extent do INSET and ONSET for SPF use have an impact on the level of satisfaction of SPF users?
RQ9.2: To what extent do INSET and ONSET for SPF use have an impact on the data literacy competences of SPF users?
RQ9.3: To what extent do INSET and ONSET for SPF use have an impact on the use of this feedback within the school?
RQ9.4: To what extent do INSET and ONSET for SPF use have an impact on the school improvement effects of SPF use?

Methods
Survey research
In-depth interviews
Experiment

Data analysis
Qualitative analysis
IRT techniques
Path modeling
Analysis of covariance

Participants
School principals
Students
Feedback providers

Note: each element is marked in the table as either a main goal or a side goal of the corresponding study; SPF = school performance feedback
References Abbott, D.V. (2008). A functionality framework for educational organizations: Achieving accountability at scale. In E. B. Mandinach & M. Honey (Eds.), Data-driven school improvement: Linking data and learning (pp. 257-276). New York: Teachers College Press. Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-75. Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the utilisation of management information systems in secondary schools. School Effectiveness and School Improvement, 18(4), 451-467. Coe, R. & Visscher, A.J. (2002). Drawing up the balance sheet for school performance feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 221-254). Lisse, The Netherlands: Swets & Zeitlinger. Coe, R. (2002). Evidence on the role and impact of performance feedback in schools. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger. Deming, W.E. (1986). Out of the crisis. Cambridge: Massachusetts Institute of Technology, Center for Advanced Engineering Study. Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394. Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and effectiveness. London: Cassell. Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in indicator systems: Doing things right and doing wrong things. Education Policy Analysis Archives, 10(6), 1-28. Retrieved from http://epaa.asu.edu/ojs/article/viewFile/285/411
Gardner, R. (1995). Onservice Teacher Education. In L. W. Anderson (Ed.), International Encyclopedia of Teaching and Teacher Education (pp. 628-632). London: Pergamon Press. Goldstein, H. & Myers, K. (1996). Freedom of information: Towards a code of ethics for performance indicators. Research Intelligence, 57, 12-16. Goldstein, H. & Spiegelhalter, D.J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society: Series A: Statistics in Society, 159(3), 385-443. Hammond, P., & Yeshanew, T. (2007). The impact of feedback on school performance. Educational Studies, 33(2), 99-113. Hattie, J. & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112. Heck, R. (2006). Assessing school achievement progress: Comparing alternative approaches. Educational Administration Quarterly, 42(5), 667-699. Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support school inquiry and continuous improvement: Final report to the Stuart Foundation. Los Angeles: University of California, Center for the Study of Evaluation. Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-evaluation and student achievement. School Effectiveness and School Improvement, 20(1), 47-68. Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006). Strategies to promote data use for instructional improvement: Actions, outcomes, and lessons from three urban districts. American Journal of Education, 112, 496-520. King, P. & Kitchener, K. (1994). Developing Reflective Judgement: Understanding and promoting intellectual growth and critical thinking in adolescents and adults. San Francisco, CA: Jossey-Bass. Kluger, A.N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284. Learning Point Associates. (2004). Guide to using data in school improvement efforts: A compilation of knowledge from data retreats and data use at Learning Point Associates. Retrieved from http://www.learningpt.org/pdfs/datause/guidebook.pdf Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter: Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press. Liket, T.M.E. (1992). Vrijheid & rekenschap: Zelfevaluatie en externe evaluatie in het voortgezet onderwijs [Freedom and accountability: Self-
evaluation and external evaluation in secondary education]. Amsterdam: Meulenhoff Educatief. Maier, U. (2010). Accountability policies and teachers' acceptance and usage of school performance feedback - a comparative study. School Effectiveness and School Improvement, 21(2), 145-165. Mandinach, E.B., Honey, M., Light, D., & Brunner, C. (2008). A conceptual framework for data-driven decision making. In E. B. Mandinach & M. Honey (Eds.), Data-driven school improvement: Linking data and learning (pp. 13-31). New York: Teachers College Press. Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3–16). Oxford, UK: Elsevier Science. Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach. Thousand Oaks: Sage. Rowe, K. & Lievesley, D. (2002). Constructing and using educational performance indicators. Paper presented at the 2002 Asia-Pacific Educational Research Association, Melbourne, Australia. Rowe, K. (2004). Analysing and reporting performance indicator data: 'Caress' the data and user beware! Paper presented at the 2004 Public Sector Performance and Reporting Conference, Sydney, Australia. Saunders, L. (2000). Understanding schools’ use of ‘value added’ data: The Psychology and sociology of numbers. Research Paper in Education, 15(3), 241-258. Saunders, L., & Rudd, P. (1999, September). Schools’ use of `value added’ data: A science in the service of an art? Paper presented at the British Educational Research Association Conference, Brighton, University of Sussex. Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems in the USA and in the Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282. Schildkamp, K. (2007). The utilisation of a self-evaluation instrument for primary education. Unpublished doctoral dissertation, University of Twente, Enschede, The Netherlands. Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform: Which data, what purposes, and promoting and hindering factors. Teaching and Teacher Education, 26(3), 482-496. Schildkamp, K., & Visscher, A. (2009). Factors influencing the utilisation of a school self-evaluation instrument. Studies in Educational Evaluation, 35(4), 150-159.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school self-evaluation instrument. School Effectiveness and School Improvement, 20(1), 69-88. Smith, P. (1995). On the unintended consequences of publishing performance data in the public sector. International Journal of Public Administration, 18(2&3), 277-310. Sun, H., Creemers, B.P.M., & De Jong, R. (2007). Contextual factors and effective school improvement. School Effectiveness and School Improvement, 18(1), 93–122. Teddlie, C., Kochan, S., & Taylor, D. (2002). The ABC+ model for school diagnosis, feedback, and improvement. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 75-114). Lisse, The Netherlands: Swets & Zeitlinger. Tymms, P. (1999). Baseline assessment and monitoring in primary schools. Fulton Publishers: London. van Aanholt, T., & Buis, T. (1990). De school onder de loep [The school under scrutiny]. Culemborg, The Netherlands: Educaboek. Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatieindicatoren als strategisch instrument voor schoolontwikkeling [Feedback on school performance indicators as strategic instrument for school improvement]. Pedagogische Studiën, 81, 338–353. Van Petegem, P., Vanhoof, J., Daems, F., & Mahieu, P (2005). Publishing information on individual schools. Educational Research and Evaluation, 11(1), 45-60. Vanhoof, J. & Van Petegem, P. (2007). Matching internal and external evaluation in an era of accountability and school development: Lessons from a Flemish perspective. Studies in Educational Evaluation, 33(2), 101-119. Verachtert, P., Van Damme, J., Onghena, P., & Ghesquiere, P. (2009). A seasonal perspective on school effectiveness: Evidence from a Flemish longitudinal study in kindergarten and first grade. School Effectiveness and School Improvement, 20(2), 215-233. Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188. Verhaeghe, G., Vanhoof, J., Van Petegem, P., Verhaeghe, J.P., & Van Damme, J. (in press). Het gebruik van outputgegevens in basisscholen: Concretiseringen en illustraties uit het Schoolfeedbackproject [The use of output results in primary schools: Concretizations and illustrations from the School Feedback Project). Kwaliteitszorg in Het Onderwijs.
Verhaeghe, J.P., Maes, F., Gombeir, D., & Peeters, E. (2002). Longitudinaal onderzoek in het basisonderwijs. Steekproeftrekking [A longitudinal study in primary education: Sampling procedure]. Leuven, Belgium: Steunpunt Loopbanen doorheen Onderwijs naar Arbeidsmarkt. Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 41-71). Lisse, The Netherlands: Swets & Zeitlinger. Visscher, A.J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349. Visscher, A. J., & Coe, R. (Eds.). (2002). School improvement through performance feedback. Lisse, The Netherlands: Swets & Zeitlinger Weiss, C.H. (1998). Have we learned anything new about the use of evaluation? American Journal of Evaluation, 19(1), 21-33. Williams, D., & Coles, L. (2007). Teachers’ approaches to finding and using research evidence: An information literacy perspective. Educational Research, 49(2), 185-206. Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for effectiveness and improvement in classrooms and schools in upper secondary education in Slovenia: Assessment of/for Learning Analytic Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 2 CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS
CHAPTER 2: CHARACTERISTICS OF SCHOOL PERFORMANCE FEEDBACK SYSTEMS∗

Abstract

As evaluation and data-driven decision making are receiving increased attention in education, more and more School Performance Feedback Systems (SPFSs) are being developed and used worldwide. These systems provide schools with data on their functioning. However, little research is available on the characteristics of the different SPFSs. Therefore, this study reflected on the characteristics of SPFSs in order to provide feedback designers and users with arguments for making sound choices when selecting school performance feedback characteristics. Based on the literature on data-driven decision making, a framework for identifying SPFS characteristics was developed. Next, this framework was applied to five diverse SPFSs. Interviews and surveys were administered, and documents about the selected SPFSs were collected. By integrating the results of the survey, the semi-structured interviews, and the SPFS documents, a summary meta-matrix was created. The results illustrate the wide variety among the five selected SPFSs with respect to features related to the data gathering and data analysis processes, the content, and the numerical measures and representation modes used. Large variety in the complexity and accuracy of the data modeling can be detected. These findings imply that users need to be properly informed about the underlying rationales of SPFS features and about the limitations and strengths of the performance indicators used. Expanding and adjusting the preliminary framework into a set of standards that SPFS developers and schools can use might aid the development of efficient instruments for data-driven decision making.
∗ Based on Verhaeghe, G., Schildkamp, K., & Luyten, H. (2010). Characteristics of school performance feedback systems. Manuscript submitted for publication in Educational Administration Quarterly.
1. Introduction

Schools all over the world have been granted more autonomy. Governmental bodies consider them to be learning organizations and hold them accountable for continuously monitoring their internal quality policy and improving their functioning (Hofman, Dijkstra, & Hofman, 2009; Leithwood, Aitken, & Jantzi, 2006). Schools are therefore required to systematically gather data on their functioning for self-evaluation purposes. Several schools use School Performance Feedback Systems (SPFSs) to gather these data; SPFSs "are information systems external to schools that provide them with confidential information on their performance and functioning as a basis for school self-evaluation" (Visscher & Coe, 2002, p. xi). SPFSs primarily aim at supporting school improvement and internal quality policy. These feedback initiatives contribute to the creation of information-rich environments, which are essential for schools in their data-driven decision making. Although data from SPFSs are only one source of information, they may provide schools with important information on variables associated with school effectiveness, which schools can use to improve their performance in terms of improving teacher instruction and, ultimately, student achievement (Davies & Rudd, 2001; Visscher & Coe, 2003).

However, the empirical findings do not always confirm the expected positive effects of SPFSs. Several studies show that the actual use of school performance feedback within schools is often limited, which may (partly) be caused by the characteristics of these SPFSs (Coe & Visscher, 2002; Earl & Fullan, 2003; Schildkamp & Kuiper, 2010; Schildkamp & Visscher, 2009; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). A wide variety of SPFSs can be discerned, all designed for specific purposes in certain educational contexts. All adopt their own data gathering systems, statistical methods, data representations, and so on. However, little is known about the distinct characteristics of these SPFSs or about the rationales behind these features. Little is also known about whether their users are capable of correctly interpreting and analyzing data derived from these systems, which is a crucial condition for data-driven decision making. A debate on the characteristics of SPFSs would be a first starting point for reflection for current and future feedback providers and users. Therefore, this study focuses on the characteristics of SPFSs, specifically on the data gathering and data analysis processes, the content, and the representation modes of SPFSs. We examine the variety in these aspects and the underlying rationales for these variations.
2. Conceptual framework

2.1. School performance feedback systems

The definition of SPFSs includes several important aspects:
• The systemic organization of the feedback initiative: The feedback providers are bound to an organization and produce school performance feedback not as a one-shot activity but on a systematic basis.
• The external component: This refers mainly to the data analysis and feedback provision. The data gathering process can be conducted in cooperation with school team members.
• The goal of school improvement: This implies that SPFS developers provide the school performance feedback on a confidential basis, in contrast with information made public for accountability reasons. By generating data for voluntary use by schools, SPFSs are considered professional monitoring systems. They differ from official accountability systems, through which schools are held accountable as publicly funded institutions (Tymms, 1999).
• The unit level of information: School performance feedback goes beyond individual pupil results. By aggregating data, at least some indications are provided of the school's functioning and effectiveness.
• The content of the feedback: The content refers to the school's performance and functioning. This functioning encompasses more than merely output results; it also refers to context, input and process related indicators.

Given the definition and characteristics of an SPFS, many different systems might be considered SPFSs, including central examination systems, school inspectorates, national assessment systems, pupil monitoring systems, research projects, school self-evaluation systems and providers of standardized tests (see Table 1). Although all systems described in Table 1 can function as SPFSs, they might simultaneously function as an official accountability system. For example, central examination data are often considered by inspectorates and parents as a performance indicator of the school's functioning. In addition, these data can be transformed into confidential feedback after secondary analyses have been performed on the results (Yang, Goldstein, Rath, & Hill, 1999). Also, reports from inspection visits can serve both the purpose of accountability and that of improvement. This illustrates that the relation between accountability and improvement may have different configurations (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009).
Table 1. Different kinds of SPFSs
• Genuine SPFSs: The core task of these systems is providing schools with confidential information on their functioning.
• Central examination systems: Sometimes, (raw or adjusted) results of central examinations are fed back to schools for school improvement, instead of or in addition to making the results public.
• School inspectorates: Inspection reports can be considered school feedback if they serve the purpose of school improvement, instead of or in addition to accountability.
• National assessment systems: These differ from central examinations in that the information is gathered in the first place for educational governments, to give a state of the art of a national educational level. However, if school-specific results are confidentially fed back to schools, they can be considered school feedback.
• Pupil monitoring systems: These systems are developed in the first place to assess individual pupils'/students' learning progress. The results can be used as school feedback when aggregated reports are also provided for a group of pupils/students.
• Research projects: Participation in research projects can result in a school feedback report, as a return on investment.
• School self-evaluation systems: These are systems developed solely with the purpose of providing schools with confidential information on their performance and functioning.
• Standardized tests: Some (psychometric) standardized tests, taken from individual pupils/students, can result in aggregated scores for a class or group and can thus be considered school feedback.
2.2. Performance indicators

SPFSs gather information on schools' performance and functioning by making use of performance indicators. Following Goldstein and Spiegelhalter, "a performance indicator is a summary statistical measurement on an institution or system which is intended to be related to the 'quality' of its functioning" (1996, p. 385). Rowe and Lievesley add an evaluative component to this definition: "performance indicators (PIs) are defined as data indices of information by which the functional quality of institutions or systems may be measured and evaluated" (2002, p. 1). Applied to the context of schools and internal quality policy, Fitz-Gibbon and Tymms (2002, p. 2) define an indicator "as an item of information collected at regular intervals to track the performance of a system". They hereby emphasize the systematic character of the data gathering and analysis, which corresponds to the definition of SPFSs by Visscher and Coe (2002). School performance indicators do not only report on the output aspect of school quality, such as pupil achievement results, but also on the context, input and process of the school's functioning. These can include indicators on resource provision and funding, participation rates of pupils, repetition rates, class sizes, factors affecting students' progress rates, and so on (Rowe & Lievesley, 2002).

To successfully serve schools in their internal quality policy, these indicators have to meet certain requirements (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). First, feedback needs to be relevant and useful, which means that it corresponds to the actual information needs of the users. Furthermore, feedback needs to be accurate, which relates to the reliability and validity of the data gathered. Next, the cost-effectiveness of the indicator system is an important factor to take into consideration. Related to this utility perspective, the performance indicators should be delivered in a timely way, which concerns both the currency and the punctuality of the delivered feedback. Furthermore, users need to accept the performance indicators and consider them to be fair. This fairness does not only refer to striving towards unbiased results, but also to the interpretability, reliability, stability and incorruptibility of the reported performance indicators. Lastly, performance indicators should strive towards beneficial effects and should avoid unwarranted harm (Fitz-Gibbon, 1996; Fitz-Gibbon & Tymms, 2002; Goldstein & Myers, 1996). Although there is a lack of systematic evaluation of the effects of SPFSs (Visscher & Coe, 2003), some findings in the literature refer to unintended effects of data use. For example, the (administrative) workload of teachers and principals may increase as a result of using an SPFS (Fitz-Gibbon & Tymms, 2002; Schildkamp & Teddlie, 2008). Moreover, participants may feel threatened by the evaluation, and evaluations may evoke defensiveness (Fitz-Gibbon & Tymms, 2002). Finally, using an SPFS may have a demotivating impact on teachers, especially in poorly performing schools (Van Petegem, Vanhoof, Daems, & Mahieu, 2005).
2.3. Framework for SPFSs

The central aim of this study is to explore the variety in characteristics of SPFSs and to reveal the underlying rationales. School performance feedback systems, however, cover a wide range of characteristics. In this study, we focus on the following main aspects of school performance feedback (see Table 2): the data gathering process, the data analysis, and the content of the feedback report, with a focus on the numerical measures and graphical representations used. Describing the data gathering and analysis is crucial to get a view of the reliability and validity of the feedback produced. In order to get a view of the relevance of the feedback system, the content of the feedback reports of the selected systems is described. Finally, we focus on the feedback representations used. This includes both the numerical measures and the graphical representations used, to get a view of the interpretability of what is fed back to schools.

Table 2. A framework for comparing SPFSs

Data gathering
• Data administrators (e.g., school team members, field workers from the SPFS)
• Medium (e.g., paper and pencil, computer)
• Structuredness of instruments (e.g., completely structured, semi-structured, computer adaptive)
• Types of instruments (e.g., tests, interviews, surveys, observation scales)
• Data source (e.g., pupils, teachers, parents)
• Timing (e.g., any time, fixed moments)
• Place (e.g., classroom, computer lab, playground)
• Options in test administration (e.g., fixed, flexible or demand-driven supply)

Data analysis
• Type of analysis (e.g., quantitative, qualitative)
• Scaling model (e.g., Classical Test Theory, Item Response Theory)
• Model used (e.g., regression model, Ordinary Least Squares, multilevel analysis)
• Type of value added (e.g., prior, concurrent)
• Levels of unit (e.g., pupil level, year group level, school level, cohort level, subscale level, item level, subject level, aggregate level)
• Measurement moments (e.g., single measurements, successive measurements, two linked measurements, longitudinal measurements)

Feedback content
• Variables (e.g., attitudinal, behavioural, cognitive, contextual)
• Subjects (e.g., language, mathematics, science, world orientation)
• Non-subject-specific information (e.g., school culture, pupil background variables, pupil mobility, socio-emotional development, ADHD scale, attitudes to school, dyslexia, study skills)
• Reference group (e.g., national average, representative sample of the population, group of participating schools)
• Type of reference (e.g., self-referenced, norm-referenced, criterion-referenced)
• Reliability indication (e.g., confidence intervals, significance values)
• Text content (e.g., results, interpretation of results, explanation of statistical concepts and graphical representations, information on how to communicate results)
• Numerical measures (e.g., raw scores, expected scores, cut-off score, gain score, mean score, predicted score, value-added score)
• Feedback medium (e.g., static reporting, flexible tool)
• Graphical representations (e.g., bar graph, box plot, histogram, layer graph, line graph, pie graph)
• Reliability indices (e.g., confidence intervals, significance values)
3. Method

3.1. Instruments

In order to create a framework for describing SPFSs, a qualitative method was used. The literature on SPFSs reveals that the framework developed by Visscher (Visscher, 2002; Visscher & Coe, 2003) is the most frequently cited and used (e.g., in Maier, 2010; Schildkamp & Teddlie, 2008; Schildkamp & Visscher, 2009; Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2009). This framework discerns four sets of factors influencing the use of performance feedback, including the design features of the underlying SPFSs and the characteristics of the feedback report itself. In addition to the framework of Visscher, more concrete features were used to compare the selected SPFSs. These were chosen on the basis of a literature review on performance indicators, data-driven decision making, data use and SPFSs. This resulted in a framework that makes it possible to analyze and describe SPFSs. This framework for describing SPFSs was restructured in the form of a survey. All the different options were listed and explained in an MS Word file, including 46 items (11 items with background information to identify the SPFS, 9 items on the data gathering process, 7 items on the data analysis, 14 items on the content of the feedback report and the concepts used, and 5 items on the graphical data representation). Almost all questions were multiple-choice items, apart from some open questions. Depending on the items, the respondents were requested to provide complementary explanations.

3.2. Selected SPFSs

The five systems described in this study were purposefully selected because of their diversity in feedback characteristics and because of the availability of information on these systems. This selection was not made to strive for representativeness, but to illustrate and describe exemplary cases. First, each selected SPFS is briefly described:

• Assessment Tools for Teaching and Learning (asTTle): asTTle has been developed as part of a government-funded research project at the Visible Learning Labs of the University of Auckland in New Zealand. This SPFS offers schools a national assessment model with all the characteristics of an SPFS, without the negative consequences of high-stakes testing. This feedback production should help to make teachers acquainted with the national curriculum, to enhance future teaching and learning. About 80% of all elementary and high schools in New Zealand use asTTle (Years 4-12). Participation is voluntary and free of charge. The feedback is offered in both English and Maori, which have two distinct curricula. Feedback reports are delivered directly and immediately to school team members, pupils/students and parents via a secured online website or via software used on the local network. No results are made public. A remarkable option of asTTle is the direct feedback delivery to students and parents. The technological applications allow pupils to access their results throughout their school career, across all years and schools. In summary, asTTle functions as a professional monitoring system, as the purpose is to create a low-stakes assessment system to be used internally within schools. As it provides the function of following individual learning paths, it also serves as a pupil monitoring system. However, its main function is the detection of learning needs at an aggregate level.

• Performance Indicators in Primary Schools (PIPS): PIPS was developed by the Centre for Evaluation and Monitoring at Durham University (UK).
PIPS is widespread in primary schools (from reception to Year 6) in England and Scotland, and to a smaller extent in the other parts of the UK. Furthermore, PIPS has local adaptations of the system, applied worldwide. Within the UK, independent schools show the largest interest in PIPS, compared to the government-funded schools, as they lack monitoring systems and information on national testing because they do not follow the national curriculum. As access to PIPS is not cost-free, schools have to use their school budgets. All participation is voluntary, although some schools are strongly encouraged to participate by their Local Authorities. PIPS started as a research project, which then transferred its services to schools. In some cases, Local Authorities also get direct access to the data of their schools if they have paid for the assessments. They are not allowed to make these results public and are supposed to use the data for supporting schools. The feedback is delivered via regular mail (to the PIPS coordinator at the school) and via a secured electronic portal. Depending on whether the assessments were computer-delivered or paper-based, feedback production can take between two days and eight weeks. The main function of PIPS is that of a pupil monitoring system, besides being a research project, an SPFS and a standardized test.

• South African Monitoring system for Primary Schools (SAMP): PIPS served as a basis for the development of this system. It has evolved into an almost completely distinct SPFS, developed at the Centre for Evaluation and Assessment at the University of Pretoria (South Africa). Due to resource limitations, feedback is only delivered in the Tshwane Region, for the first year of primary education. Furthermore, only the government-funded schools are reached, as these are the schools with the largest need for accessible assessment systems, in contrast to the wealthier independent schools. Therefore, this SPFS delivers feedback for free (limited to 80 learners per school). Very specific to the development of SAMP is the complicated language context of South Africa, which has 11 official languages. SAMP is restricted to the three predominant languages of instruction in that region: English, Afrikaans and Sepedi. SAMP is therefore a small-scale SPFS, offering feedback to 22 schools. All of these schools participate voluntarily. The feedback users are in the first place the school team members. They are free to communicate the results to other stakeholders, such as parents, the department of education, and so on. Feedback supply via regular mail is not an option, as there is no assurance that the package will reach its destination in South Africa. Since many schools lack internet and even computer access, electronic feedback delivery is not an option either.
Therefore, feedback is delivered at the school to the contact person. This happens four days to two weeks after data gathering.

• Leerling- en OnderwijsVolgSysteem (LOVS) [Pupil and Educational Monitoring System]: Similar to PIPS, LOVS is in the first place a pupil monitoring system (reception to Year 7), besides being a research project, an SPFS and a standardized test in the Netherlands. Furthermore, some local projects (e.g., in Germany, Turkey and Denmark) make use of the LOVS software. Unlike the other systems in this study, LOVS is also an official accountability system (e.g., it is used by the Dutch Inspectorate) in addition to a professional monitoring system. During inspection visits, schools may be asked for permission to show their results on these tests. Furthermore, the inspectorate sometimes strongly encourages (weakly scoring) schools to participate if they make insufficient use of data sources to demonstrate their functioning. This implies that some schools may experience participation as an obligation, whilst in general voluntary assessment is the rule. The wide acceptance of LOVS is indicated by a 95% rate of use of at least one of the tests in all elementary schools in the Netherlands, including special needs education. This feedback is provided by a private company, CITO [Central Institute for Test Development]. Due to this private character, schools use their budgets for the services offered. As a consequence, they are the sole owners of their data. To disseminate their results to external parties, schools need the permission of the parents. The way of delivering feedback depends on the tests taken. Some results are sent by regular mail, while other data are provided via an electronic portal, via software on a disk, or manually by means of printed scoring tables. Also depending on the test taken and the standardization process (based on previous or current reference groups), feedback delivery takes between a second and a few months.

• Schoolfeedbackproject (SFP) [School Feedback Project]: The SFP is still in a developmental phase and is thus not commercially available yet. The SFP is a research and development project, initiated by three universities in Flanders (Belgium). As there is no central assessment system, Flemish schools lack information on their performance in relation to the national average or to the results of schools with similar characteristics. Therefore, a government-funded project has been set up to create a Flemish SPFS. In this study, only the system developed for primary education is described (Years 1-6). From 2011 onwards, this system will be commercially available, whilst the current sample (a representative reference group of 195 schools) participates for free. Although participation was voluntary in general, some school boards decided for their schools to participate. Results are fed back confidentially to schools only.
In addition, aggregated results are reported to school boards, as part of the research project. School reports are delivered to the feedback coordinator at the school by electronic mail. Due to the developmental phase of the project, feedback generation took several months. From 2011 onwards, feedback will be delivered in an automated and quick way, as the underlying software engine, feedback formats and reference data are already available. Furthermore, the SFP is developing a secured electronic portal to upload student data and download school feedback.

3.3. Procedure

The survey was sent to the directors or coordinators of the five selected SPFSs. They were informed about the purpose of this study. Additionally, semi-structured in-depth telephone interviews were held, to elaborate on or clarify some of the answers from the survey and to gather information on the rationales for opting for certain SPFS characteristics. The telephone conversations, which took 90 minutes on average, were audio-taped with the permission of the interviewees and transcribed afterwards. The integrated results from the survey and the interview were sent to the interviewees for member checking. Finally, the integrated files of surveys and interviews were summarized in separate files for each feedback system. These files were integrated in a conceptually ordered meta-matrix (Miles & Huberman, 1994) that facilitates a variable-oriented and a case-oriented analysis. Furthermore, this meta-matrix serves to give a quick overview of the variety in feedback systems. Parts of this meta-matrix will be illustrated and explained in the results section.
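As an aside, the structure of such a conceptually ordered meta-matrix can be made concrete with a small sketch: one row per SPFS (case) and one column per framework characteristic (variable). The sketch below assumes pandas is available; the cell values are rough summaries based on the system descriptions in this chapter and serve purely to illustrate the matrix layout, not as the study's actual coding.

```python
import pandas as pd

# One row per SPFS (case), one column per framework characteristic (variable).
# Cell values are illustrative summaries, not the study's actual coded data.
cases = {
    "asTTle": {"data_administrators": "teachers", "scaling_model": "IRT (Rasch)"},
    "PIPS":   {"data_administrators": "teachers", "scaling_model": "CTT and IRT (Rasch)"},
    "SAMP":   {"data_administrators": "field workers", "scaling_model": "IRT for item banking"},
}
meta_matrix = pd.DataFrame.from_dict(cases, orient="index")

# Variable-oriented reading: compare one characteristic across all systems.
print(meta_matrix["scaling_model"])
# Case-oriented reading: inspect one system across all characteristics.
print(meta_matrix.loc["SAMP"])
```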
4. Results: Application of the framework

4.1. Data gathering

Having a view of the data gathering process is of major importance for evaluating the accuracy of the data on which the feedback is based. Therefore, the following elements have been included in the framework: the persons gathering the data, the types and structuredness of the instruments used, the data gathering medium, the time and place of data collection, and the data source. Table 3 gives an overview of the instruments used.
Table 3. Overview of data gathering instruments used in the selected SPFSs. For each system (asTTle, PIPS, SAMP, LOVS, SFP), the table indicates which of the following instruments are used:
• Completely structured: domain-specific tests; surveys on attitudes/socio-emotional development; general achievement test; observation scale; ADHD scale; pupil background questionnaire; test on study skills; survey on socio-emotional functioning; test of intelligence; test on interests.
• Semi-structured: interviews on strategies in mathematics; writing assessments; pupil background questionnaire; rating scale for the evaluation of a technical piece of work.
• Computer adaptive: domain-specific tests; screening instrument for dyslexia.
• Other: automatic upload of pupil background variables from the data management system; observation notes during testing (no structured instruments); upload of results from Statutory Assessment Tests.
Note (1): Computer-delivered version of PIPS for Years 1-6; all other tests use stopping rules, based on a number of mistakes made on increasingly difficult items.
Note (2): Depending on the test taken.
In almost all cases (asTTle, PIPS, LOVS, SFP), teachers and/or other school team members organized the test administration at the school, following strict testing instructions. Only in the case of SAMP did field workers from the SPFS guide the assessment. This choice was made for the sake of the reliability of the data collection and in order not to interrupt teachers in their teaching. Furthermore, teachers were not only organizing the test, but sometimes they were also providing data on the pupils' functioning. In PIPS and LOVS, for example, they completed observation scales, pupil background questionnaires and/or surveys on socio-emotional functioning. In asTTle, teachers have an even more active role, composing the test based on predefined parameters and options by using the testing software tool. Furthermore, parents can also be asked to provide information. In the case of the SFP, a parent questionnaire is provided for gathering home and pupil background information.

Not only the testing instructions, but also most testing instruments are highly structured. Almost all instruments are completely structured. This means that tests and questionnaires entirely describe and guide the data collection. In some cases, semi-structured instruments are used. For example, SAMP does not require schools to complete structured questionnaires on student background variables, but just lists what information it would be favorable to deliver (due to a lack of pupil information and the lack of a computerized management system). In contrast, asTTle, LOVS and PIPS make use of advanced software options that allow automatic import of pupil level data from the school's management information system. These three SPFSs additionally provide computer adaptive testing. This means that test items are presented to pupils according to their ability level. For example, if a pupil performs well on an item of intermediate difficulty, a more difficult question will be presented next; if the pupil performs poorly, a simpler item will be presented.
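As an illustration of this kind of adaptive item selection, the following sketch implements a deliberately simplified select-and-update loop in Python. The function names, the fixed step size and the fixed-length stopping rule are assumptions made for the example; they are not taken from asTTle, PIPS or LOVS.

```python
import random

def adaptive_test(item_bank, administer, n_items=10, start_ability=0.0, step=0.5):
    """item_bank: dict mapping item id -> difficulty (same scale as ability).
    administer: callback that presents an item and returns True if answered correctly."""
    ability = start_ability
    remaining = dict(item_bank)
    responses = []
    for _ in range(min(n_items, len(remaining))):
        # present the unused item whose difficulty is closest to the current estimate
        item_id = min(remaining, key=lambda i: abs(remaining[i] - ability))
        correct = administer(item_id)
        responses.append((item_id, correct))
        # crude provisional update: move up after a correct answer, down after a mistake
        ability += step if correct else -step
        del remaining[item_id]
    return ability, responses

# Illustrative run with a simulated pupil who answers about 60% of items correctly.
bank = {f"item{k}": d for k, d in enumerate([-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2])}
estimate, log = adaptive_test(bank, administer=lambda item: random.random() < 0.6)
print(estimate, log)
```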
Testing pupils can be very time consuming, especially in the case of younger children. As they have not yet mastered reading or writing skills, one-on-one oral testing is often necessary. In this case, the instructor provides the explanation following the guidelines and the pupil provides the answers (e.g., in PIPS and SAMP). In other cases, one-on-one testing is required because of the nature of the test (e.g., reading fluency in SFP and LOVS). In yet other cases, one-on-one testing is optional, as the testing medium allows both individual and whole-class testing at any time. This is the case for asTTle, as each pupil owns a personal computer and the software adapts the standardization of the scores to the moment of testing. More rigid systems with paper-and-pencil tests, computerized tests in computer labs and/or fixed measurement moments are more likely to adopt whole-class testing (PIPS, LOVS, SFP). The place of testing is highly related to infrastructural characteristics. Mostly, tests take place in the classroom if printed booklets are used (asTTle, PIPS, LOVS, SFP), or in the computer lab for computerized versions (PIPS, LOVS). The test administration of asTTle is flexible because of its technological provisions; even testing at home is plausible. In the case of SAMP, each testing situation is slightly different, as an appropriate place has to be sought in each school (e.g., in the staff room or under a shady tree).

4.2. Data analysis

In this section, we focus on the underlying scaling model used, on the data analysis model used, including value-added measures, on the opportunities for longitudinal measurements, on the inclusion of pupil mobility, and on the levels of aggregation used. Being informed about the data analysis of SPFSs is a prerequisite for making judgments about the accuracy of the feedback. In all feedback systems, testing data are analyzed quantitatively. We focus in this section on the variety in the techniques used.

First, the underlying scaling models have been examined. Item Response Theory (IRT) underlies all SPFSs to some degree. This technique estimates several parameters, including the difficulty level of the items and the skill scores of the respondents. By creating one skill scale that relates different tests in a certain domain, IRT offers opportunities for longitudinal measurements or computer adaptive testing. IRT has been applied in the selected SPFSs for defining the item parameters and composing tests (SAMP). asTTle, PIPS, LOVS and SFP go further and use IRT for defining ability test scores for the respondents for certain test versions. The IRT model that has been used most widely is the Rasch model (in asTTle, PIPS and SAMP). The techniques used in LOVS depend on the test taken, and SFP uses a more complex two-parameter model. The system taking the most advantage of IRT is asTTle: in combination with the possibilities of the software tools, teachers are enabled to compose tests from an item bank with different degrees of difficulty. Besides IRT, Classical Test Theory (CTT) is applied in all systems. This is not only used for analyzing data from interviews, surveys and/or observation scales (asTTle, PIPS, LOVS), but also for some tests (SAMP, SFP) which require no further analysis than a sum score.
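To make the Rasch model mentioned above concrete, the sketch below shows the one-parameter logistic response function and a small maximum-likelihood ability estimate for a single pupil, assuming the item difficulties are already known. It is a minimal illustration of the model's logic, not the calibration machinery used by any of the systems described here.

```python
import math

def rasch_probability(theta, b):
    """Rasch model: P(correct | ability theta, item difficulty b)
    = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iterations=25):
    """Maximum-likelihood ability estimate for one pupil with known item
    difficulties, via Newton-Raphson steps on the log-likelihood."""
    theta = 0.0
    for _ in range(iterations):
        p = [rasch_probability(theta, b) for b in difficulties]
        gradient = sum(x - pi for x, pi in zip(responses, p))
        curvature = -sum(pi * (1.0 - pi) for pi in p)
        if curvature == 0:
            break
        theta -= gradient / curvature
    return theta

# One pupil answering five items of increasing difficulty (1 = correct, 0 = wrong).
print(estimate_ability([1, 1, 1, 0, 0], [-1.0, -0.5, 0.0, 0.5, 1.0]))
```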
Only PIPS and SFP make explicit use of value-added measures. These measures indicate to what extent scores (raw or adjusted) are above or below an "expected" value. The expectations are based on statistical analyses that estimate the impact of independent variables such as cognitive aptitude, prior achievement and socioeconomic background. Value added is reported in PIPS and SFP as a measure of the school's influence on the pupil's performance. PIPS makes a distinction between prior and concurrent value added. For estimating the former type, a general achievement score is taken into account, as an aggregate of subject-specific test scores and a developed ability score. Concurrent or contextual value added only includes the developed ability score. In addition, SFP uses both student background variables and prior achievement scores in the estimation of contextual value added. The two systems conflict in this conception, as student background variables are either seen as redundant or as necessary variables to be included in the model. Furthermore, the value-added realizations differ significantly in their level of reporting. While SFP is convinced that value added should only be reported at an aggregate level, PIPS allows pupil level residual analysis. LOVS implicitly applies the notion of value-added measures by reporting the difference in growth of the school as compared to the reference group.

When focusing on the statistical model underlying the feedback production, a large variety in complexity can be noticed. While some SPFSs strive for complexity to provide a nuanced view of school performance data (SFP and LOVS), others consciously avoid model complexity in favor of transparency for feedback users. For example, in the calculation of value added, PIPS applies ordinary least squares, in contrast to the multilevel piecewise growth curve models of SFP. Other systems do not use regression models, because they do not intend to predict scores or calculate value added in order to keep the low-stakes character of the testing (asTTle), or because they are still in a development phase with limited capacity for growing complexity (SAMP).
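The sketch below illustrates, under strong simplifying assumptions, the basic logic of such a regression-based value-added measure: every pupil's expected score is predicted from prior attainment using ordinary least squares on the whole sample, and a school's value added is the mean residual of its own pupils. It is a single-predictor toy example with invented data, not the PIPS or SFP model.

```python
from collections import defaultdict
from statistics import mean

def simple_ols(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def school_value_added(records):
    """records: (school, prior_score, current_score) tuples for all pupils.
    Value added per school = mean of (observed - expected) for its pupils."""
    a, b = simple_ols([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for school, prior, current in records:
        residuals[school].append(current - (a + b * prior))
    return {school: mean(res) for school, res in residuals.items()}

data = [("A", 48, 55), ("A", 52, 61), ("B", 50, 49), ("B", 47, 50)]
print(school_value_added(data))
```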
The type of statistical model used defines the options for longitudinal measurement. This means that scores for pupils are linked to each other over time. Whether or not learning progress can be measured depends on the scale used. In the case of asTTle, progress is estimated on one underlying IRT ability scale, which links all tests in a certain domain. PIPS uses a scale of standardized scores (obtained by either CTT or IRT) and puts these scores on a time line. SFP, in contrast, does not just place the (rescaled) IRT scores on a time line, but provides both raw scores and scores adjusted for the influence of prior achievement and pupil background characteristics, using a multilevel piecewise repeated measures model. This gives a different conception of growth and longitudinal measurement. The adjusted scores do not express the actual achievement levels, but the levels that would have been achieved if the pupils had had the same background characteristics as the reference group. For some tests of the LOVS as well, adjustments have been applied for pupil background characteristics. Another factor delineating the opportunities for longitudinal measurement is the number of measurement occasions. asTTle, PIPS, LOVS and SFP, for example, offer tests with (at least) three linked measurement moments, while SAMP only tests pupils at the start and end of the first year of primary education. In all systems, the users decide whether or not to participate in single or successive measurements (repeated for different cohorts or not).

Finally, it is of major importance to stress the influence of pupil mobility, in particular when longitudinal data are represented for a cohort. A consequence of pupil mobility is that values are missing for pupils who left a testing sample by changing classes or schools. Multilevel modeling is therefore preferred, because missing data do not prevent the estimation of growth curves, as all available pupil scores are taken into account in the estimation procedure. However, when estimating more complex longitudinal models, taking pupil mobility into account requires cross-classifications, which may overburden the capacities for statistical analysis.

A final aspect to be discussed in this section is the reported aggregation level for respondents and content. With regard to the respondents, all systems opt to report pupil level data, with the exception of the SFP, which is designed for evaluating and informing school policy with a focus on aggregated data. In the SFP's view, the adjusted scores and value-added scores are only valid for aggregated data, as aggregation cancels out measurement errors and bias by averaging. Furthermore, the model complexity does not allow generating data at several aggregation levels, due to pupil mobility. The systems that report pupil level data (asTTle, PIPS, SAMP and LOVS) also report data at aggregated levels, such as the classroom level, group level, school level, etc. In these cases, the aggregated scores are easily obtained by averaging the pupil scores for a certain group. Besides the respondent level, the reported content level is also determined by both convictions and methodological considerations. All systems report at the (broad) subscale and subject level. Only PIPS reports at the item level (only for reception feedback), as this would have the largest information value to inform planning in the classroom. asTTle intentionally does not report at the item level, as this would lead to teaching to the test. Items are therefore just considered indicators of subjects. Another restriction on reporting item level scores depends on the objective of the test taken. As the SFP developed tests for determining learning gains (which requires avoiding ceiling effects) and not for diagnostics (which requires determining outliers), it is not opportune to report item scores. According to the SFP, this also implies that these tests are not suitable for discerning detailed subscales, as these would not meet the psychometric standards of reliability.
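A minimal sketch of this kind of aggregation: the class or school score is the mean of the pupil scores, and the measurement uncertainty around that mean, which grows as the group gets smaller, can be expressed as an approximate confidence interval. The scores and the use of a normal 95% interval are illustrative assumptions.

```python
from statistics import mean, stdev

def aggregate_with_ci(pupil_scores, z=1.96):
    """Mean score for a group of pupils with an approximate 95% confidence
    interval; smaller groups yield wider, less certain intervals."""
    m = mean(pupil_scores)
    se = stdev(pupil_scores) / len(pupil_scores) ** 0.5
    return m, (m - z * se, m + z * se)

print(aggregate_with_ci([52, 61, 49, 58, 55, 47, 60]))
```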
4.3. School performance feedback content

This section contains a description of the subjects and topics that are reported in the feedback reports, the conceptual representations (performance indicators) and reference groups used, and the sections offered in the reports. Following the quality standards for performance indicators, the feedback content has to be relevant and useful. Furthermore, SPFS users should accept the performance indicators and consider them to be fair. Regarding the contents that have been tested in the selected SPFSs, we can refer to Table 3, in which the data gathering instruments are described. This table shows that all systems use domain-specific tests. These refer in all cases to language and mathematics tests with different subscales. Some systems have broadened their supply with tests for science (PIPS), English as a foreign language (SAMP, LOVS), and/or technique and world orientation (an aggregation of geography, history and environmental science in LOVS). The other data instruments from Table 3 cover non-cognitive measures, such as attitudinal, behavioral and contextual contents. Concerning behavioral scales, interesting examples can be noticed: PIPS, for example, offers a scale for detecting ADHD and LOVS one for dyslexia, whilst handwriting is tested by asTTle and SAMP. To illustrate attitudinal measurements, there are measures of attitudes related to subjects (asTTle, PIPS, SAMP and LOVS), to the school culture in general (asTTle, PIPS and SAMP) or to socio-emotional development (LOVS). Related to contextual information, informing schools about their pupil mobility is of major importance for getting a view of their functioning. Why are pupils leaving? Which newcomers is the school attracting? Which pupils go to special education? Was the school aware of the large number of pupils with learning lags? These are some of the questions that stimulate reflection at the school level, transcending individual learning pathways. Only the SFP specifically reports on this.

Numerical measures

A wide range of numerical measures is reported in the SPFSs in this study. Table 4 gives an overview.
Table 4. Numerical measures used in the selected SPFSs. For each system (asTTle, PIPS, SAMP, LOVS, SFP), the table indicates which types of scores are reported (adjusted scores, expected scores, predicted scores, raw scores) and which numerical measures are used (band score, cut-off score, grade score, learning gain score, mean score, percentage score, percentile score, rescaled score, standardized score, value-added score).
Note (1): Depending on the tests taken.
Note (2): SAMP also registers loss scores besides gain scores.
Raw scores are fed back in all cases, as are expected scores, as the latter are reflected by the average for the reference group. Predicted scores, resulting from regression analyses, are used for making predictions of future performance, based on the current pupil achievements (PIPS, LOVS). Adjusted scores are rarer, as these require more advanced statistical analysis (LOVS, SFP). All these types of scores are rescaled into meaningful units for the users. For example, scales are created with a mean of 50 and a standard deviation of 15. All these transformations are somewhat arbitrary, as there are no conventions on which scales, bands or grades are preferable. Mostly, test scores have been transformed in relation to the local context. For example, asTTle and PIPS reformulate scores into grades in accordance with the national curriculum, SAMP rescales to five-point scales teachers are familiar with, and LOVS expresses scores in conformity with the preferences of the inspection authorities.
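A minimal sketch of such a rescaling: raw (or IRT) scores are linearly transformed to a reporting scale with a chosen mean and standard deviation. In practice the reference mean and standard deviation would come from the norm group rather than from the sample itself; the values below are invented for the example.

```python
from statistics import mean, stdev

def rescale(raw_scores, ref_mean=None, ref_sd=None, target_mean=50, target_sd=15):
    """Linearly map scores to a reporting scale (here mean 50, SD 15).
    If no reference statistics are supplied, the sample's own mean and SD are
    used, which is only sensible for illustration."""
    m = ref_mean if ref_mean is not None else mean(raw_scores)
    s = ref_sd if ref_sd is not None else stdev(raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]

print(rescale([12, 15, 9, 20, 17]))
```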
Feedback reports may contain more information than the mere testing results. An explanation of how to interpret the results is only provided in the feedback reports of PIPS and SFP; the other systems provide this information in an accompanying manual. When it comes to searching for explanations of the results of a specific school, no further help is provided in any report. However, asTTle, LOVS and SAMP take considerable initiatives in offering remediation material. asTTle is the most advanced system, offering supporting material for teachers in accordance with the grade levels achieved per pupil and group.

What information can be deduced from the reports also depends on the opportunities for reference that are offered (norm, criterion or self-reference). These three forms of reference offer different opportunities for a school to measure its own functioning against. All systems offer a norm with which to compare the results. In most cases, this reference is (a representative sample of) the national average. Only SAMP cannot realize this, as it is a small-scale local project. Instead, SAMP offers the opportunity to compare with schools with the same language group within the sample. asTTle and LOVS allow comparison with schools which are similar, based on certain characteristics. These features foster fair comparison, with the same underlying idea as providing adjusted scores (comparing like with like), although adjusted scores are estimated with a different calculation procedure. Criterion-based references are less prevalent (asTTle and SAMP), as these imply an absolute instead of a relative point of comparison. In these cases, the references refer to the cut-off scores used. Opportunities for self-reference are offered in all systems by allowing schools to compare results over time, either within cohorts (cf. gain scores, longitudinal measurements) or between cohorts (multiple measurements with different year groups).

Representation modes

With regard to the representation modes used in the feedback reports, we discuss the medium used to present the results, the graphical representations, and the reliability indices (see Table 5). SPFSs differ in the feedback media used to report the results. These media are related to the flexibility for users in choosing representations or manipulating their feedback output. In asTTle, PIPS and LOVS, users can select different types of representations through software tools or Excel macros. They may, for example, select a table to present exact data, and growth curves to show trends. SAMP and SFP are less flexible: these SPFSs provide the user with a printed or digital PDF report of the results. SAMP additionally reports the results in Excel sheets, which users can use to perform secondary analyses.
The graphical representations offered by the different SPFSs differ as well. Some systems only include simpler forms of representation, such as bar graphs, cross tables and histograms (SAMP). Others include more complex graphical representations of the results, such as scatter plots with regression lines (PIPS), line graphs (SFP) and layer graphs (LOVS).

The school performance feedback results are based on certain statistical analyses, which include a certain measurement error. To enable users to judge the accuracy and importance of their findings, information on the uncertainty surrounding the results has been incorporated in asTTle, PIPS, LOVS and SFP. These uncertainties can be indicated by adding confidence intervals. Confidence intervals are presented in either bar graphs (asTTle and LOVS) or longitudinal progress charts (PIPS). SFP represents uncertainty by marking significant values in cross tables. SAMP prefers not to present confidence intervals, as this would make the interpretation of the results too complex for the users. Instead, it warns the users not to over-interpret small differences or shifts in scores.

Table 5. Overview of representation modes in the selected SPFSs. For each system (asTTle, PIPS, SAMP, LOVS, SFP), the table indicates the feedback medium (fixed: printed report, PDF version; flexible tools: online tools, software applications on the local network, Excel sheet, Excel macros in the sheet), the graphical representations used (bar graph, box plot, cross table, divided bar graph, grouped bar graph, histogram, layer graph, line graph, multipanel display, pie graph, scatter plot with regression line, side-by-side graph, and other representations such as schemes and iconic representations), and the reliability indices reported (confidence intervals, significance values).
Note (1): Depending on the tests taken.
5. Discussion

As evaluation and data-driven decision making receive increased attention in education, more and more SPFSs are being developed and used worldwide. However, little research is available on the characteristics of the different SPFSs. Studies show that these characteristics may influence the degree to which the feedback is actually used for school improvement (e.g., Schildkamp & Visscher, 2009; Verhaeghe et al., 2010). Therefore, it is important to carefully consider the characteristics of an SPFS when developing or selecting one to use. Users need to purposefully choose the type of SPFS that corresponds to their information needs. This requires transparency about the characteristics of different SPFSs. Therefore, in this article, we developed an exemplary framework for identifying SPFS characteristics, the usability of which has been demonstrated by applying it to five different SPFSs. We illustrated the variety in the data gathering processes, the types of analysis, and the content of the feedback, including the numerical measures and representation modes used. The goal of this study was thereby not to judge the different SPFSs, but to highlight some issues concerning SPFS characteristics.

With regard to data gathering, all SPFSs studied mainly offer completely structured instruments, such as cognitive tests, questionnaires on socio-emotional development, and diverse types of scales. Additional semi-structured instruments, such as interviews or rating scales, are provided as well. All are accompanied by strict prescriptions on how to gather the data. Providing highly structured instructions and instruments is a prerequisite for standardized and reliable data collection (Fitz-Gibbon, 1996; Fitz-Gibbon & Tymms, 2002), especially if the data are gathered by school staff. Such data collection by school team members only leads to accurate data in the case of low-stakes testing (Fitz-Gibbon & Tymms, 2002; Smith, 1995; Yang et al., 1999). As data collection is very time consuming, technological tools may create several advantages. Therefore, initiatives such as computer adaptive testing or the automatic upload of data from management information systems (as in asTTle, PIPS and LOVS) facilitate efficient data gathering.
These tools not only prevent pupils and teachers from being overburdened with data collection, but also foster targeted data collection. However, only highly advanced SPFSs adopt these tools. Furthermore, these software tools cannot be applied in all contexts, due to infrastructural limitations. As data-driven decision making is a cyclic process (cf. the Plan-Do-Check-Act cycle; Deming, 1986; Verhaeghe et al., 2010), repeated testing for different cohorts and year groups is required. Furthermore, systematic measurement with small time lags will result in the most reliable trend (van de Grift, 2009). In all the SPFSs in this study, the users therefore have the choice of which time intervals to opt for.

With respect to the content of the feedback, the SPFSs in this study focus rather narrowly on a few cognitive outcomes (e.g., language, mathematics and/or science), which form part of the core curriculum in all countries. Developers of SPFSs might consider how to include other subject areas in the SPFSs, as well as more attitudinal, behavioral and contextual information. If school staff want to make informed decisions on how to improve their education, they need different types of data (Schildkamp & Kuiper, 2010). asTTle, PIPS and LOVS have taken the first steps, but other types of data may be considered as well, such as data on the functioning of teachers (e.g., teacher and student questionnaires). Moreover, schools already have other types of data available (e.g., school data such as achievement tests for other subjects, inspection reports, parent surveys and class tests). It is important to consider these different types of data and data sources in schools as well, in order to make a comprehensive evaluation of the school's functioning (Schildkamp & Kuiper, 2010). A preferable scenario to foster this data triangulation would be the development of integrated management information systems (Bosker, Branderhorst, & Visscher, 2007). In order to obtain an integrated system, more coherence in data conceptualization and representation is required, not only between different data sources, but also between different instruments of the same SPFS. A first step can be taken here by developers of SPFSs in creating more conformity in data analyses and representations.

With regard to the data analysis, it is important to find a balance between statistically correct, and thus complicated, analyses and accurate results on the one hand, and understandable analyses and user-friendly results on the other. For example, the analyses used in PIPS are fairly straightforward and not too complex. Schools can understand the results, and studies show that schools feel ownership of the results (Tymms & Albone, 2002), which directly influences the degree to which the feedback is actually used (Kyriakides & Campbell, 2004; Schildkamp & Teddlie, 2008).
However, because the system does not use multilevel analyses, schools are sometimes wrongly classified as, for example, underperforming. To reduce these misclassifications, some researchers claim that it is necessary to apply multilevel models (Goldstein & Spiegelhalter, 1996; Karsten, Visscher, Dijkstra, & Veenstra, 2010). According to Yang et al. (1999), it is possible to explain these multilevel models and their outcomes to head teachers in an understandable way. In contrast, others consider multilevel modeling inappropriate for feedback purposes and claim that the method of Ordinary Least Squares is accurate enough and more understandable to schools (Fitz-Gibbon, 1996; Fitz-Gibbon & Tymms, 2002; Sharp, 2006). Whatever statistical analysis an SPFS uses, it should inform its users about the associated constraints, as it presents an image of being a fair performance indicator system. Moreover, it is important to realize that any type of measurement always includes some type of error. Statistical estimates always include uncertainty, which needs to be taken into account in any interpretation. This holds especially for small groups, such as classes and cohorts within schools. An SPFS should therefore provide information on limitations and uncertainties, and provide information on the reliability of the estimates (Fitz-Gibbon & Tymms, 2002; Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996; Karsten et al., 2010; Mortimore & Sammons, 1994; Rowe, 2004; Yang et al., 1999). Such reliability indices are, for example, applied in asTTle, PIPS, LOVS and SFP.

If school level data are intended to be used for making fair comparisons with reference groups, it is advisable to work with value-added models. Value added is usually defined as everything the pupil has learned at his/her school (e.g., van de Grift, 2009). However, the concept of "value added" is not unproblematic (van de Grift, 2009). It is not possible to assess everything a pupil has learned, such as social and creative abilities. Furthermore, because pupils change schools and classes, different schools and classes have an influence on the pupil's learning progress. It is also not clear how this learning progress should be measured. And how should the knowledge and skills acquired outside the school be taken into account? As a result, several problems have been associated with applying value-added modeling (van de Grift, 2009; Karsten et al., 2010). We discuss some issues that are important for this study.

Firstly, there is the problem of missing values, which may distort the results. In this study, primarily in the SFP, serious attention has been devoted to this issue (Knipprath & Verhaeghe, 2010).
Moreover, these missing data might not just be random, but might be the result of certain interventions in schools (e.g., grade repeating). Incorporating the impact of these missing values in the estimation procedures would be advisable (Sanders, 2006; van de Grift, 2009; Yang et al., 1999). Next, there is the instability of value-added judgments. Therefore, it is recommended to use data on successive cohorts (at least three school years; van de Grift, 2009), to use longitudinal measurements (Heck, 2006) or to average scores over several years (OECD, 2008). Cross-sectional data analysis might also be used, as it might have several advantages compared to longitudinal testing (Luyten, 2006; Sammons & Luyten, 2009). Thirdly, there are different procedures for computing value-added models, which lead to different rankings of schools (Fitz-Gibbon, 1996; Goldstein & Spiegelhalter, 1996; Heck, 2006; OECD, 2008; Rowe, 2004; Sanders, 2006; van de Grift, 2009; Yang et al., 1999). For example, there is no consensus on the inclusion of student background characteristics in the models used in the SPFSs in this study. As student achievement results are influenced by prior achievement and student background characteristics (such as gender and SES), several researchers stress that corrections for these out-of-school influences might be required (Goldstein & Thomas, 1996; Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996; Heck, 2006; Karsten et al., 2010; Rowe, 2004; Sanders, 2006; Yang et al., 1999). Fourthly, a value-added model has only limited predictive validity for certain schools (e.g., for schools with large SES gaps). SPFSs that use value-added models should therefore always be careful with categorizing schools as underperforming, and should use labels such as durably underperforming or durably outperforming instead of ranking schools (van de Grift, 2009). Goldstein and Thomas (1996) and Yang et al. (1999) also recommend using these kinds of procedures only to identify "institutions at extremes", as a screening device to detect problems. Furthermore, one should always keep in mind that value-added measures are only relative indicators of school performance that should be interpreted against the reference group (Rowe, 2004; Karsten et al., 2010). Comparing institutions based on statistical models will always require prudence (Goldstein & Spiegelhalter, 1996). Finally, users have difficulties when interpreting value-added data (Karsten et al., 2010; Santelices & Taut, 2009; Vanhoof, Verhaeghe, Verhaeghe, Valcke, & Van Petegem, in press). Users should be supported to gradually acquire expertise in data interpretation by, for example, being offered more and diverse value-added models (Schatz, VonSecker, & Alban, 2005). Other conceptualizations, in terms of "school contribution" (Santelices & Taut, 2009) or "residual analysis" (Fitz-Gibbon, 1996; Schatz, VonSecker, & Alban, 2005), might also foster correct understanding.
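A minimal sketch of such a cautious, screening-oriented use of value-added results: a school is flagged only when the uncertainty interval around its estimate lies entirely above or below the reference value of zero, and no ranking is produced. The data structure and labels are invented for the illustration.

```python
def flag_extremes(value_added):
    """value_added: dict mapping school -> (estimate, lower_bound, upper_bound),
    where the interval expresses the uncertainty of the value-added estimate."""
    flags = {}
    for school, (estimate, lower, upper) in value_added.items():
        if lower > 0:
            flags[school] = "above expectation"
        elif upper < 0:
            flags[school] = "below expectation"
        else:
            flags[school] = "no reliable difference from the reference group"
    return flags

print(flag_extremes({"A": (2.1, 0.4, 3.8), "B": (-0.9, -2.6, 0.8), "C": (-3.0, -5.1, -0.9)}))
```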
In addition to the discussion on the accuracy of the reported feedback data, we want to stress that relevant and fair feedback information does not always need complex statistical modeling. For example, information on pupil mobility is highly relevant to inform school policy. This variable should not be used merely as a covariate or grouping variable or to determine missing data (van de Grift, 2009). As only SFP is explicitly reporting on amount of students entering in, staying in and leaving from the cohort each year, this could be a consideration for other SPFSs. Moreover, initiatives as asTTle that keep track of student results during their whole career and make those results accessible to them should be encouraged. By denying the value-added aspect, asTTle reports student level data for all students, irrespective what schools they have attended before. Furthermore, we stress that besides adjusted scores, the raw scores need to be reported because of their informative value. After having analyzed the data, SPFS developers need to carefully consider what types of numerical measures and graphical representations to offer to the users. Research has revealed that even simple numerical conceptions and representations are often interpreted incorrectly (Earl & Fullan, 2003; Zupanc et al., 2009). A sufficient level of assessment literacy is a prerequisite for correct understanding. If not, proper support initiatives should be provided. Furthermore, other ethical issues are related to the fairness of the data. For example, the arbitrary or unfair boundaries used in case of cut-off scores (Fitz-Gibbon & Tymms, 2002; Heck, 2006). Furthermore, explorations in term of absolute instead of relative measures of performance need to be encouraged (Luyten, 2006). To see all these risks, proper informing of the users is required (Karsten et al., 2010). Furthermore, it might be worth to offer different types of representations that serve different purposes (e.g. band scores for detecting outliers and line graphs for visualizing growth; Kosslyn, 2006). This idea has been applied in asTTle that provides seven types of reporting serving different purposes. SPFS developers should think carefully about each of these characteristics and keep in mind that the use of school performance feedback does not always lead to improvement, but that it should at least do no harm (Fitz Gibbon & Tymms, 2002; Rowe, 2004). Moreover, they should consider offering training in the interpretation and use of the results, especially when using more complex statistical modeling, as studies have shown that SPFS use without proper training is difficult (Schildkamp & Visscher, 2009; Verhaeghe et al., 2010; Vanhoof et al., in press). Furthermore, it is advisable to provide the users with indications on what instructional or organizational 49
processes should be improved upon (Coe & Visscher, 2002; Karsten et al., 2010; Verhaeghe et al., 2010), which has not been done by most of the SPFSs in this study.
6. Conclusion
We believe that the components of SPFSs discussed in this study are important aspects in ensuring that the SPFSs that have been developed all over the world will be used as they are intended to be: for school improvement purposes. However, we also believe that there is much to be gained when it comes to developing SPFSs that provide schools with reliable, valid, and user-friendly data. Decisions made by SPFS developers about the design of the SPFS impact the results in ways that are not yet fully understood, but they can have implications for determining how “strong or poor” a school’s performance is judged to be. Expanding and adjusting the preliminary framework we developed into a set of standards that SPFS developers and schools can use may aid in developing efficient instruments for data-driven decision making.
Acknowledgement We would like to express our sincere gratitude to the directors and researchers of the SPFSs involved in this study for their cooperation: • Prof. John Hattie, Director of Visible Learning Labs, director of asTTle, University of Auckland • Dr. Christine Merrell, Director of Primary Systems, Centre for Evaluation and Monitoring, Durham University • Elizabeth Archer, Project coordinator of SAMP, Centre for Evaluation and Assessment, University of Pretoria • Geert Evers, Information Manager Primary Education, Centraal Instituut voor Toetsontwikkeling • Ilse Papenburg: Training and advice, Centraal Instituut voor Toetsontwikkeling • Dr. Jean Pierre Verhaeghe, Project coordinator of the SFP, Ghent University and Katholieke Universiteit Leuven
References
Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the utilisation of management information systems in secondary schools. School Effectiveness and School Improvement, 18(4), 451-467.
Coe, R., & Visscher, A.J. (2002). Drawing up the balance sheet for school performance feedback systems. In R. Coe & A.J. Visscher (Eds.), School improvement through performance feedback. Lisse: Swets & Zeitlinger Publishers.
Davies, D., & Rudd, P. (2001). Evaluating school self-evaluation (Research Report No. 21). Berkshire, UK: National Foundation for Educational Research, Local Government Association.
Deming, W.E. (1986). Out of the crisis. Cambridge: Massachusetts Institute of Technology, Center for Advanced Engineering Study.
Earl, L., & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and effectiveness. London: Cassell.
Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in indicator systems: Doing things right and doing wrong things. Education Policy Analysis Archives, 10(6), 1-28. Retrieved from http://epaa.asu.edu/ojs/article/viewFile/285/411
Goldstein, H., & Myers, K. (1996). Freedom of information: Towards a code of ethics for performance indicators. Research Intelligence, 57, 12-16.
Goldstein, H., & Spiegelhalter, D.J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society: Series A: Statistics in Society, 159(3), 385-443.
Goldstein, H., & Thomas, S. (1996). Using examination results as indicators of school and college performance. Journal of the Royal Statistical Society: Series A: Statistics in Society, 159(1), 149-163.
Heck, R. (2006). Assessing school achievement progress: Comparing alternative approaches. Educational Administration Quarterly, 42(5), 667-699.
Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School self-evaluation and student achievement. School Effectiveness and School Improvement, 20(1), 47-68.
Karsten, S., Visscher, A.J., Bert Dijkstra, A., & Veenstra, R. (2010). Towards standards for the publication of performance indicators in the public sector: The case of schools. Public Administration, 88(1), 90-112.
Knipprath, H., & Verhaeghe, J.P. (2010, April). Instability of the school population: The less favourable side of longitudinal educational effectiveness research. Paper presented at the 2010 AERA Annual Meeting, Denver.
Kosslyn, S.M. (2006). Graph design for the eye and mind. Oxford: Oxford University Press.
Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school improvement: A critique of values and procedures. Studies in Educational Evaluation, 30, 23-36.
Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter: Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press.
Luyten, H. (2006). An empirical assessment of the absolute effect of schooling: Regression-discontinuity applied to TIMSS-95. Oxford Review of Education, 32(3), 397-429.
Luyten, H., Tymms, P., & Jones, P. (2009). Assessing school effects without controlling for prior achievement? School Effectiveness and School Improvement, 20(2), 145-165.
Maier, U. (2010). Accountability policies and teachers' acceptance and usage of school performance feedback - a comparative study. School Effectiveness and School Improvement, 21(2), 145-165.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage.
Mortimore, P., & Sammons, P. (1994). School effectiveness and value added measures. Assessment in Education: Principles, Policy and Practice, 1(3), 315.
Organisation for Economic Co-operation and Development (2008). Measuring improvements in learning outcomes: Best practices to assess the value-added of schools. Paris: OECD Publishing.
Rowe, K. (2004). Analysing and reporting performance indicator data: 'Caress' the data and user beware! Paper presented at the 2004 Public Sector Performance and Reporting Conference, Sydney, Australia.
Rowe, K., & Lievesley, D. (2002). Constructing and using educational performance indicators. Paper presented at the 2002 Asia-Pacific Educational Research Association, Melbourne, Australia.
Sammons, P., & Luyten, H. (2009). Editorial article for special issue on alternative methods for assessing school effects and schooling effects. School Effectiveness and School Improvement, 20(2), 133-143.
Sanders, W.L. (2006). Comparisons among various educational assessment value-added models. Paper presented at The Power of Two National Value-Added Conference, Columbus, Ohio.
Santelices, V., & Taut, S. (2009, September). Comprehension and use of value-added school performance indicators reported to teachers and parents. Paper presented at the European Conference on Educational Research, Vienna.
Schatz, C.J., VonSecker, C.E., & Alban, T.R. (2005). Balancing accountability and improvement: Introducing value-added models to a large school system. In R. Lissitz (Ed.), Value added models in education: Theory and applications (pp. 1-18). Maple Grove, MN: JAM Press.
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform: Which data, what purposes, and promoting and hindering factors. Teaching and Teacher Education, 26(3), 482-496.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems in the USA and in the Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282.
Schildkamp, K., & Visscher, A. (2009). Factors influencing the utilisation of a school self-evaluation instrument. Studies in Educational Evaluation, 35(4), 150-159.
Sharp, S. (2006). Assessing value-added in the first year of schooling: Some results and methodological considerations. School Effectiveness and School Improvement, 17(3), 329-346.
Smith, P. (1995). On the unintended consequences of publishing performance data in the public sector. International Journal of Public Administration, 18(2), 277-310.
Tymms, P. (1999). Baseline assessment and monitoring in primary schools. London: Fulton Publishers.
Tymms, P., & Albone, S. (2002). Performance indicators in primary schools. In A.J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 191-218). Lisse: Swets & Zeitlinger.
van de Grift, W. (2009). Reliability and validity in measuring the value added of schools. School Effectiveness and School Improvement, 20(2), 269-285.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in press). The influence of competences and support on school performance feedback use. Educational Studies.
Vanhoof, J., & Van Petegem, P. (2007). Matching internal and external evaluation in an era of accountability and school development: Lessons from a Flemish perspective. Studies in Educational Evaluation, 33(2), 101-119.
Van Petegem, P., Vanhoof, J., Daems, F., & Mahieu, P. (2005). Publishing information on individual schools. Educational Research and Evaluation, 11(1), 45-60.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167-188. Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A.J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 41-71). Lisse, The Netherlands: Swets & Zeitlinger. Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through performance feedback. Lisse, The Netherlands: Swets & Zeitlinger. Visscher, A.J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349. Yang, M., Goldstein, H., Rath, T., & Hill, N. (1999). The use of assessment data for school improvement purposes. Oxford Review of Education, 25(4), 469-483. Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for effectiveness and improvement in classrooms and schools in upper secondary education in Slovenia: Assessment of/for Learning Analytic Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 3 PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL PERFORMANCE FEEDBACK USE
CHAPTER 3: PERCEPTIONS OF PRIMARY SCHOOL PRINCIPALS ABOUT SCHOOL PERFORMANCE FEEDBACK USE∗
Abstract
The present study focuses on primary school principals’ perceptions of school performance feedback (SPF) and on their actual use of this information. This study is part of a larger project which aims to develop a new school performance feedback system (SPFS). The study builds on an eclectic framework that integrates the literature on SPFSs. Through in-depth interviews with 16 school principals, four clusters of factors influencing school feedback use were identified: context, school and user, SPFS, and support. This study refines the description of feedback use in terms of phases and types of use, and effects on school improvement. Although school performance feedback can be seen as an important instrument for school improvement, no systematic use of feedback by school principals was observed. This was partly explained by a lack of skills, time, and support.
∗ Based on Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167-188.
1. Introduction
In recent years, the trend of decentralizing educational systems has prompted researchers to focus on school-based management and internal evaluation. Because schools are granted autonomy, governmental bodies expect them to be accountable for monitoring their internal quality policy (Nevo, 2002). In this context, the current performance level of a school serves as a starting point for developing future plans and educational targets. To assess their baseline performance level, schools can make use of feedback offered by school performance feedback systems (SPFSs). These external systems deliver confidential information about a school’s performance and functioning (Visscher & Coe, 2002, 2003). Performance feedback helps to reveal the strengths and weaknesses of a school’s functioning and is expected to contribute to the school improvement process by stimulating reflection and self-evaluation. However, receiving feedback alone is not a sufficient condition to foster self-evaluation and systematic reflection at the school level. Several other conditions, related to the school, the context, and the specific SPFS being used, determine whether and how schools will make use of the available feedback. Empirical research on SPFSs is limited (Schildkamp & Teddlie, 2008). The studies that have been carried out indicate that the actual use of school feedback and its impact are rather low (Coe, 2002; Tymms, 1995; Saunders & Rudd, 1999; Van Petegem & Vanhoof, 2004). We believe that a detailed study of the use and impact of existing school performance feedback initiatives is warranted (Goldstein & Spiegelhalter, 1996; Schildkamp, 2007; Schildkamp, Visscher, & Luyten, 2009; Visscher & Coe, 2002, 2003). In this study we build on the findings of an ongoing project which focuses on the design, development, and implementation of an SPFS in Flanders (the Dutch-speaking community of Belgium). We investigate school principals’ perceptions of factors that promote or hinder their understanding and use of school performance feedback information. The results of this study are expected to support the development of SPFSs and to further refine theories on school feedback use.
2. Theoretical framework
Based on a literature review, we developed a conceptual framework that integrates factors affecting SPF use and effects (Fitz-Gibbon & Tymms, 2002; Schildkamp, 2007; Van Petegem & Vanhoof, 2007; Visscher, 2002; Visscher & Coe, 2003). This framework is presented in Figure 1.
[Figure 1 (schematic): influencing factors (context related; school and user related; school performance feedback (system) related; support related) feed into the phases in use (interpretation, policy actions), which lead to results (types of use) and, ultimately, effects.]
Figure 1. Conceptual framework of school performance feedback use
2.1. School performance feedback use: Phases, types and effects
Adequate use of SPF is expected to lead to specific effects at the school and pupil level (Visscher, 2002; Schildkamp, 2007). Its purpose is to contribute to school improvement and lead to higher student performance (Visscher & Coe, 2003). Apart from the intended effects of SPF, unintended effects have also been reported in the literature, such as selective student admissions, teaching to the test, and removing difficult students (Visscher, 2002). Other studies refer to undesirable side effects of SPF, such as the demotivation of school staff who become overwhelmed by the amount of data involved and the amount of time they have to invest (Fitz-Gibbon & Tymms, 2002; Schildkamp & Teddlie, 2008). In this context, SPF does not always result in significantly better student outcomes (Fitz-Gibbon & Tymms, 2002; Schildkamp, Visscher, & Luyten, 2009; Visscher, 2002). Nevertheless, recent research indicates that SPF can have a positive impact on pupil achievement levels (Hammond & Yeshanew, 2007) and on the associated school improvement processes (Schildkamp, 2007; Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009). In these studies several effects on process indicators were observed, such as an improvement in consultation and communication about school functioning and school quality, improved didactical approaches, and a stronger achievement orientation of staff. However, considering the limited amount of research available, caution is warranted in drawing conclusions about the reported effects of SPF use (Coe, 2002; Schagen, 2004). The way school feedback is used plays a key role in its potential impact. In terms of a policy-making cycle (e.g., Hoy & Miskel, 2001), feedback should be used in the following sequence. First, feedback results must reach the
proper person(s). Second, the data in the report must be read and interpreted correctly for them to be meaningful. In the subsequent diagnostic process, causes and explanations for the results are deliberated. The diagnostic process results in actions that are implemented and finally evaluated. However, research indicates that school principals do not always disseminate feedback information, or simply distribute feedback reports without examining them (Van Petegem & Vanhoof, 2004). Other studies found that school feedback users often get stuck in the transition from the interpretation of SPF to active policy making (Vanhoof, 2007; Schildkamp, 2007). This is highly problematic, as the interpretation of the data is essential in deducing workable information (Earl & Fullan, 2003). These phases of data use are outlined in the practice of data-driven decision making (Learning Point Associates, 2004). However, in the current literature on SPF use for school improvement these phases are not distinguished in a systematic way. Within this policy-making cycle, different types of feedback use can be distinguished: (1) direct/instrumental, (2) conceptual, and (3) symbolic/convincing (Rossi, Lipsey, & Freeman, 2004). An instrumental use of feedback serves as a starting point for immediate policy-making decisions. A conceptual use of feedback does not result in concrete actions, but influences the decision-making process, which indirectly affects action. Even if feedback does not influence one’s conceptualizations, it can affect the policy-making process in a symbolic way. This means feedback results serve to convince others of existing opinions and to support viewpoints in discussions (Visscher, 2002). Furthermore, feedback can be used in a strategic way for accountability purposes, although this is not in line with a school improvement discourse (Visscher & Coe, 2003). These four types of feedback use can be considered as results of feedback use. For example, a conceptual use results in an altered way of thinking about pupil performances. This intermediate result can in the end lead to effects of feedback use, such as a stronger achievement orientation.
2.2. Factors influencing school performance feedback utilization
Differences in the interpretation and use of school feedback can be attributed to a variety of factors. In the framework of Visscher (2002) and Visscher and Coe (2003) the following sets of influential factors are outlined: context, school and user, SPFS, and support. The authors embed the process of feedback use in the broader school environment, which we call context related factors. They do not distinguish support related factors as a separate set, but place them within the implementation process and
characteristics of the feedback system. These variables were selected based on a literature review in the fields of educational innovation, educational management, business administration, and computer science. However, the relations between the different influencing factors and the feedback effects are not examined (Visscher, 2002). This framework is used as a basis for the present study. Context related factors that impact feedback use include the school’s policy strategies at the regional and/or governmental level (Sun, Creemers & de Jong, 2007; Visscher, 2002). For instance, policies can contain clear expectations that schools make use of feedback information. Educational governments can stimulate feedback use by pressure and/or support. Furthermore, feedback will be used differently depending on the context (e.g., school improvement, school accountability, or a combination of both strategies) (Vanhoof & Van Petegem, 2007; Visscher, 2002). Secondly, school and user related characteristics seem to be key variables explaining differences in school feedback use. First, the motivation to use an SPFS leads to different utilizations. Motivation varies from internal quality development and external accountability to policy preparation (van Aanholt & Buis, 1990; Liket, 1992). Second, previous experiences with feedback use, general experience with school related data, and the statistical knowledge and skills needed to interpret feedback reports will also influence feedback use. While most teachers have experience with school test data, pupil monitoring systems, and self-evaluations, in several studies school staff report that they lack the skills and confidence to use data for school policy purposes (Earl & Fullan, 2003; Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Williams & Coles, 2007). Third, school performance levels also influence feedback use (Visscher, 2002; Visscher & Coe, 2003). Schools receiving positive feedback (large value added) will discuss the results differently compared to schools receiving a less positive picture (Schildkamp, 2007). In line with control theory, participants receiving negative feedback are more likely to make an effort to reduce the discrepancy between the negative feedback and the expected standards (Kluger & DeNisi, 1996). This will result in different policy implications. However, this theory does not hold in all cases; it is not unusual for school principals to withhold feedback information that does not fit the current policy plan (Van Petegem & Vanhoof, 2004). A third set of factors influencing school performance feedback use refers to the characteristics of the school feedback reports and the feedback system. In this context, the perception of the user determines how feedback will be used (Visscher, 2002; van den Berg & Ros, 1999). At the
level of content, feedback should be perceived as relevant, non-threatening, and corresponding to the actual informational needs (Schildkamp & Teddlie, 2008; Visscher, 2002; Van Petegem & Vanhoof, 2007). Furthermore, the representation of both absolute and relative school performance results also impacts the way feedback is used (Visscher, 2002; Visscher & Coe, 2003). If relative measures are used to compare the school’s results with a reference group, these school scores should be adjusted for the influence of pupil background characteristics and should be linked to the relevant cohort group (Goldstein & Spiegelhalter, 1996). Information should also be up-to-date, reliable, and valid (Visscher, 2002; Visscher & Coe, 2003; Schildkamp & Teddlie, 2008). In terms of ethical issues, Fitz-Gibbon and Tymms (2002) refer to the Hippocratic Oath and state that feedback should “at least do no harm” (p. 75). For example, in some cases feedback can be threatening to recipients’ self-esteem, particularly in a system of accountability (Visscher & Coe, 2003). Consistent with our definition of SPFSs, feedback systems for school improvement should guarantee confidentiality and anonymity to the subjects and schools. Moreover, feedback should not harm subjects or schools on the basis of misleading information (Goldstein & Myers, 1996). The fourth and final set of factors that affect feedback use concerns the support experienced by feedback users (Schildkamp & Teddlie, 2008). School staff who are involved in SPFS training are more likely to read the feedback reports and adopt a more positive attitude (Tymms, 1995). Numerous studies stress the importance of providing feedback support (e.g., Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2007; Visscher & Coe, 2003). This support can be provided by educational and government parties, school team members, or the feedback system itself.
3. Research questions
This study examines the perception of school feedback users. Based on the conceptual framework discussed above, the following research questions are asked:
• What phases can be observed in practice when schools use school performance feedback?
• What is/are the result(s) of using school performance feedback?
• How can differences be explained in the interpretation and the further use of school performance feedback in different school contexts?
4. Research context
This study is part of a larger SPF project called “Each school its own mirror.” As there is currently no SPFS available in Flanders, this project is developing and evaluating a new SPFS through a collaboration between researchers, various stakeholders, and a target group of primary school principals and teachers. The system that has so far been developed within the SPF project gives schools feedback on a confidential basis. These feedback reports are designed to enable teachers and principals to understand the value-added scores of their school as compared to a reference group. The reference group is taken from another research project (the SiBO project, Schoolloopbanen in het BasisOnderwijs [School Trajectories in Primary Education]) that is currently tracking approximately 6000 children from a representative sample of Flemish schools, from the time they entered kindergarten until the end of primary education. In the SPF project, test scores as well as survey and observational data are continuously collected to gather information on child characteristics, family background, class characteristics, classroom practices, teacher attitudes and subjective theories, and school characteristics. The tests focus on language learning (orthography, reading fluency, reading comprehension) and mathematics. IRT-based techniques are used to construct the test scores, enabling us to estimate growth curves. The SPF project is currently able to deliver trial versions of school feedback reports to the 198 participating primary school principals. In this study, we build on the results from the trial versions sent to the schools in the spring of 2007. These reports inform schools about the performance of children and classes in the first two years of primary education. Results were reported for mathematics, reading fluency, and orthography, supplemented with information about pupil characteristics (child factors, home factors, and Dutch language skills at the start of grade 1). The school-specific results were compared to the Flemish reference group. The central concepts in these reports (learning gain, value added, and adjusted scores) were explained in such a way that no prior statistical knowledge was required. The data were supported with graphical representations (i.e., boxplots, bar graphs, pie graphs, growth curves, and cross tables). The text of each report was standardized. The school principals were required to interpret the results for their school, based on the general information made available. They also received individual pupil feedback, which presents the observed scores and percentile rankings relative to the reference group. Pupil feedback was presented to the schools shortly after
taking the class tests, but the aggregated scores at class and school level were sent approximately 10 months later.
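To give readers unfamiliar with these concepts a simplified impression of what a “learning gain” indicator involves, the sketch below (plain Python with simulated scores; it is a hypothetical illustration, not the SPF project’s actual IRT-based estimation procedure) fits a linear growth line per pupil and compares a school’s average slope with that of a reference group.

```python
# Simplified illustration of "learning gain": fit a linear growth line per pupil
# on repeated (here simulated, IRT-like) scores and compare a school's average
# slope with that of the reference group. Data and parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
occasions = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # measurement moments in school years

def mean_gain(n_pupils, true_slope):
    """Average fitted growth (score points per school year) across pupils."""
    slopes = []
    for _ in range(n_pupils):
        scores = 50 + true_slope * occasions + rng.normal(0, 3, occasions.size)
        slope, _intercept = np.polyfit(occasions, scores, deg=1)
        slopes.append(slope)
    return float(np.mean(slopes))

school_gain = mean_gain(n_pupils=45, true_slope=11.0)
reference_gain = mean_gain(n_pupils=600, true_slope=10.0)
print(f"school: {school_gain:.1f}, reference: {reference_gain:.1f}, "
      f"difference: {school_gain - reference_gain:+.1f} points per year")
```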
5. Research design
5.1. Research approach
In this study we use a qualitative design to explore the perceptions of primary school principals of SPF use. A qualitative approach is appropriate since we want to develop a view on “naturally occurring, ordinary events in natural settings, so that we have a strong handle on what ‘real life’ is like” (Miles & Huberman, 1994, p. 10). It is recommended when the knowledge base is limited and the nature of the variables, processes, and interrelations is less clear (Maso & Smaling, 1998), which holds for the literature about SPF use.
5.2. Research instrument and procedure
Data were gathered on the basis of semi-structured in-depth interviews. This type of interview creates an informal relationship between researcher and respondent, and gives the researcher a better understanding of the perceptions, opinions, and views of respondents (Mason, 2002). The interview questions were largely open ended and were derived from the conceptual framework discussed above. Respondents were invited to describe their school situation, to propose suggestions, and to express their concerns. To clarify remarks or to ask for elaboration, spontaneous follow-up probes were allowed (Lindlof & Taylor, 2002). Examples of questions include:
• Questions about feedback characteristics: These questions focused on the perceptions of the relevance, interpretability, user-friendliness, validity and reliability of the feedback information (e.g., Do you think the information is relevant to draw a picture of the school’s influence on pupils’ performances? Which information is the most relevant? Why? Do you trust the quality of the feedback results?).
• Questions about school and user characteristics mainly focused on interpretation skills, expectations of feedback use, and the perception of the school’s performance (e.g., Do you feel comfortable interpreting these feedback results? If yes, where did you acquire the knowledge and skills for this? Which problems did you encounter?). Furthermore, questions regarding school culture characteristics were asked (e.g., Is there a culture of systematic reflection? To what degree do teachers welcome school performance feedback reports? Besides this feedback project, are there other data gathering systems used to assess the school’s functioning?).
• Questions about support initiatives included support use and support needs (e.g., Have you engaged the team members when interpreting the feedback reports? Do you feel sufficiently supported by the school staff when interpreting the results? Is there a need for more external support? For which activities?).
• Questions on feedback use were formulated to discern different types and phases of feedback use (e.g., Did you formulate any goals you want to achieve by using feedback? Which initiatives are you undertaking to communicate the feedback results to staff members? Did the feedback report play a role in policy decision making? Has it influenced your way of thinking about the school? Did you use the report for strategic purposes, such as promoting your school or informing the school inspectorate about your school’s results? Did you use the report to legitimize your own convictions?).
• Questions about feedback effects were not stressed because it was unlikely that effects of feedback use on the school could already have been observed in the three-month period between the feedback delivery and the interview. However, questions about participants’ expectations of effects were posed (e.g., What effects should take place for your effort to have been worthwhile?).
• The perception of context related factors is limited in this study to the influence of the inspection visits to schools.
School principals were visited in their school office by one of the two interviewers, three months after receiving their school performance feedback report. Interviews lasted approximately 90 minutes.
5.3. Theoretical sampling
From the 198 SiBO principals, a sample of 16 primary school principals was selected by means of theoretical sampling, maximizing the variation in feedback use (Mason, 2002; Silverman, 2005). In this sampling method the choice of cases is made on conceptual grounds, not on representative grounds (Miles & Huberman, 1994). To gather this sample, two months after having received the feedback reports, the 198 principals were asked to fill out an online survey. We obtained a response rate of 61%. The principals were selected for the present study on the basis of the following variables: the
degree to which they used the school feedback, the number of children without special needs in their school, experience in working with self-evaluation, and school performance as represented in the feedback report. For each variable the schools were divided into three groups (low, average, and high), with the exception of the school performance level (positive or negative value added). In the survey the principals were asked with whom they had discussed the feedback report, choosing from six answer options. This was considered an indicator of feedback use. Respondents who marked more than three options were defined as high users; principals who marked fewer than two options were defined as low feedback users (M = 1.77, SD = 1.26). The second variable concerns the school’s performance level (Visscher, 2002; Visscher & Coe, 2003). A distinction was made between schools with a positive or negative value-added mathematics score at the end of grade two. In the online survey principals were asked to report their degree of experience in conducting self-evaluations in the school. Respondents with scores higher than three on a 5-point Likert scale were classified as highly experienced and those with scores lower than three as having a low degree of experience (M = 3.50, SD = 1.08). This selection criterion was used as it indicates prior experience in data use for school improvement. The fourth selection variable was the percentage of pupils without special needs at the school. As the feedback reports in this case were adjusted for pupil background characteristics, a differential appraisal of the feedback relevance was expected. Schools with percentages between 30 and 70 are considered as having an average number of pupils without SEN (M = 50.36, SD = 27.73). Figure 2 gives an overview of the selected schools.
[Figure 2 (tree diagram): the 16 interviewees, branched successively by feedback use, value added, self-evaluation, and school population.]
Figure 2. Overview of selected respondents. Note: H = high, A = average, L = low, ? = information unknown, + = positive value-added score, - = negative value-added score; number of respondents between parentheses. From left to right, the respondents are from schools 2, 11, 7, 1, 16, 3 & 10, 4 & 13, 15, 14 & 5, 6, 9, and 8 & 12.
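A minimal sketch of the selection logic described in Section 5.3 is given below (hypothetical Python; the field names and the treatment of boundary values are our own assumptions, and the survey data themselves are not reproduced here).

```python
# Sketch of the theoretical-sampling classification described above.
# Field names are hypothetical; how boundary values (e.g., exactly 2 or 3
# marked options, a self-evaluation score of exactly 3) were handled is an
# assumption made for this illustration.
def classify_principal(options_marked, value_added, self_eval_score, pct_without_sen):
    """Assign one survey respondent to a sampling cell (use, value added, self-evaluation, population)."""
    use = "high" if options_marked > 3 else ("low" if options_marked < 2 else "average")
    va = "positive" if value_added >= 0 else "negative"
    self_eval = "high" if self_eval_score > 3 else ("low" if self_eval_score < 3 else "average")
    population = ("average" if 30 <= pct_without_sen <= 70
                  else "high" if pct_without_sen > 70 else "low")
    return use, va, self_eval, population

print(classify_principal(options_marked=4, value_added=0.6,
                         self_eval_score=4, pct_without_sen=55))
# -> ('high', 'positive', 'high', 'average')
```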
5.4. Framework analysis
In addition to influencing the design of the SPFS, the results of this study were also used as a means to evaluate the theoretical framework presented above. The interview data were therefore placed in the theoretical frame to examine whether the theoretical findings were confirmed or needed to be altered and/or elaborated. This can inspire future studies that build on new preliminary concepts and hypotheses (Ritchie & Spencer, 1994). These findings can also contribute to the ecological validity of research findings on feedback effects, as here they are applied in the context of school improvement (Visscher & Coe, 2003). Each interview was transcribed verbatim and was independently coded by two researchers with ATLAS.ti, a qualitative analytic software tool. Codes were assigned following the middle-order approach, which allows for the initial application of broad categories that can later be refined (Dey, 1993). Text fragments were mainly assigned to codes in a deductive way. First, text fragments were placed under broad categories (e.g., effects of use, phases of use, the four groups of influencing factors, types of use, and other relevant information) and were then assigned to a predefined coding structure. If no predefined code was appropriate, the text fragments considered to be of importance were placed under the suitable broader category. New codes were created for these fragments inductively, emerging from the data, as in the grounded theory approach (Strauss & Corbin, 2007). To promote inter-rater agreement, the first two interviews were coded collaboratively and the coding structure was set up. Two interviews were then coded by both researchers separately to calculate inter-rater reliability, following the formula of Miles and Huberman (1994): the ratio between the number of agreements and the total number of attributed codes. An inter-rater agreement of .90 was obtained, indicating good inter-rater reliability. After this coding phase, the analysis shifted from a focus on individual interviews in a vertical analysis to a focus on the coding categories as they occurred across the different interviews in a horizontal analysis (variable-oriented approach; Miles & Huberman, 1994). This allows the researcher to transcend the individual narratives of the school principals and to create a spectrum of perceptions and interpretations.
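For readers who wish to reproduce the agreement measure mentioned above, the following minimal sketch (with invented code assignments, not the actual interview data) computes the Miles and Huberman (1994) ratio of agreements to the total number of attributed codes.

```python
# Inter-rater agreement as described above: agreements divided by the total
# number of attributed codes (the fragment identifiers and codes are invented).
coder_a = {("int01", "frag03"): "phases_of_use", ("int01", "frag07"): "support",
           ("int02", "frag02"): "types_of_use", ("int02", "frag05"): "context"}
coder_b = {("int01", "frag03"): "phases_of_use", ("int01", "frag07"): "school_user",
           ("int02", "frag02"): "types_of_use", ("int02", "frag05"): "context"}

agreements = sum(coder_a[k] == coder_b.get(k) for k in coder_a)
total = len(coder_a)
print(f"inter-rater agreement: {agreements / total:.2f}")  # 0.75 in this toy example
```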
6. Findings and discussion
6.1. What phases can be observed when schools use school performance feedback?
The interview results confirm that school performance feedback use in primary schools is limited. Most schools were situated at the first phase of the policy cycle described above. Only a few schools reached the planning phase and action phase in the policy cycle. Concerning the dissemination of information, the first stumbling block occurred at the moment feedback reports arrived at the school. Though all interviewees confirmed receipt of the report, one of them could not remember it. This stumbling block became more apparent when we examined the various ways in which the reports were handled. In some schools, the report was not read: “Mostly the reports arrive at the school. I give it a glimpse and then it is classified. Then, nothing is done with it” (School 8). Other school principals reported they only took a quick look at it. In contrast, others distributed the report to the teachers responsible for the class that was discussed in the report. Others handed the report over to the special needs teachers or special care coordinators. Sometimes teachers were intentionally not asked to be involved in reading the reports. My opinion is that if you are not really acquainted with the interpretation of these data, you will not spontaneously unravel the whole report. It is not so easy. It is an extra task on top of the rest. If I do this and draw the conclusions and give it to them, it is already a lot. (School 5) Occasionally reading the feedback reports led to discussion between the principal and the special care coordinator. In other cases, teachers were also invited into a discussion, but even then it was not guaranteed that they would read the reports. Principals reported that informal and unplanned discussions took place: We have a smoking room. That’s where we discussed the report. Those who entered the room glanced through the report. It was not intentionally communicated to the rest of the team members. This happened rather informally. (School 10) Other principals reported having a formal discussion during planned multidisciplinary team meetings. In these instances, the school principal or special care coordinator presented a summary of the results and their interpretations. All school principals reported that they only discussed the
feedback information within the school team, with the exception of also reporting the information to the education inspectorate. While we made a theoretical distinction between a reading and an interpretation phase, it became clear that in practice these phases merged together. The principals or special care coordinators that discussed the results with team members proposed their own interpretations. The new report was read and discussed by me and the care coordinator. Afterwards the report was discussed in a team meeting with all teachers; not just the teachers that are involved in the research. Conclusions and underlying statistical procedures were communicated. Growth curves were presented. (School 2) Principals also stressed that the interpretation process was an intensive, time consuming, and difficult activity. Some confirmed that they were not able to correctly interpret or understand the information. This is problematic as the interpretation phase is crucial for developing a solid and valid basis for the development of school policy (Earl & Fullan, 2003). While a minority reported not having experienced difficulties, all principals reported that successfully interpreting the report requires effort. You really have to examine it carefully to figure it out. I went over it …but to really master it, you have to read and examine it several times. (School 1) I think that …one of the reasons is that you first look at it. It is similar to the directions for use of a new apparatus. First you set it up and afterwards you read how it works. If the set up is successful, you are not going to read the instructions for use. (School 14) The laborious interpretation phase seems to have a strong impact on the diagnostic phase. Most principals dropped out after one attempt at understanding the feedback results. Only a few principals set up initiatives to identify strengths and weaknesses in their school and examined the feedback information when looking for explanations. However, this was rarely set up in a systematic way. Principals frequently stated that the diagnostic and action phase were barely reached. They also linked this to the lack of cues in the feedback reports that might direct future action. This may be a reason why school feedback is not systematically taken into consideration when developing internal policy.
We discuss it with the teachers involved. And, until now, the interpretation is limited to the reading of the report and the file, but no immediate actions follow from this. (School 4) But this feedback is not that useful for classes and individual children. I think this is the biggest concern. In fact it has to be as concrete as possible. That is the request of teachers; something ready-made. In fact this is also partly how I am. If I take a method book, I expect not to have to search for accompanying exercises. (School 3)
6.2. What is/are the result(s) of using school performance feedback?
The findings discussed above indicate factors that can affect the outcomes of SPF use. We found that in some schools feedback is used as a mirror image of the school’s performance. In those cases a better understanding of the school’s impact on pupil performance was developed. However, this did not automatically lead to (policy) actions. This can be labeled as conceptual feedback use; it led to reflection in schools, even when the results confirmed prior findings and impressions. Indeed, so far we have (…) already noticed a few things concerning the school’s position that we were not aware of before. What we also notice is that there is a large pupil mobility, which influences our results significantly. These are important findings for us. (School 12) Most important was to see where the school’s position is. How well are we performing and whether the school realizes a value added score. This is, for me personally, a refinement in thinking about what you are doing as a school, about your task, about your aims … (School 7) Illustrations of instrumental feedback usage were rare. Some principals stated that the feedback information did not offer enough starting points (e.g., remedial information) to direct actions. However, some principals reported that action had been taken, such as a reorganization of rosters, an increase in the number of teaching hours, the introduction of a new reading method, and more intensive mentoring of new teachers. Even when information confirmed prior findings, it led to instrumental feedback use: “What is reported confirms what we already assumed. It is more like an affirmation of our feelings. And we have done a few things, such as introducing a new spelling method.” (School 10) Feedback information was particularly used in a symbolic way. Respondents indicated that school feedback was a useful instrument in highlighting existing opinions and underlining various problems in the
school’s functioning. According to the respondents, the feedback was used as input for shared decision-making. However, this did not lead to concrete action. I had my own vision of the school and I wanted to impose it on the team; this was a good instrument to make out a case for it and to say it is necessary that we deal with this. (School 4) Examples that we found of strategic utilization referred to the use of school feedback in the development of the self-evaluation report to be submitted to the education inspectorate. Principals reported that they were grateful to participate in the study because they could make use of school, class, and pupil related information for this purpose. This factor deviates from the original theoretical model of SPF usage. Schools seem to have used the feedback information in the context of being accountable to the inspection authorities. This is in contrast to the perception of the authors and developers of the SPFS who want feedback to be used for school improvement. Not all of the information gathered about feedback use could be placed within the predefined coding scheme that was based on the literature (Rossi, Lipsey, & Freeman, 2004; Visscher, 2002). Therefore two extra codes were created: a motivating use and a pupil directed use. In some cases, the feedback information helped to motivate or stimulate school team members. In some schools, the feedback was communicated to team members for this purpose, which sometimes implied a selective presentation of the results. If you are an immigrant school, as we are, sometimes it is questioned if our performance level is high enough. And if you receive an output report from an external organization, it partly confirms we are doing a good job. (School 16) For making internal plans (…) we selected some results for reading and mathematics. We used these results for our own reports to say: ‘Look, on this measurement occasion, we just took out these results and notice that our children score like this. And the Flemish average is like this. Thus, we are below this average’. (School 7) The latter statement illustrates that lower performance results were also used to motivate the team members to overcome shortcomings. Conversely, some school principals kept the feedback results private, especially if they were not as good as expected. This was explained by the intention not to discourage team members.
For example, concerning the learning gain scores. Absolutely. If I had to communicate it and mention that for example the learning gain in the first grade is smaller than on average and in the second grade larger than on average, this would be very hard to bear for the teacher involved if this is made public. I am sure of that. (School 6) All of the aforementioned examples indicate feedback usage at the school level. During the interviews, principals stressed that aggregated results were useful for policy makers, but not for teachers who prefer a pupil directed utilization. Classroom teachers need data at the pupil level to direct actions that correspond to the learning needs of individual pupils. Pupil feedback is seen as complementary to pupil monitoring systems and is also considered more accessible to interpretation and to direct action on short notice. These interview results indicate that school feedback is not extensively used and has a limited impact. In fact, many school principals had not yet noticed school improvement effects by using the SPF, and if they had, they referred to the effects of using the feedback reports of the previous year. [As a result of mentoring starting teachers and introducing a new method; cf. instrumental use] We see the AVI-results [AVI is a Dutch grading system for reading fluency often used in primary education]. When before almost no pupil reached an AVI-1 level at the end of the year with that method and that young teacher, we now have several AVI-6 levels. Thus we have good results. That partly was a result of that. (School 1) Some principals stated that, because of the longitudinal nature of the study that provides the feedback services, barriers against the feedback discussions in the group decreased and interest in the results increased. This illustrates the valuable effects of process variables that indirectly contribute to school improvement (Schildkamp, 2007; Schildkamp, Visscher, & Luyten, 2009; Schildkamp & Teddlie, 2008).
6.3. How can differences be explained in the interpretation and the use of school performance feedback in different school contexts?
In the theoretical framework, different factors/conditions were discerned that explain differences in school feedback use. Our findings confirm the distinction of four clusters of related factors.
Context related factors To understand why school feedback is used to such a limited extent, we must take into account both the research context and the Flemish educational context. In terms of the research context the SPF presented information at the school level with adjusted scores. These built on a comparison with a reference group, resulting in value added scores. This is a very new approach that principals are not acquainted with. In terms of the Flemish educational context, the central educational authorities do not formally encourage or oblige schools to adopt an SPFS approach. Indeed, some authorities are even reluctant to do so, stating that it introduces the risk that schools will be compared and ranked on the basis of biased information or that adjusted scores will reveal another school performance level than expected. However, educational inspection authorities adopt another view. They encourage schools to document school performance on the basis of performance related information. [On being questioned about whether it was a conscious choice to participate in the research project] You always have the possibility to refuse…The main reason for me to participate was that our inspectorate often asks for output results. And yes, of course we have our own class tests but there is no reference point, because teachers create their own tests. We also have tests from our methods. But nowhere is there a comparison with another school to see how we perform. (School 1) School and user related factors The interview analysis indicated four groups of related school and user characteristics. Differences in expected functions and effects of school performance feedback. School principals differed in the degree to which they had expectations of using feedback as well as the goals they oriented themselves towards with feedback use. Some did not even define goals or targets, while others reacted in a proactive way. When schools did formulate explicit and shared goals, the chances of observing more optimal and successful feedback use increased. This indicates that if schools are convinced of the potential of school feedback, they undertake actions toward effective use (Bosker, Branderhorst, & Visscher, 2007). These actions have to be performed by the users themselves for innovations to become successful (Fullan, 2007).
A distinction can be made between utilization expectations and effect oriented expectations. In the former situation, school principals expected to use the school feedback as a mirror, helping to develop a clearer view of the current school operation and school performance, and to detect strengths and weaknesses. Others expected to use feedback for policy development (e.g., for evaluating policy decisions or developing policy plans). We thought ‘look this research will be conducted over seven years; we are going to follow it up. Where are we as a school? We are putting a lot of effort into our care policy. What does this effort give us in return?’ (…) In fact, we do have a very problematic population and it is our goal to see what the benefit is of all our effort. (School 1) Another utilization oriented perspective was discussed above (i.e., when principals used the information for accountability purposes). Almost all principals intended to use the feedback as input for their discussions with inspection authorities but stressed that they would not do this for parents. In terms of effect expectations, principals expected that investing time and effort in school feedback would eventually improve education: “We expect to improve our quality of education. So far, for the first grade, it was worth the effort. That is the goal: an improvement of our education” (School 1). We found no evidence that the principals systematically reflected upon their expectations with regard to feedback use and feedback effects. In addition, principals indicated that their expectations of the feedback did not necessarily reflect the opinions of their staff members. Teachers are not willing to participate because it is a lot of work for them. Moreover, the SPF project examines the same domains as the pupil monitoring system, thus it does not directly benefit them. (…) Teachers participate in this research project because the previous school principal decided they would. For them, it is ‘if it must be.’ (School 2) Differences in statistical knowledge and skills. Most school principals claimed not to have advanced statistical knowledge. Their statistical knowledge was acquired during their initial teacher training and additional training courses, and was partly based on learning to work with pupil monitoring systems. However, they stressed that this was insufficient to work with school performance feedback. Conversely, some did not experience difficulties, either because everything was explained in the report or because they had sufficient prior knowledge.
Everything [in the feedback reports] is explained in terms of how to interpret it. Thus, if one pays enough attention to the instructions ‘to read it this way and these numbers, if this is mentioned it means this,’ then I think no extra prior knowledge is needed. (School 4) Differences in time available for feedback use. Some principals reported that if more time was available, they would have made more use of the feedback. Because principals and teachers have to divide their time over a large number of activities, less urgent tasks such as those related to SPF use are not prioritized. This confirms previous findings that the self-evaluation of a school is not a priority for principals and teachers (Visscher, 1996; Williams & Coles, 2007). There is often a lack of time. You cannot use this as an excuse but it is often the reason. For example at team meetings, you want to put this and this on the agenda, but then there is not enough time to go more deeply into it, because there are so many issues coming from the outside. (School 11) Differences in perceptions of positive/negative feedback results. When school feedback reflected low performance levels, the principals were willing to search for explanations. This confirms the control theory of Kluger and DeNisi (1996). However, this observation cannot be generalized: When the performance levels were far below average, sometimes feedback results were not distributed in order not to discourage team members. When performance results were perceived to be relatively good, further use of the feedback reports decreased: “We are scoring on average, so there are no severe differences. So why should we pay much attention to it?” (School 3). The perception of the performance results was influenced by the way the results were represented, for example by the way value added is calculated. The feedback reports presented both adjusted scores that took into account the influence of pupil background characteristics and unadjusted scores. Our results indicate that especially in schools with a large number of children with special needs, the adjusted performance scores were valued positively. The surplus value of this research for our school is that for all these years we’ve had the impression we were doing things right. Because we have a large number of foreign-speaking and special needs children we want to know the effects of the way we organize our education and monitor our children. (…) Particularly in the last few years with the introduction of
adjusted scores, some attention is given to the pupils’ progress, while taking into account certain factors. (School 7) School performance feedback (system) related factors Feedback has to meet a number of requirements to facilitate correct interpretation and to promote feedback utilization. Differences in perceived feedback relevance. All school principals requested that feedback should fit their needs. These needs differed between schools. Some principals expressed a primary interest in performance results on mathematics and language; others were more interested in socio-emotional development or other subjects. Furthermore, schools’ preferences differed in the calculation of value added scores (observed or adjusted scores), in the way information was aggregated (pupil – class – school – other subgroups), in the amount of statistical background information in the reports, and in the nature of the reference group(s). During the interviews these differences were observed between and within schools. Differences were also related to the roles and occupations of feedback users. Teachers prefer pupil level feedback, pupil relevant error analysis, and remedial material, whilst policy makers prefer aggregated information that reflects their school focus. In my opinion, the school and class level is the most interesting, in view of my function. I am supposed to work mainly on school and class level and less on the pupil level. Thus for me this is more interesting than an individual report. But of course a teacher will see it differently. I am sure of that. This teacher will probably prefer feedback about the pupils in this class. (School 10) When asked for ideas on how to better meet user needs, respondents suggested enlarging the number of school subjects to be tested, focusing on different pupil cohorts, and tailoring information. The interviewees were not pleased about redundant information. They required feedback systems to focus on complementary information. In particular, some principals asked for information that would complement the available monitoring systems. All respondents required that the performance feedback be up to date. In particular, teachers expected feedback within the same school year as when tests had been taken, in order to support low-scoring pupils. When teachers shifted classes, feedback results of previous years were considered irrelevant. Differences in perceived feedback interpretability. For this factor no coherent picture could be deduced from the interview data. Most principals
stated that interpreting the information was difficult. Some stressed that interpreting the information without support was a hopeless task. Some stated that the information could not be understood after only one reading. But not all principals considered this to be a problem or experienced difficulty in analyzing the reports. Some principals stated that it is important to stress that school feedback is a complex field and cannot be simplified without losing depth and meaning.

It is magnificent the way this report [is written]… It is not easy to explain something complex that clearly. They [the feedback developers] largely succeeded in it, but it is still a large amount of information. (…) Of course, sometimes I get lost, which is not surprising, considering the technicality of it. (School 7)

During the interviews, explanations were given for why some principals were not able to correctly interpret the feedback. Some complained of a lack of structure in the feedback information. Others criticized the amount, stating that they skipped a lot of information, were selective, and focused only on the school results.

Maybe some parts are less interesting for me, but this is not a reason to leave out this information from the reports, because everything is concisely described. For example, the information about pupil mobility, if it does not interest you, just turn the page. (School 10)

In contrast, others appreciated the comprehensiveness of the feedback reports and preferred the additional information. A third element influencing the interpretability of feedback was the balance between technical concepts and the way school staff label and discuss education. Feedback was often experienced as being too abstract. Additionally, principals seemed to be less familiar with feedback that was aggregated at the class and school level. Both the language used and the graphical representations (growth curves, box plots) led to difficulties in interpretation. Some school principals stressed that the feedback is not appropriate for teachers as they do not possess the competence or experience to interpret the information, whilst others did not question the competence of their staff.

Differences in perceived validity and reliability. Respondents trusted the professionalism of the feedback developers. Nevertheless, they expressed some concerns. Some principals valued the feedback less because the adjusted scores do not take into account school-specific process and context variables. The feedback developers wanted to articulate these differences, but schools preferred an adjustment model taking into account
more external influences that explain school outcomes and result in an average school profile.

I think researchers do not have enough information [about pupil and school characteristics]. They do not know we introduced a new reading method, which caused problems. They do not know there was a starting teacher. And they do not know that this teacher is not worthy of being called a teacher. That gives different results. This information should be on top of it [of the current adjustment procedure]. It is important for the school. (…) Now it does not give a correct image of the school. (School 1)

The feedback was perceived as valid and reliable when the results were congruent with the findings of pupil monitoring systems, school tests, or intuition. When this was not the case, the results were seen as less valid and low performance was more easily attributed to external factors, such as the difficulty of test items, atypical question methods, and incorrect results of the reference group (i.e., some schools were thought to have falsified their results by helping their pupils during the test). Others criticized the single-shot nature of the data gathering. A particular problem arose when a school was geographically distributed. Aggregation of data at the school level was of lesser value because the school's population, and sometimes also its policies, can differ between geographical locations. Finally, concerns were expressed when class organization or differentiation forms were very different from the approaches adopted in the reference group.

Differences in perceived user-friendliness of the SPFS. The nature of the overall feedback system influenced feedback use. Respondents complained about the large investment of time and effort during the data gathering process. Teachers and pupils perceived the tests as stressful. In addition, questionnaires directed to parents required a considerable amount of time and a willingness to report private information. Furthermore, test times overlapped with other key assessment and evaluation periods in the school year. This explains why some teachers considered participation in the project as an extra burden on top of a heavy workload. This feeling was reinforced when the feedback was perceived as less relevant. User-friendliness also refers to the tailoring of the school feedback. Some principals suggested adapting the report to the individual school setting. Along the same lines, satisfaction with the communication between user and the feedback system played a role. Moreover, the schools received the feedback at a rather unexpected moment, which made it difficult to include the new information in the policy-making cycle.
Support related factors
The interviewees offered valuable information on user needs concerning feedback support and advised us about how to fulfill these needs. The results reveal that feedback use requires both policy-oriented and research-oriented skills. These are skills that must be developed (Visscher, 2002).

Differences in support needs. As mentioned above, most users reported not being able to interpret the information without extra support. Nevertheless, feedback support should go further than just assuring a correct interpretation. Almost all principals reported that they got stuck after their attempt at interpretation. They stated that they did not feel confident about their interpretative capacities and that they needed recommendations on how to proceed to the next phases in feedback use.

I can only hope my interpretations are correct. But definitely with the last report, it is so extensive that there is some – I will not say doubt, but fear – that it might be wrong. (School 6)

It is the same problem as with pupil monitoring systems. You can go to the teacher and say, 'These are your results. This child scores an E. Here you are.' That teacher will file the report and there it stops. (School 3)

The respondents asked for specific help depending on whether they received positive or negative feedback results. Furthermore, they requested help in diagnosing the causes and circumstances that the results could be attributed to. Most respondents asked for concrete instructions for action. This suggests that consultation services could help to fulfill these additional needs.

Differences in support characteristics. When asked for ideas on how to organize support, some respondents requested a face-to-face introduction to the concepts and representations in the report. These sessions should be organized on site, but if that is not possible, regional meetings are acceptable. Feedback support should be functional, offering intelligible, theoretical, and practical information. Principals expected the support to go beyond the interpretation phase and to empower schools to diagnose their results.

Concerning the interpretation, we try to manage it. But we do not know if we are doing it right. It would be interesting if the SPF project would come with the report to the schools and would explain the information in a team meeting with the teachers, with the whole team, to show us how to look at the results. 'What's the next step?' Because now we only get the 'sec' results and read them as such, as how they are printed. Even
though some reading advice is provided, the impulse to really do something with it is always lacking. (School 4)

Defining the role of external support services was a difficult issue. Some respondents claimed that schools have to take the lead in feedback use. This is in line with Earl and Fullan (2003), who claim that professional development will help strengthen personal confidence and self-efficacy in coping with complex feedback information. The respondents indicated a preference for internal support by counselors and via in-service training. External support from feedback suppliers should not interfere with these initiatives. They emphasized the demand-driven nature of support. This confirms the idea that external support must be tailored to the needs of individual schools. A sufficient level of goodness-of-fit is a requirement to achieve successful support (Nevo, 1995).

Principals also referred to school team members as a basis for support. Principals mostly got support from the special care coordinator or teacher. Often these staff members were more experienced in interpreting statistical concepts and graphical representations. These staff members can play a role as complementary specialists. As they have a more flexible work schedule, they can allocate time to study feedback reports. This is not the case for teachers who have to work according to a prescheduled roster. Some school principals also wanted to protect team members against work overload and thus did not involve them in feedback use activities. They might also have perceived these staff members as less important sources of support in feedback use.
7. Implications, limitations and conclusion
The present study focuses on principals' perceptions of school performance feedback and the actual use of feedback information. The study took place within the context of a larger project aiming to develop and implement a new school performance feedback system, and it builds on an eclectic framework that integrates the literature on SPF use. This framework was the guiding structure for interviews with 16 principals from different primary school settings. Our results indicate that the elements presented in the theoretical framework reappear in the interviews. Figure 3 represents the integration of findings from the literature and our study.
Figure 3. Integration of literature and research findings on SPF use.
[Figure: a schematic linking the influencing factors to the feedback use process. Influencing factors: context related (school improvement – accountability; pressure and support); school and user related (functions/expectations of SPF use; prior knowledge and experience in data use; priorities in task scheme; statistical knowledge and skills; perception of school performance level); school performance feedback (system) related (perception of relevance; perception of interpretability; perception of validity and reliability; perception of user-friendliness); and support related (support needs; support set-up; internal – external). These feed into the successive phases of feedback use (receiving feedback, reading and discussing, interpretation, diagnosis, planning, implementation, evaluation), resulting in types of feedback use (instrumental, conceptual, symbolic, strategic, pupil directed, motivating) and in effects (intended – unintended, desirable – undesirable, product – process).]
The aim of this study was to illustrate and elaborate a framework of factors that influence school performance feedback use. Where previous studies have provided literature findings (Visscher, 2002; Visscher & Coe, 2003), perspectives of feedback suppliers (Schildkamp & Teddlie, 2008; Visscher & Coe, 2002), and quantitative methods of testing feedback use (Schildkamp, Visscher, & Luyten, 2009), this study illustrates the influence of different variables on feedback use in a qualitative way.

From a theoretical perspective, our findings can help refine the description of feedback use. Whereas previous studies (e.g., Schildkamp, 2007; Visscher, 2002; Visscher & Coe, 2003) make a distinction between different kinds of information use (cf. instrumental, symbolic, and conceptual use; Rossi, Lipsey, & Freeman, 2004; cf. strategic use; Visscher & Coe, 2003), an empirical investigation of the phases of feedback use has not been carried out. In this study, both the types and the phases of feedback use were explored. Additional types of feedback use emerged from the data: motivating and pupil-directed use. The interview data also show that different types of feedback use are related to one another and occur simultaneously or successively. While a sequence of feedback phases can be discerned theoretically (Learning Point Associates, 2004), the process of feedback use is less systematic in practice. Our findings indicate that users can get stuck in the process of feedback use. A crucial challenge for future feedback use is to detect the difficulties in each phase and to offer appropriate support to systematize the process involved.

Our findings indicate that interpreting school feedback, making a diagnosis based on the results, discussing causes, and setting up actions based on feedback results is not a clear-cut process. The results reveal that feedback use requires both policy-oriented and research-oriented skills which must be developed by users (Visscher, 2002). Educational authorities should not neglect the importance of stimulating professional development and providing external support. Expectations about the positive impact that feedback use can have on school improvement will only be realized if extra support is available (Schildkamp & Teddlie, 2008; Sun, Creemers, & de Jong, 2007). To design appropriate support initiatives, a detailed analysis of the difficulties encountered when interpreting feedback reports must be conducted. For example, a recent study, which used both oral comprehension tests and IRT-calibrated online tests, illustrated the misconceptions that respondents reported during the interpretation of feedback reports. The results of that study contributed to the design of specific support initiatives (Verhaeghe, Verhaeghe, Vanhoof, & Valcke,
2009). Furthermore, experimental studies that manipulate the nature of external support can contribute to the design of a more sophisticated SPFS (e.g., Tymms, 1995) and the required support measures.

In the design of SPFSs, it is important to integrate the characteristics which appear to have a considerable influence on feedback use, such as relevance, interpretability, reliability, and validity. These characteristics are mediated by the perceptions of the feedback users. What is considered relevant by feedback developers, policy makers, or researchers does not necessarily correspond with what the target group perceives as relevant. However, little is known about the effect of these differing perspectives in the context of school feedback use. Moreover, one cannot expect schools to successfully implement innovations without making sufficient resources available (Davies & Rudd, 2001; Kimball, 2002). As school feedback use is not heavily promoted (Davies & Rudd, 2001), resources are limited. When we consider the workload of teachers and principals, our findings indicate that teachers will prioritize their classroom-related activities at the expense of school-level issues.

This study was conducted in Flanders, where there is no accountability culture or central examination system. It is not yet clear whether effective feedback use in such a context should only function within a school improvement perspective, as we found that feedback use was stimulated by an accountability orientation in the form of inspection visits. It would be useful to examine the (in)direct influence of national and international authorities on feedback use (Creemers, 2006). Future research could focus on the relationship between a school improvement and an accountability orientation of educational authorities and key stakeholders (Vanhoof & Van Petegem, 2007) and on the balance between internal and external evaluations (Kyriakides & Campbell, 2004), influencing feedback use in schools.

The present study contains certain limitations. The validity of our findings is restricted to a specific educational context, with a particular school performance feedback system. However, the aim of this study was not to formulate generalizations but to explore and illustrate feedback use by its users. Another limitation is that a comprehensive framework is needed with an evidence-based set of influencing factors. Neither this study nor previous school performance feedback studies have attempted to meet this need. Furthermore, the link between school performance feedback use and the more general practice of data-driven decision making remains unexplored. Despite the focus on accountability in the data-driven
decision making literature, common points of interest with SPF use can be further examined.
References

Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the utilisation of management information systems in secondary schools. School Effectiveness and School Improvement, 18(4), 451-467.
Coe, R. (2002). Evidence on the role and impact of performance feedback in schools. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.
Creemers, B.P.M. (2006). The importance and perspectives of international studies in educational effectiveness. Educational Research and Evaluation, 12(6), 499-511.
Davies, D., & Rudd, P. (2001). Evaluating school self-evaluation (LGA research report 21). Berkshire: National Foundation for Educational Research, Local Government Association.
Dey, I. (1993). Qualitative data analysis: A user-friendly guide for social scientists. London: Routledge.
Earl, L., & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in indicator systems: Doing things right and doing wrong things. Education Policy Analysis Archives, 10(6), 68-82.
Fullan, M. (2007). The new meaning of educational change (4th ed.). London: Cassell.
Goldstein, H., & Myers, K. (1996). Freedom of information: Towards a code of ethics for performance indicators. Research Intelligence, 57, 12-16.
Goldstein, H., & Spiegelhalter, D.J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society, Series A (Statistics in Society), 159(3), 385-443.
Hammond, P., & Yeshanew, T. (2007). The impact of feedback on school performance. Educational Studies, 33(2), 99-113.
Hoy, W., & Miskel, C. (2001). Educational administration: Theory, research and practice. Boston: McGraw-Hill.
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006). Strategies to promote data use for instructional improvement: Actions, outcomes, and lessons from three urban districts. American Journal of Education, 112(4), 496-520.
Kimball, S.M. (2002). Analysis of feedback, enabling conditions and fairness perceptions of teachers in three school districts with new standards-based evaluation systems. Journal of Personnel Evaluation in Education, 16(4), 241-268.
Kluger, A.N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284.
Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school improvement: A critique of values and procedures. Studies in Educational Evaluation, 30, 23-36.
Learning Point Associates. (2004). Guide to using data in school improvement efforts: A compilation of knowledge from data retreats and data use at Learning Point Associates. Retrieved October 23, 2007, from http://www.learningpt.org/pdfs/datause/guidebook.pdf
Liket, T.M.E. (1992). Vrijheid & rekenschap: Zelfevaluatie en externe evaluatie in het voortgezet onderwijs [Freedom and accountability: Self-evaluation and external evaluation in secondary education]. Amsterdam: Meulenhoff Educatief.
Lindlof, T.R., & Taylor, B.C. (2002). Qualitative communication research methods (2nd ed.). London: Sage Publications.
Maso, I., & Smaling, A. (1998). Kwalitatief onderzoek: Praktijk en theorie [Qualitative research: Practice and theory]. Amsterdam: Boom.
Mason, J. (2002). Qualitative researching (2nd ed.). London: Sage Publications.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks: Sage Publications.
Nevo, D. (1995). School-based evaluation: A dialogue for school improvement. Oxford: Pergamon.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3-16). Oxford: Elsevier Science.
Ritchie, J., & Spencer, L. (1994). Qualitative data analysis for applied policy research. In A. Bryman & R. Burgess (Eds.), Analysing qualitative data (pp. 173-194). London: Routledge.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic approach (7th ed.). London: Sage Publications.
Saunders, L. (2000). Understanding schools' use of 'value added' data: The psychology and sociology of numbers. Research Papers in Education, 15(3), 241-258.
Saunders, L., & Rudd, P. (1999, September). Schools' use of 'value added' data: A science in the service of an art? Paper presented at the British Educational Research Association Conference, Brighton, University of Sussex.
Schagen, I. (2004, November). Weighing the baby or fattening it: The use of data to inform school evaluation. Paper presented at the NFER/ConfEd Annual Research Conference, London.
Schildkamp, K. (2007). The utilisation of a self-evaluation instrument for primary education. Unpublished doctoral dissertation, University of Twente.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems in the USA and in The Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school self-evaluation instrument. School Effectiveness and School Improvement, 20(1), 69-88.
Silverman, D. (2005). Doing qualitative research: A practical handbook (2nd ed.). London: Sage Publications.
Strauss, A.L., & Corbin, J. (2007). Basics of qualitative research: Grounded theory procedures and techniques (3rd ed.). Newbury Park, CA: Sage Publications.
Sun, H., Creemers, B., & de Jong, R. (2007). Contextual factors and effective school improvement. School Effectiveness and School Improvement, 18(1), 93-122.
Tymms, P. (1995). Influencing educational practice through performance indicators. School Effectiveness and School Improvement, 6(2), 123-145.
van Aanholt, T., & Buis, T. (1990). De school onder de loep [The school under scrutiny]. Culemborg, The Netherlands: Educaboek.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-indicatoren als strategisch instrument voor schoolontwikkeling [Feedback on school performance indicators as strategic instrument for school improvement]. Pedagogische Studiën, 81, 338-353.
Van Petegem, P., & Vanhoof, J. (2007). Towards a model of effective school feedback: School heads' point of view. Educational Research and Evaluation, 13(4), 311-325.
Van den Berg, R., & Ros, A. (1999). The permanent importance of the subjective reality of teachers during educational innovation: A concerns-based approach. American Educational Research Journal, 36(4), 879-906.
Vanhoof, J. (2007). Zelfevaluatie binnenstebuiten: Een onderzoek naar het verloop en de kwaliteit van zelfevaluaties in scholen [Self-evaluation inside out: A study on the proceeding and quality of self-evaluations in schools]. Mechelen: Wolters-Plantijn.
Vanhoof, J., & Van Petegem, P. (2007). Matching internal and external evaluation in an era of accountability and school development: Lessons from a Flemish perspective. Studies in Educational Evaluation, 33(2), 101-119.
Verhaeghe, G., Verhaeghe, J.P., Vanhoof, J., & Valcke, M. (2009). The value-added results of schools: How to represent school feedback information? Manuscript submitted for publication.
Visscher, A.J. (1996). The implications of how school staff handle information for the usage of school information systems. International Journal of Educational Research, 25(4), 323-334.
Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 41-71). Lisse: Swets & Zeitlinger.
Visscher, A., & Coe, R. (Eds.). (2002). School improvement through performance feedback. Lisse: Swets & Zeitlinger.
Visscher, A., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349.
Williams, D., & Coles, L. (2007). Teachers' approaches to finding and using research evidence: An information literacy perspective. Educational Research, 49(2), 185-206.
CHAPTER 4: VALUE-ADDED RESULTS OF SCHOOLS: HOW TO REPRESENT SCHOOL FEEDBACK INFORMATION∗
Abstract
The use of data for school improvement purposes has recently gained research interest. In order to use school performance feedback (SPF) effectively, it is necessary to interpret feedback information correctly. However, systematic research on this topic is scarce. Therefore, the present experimental study was set up to examine the effectiveness of various modes of explaining and representing the statistical concepts 'value added' and 'learning gain' in SPF reports. The results indicate that non-statistically skilled people encounter interpretation difficulties, especially in deriving value-added scores and for complex conceptual questions. This underscores the importance of developing effective SPF systems and support initiatives.
∗ Based on Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.
1. Introduction
Over the last decades, governmental bodies have required schools to be accountable for their educational quality in return for school autonomy (Nevo, 2002). Schools are not only expected to gain insight into their input and process characteristics, but also to link these to their output. Therefore, schools are required to systematically gather data on their functioning for self-evaluation purposes. In this context, school performance feedback (SPF) is an important source of information on pupil performance. It also helps the school to detect the extent to which it contributes to pupils' performance levels.

The correct interpretation of school feedback information is a crucial condition for effective feedback use (Visscher, 2002; Visscher & Coe, 2003). The interpretation phase is one of the most important phases in the process of using feedback and requires a considerable amount of time, skills, and effort (Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). An examination of existing SPF systems and their related literature reveals that research on user comprehension is scarce (Schildkamp & Teddlie, 2008). Few studies have examined the effectiveness of the various modes of explaining and representing data in school feedback reports. This is problematic considering the fact that SPF reports use complex concepts, such as learning gain and value added, whilst SPF users (i.e., school principals and teachers) are often not statistically skilled (Earl & Fullan, 2003; Kerr, Marsh, Ikemoto, Darilek, & Barney, 2006; Saunders, 2000; Williams & Coles, 2007).

In this study we explore whether alternative modes of explaining and representing SPF information have a differential impact on user comprehension in a group of non-statistically skilled people. The results of this study are expected to contribute to a better understanding of the way the target group (i.e., mainly school principals and teachers) interprets SPF information. The results are also expected to direct the design and development of a new SPF system. Below we outline the central concepts investigated and present our theoretical framework on how alternative ways of presenting complex information influence users' understanding. Following this, research questions and hypotheses are presented.

1.1 School Performance Feedback
School performance feedback is conveyed to its users through school performance feedback systems (SPFSs). Visscher and Coe (2002) define SPFSs as "Information systems external to schools that provide them with
confidential information on their performance and functioning as a basis for school self-evaluation" (2002, p. xi). SPFSs primarily aim at supporting school improvement and internal quality policy, which distinguishes them from school accountability systems. These feedback initiatives are complementary to central examination results (provided by government agencies) and to school and class data. SPFSs contribute to the creation of information-rich environments essential for schools in their data-driven decision making process. These data provisions serve schools in their role as learning organisations that continuously monitor and improve their quality policy (Hofman, Dijkstra, & Hofman, 2009; Leithwood, Aitken, & Jantzi, 2006). Examples of available SPFSs or related projects are Performance Indicators in Primary Schools (PIPS; Durham University, Centre for Evaluation and Monitoring, n.d.) and the VAS [Monitor and Advice System] (CiTO [Central Institute for Test Development], n.d.). In some cases, SPF is also provided to schools in return for their participation in large-scale research projects. This approach is adopted by Flemish researchers in relation to the Progress in International Reading Literacy Study (Katholieke Universiteit Leuven, Centre for Educational Effectiveness and Evaluation, 2008), the Programme for International Student Assessment, and the Trends in International Mathematics and Science Study.

The present study focuses on the representation and understanding of two statistical concepts that can be considered as key concepts in school feedback reports: learning gain and value added (Saunders, 2000). In this study, learning gain (or gain score) is defined as the progress of a learner in a certain knowledge domain. This can be considered as the difference between the test scores of an individual at two different moments, on the condition that the same scale is used for both tests. The latter implies that the same test is presented twice or that different tests are IRT (Item Response Theory) calibrated. For example, PIPS Reception builds on the administration of the same test at the start and at the end of the school year, while the VAS calculates the learning gain on the basis of skill scores, estimated by the performance on different tests at three measurement occasions.

Value added refers to the extent to which the school has contributed to the achievement progress of its students (Organisation for Economic Cooperation and Development, 2008). It can be operationalised as the school-level residual, after adjusting for the effects of student background characteristics. Two different approaches to explaining the concept of value added can be discerned in SPF reports. Both approaches to explaining the concept are used in an overview publication of the OECD in relation to
value-added modelling (OECD, 2008), but are not clearly distinguished or further refined.

The first approach is based on the notion of expected values. Based on students' background characteristics and their prior achievement, a certain achievement level can be expected. Students' actual/observed achievement level can be higher or lower than the expected/predicted achievement level, because of measurement error, uncontrolled individual differences, or differences between schools. Averaging the differences between expected achievement and observed achievement within a particular school will, in principle (i.e., when there is a sufficient number of students), cancel out measurement error and the impact of individual differences. What is left is the shared value added for all students attending the same school. This approach is applied in the feedback reports of the Centre for Evaluation and Monitoring and in the Value Added National Project (Fitz-Gibbon, 1997).

The second approach to explaining the concept of value added starts from the notion of adjusted achievement level. The adjusted achievement level represents the achievement level a learner in a particular school would have had if his or her input characteristics and prior achievement were equal to those of the reference group, i.e., the "average" school. If a school does not differ from an average school in its contribution to students' learning, its mean adjusted achievement level will be equal to that of the average school. In that case, the difference between the school mean of the students' adjusted achievement scores and the mean of the average school will be zero. However, if there is a difference, the school's contribution to students' learning (value added) is higher or lower than in the average school. This approach is used in the PIRLS reports in Flanders.

It should be noted that these two approaches refer to exactly the same statistical procedure and resulting regression equation. They only differ in the underlying mathematical operation used to isolate the school-level residual from that equation, as represented in Table 1.
Table 1
Regression equations of two approaches determining the school's value added

Approach 1 – Expected means
Basic multilevel regression equation: $y_{ij} = \beta_0 + \sum_k \beta_k x_{kij} + u_j + e_{ij}$
Observed school mean is given by: $\bar{y}_{.j} = \beta_0 + \sum_k \beta_k \bar{x}_{k.j} + u_j + \bar{e}_{.j}$
Stripping the school level residual gives the predicted or expected school mean: $\hat{\bar{y}}_{.j} = \beta_0 + \sum_k \beta_k \bar{x}_{k.j}$
Subtracting the predicted mean from the raw mean yields: $\bar{y}_{.j} - \hat{\bar{y}}_{.j} = u_j + \bar{e}_{.j}$
From which follows: $E(\bar{y}_{.j} - \hat{\bar{y}}_{.j}) = u_j$
Meaning: Value added = observed mean – expected mean

Approach 2 – Adjusted means
Basic multilevel regression equation: $y_{ij} = \beta_0 + \sum_k \beta_k x_{kij} + u_j + e_{ij}$
Observed school mean is given by: $\bar{y}_{.j} = \beta_0 + \sum_k \beta_k \bar{x}_{k.j} + u_j + \bar{e}_{.j}$
Stripping the effects of student input characteristics (= setting them equal to the reference group) replaces $\bar{x}_{k.j}$ by the reference group means $\bar{x}_{k..}$
This yields the adjusted school mean: $\bar{y}^{\,adj}_{.j} = \beta_0 + \sum_k \beta_k \bar{x}_{k..} + u_j + \bar{e}_{.j}$
From which follows: $E(\bar{y}^{\,adj}_{.j}) - \bar{y}_{..} = u_j$, with grand mean $\bar{y}_{..} = \beta_0 + \sum_k \beta_k \bar{x}_{k..}$
Meaning: Value added = adjusted mean – grand mean
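The equivalence of the two approaches can be made concrete with a small worked example; the numbers are invented for illustration and do not come from the SPF project. Suppose the grand mean of the reference group is 50 and the school's intake composition predicts a school mean 2 points above that grand mean, so the expected school mean is 52, while the observed school mean is 55.

$\text{Approach 1: value added} = \bar{y}_{\text{observed}} - \bar{y}_{\text{expected}} = 55 - 52 = +3$
$\text{Approach 2: value added} = \bar{y}_{\text{adjusted}} - \bar{y}_{\text{grand}} = (55 - 2) - 50 = +3$

Both operations isolate the same school-level residual of +3 skill-score points.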
To our knowledge, the differential impact of these two modes of explaining value added on SPF users’ understanding of the concept has not yet been studied. Not much research has been carried out on the way SPF is interpreted by its users or how this interpretation is influenced by the representational format of SPF reports. However, since school feedback builds on numerical information, statistical concepts, and graphical
representations, literature on graph design and interpretation is used as a starting point for the design of this study.

1.2 Representation of complex quantitative information
Numerous studies have examined the way graphical representation of numerical information is understood (e.g., Kosslyn, 2006; Leinhardt, Zaslavsky, & Stein, 1990). Research has also examined the representation of textual information in combination with multimedia representations, such as illustrations and graphs (e.g., Mayer, 2001; Mittal, Carenini, Moore, & Roth, 1998; Schnotz & Bannert, 2003), in relation to task performance. We do not present an overview of this field of research, but build on the available evidence about design principles derived from these studies. We refer to these principles when describing the experimental design of the present study.

Principles of effective graph design
An interesting overview study that summarises best practices in graph design is the work of Kosslyn (2006). According to this author, many graphs are not satisfactory because they do not adequately consider the aims, needs and competences of the user. Based on research in perceptual and cognitive information processing, Kosslyn proposes eight design principles: (1) The principle of relevance. This concerns limiting or reducing the amount of information presented: Only the information necessary to get the message across must be represented. (2) The principle of appropriate knowledge. This means tailoring information to the prior knowledge of the user. (3) The principle of salience. Since it is crucial to attract the audience's attention, this principle stresses making large perceptible differences in information presentation. (4) The principle of discriminability. The graphical representation should enable users to distinguish between different pieces of information. (5) The principle of perceptual organisation. This refers to the tendency of users to group together perceptual elements and to remember these groups better than isolated elements. Furthermore, Kosslyn recommends promoting understanding and memory: (6) the principle of compatibility stresses the importance of the compatibility between form and meaning, and (7) the principle of informative changes indicates that readers interpret any changes in displays as conveying information (e.g., changing the colour, adding lines). Finally, (8) the principle of capacity limitations addresses users' limited capacity to retain and process information. Mayer (2001) also stresses these limitations in his
cognitive theory of multimedia learning. In this context, Sweller, van Merriënboer, and Paas (1998) describe the concepts of intrinsic and extraneous cognitive load. Intrinsic cognitive load refers to the difficulty inherent to instructional materials. The degree of intrinsic cognitive load depends on the element interactivity, or the number of elements simultaneously manipulated in one's working memory. While intrinsic cognitive load is related to the material being learned, extraneous cognitive load refers to the instructional design. This concerns the execution of cognitive activity that is redundant to the purpose of the task (Chandler & Sweller, 1991). Overtaxing the user's working memory is caused by ineffective presentation of the materials. For example, when accompanying text and illustrations are presented separately or inappropriately, the reader has to invest extra cognitive effort to integrate the information. Both types of cognitive load are additive, but only extraneous cognitive load can be altered and prevented by the design of the learning material.

Misconceptions in interpreting graphs
Next to research on data representation, our study also builds on previous research on the interpretation of graphs and the common misconceptions of inexperienced users. Smith III, diSessa, and Roschelle (1993) define a misconception as "a student's conception that produces a systematic pattern of errors" (p. 119) that arises from the student's prior learning. This prior learning can follow from formal instruction (Smith III, diSessa, & Roschelle, 1993), general knowledge, or intuition (Leinhardt, Zaslavsky, & Stein, 1990). Alternative terms that are used to depict students' misconceptions include preconceptions, alternative conceptions, naïve beliefs, alternative beliefs, alternative frameworks, naïve theories, and systematic errors (Mevarech & Kramarsky, 1997; Smith III, diSessa, & Roschelle, 1993). All terms refer to students' conceptualisations that differ from the accepted or intended meaning of the instructed concepts. From a constructivist point of view, misconceptions can be considered as the incomplete acquisition of expert knowledge in a learning process, rather than mistakes that impede learning (Smith III, diSessa, & Roschelle, 1993). Clement (1988) and Leinhardt, Zaslavsky, and Stein (1990) present a review of common misconceptions that occur when people interpret graphs. These include the slope-height confusion, confusion in responding with a point or an interval, and mistaking a graphical representation for an iconic one (e.g., uphill and downhill for a rising and a descending curve). These misconceptions are also mentioned in the studies of Beichner (1994), Mevarech and Kramarsky (1997), and Kramarski (2004).
SPF systems have not yet been incorporated in Flanders' school system, which means that feedback users are inexperienced. Given that SPF contains several complicated concepts (e.g., value added and learning gain) and forms of representation (e.g., growth curves), we expect feedback users to make several mistakes when interpreting the information. In this study we explore which difficulties are encountered by statistically unskilled participants when trying to understand learning gain and value added. We also try to prevent interpretation difficulties by using effective graph design principles in the SPF reports. It is therefore important to examine what ways of representing information in the SPF report are effective, taking into account the characteristics of its users.

Individual differences
Previous research has explored the interaction between representation formats and individual characteristics. The effect of representation forms on task performance appears to depend on learning styles, individual preferences (Dekeyser, 2001), differences in ability (Mayer, 2001; Tapiero, 2001), and prior knowledge (De Westelinck, Valcke, De Craene, & Kirschner, 2005; Mayer, 2001; Shah & Hoeffner, 2002). Prior knowledge also appears to be an important factor in determining the degree of intrinsic cognitive load. Since more experienced users are able to handle higher element interactivity, they experience a lower degree of intrinsic cognitive load (Sweller, van Merriënboer, & Paas, 1998). Furthermore, a recent study was carried out on the interaction between learner characteristics and hypermedia learning, cognitive load, and information utilisation strategies (Scheiter, Gerjets, Vollmann, & Catrambone, 2009). The results of that study indicate that characteristics such as positive attitudes towards mathematics, more complex epistemological beliefs, higher prior knowledge, and better cognitive and metacognitive strategy use have a positive influence on these outcome variables. This study will incorporate several individual variables that are expected to have an influence on feedback users' understanding of the information.

1.3 Research questions and hypotheses
The present study examines the extent to which non-statistically skilled users understand the explanations and representations of the concepts learning gain and value added. We focus on feedback users' conceptual and procedural (i.e., deriving information from graphical representations) understanding of these terms. The level of understanding is indicated by
the number of misconceptions and other mistakes made during the interpretation of the information. Since the feedback users participating in this study are inexperienced in interpreting SPF reports, we expect them to have misconceptions and interpretation difficulties with the complex conceptual and graphical information (Hypothesis 1).

Interpreting information accurately largely depends on the instructional design of the learning material. Two modes of explaining the term value added have been discussed above. Since no research has been carried out on these two modes of explanation, we cannot formulate any expectations with regard to the differential impact they may have on feedback users' conceptual and procedural understanding of the SPF report. We therefore examine this in an exploratory way (Research question 1).

Adding representations to textual information can support the interpretation of the information presented, as indicated in the theory of multimedia learning (Mayer, 2001). Analogously, we test the hypothesis that graphical representations with supporting information that is supposed to facilitate interpretation are more favourable for successful feedback interpretation than basic representations (Hypothesis 2). As individual learner characteristics have an effect on task performance and cognitive load, we take these differences into account and expect them to serve as significant control and/or moderator variables.
2. Method
2.1. Design
A 2x3 factorial experimental design with post-test was used. Two variants of explaining the concept of value added were combined with three alternatives to represent learning gain and value added (for a schematic overview, see Appendix A).

2.2. Participants
The target audience for SPF consists of school principals and teachers in primary and secondary education. However, since no SPF system is currently available in Flanders, a study was set up involving first-year students in educational sciences (N = 312, mean age 19.33 years, SD 1.69, 88% women) at Ghent University. Not all participants started educational studies without prior study experience, as some had already obtained a professional bachelor degree in a different subject area (n = 62,
mean age 21.98 years, SD 1.14, 81% women; versus freshmen, n = 250, mean age 18.75 years, SD 1.16, 90% women). Students participated in this experiment as a formal part of their study programme. They signed up individually for one of the eight parallel experimental sessions.

2.3. Material
SPF Tutorial - Experimental conditions
Participants were randomly assigned to one of the six different experimental conditions. In each condition, they received a specific version of an SPF report via a PowerPoint presentation. This medium was used as it enables a controlled stepwise instruction for each participant at his/her own pace and simulates the electronic version of the SPF that schools receive. Each of the six presentations consisted of approximately 40 slides. First, an introduction and slide overview was presented, followed by pie graphs representing the background characteristics of the fictitious school population. Second, an explanation of the concept learning gain was presented, which was the same in all experimental conditions. Third, the definition and estimation of value added was shown. Each feedback report presented the average growth curve from grade one to grade six of a cohort of pupils and presented the school's value added for one single subject. The horizontal axis of the line graphs indicated time and measurement moments; the vertical axis indicated the mean skill score, as represented in Figure 1.
Figure 1. Example of a growth curve as used in the present study.
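A minimal plotting sketch of the growth-curve layout described above, with measurement moments on the horizontal axis and the mean skill score on the vertical axis; this is not the actual SPF report figure, and the skill scores below are invented purely for illustration.

# Illustrative sketch of a growth curve in the layout described in the text.
# The skill scores are invented and do not come from the SPF project.
import matplotlib.pyplot as plt

moments = ["Start gr. 1", "End gr. 1", "End gr. 2", "End gr. 3",
           "End gr. 4", "End gr. 5", "End gr. 6"]    # seven measurement occasions
school_mean = [35, 50, 60, 68, 75, 81, 86]            # hypothetical school means
reference_mean = [38, 51, 62, 71, 79, 86, 92]         # hypothetical "average school" means

fig, ax = plt.subplots()
x = range(len(moments))
ax.plot(x, school_mean, marker="o", label="School")
ax.plot(x, reference_mean, marker="s", linestyle="--", label="Average school")
ax.set_xticks(list(x))
ax.set_xticklabels(moments, rotation=45, ha="right")
ax.set_xlabel("Measurement moment")
ax.set_ylabel("Mean skill score")
ax.legend()
fig.tight_layout()
plt.show()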
The design of the tutorials builds on Kosslyn's (2006) principles of relevance, appropriate knowledge, and capacity limitations. As few slides as needed were used to clearly explain the concepts learning gain and value added. Therefore, only limited information was given about the underlying statistical analyses, such as Item Response Theory and regression analysis. Furthermore, captions were added piecewise to graphs, consistent with Mayer's (2001) theory of multimedia learning. Spatial contiguity was respected to promote the integration of captions and illustrations, and to prevent extraneous cognitive load (Chandler & Sweller, 1991; Sweller, van Merriënboer, & Paas, 1998). For the design of the growth curves, different colours for lines and different symbols for points were used, following the principles of discriminability and salience (Kosslyn, 2006).

The difference between alternative presentations was based on the way value added was explained (Research question 1). In half of the conditions, value added was presented as the difference between the school mean of the students' adjusted achievement and the mean of the "average school" (i.e., the reference group). In the other conditions, value added was explained as the difference between the average expected achievement and observed achievement within the fictitious school. In addition, three different presentation formats were used in the target group: (1) a baseline version building on text and graphs, and two elaborated versions enriched with either (2) tables, or (3) symbolic representations of the underlying statistical concepts. For the basic version, we opted for text explanation and growth curves, since that is a common way to represent longitudinal data. Two additional conditions were created by adding representations that are supposed to support the knowledge construction of the learners (Hypothesis 2). First, we opted to add cross tables to the basic version to support the use of prior knowledge, since the target audience is acquainted with this form of representation from daily use. The second elaborated version was based on symbolic representations to explain the simplified regression equations (without detailed equations as in Table 1). Schematic representations were used, but the variable names were written in full instead of using Greek symbols (see Figure 2). In this way, Kosslyn's (2006) principle of appropriate knowledge was respected. This form of representation was expected to foster a more deep-level understanding of the value-added estimation procedure, resulting in higher performance scores.
Figure 2. Example of a symbolic representation as used in the present study.
Performance test
An online post-test was developed to measure respondents' conceptual understanding of the SPF and their procedural skills in deriving information from graphs and tables (Anderson, Krathwohl, Airasian, Cruikshank, Mayer, Pintrich, et al., 2001). The post-test consisted of two parts. In the first part (closed version) students were not allowed to look back at the feedback report, whilst they could do so in the second part (open version). In reality, principals and teachers are always able to check the feedback report, as mirrored in the open version of the test. Nevertheless, the closed version was used to determine the potential differential learning effects of alternative representations. Since we can expect a carry-over effect of taking the first test on the results of the second test, half of the respondents only participated in the second test. Building on the different experimental conditions and test approaches, 12 different test groups can be distinguished, with the number of participants in each group varying between 25 and 28 (see Appendix A for a schematic overview).

As we found no previous research that tests SPF users' comprehension of value added and learning gain, we developed a test for this study. To do so, we created a framework which included all of the different cognitive tasks that have to be performed to correctly interpret the feedback information. Using this framework, suitable test items were developed, varying in degree of difficulty. Out of this list of closed items (true-untrue and multiple choice) and open items (filling in digit values), a test was composed that could be completed in 30 minutes. Tests consisted of conceptual and procedural items on two central concepts in the SPF report:
• 6 items referring to conceptual knowledge of learning gain. For example: "Learning gain is the extent to which pupils progress in a certain skill domain. (true-untrue)"
• 5 to 6 items referring to conceptual knowledge of value added. For example: "Value added can only be determined if you know the input characteristics of pupils. (true-untrue)"
• 13 items referring to reading off learning gain from graphs and tables. For example: "Look with close attention to these growth curves. Then, complete the blanks in the table (score school 1 at start grade 1, score at end of grade 1 and learning gain)."
• 17 to 18 items referring to the derivation of value-added scores from graphs and tables. For example: "Look with close attention to these growth curves. Determine if these statements are true or untrue. School 1 reached a higher value-added score than school 2 in grade 4."
See also Figure 4 for a simplified example of deriving value-added scores from growth curves (in this test, the growth curves represented 6 grades and 2 schools were presented simultaneously).

A section of each post-test was designed in accordance with the nature of the experimental condition. This means that the terminology and curves were adapted to the way in which the concept of value added was explained, either in terms of adjusted or expected means.

The psychometric quality of the test was checked by first converting the scores on the different test versions to one common scale, applying a three-parameter IRT model. Seven test groups were defined according to the different test variants. A satisfying overall fit was found for 111 of the original 127 items (LR = 508.9, SE = 556.0, p = .92). The number of badly fitting items did not exceed the number to be expected by chance. The empirical reliability of the tests varied from .80 to .90 depending on the test version and test group, with the exception of .72 for one particular test group (Mdn = .84). Exploration of a two-parameter IRT model resulted in comparable results, but 4 extra items had to be removed from the calibration to attain a good overall fit (LR = 535.4, SE = 549.0, p = .65). The correlation between skill scores resulting from the three- and two-parameter model is r = .95 (p < .001). The three-parameter model was finally preferred, because the IRT scores had a slightly better normal distribution.

Short survey of learner characteristics
In addition to the post-test, data was gathered on characteristics of the participants. As discussed in the theoretical framework, individual differences must be taken into account when designing studies on data representation. However, since the instruction and testing time in this experiment was limited, we could not use elaborate measurement
instruments (as, for example, in Scheiter, Gerjets, Vollmann, & Catrambone, 2009). We therefore selected a number of indicator variables. To take into account differences in prior knowledge of the participants in this study (De Westelinck, Valcke, De Craene, & Kirschner, 2005; Mayer, 2001; Shah & Hoeffner, 2002; Sweller, van Merriënboer, & Paas, 1998), data was collected for the following variables:
• their study program: freshmen or students with a prior bachelor degree;
• the number of hours of mathematics per week in the last and second last year of their secondary education; and
• their mathematics exam score at the end of their secondary education (in %).
As an indication of attitudes towards statistics (e.g., Scheiter, Gerjets, Vollmann, & Catrambone, 2009), an item was included measuring the degree to which participants like statistics. This was measured on a 7-point Likert scale ranging from 1 (totally dislike) to 7 (like it very much). As an additional moderator variable, we included the participants' perceived clarity of the feedback report, since it is plausible that not all respondents experienced the clarity of the instruction material equally. This was also measured on a 7-point Likert scale, ranging from 1 (very unclear) to 7 (very clear).

2.4. Procedure
The entire population of freshmen in educational sciences was invited to schedule their participation in an experimental session by means of a learning management system. A maximum of 50 students could participate in each session, which was set up in a computer lab. Following a brief introduction, participants were asked to assume the fictitious role of a school principal who had received a school performance feedback report on the results from a longitudinal study that their school had participated in. Participants were asked to imagine that the pupils of their school had been monitored over the six years of primary school education and had been tested on seven occasions. Participants were asked to read at their own pace the school feedback report presented to them in a PowerPoint presentation. At the end of the presentation, they were asked to click on the link to the online test and short survey. If their computer screen turned orange, they had to take the test without looking back at the presentation (closed version). If the screen turned green, they received a printed version of their school feedback report to guide them when answering the open version of the post-test. Students were told they had approximately 90
minutes to read through the materials and to complete the post-test and short survey.

2.5. Analyses
An explorative analysis of the descriptive data was carried out to screen response patterns in relation to the content, degree of difficulty, and the nature of the questions. To this end, scatter plots were used to represent the locations of the items in terms of the item type (conceptual – procedural) and item content (learning gain – value added). The item location is an IRT parameter that is related to the percentage of correct answers (r = -.90 in this study) and gives an indication of the item difficulty level: the lower an item is located, the more participants scored correctly on that item, which indicates a lower level of difficulty. To clarify the meaning of these item locations, the percentages of correct answers for these items are reported below. In addition, an error analysis was performed by first listing all possible answers, then revealing error patterns, and finally reconstructing participants' reasoning processes.

First, the potential difference between the scores on the open and closed test was examined using a t-test. Depending on this result, further analyses were performed with either only the open test (if no difference) or both tests (in case they differed). The differential impact of the experimental conditions was checked on the basis of univariate analyses of covariance, controlling for potentially confounding variables (mathematics level, degree of liking statistics, current study program, etc.). Differences were tested with respect to the IRT test scores. Additionally, pairwise comparisons were executed to determine the differences within the categories of significant factors. Furthermore, relevant moderator effects were included in addition to the main effects to get a more nuanced view on the predictors in the model. These moderator effects were added to the model stepwise. For this study, the students' mathematics exam scores in the last year of secondary education were brought into interaction with the hours of mathematics they received per week. Furthermore, the moderating relation between value-added explanations and graphical representation modes was examined. Finally, the model examined the interaction effect between the value-added conditions and the perceived clarity of the presentation. Assumptions were checked for the analysis techniques, i.e., homogeneity of regression slopes and of residual variances, which confirmed that no assumption had been violated.
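As an illustration of the analysis strategy described above, the following sketch shows how such an analysis of covariance with stepwise moderator terms could be specified in Python with statsmodels. The variable and file names are hypothetical, and the original analyses may have been carried out with different software.

# Minimal sketch of the covariance analyses described above.
# Variable and file names are hypothetical and only illustrate the model structure.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("spf_experiment.csv")  # hypothetical data file, one row per participant

# Main-effects model: experimental factors plus potentially confounding covariates.
main = smf.ols(
    "irt_score ~ C(va_explanation) + C(representation) + C(program)"
    " + math_hours + math_exam + liking_stats + clarity",
    data=df,
).fit()

# Moderator terms added stepwise, as described in the text:
# hours x exam score, explanation x representation, explanation x perceived clarity.
moderated = smf.ols(
    "irt_score ~ C(va_explanation) * C(representation) + C(program)"
    " + math_hours * math_exam + C(va_explanation) * clarity + liking_stats",
    data=df,
).fit()

print(main.summary())
print(moderated.summary())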
3. Results and discussion
3.1. Descriptive statistics of correct answers
No differences were found between the results of the open and closed versions of the test (t(153) = .322, p = .748). Consequently, the scores obtained for the open version of the test were used in the subsequent analyses. Descriptive statistical analysis reveals that students did not experience difficulties in reading exact values from the tables and graphs (more than 85% of the answers correct), or in calculating learning gains (more than 75% correct). To illustrate the spread of the items, the panel display in Figure 3 shows the frequencies of the standardised item locations in relation to the item type and item content.
Figure 3. Panel display of standardised item locations in relation to item type and content.
For example, the upper right panel shows no items exceeding a standard deviation above zero, indicating a mean item location for procedural learning gain items. This implies that test items that required participants to derive learning gain values from graphs and tables were not more difficult than the mean test difficulty level. In contrast, the most difficult items appear in the procedural value-added panel. On average, only 35% of the respondents were able to derive value-added scores from the graphs presented in the feedback report. Reading off value-added scores requires
the comparison of data (heights and slopes) from different growth curves (e.g., the school's adjusted growth curve and the "national" average growth curve, or the school's expected growth curve and the school's observed growth curve). In contrast, deriving the average learning gain over a certain period only requires the examination of one growth curve (see Figure 4). Calculating value added thus requires extra processing, possibly causing cognitive overload in working memory due to high element interactivity (Sweller, van Merriënboer, & Paas, 1998). This difference in mental effort, related to intrinsic cognitive load, may explain the lower scores for procedural value-added items in comparison to the learning-gain items.
Figure 4. Example of deriving learning gain and value added from growth curves. For deriving learning gain, the difference in skill scores between two points of the same curve must be calculated. For example, the learning gain of this school in the first grade is 50 - 35 = 15. For reading off value-added results, two curves need to be compared and a geometrical translation needs to be performed. Before subtracting the end points, the starting points of the curves must coincide. For example, the value added of this school in the second grade is -10.
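As a worked illustration of the two calculations in the Figure 4 caption, the fragment below computes learning gain and value added from the start and end points of growth curves. The first-grade numbers mirror the caption; the second-grade points are hypothetical values chosen only to reproduce the caption's value added of -10.

```python
# Illustrative calculation of learning gain and value added from growth-curve points.

def learning_gain(start_score: float, end_score: float) -> float:
    """Learning gain: difference between two points on the same (observed) curve."""
    return end_score - start_score

def value_added(observed_start: float, observed_end: float,
                expected_start: float, expected_end: float) -> float:
    """Value added: compare observed with expected growth after aligning the start
    points (the 'geometrical translation' mentioned in the caption)."""
    observed_growth = observed_end - observed_start
    expected_growth = expected_end - expected_start
    return observed_growth - expected_growth

# First-grade example from the caption: the school's curve rises from 35 to 50.
print(learning_gain(35, 50))        # 15
# Hypothetical second-grade points yielding the caption's value added of -10.
print(value_added(50, 60, 50, 70))  # -10
```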
When examining the nature of the errors in calculating value added, patterns can be observed in the incorrect answers. This enables us to reconstruct the thinking process of participants and to identify certain misconceptions. Typical errors made when calculating value added are (1) comparing the wrong growth curves; (2) ignoring the difference in starting points of the
curves before subtracting (no geometrical translation was performed); (3) confusing the calculation process of value added and learning gain; (4) using the wrong signs (+/-); and (5) confusing the heights of curves with their slopes. This last misconception, called the slope-height confusion, has been reported in earlier studies (Beichner, 1994; Clement, 1989; Kramarski, 2004; Leinhardt, Zaslavsky, & Stein, 1990). Respondents mostly gave correct answers to the conceptual questions related to the information that was literally explained in the school feedback presentation (87% correct answers). In contrast, low test scores were observed when the questions required deep-level conceptual learning (24% correct answers). For example, the statement that "The learning gain of pupils can be calculated by tests with the same maximum score" was incorrectly classified as 'true' by 86% of the participants. During the instruction period, participants were told that learning gain can only be calculated if both tests are on the same scale, by IRT calibration or by taking the same test twice. Only 2% of the participants who received the item "To estimate a school's value added, you first have to adjust for school characteristics" answered it correctly. This indicates that, for these participants, either the difference between school and input characteristics was not clear or they simply did not notice the difference in this sentence.
3.2. Differences between conditions
The results of the analyses of covariance in Table 2 show significant differences in test scores in relation to the way value added was explained. The school feedback report that explained value added in terms of expected means resulted in higher performance (see Table 3 for descriptive statistics; t(298) = 2.283, β = -.536, p < .05). But the effect sizes are limited, as the explained variance in test scores is 2.1% (partial η²). A pairwise comparison of the presentation modes reveals significant differences between the test scores for the basic SPF version and the elaborated SPF version using tables, in favour of the basic version (Δ = .269, SE = .117, p < .05, partial η² = 1.7%). This finding contradicts Kosslyn's (2006) theory on the advantage of observing the design principle of appropriate knowledge. An explanation of this finding could be found in the structure mapping hypothesis (Schnotz & Bannert, 2003). This hypothesis assumes that adding representations is not beneficial in all cases but depends on the kind of task being carried out. In this sense, tables may not have been helpful in solving the tasks presented in this study because their structure does not facilitate the construction of a task-appropriate mental model. Indeed, adding tables may have been inappropriate for illustrating
trends in the data, since tables might be more appropriate for determining exact numbers (Meyer, Shinar, & Leiser, 1997). Therefore, adding tables that were not in accordance with the different task purposes may have caused extraneous cognitive load (Chandler & Sweller, 1991).

Table 2
Results of analysis of covariance for IRT test score

                                         df        F        p
Corrected model                          13     5.221    .000**
Explanation mode value added (E)          1     6.238    .013*
Presentation mode (P)                     2     2.648    .072
E x P                                     2      .738    .479
Study Program                             2     2.728    .067
Degree of liking statistics               1     1.228    .269
Hours of math sec. education (H)          1      .501    .480
Math exam score sec. education (S)        1      .714    .399
H x S                                     1      .024    .878
Perceived clarity of presentation (C)     1    10.921    .001**
E x C                                     1     4.778    .030*
Error                                   298
Note. Adj. R² = .15 for IRT test score. *p ≤ .05. **p ≤ .01.
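As background for the effect sizes reported in the text, partial eta squared relates the sum of squares of an effect to that effect plus error. This is the standard textbook definition, given here only for reference; it is not a reproduction of the authors' own computation.

```latex
\eta^{2}_{p} \;=\; \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
```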
Table 3
Numbers, means and standard deviations of IRT test scores for the six conditions

                                   n       M      SD
Explanation mode value added
  Adjusted scores                 154    -.070   .950
  Expected scores                 158     .068   .829
Presentation mode
  Basic version                   102     .131   .974
  Table version                   103    -.077   .852
  Symbol version                  107    -.051   .841
Regarding the influence of individual differences on the test score, only the perceived clarity of the presentation appears to be significant, both as a main effect and as a moderator effect in interaction with the value-added explanation mode (t(298) = 2.186, β = .487, p < .05). The direction of this interaction effect shows that the perceived clarity of the presentation is even more important when value added is explained in terms of adjusted means than in terms of expected means. In other words, "the clearer a presentation is perceived to be, the higher the IRT test score" holds more strongly when value added is explained by adjusted rather than by expected means.
4. General discussion and conclusion
4.1. Interpretation of SPF in the present study
Since school performance feedback aims at contributing to internal school quality policies, it is important that the target audience develops a good understanding of the information offered. The results of the present study reveal that at least one of the most widely used concepts in school performance feedback, the concept of value added, is not well understood by non-statistically skilled people. The results from our experiment indicate that there is a lack of procedural and deep conceptual understanding of this concept. Even when comprehensive information was provided to participants in the experimental setting, the conceptual basis of value added was too complex for statistically unskilled people to master. These findings confirm our first research hypothesis that users would have difficulty interpreting complex conceptual and graphical information due to the interplay between the inherent complexity of SPF and a lack of prior knowledge of the respondents. This interplay causes intrinsic cognitive load (Sweller, van Merriënboer, & Paas, 1998), interpretation difficulties, and misconceptions (e.g., slope-height confusion, see Beichner, 1994; Clement, 1989; Kramarski, 2004; Leinhardt, Zaslavsky, & Stein, 1990). We compared the two explanations and representations of value added in terms of their differential effect on participants' understanding of the concept. This proved to be helpful in detecting which explanation of value added facilitated better conceptual and procedural understanding. Explaining this concept in terms of the difference between observed and expected growth appears to be better than explaining it in terms of the difference between the school's adjusted growth curve and the reference growth curve. However, the effect size of the observed significant differences is rather small. While more research is needed to confirm these findings, they serve as a point of reflection for the designers of school feedback systems. In terms of the graphical representations used in our experiment, it is rather surprising that the tables did not add to users' understanding of the feedback report. However, this does not imply that the use of tables in combination with growth curves is not advisable. Previous research indicates that different information is derived from tables and graphs (Meyer, Shinar, & Leiser, 1997); both sources of information have merits, depending on the task being performed (Schnotz & Bannert, 2003). An appropriate use of tables and graphs can avoid extraneous cognitive load and foster correct understanding.
4.2. Strengths and limitations
Earlier studies which examined school feedback reports expressed concern about the accuracy of feedback users' interpretation of information, but were not able to pinpoint what was being misunderstood (Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000; Williams & Coles, 2007). The present study allows us to develop a more detailed understanding of what is misunderstood when interpreting learning gain and value added from SPF reports. The use of IRT techniques appears to deliver detailed information both on the item parameters and on respondents' scores. This allows SPF developers to examine interpretation difficulties in detail and to adapt SPF representation forms for clients that are statistically unskilled. Furthermore, this may inspire feedback providers to set up support initiatives. In our experimental study, particular concepts were studied in a controlled setting. However, participants were not genuine feedback users. This feedback experiment must therefore be considered as a first attempt to test the understanding of diverse modes of explaining SPF concepts. These results require further examination in future research. It is possible that the experimental tasks in this study placed too high a demand on the participants, in that they were expected to derive and calculate value added from graphical representations or to indicate what conclusions can be drawn from the tables and the growth curves presented. This is also a point of discussion for feedback providers: Is it necessary to expect SPF users to master the basic principles of deriving value added and learning gain scores from representations, or should SPF reports be simplified? Providing more technical information to users also implies more complex interpretations of the SPF information.
4.3. Implications for future research
Our findings in relation to the different modes of explaining the concept of value added need further confirmation. The different modes of explaining and representing SPF concepts and their influence on users' understanding can be tested in a number of ways. IRT techniques appear to be useful in this regard and can be applied in less controlled settings, such as in quasi-experimental designs. This would provide more detailed item information and could support the external validity of our findings. An alternative way of testing value-added conceptions is to interview the feedback users. This method has provided useful results in previous research (Santelices & Taut, 2009; Saunders, 2000). However, more in-depth analyses, such as videotaping feedback users as they explain their understanding of the
representations and concepts in SPF reports, may provide more insight into the reasoning process of respondents. The individual differences and preferences that influence feedback users' understanding of the SPF data require further attention. It is important to explore whether SPF developers should introduce feedback reports that are more flexible in terms of form and content, i.e., tailored to the individual user (Visscher & Coe, 2003). This study points to the importance of how respondents perceive the SPF variables and data. Indeed, it is often not the feedback characteristics as such, but the perception of them that determines how the data will be used (Verhaeghe et al., 2010; Visscher, 2002). Therefore, valid measures of users' perception of SPF variables should be developed.
4.4. Implications for practice
It is quite likely that the misconceptions observed in this study also occur in school practice when interpreting school performance feedback. Therefore, these findings underpin the importance of carefully examining the interpretability of feedback reports. Feedback developers should adapt the mode of explaining the concept of value added to the target audience; they should be aware of the prior knowledge of feedback users and should develop graphical representations that differ from those used in scientific publications. The presentation of information in SPF reports should be designed in line with the task to be performed (Kluger & DeNisi, 1996; Schnotz & Bannert, 2003). This implies that feedback reports should be designed according to the cognitive tasks that are necessary to understand the information. Many studies stress the role of support when dealing with school feedback (Bosker, Branderhorst, & Visscher, 2007; Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000; Verhaeghe et al., 2010; Williams & Coles, 2007). If school performance feedback is expected to contribute to school improvement, attention must be given to the way users interpret the information.
Appendix A: Assignment of participants to the different conditions and test formats
Note. Extra variation in the tests was added by developing parallel formats, in case students who enrolled in later sessions were influenced by peers from earlier sessions.
References Anderson, L.W., Krathwohl, D. R., Airasian, P.W., Cruikshank, K.A., Mayer, R.E., Pintrich, P.R., et al. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York: Longman. Beichner, R.J., (1994). Testing student interpretation of kinematics graphs. American Journal of Physics, 62(8), 750-762. Bosker, R.J., Branderhorst, E. M., & Visscher, A. J. (2007). Improving the utilisation of management information systems in secondary schools. School Effectiveness and School Improvement, 18(4), 451-467. CiTO. (n.d.) Volg- en adviessysteem: Voor elke leerling de beste kansen [Monitor and advice system: The best chances for each pupil]. Retrieved December 1, 2008, from http://www.cito.nl/vo/vas/algemeen/eind_fr.htm Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8(4), 293-332. Clement, J. (1989). The concept of variation and misconceptions in Cartesian graphing. Focus on Learning Problems in Mathematics, 11, 7787. De Westelinck, K., Valcke, M., De Craene, B., & Kirschner, P. (2005). Multimedia learning in social sciences: Limitations of external graphical representations. Computers in Human Behavior, 21(4), 555-573. Dekeyser, H. M. (2001). Student preference for verbal, graphic or symbolic information in an independent learning environment for an applied statistics course. In J.F.Rouet, J.J. Levonen, & A. Biardieu (Eds.), Multimedia learning: cognitive and instructional issues (pp. 99-109). Oxford: Pergamon. Durham University, Centre for Evaluation and Monitoring. (n.d.). PIPS. Retrieved December 1, 2008, from http://www.cemcentre.org/RenderPage.asp?LinkID=22210000 Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394. Fitz-Gibbon, C.T. (1997). The value added national project: Final report. Feasibility studies for a national system of value added indicators. Hayes: School Curriculum and Assessment Authority. Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School selfevaluation and student achievement. School Effectiveness and School Improvement, 20(1), 47-68. Katholieke Universiteit Leuven, Centre for Educational Effectiveness and Evaluation. (2008). PIRLS: Begrijpend lezen vierde leerjaar: 111
Schoolfeedbackrapport n.a.v. deelname aan het PIRLS 2006 – onderzoek [PIRLS: Comprehensive reading grade four: School feedback report in response to participation to the PIRLS 2006 study]. Retrieved December 1, 2008, from http://ppw.kuleuven.be/pirls/voorbeeldrapport.pdf Kerr, K.A., Marsh, J.A., Ikemoio, G.S., Darilek, H. & Barney, H. (2006). Strategies to promote data use for instructional improvement: Actions, outcomes, and lessons from three urban districts. American Journal of Education, 112, 496-520. Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284. Kosslyn, S.M. (2006). Graph design for the eye and mind. Oxford: Oxford University Press. Kramarski, B. (2004). Making sense of graphs: Does metacognitive instruction make a difference on students’ mathematical conceptions and alternative conceptions? Learning and Instruction, 14(6), 593-619. Leinhardt, G., Zaslavsky, O., & Stein, M.K. (1990). Functions, graphs, and graphing: Tasks, learning, and teaching. Review of Educational Research, 60(1), 1-64. Leithwood, K., Aitken, R, & Jantzi, D. (2006). Making schools smarter: Leading with evidence (3rd ed.). Thousand Oaks, CA: Corwin Press. Mayer, R.E. (2001). Multimedia Learning. Cambridge: Cambridge University Press. Mevarech, Z.R. & Kramarsky, B. (1997). From verbal descriptions to graphic representations: Stability and change in students’ alternative conceptions. Educational Studies in Mathematics, 32(3), 229-263. Meyer, J., Shinar, D., & Leiser, D. (1997). Multiple factors that determine performance with tables and graphs. Human Factors, 39(2), 268-286. Mittal, V. O., Carenini, G., Moore, J.D., & Roth, S. (1998). Describing complex charts in natural language: A caption generation system. Computational Linguistics, 24(3), 431-467. Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp 3-16). Oxford: Elsevier Science. Organisation for Economic Co-operation and Development. (2008). Measuring improvements in learning outcomes: Best-practices to assess the value-added of schools. Paris: OECD Publishing. Santelices, V., & Taut, S. (2009, September). Comprehension and use of value-added school performance indicators reported to teachers and parents. Paper presented at the European Conference on Educational Research, Vienna. 112
Saunders, L. (2000). Understanding schools’ use of value-added data: The psychology and sociology of numbers. Research Paper in Education, 15(3), 241-258. Scheiter, K., Gerjets, P., Vollmann, B., & Catrambone, R. (2009). The impact of learner characteristics on information utilization strategies, cognitive load experienced, and performance in hypermedia learning. Learning and Instruction, 19(2009), 387-401. Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems in the USA and in The Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282. Schnotz, W., Bannert, M. (2003). Construction and inference in learning from multiple representation. Learning and Instruction, 13(2), 141-156. Shah, P., & Hoeffner, J. (2002). Review of graph comprehension research: Implications for instruction. Educational Psychology Review, 14, 47-69. Smith III, J. P., diSessa, A.A., & Roschelle, J. (1993). Misconceptions reconceived: A constructivist analysis of knowledge in transition. The Journal of the Learning Sciences, 3(2), 115-163. Sweller, J., van Merriënboer, J.J.G., & Paas, F.G.W.C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251-296. Tapiero, I. (2001). The construction and the updating of a spatial mental model from text and map: effect of imagery and anchors. In J.F.Rouet, J. J. Levonen, & A. Biardieu (Eds.), Multimedia learning: Cognitive and instructional issues (pp. 45-57). Oxford: Pergamon. Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167-188. Visscher, A.J. (1996). The implications of how school staff handle information for the usage of school information systems. International Journal of Educational Research, 25(4), 323-334. Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A. J. Visscher, & R. Coe (Eds.), School improvement through performance feedback. Lisse: Swets & Zeitlinger. Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through performance feedback. Lisse: Swets & Zeitlinger. Visscher, A., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349. Williams, D. & Coles, L. (2007). Teachers’ approaches to finding and using research evidence: An information literacy perspective. Educational Research, 49(2), 185-206. 113
CHAPTER 5 THE INFLUENCE OF COMPETENCES AND SUPPORT ON SCHOOL PERFORMANCE FEEDBACK USE
CHAPTER 5: THE INFLUENCE OF COMPETENCES AND SUPPORT ON SCHOOL PERFORMANCE FEEDBACK USE∗
Abstract Information-rich environments are created to promote data use in schools for the purpose of self-evaluation and quality assurance. However, providing feedback does not guarantee that schools will actually put it to use. One of the main stumbling blocks relates to the interpretation and diagnosis of the information. This study examines the relationship between data literacy competences, support given in interpreting the information, actual use of the feedback, and potential school improvement effects. A randomized field experiment with 188 school principals from primary education was set up and a posttest was used to investigate the effects of a support initiative. The results revealed that a minority of schools invested significantly in the interpretation and diagnosis of the school performance feedback (SPF), despite the fact that most of the respondents showed an interest in the SPF report. In addition, data competence support and the subsequent use of feedback were found to be limited.
∗ Based on Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in press). The influence of competences and support on school performance feedback use. Educational Studies.
1. Introduction and research questions
The growing autonomy of schools goes hand in hand with initiatives by education authorities to hold schools accountable for their approach to quality care (Nevo, 2002; Hofman, Dijkstra, & Hofman, 2009) and to create information-rich environments. Schools are given feedback on their functioning and performance via school performance feedback systems (Visscher & Coe, 2002; 2003). The use of such systems as a policy instrument is not a straightforward issue. School performance feedback (SPF) has turned out to be a necessary yet insufficient step, as both the schools and the feedback systems have to meet certain requirements for the feedback to actually be used in practice (Visscher & Coe, 2003; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). Consequently, current research often reports disappointing results from school feedback use (Coe, 2002; Saunders & Rudd, 1999; Tymms, 1995; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2004; Verhaeghe et al., 2010; Zupanc, Urank, & Bren, 2009). One important obstacle is the lack of knowledge and skills needed to process the information. School principals are usually not trained in carrying out research, data collection, data management or data interpretation. This lack of data literacy (Earl & Fullan, 2003) leads to valuable information often being neglected. Available research reveals a need to support school principals and teachers in both the interpretation and further use of the feedback data (Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009; Zupanc, Urank, & Bren, 2009). A second critical issue – arising from the research review – is the need to evaluate the impact of support initiatives related to the use of SPF (Zupanc, Urank, & Bren, 2009). Indeed, current support initiatives often lack empirical verification. And when evaluation initiatives have been set up, they often focus too much on short-term effects, such as the satisfaction of participants, without considering the effects on the organization (Mathison, 1992; Rossi, Lipsey & Freeman, 2004). The present study aims at testing insights emerging from the current knowledge base against empirical information. Answers are sought to the following research questions:
• How do schools use SPF (in terms of phases in use and types of use)? What are the effects of this use?
• To what extent are variations in SPF use influenced by data literacy competences?
• To what extent does specific SPF support have an impact on the development of SPF competences, actual SPF use and resulting SPF effects?
2. Theoretical framework In the following paragraphs, we first provide a theoretical framework used to investigate the use of SPF. We subsequently address the question of SPF effects. Finally, we focus on factors that are expected to influence the use of school feedback, in particular data literacy competences and SPF usage support. A visual representation of the theoretical framework is given in Figure 1.
Figure 1. Framework for SPF usage in the present study
2.1. The use of SPF: Phases in use and types of use
Research shows that the process of SPF use in schools can be described in a variety of ways (e.g., Schildkamp, 2007; Schildkamp & Kuiper, 2010). The effective use of SPF implies a well-considered sequence of several consecutive phases in a cyclical process (Huffman & Kalnin, 2003; Learning Point Associates, 2004). In the process of school feedback use, Verhaeghe et al. (2010) distinguish between receiving, reading and discussing the SPF as a means to arrive at a correct interpretation. After the school has performed an analysis of its results, the next stage involves putting the information from the SPF to use, comprising a diagnosis that looks for explanations for the school's results. Furthermore, SPF use refers to specific actions or changes in thinking and processes. With reference to available research on evaluation data and SPF use (Schildkamp, 2007; Schildkamp & Teddlie, 2008; Schildkamp, Visscher & Luyten, 2009; Weiss, 1998), this study focuses on the following types of use: instrumental and conceptual use. In the case of conceptual use, we centre on changes in the thinking of the feedback users (e.g., changed thinking about how the pupils perform or how the school functions). In the case of instrumental use, we examine reported
changes in school policies. The way the feedback will be used (types of use) is expected to correlate positively with the investment in the process of SPF use (phases in use).
2.2. Effects of SPF
School performance feedback use will not automatically result in a significant improvement of pupil performance (Fitz-Gibbon & Tymms, 2002; Schildkamp, Visscher & Luyten, 2009). This underlines the importance of examining effects beyond the level of educational performance and giving sufficient attention to process-oriented effects (Schildkamp, 2007; Visscher & Coe, 2002; 2003). The latter are described in terms of professional development of team members, improved educational processes and improvements in school functioning (Zupanc, Urank, & Bren, 2009; Schildkamp & Teddlie, 2008; Visscher & Coe, 2003). However, unintended and undesirable effects can also be observed; for example, reduced motivation among teachers due to extra workload (Fitz-Gibbon & Tymms, 2002; Schildkamp & Teddlie, 2008) or an excessive and narrow focus on testing-towards-the-curriculum (Schildkamp & Teddlie, 2008; Visscher, 2002). In the present study, we map the perceived effects of SPF use on the basis of self-reports of school improvement effects. This approach has been successfully applied in previous studies on data use (Huffman & Kalnin, 2003; Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009).
2.3. Influential factors: Competences and support
In our theoretical model, we distinguish between a variety of variables and processes that influence the actual feedback use and the related effects: (1) variables within the users that define their ability/orientation to adopt SPF use and (2) levels of SPF support.
Data literacy competences
A competence is the ability to take satisfactory action through the integration of knowledge, skills and attitudes. These three elements are operationalized below in the context of school feedback use. An attitude reveals how positively or negatively a person views a particular matter (Petty & Wegener, 1998). A negative attitude towards SPF is – according to Bosker, Branderhorst and Visscher (2007) – one of the main obstacles in the use of feedback information. The attitude is the most
significant aspect that determines a person's willingness to invest time and energy in dealing with information (Williams & Coles, 2007) and the users' belief that they need the data in order to improve education (Schildkamp & Kuiper, 2010). The concept can be operationalized in analogy to self-evaluation research in schools (Meuret & Morlaix, 2003; Vanhoof, Van Petegem, & De Maeyer, 2009). An individual's attitude towards SPF can be situated on a bipolar continuum. A number of examples include: School feedback does/does not lead to better teaching, is favored/not favored by most team members, and so on. The importance of knowledge and skills is evidenced by the impact of data literacy on the SPF use process (Webber & Johnston, 2000). "Data literacy encompasses the strategies, skills and knowledge needed to define information needs, and to locate, evaluate, synthesize, organize, present and/or communicate information as needed" (Williams & Coles, 2007, p. 188). Data literacy is a condition for being able to convert data into valuable and usable information (Earl & Fullan, 2003). The current lack of know-how on making use of the information is an important obstacle (Kerr, Marsh, Ikemoio, Darilek, & Barney, 2006; Saunders, 2000; Van Petegem & Vanhoof, 2004; Williams & Coles, 2007). Next to a lack of the capacities needed to interpret the data, there is often a lack of well-developed research skills, such as the formulation of research questions and hypotheses (Earl & Fullan, 2003; Herman & Gribbons, 2001; Kerr et al. 2006). In this context, we also have to distinguish between the actual mastery of knowledge and skills on the one hand, and the level at which users estimate their skills on the other. The concept of academic self-efficacy – a person's belief that he or she can perform certain academic tasks at certain levels (Bandura, 1977; Schunk, 1991) – is applicable in the context of SPF. In the present study, academic self-efficacy focuses on the extent to which users think they have understood the terms, figures and tables used and the extent to which they believe they are able to find explanations for their results. It is not only important to measure the actual knowledge and skills but also to record the level of perceived self-efficacy, since it significantly determines a person's motivation for action (Bandura, 1977).
SPF usage support
Providing SPF support is essential because it might influence the actual and experienced mastery of the competences of school principals to interpret information relating to their school. A more detailed description of the support levels in this study is given in the section on research methodology. For the evaluation of support effects, we build on
Kirkpatrick's (1998) four levels of evaluation, which can be linked to our broader theoretical framework. Table 1 describes these levels in general terms and in terms of the SPF focus in this study.

Table 1
Kirkpatrick's evaluation levels (1998)

Reaction. Immediate response of the participants after the support. This concerns a general impression of the relevance and possibilities for application.
  Application in this study: This level is not reported because it could logically only be obtained from the experimental group.

Learning. Increase in knowledge and skills and the change in attitudes as a result of the support.
  Application in this study: This level translates as the question of whether the support has contributed to an increase in data literacy competences, specifically in relation to the feedback report used.

Behavior. Application of what has been learned in the organization and behavioral changes.
  Application in this study: In this particular case, it concerns the question of how far schools progress in the phases of SPF use and types of SPF use.

Results. Effects of the support on achieving the organization's aims and on the organization itself.
  Application in this study: In the context of SPF use, this evaluation level is represented in the variable 'perceived effects' of SPF.
Kirkpatrick's underlying premise (1998) is that the attainment of a higher level can only be achieved once a lower level has been realized. This fits the theoretical framework (see Figure 1) since SPF support provisions will only contribute to school improvement effects when underlying SPF competences have been affected.
3. Methodology: Research design, procedure and research instruments
A between-groups field experiment with posttest was set up to investigate the impact of SPF use support. The schools in this study can be classified into two groups: a group with SPF support (experimental group) and a group without SPF support (control group). The design was experimental rather than quasi-experimental (Creswell, 2008; Field & Hole, 2003), given that the schools were randomly assigned to either one of the two
conditions and it was possible to control the independent variable, namely the support intervention. The experiment was set up in the context of a large-scale project, whereby Flemish primary schools annually receive confidential feedback based on the comparison of their school performance results with a reference group. The schools receiving the feedback participate in a longitudinal study, named Schoolloopbanen in het BasisOnderwijs (SiBO), tracking approximately 6000 pupils from a representative sample of Flemish schools (from the start of K3 until the end of grade six and the transition to secondary education). Item Response Theory (IRT)-based techniques are used to construct the test scores, enabling the estimation of growth curves. At the beginning of 2008, about 200 schools received feedback reports containing the results (grade 1 to grade 4) of the investigated pupil cohort. Results were reported in relation to mathematics, reading fluency, reading comprehension, and orthography, supplemented with information about pupil characteristics (child factors, home factors, and Dutch language skills at the start of grade 1). The central concepts in these reports include learning gain, value added, and adjusted scores. These concepts were explained in such a way that no prior statistical knowledge was required. The data were supported with graphical representations (i.e., pie charts, growth curves, and cross tables). The content of the text of each report was standardized. The school principals were required to interpret the results for their school, based on the general information made available. Forty-five of the 188 schools involved in the project, chosen at random, received an invitation to participate in the support. The principals in the experimental support condition participated in a professional development activity (a half-day workshop) with the following aims: (1) being able to describe concepts from the report in their own words; (2) being able to interpret the figures and tables from the SPF report; (3) being able to explain why performance could be worse or better compared to the reference group; and (4) being able to describe which function(s) the SPF report can fulfill in the context of their own school. To this end, school principals met in small groups outside their own school. The feedback designers explained the feedback reports during these meetings and participants were given the opportunity to practice using and evaluating the feedback information interactively. Only 23 of the 45 schools invited actually participated in the experimental condition. Although the study participants were assigned to the various conditions randomly, there is a real risk of selection bias caused by self-selection through working with volunteers (Rossi, Lipsey & Freeman, 2004). This could endanger the internal validity of the study.
Therefore, previously collected data were used to investigate whether this subgroup deviated from the population of schools in the project in relation to relevant school population and functioning criteria. This proved not to be the case. Five months after receiving the SPF – and after the experimental group had participated in the support provision – the school principals of the SiBO schools were asked to fill in a questionnaire. A total of 116 schools completed the questionnaire (response rate 62%). The response rate for the control condition was 60% (n = 99) and that for the experimental condition was 74% (n = 17). The various concepts from the theoretical framework were translated into scales, consisting of specific questionnaire items. Each item presented the principals with a statement they were to judge using a Likert scale. Table 2 presents descriptives of the scale scores for the different constructs; in addition, psychometric details are reported. The reliability analyses show good to very good internal consistency values for all scales (α > .80).

Table 2
Descriptives and reliability of the research instruments

                                     M      SD    Range   N items    α
Influencing factors
  Attitude towards SPF use          3.97   1.08    1-6       7      .91
  Academic self-efficacy            3.81   0.74    1-5       6      .92
SPF use
  Phases in SPF use                 3.81   0.75    1-5       6      .86
  Conceptual SPF use                3.27   0.83    1-5       4      .86
  Instrumental SPF use              2.85   0.97    1-5       3      .90
Effects
  Perceived effects of SPF use      2.92   0.90    1-5       6      .94
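As an illustration of how internal consistency values such as the α coefficients in Table 2 can be obtained, a minimal sketch is given below. The item matrix is hypothetical and serves only to show the standard Cronbach's alpha computation, not the actual questionnaire data.

```python
# Cronbach's alpha for a scale, shown on hypothetical Likert item data.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of Likert scores."""
    k = items.shape[1]
    sum_item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical responses of 5 principals to a 3-item scale (1-5 Likert).
scale = np.array([[4, 5, 4],
                  [2, 2, 3],
                  [5, 5, 5],
                  [3, 4, 3],
                  [1, 2, 2]])
print(round(cronbach_alpha(scale), 2))
```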
A data literacy test was used to measure knowledge and skills in relation to feedback interpretation. The test focuses on the concepts and representations used in the SPF reports and comprises 26 test items mapping out both conceptual knowledge (correct conception of the terms used) and procedural knowledge (skills in reading learning gains and added value from graphs and tables). Both closed (true-false and multiple choice) and open (filling in digit values) questions were used in the test. Test scores were constructed using IRT analysis. A good overall fit was achieved using a two-parameter model (LR = 248.4; SE = 320.0; p = .99) and a good empirical validity of .83 was achieved using 24 retained items. The scores were standardized to enhance their interpretability.
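For reference, the two-parameter model mentioned above is, in its standard logistic form, the probability that respondent j with latent ability θ answers item i correctly, governed by a discrimination parameter a_i and a difficulty (location) parameter b_i. The formula below is the textbook definition, not a reproduction of the exact parameterisation used in the project.

```latex
P(X_{ij} = 1 \mid \theta_j) \;=\; \frac{1}{1 + \exp\bigl(-a_i(\theta_j - b_i)\bigr)}
```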
Considering the nature of the theoretical framework and the research questions, putting forward empirical evidence will require structural equation modeling. Path analyses were therefore used to analyze whether theory-based relationship expectations corresponded with the empirical findings.
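The path analyses reported in the next section were run with dedicated structural equation modelling software, which yields the χ², RMSEA, AGFI and GFI fit indices. As a rough, non-authoritative approximation, a recursive path model can also be estimated as a chain of ordinary regressions on standardised variables; the sketch below illustrates this with hypothetical column names (attitude, self_efficacy, literacy, phases, conceptual, instrumental, effects) and equations that loosely follow the relations described in the text, not the exact paths retained in Figure 2.

```python
# Approximate sketch of a recursive path model as a chain of OLS regressions
# (hypothetical, standardised variables; proper fit indices require SEM software).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("principals.csv")           # hypothetical questionnaire data
df = (df - df.mean()) / df.std()              # standardise so coefficients approximate path weights

paths = {
    "phases":       "phases ~ attitude + self_efficacy + literacy",
    "conceptual":   "conceptual ~ phases + attitude + self_efficacy",
    "instrumental": "instrumental ~ phases",
    "effects":      "effects ~ conceptual + instrumental + attitude",
}
for outcome, formula in paths.items():
    fit = smf.ols(formula, data=df).fit()
    print(outcome, fit.params.round(2).to_dict(), "R2 =", round(fit.rsquared, 2))
```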
4. Results
4.1. Descriptive results
In this section, we focus on the sum scores and individual item scores for the different constructs in the questionnaire. We discuss the variables as ordered in our theoretical model: influencing factors, feedback use, and perceived SPF effects. Finally, the results for the data literacy test are discussed. The sum score for the scale attitude towards SPF reveals that a large majority of the respondents state that SPF use is (to some degree) a valuable activity (M = 3.97, SD = 1.08). The most positive scores (M > 4) were recorded in relation to the statements that SPF stimulates self-evaluation, that much can be learned from SPF and that SPF results in better management and more involvement in school policy. The statement for which the lowest score was recorded (M = 3.46, SD = 1.22) related to school feedback being an enjoyable activity for the majority of team members. In addition to a positive attitude, most of the respondents had a positive self-efficacy score relating to the interpretation of and possible uses of the feedback report (M = 3.81, SD = .74). For example, 80% stated that they understand the terms, figures and tables in the report and can see connections between the terms. Only a minority (between 12 and 18%) disagreed with the statements concerning their ability to clearly grasp the objectives and possibilities for the use of SPF or to describe terms from the report in their own words. As regards the phases in feedback use, only a minority of the principals reported having invested significant time and effort in the interpretation and diagnosis of the SPF, despite the fact that the majority of respondents indicated having an interest in the SPF report (M = 4.37, SD = 0.72). Although 70% of the respondents agreed with the statement that the report had been examined thoroughly (M = 3.84, SD = 0.97), only 43% of
the respondents stated they had sought explanations for the performance of their own schools on the basis of this report (M = 3.30, SD = 1.11). With regard to the types of SPF use, the respondents scored significantly higher (t(114) = 4.64, p < .001) on items pertaining to conceptual use (M = 3.27, SD = .83) than on items relating to instrumental use (M = 2.85, SD = .97). Half of the respondents stated that the SPF had an impact on their perception of pupils' performance and on the school in general (conceptual use), while only 30% of the respondents stated that the report had resulted in specific action (instrumental use). The latter leads to the – not surprising – result that the perceived effect of SPF use is rather low. Only a limited number of respondents reported any effects of the SPF (M = 2.92, SD = .90). Between 30 and 40% stated that the SPF report has contributed to more discussions on how the school functions, to more attention to professional development, to a better functioning of the school principal and to skills improvement in SPF use. Around twenty percent of the respondents indicated that the SPF report has improved the quality of the teaching in their schools. The results of the data literacy test reveal that only 42% of the respondents answered half of the questions correctly. Only 10% of the respondents answered more than three-quarters of the questions correctly. But some school principals (n = 5) succeeded in interpreting all the information from the report correctly. Analysis of the difficulty of the literacy test items points out that most principals experience difficulties in relation to procedural exercises, i.e., reading the learning gains and added value from the graphs and tables. The conceptual questions were apparently less difficult.
4.2. Path model 1: Phases in use, types of use and perceived effects of feedback use
The theoretical framework – presented in Figure 1 – shows the hypothetical direct and indirect relations between the variables in our model: the data literacy competences influence the perceived SPF effects via the phases in use and types of feedback use. In a first analysis approach, the data of all respondents were entered in the model without making a distinction between SPF support conditions.
Figure 2. Results of path model: Use and perceived effects of SPF use
Figure 3. Results of path model: Impact of support
In order to test the mediation hypotheses, the direct effect of all independent variables on the endogenous variable has to be studied (MacKinnon, 2008). This initial model was found to include various statistically non-significant regression lines and co-variations. These had to be removed – stepwise – in order to achieve a parsimonious model. Figure 2 shows the findings of the resulting path model, including standardized path coefficients and percentages of variance explained (X² (df) = 8.1 (8), p = 0.43; RMSEA = 0.01; AGFI = 0.92; GFI = 0.98). This path model can be used to answer the second research question about the extent to which differences in SPF use (phases and types) and perceived SPF effects can be explained by SPF competences. The percentages of variance explained for the variables relating to SPF use (phases and types) are highly relevant. For example, 39% of the variance in the variable 'phases in SPF use' can be explained by the data literacy competences of the respondents. The higher the respondents' estimation of their level of knowledge and skills (self-efficacy) and the more positive their attitude towards SPF, the more they invest in the use of SPF. However, the additional effect of the 'actual' knowledge and skills is limited. The theoretical model also hypothesized that the 'types of SPF use' can only be explained directly by the 'phases in SPF use'. This only holds true in relation to instrumental use (24% of the variance explained). When considering conceptual use, the attitude towards SPF use and self-efficacy are also relevant. Together with the 'phases in SPF use', these two variables explain 43% of the variance in conceptual use. It can also be concluded that a positive correlation (.32) exists between the unexplained variance in the variables instrumental and conceptual use. This possibly indicates – after checking for the other variables in the model – that the amount of instrumental and conceptual use respondents report increases concurrently. The ultimate variable in our model is the perceived effect of SPF use. The path model explains 66% of the variance in this variable. The model suggests that the 'types of SPF use' play an important role. The more intensively respondents report conceptual and instrumental use of the SPF, the higher their perception of the SPF effects. In contrast to our initial model, there seems to be a direct relationship between the attitude towards SPF and the perceived effects of SPF. Therefore, we have to conclude that the influence of certain variables is more direct than the hypothesized mediation suggested.
4.3. Path model 2: Differential impact of support on data literacy competences, feedback use and perceived effects
Building on the previous model, a subsequent path analysis was carried out to test whether the SPF support condition results in significantly higher scores than the control group as regards data literacy competences, feedback use and perceived effects. A dummy variable was added to the model referring to the experimental (1) and control (0) condition. Figure 3 displays the results of the path model with support, using the standardized path coefficients and the percentages of variance explained in the endogenous variables (X² (df) = 11.3 (13), p = 0.58; RMSEA = 0.01; AGFI = 0.92; GFI = 0.97). The path model immediately reveals that there is no significant direct impact of support on the 'phases in SPF use', the 'types of SPF use' and the 'perceived effects of SPF use'. This is consistent with the a priori theoretical framework. Nevertheless, it has to be stressed that the proportion of variance explained in the competence-related variable and the self-efficacy variable is limited. The results also do not confirm the expectation that SPF support contributes to a more positive attitude. Yet support does have a statistically significant effect on the mastery of knowledge and skills: 11% of the variance in the test scores can be attributed to whether or not respondents received support. This impact remains limited as far as self-efficacy is concerned. Only 2% of the variance in this variable can be attributed to the experimental condition.
5. Conclusion and discussion
In the present study, we focused on the question of how schools use school performance feedback, and what the perceived effects of SPF use are. At a general level, the respondents reported a rather low level of perceived impact of SPF. Nonetheless, the majority of respondents stated that they had thoroughly read and examined the SPF report. However, considerably fewer respondents invested substantial time and/or effort in interpreting the results and seeking explanations for the results of their own schools. In line with our theoretical framework, differences in the 'phases in SPF use' translate into differences in the 'types of SPF use'. There is a considerably higher occurrence of conceptual use than of instrumental use. This can be explained by the fact that a conceptual use (control and plan-oriented) precedes an instrumental use (goal-oriented) in a school policy cycle (cf. the Plan-Do-Check-Act cycle, Deming, 1986). Research has already
revealed that many schools experience difficulties in using the findings of a control stage in subsequent steps of quality control (Schildkamp 2007; Verhaeghe et al., 2010). The results are also helpful to demonstrate that differences in SPF use correspond with differences in SPF competences. This study confirms hypotheses related to the second research question. With regard to the attitude towards SPF, we found that its impact is not only mediated by the 'phases in SPF use', but that a direct relationship also exists with the 'types of SPF use' and 'perceived SPF effects'. Another relevant finding is that the 'phases in SPF use' are related more closely to the perceived mastery of knowledge and skills (academic self-efficacy) than to the actual mastery as measured with the data literacy test. We learn from this that faith in one's own knowledge and skills is very important in making the transition to action (Bandura, 1977). Obviously, it should also be noted that the actual mastery of the knowledge and skills is still relevant. School policies should be developed on the basis of correct information (Devos & Verhoeven, 2003). A key research question in this study related to the differential impact of an SPF support provision. Building on the theoretical framework, this implies that we expect SPF support initiatives to affect – in a direct way – the SPF-related competences (attitudes, data literacy and self-efficacy). No direct impact was expected on the 'phases in SPF use', related 'types of SPF use' and 'perceived SPF effects'. The results by and large confirm our theoretical assumptions. Principals that participated in the SPF support condition attained higher data literacy test scores and reported higher self-efficacy levels. This consequently affected their process of feedback use (phases in use). The expected indirect effect of SPF support is in line with Kirkpatrick's model (1998), implying that a higher level can only be achieved if lower levels have been attained. Contrary to our expectations, participation in the SPF support condition had no significant effect on the attitude towards SPF. We have to stress that this attitude remains a crucial factor. Future SPF support could focus to a larger extent on the fundamental basis and motives to implement SPF and on facilitating successful experiences with SPF. Furthermore, SPF support initiatives that offer opportunities for discussion and exchange of experiences – within and between schools – must be considered (Huffman & Kalnin, 2003; Lachat & Smith, 2005; Wayman, Midgley, & Stringfield, 2007). Some authors stress that such discussions and exchanges are crucial to see benefits of SPF in terms of school improvement (Zupanc, Urank, & Bren, 2009). Another interesting finding is the larger impact of the SPF support initiative on data literacy test scores as compared to the impact on
academic self-efficacy. An initial explanation for this finding relates to the limited scope of the support initiative. This was a single-shot activity that focused on the development of interpretation skills, and the SPF support seems to have succeeded in that respect. A second explanation can be that the SPF support has raised awareness among the participants about the complexity of school feedback. This can explain why the SPF support results mainly in higher literacy test scores and to a lesser extent in an increased level of self-efficacy. A third explanation is that the participants learned hardly anything from the support intervention. It is possible that they report the same level of self-efficacy but attain higher literacy test scores as a result of extra effort. Future research about the impact of SPF support could adopt a longitudinal approach with a more elaborate pre- and post-intervention measurement. This would make it possible to take into account the specific support needs of respondents. These differences in need could also be linked to the selection of SPF training participants. Moreover, the – delayed – effect of SPF usage on student achievement could be studied. Such effects could only be expected after several SPF reports and persistent efforts for effective SPF use. In addition, it would be worthwhile to set up research related to more intensive support initiatives that go beyond single-shot SPF support provisions. At a theoretical level, a cross-validation of the path model developed in the present study could be emphasized. In the present study this was not possible because a sample size of 100 respondents is required (Hoyle, 1995). Furthermore, the low data literacy test scores and their relationship with the 'phases in SPF use' call for a methodological comment. The data literacy test score was the single variable in the model not based on perceptions of the respondents. The strong interrelations between perception-based variables in the present study are thought-provoking. They introduce the need for research that links the 'perceived' to the 'expected' and in particular to the 'actual' use of SPF. Finally, next to a focus on the competences and perceptions of principals, future research could also shift attention to other critical actors in the discussion about educational quality: inspection authorities, teachers, school teams, etc. We finish by repeating that we observed a strong interest in SPF and a positive attitude towards SPF among the Flemish school principals in our study. This is in sharp contrast to their limited usage of the school performance feedback information and the related effects on educational practices and results. The study therefore shows in a particular way the need to develop critical and conditional competences related to SPF use. This is interesting from both a theoretical and practical point of view, since many support initiatives are being set up by feedback providers (e.g.
helpdesks, after-school information sessions, information sessions at school, and so on) without evaluating their direct and indirect effects.
References Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191-215. Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the utilisation of management information systems in secondary schools. School Effectiveness and School Improvement, 18(4), 451-467. Coe, R. (2002). Evidence on the role and impact of performance feedback in schools. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger. Creswell, J.W. (2008). Educational Research: Planning, conducting, and evaluating quantitative and qualitative research (3rd ed.) Upper Saddle River, NJ: Pearson Prentice Hall. Deming, W.E. (1986). Out of the crisis. Cambridge: Massachusetts Institute of Technology,Center for Advanced Engineering Study. Devos, G., & Verhoeven, J. (2003). School self-evaluation - Conditions and caveats: The case of secondary schools. Educational Management & Administration, 31(1), 403-420. Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394. Field, A.P., & Hole, G. (2003). How to design and report experiments. Thousand Oaks, CA: Sage. Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in indicator systems: Doing things right and doing wrong things. Education Policy Analysis Archives, 10(6), 1-28. Retrieved from http://epaa.asu.edu/ojs/article/viewFile/285/411 Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support school inquiry and continuous improvement: Final report to the Stuart Foundation. Los Angeles: University of Carolina, Center for the Study of Evaluation. Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School selfevaluation and student achievement. School Effectiveness and School Improvement, 20(1), 47-68. Hoyle, R.H. (Ed.). (1995). Structural Equation Modeling: Concepts, issues and applications. Thousand Oaks, CA: Sage. Huffman, D. & Kalnin, J. (2003). Collaborative inquiry to make data-based decisions in schools. Teaching and Teacher Education, 19, 569-580. 130
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006). Strategies to promote data use for instructional improvement: Actions, outcomes, and lessons from three urban districts. American Journal of Education, 112, 496-520.
Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler.
Lachat, M.A., & Smith, S. (2005). Practices that support data use in urban high schools. Journal of Education for Students Placed at Risk, 10(3), 333-349.
Learning Point Associates. (2004). Guide to using data in school improvement efforts: A compilation of knowledge from data retreats and data use at Learning Point Associates. Retrieved from http://www.learningpt.org/pdfs/datause/guidebook.pdf
MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. New York: Lawrence Erlbaum Associates.
Mathison, S. (1992). An evaluation model for inservice teacher education. Evaluation and Program Planning, 15, 255-261.
Meuret, D., & Morlaix, S. (2003). Conditions of success of a school's self-evaluation: Some lessons of a European experience. School Effectiveness and School Improvement, 14(1), 53-71.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3-16). Oxford, UK: Elsevier Science.
Petty, R.E., & Wegener, D.T. (1998). Attitude change: Multiple roles for persuasion variables. In D. Gilbert, S. Fiske & G. Lindzey (Eds.), The handbook of social psychology (pp. 323-390). New York: McGraw-Hill.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic approach. Thousand Oaks, CA: Sage.
Saunders, L. (2000). Understanding schools' use of 'value added' data: The psychology and sociology of numbers. Research Papers in Education, 15(3), 241-258.
Saunders, L., & Rudd, P. (1999, September). Schools' use of 'value added' data: A science in the service of an art? Paper presented at the British Educational Research Association Conference, Brighton, University of Sussex.
Schildkamp, K. (2007). The utilisation of a self-evaluation instrument for primary education. Unpublished doctoral dissertation, University of Twente, Enschede, The Netherlands.
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform: Which data, what purposes, and promoting and hindering factors. Teaching and Teacher Education, 26(3), 482-496.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems in the USA and in the Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school self-evaluation instrument. School Effectiveness and School Improvement, 20(1), 69-88.
Schunk, D.H. (1991). Self-efficacy and academic motivation. Educational Psychologist, 26(3&4), 207-231.
Tymms, P. (1995). Influencing educational practice through performance indicators. School Effectiveness and School Improvement, 6(2), 123-145.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-indicatoren als strategisch instrument voor schoolontwikkeling [Feedback on school performance indicators as strategic instrument for school improvement]. Pedagogische Studiën, 81, 338-353.
Vanhoof, J., Van Petegem, P., & De Maeyer, S. (2009). Attitude towards school self-evaluation. Studies in Educational Evaluation, 35, 21-28.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167-188.
Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 41-71). Lisse, The Netherlands: Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349.
Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.
Wayman, J.C., Midgley, S., & Stringfield, S. (2007). Leadership for data-based decision making: Collaborative educator teams. In A. B. Danzig, K. M. Borman, B. A. Jones & W. F. Wright (Eds.), Learner-centered leadership: Research, policy and practice (pp. 189-205). New Jersey, USA: Lawrence Erlbaum Associates.
Webber, S., & Johnston, B. (2000). Conceptions of information literacy: New perspectives and implications. Journal of Information Science, 26(6), 381-397.
Weiss, C.H. (1998). Have we learned anything new about the use of evaluation? American Journal of Evaluation, 19(1), 21-33.
Williams, D., & Coles, L. (2007). Teachers' approaches to finding and using research evidence: An information literacy perspective. Educational Research, 49(2), 185-206.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for effectiveness and improvement in classrooms and schools in upper secondary education in Slovenia: Assessment of/for Learning Analytic Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 6 EFFECTEN VAN ONDERSTEUNING BIJ SCHOOLFEEDBACKGEBRUIK
CHAPTER 6: EFFECTEN VAN ONDERSTEUNING BIJ SCHOOLFEEDBACKGEBRUIK∗
Abstract
Effects of support for school performance feedback use
School development through systematic data use requires schools to be provided with information-rich environments. However, providing school performance feedback does not guarantee successful use. Limited data literacy competences of the users are one of the main stumbling blocks. Support initiatives were developed and evaluated to overcome this shortcoming. In a randomized field study, the effects of two experimental conditions related to inservice and onservice education and training (INSET and ONSET) are compared against a control group. This study examines the relationship between data literacy competences, support provisions for data interpretation, actual use of the feedback, and school improvement effects. The research was based on in-depth interviews with 18 primary school principals. The results of a case ordered predictor-outcome meta-matrix reveal not only difficulties in handling the information but also incongruences in attitude towards feedback use between school principals and teachers. The ONSET condition led to the best results, supporting a tailored approach to support.
∗ Gebaseerd op Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van ondersteuning bij schoolfeedbackgebruik. Manuscript submitted for publication in Pedagogische Studiën.
Samenvatting
Gezien hun maatschappelijke rol wordt van scholen verwacht dat hun benadering van schoolontwikkeling op een systematische manier gebeurt. Daarom worden ze ondermeer aangezet om het interne kwaliteitszorgbeleid te baseren op concrete data. Schoolfeedbackinitiatieven zijn een mogelijke bron van dergelijke data. Het gebruik van deze schoolfeedback blijkt echter niet vanzelfsprekend, ondermeer door een gebrek aan datageletterdheidscompetenties. Om op deze behoefte in te spelen worden verscheidene ondersteuningsactiviteiten opgezet, die ofwel binnen (ONSET) ofwel buiten de school worden georganiseerd (INSET). In deze studie worden de resultaten gerapporteerd van een evaluatieonderzoek waarbij naast een INSET- en een ONSET-ondersteuningsopzet ook feedbackgebruikers in een controleconditie worden betrokken. Bijzondere aandacht wordt daarbij besteed aan het beïnvloeden van datageletterdheidscompetenties en het evalueren van effecten op vier niveaus. Onderzoeksgegevens werden verzameld via diepte-interviews met 18 schoolleiders uit de drie condities en werden verwerkt in een case ordered predictor-outcome meta-matrix. De resultaten tonen niet alleen een gebrek aan kennis en vaardigheden om met schoolfeedback om te gaan, maar ook een verdeelde houding tussen schoolleiders en leerkrachten. Verder blijkt de ONSET-conditie tot de beste resultaten te leiden, wat impliceert dat ondersteuning in functie van feedbackgebruik het best op maat van de school wordt aangeboden.
1. Probleemstelling
Van scholen wordt in groeiende mate verwacht dat ze van schoolontwikkeling een systematisch proces maken (Nevo, 2002; Leithwood & Aiken, 1995). Om hen daarbij te assisteren wordt gestreefd naar het creëren van informatierijke omgevingen. Zo worden scholen ondermeer voorzien van feedback over hun functioneren en prestaties door speciaal daartoe opgezette schoolfeedbacksystemen. Dit gebeurt met de verwachting dat scholen deze feedback aanwenden in het kader van zelfevaluatie (Visscher & Coe, 2002, 2003). Het gebruik van dergelijke informatiebronnen als een beleidsinstrument blijkt echter niet vanzelfsprekend. Doorgaans blijven het gebruik en de schoolverbeteringseffecten gelimiteerd (Coe, 2002; Saunders & Rudd, 1999; Tymms, 1995; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2004; Zupanc, Urank, & Bren, 2009). Schoolfeedback ontvangen
blijkt een noodzakelijke maar geen voldoende stap. Zowel de scholen als de feedbacksystemen moeten immers aan bepaalde voorwaarden voldoen (Visscher & Coe, 2003; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). Eén van de belangrijkste hinderpalen die een effectief gegevensgebruik in de weg staat, is het ontbreken van datageletterdheid bij de gebruikers (Earl & Fullan, 2003). Niet verwonderlijk zijn dan ook de onderzoeksbevindingen waarbij schoolleiders en leerkrachten aangeven behoefte te hebben aan bijkomende ondersteuning bij zowel het interpreteren als het verder gebruik van de data (Schildkamp & Teddlie, 2008; Schildkamp et al., 2009; Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc et al., 2009).
2. Conceptueel kader
2.1. Fasen in en types van schoolfeedbackgebruik
Schoolfeedbackgebruik kan op twee manieren omschreven worden. Enerzijds kan verwezen worden naar de verschillende stappen die feedbackgebruikers ondernemen om met de data aan de slag te gaan. Onderzoek leert dat om gebruik te maken van schoolfeedback het doordacht doorlopen van een cyclisch proces aangewezen is (Huffman & Kalnin, 2003; Learning Point Associates, 2004; Verhaeghe et al., 2010). Daarbij worden het ontvangen, lezen en bediscussiëren van de schoolfeedback onderscheiden, om tot een correcte interpretatie te komen. Nadat de school een sterkte-zwakteanalyse van haar resultaten heeft gemaakt, volgt een fase waarin met de schoolfeedback aan de slag wordt gegaan. Deze omvat het diagnosticeren door het zoeken naar verklaringen voor de resultaten en het plannen, uitvoeren en evalueren van acties. Omwille van een gebrek aan datageletterdheid en tijdsinvestering blijken scholen deze stappen niet of moeizaam te doorlopen (Earl & Fullan, 2003; Verhaeghe et al., 2010). Daarnaast kan bij het gebruiken van data binnen scholen verwezen worden naar verschillende types van gebruik. Gebaseerd op de indeling volgens Rossi, Lipsey en Freeman (2004) kan een onderscheid gemaakt worden in verschillende soorten gebruik van evaluatiegegevens, eveneens toegepast in de context van schoolfeedbackgebruik (Schildkamp et al., 2009; Verhaeghe et al., 2010; Weiss, 1998). Scholen kunnen bijvoorbeeld acties ondernemen (instrumenteel gebruik), aan het denken gaan (conceptueel gebruik), bevestiging zoeken van bestaande standpunten (symbolisch gebruik), het rapport in een verantwoordingscontext hanteren
(strategisch gebruik) of het rapport gebruiken om teamleden te stimuleren of te motiveren (motiverend gebruik).
2.2. Effecten van schoolfeedbackgebruik
Het ultieme doel van schoolfeedbackgebruik is bij te dragen aan schoolontwikkeling (Visscher & Coe, 2002, 2003). Echter, schoolfeedbackgebruik blijkt niet steeds te resulteren in significant verbeterde leerlingprestaties (Fitz-Gibbon & Tymms, 2002; Schildkamp et al., 2009; Visscher, 2002). Bij het nagaan van schoolverbeteringseffecten dient dan ook ruimer gekeken te worden naar ondermeer effecten op de professionele ontwikkeling van teamleden (zoals een toenemende mate van assessment literacy; Zupanc et al., 2009), verbeterde onderwijsprocessen (zoals het intensifiëren van leerlingenbegeleiding, Schildkamp & Teddlie, 2008) en een verbeterd schoolfunctioneren (zoals het versterken van de cohesie in de school, Visscher & Coe, 2003). Ook onbedoelde en onwenselijke effecten kunnen zich voordoen zoals demotivatie bij leerkrachten door werkoverlast (Fitz-Gibbon & Tymms, 2002) of een te sterke focus op getoetste leerinhouden (Schildkamp & Teddlie, 2008; Visscher, 2002).
2.3. Beïnvloedende factoren
Verschillen in schoolfeedbackgebruik en de effecten ervan kunnen toegeschreven worden aan een viertal clusters van factoren die refereren naar kenmerken van gebruikers, feedbacksystemen, ondersteuning en de educatieve context (Verhaeghe et al., 2010; Visscher & Coe, 2003). Gezien de gebrekkige datageletterdheidscompetenties van de gebruikers en de urgente vraag naar onderzoek over ondersteuning hierbij spitsen we ons in deze studie op deze twee factoren toe.
Competenties bij schoolfeedbackgebruik
Het begrip competentie verwijst naar de integratie van de kennis, vaardigheden en attitudes die nodig zijn om adequaat te handelen in specifieke situaties (Gonczi, 1994). Uit de onderzoeksliteratuur blijkt dat de mate van informatiegeletterdheid (Webber & Johnston, 2000) een grote rol speelt bij schoolfeedbackgebruik. Deze algemene term omvat de strategieën, vaardigheden en kennis die nodig zijn om informatienoden te bepalen en om de nodige informatie te verzamelen en te verwerken (Williams & Coles, 2007, p. 188). Toegepast op het domein van datagebruik binnen
de school spreekt men van datageletterdheid. Datageletterd zijn is een noodzakelijke voorwaarde om data te kunnen omzetten in bruikbare informatie (Earl & Fullan, 2003). Echter, de beperkte kennis om met de gegevens aan de slag te gaan en de daarmee gepaard gaande onzekerheid vormen vaak een obstakel (Earl & Fullan, 2003; Kerr et al., 2006; Saunders, 2000; Verhaeghe et al., 2010). Er zou niet alleen een gebrek aan capaciteiten zijn om de data te interpreteren; ook onderzoeksvaardigheden zoals het formuleren van onderzoeksvragen en hypothesen zijn doorgaans niet sterk ontwikkeld (Earl & Fullan, 2003; Herman & Gribbons, 2001; Kerr et al., 2006). Het concept datageletterdheidscompetenties vraagt eveneens aandacht voor de houding ten aanzien van schoolfeedback. Een negatieve houding ten aanzien van schoolfeedback wordt door Bosker, Branderhorst en Visscher (2007) als één van de voornaamste hinderpalen voor het gebruik van feedbackinformatie naar voren geschoven. Het gaat dan bijvoorbeeld om het geloof van de gebruikers dat ze data nodig hebben om hun onderwijs te verbeteren (Schildkamp & Kuiper, 2010). De houding van gebruikers ten opzichte van datagebruik bepaalt dan ook grotendeels in hoeverre men bereid is om tijd en inspanningen te investeren in het gebruik van de informatie (Williams & Coles, 2007).
Ondersteuning van scholen bij schoolfeedbackgebruik
Gezien de gebrekkige datageletterdheidscompetenties zijn schoolfeedbackgebruikers vragende partij voor het ter beschikking stellen van ondersteuning bij zowel de data-interpretatie als het verder gebruik van de gegevens (bv. Schildkamp & Teddlie, 2008; Verhaeghe et al., 2010; Visscher & Coe, 2003). Deze ondersteuning kan geboden worden door zowel externe ondersteuners – bijvoorbeeld educatieve diensten en feedbackleveranciers – als door schoolteamleden intern in de school. Voor het indelen van externe ondersteuningsinitiatieven kan Gardners (1995) continuüm voor nascholingsinitiatieven gebruikt worden. Aan de uitersten situeren zich initiatieven die buiten de school (Inservice Education and Training - INSET) en binnen de school plaatsvinden (Onservice Education and Training - ONSET). Een voordeel van INSET-bijeenkomsten - waarbij deelnemers uit verschillende scholen buiten de eigen school samengebracht worden - is dat men door sociale interactie formeel en informeel van elkaar kan leren (Mathison, 1992). Doordat doorgaans slechts één afgevaardigde per school deelneemt, kan echter een beperktere impact verwacht worden dan bij ONSET-initiatieven waarbij meerdere leden van het schoolteam kunnen betrokken worden. Toch is er het vertrouwen dat
schoolleiders als katalysator de geleerde inzichten kunnen doorgeven aan het schoolteam (Kerr et al., 2006). Verscheidene studies tonen dan ook aan dat de meest succesvolle leiders in datagebruik wel voortrekker zijn maar dan via gedistribueerd leiderschap de taken voor datagebruik delen (Wayman, Midgley, & Stringfield, 2007). ONSET-initiatieven zouden meer kosteneffectief zijn dan inservice training doordat de training doorgaat binnen de school met de eigen data en eigen problemen als uitgangspunt. Bijgevolg is de kans groter dat de veranderingen aanvaard worden door de sterkere betrokkenheid en praktijkband (Gardner, 1995; Murnane, Sharkey, & Boudett, 2005). Wanneer daarenboven verschillende schoolteamleden aanwezig zijn, kan dit aanzetten tot meer intern overleg en verdere opvolging. Op die manier kan feedbackgebruik evolueren van een individuele aangelegenheid naar een gedeelde verantwoordelijkheid, al dan niet onder de vorm van collaborative data teams (Huffman & Kalnin, 2003; Lachat & Smith, 2005; Wayman et al., 2007). De rol van de schoolleider is ook bij deze evolutie van groot belang door ondermeer het creëren van een duidelijke visie en verwachtingen rond datagebruik (Young, 2006) en het coachen van de datateams (Lachat & Smith, 2005).
2.4. Evaluatiemodel voor ondersteuningsinitiatieven – Onderzoeksvragen
Om de mogelijke effecten van ondersteuning bij schoolfeedbackgebruik te inventariseren en te integreren in het ruimer conceptueel kader doen we een beroep op de vier opeenvolgende evaluatieniveaus voor professionaliseringsactiviteiten van Kirkpatrick (1998). Vooreerst worden de reacties van de deelnemers gemeten, onmiddellijk na de ondersteuning. Het gaat om een algemene indruk en de relevantie en toepassingsmogelijkheden. Al te vaak blijft de evaluatie van ondersteuning beperkt tot dit niveau, terwijl de impact op de organisatie niet wordt nagegaan (Mathison, 1992; Rossi et al., 2004). Vervolgens wordt de impact op het leren van de deelnemers bekeken, of de toename aan kennis en bekwaamheden en de verandering in attitudes als gevolg van de ondersteuning. Ten derde wordt nagegaan of er een transfer is van wat er geleerd werd tijdens de ondersteuning naar de organisatie en of er gedragsveranderingen plaatsvinden. Tenslotte worden eventuele schoolverbeteringseffecten nagegaan in het resultaatsniveau. Daarbij wordt gekeken naar effecten van de ondersteuning op het bereiken van de doelen van de organisatie en op de organisatie zelf. Dit model kan toegepast worden om de impact van ondersteuningsinitiatieven bij schoolfeedbackgebruik te evalueren. In Tabel 1 wordt dit nader toegelicht. De centrale onderzoeksvraag daarbij is in
welke mate verschillen in schoolfeedbackgebruik verklaard kunnen worden door ondersteuningsinitiatieven bij schoolfeedbackgebruik.
Tabel 1
Invloed van ondersteuning op schoolfeedbackgebruik volgens het model van Kirkpatrick
- Reactie (tevredenheid van de deelnemers): tevredenheid van deelnemers over ondersteuning bij schoolfeedbackgebruik
- Leren (toename en/of verandering in kennis, vaardigheden en attitudes): verandering in datageletterdheidscompetenties, d.w.z. de kennis, vaardigheden en attitudes nodig voor succesvol schoolfeedbackgebruik
- Gedrag (transfer van geleerde inzichten naar de organisatie): invloed op schoolfeedbackgebruik, zowel fasen in gebruik als types van gebruik
- Resultaten (effecten op de organisatie): invloed op schoolverbeteringseffecten door schoolfeedbackgebruik
In deze bijdrage trachten we een antwoord te geven op de vraag naar de impact van een ondersteuningsinitiatief bij schoolfeedbackgebruik door gebruik te maken van het model van Kirkpatrick. Daarbij worden de volgende onderzoeksvragen gesteld:
1. Welke impact heeft INSET- en ONSET-ondersteuning bij schoolfeedbackgebruik op de tevredenheid van schoolfeedbackgebruikers (Reactie)?
2. Welke impact heeft INSET- en ONSET-ondersteuning bij schoolfeedbackgebruik op de datageletterdheidscompetenties van schoolfeedbackgebruikers (Leren)? Zoals eerder beschreven bekijken we hier de mogelijke invloed van ondersteuning op de kennis, vaardigheden en attitudes die gebruikers nodig hebben voor succesvol schoolfeedbackgebruik.
3. Welke impact heeft INSET- en ONSET-ondersteuning bij schoolfeedbackgebruik op het gebruik van deze feedback binnen de school (Gedrag)? Kirkpatricks model impliceert dat het realiseren van een hoger impactniveau maar kan als een lager niveau gerealiseerd is. Indien ondersteuning gericht is op het beïnvloeden van datageletterdheidscompetenties, zal er eerst een impact zijn op de kennis, vaardigheden en attitudes van de deelnemers. Vervolgens zullen deze veranderde competenties bijdragen aan succesvol
schoolfeedbackgebruik, dat in deze studie bepaald wordt in termen van zowel ondernomen stappen als soorten van feedbackgebruik.
4. Welke impact heeft INSET- en ONSET-ondersteuning bij schoolfeedbackgebruik op de schoolverbeteringseffecten door feedbackgebruik (Resultaten)? We verwachten hierbij pas van schoolverbeteringseffecten te spreken indien succesvol feedbackgebruik ze voorafgaat.
3. Methode
Voor het beantwoorden van de onderzoeksvragen werd gekozen voor een veldexperiment met een posttest. De onderzoekspopulatie (N = 195 scholen) werd random toegewezen aan de verschillende condities.
3.1. Onderzoekscondities
Vertrekkende van het continuüm van inservice en onservice training (Gardner, 1995) werd gekozen om twee ondersteuningsvarianten te ontwerpen en uit te testen, afgezet tegenover een controlegroep die geen bijkomende ondersteuning ontving (n = 150). De eerste experimentele conditie noemen we de INSET-conditie omdat de training niet doorging op de werkplek van de deelnemers en de leerinhouden gebaseerd waren op een fictief schoolvoorbeeld in plaats van de eigen schoolresultaten. Daarnaast onderscheiden we een ONSET-conditie aangezien zowel de plaats van de training als de aangeboden leerinhouden dicht bij de schooleigen context stonden. De kenmerken van beide condities worden toegelicht in Tabel 2. De leerdoelstellingen voor de twee experimentele ondersteuningscondities waren identiek. Deelnemers werden na afloop van de ondersteuning geacht in staat te zijn (1) in eigen woorden de centrale begrippen uit het schoolfeedbackrapport te omschrijven; (2) de figuren en de tabellen uit het schoolfeedbackrapport correct te interpreteren; (3) verklaringen aan te geven waarom prestaties minder goed of beter kunnen zijn dan die van de referentiegroep en (4) voor de eigen schoolcontext te omschrijven welke functie(s) het schoolfeedbackrapport kan vervullen. Deze leerdoelen richtten zich vooral op het tweede niveau van Kirkpatrick (1998), waarin de beïnvloeding van kennis, vaardigheden en attitudes werd beoogd. Daarnaast trachtte de ondersteuning ook indirect het schoolfeedbackgebruik te beïnvloeden (Gedrag) door feedbackgebruikers wegwijs te maken in de verschillende stappen voor systematisch feedbackgebruik.
Tabel 2
Beschrijving van INSET- en ONSET-conditie
Ondersteuners
- INSET: twee medewerkers van het Schoolfeedbackproject
- ONSET: één van de twee medewerkers van het Schoolfeedbackproject uit de INSET-ondersteuning
Opzet
- INSET: studievoormiddag
- ONSET: schoolbezoek
Doelgroep
- INSET: meest betrokken persoon op school bij gebruik van het schoolfeedbackrapport (keuze aan de school overgelaten)
- ONSET: bij voorkeur de schoolleider, de zorgcoördinator en twee leerkrachten (uiteindelijke keuze aan school overgelaten)
Deelnemers
- INSET: 23 deelnemers uit 23 scholen (10 in sessie 1 en 13 in sessie 2); 20 schoolleiders en 3 zorgcoördinatoren
- ONSET: 13 deelnemers uit 7 scholen; 6 x schoolleider met zorgcoördinator, 1 x schoolleider, zorgcoördinator en leerkracht
Planning
- INSET: ruim een maand na het ontvangen van het feedbackrapport
- ONSET: idem als INSET
Locatie
- INSET: universiteitsgebouw
- ONSET: eigen school
Inhoud
- INSET: aan de hand van een feedbackrapport met fictieve school; uitleg bij de concepten en representatievormen uit het feedbackrapport; leergesprek over de gebruiksmogelijkheden van de feedback; toelichting over het onderliggende schoolfeedbacksysteem; inoefen- en evaluatiemoment
- ONSET: idem als INSET, maar aanvullend werd steeds een terugkoppeling gemaakt naar het eigen schoolfeedbackrapport
Werkvorm
- INSET: een variatie van didactische werkvormen, van instructiegerichte presentaties tot vraaggesprekken en groepsdiscussies
- ONSET: idem als INSET, maar enkel met eigen schoolteamleden
3.2. Selectie interviewrespondenten
Deze studie maakt deel uit van het Schoolfeedbackproject genaamd “Each school its own mirror” (Verhaeghe & Van Damme, 2006). In het kader daarvan ontvingen 195 Vlaamse scholen in het voorjaar van 2008 feedback op vertrouwelijke basis, waarbij hun schoolresultaten vergeleken werden met een representatieve referentiegroep uit het SiBO-onderzoek (Maes, Van Petegem, & Van Damme, 2005). Het ging om gegevens van een cohorte leerlingen die (tot dan toe) van het einde van het kleuteronderwijs tot en met het vierde leerjaar opgevolgd werden voor wiskunde en taal (spelling, technisch en begrijpend lezen), aangevuld met informatie over instroomkenmerken van leerlingen. De centrale concepten in het feedbackrapport (leerwinst, toegevoegde waarde en gecorrigeerde scores) werden zodanig uitgelegd dat de noodzaak van statistische voorkennis zoveel mogelijk opgevangen werd. De feedbackdata werden ondersteund door grafische voorstellingswijzen (cirkeldiagram, groeicurven en kruistabellen). De tekst was gestandaardiseerd. Bijgevolg werd van schoolteamleden verwacht om zelf de schooleigen data te interpreteren aan de hand van de algemene uitleg. Uit deze groep scholen werden door toevalstrekking 45 basisscholen uitgenodigd om aan de ondersteuning deel te nemen. Daarvan namen er 23 deel aan de INSET- en 7 aan de ONSET-conditie (zie Figuur 1).
Figuur 1. Overzicht steekproeftrekking: van de 195 basisscholen uit het Schoolfeedbackproject vroegen er 45 ondersteuning (waarvan 23 deelnamen aan de INSET- en 7 aan de ONSET-conditie) en vroegen er 150 geen ondersteuning (controleconditie); uit iedere conditie werden 6 interviews afgenomen.
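Ter illustratie van de beschreven toevalstrekking volgt hieronder een minimale schets in Python. De school-ID's, de verdeling over de condities en de naamgeving zijn hypothetische aannames voor dit voorbeeld en geen weergave van de feitelijke projectgegevens of -procedure.

import random

random.seed(0)  # louter voor reproduceerbaarheid van dit fictieve voorbeeld

# Hypothetische populatie van 195 scholen (fictieve ID's).
populatie = [f"school_{i:03d}" for i in range(1, 196)]

# Stap 1: door toevalstrekking 45 scholen selecteren die voor ondersteuning worden uitgenodigd.
uitgenodigd = random.sample(populatie, 45)
controle = [school for school in populatie if school not in uitgenodigd]

# Stap 2 (vereenvoudigde aanname): 23 INSET- en 7 ONSET-deelnemers uit de uitgenodigde scholen.
inset_scholen = uitgenodigd[:23]
onset_scholen = uitgenodigd[23:30]

# Stap 3: per conditie zes respondenten trekken voor de diepte-interviews.
interviews = {
    "INSET": random.sample(inset_scholen, 6),
    "ONSET": random.sample(onset_scholen, 6),
    "controle": random.sample(controle, 6),
}
print({conditie: len(respondenten) for conditie, respondenten in interviews.items()})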
Hoewel de toewijzing van de proefpersonen aan de verschillende condities random verliep, blijft een risico op selectievertekening mogelijk. Omdat dit een bedreiging kan vormen voor de interne validiteit van het experiment, werd op basis van eerder verzamelde gegevens onderzocht of deze subgroep op relevante criteria afweek van de populatie (N = 195). Uit deze analyses bleek dat de geselecteerde scholen niet statistisch significant verschilden op vlak van de houding ten aanzien van
schoolfeedback, het verwachte gebruik van de schoolfeedback, de perceptie van relevantie van de schoolfeedback, de instroomkenmerken van leerlingen en de schoolprestaties uit de feedbackrapporten. Daarna werden door toevalstrekking uit iedere conditie zes respondenten geselecteerd voor deelname aan de interviews.
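Een dergelijke controle op selectievertekening kan bijvoorbeeld gebeuren door de geselecteerde scholen per criterium te vergelijken met de overige scholen. Onderstaande schets in Python is louter illustratief: de datastructuur, de kolomnamen en de keuze voor een Welch-t-toets zijn aannames en geen weergave van de feitelijk uitgevoerde analyses.

import pandas as pd
from scipy import stats

def vergelijk_geselecteerde_scholen(data, criteria):
    """Vergelijk geselecteerde scholen (kolom 'geselecteerd' == 1) met de overige
    scholen per criterium via een Welch-t-toets; geeft per criterium t- en p-waarde."""
    resultaten = []
    for criterium in criteria:
        geselecteerd = data.loc[data["geselecteerd"] == 1, criterium].dropna()
        overige = data.loc[data["geselecteerd"] == 0, criterium].dropna()
        t, p = stats.ttest_ind(geselecteerd, overige, equal_var=False)
        resultaten.append({"criterium": criterium, "t": t, "p": p})
    return pd.DataFrame(resultaten)

# Fictief gebruiksvoorbeeld met hypothetische kolomnamen (één rij per school):
# criteria = ["houding_schoolfeedback", "verwacht_gebruik", "gepercipieerde_relevantie"]
# print(vergelijk_geselecteerde_scholen(scholen_df, criteria))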
3.3. Onderzoeksinstrument en -procedure
Data werden verzameld door middel van semigestructureerde diepte-interviews. Daartoe werden de schoolleiders een half jaar na de ondersteuningsinterventies bezocht op hun school door één van beide onderzoekers die de ondersteuningsinterventies hadden verzorgd. De interviewvragen zijn opgesteld volgens het eerder besproken conceptuele kader, passend binnen de vier evaluatieniveaus. Er werden geen vragen gesteld die rechtstreeks naar de invloed van ondersteuning op feedbackgebruik peilden, om antwoordvertekening door sociaalwenselijke antwoorden te vermijden. Doorvragen was toegelaten om meer verduidelijking of uitleg te krijgen (Lindlof & Taylor, 2002). Het interviewinstrument bestond uit een veertigtal vragen voor ruim een uur interviewtijd. Enkele voorbeelden van interviewvragen zijn:
• Reactie
- Tevredenheid: Bent u tevreden over de ondersteuning van binnen en buiten de school samen die u in het kader van het gebruik van de schoolrapporten mocht genieten?
• Leren
- Kennis en vaardigheden: Heeft u het gevoel voldoende vertrouwd te zijn met het interpreteren van dergelijke feedbackgegevens? Welke kennis en vaardigheden heeft men volgens u nodig om dit rapport correct te kunnen interpreteren?
- Houding: Hoe staat u op dit moment tegenover het gebruik van schoolfeedback? Is het de investering waard?
• Transfer
- Fasen in gebruik: Graag zouden we willen weten welk traject het schoolrapport al heeft doorgemaakt sinds het hier in de school toekwam. Zou u kort kunnen aangeven welke stappen er werden gezet?
- Types van gebruik: Heeft het schoolfeedbackrapport tot concrete actiepunten of beslissingen geleid? Bent u door het schoolfeedbackrapport anders gaan kijken naar uw school?
• Resultaten
- Effecten: Hoe zou u zelf de effecten van het gebruik van dit schoolfeedbackrapport omschrijven? Ziet u ongewenste neveneffecten van het gebruik van dit schoolfeedbackrapport?
• Ondersteuning
- Genoten ondersteuning: Heeft u voor het interpreteren van de resultaten een beroep gedaan op anderen binnen de school? Heeft u een beroep gedaan op externen bij het interpreteren, diagnosticeren of gebruiken van het schoolfeedbackrapport?
3.4. Analyse
Interviews werden opgenomen en nadien getranscribeerd. Daarna werden ze onafhankelijk door twee onderzoekers gecodeerd door middel van de kwalitatieve analysetool ATLAS.ti. Codes werden toegekend volgens de middle order approach, wat toelaat om aanvankelijk ruime categorieën later te verfijnen (Dey, 1993). De codering gebeurde hoofdzakelijk op een deductieve wijze volgens de codes uit een codeboek, gebaseerd op het theoretische kader. Eerst werden fragmenten geplaatst onder brede categorieën. Wanneer aan relevante passages geen voorgedefinieerde codes toegewezen konden worden, werden ze onder een brede categorie geplaatst om later aan nieuwe codes toegewezen te worden, die op inductieve wijze uit de data gegenereerd werden (Strauss & Corbin, 2007). Na de codering van de afzonderlijke interviews werden gegevens geanalyseerd volgens een case ordered predictor-outcome meta-matrix (Miles & Huberman, 1994). Bij deze analyse worden de respondenten opgedeeld volgens de onderzoekscondities waartoe ze behoren. Het doel van deze opzet is niet enkel om de cases afzonderlijk te beschrijven maar ook om een cross-case of variabele-georiënteerde analyse uit te voeren. Deze werkwijze gaat in de richting van een verklarende analyse van de resultaten. De hypothese daarbij is dat de INSET- en ONSET-conditie zullen leiden tot een hogere mate van feedbackgebruik dan de controleconditie, met meer schoolverbeteringseffecten als gevolg. Om de sterkte van de aanwezige kenmerken te bespreken, maken we gebruik van gradatiecodes (afwezig – zwak aanwezig – sterk aanwezig – geen informatie). Zo werden per variabele strikte criteria opgesteld om te bepalen in welke mate het kenmerk aanwezig was. Deze gradaties maken het mogelijk een indicatie te geven van de sterkte van de variabelen zonder gegevens verregaand te kwantificeren. Om deze metamatrix (zie Figuur 2) te construeren werden volgende stappen ondernomen:
1. Anonimiseren van de transcripten om blind te kunnen coderen
2. Volledige codering van de transcripten volgens het vooropgestelde codeboek
3. Samenvatting per case volgens de structuur van het codeboek
4. Toekenning van gradatiecodes per respondent, aan iedere variabele
5. Onderbrengen van cases in metamatrix, met gradatiecodes
6. Cases terug identificeren en ordenen volgens experimentele conditie en vervolgens naar gradatiecodes
De interviews werden onafhankelijk gecodeerd door twee onderzoekers. Om de interbeoordelaarsbetrouwbaarheid na te gaan werden het codeboek en de gradatieregels gezamenlijk opgesteld. Daarna werd een interview onafhankelijk door beiden gecodeerd en werd de interbeoordelaarsbetrouwbaarheid, berekend als de verhouding tussen het aantal overeenkomsten en het totale aantal toegekende codes, nagegaan en verhoogd tot .87 (Kurasaki, 2000; Miles & Huberman, 1994).
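De hierboven vermelde betrouwbaarheidsmaat komt neer op het aandeel identiek toegekende codes. Onderstaande schets in Python illustreert enkel dit principe; de voorbeeldcodes en de vereenvoudiging dat beide codeurs exact dezelfde fragmenten coderen, zijn aannames voor dit voorbeeld.

def interbeoordelaarsbetrouwbaarheid(codes_a, codes_b):
    """Verhouding tussen het aantal overeenkomsten en het totale aantal toegekende codes."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Beide codeurs moeten dezelfde fragmenten gecodeerd hebben.")
    overeenkomsten = sum(a == b for a, b in zip(codes_a, codes_b))
    return overeenkomsten / len(codes_a)

# Fictief voorbeeld: codes van twee codeurs voor dezelfde zes interviewfragmenten.
codeur_1 = ["reactie", "leren", "gedrag", "gedrag", "resultaten", "leren"]
codeur_2 = ["reactie", "leren", "gedrag", "leren", "resultaten", "leren"]
print(round(interbeoordelaarsbetrouwbaarheid(codeur_1, codeur_2), 2))  # 0.83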
4. Resultaten
In de case ordered predictor-outcome meta-matrix (Figuur 2) worden de respondenten geordend per conditie en naar gradaties in feedbackgebruik. Hoe donkerder de celkleur, hoe sterker de betreffende variabele door de respondent werd gerapporteerd. In de volgende alinea's behandelen we iedere variabele, zowel in algemene zin als per conditie. Daarbij wordt telkens verwezen naar het aantal respondenten per conditie dat een bepaalde uitspraak deed (C = Controlegroep, I = INSET-groep, O = ONSET-groep).
4.1. Reactie
Om de tevredenheid na te gaan, werden de respondenten gevraagd om het totale pakket van de genoten ondersteuning te beoordelen, inclusief de INSET- of ONSET-ondersteuning en eventueel aanvullende interne en externe ondersteuning. Algemeen stellen we een toenemende tevredenheid over de genoten ondersteuning vast naarmate de intensiteit van de ondersteuning toeneemt. Zo blijkt de tevredenheid groter bij de ONSET-conditie dan bij de andere groepen. Enkele respondenten uit de controleconditie konden geen tevredenheidsuitspraken doen omdat er (quasi) geen ondersteuning op de school had plaatsgevonden (3C; / in Figuur 2). Om meer zicht te krijgen op datgene waarop de tevredenheidsuitspraken gebaseerd zijn, geven we een korte beschrijving van de genoten ondersteuning in de scholen.
Figuur 2. Impact van ondersteuning op schoolfeedbackgebruik: resultaten in case ordered predictor-outcome meta-matrix.
Legende gradatiecodes: afwezig, zwak aanwezig, sterk aanwezig, / = niet van toepassing/geen informatie.
Rijen (variabelen): Reactie (tevredenheid); Leren (kennis en vaardigheden; attitudes); Gedrag (fasen in gebruik: ontvangen, lezen en bespreken, interpreteren, diagnose, planning acties, uitvoeren acties, evalueren acties; types gebruik: instrumenteel, conceptueel, symbolisch, strategisch, motiverend gebruik); Resultaten (effecten).
Kolommen (respondenten per conditie): controlegroep (C): 13, 7, 4, 15, 2, 9; INSET-groep (I): 8, 17, 14, 1, 16, 12; ONSET-groep (O): 6, 10, 3, 5, 18, 11.
Gebruik interne ondersteuning
De bevraagde respondenten blijken niet allemaal een beroep te hebben gedaan op expertise van andere teamleden in hun school. Slechts twee schoolleiders uit de controlegroep geven aan enige vorm van interne ondersteuning te hebben ondervonden, terwijl dit bij alle ONSET-respondenten wel het geval was. In bijna alle gevallen werd de ondersteuning voorzien door de zorgcoördinator (2C, 6O) of zorgleerkracht (1C, 1I), al dan niet aangevuld met leden van een kernteam (1I, 1O). In één geval werd de ondersteuning aangeboden door de beleidsondersteuner (1I). Ik weet dat dit zijn nut heeft, maar je moet begrijpen dat wanneer je zo een rapportje krijgt, dat je nog andere dingen binnenkrijgt. Als schoolleiding moet je zien van: “Hoe zit dat hier in elkaar? Zo! Kort en bondig.” En als je dan verder dieper wil gaan dan kan je het aan je zorgcoördinator geven of aan een assistent, beleidsondersteuner, en dat die dat dan meer in detail gaan uitspitten. (Respondent 18) Verder blijkt dat, indien er geen interne ondersteuning was, dit meestal door tijdsgebrek was (1C, 1I) of door het wegens omstandigheden niet beschikbaar zijn van de zorgcoördinator (2C). Opvallend is dat leerkrachten niet vermeld werden als bron van ondersteuning.
Gebruik externe ondersteuning
Enkel voor de scholen uit de experimentele condities was per definitie externe ondersteuning aanwezig. Aanvullende vormen van externe ondersteuning werden over het algemeen niet gezocht. Slechts één respondent haalde aan een verkennend gesprek te hebben gevoerd met een pedagogisch begeleider (1I). Enkele redenen voor het beperkt aanspreken van pedagogische begeleiders werden aangegeven. Dezen zouden over onvoldoende expertise en middelen beschikken om scholen met dergelijke feedbackrapporten te begeleiden (1C, 1O) of zouden daarbij onvoldoende oog hebben voor de pedagogische eigenheid van de school (1C, 1O). De respondenten uit de controlegroep die wel noemenswaardig gebruikgemaakt hebben van de rapporten blijken meestal wel bijkomende ondersteuning gezocht te hebben (3C), bestaande uit een algemene studiedag over het onderzoeksproject, een informeel overleg binnen een samenwerkingsverband van methodescholen, of een overleg met de pedagogische begeleider en de schoolraad.
4.2. Leren
Respondenten werden gevraagd de kennis, vaardigheden en attitudes binnen de school te beschrijven. Er werd gekeken naar het gedeelde potentieel binnen de school eerder dan naar de individuele eigenschappen van de respondent. We kunnen niet eenduidig zeggen dat de ondersteuningsinitiatieven hebben geleid tot sterk verbeterde datageletterdheidscompetenties in deze studie. Hoewel de ONSET-groep er het beste lijkt uit te komen, lijkt de INSET-groep niet te verschillen van de controlegroep in competenties nodig voor het gebruik van hun schoolfeedbackrapport.
Kennis en vaardigheden
De grootste tekorten doen zich voor op vlak van kennis en vaardigheden nodig om de feedbackdata te interpreteren (2C, 5I, 2O), zelfs met de aangeboden uitleg in het rapport. De eerste keer dat ik er echt alleen mee op pad moest, was ik onzeker en was het zeker niet duidelijk. (Respondent 6) Andere tekorten in kennis en vaardigheden doen zich voor bij het overbrengen van de informatie naar het schoolteam (1C, 1O) of bij het diagnosticeren en het plannen van acties (1C, 1I, 1O). Maar dat is dus overal het probleem, bijvoorbeeld ook als je iets aankaart bij een CLB [Centrum voor Leerlingenbegeleiding]. Ze doen testen en ze stellen dat en dat vast. En hoe moeten we dan verder? Daar geraken we dikwijls niet verder. Daar stopt het dikwijls. (Respondent 7) Verschillende verklaringen voor deze beperkte datacompetenties komen tijdens de interviews naar boven. Vooreerst geven respondenten aan dat de bestaande voorkennis vrij beperkt is en niet verder reikt dan eenvoudige statistieken bij klastoetsen (4I, 2O). Het ontbreken van deze voorkennis is verder ook te wijten aan de opleiding waarbij er onvoldoende aandacht is voor het leren gebruiken van data (1C, 1I). Daar worden wij elke dag meer en meer mee geconfronteerd. Maar dat vind ik persoonlijk ook een serieus mankement van de opleiding van onderwijzers en onderwijzeressen, dat de mensen daar niet mee vertrouwd zijn. Als je een aantal van die termen voorschotelt aan mijn collega’s, die slaan achterover. (Respondent 16)
Scholen die geen moeilijkheden ondervinden hebben dat meestal te danken aan een uitgebreide voorkennis uit vooropleidingen of eerdere werkervaringen (2C, 3O). Een andere verklaring voor het ontbreken van deze datageletterdheid in sommige scholen zijn de directiewissels waarbij de nodige kennis niet doorgegeven wordt aan de opvolger (1C, 1I). Tenslotte is er op scholen een tijdsgebrek om datageletterdheidscompetenties op te bouwen en om data te kunnen interpreteren. Op die manier blijven ervaringen uit en wordt geen verdere kennis over deze schoolfeedbackrapporten opgebouwd. Enkele schoolleiders geven aan dat ze waarschijnlijk wel in staat zijn het rapport correct te interpreteren indien ze daarvoor voldoende tijd kunnen en/of willen vrijmaken (2C, 1I, 1O). Ik erken dat ik daar eigenlijk geen tijd in wil steken. Ik heb andere dingen die ook moeten gebeuren en dan vind ik dat dit te veel tijd vraagt in verhouding. (Respondent 18)
Houding ten aanzien van schoolfeedback
De positieve houding bij schoolleiders en zorgcoördinatoren is vooral te danken aan de groeiende interesse voor objectieve meetinstrumenten die de leerwinst in kaart brengen en een vergelijking met een referentiegroep mogelijk maken (4C, 2I, 5O). Volgens de respondenten zou die houding bij leerkrachten een stuk negatiever zijn (3C, 3I, 3O). Deze negatieve houding bij leerkrachten zou ondermeer toe te schrijven zijn aan de grote taaklast bij de dataverzameling (1I, 2O), een ongerustheid om negatief uit de resultaten te komen (1C, 2O), en het als bedreigend ervaren van externe evaluaties (1I). Leerkrachten zouden bovendien een voorkeur hebben voor feedback op leerlingenniveau (1C, 1I). Elke onderwijzer of onderwijzeres heeft puntjes, heeft een puntenboek, een Excel-werkmap en noem maar op. Dat zijn allemaal individuele resultaten van de kinderen. Het gaat over de kinderen zeggen zij, en de kinderen zijn belangrijk. Maar voor mij is de school belangrijk. Over de individuele kinderen heen kijken naar de prestaties van een school of van een groep binnen de school is niet evident voor ons. (Respondent 1) Bepaalde respondenten relativeren het nut van de feedbackrapporten door te wijzen op de beperkingen. Daarbij verwijzen ze naar de beperkte bewijskracht van de feedback waarbij slechts één cohorte leerlingen gevolgd werd (1C, 1I, 1O) die bovendien soms door leerlingenmobiliteit behoorlijk onstabiel is (1I). Bovendien doet ook de inhoudelijke overlap met
andere beschikbare gegevensbronnen (3I) de meerwaarde van deze schoolfeedbackrapporten in vraag stellen. Daarnaast onthullen de beweegredenen om deel te nemen aan het Schoolfeedbackproject iets over de houding van de respondenten. Enkelingen nemen bijvoorbeeld enkel deel aan dit onderzoek omdat ooit het engagement aangegaan is, al dan niet door een vorige directie (2C, 2I, 1O). Ik vind het zelf jammer. Mijn voorganger is hiermee, om welke reden dan ook, mee gestart. Ik kan hem jammer genoeg niet meer vragen waarom. (…) Moest ik daar helemaal vanaf het begin mee gestart zijn, dat zou ik er zelf ook wel voor gekozen hebben om het samen met het team te dragen. Dan zou het er een stukje anders uitzien. (Respondent 8)
4.3. Gedrag
Uit Figuur 2 blijkt dat de sterkte van datacompetenties positief samenhangt met de sterkte van gebruik. Indien we op zoek gaan naar verschillen tussen de conditiegroepen in feedbackgebruik, dan uiten deze zich vooral in de intensiteit van het lezen en bespreken, interpreteren en diagnosticeren van de feedbackinformatie, in het voordeel van respectievelijk de ONSET-, INSET- en controlegroep. Wat opvalt, is dat de scholen die teamleden het verregaandst bij deze processen betrekken, allen uit de experimentele condities komen.
Fasen in schoolfeedbackgebruik
Het goed ontvangen van de schoolfeedback lijkt een vanzelfsprekendheid maar blijkt dat niet steeds te zijn. Zo moesten twee scholen niet eens aan verdere plannen denken, omdat de feedback nooit uit de mailbox van de schoolleider is geraakt (1C, 1I). Bijgevolg kunnen de fasen van lezen, interpreteren en diagnosticeren enkel in de andere scholen in kaart worden gebracht. Slechts enkele schoolleiders kiezen ervoor alle leerkrachten actief te betrekken bij deze fasen (2O). In de andere gevallen worden leerkrachten enkel op de hoogte gebracht van de resultaten in een personeelsvergadering (3I, 4O), via individuele besprekingen (1C, 1O) en/of door rapporten vrijblijvend ter inzage aan te bieden (2C, 2I, 1O). In bepaalde gevallen worden resultaten eerst apart behandeld in een kernteam alvorens ze via een personeelsvergadering mee te delen (1I, 2O). Over het algemeen blijft de informatieverspreiding bij deze groep respondenten erg beperkt, zowel naar het aantal betrokkenen als naar de aard van de informatie toe.
Ik krijg het binnen, ik bekijk het, ik stel het voor aan de leerkrachten en ik stel het voor op de personeelsvergadering. Daar houdt het meestal ook op, veel verder gaat het niet. (Respondent 16) Diegenen die ervoor kiezen de resultaten niet actief in het hele team te verspreiden hebben daar verschillende redenen voor. Sommigen hebben nog niets gedaan met de resultaten (2C, 1I) of verspreiden nooit dergelijke informatie (1C). Anderen zijn van mening dat de resultaten wegens leerkrachtenwissels (1C) of leerlingenmobiliteit geen valide beeld geven (1I) of voelen zich te onzeker over de interpretatie en gebruiksmogelijkheden (2C, 1I). Eerlijk gezegd is dit voor mij heel moeilijk om dat juist in te schatten. Dat vertel ik ook niet aan mijn leerkrachten omdat die dan misschien denken dat ik foute informatie geef terwijl dat ik denk dat het heel belangrijk is, maar ik kan het op dit moment niet juist inschatten. (Respondent 16) De resultaten illustreren dat slechts een minderheid van schoolleiders toekomt aan het plannen van acties (2C, 1I, 4O). Uit iedere conditie blijkt slechts één school overgegaan te zijn tot het implementeren van acties. Niet verwonderlijk is dat geen enkele school reeds toegekomen is aan het evalueren van de uitgevoerde acties.
Soorten schoolfeedbackgebruik
Het voorafgaande negatieve beeld vereist nuancering. De feedbackgegevens kunnen namelijk een invloed hebben op de schoolwerking zonder meteen uit te monden in concrete acties. Dat blijkt ook uit de resultaten, aangezien er meer sprake is van een conceptueel, symbolisch en strategisch gebruik dan van een instrumenteel gebruik. Zo rapporteert twee derde van de respondenten conceptueel gebruik (3C, 4I, 5O). Enkele waardevolle illustraties zijn het nauwgezetter gaan kijken naar resultaten (1C), het waakzamer zijn bij mindere resultaten (1C), het oordeel aanpassen over individuele leerkrachten n.a.v. goede resultaten (1C), het leren denken in leerevoluties in plaats van in aparte leerjaren (1C, 1I, 2O), het verruimen van de blik door de vergelijking met een referentiegroep (1I, 1O) en het genuanceerder kijken door gecorrigeerde scores (2I, 1O). Sommige scholen zijn overgegaan tot acties en rapporteren instrumenteel gebruik (1C, 1I, 1O) zoals de beslissing om te werken aan de schrijfmotoriek van de kinderen, om niveaulezen en leesmoeders in te voeren en om de aanpak van begrijpend lezen te veranderen. Daarnaast blijkt dat schoolleiders de resultaten gebruiken voor strategische doeleinden voor de onderwijsinspectie (3C, 4I, 2O). Soms
gebeurt dit op een manier waarbij eerder het accent ligt op verantwoording dan op schoolverbetering (3I). De inspectie is verzot op het outputdossier en ik heb een heel kaftje met allerlei gegevens in en dat is daar een onderdeel van. Op een bepaald moment kregen die mannen dat onder ogen. (…) Die vragen altijd om al het materiaal te geven dat je hebt en die kaft was daar ook bij. (…) In die kaft zitten allerlei gegevens die ik heb over de kinderen en dat is voor hen een stokpaardje en dat past daar perfect in. Daar heb ik goed mee gescoord ondanks het feit dat ik het niet begreep. (Respondent 16) Nagenoeg alle scholen die nog geen onderwijsinspectie over de vloer kregen (/ in Figuur 2) geven aan dat ze de resultaten zouden voorleggen tijdens een doorlichting (2C, 2I, 2O). In een enkele school werden de rapporten gebruikt als vorm van publiciteit om leerlingen aan te trekken (1O). Niet iedere schoolleider staat daar echter voor open (2C). Misschien zijn er scholen die dat wel zouden willen gebruiken moesten ze allemaal zo heel hoog boven de curve uitsteken maar ik vind dat niet direct een goede manier om ouders of buitenstaanders om de oren te slaan met die grafieken en met dat cijfermateriaal. (Respondent 2) Heel wat schoolleiders gebruiken schoolfeedbackrapporten op een symbolische manier. Bestaande argumenten worden dan bijvoorbeeld kracht bijgezet door de resultaten (1C, 4I, 4O). Zo trachtte een schoolleider zijn teamleden ervan te overtuigen dat het niet is omdat kinderen anderstalig zijn, dat ze geen hoge scores kunnen behalen (1I). Één specifiek voorbeeld gaat niet over het overtuigen van leerkrachten maar wel van ouders. De school is ervan overtuigd dat leerlingen duidelijk leerwinst maken en daarom gestimuleerd moeten worden om volgens capaciteiten een studierichting te kiezen in het secundair onderwijs (1O). Nog een andere schoolleider haalt aan dat deze resultaten enkel gebruikt zijn omdat ze aansloten bij eerdere bevindingen van de school (1O). Daarnaast kan symbolisch gebruik ook inhouden dat resultaten doelbewust niet in team besproken worden omdat dit op dit moment niet constructief zou zijn voor de schoolwerking (1C). Respondenten uit de twee experimentele groepen blijken de resultaten meer op een symbolische manier te gebruiken. Dat geldt ook voor het motiverend gebruik van de schoolfeedback. Leerkrachten krijgen bijvoorbeeld een schouderklopje en bevestiging van het goede werk (1C, 3I, 4O) en/of net een signaal om verder iets te doen met de mindere resultaten (1I, 2O).
Wij hadden altijd wel het idee van als we kijken naar ‘de grondstoffen’ die we binnenkrijgen en de kwaliteit van ‘de grondstoffen’, en zien wat we afwerken, dan moeten we zeggen: “Kijk we hebben toch wel goed werk geleverd”. Maar dat was altijd op basis van een gevoel. En nu eindelijk hebben we die houvast, doordat het wordt bevestigd door onderzoek. (Respondent 11) De feedback blijkt ook zelfvertrouwen te kunnen geven aan schoolteamleden, door te bevestigen dat de school het goed doet (1C, 1O). In sommige scholen waar ook mindere resultaten werden geboekt, werden enkel de positieve resultaten benadrukt, precies om te werken aan een positieve houding van het team ten aanzien van schoolfeedback (1I) of uit schrik om leerkrachten onterecht met de vinger te wijzen (1I).
4.4. Resultaten
De uiteindelijke bedoeling van ondersteuning bij feedbackgebruik is bij te dragen tot schoolverbeteringseffecten. Een half jaar na het ontvangen van het feedbackrapport blijken enkele scholen reeds waardevolle effecten te rapporteren (2C, 1I, 2O). Er kan echter geen duidelijk verband aangetoond worden tussen de bereikte effecten en de drie onderzoekscondities. Wanneer we deze effecten nader bekijken zien we dat er mede dankzij het gebruik van dit rapport een grotere alertheid is gegroeid bij leerkrachten voor het uitvallen van leerlingen (1C), leerkrachten een duidelijker beeld hebben gekregen van de evolutie van de leerlingen (1C), er meer vertrouwen is in de werking van de school (1C, 2O) en er een kritischere houding kwam tegenover de eigen schoolprestaties (1I). Twee scholen voelen zich dankzij de positieve resultaten heropgewaardeerd in de buurt (2O). Maar een naam of een faam die een school heeft in een buurt veranderen is heel moeilijk. En met contacten buiten, met ouders, komt dat nu nog geregeld ter sprake van: “Kijk, is dat wel een goede school? Zijn jullie wel goed bezig? Zou ik mijn kinderen niet beter naar een andere school doen?” En leerkrachten twijfelden vroeger dan voor een stuk aan hun eigen kunnen. Nu zijn ze daar ook veel directer in en gaan ze in contact met ouders ook veel meer durven zeggen van: “Neen, wij zijn goed bezig, wij hebben onze resultaten”. (Respondent 11) Daarnaast doen zich echter ook ongewenste of onvoorziene effecten voor. In één school leidde het invoeren van het schoolfeedbacksysteem tot teaching to the test (1C), ook al ging dat tegen de visie van de schoolleider in.
Maar wie bedriegen ze daar uiteindelijk mee? Zichzelf toch! Je gaat toch als leerkracht toch niet naar die toetsen werken of je gaat ze toch geen gelijkaardige test geven zodat ze volgende week goed zouden scoren? Dan vallen die gewoon uit als ze in het middelbaar onderwijs komen. Dan ben je als school toch ook niet meer geloofwaardig met de resultaten die je naar voor brengt? (Respondent 7) In een andere school leverde het toetsen van de leerlingen een gevoel van teleurstelling en demotivatie op omdat de resultaten minder goed bleken dan verwacht (1O).
5. Discussie en conclusie
5.1. Schoolfeedbackgebruik en ondersteuning
Vooreerst wijzen de onderzoeksresultaten op een grote variatie in de manier waarop scholen vormgeven aan schoolfeedbackgebruik. Ook de effecten van dit gebruik zijn zeer divers. In de volgende alinea’s concluderen we dat in het verklaren van de verschillen tussen scholen de theoretische verwachtingen in grote lijnen bevestigd werden. Daarbij werden twee variabelen nader bekeken. Een eerste variabele betrof de datageletterdheidscompetenties om met het onderzochte schoolfeedbackrapport aan de slag te gaan. Over het algemeen heeft de meerderheid van de respondenten nog moeite met de interpretatie van de data. Als het in deze stap misgaat, is een verder succesvol gebruik niet gegarandeerd (Earl & Fullan, 2003). Zo stelt Bandura (1977) dat het geloof in eigen kennis en vaardigheden belangrijk is om tot actie over te gaan. Ook voor de volgende fasen in gebruik blijken beperkte competenties vooralsnog een rem te zijn. Competenties bestaan naast kennis en vaardigheden ook uit attitudes. Daarvoor werd gepeild naar de houding van de respondenten ten opzichte van schoolfeedbackgebruik, wat ook geen overwegend positief verhaal opleverde. De eerdere bevinding dat schoolleiders een positievere houding hebben dan leerkrachten werd door deze studie bevestigd (Vanhoof, Van Petegem, & De Maeyer, 2009; Zupanc et al., 2009). Leerkrachten hebben blijkbaar minder de kans om de meerwaarde en functionaliteit van schoolfeedbackgebruik te ervaren maar worden wel geconfronteerd met de lasten van de dataverzameling (Ingram, Louis, & Schroeder, 2004; Verhaeghe et al., 2010). Ze zijn minder vertrouwd met het gebruik van gegevens op schoolniveau en vinden dat de resultaten op groepsniveau te
veraf staan van hun activiteiten op klasniveau (Schildkamp & Kuiper, 2010; Zupanc et al., 2009). Bovendien komt daar nog eens het bedreigende karakter van externe evaluaties bovenop, wat angst inboezemt voor individuele evaluaties (Ingram et al., 2004), ook wanneer het schoolfeedbackgebruik in het teken van zelfevaluatie eerder gericht is op de schoolwerking dan op aparte individuen (Kyriakides & Campbell, 2004). De tweede onderzochte variabele betrof ondersteuning bij schoolfeedbackgebruik. Ondersteuning werd in deze studie geoperationaliseerd in een INSET- en ONSET-conditie (Gardner, 1995). Door middel van gecontroleerde experimentele ondersteuningsinterventies bleek het mogelijk om differentiële effecten in de verschillende condities te onderzoeken. Ook al was de opzet beperkt door zijn eenmalige interventie, kleinschaligheid en verkennende resultaten, toch bood dit gecontroleerde design enkele waardevolle inzichten. Aan de hand van Kirkpatricks evaluatiemodel voor trainingsinitiatieven (1998) werden interventie-effecten op vier niveaus beschreven. Op het reactieniveau kunnen we zeggen dat de tevredenheid over de genoten ondersteuning groter was naarmate er meer ondersteuning genoten werd. De respondenten uit de controle- en INSET-groep gaven aan niet actief naar interne en externe ondersteuning gezocht te hebben. ONSET-deelnemers deden vaker een beroep op schoolteamleden, waarschijnlijk doordat de zorgcoördinator ook betrokken was geweest in de ONSET-interventie. Deze respondenten drukten dan ook de grootste mate van tevredenheid uit. Deze resultaten indiceren dat de respondenten eerder een aanbodgerichte houding voor ondersteuning aannemen, aangezien er spontaan slechts zeer beperkt een beroep gedaan wordt op schoolteamleden of externe ondersteuningsdiensten. Verder is het opvallend dat leerkrachten zelden gezien worden als ondersteuningsbron. Verschillende verklaringen kwamen uit de resultaten naar voor. Zo hebben leerkrachten minder de mogelijkheid om zich van hun drukke taakschema los te maken (Huffman & Kalnin, 2003; Ingram et al., 2004) en houden ze er een minder positieve houding ten opzichte van schoolfeedbackgebruik op na (Ingram et al., 2004; Zupanc et al., 2009). Zorgcoördinatoren daarentegen worden wel aangesproken omdat zij vaak over de nodige datageletterdheidscompetenties beschikken door hun ervaring in het lezen en interpreteren van data uit testen en leerlingvolgsystemen. Bovendien valt schoolfeedbackgebruik te plaatsen onder hun taak van zorgcoördinatie op school. Bij het leerniveau werd de invloed bekeken van ondersteuning op de nodige datacompetenties om met het rapport om te gaan. De ONSET-groep kwam er als beste uit, gevolgd door de controlegroep. Op basis van deze
bevindingen besluiten dat beide ondersteuningscondities eenduidig een positief effect gehad hebben op het schoolfeedbackgebruik, is dus voorbarig. Daarom is het nodig om ook naar het volgende niveau te kijken, waarbij de transfer van de geleerde inzichten uit de ondersteuning naar de organisatie wordt bekeken. We zien voor de experimentele groepen een zichtbaar voordeel voor de lees-, interpretatie- en diagnosefase. Slechts enkelen gaan over tot acties, waarbij geen duidelijk verschil tussen de condities uitgemaakt kan worden. Dit houdt in dat slechts beperkt instrumenteel gebruik waargenomen wordt. We nemen echter wel een verscheidenheid in conceptueel, symbolisch, strategisch en motiverend gebruik waar. Net zoals een verandering in denken kan leiden tot een verandering in handelen, gaat een conceptueel gebruik een instrumenteel gebruik vooraf (Schildkamp & Teddlie, 2008; Vanhoof, Verhaeghe, Verhaeghe, Valcke, & Van Petegem, in druk). Deze resultaten zijn in die zin hoopgevend aangezien zo schoolfeedbackgebruik misschien geleidelijk aan ingang vindt in Vlaamse scholen. Echter, om dit gebruik op een hoger niveau te tillen en te integreren in bestaande kwaliteitszorg zijn bijkomende ondersteuning en middelen nodig. Gegeven de beperkte gebruiksresultaten en de eerder beperkte tijd tussen de geboden ondersteuning en de dataverzameling, is slechts beperkt sprake van schoolverbeteringseffecten. Dit houdt bijgevolg in dat geen verschillen tussen condities gevonden kunnen worden.
5.2. Praktijkimplicaties
Uit deze onderzoeksresultaten volgt dat bij het opzetten van ondersteuningsinitiatieven bij schoolfeedbackgebruik vooraf best een grondige behoefteanalyse gebeurt om zicht te krijgen op de ondersteuningsnoden van schoolleiders en hun teamleden. Vermits de ONSET-conditie over de gehele lijn als beste uit dit onderzoek komt, kan dit implicaties hebben voor de opzet van ondersteuning. Door ondersteuning aan te bieden op de eigen school, aan de hand van de eigen data en met het eigen team wordt blijkbaar het best op deze ondersteuningsbehoeften ingespeeld. Dat geeft aan dat een persoonlijke ondersteuning op maat verkozen wordt boven een veralgemeende aanpak in studiedagen. Aansluitend bij deze werkwijze kan verwezen worden naar de beschikbare literatuur rond het opzetten van collaborative data teams (Huffman & Kalnin, 2003; Lachat & Smith, 2005; Wayman et al., 2007). Voor de ondersteuners houdt dat onder meer in dat ze een duidelijk zicht moeten hebben op de schoolsituatie en in staat moeten zijn hun ondersteuning op maat af te stemmen. Aanvullend dient benadrukt te worden dat ondersteuning niet mag ophouden bij de interpretatie van de gegevens. Scholen zouden minstens een aanzet moeten krijgen om met de gegevens aan de slag te gaan. Verder dient men bij het opzetten van schoolfeedbacksystemen in acht te nemen dat ondersteuning niet zomaar de sleutel tot succesvol gebruik is. Schoolleiders percipiëren namelijk niet enkel een gebrek aan datageletterdheidscompetenties, maar eveneens een gebrek aan tijd. Schoolfeedbackgebruik wordt daardoor niet geïntegreerd in een systematisch reflecteren over de schoolwerking. Extra middelen voor zowel beleidsmakers als leerkrachten om tijd voor deze taak vrij te maken, kunnen als een voorwaarde gezien worden. Eveneens zou meer aandacht voor deze kwestie in de opleidingen voor leerkrachten en schoolleiders een bevorderende factor kunnen zijn.

5.3. Wetenschappelijke relevantie en implicaties voor vervolgonderzoek
Vooreerst heeft deze opzet aangetoond dat Kirkpatricks model bruikbaar is voor verdere toepassing in onderzoek over ondersteuning bij datagebruik. Verder leverde deze studie een waardevolle poging om binnen deze context een gecontroleerde veldstudie op te zetten, wat nieuw is voor dit onderzoeksdomein. Voor vervolgonderzoek kan aanbevolen worden om deze onderzoekslijn verder uit te bouwen door quasi-experimenteel onderzoek op te zetten in educatieve contexten waar schoolfeedbackgebruik reeds verder uitgebouwd is. Zo dienen de mogelijke differentiële effecten die door gebruikersgebonden kenmerken verklaard kunnen worden, verder onderzocht te worden. In aanvulling op deze studie kunnen daartoe kwantitatieve gegevensverzamelingen aangewend worden (bv. Vanhoof et al., in druk). Daarnaast bevelen we aan om de effecten van langetermijnondersteuning na te gaan. Longitudinaal onderzoek kan helpen verklaren of de gevonden verschillen tussen condities deels te wijten zijn aan de genoten ondersteuning dan wel enkel aan verschillen tussen gebruikers. Dit neemt niet weg dat studies rond eenmalige ondersteuningsinitiatieven de nodige aandacht verdienen, zowel omdat ze een realiteit zijn in educatieve settings als omdat bijvoorbeeld deze onderzoeksresultaten waardevolle invloeden rapporteren. Daarbij gaat best aandacht naar zowel korte- als langetermijneffecten, alsook naar effectgerichte en procesgerichte resultaten (Schildkamp & Teddlie, 2008; Schildkamp et al., 2009).
Literatuur
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191-215.
Bosker, R.J., Branderhorst, E.M., & Visscher, A.J. (2007). Improving the utilisation of management information systems in secondary schools. School Effectiveness and School Improvement, 18(4), 451-467.
Coe, R. (2002). Evidence on the role and impact of performance feedback in schools. In A.J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger.
Earl, L., & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394.
Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in indicator systems: Doing things right and doing wrong things. Education Policy Analysis Archives, 10(6), 68-83.
Gardner, R. (1995). Onservice teacher education. In L.W. Anderson (Ed.), International Encyclopedia of Teaching and Teacher Education (pp. 628-632). London: Pergamon Press.
Gonczi, A. (1994). Competency based assessment in the professions in Australia. Assessment in Education: Principles, Policy & Practice, 1(1), 27-44.
Herman, J., & Gribbons, B. (2001). Lessons learned in using data to support school inquiry and continuous improvement: Final report to the Stuart Foundation. Los Angeles: University of California, Center for the Study of Evaluation.
Huffman, D., & Kalnin, J. (2003). Collaborative inquiry to make data-based decisions in schools. Teaching and Teacher Education, 19, 569-580.
Ingram, D., Louis, K.S., & Schroeder, R.G. (2004). Accountability policies and teacher decision making: Barriers to the use of data to improve practice. Teachers College Record, 106(6), 1258-1287.
Kerr, K.A., Marsh, J.A., Ikemoto, G.S., Darilek, H., & Barney, H. (2006). Strategies to promote data use for instructional improvement: Actions, outcomes, and lessons from three urban districts. American Journal of Education, 112, 496-520.
Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler.
Kurasaki, K.S. (2000). Intercoder reliability for validating conclusions drawn from open-ended interview data. Field Methods, 12(3), 179-194.
Kyriakides, L., & Campbell, R.J. (2004). School self-evaluation and school improvement: A critique of values and procedures. Studies in Educational Evaluation, 30, 23-36.
Lachat, M.A., & Smith, S. (2005). Practices that support data use in urban high schools. Journal of Education for Students Placed at Risk, 10(3), 333-349.
Learning Point Associates. (2004). Guide to using data in school improvement efforts: A compilation of knowledge from data retreats and data use at Learning Point Associates. Opgehaald op 23 oktober 2007, van http://www.learningpt.org/pdfs/datause/guidebook.pdf
Leithwood, K., & Aitken, R. (1995). Making schools smarter: A system for monitoring school and district progress. Newbury Park, CA: Corwin.
Lindlof, T.R., & Taylor, B.C. (2002). Qualitative communication research methods (2nd ed.). London: Sage.
Maes, F., Van Petegem, P., & Van Damme, J. (2005). Schoolloopbanen in het basisonderwijs (SiBO): Doelstellingen en onderzoeksopzet. Paper gepresenteerd op de Onderwijs Research Dagen, Gent, België.
Mathison, S. (1992). An evaluation model for inservice teacher education. Evaluation and Program Planning, 15, 255-261.
Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.
Murnane, R.J., Sharkey, N.S., & Boudett, K.P. (2005). Using student-assessment results to improve instruction: Lessons from a workshop. Journal of Education for Students Placed at Risk, 10(3), 269-280.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3-16). Oxford, UK: Elsevier Science.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic approach. Thousand Oaks: Sage.
Saunders, L., & Rudd, P. (1999, September). Schools' use of 'value added' data: A science in the service of an art? Paper presented at the British Educational Research Association Conference, Brighton, University of Sussex.
Schildkamp, K., & Kuiper, W. (2010). Data-informed curriculum reform: Which data, what purposes, and promoting and hindering factors. Teaching and Teacher Education, 26(3), 482-496.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems in the USA and in The Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282.
Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school self-evaluation instrument. School Effectiveness and School Improvement, 20(1), 69-88.
Tymms, P. (1995). Influencing educational practice through performance indicators. School Effectiveness and School Improvement, 6(2), 123-145.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatie-indicatoren als strategisch instrument voor schoolontwikkeling. Pedagogische Studiën, 81, 338-353.
Vanhoof, J., Van Petegem, P., & De Maeyer, S. (2009). Attitude towards school self-evaluation. Studies in Educational Evaluation, 35, 21-28.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in druk). The influence of competences and support on school performance feedback use. Educational Studies.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using school performance feedback: Perceptions of primary school principals. School Effectiveness and School Improvement, 21(2), 167-188.
Verhaeghe, J.P., & Van Damme, J. (2006). School performance feedback in Vlaanderen, een schets op basis van een projectvoorstel. Informatie vernieuwing onderwijs (IVO), 27(103), 19-27.
Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A.J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 41-71). Lisse, The Netherlands: Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (Eds.). (2002). School improvement through performance feedback. Lisse, The Netherlands: Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349.
Wayman, J.C., Midgley, S., & Stringfield, S. (2007). Leadership for data-based decision making: Collaborative educator teams. In A.B. Danzig, K.M. Borman, B.A. Jones, & W.F. Wright (Eds.), Learner-centered leadership: Research, policy and practice (pp. 189-205). New Jersey, USA: Lawrence Erlbaum Associates.
Weiss, C.H. (1998). Have we learned anything new about the use of evaluation? American Journal of Evaluation, 19(1), 21-33.
Webber, S., & Johnston, B. (2000). Conceptions of information literacy: New perspectives and implications. Journal of Information Science, 26(6), 381-397.
Williams, D., & Coles, L. (2007). Teachers' approaches to finding and using research evidence: An information literacy perspective. Educational Research, 49(2), 185-206.
Young, V.M. (2006). Teachers' use of data: Loose coupling, agenda setting, and team norms. American Journal of Education, 112, 521-547.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for effectiveness and improvement in classrooms and schools in upper
secondary education in Slovenia: Assessment of/for Learning Analytic Tool. School Effectiveness and School Improvement, 20(1), 89-122.
CHAPTER 7 GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK
CHAPTER 7: GENERAL DISCUSSION AND CONCLUSION: FEEDBACK ON FEEDBACK

1. Introduction
In this final chapter of this doctoral dissertation on school performance feedback, an overall reflection is presented on the outcomes of the different studies. By revisiting, integrating and summarizing these results, a comprehensive picture is developed in relation to the research objectives (RO). In addition, a general discussion is provided. The latter also requires us to discuss the limitations of the different studies and directions for future research. After giving an overview of theoretical, practical, methodological and policy implications, we finally present a general conclusion.
2. Overview of research objectives and main findings

2.1. RO1: Exploring the characteristics of SPFSs
Numerous school feedback initiatives have been set up to provide schools with confidential information about the way they function. This is expected to foster school improvement processes by inducing continuous self-reflection at the school level. However, up to now, no systematic description or inventory of SPFS characteristics was available to inform feedback users and/or designers. Given that SPFS characteristics may influence the degree to which the feedback is actually used for school improvement (e.g., Schildkamp & Visscher, 2009; Verhaeghe et al., 2010), it is important that SPF designers and/or users consider in a critical way the key features of an SPFS. To make decisions based on data, users need to purposefully choose the type of SPFS that corresponds to their information needs. This requires the availability of a transparent overview of specific characteristics of available SPFSs, especially including their strengths and weaknesses. In Chapter 2, a preliminary framework was developed for describing and comparing SPFSs, which has been applied to five SPFSs. This framework comprises analytical aspects related to the data gathering process, the data analysis approach, the content of the feedback report, and the numerical measures and graphical representations being used. The results of the surveys and in-depth interviews with directors of five SPFSs illustrate the wide variety in both the feedback reports and the underlying feedback systems. Apparently, the SPFS designers did make deliberate decisions related to their feedback design, considering the
ethical, practical, technical and infrastructural possibilities and constraints of the educational system in which they operate. With respect to the quality criteria of performance indicators, this descriptive and analytical study paid specific attention to the relevance, accuracy, cost-effectiveness, fairness and beneficence of the feedback delivered to schools (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). These quality criteria introduce the presence of several prerequisites, related to all components of an SPFS. First, with regard to the data gathering process, several procedures are built in, in order to guarantee accurate (i.e., reliable and valid) data. Both testing instructions (protocols) and structured measurement instruments are supplied to the schools. An interesting observation in relation to some of these instruments is the technological features that enable tailored testing of pupils, at any moment, about any subject, at any place. The integration of test item banks, IRT techniques, computer adaptive testing and data compatibility with a school's management information system seems to be the most promising way to attain accessible and low-stakes testing. A clear example of the latter type of SPFS is the Assessment Tools for Teaching and Learning, developed at the University of Auckland in New Zealand. Next, our study focused on several aspects of the data analyses being used by SPFSs: (1) the underlying scaling models being used, (2) the data analysis model, (3) the opportunities for longitudinal measurements, (4) the inclusion of pupil mobility and (5) the levels of aggregation. A wide variety in scaling procedures and statistical analyses could be observed in the selected SPFSs. A key point of discussion relates to finding a balance between statistically correct - and thus complicated - analyses and accurate results on the one hand, and understandable analyses and user-friendly results on the other hand. For example, the analyses used in PIPS (Performance Indicators in Primary Schools; Centre for Evaluation and Monitoring) are fairly straightforward and not too complex. Though this underpins the user-friendliness of PIPS, it might also lead to less accurate data, as schools are sometimes wrongly classified due to the lack of a multilevel analysis perspective (Goldstein & Spiegelhalter, 1996; Karsten, Visscher, Dijkstra, & Veenstra, 2010). Furthermore, it is important to realize that measuring can introduce types of error, of which users should be informed (Fitz-Gibbon & Tymms, 2002; Mortimore & Sammons, 1994; Rowe, 2004; Goldstein & Myers, 1996; Goldstein & Spiegelhalter, 1996; Yang et al., 1999; Karsten et al., 2010). Finally, in view of the analysis of the five SPFSs, a discussion started in relation to different conceptions of value added, in particular when it comes to a fair comparison of a school's
performance with a reference group. The discussion about value added illustrates that both the conceptualization and the operationalization of this concept are highly problematic, due to actual constraints (e.g., pupil mobility), ethical constraints (e.g., adjustment for pupil characteristics), technical constraints (e.g., model complexity), and practical constraints (e.g., immeasurability of variables). Next, the feedback content of the SPFSs has been analyzed in order to evaluate the data relevance. The SPFSs in this study focus mainly on a limited number of cognitive outcomes (e.g., in relation to language, mathematics and/or science), which are part of the core curriculum in most countries. Developers of SPFSs might consider how to include other subject areas in the SPFSs, as well as more attitudinal, behavioral and contextual information, because the latter is critical when school staff is expected to make data-driven improvement decisions. They will need a broader range of data (Schildkamp & Kuiper, 2010). When analyzing the feedback content, the analysis of the five SPFSs also centered on the numerical measures and graphical representations being used. We could observe the use of a wide range of numerical measures, comprising adjusted, expected, predicted and raw scores. Examples are band scores, cut-off scores, grade scores, learning gain scores, mean scores, percentages, percentiles, rescaled scores, standardized scores, and value-added scores. These measures and the accompanying graphical representations assume a sufficient level of assessment literacy of SPFS users in view of a correct understanding of the feedback. However, research revealed that even simple numerical conceptions and representations are often interpreted incorrectly (Earl & Fullan, 2003; Zupanc et al., 2009). This raises the question whether feedback suppliers also ought to provide specific support to guarantee that the feedback delivered can lead to the desired school improvement outcomes and cannot result in harmful effects (Fitz-Gibbon & Tymms, 2002; Rowe, 2004).

2.2. RO 2: Developing a framework for SPF use, including influencing factors and effects
To develop a framework for SPF use (Chapter 3), we could build on a basic model developed by Visscher (2002; Visscher & Coe, 2003). His framework discerns four sets of factors influencing the use of the performance feedback: the design process, the features of the underlying SPFSs, the implementation process and the school organizational features. This framework served as a basis for the studies conducted in this dissertation, although some adaptations were made. Visscher and Coe
embed the process of feedback use in the broader school environment, which we define as context-related factors in our framework. Furthermore, we distinguish support-related factors as a separate set instead of positioning them within the implementation process and the characteristics of the feedback system. As a result, the following sets of influential factors are outlined: factors related to the educational context, to school and users, to SPFSs, and to support. The second major adjustment to the Visscher framework was a refinement of the conceptualization of SPF use. In the framework of Visscher, only types of SPF usage are discerned. In our approach, we additionally discern phases in SPFS use (Verhaeghe et al., 2010): (1) the reception of the feedback in a school, (2) the reading and discussing of the feedback information in order to come to (3) an interpretation of the school's results, followed by (4) a diagnosis or the search for explanations and (5) the planning of improvement actions, which are (6) implemented and about which the outcomes are (7) evaluated. Finally, in Chapter 3, two additional types of feedback use were added to the typology described by Visscher (i.e., instrumental, conceptual, symbolic, and strategic use): a motivating/stimulating and a pupil-directed type of feedback use. The study described in Chapter 3 verifies the components of this updated framework by involving a sample of primary school principals actively engaged in the School Feedback Project in Flanders. Semi-structured in-depth interviews and a predefined coding scheme were used as qualitative instruments. This resulted in a validation of all framework components and some additions to the framework. The updated and validated framework influenced the other studies set up in the context of this doctoral thesis. Key elements of this framework influenced the studies presented in Chapters 5 and 6. These chapters discuss a quantitative (path modeling) and a qualitative study (case-ordered predictor-outcome meta-matrix), building on an experimental design. The following key findings result from the studies described in Chapters 3 and 6. Firstly, the context-related factors that influence school performance feedback use refer to the educational climate in which SPFSs are developed and implemented. For example, in Flanders, there is no strong pressure on data use, due to the lack of a national assessment policy, the lack of a central assessment system, the non-coercive role of the educational inspectorate, and the autonomy granted to schools. As a result, no strong data culture is observed in schools. Second, characteristics of the feedback and the related SPFSs also influence feedback use. As described in Chapter 2, the perceptions of feedback users about the relevance, accuracy, cost-effectiveness, fairness and beneficence of the feedback delivered to
schools will mainly determine what efforts will be made to put feedback use into practice. With respect to relevance, we found that some respondents of the School Feedback Project lack information about school subjects other than mathematics and language. Furthermore, they mention that aggregated feedback information is especially interesting for actors involved in meso-level school activities, in contrast to teachers, who prefer pupil-level information that can be linked to micro-level interventions. Concerning the feedback interpretability, our findings illustrate that principals experience difficulties in interpreting feedback information. Remarkably, feedback that ought to have a signalling function was rather perceived as being valid only when it matched prior conceptions and experiences of its users. Thirdly, school- and user-related factors that play an important role in the use of the feedback could be linked to users' data literacy and expectations about feedback use. Furthermore, the priorities in task schemes within a school and the perception of the school's performance level also appear to influence feedback use. Principals state that no clear expectations were defined prior to using the school feedback, that their data literacy skills are limited and that feedback data use is not a priority. Furthermore, when feedback results were perceived as unsatisfying, this either seemed to confirm feedback intervention theory, in that principals were willing to reduce the gap between the observed and intended outcomes (Black & Wiliam, 1998; Hattie & Timperley, 2007), or it resulted in school principals withholding feedback information so as not to discourage their school staff. Fourthly, some support-related factors had to be discussed. Support needs are observed during the different phases of SPFS use: from the interpretation phase to the implementation of improvement actions. Principals suggested two scenarios to involve external support services. These suggestions inspired the design of the INSET and ONSET interventions discussed in Chapter 6. The interview results, obtained in the studies in Chapters 3 and 6, show that, in general, school feedback is not intensively used and has a limited impact on the actual way schools function. Mostly, schools hardly attained the phase of planning future actions on the basis of the school feedback. This consequently resulted in limited instrumental usage. However, conceptual use was reported more often, which was also found in Chapter 5. This suggests that SPF use is starting to enter school-related discussions and to affect teacher thinking. In the framework, attention is paid to the expected impact of school feedback use. Thus far, only basic indicators of intermediate school improvement effects could be observed, such as an increased interest of school staff in feedback results, a decreased reluctance to start feedback team
discussions, a clearer picture of the learning gains of pupils, more confidence in the school's functioning and an increase in the reflection on a school's performance. Next to the expected outcomes of school feedback use, unintended outcomes could also be outlined. For example, feedback did lead to an increase in teaching to the test or to feelings of disappointment of school staff when confronted with unsatisfying pupil performance.

2.3. RO 3 & 4: Exploring data literacy competences & effects of alternative data representation modes on feedback interpretation abilities
The analysis of SPFS characteristics - in Chapter 2 - revealed a typical use of complex numerical measures and graphical representations in the feedback reports. Furthermore, findings from the study described in Chapter 3 illustrated that the interpretation phase is one of the main stumbling phases in the process of feedback use. This is in line with the discussions in the literature about the limited data literacy competences of data users. However, no empirical assessment of data literacy competences related to SPF use has yet been carried out and reported in the literature. Furthermore, to our knowledge, no research findings focusing on the interaction of data literacy competences with the characteristics of SPFSs have been published. This explains the relevance of the study reported in Chapter 4 about research objectives 3 and 4. Additionally, the research findings reported in Chapters 5 (n = 116) and 6 (n = 18) also contribute to studying RO 3, since they report about the data literacy competences of feedback users. An experimental design with a post-test was set up, focusing on two alternative ways to explain value added, in combination with three alternative approaches to represent learning gain and value added. The participants were freshmen in the domain of the educational sciences, enrolled at Ghent University (n = 312). Tests were calibrated (by IRT-based techniques) to assess both the ability levels of the students and the item difficulty levels. Students were asked to assume the role of a school principal who received a school performance feedback report based on the results from a longitudinal study in which his/her school participated (similar to the feedback reports produced by the "School Feedback Project"). The students received an introduction to the central concepts and were given a set of related graphical representations, developed and presented via a PowerPoint presentation. Subsequently, they were requested to complete a knowledge and skill test related to the interpretation of school feedback (test reliability ranging from .72 to .90). Both conceptual
(i.e., understanding central concepts) and procedural knowledge (i.e., deriving information from graphical representations) were tested. The descriptive results in Chapter 4 indicate that users experience major difficulties in successfully solving procedural value-added items (only 35% of respondents were able to do so). We can explain this by referring to cognitive load theory (Chandler & Sweller, 1991; Sweller, van Merriënboer, & Paas, 1998), as high cognitive demands are posed on the users when interpreting value-added scores. Working memory is not able to cope with too much information at the same time. When examining the nature of the errors the participants made when calculating value added, patterns could be observed in the incorrect answers. This enabled us to reconstruct the thinking process of participants and to identify basic misconceptions. A typical misconception when calculating value added was, for example, the confusion of the heights of curves with their slopes, also known as the slope-height confusion (Beichner, 1994; Clement, 1989; Kramarski, 2004; Leinhardt, Zaslavsky, & Stein, 1990). Furthermore, respondents mostly gave correct answers to the conceptual questions related to the information that was literally explained in the school feedback presentation (87% correct answers). In contrast, low test scores were observed when the questions required deep-level conceptual thinking (24% correct answers). In Chapter 5, data literacy competences of school principals participating in the School Feedback Project were examined by means of an IRT-calibrated data literacy test and a self-report based survey (indicators of self-efficacy with respect to data interpretation and the consecutive diagnosis phase). The test used had a reliability of .83 and consisted of items measuring the conceptual and procedural understanding of the feedback reports. The data literacy test results reveal that only 42% of the respondents answered half of the questions correctly, though some school principals succeeded in interpreting all the information from the report. Analysis of the difficulty of the literacy test items points out that most principals experience difficulties in relation to procedural items. The conceptual questions were apparently less difficult. Although test scores were rather disappointing, most of the respondents reported a positive self-efficacy score relating to their ability to interpret and use the feedback report (M = 3.81, SD = 0.74). The unsatisfactory data literacy skills are reconfirmed by the findings from the in-depth interviews in Chapter 6 with school principals participating in the School Feedback Project. Even if elaborate explanations are provided within the feedback reports, users encounter and report interpretation difficulties. Furthermore, communicating the feedback findings to other staff members or looking for explanations appears to be difficult for school principals.
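To make the "observed minus expected" logic behind value added concrete, the following minimal sketch illustrates one common way such a score can be computed. It is a hypothetical illustration with simulated data, not the actual procedure of the School Feedback Project or of any of the SPFSs discussed in Chapter 2: the expected score is derived here from a simple regression on prior attainment, whereas operational systems typically use more elaborate (e.g., multilevel) models.

import numpy as np

# Simulated pupil-level data (hypothetical): prior attainment, end-of-year score, school membership
rng = np.random.default_rng(0)
n_pupils, n_schools = 600, 20
school = rng.integers(0, n_schools, n_pupils)
prior = rng.normal(100, 15, n_pupils)
true_school_effect = rng.normal(0, 3, n_schools)          # unknown in practice
observed = 20 + 0.8 * prior + true_school_effect[school] + rng.normal(0, 8, n_pupils)

# Expected score given intake: a single regression line fitted over all pupils
slope, intercept = np.polyfit(prior, observed, 1)
expected = intercept + slope * prior

# Value added per school: the mean "observed minus expected" residual of its pupils
residual = observed - expected
value_added = np.array([residual[school == s].mean() for s in range(n_schools)])
print(np.round(value_added, 2))

A school with a positive score performs better than expected given its intake; a negative score signals the opposite. The simplification also shows where interpretation problems arise: the score is a model-dependent estimate with sampling error, which is precisely the kind of nuance that the slope-height confusion and other misconceptions tend to obscure.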
Regarding attitudes towards SPF use - another aspect of data literacy - the studies reported in Chapters 5 and 6 show a positive attitude towards SPF use. The scale results reported in Chapter 5 (range 1-6, M = 3.97, SD = 1.08, α = .91) imply that feedback use is considered a relevant activity that fosters self-evaluation. However, school principals report a less positive attitude of their teachers (Chapter 6), as teachers are confronted with considerable demands related to the data collection and may feel threatened by the feedback results. Therefore, teachers seem to prefer pupil-level information instead of aggregated school feedback data. With respect to RO 4, central in the research reported in Chapter 4, we can conclude that our findings confirm the research hypothesis that users experience difficulties in interpreting complex conceptual and graphical information, due to the interplay between the inherent complexity of SPF and their lack of prior knowledge. We compared the effect of two alternative ways to explain value added on the final understanding of the concept. This study proved to be helpful to detect which alternative explanation facilitated a better conceptual and/or procedural understanding. Explaining the concept in terms of "the difference between observed and expected growth" appears to be better than explaining it in terms of "the difference between the school's adjusted growth curve and the reference growth curve". In terms of the alternative graphical representations used in the experiment, it is rather surprising that the tables did not add to the users' understanding of the feedback report. However, this does not imply that the use of tables in combination with growth curves is not advisable. Previous research indicates that different information is derived from tables and graphs (Meyer, Shinar, & Leiser, 1997); both sources of information have merits, depending on the task being performed (Schnotz & Bannert, 2003). An appropriate use of tables and graphs can therefore avoid extraneous cognitive load and foster a better understanding.

2.4. RO 5: Exploring effects of support on SPF use
The research findings in the studies reported in Chapters 2, 3 and 4 demonstrate that one of the main stumbling blocks in SPF use is the interpretation phase, primarily due to a lack of data literacy competences. This finding raises the question of appropriate support initiatives in view of SPF use. Therefore, a field experiment with post-test (n = 195) was set up, building on the insights developed during the previous studies. This resulted in an experimental study, reported in Chapters 5 (IRT testing, survey research, path modeling) and 6 (in-depth interviews, case-ordered predictor-outcome meta-matrix). In both studies, participants were
principals of schools involved in the School Feedback Project. The support initiatives that were studied encompassed an INSET (inservice education and training) and an ONSET (onservice education and training) initiative. The INSET and ONSET approach can be positioned on the continuum reported by Gardner (1995). Both support initiatives built on suggestions of school principals (see Chapter 3), as a solution to the data interpretation difficulties they encountered. This helped to distinguish three research conditions to which respondents were randomly assigned: an INSET (n = 23), an ONSET (n = 7) and a control group (n = 150). In Chapter 5, the results of the INSET and control group have been presented. The INSET intervention included a half-a-day workshop about SPF interpretation and use, using a fictitious feedback report as instructional material, and organized in a university building. In contrast, the ONSET intervention was organized in the school of the principal, where his/her own school feedback information was discussed. In view of the evaluation of the differential support effects, we built on Kirkpatrick's (1998) four levels in the evaluation of training initiatives. First, the Reaction level refers to the extent to which participants are satisfied about the support initiative. Next, the Learning level examines the increase in knowledge and skills and the change in attitudes. This was studied by examining whether the support contributed to an increase in data literacy competences. Third, the Behavior level checks the transfer of what has been learned to the local organization. In our studies, this focus on the Behavior level encompasses the influence of the support intervention on the phases in SPF use and types of SPF use, as defined in our theoretical framework in Chapter 3. The fourth level - the Results level - refers to the effects of the support initiative on achieving the organization's aims and on the organization itself. This was measured by asking for the perceived (school improvement) effects of SPF use. The study about the impact of the INSET approach - as compared to the control group - was reported in Chapter 5. A path model (χ²(df) = 11.3 (13), p = 0.58; RMSEA = 0.01; AGFI = 0.92; GFI = 0.97) was tested to check whether principals in the INSET research condition attained significantly higher data literacy competences (attitudes, knowledge and skills, and self-efficacy), reported a higher extent of feedback use and reported a higher number of perceived effects. Building on our theoretical framework, we expected the INSET initiative to affect the SPF-related competences in a direct way. This hypothesis was only partly confirmed, since the support provision did have a statistically significant effect on the mastery of knowledge and skills and on self-efficacy, but not on the attitudes related to SPF use. The impact on self-efficacy remained limited. This can be explained by the limited scope of the support initiative, the rise in awareness about
the complexity of school feedback, or by the quality of the support intervention. The path model test results also reveal no significant direct impact of support on the phases in SPF use or the types of SPF use, and no significant impact on the perceived effects of school performance feedback use. Only indirect effects of support on these variables are found. These indirect effects are in line with Kirkpatrick's (1998) model, implying that a higher level can only be achieved if lower levels have been attained. Specifically for our study, this means that the phases in SPF use (Level 3), the types of SPF use (Level 3) and the resulting school improvement effects (Level 4) will only be influenced by a particular support intervention when this support had a prior effect on the data literacy competences (Level 2) of its users. The qualitative study, focusing on both the INSET and ONSET training provision (Chapter 6), checked the impact on the following dependent variables: satisfaction with the training (Level 1), data literacy competences (Level 2), SPF use (Level 3), and perceived effects of feedback use (Level 4). ONSET participants report a higher satisfaction level and attain a higher data literacy competence level. Differences in feedback use could be observed in relation to the phases of reading and discussing, interpreting and diagnosing. This differential impact can be linked to the content of the ONSET approach. These contents were less prominent in the INSET condition and were lacking in the control condition. No differences were found in the effects of data use.
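The pattern of indirect-only support effects described above can be illustrated with a minimal sketch of the underlying path logic. The sketch below uses simulated data and a simplified chain (support -> data literacy -> SPF use -> perceived effects); the variable names and effect sizes are hypothetical, and the actual Chapter 5 model contained more constructs and was estimated as a full path model rather than as separate regressions.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for a simplified support -> literacy -> use -> effects chain (hypothetical values)
rng = np.random.default_rng(1)
n = 195
support = rng.integers(0, 2, n)                      # 0 = control, 1 = INSET/ONSET
literacy = 0.4 * support + rng.normal(0, 1, n)       # Level 2: data literacy competences
use = 0.5 * literacy + rng.normal(0, 1, n)           # Level 3: extent of SPF use
effects = 0.6 * use + rng.normal(0, 1, n)            # Level 4: perceived improvement effects
df = pd.DataFrame({"support": support, "literacy": literacy, "use": use, "effects": effects})

# Estimate the separate paths with OLS regressions
a = smf.ols("literacy ~ support", df).fit().params["support"]
b = smf.ols("use ~ literacy + support", df).fit().params["literacy"]
direct = smf.ols("use ~ literacy + support", df).fit().params["support"]

print("indirect effect of support on SPF use (a*b):", round(a * b, 3))
print("direct effect of support on SPF use:", round(direct, 3))

In line with Kirkpatrick's hierarchy, the product a*b captures the effect that runs through data literacy, while the direct path from support to use remains close to zero in such a data-generating process.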
3. General discussion: "Mirror, mirror on the wall"
Data-driven decision making is a buzzword that has recently entered the educational jargon (cf. the fancy abbreviation "D3M"). The related usage of concepts such as learning gain, output measurement, value added, etc. is overwhelming to the (often) statistically less literate school staff. However, teachers and school principals are supposed to master these concepts. This expectation is implicit in the way educational authorities and related educational quality assurance systems (e.g., the inspectorate) position policy papers that underline autonomy, accountability, and continuous school improvement. Central in the discourse about school improvement is the creation of data-rich environments that inform schools about their functioning. It is in this context that SPFSs become important and help to present a mirror for each school. "Mirror, mirror on the wall, are we doing well at all?" is the central question that has to be answered by school staff. The motive and need to look into the mirror is not personal vanity, but
either external pressure (cf. accountability) or an internal motivation (cf. school improvement), or a combination of both driving forces. Instead of getting "wrinkled by age", schools are expected to look better by being able to close the gap between the observed and the desired outcomes (Black & Wiliam, 1998; Hattie & Timperley, 2007; Kluger & DeNisi, 1996). SPFSs, therefore, should pinpoint the strengths and weaknesses in school performance. In order to be effective, school feedback should be helpful to answer three questions (Hattie & Timperley, 2007; Hattie, 2009): Where am I going (Feed up)? How am I going (Feed back)? Where to go next (Feed forward)? The first question refers to the learning intentions and goals, the learning targets and expectations underlying the curricula. The second question asks to what extent the school attains its targets, while the answer to the third question offers directions for future action. The literature about the impact of current SPFSs and an analysis of the nature of the SPFSs indicate that current SPFSs are mainly geared to answer the second question (Feed back). Additionally, simply receiving feedback will, as such, not guarantee that the feedback will be used. Several participants in our studies mentioned that they would like to receive additional information, e.g., concrete improvement indications and concrete directions for improvement actions. The latter implies that limiting the process of school feedback to "holding a mirror" will not easily lead to a sufficient or adequate level of self-reflection and related improvement actions. However, it can be questioned whether SPF suppliers should fulfill the additional need for school feedback support. As they are external agents, they have a less clear view on all input, process, and contextual variables that influence performance within a particular school. It seems more sound to cooperate with actors that are more closely related to the schools, such as educational advisors. Furthermore, a debate should start about the function of SPFSs, to determine whether school feedback should be conceptualized in a broader way and should therefore go beyond a signal function. In the context of school feedback, the question "How am I going?" might pose specific problems. Feedback users expecting that a full picture of their school will be presented "in the mirror" might be disillusioned. Users have to understand that an SPFS reports on certain aspects of the school's functioning, which have been measured at a particular moment, by involving particular (groups of) pupils/students, and building on specific measurement instruments and techniques. Feedback results should therefore be linked to other data sources, and to personal experiences. In case this results in conflicting findings, educators have to search for
explanations rather than denying specific feedback results. This was exemplified in our own studies. In certain cases, the validity of the SPF was questioned or even denied by principals when the feedback did not match the current policy or plans of the school. School feedback will not work if users only "see what their eyes want to see". School feedback use assumes an open mind of its users. A second issue related to the how-am-I-going question is that feedback users might only attain a blurred view in their mirror, due to the complex nature of the feedback and the limited data literacy competences of the user. The provision of additional support in data interpretation would be helpful to offer "glasses" to develop a better understanding of the feedback. It can be suggested that the provision of SPF support is an ethical requirement (cf. feedback should at least do no harm; Fitz-Gibbon & Tymms, 2002) that is to be delivered by the feedback suppliers. A third issue can be raised that centers on the possibility that the mirror presents a distorted view of the school reality. The question has to be asked whether SPFSs offer neutral or objective information. Every approach to develop feedback builds on assumptions about what is relevant and when data are accurate. As discussed in Chapter 2, these assumptions seem to differ considerably from system to system. Therefore, a clear insight should be available into the underlying rationales to select certain feedback characteristics. At the very least, users should be informed about the strengths and limitations of the SPFS and the feedback received. A final discussion concerns the extent to which schools, principals and teachers fully exploit the level of autonomy granted within the Flemish educational system. This introduces the impact of personal characteristics of feedback users (Kluger & DeNisi, 1996). Flemish teachers and principals are relatively free in designing their pedagogical project, choosing learning methods, designing curricula and monitoring their quality. In the Flemish context, we can question whether schools adopt a clear and powerful level of "autonomy" and translate this into a school quality assurance policy. In our studies, we hardly observed related indicators. Some schools, for example, only added the feedback results to the output section of the self-evaluation report they prepared in view of a visit of the inspectorate. In specific cases, the feedback was neither read nor screened. In this process, a key role is played by the school principal. In most cases, the feedback report entered the school via the desk of the school principal. He or she determined whether the information was neglected or distributed to the school community members and became the starting point for a quality-related school team discussion. In the latter case, school principals demonstrated a distributed leadership role, and school quality care became the responsibility of all actors involved in the school system. This mirrors a
broad view on professional development, professional identity, and an inquiry habit of mind (Earl & Katz, 2006). It is to be stressed that the latter is one of the core competences of Flemish teachers (Flemish Government, 2007).
4. Limitations of the studies and directions for future research
In the following paragraphs, we discuss the main critiques and/or shortcomings that can be raised in relation to our studies. At the same time, this list helps to define directions for future research.

4.1. Study samples
The selection of research participants can be critiqued in a number of ways. As the studies in this dissertation were part of a broader R&D project that aimed at designing, developing and implementing an SPFS to be used in the Flemish context, the recruitment of research participants was set up in a particular way. In three studies, the samples consisted of primary school principals, drawn from the larger pool of primary schools participating in the SiBO project/School Feedback Project (Chapters 3, 5, & 6). This sample is relatively small (n = 195) when compared to the 2,321 schools organizing primary education in Flanders (Vlaamse overheid, Beleidsdomein Onderwijs en Vorming, 2010). This resulted in rather small-scale studies. Also, the involvement of the principals in a pupil monitoring project might have introduced a sampling bias, since these principals expressed a clear interest in examining pupil performances. Furthermore, this small group was regularly asked to fill out research instruments (surveys and tests). As a result, the response rate declined, though it remained satisfactory for our studies. Furthermore, the research samples were only put in a user context linked to one particular SPFS in the Flemish educational context. This introduces the need to expand our research by involving a larger and more varied sample of principals, chosen from varying educational contexts and working with other SPFSs. This could help to validate the current research findings. For example, within the UK, about 4,500 primary schools (and their principals) participate in PIPS-related research. This number of participants creates opportunities to carry out more advanced types of statistical analyses (e.g., multilevel analysis). Also, better-quality tests could be developed, since a sound IRT calibration approach requires a minimum of 500 respondents for each test item.
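For readers less familiar with what such an IRT calibration involves, the following is a minimal sketch of a Rasch (one-parameter) model estimated by joint maximum likelihood on simulated responses. It is purely illustrative: the dissertation's tests were calibrated with dedicated IRT procedures, and the simple gradient-ascent routine below ignores refinements such as handling respondents with perfect or zero scores.

import numpy as np

def rasch_calibrate(X, n_iter=2000, lr=0.01):
    """Joint maximum-likelihood estimation of a Rasch model.
    X: binary response matrix (persons x items) without missing data."""
    n_persons, n_items = X.shape
    theta = np.zeros(n_persons)   # person abilities
    b = np.zeros(n_items)         # item difficulties
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # P(correct)
        resid = X - p                                             # observed minus expected
        theta += lr * resid.sum(axis=1)   # gradient ascent on the log-likelihood
        b -= lr * resid.sum(axis=0)
        b -= b.mean()                     # centre difficulties to fix the scale
    return theta, b

# Simulated responses of 500 persons to 20 items (hypothetical data)
rng = np.random.default_rng(2)
true_theta = rng.normal(0, 1, 500)
true_b = np.linspace(-2, 2, 20)
prob = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_b[None, :])))
X = (rng.random(prob.shape) < prob).astype(float)

theta_hat, b_hat = rasch_calibrate(X)
print(np.round(b_hat, 2))   # estimated item difficulties

The point of the sketch is the sample-size remark above: stable difficulty estimates require many responses per item, which is why large participant pools such as the PIPS schools make better calibrated tests feasible.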
The nature and quality of the research samples are also an issue in the studies reported in Chapters 2 and 4. In Chapter 2, only five SPFSs have been selected. This is not a representative selection. These five systems were selected because they reflect the wide variety in SPFSs on the one hand, but the selection was also driven by a pragmatic issue on the other hand: the extent to which a spokesperson was available to be involved in the qualitative study. A more comprehensive inventory of SPFSs used worldwide will offer perspectives to further develop the analytical framework focusing on characteristics of SPFSs. On this basis, an additional research line could start that involves school feedback designers. In the study described in Chapter 4, the discussion about the quality of the sample takes a different direction. The decision to involve students affects the external validity of the research findings. Although this experiment did lead to interesting findings, the results of this study need to be validated with a sample of principals or teachers. We cannot assume that the data literacy competences of university freshmen are comparable to those of inservice teachers or principals. Due to practical constraints, it is very difficult to set up experimental studies involving school staff (e.g., administering data literacy tests). Alternative research designs should be considered, such as quasi-experimental designs with non-randomized groups of participants. The studies reported on in Chapters 3, 5 and 6 solely build on the experiences of primary school principals. The involvement of other school team members (teachers, care coordinators) can be considered. Since we can expect that the availability of school data will increase over time, it might be realistic that specific school staff members develop the related data literacy competences. This type of task specialization seems to increase in the Flemish educational system. Another approach could build on an international project, involving teachers from countries where data use is already better integrated in the school culture (e.g., the UK, The Netherlands, New Zealand, etc.). As stated earlier, a selection bias may have played a role, since the research participants were volunteers (Rossi, Lipsey, & Freeman, 2004). This is critical in view of the internal validity of the study. Future research should examine whether this subgroup is different from the population of school principals by checking relevant school population characteristics. Nevertheless, efforts have been undertaken to control for this type of bias in the studies reported in Chapters 3, 4, 5 and 6.
4.2. Research design and data analysis
A major advantage of applied research is bridging the gap between scientific research and practice (Broekkamp, Vanderlinde, van Hout-Wolters, & van Braak, 2009). However, applied research set up in a typical school context introduces several limitations. In the case of this dissertation research, this affected the number of research participants involved in the studies. They were all linked to the same School Feedback Project. Furthermore, it can affect the external validity of the research findings. Finally, to prevent the risk of putting too much pressure on the principals in the project, the number of research instruments, the duration of the interventions, the administration of pretests and intermediate tests, etc., had to be limited. In future research, it is preferable to set up more continuous support provisions, to develop a baseline as to the dependent measures (pretests), and to set up follow-up tests. Furthermore, a longitudinal perspective could be implemented to study the growth in data literacy competences and the changes in school feedback use during different consecutive feedback cycles. Lastly, the delayed effects of SPF usage on student achievement could be studied. Such effects can only be expected after several SPF cycles and a persistent effort in taking up effective SPF use. A next limitation relates to the measurement instruments used in the different studies. Most built on self-reporting (e.g., the surveys and interviews described in Chapters 3, 5 and 6) of the principals' perceptions about SPF. These perceptions can only be considered as proxies for their attitudes towards SPF use, their actual feedback use in the school and the concrete school improvement effects caused by SPF use. This limitation introduces the need for research that links the 'perceived' to the 'expected' and the 'actual' use of SPF. To measure school improvement effects, measurement techniques such as school observations, video analyses of staff meetings, and analyses by researchers of inspection visit reports, class tests and school documents are more optimal choices. Data resulting from the use of these instruments help to develop a broader view of a school's functioning (e.g., Schildkamp & Kuiper, 2010). In the experimental studies, in addition to the skills and knowledge tests developed in view of the studies reported in Chapters 4 and 5, other tests could be used. For instance, it could be interesting to present the feedback reports to school staff and to invite them to make a concrete interpretation of the numerical measures and graphical representations of the feedback results. These concrete actions could be videotaped and subsequently analyzed. This could help to get more adequate information about the way principals or
teachers interpret the data representations. In the literature, this measurement approach has not yet been applied to research on data-driven decision making, though some preliminary results of observation studies are reported in Santelices and Taut (2009), Van Petegem and Vanhoof (2004), and Verhaeghe, Verhaeghe, Valcke, and Vanhoof (2008). Furthermore, future research about SPF interpretation should also focus on individual differences and preferences in data interpretation, since little is known about the impact of these differences on feedback interpretation. We can also criticize the number of variables incorporated in the path model in Chapter 5 and the meta-matrix in Chapter 6. Our choice was guided by the feasibility of the study and our prior research interest in specific variables linked to data interpretation competences. This implies that our research model presents a reduction of reality. It was therefore not a surprise that not all variance in the endogenous variables could be explained by the model used in Chapter 5 (34% unexplained variance in total; only 11% of the variance in knowledge and skills related to the support intervention). Also, remarks can be made about our scale development and the lack of cross-validation of these instruments (Hoyle, 1995). However, as most studies were exploratory in nature, our findings must be considered as preliminary results to be studied in depth in further research. The studies outlined in Chapters 2 and 3 were of a descriptive-analytic nature. Though they do not result in spectacular findings, the findings are valuable since they helped to develop the conceptual framework for the following studies. However, an overall framework on SPF use, comprising all relevant influencing factors, is still lacking, as well as an empirical validation of all existing frameworks (Visscher, 2002; Visscher & Coe, 2003). The literature about data usage and SPF use is growing; future meta-evaluative research about influencing factors is advisable. However, in this case a full validation of conceptual frameworks will remain difficult, since "not everything that can be counted counts, and not everything that counts can be counted" (Cameron, 1963, p. 13). The studies in Chapters 5 and 6 built on a controlled field experiment. This is very new in the SPFS literature. But it is clear that difficulties have been encountered. First, we ran into ethical objections, since not all respondents were provided with the advantages of the onservice training (ONSET). Next, the experimental conditions are bound to criteria for controllability; this is not the case in reality (Rossi, Lipsey, & Freeman, 2004). For example, the support intervention in the INSET condition was organized in such a way that questions of principals concerning their personal school report could not be answered, in order to avoid interaction with the ONSET condition. Only in the ONSET condition there
was room to discuss the school's own feedback results. In normal circumstances, we expect that principals would get input from the support providers about particular school-related questions. Finally, we still have to question the extent to which we could control for the impact of confounding, interacting variables in the field experiments. For example, participants in the control condition were not prevented from searching for support in data use. We therefore promote the design and implementation of more controlled field experiments and quasi-experimental studies to examine the factors affecting SPF use, especially in contexts where feedback use is an integrated part of a school's self-evaluation process.

4.3. Results
Issues can be raised concerning the validity, the limited explained variance, the exploratory nature and the exemplary nature of our research findings. We already discussed these above. However, we want to stress that the aim of the studies reported in this dissertation was not yet to come to generalizable findings. Rather, we wanted to explore and illustrate school feedback characteristics (Chapter 2), feedback use (Chapter 3), difficulties in feedback interpretation (Chapter 4), and effects of feedback support (Chapters 5 and 6). Furthermore, we have to stress that the conceptual frameworks presented in Chapters 2 and 3 are not to be considered comprehensive. Not all potentially relevant influencing factors have been incorporated in Chapter 3, nor all school feedback characteristics in Chapter 2. Other limitations can be linked to the grounding of the studies in the research literature. It has to be stressed that a broad domain of the literature had to be explored in order to develop the conceptual frameworks. This body of literature encompasses literature about school effectiveness and school improvement, about data-driven decision making, about SPFSs, about data representations, about cognitive load, about inservice teacher training, about the evaluation of training initiatives, about feedback theory, etc. Though a clear attempt was made to build on the most recent state of the literature, we are aware that there might be shortcomings. However, the peer review experienced in the context of conference and article submissions was a helpful step to guarantee a basic quality of the work presented. In future studies, the literature base should be expanded. In all studies, and in particular in Chapter 4, we continuously highlighted the interpretation problems in relation to school feedback information. This might suggest that SPF is simply too complex in view of presenting relevant
information. This does not hold for all aspects of SPF. Much depends on the way information is gathered, reported and distributed. The Internet Testing Unit (INTU) from the Centre for Evaluation and Monitoring, for example, developed an "event mapper". This is a self-assessment tool to monitor a school's environment by asking students to respond to questions by clicking on an online map of the school. This can be very informative in detecting risk areas in a school, for example to detect and prevent bullying. The example shows that SPF can build on inspiring and innovative ways to gather data, process the information and distribute feedback reports. Creativity in developing such innovative directions is central to future research.
The studies in Chapters 4, 5 and 6 mainly focused on problems that feedback users experienced during the interpretation phase. A similar analysis of obstacles during the phases of diagnosis, planning, implementation and evaluation should be performed. This will again result in a better understanding of the support needs of SPF users. Such an analysis of support needs is a prerequisite for designing adequate support initiatives and will require cooperation with the relevant actors (i.e., school staff, feedback suppliers, inspection members, educational advisors).
Furthermore, we want to stress that the use of SPF is something that needs time to grow within the Flemish educational context. Some disappointing results in our studies indicate that feedback use remains mainly conceptual and that only preliminary school improvement effects can be observed. The research findings about conceptual feedback use are nevertheless promising, as this type of use might precede more intensive types of feedback use (Schildkamp & Teddlie, 2008; Vanhoof, Verhaeghe, Verhaeghe, Valcke, & Van Petegem, in press). SPF is therefore to be considered a large-scale educational innovation that takes time to get embedded in all facets of the educational arena and in the thinking processes and strategies of the actors involved.
5. Implications of the results
Drawing on the findings from the five studies outlined above, some theoretical, methodological, practical and policy implications are suggested. Some of these overlap with the directions for future research and are therefore described rather concisely in the next paragraphs.
5.1. Theoretical implications
A first, conceptual, implication is that a more refined description of SPF use (see Chapter 3) has been developed and added to the research literature. In addition to the detection of types of feedback use, phases in feedback use are now also considered. Furthermore, the existing types of feedback use (instrumental, conceptual, symbolic and strategic) were extended with a pupil-directed and a motivating/stimulating type of feedback use. Finally, more attention is now paid to the intermediate impact of SPF instead of a narrow focus on the improvement of student performance as the single school improvement effect.
Chapter 2 resulted in a more detailed framework for the analysis and comparison of characteristics of SPFSs. After expanding and adjusting a preliminary SPF framework, we could develop a set of standards for SPFS developers and for SPF usage. In the future, this can result in the development of efficient instruments for data-driven decision making. Furthermore, it might inspire educational researchers to set up quasi-experimental designs to study the way principals develop after receiving different types of school feedback.
Finally, the experimental approach reported in Chapter 4 presents an innovative theoretical direction, since it links the interpretation of SPF to research about graphical data representations. In the context of data-driven school improvement, this theoretical research field remains largely unexplored. We expect that this study will trigger further assumptions and empirical research about the way users approach numerical outcomes and graphical representations as used in SPF reports.
5.2. Methodological implications
From a methodological point of view, some characteristics of our studies inspire future research directions. The qualitative studies about SPF use (Chapters 3 and 6) illustrate a controlled selection of participants (e.g., theoretical sampling) and a systematic approach to the analysis of the results (e.g., a conceptually ordered predictor-outcome meta-matrix). The way these qualitative studies have been set up can inspire future qualitative research designs and the way they tackle issues related to developing a systematic approach and a clear analysis direction.
Our test calibration approach, building on IRT, has also proven to be adequate. The approach has several advantages as compared to the application of classical test theory: (1) more exact and reliable measures are obtained; and (2) more information can be gathered about the quality of the individual test items and the ability levels of the respondents. This information helps to better track the identification of interpretation difficulties, and these results in turn helped to rethink how to present data to school staff and how to develop support provisions. Furthermore, IRT allows several tests to be linked along a common ability scale, creating opportunities to measure growth in ability over time.
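To give readers who are less familiar with IRT an idea of what such a calibration involves, the one-parameter logistic (Rasch) model can serve as a minimal illustration; it is offered here only as a sketch of the general approach and does not restate the exact model specification used in our studies. The model expresses the probability that a respondent answers an item correctly in terms of the respondent's ability and the item's difficulty, both placed on the same logit scale:

P(X_{pi} = 1 \mid \theta_p, \beta_i) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}

Because abilities and item difficulties share a single scale, test versions administered at different moments can be linked through common anchor items, which is what makes the measurement of growth in ability over time possible.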
Another methodological implication is the promotion of practical research. The studies in Chapters 5 and 6 illustrate that it is possible to evaluate the impact of workshops on different levels, looking beyond the reaction level (Mathison, 1992; Rossi, Lipsey, & Freeman, 2004).
5.3. Practical implications
Since we mostly focused on applied research in this dissertation, our list of practical implications is the longest. We can especially provide an enumeration of ideas concerning the design of SPFSs and the related implementation process. Many of these recommendations are deduced from the discussion about SPFS characteristics that was elaborated in Chapter 2.
First, school feedback system designers should try to minimize the test administration efforts for school staff and pupils. Developers should try to build on data from existing management information systems and available test item banks. Another efficiency measure builds on the adoption of computer adaptive testing.
Second, with respect to the content of the feedback reports, more attention should be paid to non-cognitive performance indicators and to school subjects other than the predominant domains of mathematics, language and science. Attention should also be paid to the development of attitudes towards school and school subjects, and to socio-emotional variables (such as well-being).
Third, the data analysis approaches used to produce school feedback should be upgraded to adopt multilevel modeling and statistical adjustment for student background characteristics. However, raw (or observed) scores should always be reported, because they refer to the actual achievement level of a school. Users should be informed about the shortcomings and strengths of the analysis methods used. Furthermore, SPF designers should always try to find a balance between statistically correct and user-friendly feedback.
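To make the third recommendation more concrete, the following is a deliberately simplified sketch of such an adjustment; the notation is generic and does not reproduce the model of any particular SPFS. In a two-level random-intercept model, the outcome of pupil i in school j is regressed on prior attainment and background characteristics (here summarized as a single SES indicator):

y_{ij} = \beta_0 + \beta_1\,\mathrm{prior}_{ij} + \beta_2\,\mathrm{SES}_{ij} + u_j + e_{ij}

The school-level residual u_j then serves as a value-added estimate: the difference between the school's observed results and the results expected given its pupil intake. Reporting such adjusted estimates alongside the raw school means allows users to see both the actual achievement level and the fairer, adjusted comparison.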
Fourth, the presentation format of the school feedback should be well considered. We advise pursuing more consistency in the way data are presented numerically and graphically. Furthermore, feedback designers should consider graphical representations that support the processing of the represented information (Kluger & DeNisi, 1996; Schnotz & Bannert, 2003). Feedback reports should be designed according to the cognitive tasks that are necessary to understand the information (e.g., line graphs to illustrate growth). In addition, the interpretability of the feedback information should be evaluated in pilot studies. The latter is important to guarantee that the feedback information and presentation format fit the prior knowledge of the SPF users.
If the data literacy competences of school staff are insufficient to result in a correct interpretation of the SPF, the provision of proper support is to be taken up by the SPFS designers. However, since the support needs might exceed the interpretation phase and users also encounter difficulties during further steps of data use, other actors have to take up a support role. Long-term cooperation with educational advisors and the educational inspectorate could be helpful to create tailored ONSET trajectories. This will require, during an initial phase, that these stakeholders are thoroughly introduced to the characteristics and possibilities of the SPFSs. The promotion of data literacy competences could also become part of teacher education programs. If teachers are expected to adopt a role in the quality assurance cycle of their school, they should be introduced to the prevalent numerical measures and graphical representations that are relevant for SPF interpretation. This is expected to prevent pitfalls in data use.
At a practical level, recommendations for future SPF users can also be derived from our studies. First, users should not use data from an SPFS without being informed about its characteristics and possibilities. Furthermore, users should expect and require repeated measurements in order to attain reliable results about the student performance being studied. It is recommended that data from at least three consecutive school years are used to develop school improvement actions (van de Grift, 2009). In addition, data triangulation should be promoted, integrating the SPF results with other data sources, in order to end up with grounded decisions. Since we promote the "alert" use of SPF rather than a remedial usage, SPF helps to develop an understanding of a school's functioning and goes beyond offering clear-cut solutions and remediation approaches. Finally, school principals should be encouraged to involve their school team in discussing SPF. In such a way, they foster the development of a data-driven school improvement approach and a distributed leadership position in developing school policies (Huffman & Kalnin, 2003; Lachat & Smith, 2005; Wayman, Midgley, & Stringfield, 2007).
5.4. Policy implications
Finally, policy implications also follow from our research findings. First, applied research should be promoted to drive theory development, the design of SPFSs and the implementation of a data-driven school improvement approach (Broekkamp et al., 2009). The study in Chapter 2 illustrates that several SPFSs emerged from projects initially sponsored by educational governments. More resources should be made available to schools to obtain support in the use of SPF, to adopt the school-wide use of commercially available SPFSs and to create possibilities for spending time on data use. Furthermore, educational policy makers should be aware that the creation of information-rich environments is no guarantee for effective feedback use; this requires the establishment of support initiatives. In addition, the educational inspectorate needs to be informed about the potential of SPFSs and should stimulate schools to effectively use the data in their decision making, instead of merely adding them to the school quality report. Finally, to stimulate school improvement and regular self-evaluation at the school level, more initiatives to participate in low-stakes testing should be promoted.
6. Final conclusion
To ensure that SPFSs will be used as intended (i.e., for school improvement purposes), several conditions related to the users, the nature of SPF, the available user support and the educational context have to be fulfilled. Much can be gained when SPFSs provide schools with accurate, relevant and user-friendly data. Decisions made by SPFS developers about the design of the SPFS affect school processes and learner results in ways that are not yet fully understood. More research is needed to expand and adjust the framework developed thus far. Furthermore, attempts to develop the data literacy competences of school staff are critical in view of the current trends in macro-level school policies and the way schools have to develop their autonomy. The first research findings in relation to SPF use in Flemish schools are nevertheless promising. Although strong effects of SPF use are lacking thus far, there are indications that data use is developing into an accepted and standard feature of an internal school quality policy. However, to bring current data use to a higher level, future research should center on the evaluation of the impact of School Performance Feedback
and on support provisions in view of the development of data literacy and feedback use.
References
Beichner, R.J. (1994). Testing student interpretation of kinematics graphs. American Journal of Physics, 62(8), 750-762.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-75.
Broekkamp, H., Vanderlinde, R., van Hout-Wolters, B., & van Braak, J. (2009). De relatie tussen onderwijsonderzoek en onderwijspraktijk verkend in Nederland en Vlaanderen [The relation between educational research and educational practice explored in The Netherlands and in Flanders]. Pedagogische Studiën, 86(4), 313-320.
Cameron, W.B. (1963). Informal Sociology: A casual introduction to sociological thinking. New York: Random House.
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8(4), 293-332.
Clement, J. (1989). The concept of variation and misconceptions in Cartesian graphing. Focus on Learning Problems in Mathematics, 11, 77-87.
Earl, L., & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394.
Earl, L.M., & Katz, S. (2006). Leading schools in a data-rich world: Harnessing data for school improvement. Thousand Oaks, CA: Sage.
Fitz-Gibbon, C.T., & Tymms, P. (2002). Technical and ethical issues in indicator systems: Doing things right and doing wrong things. Education Policy Analysis Archives, 10(6), 1-28. Retrieved from http://epaa.asu.edu/ojs/article/viewFile/285/411
Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and effectiveness. London: Cassell.
Flemish Government. (2007, February 6). 15 december 2006. - Decreet betreffende de lerarenopleidingen in Vlaanderen [15 December 2006. Decree on teacher education in Flanders]. Belgian Official Gazette, pp. 5888-5897.
Gardner, R. (1995). Onservice Teacher Education. In L.W. Anderson (Ed.), International Encyclopedia of Teaching and Teacher Education (pp. 628-632). London: Pergamon Press.
Goldstein, H., & Myers, K. (1996). Freedom of information: Towards a code of ethics for performance indicators. Research Intelligence, 57, 12-16.
Goldstein, H., & Spiegelhalter, D.J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society: Series A: Statistics in Society, 159(3), 385-443.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81-112.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York: Routledge.
Heck, R. (2006). Assessing school achievement progress: Comparing alternative approaches. Educational Administration Quarterly, 42(5), 667-699.
Hoyle, R.H. (Ed.). (1995). Structural equation modeling: Concepts, issues and applications. Thousand Oaks, CA: Sage.
Huffman, D., & Kalnin, J. (2003). Collaborative inquiry to make data-based decisions in schools. Teaching and Teacher Education, 19, 569-580.
Karsten, S., Visscher, A.J., Dijkstra, A.B., & Veenstra, R. (2010). Towards standards for the publication of performance indicators in the public sector: The case of schools. Public Administration, 88(1), 90-112.
Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler.
Kluger, A.N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254-284.
Kramarski, B. (2004). Making sense of graphs: Does metacognitive instruction make a difference on students' mathematical conceptions and alternative conceptions? Learning and Instruction, 14(6), 593-619.
Lachat, M.A., & Smith, S. (2005). Practices that support data use in urban high schools. Journal of Education for Students Placed at Risk, 10(3), 333-349.
Leinhardt, G., Zaslavsky, O., & Stein, M.K. (1990). Functions, graphs, and graphing: Tasks, learning, and teaching. Review of Educational Research, 60(1), 1-64.
Meyer, J., Shinar, D., & Leiser, D. (1997). Multiple factors that determine performance with tables and graphs. Human Factors, 39(2), 268-286.
Mortimore, P., & Sammons, P. (1994). School effectiveness and value added measures. Assessment in Education: Principles, Policy and Practice, 1(3), 315.
Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic approach. Thousand Oaks: Sage.
Rowe, K., & Lievesley, D. (2002). Constructing and using educational performance indicators. Paper presented at the 2002 Asia-Pacific Educational Research Association, Melbourne, Australia.
Rowe, K. (2004). Analysing and reporting performance indicator data: 'Caress' the data and user beware! Paper presented at the 2004 Public Sector Performance and Reporting Conference, Sydney, Australia.
Santelices, V., & Taut, S. (2009, September). Comprehension and use of value-added school performance indicators reported to teachers and parents. Paper presented at the European Conference on Educational Research, Vienna.
Schildkamp, K., & Teddlie, C. (2008). School performance feedback systems in the USA and in the Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282.
Schildkamp, K., & Visscher, A. (2009). Factors influencing the utilisation of a school self-evaluation instrument. Studies in Educational Evaluation, 35(4), 150-159.
Schnotz, W., & Bannert, M. (2003). Construction and inference in learning from multiple representation. Learning and Instruction, 13(2), 141-156.
Sweller, J., van Merriënboer, J.J.G., & Paas, F.G.W.C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251-296.
Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatieindicatoren als strategisch instrument voor schoolontwikkeling [Feedback on school performance indicators as strategic instrument for school improvement]. Pedagogische Studiën, 81, 338-353.
Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in press). The influence of competences and support on school performance feedback use. Educational Studies.
Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188.
Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Vanhoof, J. (2008, March). Understanding school performance feedback: A contribution to the development of effective school performance feedback. Paper presented at the annual meeting of the American Educational Research Association, New York.
Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A.J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 41-71). Lisse, The Netherlands: Swets & Zeitlinger.
Visscher, A.J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349.
Vlaamse overheid, Beleidsdomein Onderwijs en Vorming (2010). Vlaams onderwijs in cijfers, 2009-2010 [The Flemish education in numbers, 2009-2010]. Brussels: Scheys.
Wayman, J.C., Midgley, S., & Stringfield, S. (2007). Leadership for data-based decision making: Collaborative educator teams. In A.B. Danzig, K.M. Borman, B.A. Jones & W.F. Wright (Eds.), Learner-centered leadership: Research, policy and practice (pp. 189-205). New Jersey, USA: Lawrence Erlbaum Associates.
Yang, M., Goldstein, H., Rath, T., & Hill, N. (1999). The use of assessment data for school improvement purposes. Oxford Review of Education, 25(4), 469-483.
Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for effectiveness and improvement in classrooms and schools in upper secondary education in Slovenia: Assessment of/for Learning Analytic Tool. School Effectiveness and School Improvement, 20(1), 89-122.
NEDERLANDSTALIGE SAMENVATTING [SUMMARY IN DUTCH]
NEDERLANDSTALIGE SAMENVATTING [SUMMARY IN DUTCH] 1. Inleiding Van scholen wordt in groeiende mate verwacht dat ze van schoolontwikkeling een systematisch proces maken en zich opstellen als lerende organisatie (Nevo, 2002; Leithwood & Aiken, 1995). Om hen daarin te ondersteunen worden informatierijke omgevingen gecreëerd. Zo worden scholen ondermeer voorzien van feedback over hun functioneren en hun prestaties aan de hand van speciaal daartoe opgezette schoolfeedbacksystemen (SFSen). SFSen zijn externe systemen, bedoeld om “performance” gerelateerde informatie te leveren aan scholen, op een confidentiële manier. Dit gebeurt vanuit de verwachting dat scholen deze feedback zullen aanwenden voor een zelfevaluatie en de interne schoolontwikkeling (Visscher & Coe, 2002, p xi). Een belangrijk uitgangspunt is dat schoolfeedback een meerwaarde zou vormen ten opzichte van de bestaande informatiebronnen in scholen en de eigen intuïties en ervaringen van schoolteamleden (Earl & Fullan, 2003). Het gebruik van informatiebronnen als een omvattend beleidsinstrument blijkt echter niet vanzelfsprekend te zijn. Doorgaans blijven het gebruik en de schoolverbeteringseffecten beperkt (Coe, 2002; Saunders & Rudd, 1999; Tymms, 1995; Schildkamp, Visscher, & Luyten, 2009; Van Petegem & Vanhoof, 2004; Zupanc, Urank, & Bren, 2009). Het krijgen van specifieke schoolfeedback blijkt een noodzakelijke maar geen voldoende stap voor het bevorderen van een systematische reflectie op schoolniveau. Zowel binnen de scholen als de aan de kenmerken van feedbacksystemen moet immers aan bepaalde voorwaarden voldaan zijn (Visscher & Coe, 2003; Verhaeghe, Vanhoof, Valcke, & Van Petegem, 2010). Één van de belangrijkste hinderpalen die een effectief gegevensgebruik in de weg staat, is het ontbreken van datageletterdheid bij de gebruikers (Earl & Katz, 2006). Het is dan ook niet verwonderlijk dat uit heel wat onderzoeksbevindingen blijkt dat schoolleiders en leerkrachten een behoefte hebben aan bijkomende ondersteuning; zowel bij het interpreteren als het verder gebruiken van de data (Schildkamp & Teddlie, 2008; Schildkamp, Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Visscher & Coe, 2003; Zupanc, Urank, & Bren, 2009).
2. Conceptueel kader Als theoretische basis voor de onderzoeken in het proefschrift werd vooral beroep gedaan op de wetenschappelijke literatuur over schooleffectiviteit, datagebruik (cf. data-driven decision making), datarepresentatie en nascholingsinitiatieven. Twee centrale begrippen in de schooleffectiviteitsliteratuur zijn enerzijds schoolverantwoording en anderzijds schoolontwikkeling. Dit eerste begrip is vooral van toepassing op onderwijscontexten waarin centrale toetsing en externe controle centraal staan. Het tweede begrip verwijst naar een meer recente aanpak waarin gegevensgebruik binnen scholen voor zelfevaluatie en interne kwaliteitszorg centraal staan. Hoewel beide motieven voor datagebruik op het eerste gezicht tegenstrijdig lijken, komen beide benaderingen in de praktijk dikwijls samen voor (Earl & Fullan, 2003; Hofman, Dijkstra, & Hofman, 2009; Maier, 2010; Vanhoof & Van Petegem, 2007; Zupanc, Urank, & Bren, 2009). SFSen sluiten zeer expliciet aan op schoolontwikkeling omdat ze verondersteld worden de zelfreflectie te bevorderen. Maar om tot effectieve resultaten te kunnen leiden, dienen SFSen aan een aantal kwaliteitscriteria te voldoen. Daarvoor wordt verwezen naar literatuur over prestatie-indicatoren, die nuttig blijken wanneer ze relevante, accurate, kosteneffectieve en faire informatie aanreiken (Fitz-Gibbon, 1996; Heck, 2006; Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). Naast de inherente kwaliteit van de indicatoren blijken ook variabelen in de schoolsetting bepalend te zijn voor nuttig en succesvol gebruik. Zo wordt in de literatuur benadrukt dat instellingen of organisaties die feedbackinformatie leveren, alles moeten bewerkstelligen om bij te dragen aan positieve effecten bij de gebruikers (Goldstein & Myers, 1996; Fitz-Gibbon, 1996, Fitz-Gibbon & Tymms, 2002). Om schoolfeedbackgebruik te omschrijven, vertrekken we van het conceptueel raamwerk van Visscher (2002; Visscher & Coe, 2003). Verschillen in schoolfeedbackgebruik en de effecten ervan worden toegeschreven aan vier cluster van factoren m.b.t. de (1) kenmerken van de gebruikers, (2) de feedback en het onderliggende SFS, (3) de geboden ondersteuning en (4) de educatieve context (Verhaeghe et al., 2010; Visscher & Coe, 2003). Deze factoren hebben een invloed op het schoolfeedbackgebruik, dat we omschrijven in termen van fasen in schoolfeedbackgebruik en soorten van feedbackgebruik. Onderzoek leert dat om gebruik te maken van schoolfeedback het aangewezen is op een doordachte manier een cyclisch proces te doorlopen. In die cyclus wordt het (a) ontvangen, (b) lezen en bediscussiëren van de schoolfeedback onderscheiden, om (c) tot een correcte interpretatie te komen. Nadat de school een sterkte-zwakteanalyse van haar resultaten heeft gemaakt, volgt 193
een fase waarin met de schoolfeedback aan de slag wordt gegaan. Deze omvat het (d) diagnosticeren door het zoeken naar verklaringen voor de resultaten en het (e) plannen, (f) uitvoeren en (g) evalueren van acties. Door een gebrek aan datageletterdheid en tijd blijken scholen deze stappen niet allemaal, of slechts moeizaam te doorlopen (Earl & Fullan, 2003; Verhaeghe et al., 2010). Naast deze cyclische aanpak, wordt bij het gebruik van feedback informatie verwezen naar types van gebruik. Hiervoor helpt de indeling van Rossi, Lipsey en Freeman (2004), die een onderscheid maken soorten gebruik van evaluatiegegevens; een indeling die we kunnen toepassen in de context van schoolfeedbackgebruik (Schildkamp, Visscher, & Luyten, 2009; Verhaeghe et al., 2010; Weiss, 1998). Scholen kunnen acties ondernemen (instrumenteel gebruik), aan het denken gaan (conceptueel gebruik), bevestiging zoeken van bestaande standpunten (symbolisch gebruik), het rapport in een verantwoordingcontext hanteren (strategisch gebruik) of het rapport gebruiken om teamleden te stimuleren of motiveren (motiverend gebruik). Zoals reeds werd aangegeven in de inleiding, is het ultieme doel van schoolfeedbackgebruik bij te dragen aan schoolontwikkeling, ondermeer in termen van verbeterde leerresultaten van de lerenden (Visscher & Coe, 2002; 2003). Maar schoolfeedbackgebruik blijkt niet altijd reeds te resulteren in significant verbeterde leerlingprestaties (Fitz-Gibbon & Tymms, 2002; Schildkamp, Visscher, & Luyten, 2009; Visscher, 2002). Bij het nagaan van schoolverbeteringseffecten ligt het daarom voor de hand dat ook naar mediërende effecten gekeken worden; bijv. de effecten op de professionele ontwikkeling van teamleden (zoals een toenemende mate van assessment literacy; Zupanc, Urank, & Bren, 2009), de verbeterde onderwijsprocessen (zoals het intensifiëren van leerlingenbegeleiding, Schildkamp & Teddlie, 2008) en/of een verbeterd schoolfunctioneren (zoals het versterken van de cohesie in de school, Visscher & Coe, 2003). Schoolfeedbackgebruik kan ook resulteren in onbedoelde en onwenselijke effecten, zoals demotivatie bij leerkrachten, een overbevraging van leerkrachten (Fitz-Gibbon & Tymms, 2002) of een te sterke focus op getoetste leerinhouden, ook genoemd “teaching to the test” (Schildkamp & Teddlie, 2008; Visscher, 2002).
3. Het Schoolfeedbackproject: Een spiegel voor elke school Dit doctoraatsonderzoek werd opgezet in de context van het Schoolfeedbackproject genaamd “Each school its own mirror” (Verhaeghe & Van Damme, 2006). In het kader van dit project werd een prototype van een schoolfeedbacksysteem ontwikkeld. In de context van het ontwikkelingsonderzoek ontvingen 195 Vlaamse scholen jaarlijks feedback 194
op vertrouwelijke basis, waarbij hun schoolresultaten vergeleken werden met een representatieve referentiegroep uit het SiBOonderzoek (Maes, Van Petegem, & Van Damme, 2005). In het SiBOonderzoek worden gegevens verzameld van een cohorte leerlingen die van het einde van het kleuteronderwijs tot en met de overgang naar het secundair onderwijs opgevolgd worden voor wiskunde en taal (spelling, technisch en begrijpend lezen) aangevuld met informatie over de instroomkenmerken van de leerlingen. Bij het uitwerken van de feedbackrapporten werd bij de vergelijking met de referentiegroep de betekenis van de eigen schoolprestaties uitgelegd aan de hand van een aantal centrale concepten: leerwinst, toegevoegde waarde en gecorrigeerde scores. Deze begrippen werden zodanig uitgelegd dat niet verwacht werd van de feedbackgebruikers veel statistische voorkennis te bezitten. De feedbackdata werden bovendien ondersteund met grafische voorstellingen (cirkeldiagrammen, groeicurven en kruistabellen). De tekst werd voor elke school gestandaardiseerd. Daarnaast werd van schoolteamleden verwacht om zelf de schooleigen data te interpreteren.
4. Onderzoeksdoelstellingen en -opzet
In het kader van dit doctoraatsonderzoek werden vijf onderzoeken opgezet en uitgevoerd. De volgende vijf centrale onderzoeksdoelstellingen (OD) stonden voorop:
• OD 1: Het verkennen van de kenmerken van SFSen
Hoofdstuk 2 maakt de lezer wegwijs in kenmerken van SFSen. Gegevens werden verzameld door middel van vragenlijstenonderzoek en diepteinterviews bij feedbackontwikkelaars. Een descriptieve analyse van vijf SFSen leidde tot een eerste vergelijkend kader om een discussie over de kenmerken van SFSen op gang te brengen.
• OD 2: Het ontwikkelen van een raamwerk voor het in kaart brengen van schoolfeedbackgebruik, de beïnvloedende factoren en de verwachte effecten
In hoofdstuk 3 wordt een raamwerk ontwikkeld en uitgetest om schoolfeedbackgebruik, de beïnvloedende factoren en de uiteindelijke effecten op de schoolwerking te beschrijven. Daarbij werden schoolleiders uit het Schoolfeedbackproject geïnterviewd.
• OD 3: Het verkennen van de datageletterdheidscompetenties van SFS gebruikers
• OD 4: Het verkennen van effecten van alternatieve datarepresentaties en de datageletterdheidscompetenties van SFS gebruikers
Enkele centrale concepten uit schoolfeedbackrapporten (vb. toegevoegde waarde en leerwinst) worden in een experiment uitgetest op hun interpreteerbaarheid. Respondenten werden random verdeeld over de condities die verschillen in de manier waarop de centrale begrippen worden uitgelegd en gerepresenteerd. Met behulp van gekalibreerde toetsen (d.m.v. IRT-technieken) werd het vaardigheidsniveau van de respondenten bepaald. Deze resultaten worden in het vierde hoofdstuk gerapporteerd.
• OD 5: Het verkennen van effecten van alternatieve vormen van ondersteuning op schoolfeedbackgebruik
De hoofdstukken 5 en 6 pakken deze laatste onderzoeksdoelstelling aan waarin de invloed van types van ondersteuning van schoolleiders uit het Schoolfeedbackproject bij schoolfeedbackgebruik werd onderzocht. Effecten van ondersteuning werden nagegaan door middel van vragenlijsten en een gekalibreerde toets (Hoofdstuk 5) en diepte-interviews (Hoofdstuk 6).
5. Voornaamste bevindingen OD 1: Het verkennen van de kenmerken van SFSen Tot nog toe ontbrak een helder kader om schoolfeedbacksystemen te beschrijven en te vergelijken. In voorliggend proefschrift werd hiervoor een eerste aanzet gegeven, met als doel zowel feedbackontwikkelaars als feedbackgebruikers te informeren over de basiskenmerken van SFSen. Daarbij komen ook voor- en nadelen van SFSen aan bod. Voor de aanpak van deze onderzoeksdoelstelling werden de kenmerken van vijf SFSen met betrekking tot hun dataverzamelingsmethode en technieken voor dataanalyse in kaart gebracht. Vervolgens werd de inhoud van de feedbackrapporten kritisch ontleed, met inbegrip van de gebruikte numerieke maten en de grafische representatievormen. Aparte aandacht werd besteed aan de kwaliteitscriteria voor de geleverde feedback: relevantie van de feedback, kosteneffectiviteit, accuraatheid, fairheid en het benadrukken van positieve effecten (Fitz-Gibbon, 1996; Heck, 2006; 196
Rowe, 2004; Rowe & Lievesley, 2002; Schildkamp & Teddlie, 2008; Visscher, 2002). Uit de analyse blijkt vooral dat de onderzochte SFSen heel sterk verschillen in hun kenmerken. Om een idee te kunnen krijgen op de accuraatheid van de data, moeten we een goed zicht hebben van de gebruikte dataverzamelingsmethode. Zowel gestructureerde testinstructies als meetinstrumenten werden gebruikt. Interessant hierbij zijn het gebruik van technologieondersteunde toepassingen. Vooral de combinatie van computeradaptief testen met toetsen samengesteld uit itembanken en de gegevensuitwisseling met studentenadministratiesystemen blijken grote voordelen op te leveren voor de gebruiker. Vervolgens werd gekeken naar de gebruikte methoden voor dataanalyse, de schaalconstructies, de mogelijkheden voor longitudinale metingen en de gerapporteerde aggregatieniveaus. Voorts werd onderzocht in welke mate rekening is gehouden met leerlingenmobiliteit. Bij deze analyse stond centraal in welke mate voldaan werd aan de eisen voor accuraatheid én gebruiksvriendelijkheid. Daarna werd de feedbackinhoud van de verschillende SFSen nader bekeken. Daarbij bleek dat de nadruk vooral ligt op cognitieve inhouden. Verder werd ook onderzocht welke numerieke maten en grafische representaties in de rapporten werden gebruikt. Er werd een zeer brede waaier aan datarepresentaties vastgesteld. De keuze voor bepaalde representatievormen heeft meteen gevolgen voor de veronderstelde interpretatievaardigheden van de schoolfeedbackgebruikers. OD 2: Het ontwikkelen van een raamwerk voor schoolfeedbackgebruik, de beïnvloedende factoren en de verwachte effecten Vertrekkende van het conceptueel raamwerk, ontwikkeld door Visscher (2002; Visscher & Coe, 2003), werd een onderzoek opgezet om percepties van schoolleiders over schoolfeedbackgebruik in kaart te brengen. Daarbij werd aandacht besteed aan de beïnvloedende factoren, de fasen in schoolfeedbackgebruik, de soorten van feedbackgebruik, en de uiteindelijk effecten van feedbackgebruik op de schoolwerking. Informatie werd verzameld door middel van diepte-interviews bij deelnemers aan het Schoolfeedbackproject. Een analyse van deze resultaten hielp om het conceptueel model van Visscher verder uit te breiden. Daarbij werden vier clusters van factoren onderscheiden, die een invloed uitoefenen op schoolfeedbackgebruik: factoren gerelateerd aan de onderwijscontext, aan de gebruikers/school, aan de mogelijkheden voor ondersteuning en aan kenmerken van het SFS. Schoolfeedbackgebruik werd - aanvullend op het 197
kader van Visscher - ook omschreven in termen van te ondernemen stappen in een cyclisch proces van datagebruik. De schoolleiders rapporteerden daarbij voornamelijk problemen in de interpretatiefase. De feedback van het Schoolfeedbackproject bleek in de meeste gevallen nog niet geïntegreerd te zijn in de schoolwerking. Wat betreft types van feedbackgebruik, werd maar zelden een instrumentele gebruiksvorm gerapporteerd. Het is dan ook niet verwonderlijk dat heel wat schoolverbeteringseffecten door schoolfeedbackgebruik bij deze bevraagde groep uitbleven. OD 3: Het verkennen van datageletterdheidscompetenties van SFS gebruikers Uit de resultaten van de vorige studies bleken de interpretatievaardigheden van schoolfeedbackgebruikers beperkt. Dit is vooral kritisch omdat de aangeboden representatievormen duidelijk een mate van datageletterdheid veronderstellen. Daarom werd een experimenteel onderzoek opgezet (Hoofdstuk 4) waarbij twee aanpakken voor het verklaren van het begrip toegevoegde waarde en drie verschillende representaties werden vergeleken in functie van hun interpreteerbaarheid. Respondenten volgden een gestandaardiseerde instructie (doornemen van een schoolfeedbackrapport in de rol van een schoolleider die de resultaten van de eigen school te zien krijgt) en er werden kennis- en vaardigheidstoetsen afgenomen. De toetsen, waarvan de resultaten door middel van IRT-technieken werden geanalyseerd, helpen de moeilijkheidsgraad van ieder toetsitem te bepalen en helpen eveneens het vaardigheidsniveau van de deelnemers te bepalen in het interpreteren van de feedbackinformatie. Ook werd gezocht naar patronen in de fouten, die kunnen verwijzen naar misconcepties bij de respondenten. Uit de resultaten blijkt dat vooral de procedurele toetsvragen, waarbij gevraagd werd om resultaten van toegevoegde waarde te interpreteren van grafische representaties, moeilijkheden opleveren (slechts 35% van de respondenten losten deze correct op). Dit kan verklaard worden door de hoge eisen die hierbij gesteld worden aan het werkgeheugen (cf. cognitive load theory; Chandler & Sweller, 1991; Sweller, van Merriënboer, & Paas, 1998). Eén bepaalde misconceptie bleek zeer vaak voor te komen, waarbij de hellingsgraad en de hoogte van curves verkeerd geïnterpreteerd werden (cf. slope-height confusion; Beichner, 1994; Clement, 1989; Kramarski, 2004; Leinhardt, Zaslavsky, & Stein, 1990). In het vijfde hoofdstuk werd een vergelijkbare datageletterdheidstoets gebruikt om het kennis- en vaardigheidsniveau te bepalen van 198
schoolleiders, na het ontvangen van hun feedbackrapport in de context van het Schoolfeedbackproject. Hieruit bleek dat slechts 42% van de deelnemers erin slaagde om de helft van de items correct te beantwoorden. Deze zwakke resultaten komen niet overeen met de hogere inschatting van hun eigen kennisniveau (vijfpuntenschaal; M = 3.81, SD =0.74). Datageletterdheidscompetenties bestaan naast kennis en vaardigheden ook uit attitudes ten aanzien van schoolfeedbackgebruik. Wanneer we hiernaar peilden bij de schoolleiders (hoofdstukken 5 en 6), bleek dat zij een positieve houding aannemen en er van uitgaan dat dit soort datagebruik bij hen aanzet tot zelfevaluatie. Maar tegelijkertijd geven ze aan dat hun leerkrachten een stuk minder positief tegen schoolfeedback aankijken. Mogelijke verklaringen hiervoor zijn dat leerkrachten vooral geconfronteerd worden met de lasten van de dataverzameling, zich bovendien bedreigd voelen door deze evaluatie en een voorkeur hebben voor leerlingendata van hun eigen klas, in plaats van geaggregeerde gegevens op schoolniveau. OD 4: Het verkennen van de effecten van alternatieve datarepresentaties en de datageletterdheidscompetenties van SFS gebruikers Een samenspel van een beperkte voorkennis en de inherent complexe feedbackinformatie blijkt te leiden tot zwakke toetsscores bij de respondenten in de experimentele groep. De aanpak om het begrip toegevoegde waarde uit te leggen in termen van “het verschil tussen geobserveerde en verwachte gemiddelde” leidde tot betere toetsscores dan de aanpak om het begrip uit te leggen in termen van “het verschil tussen het gecorrigeerde gemiddelde en het gemiddelde voor de referentiegroep”. Verder blijkt uit de resultaten dat het toevoegen van tabellen aan de groeicurven niet bijdraagt tot een betere feedbackinterpretatie. Dit opvallende resultaat doet vragen rijzen bij de rol van gebruikte representatievormen. Afhankelijk van welke informatie afgelezen moet worden van deze figuren, is de ene dan wel een andere representatievorm geschikt (Schnotz & Bannert, 2003). OD 5: Het verkennen van de effecten ondersteuningsaanpakken op schoolfeedbackgebruik
In de hoofdstukken 5 en 6 worden de resultaten van een ondersteuningsinterventie gerapporteerd. Schoolleiders uit het Schoolfeedbackproject (n = 195) namen deel aan een experiment waarin ze 199
ad random werden toegewezen aan één van de volgende drie condities: één conditie waarbij ondersteuning op school werd aangeboden (ONSET, n = 7), één waarbij de ondersteuning plaatsvond op een locatie buiten de school (INSET, n = 23) en één waarbij geen ondersteuning werd aangeboden (controlegroep, n = 150). De INSET-groep werd uitgenodigd op een studievoormiddag in een universiteitsgebouw. Zij kregen uitleg over de interpretatie van de feedbackrapporten en over de gebruiksmogelijkheden en dit aan de hand van een fictief scholenrapport. Dezelfde uitleg kwam aan bod in de ONSET-groep, maar daarbij werd de schoolleider op de school bezocht en werden de eigen schoolresultaten in de training betrokken. Kirkpatricks model (1998) voor de evaluatie van trainingsinitiatieven bood daarbij de structuur aan voor de evaluatie van de resultaten uit deze studie. In het reactieniveau werd nagegaan in hoeverre de deelnemers tevreden waren over de ondersteuning. Vervolgens werd - op het leerniveau nagegaan of er sprake was van een toename in datageletterdheidscompetenties (kennis, vaardigheden, attitudes). Daarna werd op het gedragsniveau onderzocht of wat geleerd werd ook toegepast werd binnen de school. Tenslotte werd op het resultaatsniveau bekeken of er sprake was van schoolverbeteringseffecten in de verschillende ondersteuningscondities, als gevolg van het schoolfeedbackgebruik. In hoofdstuk 5 werden enkel de INSET- en de controlegroep vergeleken. De relatie tussen de verschillende variabelen werden uitgetest in een padmodel (X² (df) = 11.3 (13), p = 0.58; RMSEA = 0.01; AGFI = 0.92; GFI = 0.97). De toetsing van dit model toonde aan dat de ondersteuning enkel op een directe manier - leidde tot significant hogere scores op de kennis- en vaardigheidstoets en op een hogere inschatting van de eigen datageletterdheid. Indirecte effecten werden vastgesteld door de ondersteuningsinterventie op de fasen in gebruik en types van gebruik. De kwalitatieve studie, gerapporteerd in hoofdstuk 6, maakte gebruik van een metamatrix waarin de verschillende onderzoeksdeelnemers geordend werden per conditie (ONSET, INSET en controle) en naar mate van feedbackgebruik. ONSET-deelnemers rapporteerden een hogere mate van tevredenheid, een sterkere beheersing van datageletterdheidscompetenties en een intensiever doorlopen van de fasen lezen en bespreken, interpreteren en diagnosticeren.
6. Conclusie Onderwijsoverheden verwachten van scholen dat ze data aanwenden voor hun interne kwaliteitszorg. Uit de resultaten van de hier gerapporteerde 200
onderzoeken blijkt dat datagebruik in de context van schoolfeedbackgebruik eerder beperkt blijft. Kritische feedbackgerelateerde begrippen zoals “leerwinst”, “toegevoegde waarde”, en “outputmetingen” blijken de eerder statistisch ongeletterde schoolleiders te overdonderen en als gevolg daarvan nauwelijks te informeren over de eigen effectiviteit van de schoolwerking. Het aanbieden van schoolfeedback blijkt niet automatisch te leiden tot zelfreflectie. Om te garanderen dat schoolfeedback gebruikt wordt voor schoolverbeteringsinitiatieven, moet namelijk aan een aantal voorwaarden voldaan zijn m.b.t. de gebruikers, de SFSen, de ondersteuning en de educatieve context. De onderzoeksresultaten geven aan dat nog veel kan verbeterd worden aan de accuraatheid, relevantie en gebruiksvriendelijkheid van de geleverde schoolfeedback. Dit betekent dat meer evaluatieonderzoek nodig is in relatie tot schoolfeedbackinitiatieven. Wat de onderzoeksresultaten zeer sterk duidelijk maken, is dat veel aandacht moet geschonken worden aan de ontwikkeling van datageletterdheidscompetenties van feedbackgebruikers. Pas dan kan verwacht worden dat de kansen tot zelfreflectie en autonome kwaliteitszorg ten volle benut worden. De vraag naar een dergelijke ondersteuning gaat verder dan louter een ondersteuning bij de interpretatie van de data. Scholen willen ook op weg gezet worden bij het nemen van beslissingen op basis van hun schoolfeedback. Om hierop in te gaan, zal intensieve samenwerking tussen feedbackleveranciers, inspectieleden en pedagogische begeleiders nodig zijn; vooral om ondersteuning op maat te kunnen aanbieden. Daarnaast moeten scholen aangezet worden om deze feedbackgegevens aan te grijpen om eigen inzichten en eerdere bevindingen te vergelijken en te integreren in hun dagelijkse werking. Het effectief leren gebruiken van schoolfeedback is dan een nieuwe taak voor leerkrachten die momenteel vooral gewoon zijn om individuele leerlinggegevens van de eigen klas te verwerken. Maar gebruikers moeten ook geïnformeerd worden over de sterke en zwakke punten van de geleverde feedback. Niettegenstaande de uiteindelijke effecten van schoolfeedbackgebruik in onze studies beperkt bleven, zijn er wel indicaties gevonden die de meerwaarde aantonen van schoolfeedbackgebruik. Gezien voorliggend onderzoek duidelijke beperkingen heeft - bijvoorbeeld in termen van de omvang, de onderzoeksopzet en de gekozen afhankelijke en mediërende variabelen - is verder onderzoek op basis van deze eerste bevindingen aangewezen.
Literatuur Beichner, R.J., (1994). Testing student interpretation of kinematics graphs. American Journal of Physics, 62(8), 750-762. Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8(4), 293-332. Clement, J. (1989). The concept of variation and misconceptions in Cartesian graphing. Focus on Learning Problems in Mathematics, 11, 7787. Coe, R. (2002). Evidence on the role and impact of performance feedback in schools. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 3-26). Lisse: Swets & Zeitlinger. Earl, L. & Fullan, M. (2003). Using data in leadership for learning. Cambridge Journal of Education, 33(3), 383-394. Earl, L.M., & Katz, S. (2006). Leading schools in a data-rich world: Harnessing data for school improvement. Thousand Oaks, CA: Sage. Fitz-Gibbon, C.T. & Tymms, P. (2002). Technical and ethical issues in indicator systems: Doing things right and doing wrong things. Education Policy Analysis Archives, 10(6), 1-28. Retrieved from http://epaa.asu.edu/ojs/article/viewFile/285/411 Fitz-Gibbon, C.T. (1996). Monitoring education: Indicators, quality and effectiveness. London: Cassell. Goldstein, H. & Myers, K. (1996). Freedom of information: Towards a code of ethics for performance indicators. Research Intelligence, 57, 12-16. Heck, R. (2006). Assessing school achievement progress: Comparing alternative approaches. Educational Administration Quarterly, 42(5), 667-699. Hofman, R.H., Dijkstra, N.J., & Hofman, W.H.A. (2009). School selfevaluation and student achievement. School Effectiveness and School Improvement, 20(1), 47-68. Kirkpatrick, D.L. (Ed.). (1998). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler. Kramarski, B. (2004). Making sense of graphs: Does metacognitive instruction make a difference on students’ mathematical conceptions and alternative conceptions? Learning and Instruction, 14(6), 593-619. Leinhardt, G., Zaslavsky, O., & Stein, M.K. (1990). Functions, graphs, and graphing: Tasks, learning, and teaching. Review of Educational Research, 60(1), 1-64. Leithwood, K., Aitken, R., & Jantzi, D. (2006). Making schools smarter: Leading with evidence. (3rd. ed.) Tousand Oaks, CA: Corwin Press.
Maes, F., Van Petegem P., & Van Damme, J. (2005). Schoolloopbanen in het basisonderwijs (SiBO): Doelstellingen en onderzoeksopzet. Paper gepresenteerd op de Onderwijs Research Dagen, Gent, België. Maier, U. (2010). Accountability policies and teachers' acceptance and usage of school performance feedback - a comparative study. School Effectiveness and School Improvement, 21(2), 145-165. Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3–16). Oxford, UK: Elsevier Science. Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic approach. Thousand Oaks: Sage. Rowe, K. & Lievesley, D. (2002). Constructing and using educational performance indicators. Paper presented at the 2002 Asia-Pacific Educational Research Association, Melbourne, Australia. Rowe, K. (2004). Analysing and reporting performance indicator data: 'Caress' the data and user beware! Paper presented at the 2004 Public Sector Performance and Reporting Conference, Sydney, Australia. Saunders, L., & Rudd, P. (1999, September). Schools’ use of `value added’ data: A science in the service of an art? Paper presented at the British Educational Research Association Conference, Brighton, University of Sussex. Schildkamp, K. & Teddlie, C. (2008). School performance feedback systems in the USA and in the Netherlands: A comparison. Educational Research and Evaluation, 14(3), 255-282. Schildkamp, K., Visscher, A., & Luyten, H. (2009). The effects of a school self-evaluation instrument. School Effectiveness and School Improvement, 20(1), 69-88. Schnotz, W., Bannert, M. (2003). Construction and inference in learning from multiple representation. Learning and Instruction, 13(2), 141-156. Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251-296. Tymms, P. (1995). Influencing educational practice through performance indicators. School Effectiveness and School Improvement, 6(2), 123-145. Van Petegem, P., & Vanhoof, J. (2004). Feedback over schoolprestatieindicatoren als strategisch instrument voor schoolontwikkeling [Feedback on school performance indicators as strategic instrument for school improvement]. Pedagogische Studiën, 81, 338–353. Vanhoof, J. & Van Petegem, P. (2007). Matching internal and external evaluation in an era of accountability and school development: Lessons
from a Flemish perspective. Studies in Educational Evaluation, 33(2), 101-119. Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188. Verhaeghe, J.P., & Van Damme, J. (2006). School performance feedback in Vlaanderen, een schets op basis op van een projectvoorstel. Informatie vernieuwing onderwijs (IVO), 27(103), 19-27. Visscher, A.J. (2002). A framework for studying school performance feedback systems. In A. J. Visscher & R. Coe (Eds.), School improvement through performance feedback (pp. 41-71). Lisse, The Netherlands: Swets & Zeitlinger. Visscher, A.J., & Coe, R. (2003). School performance feedback systems: Conceptualisation, analysis, and reflection. School Effectiveness and School Improvement, 14(3), 321-349. Weiss, C.H. (1998). Have we learned anything new about the use of evaluation? American Journal of Evaluation, 19(1), 21-33. Zupanc, D., Urank, M., & Bren, M. (2009). Variability analysis for effectiveness and improvement in classrooms and schools in upper secondary education in Slovenia: Assessment of/for Learning Analytic Tool. School Effectiveness and School Improvement, 20(1), 89-122.
RESEARCH VALORISATION: PUBLICATIONS
RESEARCH VALORISATION: PUBLICATIONS 1. Articles in SSCI journals (a1) 1.1. Published – in press Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Using School Performance Feedback: Perceptions of Primary School Principals. School Effectiveness and School Improvement, 21(2), 167-188. Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Valcke, M., & Van Petegem, P. (in press).The influence of competences and support on school performance feedback use. Educational Studies. 1.2. Submitted Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Van Petegem, P., & Valcke, M. (2010). School characteristics facilitating school performance feedback use by teachers. Manuscript submitted for publication in School Effectiveness and School Improvement. Verhaeghe, G., Schildkamp, K., & Luyten, H. (2010). Characteristics of School Performance Feedback Systems. Manuscript submitted for publication in Educational Administration Quarterly. Verhaeghe, G., Vanhoof, J., Valcke, M., & Van Petegem, P. (2010). Effecten van ondersteuning bij schoolfeedbackgebruik. Manuscript submitted for publication in Pedagogische Studiën. Verhaeghe, G., Verhaeghe, J. P., & Valcke, M. (2010). Value-added results of schools: How to represent school feedback information. Manuscript submitted for publication in The Journal of Educational Research.
2. Articles in journals not included in the SSCI (a3) Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (in press). Datageletterdheid versterken bij scholen: Lessen uit het Schoolfeedbackproject [Strengthening the data literacy in schools: Lessons from the School Feedback Project]. Kwaliteitszorg in Het Onderwijs. Vanhoof, J., Verhaeghe, G., Van Petegem, P., Verhaeghe, J.P., & Valcke, M. (2009). Verschillen in het gebruik van schoolfeedback: Een verkenning van verklaringsgronden [Differences in school performance feedback use: An exploration of explanations]. Tijdschrift voor Onderwijsrecht & Onderwijsbeleid, 2009(4), 306-322.
Verhaeghe, G., Vanhoof, J., Van Petegem, P., Verhaeghe, J.P., & Van Damme, J. (in press). Het gebruik van outputgegevens in basisscholen: Concretiseringen en illustraties uit het Schoolfeedbackproject [The use of output results in primary schools: Concretizations and illustrations from the School Feedback Project). Kwaliteitszorg in Het Onderwijs.
3. Chapters in books (b2) Vanhoof, J., Verhaeghe, G., Verhaeghe, J.P., Van Petegem, P., & Valcke, M. (2010). Improving data literacy in schools: Lessons from the School Feedback Project. In K. Schildkamp, M.K. Lai & L. Earl (Eds.), Data-driven decision making around the world: Challenges and opportunities. Manuscript submitted for publication.
4. Conference contributions Verhaeghe, G., Verhaeghe, J.P. (2006, December). School Performance Feedback als instrument voor kwaliteitszorg en middel tot reflectie over schoolbeleid. Paper presented at the Vlaams Forum voor Onderwijsonderzoek, Antwerp. Verhaeghe, G., Verhaeghe, J.P. (2007, June). Verstaanbare schoolfeedback een realiteit? Paper presented at the Onderwijs Research Dagen (ORD), Groningen. Verhaeghe, G., Verhaeghe, J.P., (2007, September). An attempt to develop effective school performance feedback. Paper presented at the preconference of the European Conference on Educational Research, Ghent. Verhaeghe, G., Verhaeghe, J.P., Valcke, M, & Vanhoof, J. (2008, March). Understanding school performance feedback: A contribution to the development of effective school performance feedback. Paper presented at the annual meeting of the American Educational Research Association, New York. Verhaeghe, G., Vanhoof, J., & Van Petegem, P. (2008, June). Diepteinterviews naar het gebruik van schoolfeedback. Paper presented at the Onderwijs Research Dagen, Eindhoven. Verhaeghe, G., Vanhoof, J., Verhaeghe, J.P., & Van Petegem, P. (2008, September). Feedback on school performance feedback: In-depth interviews about the comprehensibility and usability. Paper presented at the European Conference on Educational Research, Göteborg. Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (2009, January). The effect of support on the interpretation and use of school feedback. 207
Poster presented at the International Congress for School Effectiveness and School Improvement, Vancouver. Verhaeghe, G., Vanhoof, J., Verhaeghe, J.P., & Van Petegem, P. (2009, January). Feedback on the use and interpretation of school performance feedback: Perceptions of primary school principals. Paper presented at the International Congress for School Effectiveness and School Improvement, Vancouver. Vanhoof, J., Verhaeghe, G., & van Petegem, P. (2009, May). Schoolfeedbackgebruik: Proces, resultaat en impact van ondersteuning. Paper presented at the Onderwijs Research Dagen, Leuven. Vanhoof, J., Verhaeghe, G., Van Petegem, P., & Valcke, M. (2010, September). Does support matter in interpreting and using school feedback? Findings from a quasi-experimental study. Paper presented at the European Conference on Educational Research, Vienna. Verhaeghe, G., Vanhoof, J., Van Petegem, P., & Valcke, M. (2010, January). Supporting school performance feedback use: An experimental study. Poster presented at the International Congress for School Effectiveness and School Improvement, Kuala Lumpur. Vanhoof, J., Verhaeghe, G., & Van Petegem, P. (2010, January). Data use and the impact of a training initiative of data use. Symposium paper presented at the International Congress for School Effectiveness and School Improvement, Kuala Lumpur. Verhaeghe, G., Vanhoof, J., Van Petegem, P., & Valcke, M. (2010, August). Supporting School Performance Feedback Use: An Experimental Study. Paper presented at the European Conference on Educational Research, Helsinki.