














Branching Paths: A Novel Teacher Evaluation Model for Faculty Development

Kim A. Park¹, James P. Bavis¹, and Ahn G. Nu²
¹Department of English, Purdue University
²Center for Faculty Education, Department of Educational Psychology, Quad City University

Author Note
Kim A. Park https://orcid.org/0000-0002-1825-0097
James P. Bavis is now at the MacLeod Institute for Music Education, Green Bay, WI.
We have no known conflict of interest to disclose.
Correspondence concerning this article should be addressed to Ahn G. Nu, Dept. of Educational Psychology, 253 N. Proctor St., Quad City, WA, 09291. Email: agnu@qcityu.com

Commented [AF1]: The running head is a shortened version of the paper's title that appears on every page. It is written in all capitals, and it should be flush left in the document's header. No "Running head:" label is included in APA 7. If the paper's title is fewer than 50 characters (including spaces and punctuation), the actual title may be used rather than a shortened form.
Commented [AF2]: Page numbers begin on the first page and follow on every subsequent page without interruption. No other information (e.g., authors' last names) is required.
Commented [AF3]: The paper's title should be centered, bold, and written in title case. It should be three or four lines below the top margin of the page. In this sample paper, we've put three blank lines above the title.
Commented [AF4]: Authors' names appear one double-spaced line below the title. They should be written as follows: First name, middle initial(s), last name. Omit all professional titles and/or degrees (e.g., Dr., Rev., PhD, MA).
Commented [AF5]: Authors' affiliations follow immediately after their names. If the authors represent multiple institutions, as is the case in this sample, use superscripted numbers to indicate which author is affiliated with which institution. If all authors represent the same institution, do not use any numbers.
Commented [AF6]: Author notes contain the following parts in this order:
Abstract

A large body of assessment literature suggests that students' evaluations of their teachers (SETs) can fail to measure the construct of teaching in a variety of contexts. This can compromise faculty development efforts that rely on information from SETs. The disconnect between SET results and faculty development efforts is exacerbated in educational contexts that demand particular teaching skills that SETs do not value in proportion to their local importance (or do not measure at all). This paper responds to these challenges by proposing an instrument for the assessment of teaching that allows institutional stakeholders to define the teaching construct in a way they determine to suit the local context. The main innovation of this instrument relative to traditional SETs is that it employs a branching "tree" structure populated by binary-choice items based on the empirically derived, binary-choice, boundary-definition (EBB) scale developed by Turner and Upshur for ESL writing assessment. The paper argues that this structure can allow stakeholders to define the teaching construct by changing the order and sensitivity of the nodes in the tree of possible outcomes, each of which corresponds to a specific teaching skill. The paper concludes by outlining a pilot study that will examine the differences between the proposed EBB instrument and a traditional SET employing a series of multiple-choice questions (MCQs) that correspond to Likert scale values.

Keywords: college teaching, student evaluations of teaching, scale development, EBB scale, pedagogies, educational assessment, faculty development

Commented [AF8]: Note that both the running head and the page number continue on the pages that follow the title.
Commented [AF9]: The word "Abstract" should be centered and bolded at the top of the page.
Commented [AF10]: By standard convention, abstracts do not contain citations of other works. If you need to refer to another work in the abstract, mentioning the authors in the text can often suffice. Note also that some institutions and publications may allow for citations in the abstract.
Commented [AF11]: An abstract quickly summarizes the main points of the paper that follows it. The APA 7 manual does not give explicit directions for how long abstracts should be, but it does note that most abstracts do not exceed 250 words (p. 38). It also notes that professional publishers (like academic journals) may have a variety of rules for abstracts, and that writers should typically defer to these.
Commented [AF12]: The main paragraph of the abstract should not be indented.
Commented [AF13]: Follow the abstract with a selection of keywords that describe the important ideas or subjects in your paper. These help online readers search for your paper in a database. The keyword list should have its first line indented 0.5 inches. Begin the list with the label "Keywords:" (note the italics and the colon). Follow this with a list of keywords written in lowercase (except for proper nouns) and separated by commas. Do not place a period at the end of the list.
personnel considerations, informing important decisions like hiring, firing, tenure, and promotion. Seldin (1993; as cited in Pounder, 2007) finds that 86% of higher educational institutions use SETs as important factors in personnel decisions. A 1991 survey of department chairs found 97% used student evaluations to assess teaching performance (US Department of Education). Since the mid-late 1990s, a general trend towards comprehensive methods of teacher evaluation that include multiple forms of assessment has been observed (Berk, 2005). However, recent research suggests the usage of SETs in personnel decisions is still overwhelmingly common, though precise percentages are hard to come by, perhaps owing to the multifaceted nature of these decisions (Boring et al., 2017; Galbraith et al., 2012). In certain contexts, student evaluations can also have ramifications beyond the level of individual instructors. Particularly as public schools have experienced pressure in recent decades to adopt neoliberal, market-based approaches to self-assessment and adopt a student-as-consumer mindset (Darwin, 2012; Marginson, 2009), information from evaluations can even feature in department- or school-wide funding decisions (see, for instance, the Obama Administration's Race to the Top initiative, which awarded grants to K-12 institutions that adopted value-added models for teacher evaluation).

However, while SETs play a crucial role in faculty development and personnel decisions for many educational institutions, current approaches to SET administration are not as well-suited to these purposes as they could be. This paper argues that a formative, empirical approach to teacher evaluation developed in response to the demands of the local context is better suited to helping institutions improve their teachers. It proposes the Heavilon Evaluation of Teaching, or HET, a new teacher assessment instrument that can strengthen current approaches to faculty development by making them more responsive to teachers' local contexts. It also proposes a pilot study that will clarify the differences between this new instrument and the Introductory Composition at Purdue (ICaP) SET, a more traditional instrument used for similar purposes. The results of this study will direct future efforts to refine the proposed instrument.

Commented [AF20]: Here, we've made an indirect or secondary citation (i.e., we've cited a source that we found cited in a different source). Use the phrase "as cited in" in the parenthetical to indicate that the first-listed source was referenced in the second-listed one. Include an entry in the reference list only for the secondary source (Pounder, in this case).
Commented [AF21]: Here, we've cited a source that does not have a named author. The corresponding reference list entry would begin with "US Department of Education."
Commented [AF22]: Sources with three authors or more are cited via the first-listed author's name followed by the Latin phrase "et al." Note that the period comes after "al," rather than "et."
Commented [AF23]: For the sake of brevity, the next page of the original paper was cut from this sample document.
Methods section, which follows, will propose a pilot study that compares the results of the proposed instrument to the results of a traditional SET (and will also provide necessary background information on both of these evaluations). The paper will conclude with a discussion of how the results of the pilot study will inform future iterations of the proposed instrument and, more broadly, how universities should argue for local development of assessments.

Literature Review

Effective Teaching: A Contextual Construct

The validity of the instrument this paper proposes is contingent on the idea that it is possible to systematically measure a teacher's ability to teach. Indeed, the same could be said for virtually all teacher evaluations. Yet despite the ubiquity of SETs and the faculty development programs that depend on their input, there is little scholarly consensus on precisely what constitutes "good" or "effective" teaching. It would be impossible to review the entire history of the debate surrounding teaching effectiveness, owing to its sheer scope—such a summary might need to begin with, for instance, Cicero and Quintilian. However, a cursory overview of important recent developments (particularly those revealed in meta-analyses of empirical studies of teaching) can help situate the instrument this paper proposes in relevant academic conversations.

Meta-analysis 1. One core assumption that undergirds many of these conversations is the notion that good teaching has effects that can be observed in terms of student achievement. A meta-analysis of 167 empirical studies that investigated the effects of various teaching factors on student achievement (Kyriakides et al., 2013) supported the effectiveness of a set of teaching factors that the authors group together under the label of the "dynamic model" of teaching. Seven of the eight factors (Orientation, Structuring, Modeling, Questioning, Assessment, Time Management, and Classroom as Learning Environment) corresponded to moderate average effect sizes (between 0.34 and 0.41 standard deviations) in measures of

Commented [AF24]: Second-level headings are flush left, bolded, and written in title case. Third-level headings are flush left, bolded, written in title case, and italicized.
Commented [AF25]: Fourth-level headings are bolded, written in title case, and punctuated with a period. They are also indented and written in-line with the following paragraph.
Commented [AF26]: When presenting decimal fractions, put a zero in front of the decimal if the quantity is something that can exceed one (like the number of standard deviations here). Do not put a zero if the quantity cannot exceed one (e.g., if the number is a proportion).
abilities and attitudes, and family and community" (McKenzie et al., 2005, p. 2). Student achievement varies greatly due to non-teacher factors like socio-economic status and home life (Snook et al., 2009). This means that, even to the extent that it is possible to observe the effectiveness of certain teaching behaviors in terms of student achievement, it is difficult to set generalizable benchmarks or standards for student achievement. Thus it is also difficult to make true apples-to-apples comparisons about teaching effectiveness between different educational contexts: due to vast differences between different kinds of students, a notion of what constitutes highly effective teaching in one context may not apply in another. This difficulty has featured in criticism of certain meta-analyses that have purported to make generalizable claims about which teaching factors produce the biggest effects (Hattie, 2009). A variety of other commentators have also made similar claims about the importance of contextual factors in teaching effectiveness for decades (see, e.g., Bloom et al., 1956; Cashin, 1990; Theall, 2017).

The studies described above mainly measure teaching effectiveness in terms of academic achievement. It should certainly be noted that these quantifiable measures are not generally regarded as the only outcomes of effective teaching worth pursuing. Qualitative outcomes like increased affinity for learning and a greater sense of self-efficacy are also important learning goals. Here, also, local context plays a large role.

SETs: Imperfect Measures of Teaching

As noted in this paper's introduction, SETs are commonly used to assess teaching performance and inform faculty development efforts. Typically, these take the form of an end-of-term summative evaluation composed of multiple-choice questions (MCQs) that allow students to rate statements about their teachers on Likert scales. These are often accompanied by short-answer responses, which may or may not be optional.

SETs serve important institutional purposes. While commentators have noted that there are crucial aspects of instruction that students are not equipped to judge (Benton & Young, 2018), SETs nevertheless give students a rare institutional voice. They represent an opportunity

Commented [AF27]: To list a few sources as examples of a larger body of work, you can use the word "see" in the parenthetical, as we've done here.
to offer anonymous feedback on their teaching experience and potentially address what they deem to be their teacher's successes or failures. Students are also uniquely positioned to offer meaningful feedback on an instructor's teaching because they typically have much more extensive firsthand experience of it than any other educational stakeholder. Even peer observers only witness a small fraction of the instructional sessions during a given semester. Students with perfect attendance, by contrast, witness all of them. Thus, in a certain sense, a student can theoretically assess a teacher's ability more authoritatively than even peer mentors can.

While historical attempts to validate SETs have produced mixed results, some studies have demonstrated their promise. Howard (1985), for instance, finds that SETs are significantly more predictive of teaching effectiveness than self-report, peer, and trained-observer assessments. A review of several decades of literature on teaching evaluations (Watchel, 1998) found that a majority of researchers believe SETs to be generally valid and reliable, despite occasional misgivings. This review notes that even scholars who support SETs frequently argue that they alone cannot direct efforts to improve teaching and that multiple avenues of feedback are necessary (L'hommedieu et al., 1990; Seldin, 1993).

Finally, SETs also serve purposes secondary to the ostensible goal of improving instruction that nonetheless matter. They can be used to bolster faculty CVs and assign departmental awards, for instance. SETs can also provide valuable information unrelated to teaching. It would be hard to argue that it is not useful for a teacher to learn, for example, that a student finds the class unbearably boring, or that a student finds the teacher's personality so unpleasant as to hinder her learning. In short, there is real value in understanding students' affective experience of a particular class, even in cases when that value does not necessarily lend itself to firm conclusions about the teacher's professional abilities.

However, a wealth of scholarly research has demonstrated that SETs are prone to fail in certain contexts. A common criticism is that SETs can frequently be confounded by factors
female (regardless of the instructor's actual gender) (Macnell et al., 2015). The classes were identical in structure and content, and the instructors' true identities were concealed from students. The study found that students rated the male identity higher on average. However, a few studies have demonstrated the reverse of the gender bias mentioned above (that is, women received higher scores) (Bachen et al., 1999), while others have registered no gender bias one way or another (Centra & Gaubatz, 2000).

The goal of presenting these criticisms is not necessarily to diminish the institutional importance of SETs. Of course, insofar as institutions value the instruction of their students, it is important that those students have some say in the content and character of that instruction. Rather, the goal here is simply to demonstrate that using SETs for faculty development purposes—much less for personnel decisions—can present problems. It is also to make the case that, despite the abundance of literature on SETs, there is still plenty of room for scholarly attempts to make these instruments more useful.

Empirical Scales and Locally-Relevant Evaluation

One way to ensure that teaching assessments are more responsive to the demands of teachers' local contexts is to develop those assessments locally, ideally via a process that involves the input of a variety of local stakeholders. Here, writing assessment literature offers a promising path forward: empirical scale development, the process of structuring and calibrating instruments in response to local input and data (e.g., in the context of writing assessment, student writing samples and performance information). This practice contrasts, for instance, with deductive approaches to scale development that attempt to represent predetermined theoretical constructs so that results can be generalized.

Supporters of the empirical process argue that empirical scales have several advantages. They are frequently posited as potential solutions to well-documented reliability and validity issues that can occur with theoretical or intuitive scale development (Brindley, 1998; Turner & Upshur, 1995, 2002). Empirical scales can also help researchers avoid issues caused
by subjective or vaguely-worded standards in other kinds of scales (Brindley, 1998) because they require buy-in from local stakeholders who must agree on these standards based on their understanding of the local context. Fulcher et al. (2011) note the following, for instance:

Measurement-driven scales suffer from descriptional inadequacy. They are not sensitive to the communicative context or the interactional complexities of language use. The level of abstraction is too great, creating a gulf between the score and its meaning. Only with a richer description of contextually based performance, can we strengthen the meaning of the score, and hence the validity of score-based inferences. (pp. 8–9)

There is also some evidence that the branching structure of the EBB scale specifically can allow for more reliable and valid assessments, even if it is typically easier to calibrate and use conventional scales (Hirai & Koizumi, 2013). Finally, scholars have also argued that theory-based approaches to scale development do not always result in instruments that realistically capture ordinary classroom situations (Knoch, 2007, 2009).

The most prevalent criticism of empirical scale development in the literature is that the local, contingent nature of empirical scales basically discards any notion of their results' generalizability. Fulcher (2003), for instance, makes this basic criticism of the EBB scale even as he subsequently argues that "the explicitness of the design methodology for EBBs is impressive, and their usefulness in pedagogic settings is attractive" (p. 107). In the context of this particular paper's aims, there is also the fact that the literature supporting empirical scale development originates in the field of writing assessment, rather than teaching assessment. Moreover, there is little extant research into the applications of empirical scale development for the latter purpose. Thus, there is no guarantee that the benefits of empirical development approaches can be realized in the realm of teaching assessment. There is also no guarantee that they cannot. In taking a tentative step towards a better understanding of how these assessment schema function in a new context, then, the study described in the next section

Commented [AF29]: Quotations longer than 40 words should be formatted as block quotations. Indent the entire passage half an inch and present the passage without quotation marks. Any relevant page numbers should follow the concluding punctuation mark. If the author and/or date are not referenced in the text, as they are here, place them in the parenthetical that follows the quotation along with the page numbers.
Commented [AF30]: When citing multiple sources from the same author(s), simply list the author(s), then list the years of the sources separated by commas.
also invited to respond to two short-answer prompts: "What specific suggestions do you have for improving the course or the way it is taught?" and "What is something that the professor does well?" Responses to these questions are optional. The remainder of the MCQs (thirty in total) are chosen from a list of 646 possible questions provided by the Purdue Instructor Course Evaluation Service (PICES) by department administrators. Each of these PICES questions requires students to respond to a statement about the course on a five-point Likert scale. Likert scales are simple scales used to indicate degrees of agreement. In the case of the ICaP SET, students must indicate whether they strongly agree, agree, disagree, strongly disagree, or are undecided. These thirty Likert scale questions assess a wide variety of the course's and instructor's qualities. Examples include "My instructor seems well-prepared for class," "This course helps me analyze my own and other students' writing," and "When I have a question or comment I know it will be respected."

Commented [AF31]: Italicize the anchors of scales or responses to scale-like questions, rather than presenting them in quotation marks. Do not italicize numbers if the scale responses are numbered.

One important consequence of the ICaP SET within the Purdue English department is the Excellence in Teaching Award (which, prior to Fall 2018, was named the Quintilian or, colloquially, "Q" Award). This is a symbolic prize given every semester to graduate instructors who score highly on their evaluations. According to the ICaP site, "ICaP instructors whose teaching evaluations achieve a certain threshold earn [the award], recognizing the top 10% of teaching evaluations at Purdue." While this description is misleading—the award actually goes to instructors whose SET scores rank in the top decile of the range of possible outcomes, but not necessarily ones who scored better than 90% of other instructors—the award nevertheless provides an opportunity for departmental instructors to distinguish their CVs and teaching portfolios.

Insofar as it is distributed digitally, is composed of MCQs (plus a few short-answer responses), and is intended as an end-of-term summative assessment, the ICaP SET embodies the current prevailing trends in university-level SET administration. In this pilot study, it serves as a stand-in for current SET administration practices (as generally conceived).
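To make the award-threshold distinction drawn above concrete, consider a brief sketch. Everything in it is invented for illustration (the score distribution, the instructor count, and the 1–5 scale assumption are not ICaP data): an absolute cutoff at the top decile of the possible score range can be cleared by far more, or far fewer, than 10% of instructors, depending on how scores actually cluster.

```python
import numpy as np

# Hypothetical illustration only: invented score distribution, not ICaP data.
# "Top decile of the range of possible outcomes" is an absolute cutoff on the
# assumed 1-5 scale, while "top 10% of instructors" is a relative rank; the two
# diverge whenever scores cluster near the top of the scale.

scale_min, scale_max = 1.0, 5.0
cutoff = scale_max - 0.10 * (scale_max - scale_min)  # top 10% of the 1-5 range -> 4.6

rng = np.random.default_rng(0)
# Assume a department where mean SET scores cluster around 4.5 (an assumption).
scores = np.clip(rng.normal(loc=4.5, scale=0.3, size=200), scale_min, scale_max)

share_clearing_cutoff = np.mean(scores >= cutoff)   # proportion earning the award
relative_cutoff = np.quantile(scores, 0.90)         # score needed to beat 90% of peers

print(f"Absolute cutoff (top decile of range): {cutoff:.2f}")
print(f"Share of instructors clearing it: {share_clearing_cutoff:.0%}")
print(f"Score needed to outrank 90% of peers: {relative_cutoff:.2f}")
```

Under this invented distribution, far more than 10% of instructors clear the 4.6 cutoff, which is exactly the gap between the award's wording and its actual mechanics.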
Like the ICaP SET, the HET uses student responses to questions to produce a score that purports to represent their teacher's pedagogical ability. It has a similar number of items (28, as opposed to the ICaP SET's 34). However, despite these superficial similarities, the instrument's structure and content differ substantially from the ICaP SET's. The most notable differences are the construction of the items on the test and the way that responses to these items determine the teacher's final score.

Items on the HET do not use the typical Likert scale, but instead prompt students to respond to a question with a simple "yes/no" binary choice. By answering "yes" and "no" to these questions, student responders navigate a branching "tree" map of possibilities whose endpoints correspond to points on a 33-point ordinal scale. The items on the HET are grouped into six suites according to their relevance to six different aspects of the teaching construct (described below). The suites of questions correspond to directional nodes on the scale—branching paths where an instructor can move either "up" or "down" based on the student's responses. If a student awards a set number of "yes" responses to questions in a given suite (signifying a positive perception of the instructor's teaching), the instructor moves up on the scale. If a student does not award enough "yes" responses, the instructor moves down. Thus, after the student has answered all of the questions, the instructor's "end position" on the branching tree of possibilities corresponds to a point on the 33-point scale. A visualization of this structure is presented in Figure 1.
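To clarify how such a branching structure turns binary responses into a single ordinal score, here is a minimal sketch in Python. It illustrates the general EBB-style logic described above, not the HET's actual calibration: the suite order, the per-suite "yes" thresholds, and the mapping of endpoints onto a 1–33 range are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Suite:
    name: str
    threshold: int  # minimum "yes" responses needed to branch upward (the node's "sensitivity")

def branch_score(suites, yes_counts, scale_min=1, scale_max=33):
    """Traverse the branching tree: each suite halves the remaining score range,
    moving up on a pass and down on a fail, so earlier suites move the final
    score more than later ones."""
    lo, hi = float(scale_min), float(scale_max)
    for suite, yes in zip(suites, yes_counts):
        mid = (lo + hi) / 2
        if yes >= suite.threshold:  # enough "yes" responses: branch upward
            lo = mid
        else:                       # otherwise: branch downward
            hi = mid
    return round((lo + hi) / 2, 2)  # the path's endpoint, read as an ordinal score

# Hypothetical suite order and thresholds; stakeholders would set these locally.
suites = [
    Suite("Safe learning environment", 3),
    Suite("Classroom management", 3),
    Suite("Clear instruction", 5),
    Suite("Activating teaching methods", 5),
    Suite("Learning strategies", 4),
    Suite("Differentiation", 3),
]

# One student's "yes" counts per suite, in the same order as the suites above.
print(branch_score(suites, [4, 4, 6, 3, 5, 2]))  # 30.25 under these assumptions
```

Because the first node moves the score across half of the scale while the last moves it across only a small slice, reordering the suites or changing a node's threshold redefines which teaching skills dominate the final score; this is the lever the paper proposes handing to local stakeholders.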
the purpose of international pedagogical research within the European Union. The most recent version of the ICALT contains 32 items across six topic domains that correspond to six broad teaching skills. For each item, students rate a statement about the teacher on a four-point Likert scale. The main advantage of using ICALT items in the HET is that they have been independently tested for reliability and validity numerous times over 17 years of development (see, e.g., Van de Grift, 2007). Thus, their results lend themselves to meaningful comparisons between teachers (as well as providing administrators a reasonable level of confidence in their ability to model the teaching construct itself). The six "suites" of questions on the HET, which correspond to the six topic domains on the ICALT, are presented in Table 1.

Table 1
HET Question Suites

Suite | # of Items | Description
Safe learning environment | 4 | Whether the teacher is able to maintain positive, nonthreatening relationships with students (and to foster these sorts of relationships among students).
Classroom management | 4 | Whether the teacher is able to maintain an orderly, predictable environment.
Clear instruction | 7 | Whether the teacher is able to explain class topics comprehensibly, provide clear sets of goals for assignments, and articulate the connections between the assignments and the class topics in helpful ways.

Commented [AF34]: Tables are formatted similarly to figures. They are titled and numbered in the same way, and table-following notes are presented the same way as figure-following notes. Use separate sequential numbers for tables and figures. For instance, this table is presented as Table 1 rather than as Table 2, despite the fact that Figure 1 precedes it.
Suite | # of Items | Description
Activating teaching methods | 7 | Whether the teacher uses strategies that motivate students to think about the class's topics.
Learning strategies | 6 | Whether teachers take explicit steps to teach students how to learn (as opposed to merely providing students informational content).
Differentiation | 4 | Whether teachers can successfully adjust their behavior to meet the diverse learning needs of individual students.

Note. Item numbers are derived from original ICALT item suites.

The items on the HET are modified from the ICALT items only insofar as they are phrased as binary choices, rather than as invitations to rate the teacher. Usually, this means the addition of the word "does" and a question mark at the end of the sentence. For example, the second safe learning climate item on the ICALT is presented as "The teacher maintains a relaxed atmosphere." On the HET, this item is rephrased as, "Does the teacher maintain a relaxed atmosphere?" See Appendix for additional sample items.

As will be discussed below, the ordering of item suites plays a decisive role in the teacher's final score because the branching scale weights earlier suites more heavily. So too does the "sensitivity" of each suite of items (i.e., the number of positive responses required to progress upward at each branching node). This means that it is important for local stakeholders to participate in the development of the scale. In other words, these stakeholders must be involved in decisions about how to order the item suites and adjust the sensitivity of each node. This is described in more detail below.

Once the scale has been developed, the assessment has been administered, and the teacher's endpoint score has been obtained, the student rater is prompted to offer any textual

Commented [AF35]: When a table is so long that it stretches across multiple pages, repeat the column labels on each new page. Most word processors have a feature that does this automatically.
Commented [AF36]: In addition to presenting figures and tables in the text, you may also present them in appendices at the end of the document. You may also use appendices to present material that would be distracting or tedious in the body of the paper. In either case, you can use simple in-text references to direct readers to the appendices.
References

Ambady, N., & Rosenthal, R. (1993). Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. Journal of Personality and Social Psychology, 64(3), 431–441. http://dx.doi.org/10.1037/0022-3514.64.3.
American Association of University Professors. (n.d.). Background facts on contingent faculty positions. https://www.aaup.org/issues/contingency/background-facts
American Association of University Professors. (2018, October 11). Data snapshot: Contingent faculty in US higher ed. AAUP Updates. https://www.aaup.org/news/data-snapshot-contingent-faculty-us-higher-ed#.Xfpdmy2ZNR
Anderson, K., & Miller, E. D. (1997). Gender and student evaluations of teaching. PS: Political Science and Politics, 30(2), 216–219. https://doi.org/10.2307/
Armstrong, J. S. (1998). Are student ratings of instruction useful? American Psychologist, 53(11), 1223–1224. http://dx.doi.org/10.1037/0003-066X.53.11.
Attiyeh, R., & Lumsden, K. G. (1972). Some modern myths in teaching economics: The U.K. experience. American Economic Review, 62(1), 429–443. https://www.jstor.org/stable/
Bachen, C. M., McLoughlin, M. M., & Garcia, S. S. (1999). Assessing the role of gender in college students' evaluations of faculty. Communication Education, 48(3), 193–210. http://doi.org/cqcgsr
Basow, S. A. (1995). Student evaluations of college professors: When gender matters. Journal of Educational Psychology, 87(4), 656–665. http://dx.doi.org/10.1037/0022-0663.87.4.
Becker, W. (2000). Teaching economics in the 21st century. Journal of Economic Perspectives, 14(1), 109–120. http://dx.doi.org/10.1257/jep.14.1.
Benton, S., & Young, S. (2018). Best practices in the evaluation of teaching. Idea paper, 69.

Commented [AF38]: Start the references list on a new page. The word "References" (or "Reference," if there is only one source), should appear bolded and centered at the top of the page. Reference entries should follow in alphabetical order. There should be a reference entry for every source cited in the text.
Commented [AF39]: Source with two authors.
Commented [AF40]: All citation entries should be double-spaced. After the first line of each entry, every following line should be indented a half inch (this is called a "hanging indent").
Commented [AF41]: Source with organizational author.
Commented [AF42]: Note that sources in online academic publications like scholarly journals now require DOIs or stable URLs if they are available.
Commented [AF43]: Shortened DOI.
Berk, R. A. (2005). Survey of 12 strategies to measure teaching effectiveness. International Journal of Teaching and Learning in Higher Education, 17(1), 48–62.
Bloom, B. S., Englehart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals. Addison-Wesley Longman Ltd.
Brandenburg, D., Slinde, C., & Batista, J. (1977). Student ratings of instruction: Validity and
http://dx.doi.org/10.1007/BF
Carrell, S., & West, J. (2010). Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 118(3), 409–432. https://doi.org/10.1086/
Cashin, W. E. (1990). Students do rate different academic fields differently. In M. Theall & J. L. Franklin (Eds.), Student ratings of instruction: Issues for improving practice. New Directions for Teaching and Learning (pp. 113–121).
Centra, J., & Gaubatz, N. (2000). Is there gender bias in student evaluations of teaching? The Journal of Higher Education, 71(1), 17–33. https://doi.org/10.1080/00221546.2000.
Davis, B. G. (2009). Tools for teaching (2nd ed.). Jossey-Bass.

Commented: Second edition of a print book.

Denton, D. (2013). Responding to edTPA: Transforming practice or applying shortcuts? AILACTE Journal, 10(1), 19–36.
Dizney, H., & Brickell, J. (1984). Effects of administrative scheduling and directions upon student ratings of instruction. Contemporary Educational Psychology, 9(1), 1–7. https://doi.org/10.1016/0361-476X(84)90001-8
DuCette, J., & Kenney, J. (1982). Do grading standards affect student evaluations of teaching? Some new evidence on an old question. Journal of Educational Psychology, 74(3), 308–