Analysis of Factors Affecting Student Evaluation of Teaching Effectiveness in Saudi Higher Education: The Case of Jubail University College

The attributes of effective teaching in higher education remain controversial, and the debate has never been conclusive. The purpose of this study is to determine the factors affecting students’ perceptions of teaching effectiveness, and how instructor and course attributes can significantly influence teaching effectiveness as measured by students in course evaluation surveys. The study analyzed 3,798 student evaluations of faculty at Jubail University College using factor analysis to obtain the factor loadings and average variance extracted values. The study predicted that there is a significant relationship between five dimensions of teaching (i.e. instructor’s personality, knowledge and teaching ability, marking and grading policy, course attributes, and learning outcomes) and students’ ratings of teaching effectiveness. The findings support the hypothesis that there is a significant relationship between the dimensions of effective teaching and students’ ratings. The study contributes to the body of literature on the evaluation of teaching effectiveness in Saudi higher education.


Introduction
In recent years, the expansion of the higher education sector in the Kingdom of Saudi Arabia has drawn the attention of policy makers to issues of educational quality rather than the mass production of graduates, and to the need to move quality to the forefront of the educational agenda. Shevlin [28] asserted that student evaluation of teaching (SET) is ubiquitous in UK and US universities. Moreover, in the UK, information from SET is considered not only important evaluative information but also a guide for potential changes in course material and method of delivery, as clearly noted by the Quality Assurance Agency for Higher Education (QAA). In Australia, the move towards performance-based funding as part of quality assurance in universities has driven a shift from voluntary to mandatory use of course evaluation surveys to evaluate and reward teaching staff performance [27]. Accordingly, higher education quality assurance agencies in Saudi Arabia, such as the National Commission for Academic Accreditation and Assessment (NCAAA), are given considerable autonomy to monitor the quality of higher education and to prescribe the necessary tools to ensure compliance with quality standards.
Considering this, higher education institutions in the Kingdom are mandated to use different instruments to examine and evaluate the quality of their educational infrastructure and of teaching and learning [20]. The quest for excellence in college and university teaching is also becoming a worldwide concern, and universities are paying increasing attention to the quality of pedagogical practice in their classrooms to assess how effectively professors are teaching [31].
Student evaluation of teaching effectiveness (SETE) has a long history in higher education and is used by many universities throughout the world as part of their quality assurance [8,22]; yet many researchers have described it as one of the most difficult and contentious tasks, lacking consensus on the attributes of effective teaching and subject to doubts about the validity and reliability of the measures used so far [17,18,19]. Previous studies in regional universities (Arabian Gulf) suggested that many colleges and universities have adopted the American educational model and hence have implemented student evaluation of teaching as an instrument for assessing the effectiveness of their course offerings and evaluating their academic staff [1]. On the other hand, prior research in the area has also suggested that many faculty believe that SETE is simply a "popularity contest and have no relation to effective teaching" [19]. In fact, these authors argued that many faculty believe that factors beyond their direct control, such as the nature of the course and section size, can impact student evaluations of faculty. There is a plethora of factor analysis literature that has attempted to examine the determinant dimensions of SETE [8,13,31]; however, the literature reveals a serious lack of consensus and a heated debate on the attributes and dimensions that constitute SETE [17].
The purpose of this paper is to contribute to the existing body of empirical literature on student evaluation of teaching effectiveness by empirically investigating a number of dimensions of teaching effectiveness and their ratings by college students in the Saudi higher education context.
The rest of the paper is organized as follows: The next section is the literature review and hypotheses development. This is followed by data and methodology. The analysis and discussion of the results follow this. The final section is the summary and conclusion.

Literature Review and Hypothesis
The literature on SETE is substantial, including a comprehensive review of research on the subject by Al-Issa and Sulieman [1], who found 2988 articles on the subject published in highly ranked scholarly journals between 1990 and 2005. Multiple definitions exist in the literature of effective teaching and of the measures that can accurately capture it [17]. Despite its popularity, several studies have indicated that SETE should not be considered the only possible source of information about faculty teaching effectiveness, and "it is certainly not the best source of that information" [10]. The arguments between proponents and opponents have never been conclusive. To demonstrate the paradoxical and inconclusive nature of SETE as a measure of teaching effectiveness, [13] argued that, on one side, students regularly interact with their instructors, so their opinions and evaluations are valid and relevant; on the other side, if students are "carelessly and mindlessly" completing an evaluation form, then SETE is invalid.
The proponents of SETE suggest that it is a source of rich insight into teaching effectiveness and the student learning experience, and that it is a potentially efficacious and authentic indicator of teaching performance as long as the data generated are used as primary rather than generalized [12]. According to Mary [16], SETEs are widely used in tenure and promotion decisions and in assessing merit. Despite the concern about their legitimacy, properly designed SETEs are both valid and reliable measures of faculty teaching effectiveness. Chulkov and Alstine [5] highlighted that SETE's proponents have used the influence of international accrediting bodies such as the Association to Advance Collegiate Schools of Business (AACSB) to promote the use of SETE. According to these authors, AACSB states that business schools should use instructional evaluations as a basis for professional development efforts for individual faculty members and for the faculty in general.
Opponents and critics of SETE argue that students do not have the knowledge or the experience to evaluate the complex dimensions of teaching [1]. Furthermore, although student assessments of their tutors' teaching quality might be used for professional development, they may also be used for punitive purposes such as management control over promotion, professional advancement and dismissal [20]. In another study, Moragan [19] found that many faculty believe that students' evaluations have no relation to effective teaching, and that many factors beyond their control, such as the difficulty of course material and section size, can impact student evaluations of faculty. Crumbley and Reichelt [8] further asserted that empirical evidence has confirmed that student evaluations are misleading and dysfunctional, and that SETE provides incentives for faculty to manipulate their grading and marking of student assessments to enhance their SETE scores. For example, the same study found that 53.5% of faculty surveyed knew of colleagues who had reduced grading standards and course content to improve their SETE scores. To these authors, this represents a "power shift", a move from professors running universities and colleges to students and administrators controlling higher education, where students are simply "punishing difficult professors on their evaluation forms" (ibid: p.382). According to Tanner [30], the current trends in SET have greatly contributed to undermining the teaching profession. Smithson John [29] argued that course evaluation surveys completed by students touch on only a few aspects of course and teaching quality.
Despite the substantial body of literature on SETE, there is little agreement as to which factors should be used to obtain valid and reliable evaluations. Shevlin [28] asserted that "despite the perceived importance of SET, there appears to be little agreement on the nature and number of dimensions/factors that represent teaching effectiveness" (p.398). Table 1 summarizes some of the factors employed by authors in education research in an attempt to measure the attributes of effective teaching.
Despite the wide range of SET factor analysis literature and empirical studies, [28] questioned whether we are really measuring the relevant variables of teaching effectiveness, and how far these measures are valid and reliable for making critical personnel decisions. Marsh [14] suggested that student evaluations designed to measure and reflect effective teaching should be "multidimensional/multifaceted", using multiple factors.
Moreover, d'Apollonia and Abrami [9] explained that the majority of the studies in the area focused on the relationship between ratings in SETE surveys and variables related to student characteristics, lecturer behavior and attributes, and course attributes [28]. Findings from the literature on SET factors and variables analysis revealed diverse and contradictory results. Drawing on the literature discussed above, this study was devised to examine the relationship between SETE ratings and the instructor's personality traits, his/her behavior in marking and grading, his/her knowledge and teaching ability, the course attributes and the course learning outcomes. It is predicted that:
H1. There is a significant relationship between the instructor's personality traits and SETE ratings.
H2. There is a significant relationship between students' perceived fairness in assessment marking, grading and workload and SETE ratings.
H3. There is a significant relationship between the instructor's knowledge and teaching abilities and SETE ratings.
H4. There is a significant relationship between the course attributes and SETE ratings.
H5. There is a significant relationship between students' perception of learning outcomes and SETE ratings.

Data and Methodology
To examine the factors that affect student evaluation of teaching effectiveness at Jubail University College (JUC), data from the course evaluation survey (CES) were analyzed (see appendix 1). JUC is a small university college with two separate campuses for male and female students. In the academic year 2016/2017, there were 3669 students enrolled in JUC's seven academic programs, which offer bachelor degrees in business administration, MIS, computer science, interior design, English language, mechanical engineering and civil engineering.
For our study we used data obtained from the CES conducted during the first and second semesters of the academic year 2015/2016. The survey consisted of 26 specific questions divided into five sections, in which students evaluate their faculty's performance at the start of the course, during the course, after the course, and overall, with the last section (3 questions) giving an opportunity for comments. The responses to 23 questions were measured on a five-point scale, ranging from 1 (strongly disagree) to 5 (strongly agree). The CES questions were regrouped under five factors that were predicted to significantly influence SETE ratings, namely: the instructor's personality traits, his/her behavior in marking and grading, his/her knowledge and teaching ability, the course attributes and the course learning outcomes (see appendix 2). The survey was administered electronically, and all enrolled full-time undergraduate students were required to complete it in pre-designated computer labs. During the first semester of the academic year 2015/2016 there were 2267 students enrolled in full-time courses (preparatory year students are excluded), of whom 1331 were female and 936 were male. A total of 2000 students completed the survey, a response rate of 88%. The number of female respondents was 1160, a response rate of 87%, while 840 male students completed the survey, a response rate of 90%. During the second semester there were 2080 students enrolled, and 1798 responded (86%). Of the 1221 female students, 999 responded (82%); of the 859 male students, 799 responded (93%).
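The response rates quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the counts are those reported in the text, while the function and variable names are our own and are not part of the CES system.

```python
# Enrolment and response counts as reported in the text (academic year 2015/2016).
# Each entry maps (semester, branch) to (enrolled, responded).
counts = {
    ("semester 1", "female"): (1331, 1160),
    ("semester 1", "male"): (936, 840),
    ("semester 2", "female"): (1221, 999),
    ("semester 2", "male"): (859, 799),
}


def response_rate(enrolled, responded):
    """Return the response rate as a percentage."""
    return 100.0 * responded / enrolled


def semester_totals(counts, semester):
    """Sum enrolled and responding students across both branches."""
    enrolled = sum(e for (s, _), (e, _r) in counts.items() if s == semester)
    responded = sum(r for (s, _), (_e, r) in counts.items() if s == semester)
    return enrolled, responded


# Total number of completed evaluations over both semesters.
total_evaluations = sum(responded for _enrolled, responded in counts.values())

for key, (enrolled, responded) in counts.items():
    print(key, round(response_rate(enrolled, responded)), "%")
for semester in ("semester 1", "semester 2"):
    enrolled, responded = semester_totals(counts, semester)
    print(semester, enrolled, responded, round(response_rate(enrolled, responded)), "%")
print("total evaluations:", total_evaluations)
```

The two semester totals add up to the 3,798 evaluations analyzed in the study.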
The study measured 23 attributes of teaching effectiveness, which were administered to students electronically. Twenty-two of these attributes were designed to measure five dimensions of teaching effectiveness: four attributes related to the instructor's personality, measuring traits such as punctuality, commitment, enthusiasm and caring; four attributes related to the instructor's knowledge and teaching ability; six attributes related to course characteristics; four attributes related to the students' perception of the instructor's behavior in assessment marking and grading; and four attributes related to students' perception of the course learning outcomes. The last item, the overall rating of student satisfaction with teaching effectiveness, was treated as the dependent variable and was measured by calculating the overall mean of the scores on the above-mentioned factors. Responses to the items were made on a 5-point Likert scale anchored with (strongly agree = 5), (agree = 4), (neutral = 3), (disagree = 2), and (strongly disagree = 1).
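As an illustration of how the 22 items map onto the five dimensions and how an overall rating can be formed, consider the sketch below. It assumes that the overall rating is the unweighted mean of the five factor means; the text does not state whether items or factors are averaged, so this is one possible reading. The item codes (IP1, IKT1, and so on) follow the labels used in the measurement model, and the single response shown is hypothetical.

```python
from statistics import mean

# Items per factor, following the counts given in the text:
# 4 (IP) + 4 (IKT) + 6 (CA) + 4 (AG) + 4 (LO) = 22 items.
FACTOR_ITEMS = {
    "IP": [f"IP{i}" for i in range(1, 5)],    # instructor's personality
    "IKT": [f"IKT{i}" for i in range(1, 5)],  # knowledge and teaching ability
    "CA": [f"CA{i}" for i in range(1, 7)],    # course attributes
    "AG": [f"AG{i}" for i in range(1, 5)],    # assessment and grading behavior
    "LO": [f"LO{i}" for i in range(1, 5)],    # course learning outcomes
}


def factor_scores(response):
    """Mean Likert score (1 to 5) of the items belonging to each factor."""
    return {
        factor: mean([response[item] for item in items])
        for factor, items in FACTOR_ITEMS.items()
    }


def overall_rating(response):
    """Assumption: overall rating = unweighted mean of the five factor means."""
    return mean(list(factor_scores(response).values()))


# Hypothetical response: "agree" (4) on every item, "strongly agree" (5) on IP items.
response = {item: 4 for items in FACTOR_ITEMS.values() for item in items}
response.update({item: 5 for item in FACTOR_ITEMS["IP"]})

print(factor_scores(response))
print(overall_rating(response))
```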
To measure the reliability of the CES questionnaire we used Cronbach's alpha. According to Chen [4], Cronbach's alpha is one of the most popular methods of assessing reliability. It is useful when the measuring tool has many similar questions or items to which the participant can respond [7]. The value of alpha measures the internal consistency, or homogeneity, of the items. According to Raykov [24], high correlations among the items are associated with a high alpha value [4]. Alpha ranges from 0 to 1, and the higher the alpha, the higher the reliability.
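For readers unfamiliar with the statistic, the sketch below computes Cronbach's alpha from raw item scores using the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. This is not the SPSS procedure used in the study; it is a minimal standard-library illustration, and the five respondents' scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for both the items and the totals.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*rows))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical scores of five students on four items of one factor (1 to 5 scale).
ratings = [
    [5, 5, 4, 5],
    [4, 4, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
]

alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

When every item gives identical scores, the items are perfectly consistent and alpha equals 1; weaker correlation among items lowers the value.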
The reliability was tested using SPSS software, and values were calculated for each of the multi-item factors included in our model. All the items/questions in our CES showed high Cronbach's alpha values, which indicated that our measurements were reliable. Table 2 presents Cronbach's alpha, factor loadings and average variance extracted values for each construct in the research model. The lowest accepted value of Cronbach's alpha is 0.7 [21], even though Robinson [25] argued that a value of 0.6 could be accepted. All Cronbach's alpha values listed in Table 2 are greater than 0.7. Although Cronbach's alpha is a widely used technique and indicator of reliability, it has been criticized for underestimating reliability [3,6,23,24]. The problem of underestimation results from the underlying assumption of Cronbach's alpha that all measured items are equally weighted, that is, that the path coefficients from the latent factor to the measured items are equal. Failure to meet this assumption results in Cronbach's alpha underestimating reliability. Alternatively, Wertz [32] developed composite reliability for assessing the reliability of a set of indicators. Composite reliability relaxes this assumption of Cronbach's alpha and is "a closer approximation under the assumption that the parameter estimates are accurate" [6]; it has thus been considered a more accurate measurement than Cronbach's alpha [11]. Table 2 also shows the reliability assessment using composite reliability. The accepted value for composite reliability is 0.70 or higher [21].
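To make the two additional indices concrete, the sketch below computes composite reliability and average variance extracted from a set of factor loadings. It assumes standardized loadings and uncorrelated measurement errors, so that each item's error variance is 1 − λ²; the loadings shown are hypothetical and are not the values in Table 2.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings.

    CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    with each error variance taken as 1 - loading^2 (standardized solution).
    """
    loading_sum_sq = sum(loadings) ** 2
    error_variance = sum(1 - lam ** 2 for lam in loadings)
    return loading_sum_sq / (loading_sum_sq + error_variance)


def average_variance_extracted(loadings):
    """Average variance extracted: the mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardized loadings for a four-item factor.
loadings = [0.8, 0.8, 0.8, 0.8]

cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(round(cr, 3), round(ave, 3))
```

Unlike Cronbach's alpha, the composite reliability weights each item by its own loading, which is why it does not require the loadings to be equal.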
To test our hypotheses we used SPSS software to run t-tests on the data and calculate the t-values and p-values. If the calculated p-value is < 0.05 the null hypothesis is rejected; otherwise it is not rejected. Figure 1 presents the hypothesized five-factor measurement model for the 22 questions in the CES questionnaire measuring students' evaluation of teaching effectiveness. The instructor's personality (IP) is measured by items IP1 to IP4, the instructor's knowledge and teaching abilities (IKT) by items IKT1 to IKT4, the course attributes (CA) by items CA1 to CA6, the instructor's assessment and grading behavior (AG) by items AG1 to AG4, and the course learning outcomes (LO) by items LO1 to LO4. The thresholds were set at Cronbach's alpha ≥ 0.7 and average variance extracted ≥ 0.6 for the items in each factor. As shown in Table 2, the factor loadings indicate that the questions used in the CES for student evaluation of teaching effectiveness are good indicators of, and help explain, the instructor's personality, his/her knowledge and teaching abilities, the course attributes, the instructor's assessment and grading behavior, and students' perceptions of the course learning outcomes. All the factor loadings and average variance extracted values are positive and relatively high (> 0.70) and are therefore considered statistically significant.
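The decision rule for the hypothesis tests can be written out explicitly. The sketch below is an illustration only, not the SPSS output: the function name is our own, the wording "fail to reject" is the conventional way of describing a non-significant result, and the example p-values are the three reported for the Interior Design department in the results section.

```python
ALPHA = 0.05  # significance level used throughout the study


def decide(p_value, alpha=ALPHA):
    """Apply the decision rule: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p-values reported for the Interior Design department (female branch).
p_values = {"IP": 0.778, "IKT": 0.999, "CA": 0.068}

decisions = {factor: decide(p) for factor, p in p_values.items()}
for factor, decision in decisions.items():
    print(factor, p_values[factor], decision)
```

Note that a p-value exactly equal to 0.05 does not satisfy p < 0.05 and therefore does not lead to rejection under this rule.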

Results Analysis & Discussion
Figure 2 and Figure 3 summarize the p-values for the male and female branches at α = 0.05 for all items used at the end of the second semester. The figures indicate that the p-values are less than α (0.05); therefore, our study rejects all the null hypotheses in favor of the alternative hypotheses stated earlier. This result indicates that there is a significant relationship between SETE and each of the five dimensions used in the evaluation tool.
However, a closer look at the p-values reveals some variation. For example, the analysis of the data on a departmental basis revealed that in particular departments some of the null hypotheses are not rejected.   Civil; and all are greater than α (0.05). The above results indicate that Mechanical and Civil engineering students do not consider the instructor's personality or his/her knowledge of the subject and teaching ability as determinant factors when they rate their instructors in the CES at the end of each semester. In other words, they care about assessments and grading, course characteristics and whether they achieved the course learning outcomes.
The same pattern appears in the t-test results for the Interior Design (ID) department in the female branch (Figure 6). The p-values for the ID department are IP = 0.778, IKT = 0.999 and CA = 0.068 and the rest are null. For the ID department, the null hypothesis is therefore not rejected for three factors; that is, no significant relationship was found between students' ratings in the CES and the above-mentioned factors. The descriptive statistics presented in Table 3 give an overview of the mean scores of all the dimensions/factors of teaching effectiveness as ranked by JUC students.
The statistics in Table 3 show that the instructors were rated highest on their personality attributes by both male and female students, with overall means of 4.120 and 3.913 respectively. This is followed by course attributes (means 4.071 and 3.853), then knowledge and teaching ability (4.015 and 3.829), perception of learning outcomes (3.986 and 3.826), and finally the instructor's assessment and grading behavior (3.985 and 3.781).
Thus, it can be concluded that the students are most impressed with the instructor's personality, followed by course attributes, the instructor's knowledge and teaching abilities, course learning outcomes, and assessment and grading, in that order.
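The ordering stated above follows directly from sorting the factor means. The short sketch below uses the means quoted in the text from Table 3; the dictionary layout and function name are our own, and the factor abbreviations are those of the measurement model.

```python
# Mean factor scores by branch, as quoted in the text from Table 3.
factor_means = {
    "male": {"IP": 4.120, "CA": 4.071, "IKT": 4.015, "LO": 3.986, "AG": 3.985},
    "female": {"IP": 3.913, "CA": 3.853, "IKT": 3.829, "LO": 3.826, "AG": 3.781},
}


def rank_factors(means):
    """Return the factor labels ordered from highest to lowest mean score."""
    return sorted(means, key=means.get, reverse=True)


rankings = {branch: rank_factors(means) for branch, means in factor_means.items()}
for branch, ranking in rankings.items():
    print(branch, ranking)
```

Both branches produce the same ordering, although the gaps between adjacent factors are small, in particular between learning outcomes and assessment and grading in the male branch.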
The descriptive statistics relating to the attributes of teaching effectiveness under each factor/dimension are presented in Table 4. In the male branch, the statistics show that the instructor's commitment was rated highest with an overall mean of 4.202, the attribute related to the 'usefulness' of the course learning outcomes was rated second highest with an overall mean of 4.162, and the attribute "this course helped me to improve my skills in working as a member of a team" was the lowest rated with an overall mean of 3.886. In the female branch, on the other hand, the attribute "my instructor had thorough knowledge of the content of the course" was rated highest with an overall mean of 3.964, the attribute "my instructor/s were enthusiastic about what they were teaching" was rated second highest with an overall mean of 3.940, and the attribute "the grading of tests and assignments in this course was fair and reasonable" was the lowest rated with an overall mean of 3.717.
The results indicate that the SETE ratings were significantly influenced by the students' perception of the instructional dimensions considered in this study. The five-factor analysis, with high factor loadings, average variance extracted and Cronbach's alpha values, suggests that meaningful and reliable factors related to teaching effectiveness were being measured. The values of the factors measured are relatively close, which may seem surprising to the many instructors who predict that their assessment and grading policy is by far the major determinant of their ratings in SETE surveys. However, this finding coincides with the major findings in the literature, as concluded by Shevlin [26], that "good and effective teaching is not a one-dimensional skill, teaching is shown to be multi-dimensional" (p. 403).

Summary and Conclusion
The purpose of this study was to determine the factors affecting students' perceptions of teaching effectiveness and whether instructor and course attributes have a significant influence on the effectiveness of teaching as measured by students' course evaluation surveys. The results derived from the study demonstrate that the CES is a useful and reliable tool for evaluating teaching effectiveness at Jubail University College. The findings suggest that the instructor's personality, knowledge and teaching ability, as well as course attributes, learning outcomes and grading behavior, have a significant relationship with the effectiveness of teaching as perceived by students. Among these factors, the instructor's personality was found to be the most important in the teaching evaluations for both male and female students.
The research design and data analysis are limited to one academic year; the validity and reliability of the results could be enhanced if the data analysis were extended to cover the three preceding years. Moreover, this study has not considered the way students completed the online survey or the level of training they received in filling in the questionnaire.
Despite the limitations, the results of the study should be useful to JUC instructors, academic departments and college administration in better understanding students' course and instructor requirements. For example, course learning outcomes were generally considered low priority by both male and female students. This result should provide useful information to both instructors and academic departments for better management of course syllabi and content. For JUC faculty, there is an evident need to highlight the importance of course learning outcomes as they introduce their courses or specific topics. Academic departments should demonstrate to students that the results of the CES are an integral part of measuring their learning outcomes. The results of this study could also contribute to the growing body of theoretical and empirical literature on students' perception of teaching effectiveness in Saudi higher education.