The Development of Higher Order Thinking Skills-Based Assessment Instrument for Elementary School Integrated Thematic Learning

This study aimed to develop a higher order thinking skills (HOTS)-based test instrument that is theoretically and empirically feasible for elementary school integrated thematic learning. The study followed the Research and Development methodology of Borg and Gall (1983), with a population of elementary school fourth-grade students in Central Lampung selected through purposive sampling. A total of 64 participants took part in the study. The data were collected through questionnaires and tests. The results show that the developed test instrument was theoretically feasible, with an average expert score of 90.14, which fell into the very good category. It was also empirically feasible: a total of 29 questions were valid and internally consistent, with a moderate level of difficulty, good discrimination power, and effective distractors.


Needs analysis and identification of problems
In this first step, several problems were identified: conventional question items made by teachers, teachers' limited understanding of how to construct HOTS-based assessment instruments, and the absence of item analysis. At the same time, some potential supports for conducting this study were also identified, such as the teachers' support and representative school facilities and infrastructure.

Data Collection and Product Planning
At this stage, an in-depth analysis of Curriculum 2013, the curriculum applied at the school, was carefully carried out, followed by analyzing the basic competences, constructing a HOTS grid, choosing a stimulus, writing the questions, preparing an answer key, and constructing a scoring guideline.

Preliminary Product Design
A preliminary product design, or prototype, was made at this stage. It was developed based on the concept of HOTS by referring to the grid that had been constructed.

Product Design Validation
At this stage, the product prototype was assessed and evaluated by experts using a questionnaire to determine the feasibility of the instrument. The validators consisted of assessment experts, material development experts, and a linguistics expert. The experts' advice and suggestions were then used to revise Prototype I. The feasibility score was obtained using the following formula.
Final Score = (score obtained / maximum score) × 100
The final score was then converted into a qualitative category as shown in the category table.
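The percentage conversion above can be sketched in Python. The category cut-off points below are illustrative assumptions for this sketch, not values taken from the study's own (unshown) conversion table:

```python
def final_score(score_obtained: float, max_score: float) -> float:
    """Convert a raw expert rating to a 0-100 feasibility score:
    Final Score = (score obtained / maximum score) x 100."""
    return score_obtained / max_score * 100

def category(score: float) -> str:
    """Map a 0-100 score to a qualitative category.
    These cut-offs are assumptions for illustration only."""
    if score > 85:
        return "very good"
    if score > 70:
        return "good"
    if score > 55:
        return "fair"
    return "poor"

# With these assumed cut-offs, the reported average expert score of
# 90.14 falls into the "very good" category.
print(category(final_score(90.14, 100)))  # very good
```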

Product Revision
The product revision was carried out based on the experts' suggestions. After the changes were made, Prototype II was developed, followed by the main field test.

Small Classroom Experiment
At this stage, Prototype II was tried out at Public Elementary School 3 Swastika Buana in Central Lampung with a sample of 20 (twenty) students, in order to determine the validity, reliability, level of difficulty, discrimination power, and distractor effectiveness of each item. The questions found to be valid were retained, and the invalid ones were dropped. After this revision, Prototype III was developed and then tested in a larger classroom.
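The item analysis described above can be illustrated with a short classical-test-theory sketch for the difficulty and discrimination indices. The response matrix and the top-half/bottom-half grouping rule are illustrative assumptions, not the study's data or exact procedure:

```python
# Classical item analysis sketch: difficulty (p) and discrimination (D).
# 1 = correct, 0 = wrong; rows = students, columns = items (invented data).
responses = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def difficulty(item: int) -> float:
    """Proportion of students answering the item correctly (p-value)."""
    col = [row[item] for row in responses]
    return sum(col) / len(col)

def discrimination(item: int) -> float:
    """Upper-group minus lower-group proportion correct (D index),
    using a simple top-half/bottom-half split for this sketch."""
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper = sum(row[item] for row in ranked[:half]) / half
    lower = sum(row[item] for row in ranked[-half:]) / half
    return upper - lower

# Conventionally, 0.30 <= p <= 0.70 is a "moderate" difficulty and
# D >= 0.40 is "good" discrimination.
for i in range(4):
    print(f"item {i}: p = {difficulty(i):.2f}, D = {discrimination(i):.2f}")
```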

Large Class Experiment
At this stage, Prototype III was administered to 44 students at the same elementary school. The test results were then analyzed to determine the validity, reliability, level of difficulty, discrimination power, and distractor effectiveness of each item. Prototype IV was then developed; this was the final product of the development of the test instrument.
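Internal-consistency reliability for dichotomously scored items is commonly estimated with the Kuder-Richardson 20 (KR-20) coefficient; the study does not state which formula it used, so this is a general sketch with invented data:

```python
# KR-20 reliability sketch for dichotomous (right/wrong) items.
# Rows = students, columns = items; the matrix is invented for illustration.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def kr20(matrix):
    """KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)."""
    k = len(matrix[0])                      # number of items
    n = len(matrix)                         # number of students
    totals = [sum(row) for row in matrix]   # each student's total score
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n  # population variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n  # proportion correct on item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

print(f"KR-20 = {kr20(responses):.2f}")
```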

Results and Discussion
Based on the results, it was found that the question items made by the teachers were not in line with the demands of 21st-century skills. The teachers did not review the materials prior to exams; they neither constructed HOTS-based assessment instruments nor carried out item analysis. Theoretically, however, assessment needs to be carried out by teachers to measure how far students have comprehended the learning materials delivered (Hosnan, 2014), and the results of the assessment can be used to determine the students' competence and learning achievement (Kankam Boadu et al., 2015). The findings of this study are in line with those of Nova et al. (2016) and Budiman and Jaelani (2014), namely that an assessment instrument needs to be tested in order to establish its theoretical and empirical feasibility. The theoretical feasibility test was carried out by three groups of experts: assessment experts, material development experts, and linguistics experts. To establish the empirical feasibility, the instrument was administered to students, and the test results were analyzed for validity, reliability, level of difficulty, discrimination power, and the effectiveness of the multiple-choice distractors. Novitasari et al. (2015) likewise explain that an assessment instrument needs to be tested in order to establish its theoretical and empirical feasibility.
The development of the test instrument in this research refers to Borg & Gall (1983) with the following steps.
3.1 Needs Analysis and Identification of Problems
Several problems were identified: conventional question items made by teachers, teachers' limited understanding of how to construct HOTS-based assessment instruments, and the absence of item analysis. At the same time, some potential supports for conducting this study were also identified, such as the teachers' support and representative school facilities and infrastructure.

Data Collection and Product Planning
An in-depth analysis of Curriculum 2013, the curriculum applied at the school, was carefully carried out, followed by analyzing the core and basic competences.

Table 2. Core and Basic Competences
Core Competence
3. Understanding factual and conceptual knowledge by observing and asking questions based on curiosity, God's creatures and activities, and objects that are found at home, at school, and at the playground.
Basic Competences
3.4 Connecting forces with motion in environmental events (natural sciences)
3.5 Identifying economic activities and their relationship with a variety of professions as well as social and cultural lives in the surrounding area up to the province level (social sciences)
3.6 Comprehending fictional characters (Indonesian language)
3.7 Having knowledge of local dance motions (arts)
3.8 Explaining the benefits of various individual characteristics in daily life (civic education)

This was then followed by constructing a HOTS grid, choosing a stimulus, writing the questions, preparing an answer key, and constructing a scoring guideline.

Preliminary Product Design
A preliminary product design, or prototype, was made at this stage, based on the concept of HOTS and referring to the grid that had been constructed. Prototype I was then developed.

Product Design Validation
At this stage, the product prototype was assessed and evaluated by experts using a questionnaire to determine the feasibility of the instrument. The validators consisted of assessment experts, material development experts, and linguists. The experts' advice and suggestions were then used to revise Prototype I and to confirm that the design of the test instrument was feasible.

Table 3. Experts' Advice and Revision Results

Assessment Expert's Advice:
1. Questions with "except" statements must be underlined or typed in bold.
2. The use of the preposition "at" must be adjusted when it refers to a place or accompanies a verb.
3. Questions must be adjusted to the HOTS indicators.
4. The answer choices must vary and not use repeated words.

Material Expert's Advice:
1. The indicators of the formulated questions should be richer than those of the basic competence.
2. The questions should be adjusted to the HOTS characteristics.
3. The questions should be adjusted to the students' or school's location.
4. The distribution of questions should be adjusted to the related material or basic competence indicators.
5. A measure of the question demand is needed.
6. A rational distribution of questions is also required.

Linguistic Expert's Advice:
1. The use of the prepositions "at" and "to" should be in accordance with standardized Indonesian.
2. The writing of the answer choices should be adjusted: if a choice begins a sentence, it starts with a capital letter; if it ends one, it closes with a period (.).
3. Proper names should be capitalized.
4. Imperative sentences should end with an exclamation mark (!).

For every item of advice above, the revision result was "as advised".
The results of the experts' validation fell into the very good category, as shown in Table 4 below.

Questions               Total   Description
2, 4, 5, 10, 11         5       Valid (r_value > r_table)
1, 3, 6, 7, 8, 9, 12    7       Invalid (r_value < r_table)
7, 8, 9, 12                     Poor

Each item in Prototype II was analyzed. Thirty multiple-choice questions and five essay questions were considered empirically feasible because they proved to be valid and reliable, with a moderate level of difficulty and good discrimination power. The effectiveness of the distractors also proved to be good. After Prototype II was revised, Prototype III was developed and then tested in a larger class.
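The valid/invalid decision above (r_value compared against r_table) is typically based on an item-total Pearson correlation. A minimal sketch, with an invented score vector and an assumed critical value (not the study's actual r_table):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def is_valid(item_scores, total_scores, r_table):
    """An item is declared valid when r_value > r_table."""
    return pearson_r(item_scores, total_scores) > r_table

# Invented data: one item's scores (1/0) and the students' total scores.
item = [1, 1, 0, 1, 0, 0, 1, 0]
total = [9, 8, 4, 7, 5, 3, 8, 2]
R_TABLE = 0.444  # assumed critical r for this sketch, not the study's value
print(is_valid(item, total, R_TABLE))  # True
```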

Conclusion
Based on the results and discussion, it can be concluded that the final product of this study is a HOTS-based assessment instrument that is theoretically and empirically feasible for the integrated thematic learning of elementary school fourth-grade students. The feasibility of the instrument was established through the experts' evaluation and classroom try-outs. The instrument is theoretically feasible because it was validated by the assessment, material, and linguistics experts, with results falling into the very good category. The multiple-choice and essay questions are empirically feasible because they were tested in classrooms: the test results proved to be valid and highly reliable, with a moderate level of difficulty and good discrimination power. The effectiveness of the multiple-choice distractors also proved to be good.