

Alternative approaches about performance-evaluation in education

Laszlo Pitlik (sen), Laszlo Pitlik (jun), Matyas Pitlik, Marcell Pitlik (MY-X team)

Abstract: Log-based, normative evaluation of Students' performances can be realized in quasi-unlimited ways. The question is, therefore: what is the best rule system/model for evaluating performances? The classic school system works with marks/scores that are mostly entirely subjective. A mark/score is subjective even if the evaluation frame is a test that can be evaluated by robots, because the potential points for a test answer are derived in a subjective way. Nobody checks whether questions carrying the same evaluation points also have the same "challenge" index. The paper demonstrates two different ways to derive a final evaluation from a lot of alternative ones: the way of the antidiscriminative final model and the way of describing the complexity of the rule systems. In parallel, the paper presents a lot of alternative evaluation models that are log-based and objective, yet partially different.

Keywords: benchmarking, ranking, similarity analysis, advanced service design

Introduction

This paper is the newest part of a series about experiences with QuILT-based education processes. Previous articles and their annexes can be downloaded here:

1. https://miau.my-x.hu/miau/quilt/Definitions_of_knowledge.docx + annexes like:
   o https://miau.my-x.hu/miau/quilt/demo_questions_to_important_messages.docx
   o https://miau.my-x.hu/mediawiki/index.php/QuILT-IK045-Diary
   o https://miau.my-x.hu/mediawiki/index.php/Vita:QuILT-IK045-Diary
   o https://miau.my-x.hu/mediawiki/index.php/QuILT-IK059-Diary
   o https://miau.my-x.hu/mediawiki/index.php/Vita:QuILT-IK059-Diary

2. https://miau.my-x.hu/miau/quilt/reality_driven_education.docx + annexes like:
   o https://miau.my-x.hu/miau/quilt/chained-translations-legal-slang.docx
   o https://miau.my-x.hu/miau/quilt/demo_chained_translations.docx
   o https://miau.my-x.hu/miau/quilt/demos_chained_translations.docx
   o https://miau.my-x.hu/miau/quilt/forum_details.docx
   o https://miau.my-x.hu/mediawiki/index.php/QuILT-IK057-Diary
   o https://miau.my-x.hu/mediawiki/index.php/Vita:QuILT-IK057-Diary

3. https://miau.my-x.hu/miau/quilt/Exercises_for_critical_thinking_and_doing.docx
4. https://miau.my-x.hu/miau/quilt/st1_all.docx
5. https://miau.my-x.hu/miau/quilt/20Q.docx
6. https://miau.my-x.hu/miau/quilt/GDP_final_en.doc
7. https://miau.my-x.hu/miau/quilt/st2_all.docx
8. https://miau.my-x.hu/miau/quilt/harmony.docx
9. https://miau.my-x.hu/miau/quilt/safety-index.docx
10. https://miau.my-x.hu/miau/quilt/20q_based_fingerprints_of_words.docx
11. (https://miau.my-x.hu/miau/quilt/alternative_evaluations.docx)


In parallel, there are a lot of spreadsheets supporting the need for details: https://miau.my-x.hu/miau/quilt/?C=M;O=D

Background information of the paper: https://miau.my-x.hu/miau/quilt/my-x%20logstore_standard_log%202019-04-04.xlsx

The Moodle system is capable of logging user activities like viewing, editing, answering, etc. The above-mentioned background spreadsheet contains anonymous data about real activities in a semester, covering 9 weeks (from week 6 to week 14; in the case of the Students, weeks 7-14, because week Nr6 contains activity logs only about the course preparation by the conductors). The number of Students is 12. The logs used here and now concern 1 single subject.
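A minimal sketch of how such a log can be aggregated into per-Student activity counts (the column layout and the rows below are invented for illustration; the real logstore spreadsheet may be structured differently):

```python
from collections import defaultdict

# Hypothetical log records: (student_id, week, action).
log_rows = [
    ("S01", 7, "viewed"), ("S01", 7, "answered"),
    ("S02", 8, "viewed"), ("S01", 9, "edited"),
    ("S02", 9, "viewed"), ("S02", 9, "viewed"),
]

per_student = defaultdict(int)          # total activities per Student
per_student_action = defaultdict(int)   # activities per (Student, action)
for student, week, action in log_rows:
    if 7 <= week <= 14:                 # Student-relevant weeks only
        per_student[student] += 1
        per_student_action[(student, action)] += 1

print(dict(per_student))
```

Counts of this kind are the raw variables from which static and dynamic reports can be built.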

Previous evaluations, made after ca. 2 weeks at the beginning of the semester, are well known to the Students, who have the task (in the frame of the final test of the course) of deriving an evaluation rule set per person about each Student in the course, based on objective log data about Students' performances. The same evaluation task concerning the conductors was also presented to the Students:

https://miau.my-x.hu/miau/quilt/log_students.xlsx
https://miau.my-x.hu/miau/quilt/log_conductors.xlsx

This kind of decision-support challenge is a relevant approximation of service science/design/management based on the principle of "the ocean in a drop", seen from the perspective of Knuth's principle, where knowledge is what can be transformed into source code (c.f. rule systems). Being able to define who is the best Student (and/or conductor) is the basis of improving performances. Without an objective scale for the term "good", it is impossible to estimate the impacts of changes (concerning rule systems, learning/teaching methodologies, or, in general, (advanced) actions for influencing (advanced) service systems).

Students who have neither the possibility nor the motivation to decide about good/better/best (concerning arbitrary objects like persons, learning materials, courses, methods, methodologies, institutions, etc., but especially about persons/themselves) are like teachers of diving courses who never used diving equipment before and only heard/read about its use.

The basic evaluation level (alternative Nr0) could be: everybody will get the needed mark/score; the question is only: when? This teaching-oriented approach ensures that each declared evaluation is positive, because the conductors will not close the cooperation with a bad evaluation for a Student; instead, the conductor will ask/help again and again. A bad evaluation can be interpreted as a sign of an unsuccessful training/teaching session; it means the bad evaluation is rather valid for the conductor's performance and not for the Student's performance. Parallel to the positive and negative evaluations, there is a new option: the option of non-evaluation. Non-evaluation means:

if a Student did not show any rational activity for a mark/score, then the conductor may/should likewise not evaluate the non-existing performance, or

if a Student has some activities, then it is possible to derive a kind of diagnosis (where a diagnosis is not an evaluation, just a status report about variables/parameters needing changes) and to derive appropriate "therapies" (supporting the expected changes);

if the expectations can be realized, then an evaluation will sooner or later become positive in an automated way;

the time needed for changes depends on the characteristics of the explored needs/expectations and on the chosen "therapies", …


The non-evaluation-based co-operation between Students and conductors lets us derive a set of general questions like:

Is a diagnosis correct?
Are the expected changes realizable based on the potential therapies?
Are the impacts (including time aspects) of the potential therapies derivable in advance with high accuracy?

The non-evaluation-based evaluation system can be used if:

it is not necessary to have an evaluation (positive or negative = stigmatization without helping to change) at a given timestamp, and

the common (Student’s and conductor’s) goal is to realize changes based on the diagnosis and through therapies

Alternatives

Students' activities can be described/reported in quasi-unlimited ways. Figures Nr1-2-3 show a few possibilities:

Figure Nr1: static performance reports (source: own presentation)


Figure Nr2: Status reports integrated into a common system (source: own presentation)


Figure Nr3: Dynamic views (source: own presentation)


OAMs (object-attribute matrices) making antidiscriminative similarity analyses possible (Y0: https://miau.my-x.hu/myx-free/coco/index.html) can be derived based on the reports, and based on less complex models too (see Figures Nr5-6-7-8-9):

Figure Nr5: OAM and estimations after two weeks based on the static/aggregated report (source: own presentation)

Figure Nr6: OAM and estimations after all weeks based on the static/aggregated report (source: own presentation)

Figure Nr7: Competitive evaluation of the reports “earlier” and “now” (source: own presentation)


Figure Nr8: Dynamic OAM and its estimations (source: own presentation)

Figure Nr9: Model-aggregation (final model I.) – (source: own presentation)

Figure Nr10: Model-aggregation with dynamic effects (source: own presentation)

The antidiscriminative aggregation of the alternative models

Figure Nr11 demonstrates the parallel (alternative) model estimations and their ranking values concerning each Student:


Figure Nr11: Model-report (source: own presentation)

As can be seen (Figure Nr11), the alternative models deliver different ranking values because the input information being processed is different:

the model_earlier uses just information about the first 2 weeks (see Figures Nr1/5)
the model_now uses information about all weeks (see Figures Nr1/6)
the model_average uses information about the previous models in an integrated model (see Figures Nr2/7)
the model_trend uses information presented as time series (see Figures Nr3/8)
the model_final uses the results of the previous 4 models (see Figure Nr9)
the model_final2 uses the results of the previous 5 models and the dynamic changes based on the model estimations of model_average (see Figure Nr10)
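A minimal rank-averaging sketch of such a final aggregation (the rank vectors below are invented for illustration; the actual aggregation in the paper is carried out by the similarity analysis, not by simple averaging):

```python
# Hypothetical ranking values (1 = best) per Student under three models.
ranks = {
    "model_earlier": {"S01": 1, "S02": 3, "S03": 2, "S04": 4},
    "model_now":     {"S01": 2, "S02": 1, "S03": 4, "S04": 3},
    "model_trend":   {"S01": 1, "S02": 4, "S03": 2, "S04": 3},
}

students = sorted(next(iter(ranks.values())))
# Average the rank positions per Student, then re-rank the averages.
avg = {s: sum(m[s] for m in ranks.values()) / len(ranks) for s in students}
order = sorted(students, key=lambda s: avg[s])   # ties broken alphabetically
final_rank = {s: i + 1 for i, s in enumerate(order)}
print(final_rank)
```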

Figure Nr11 has two so-called final models with relatively different ranks concerning the anonymous Students. The highest difference is 5 units on a ranking scale from 1 to 12.

Yet, the final2 model seems to be the real final model, using each previous input data set and model result. The antidiscriminative modelling tries to deliver a weighting system (staircase-function parameters) where each Student can have the same evaluation value. Each raw and/or calculated/modelled input variable should have a direction. A direction means: the more/less the value of the variable, the higher the evaluation value of the particular Student. In this paper, the following directions were defined:

in case of each primary Moodle activity (unit: piece/event): the higher, the higher
in case of each trend value: the higher, the higher
in case of each model estimation: the higher, the higher
in case of differences between ranking values: the lower, the higher
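These directions can be made explicit before ranking by orienting every variable so that a larger oriented value always means a better position (a sketch with invented variable names and values):

```python
# Direction per variable: True = "the higher, the higher (better)".
directions = {"events": True, "model_estimation": True, "rank_diff": False}

# Hypothetical raw/calculated values for two Students.
students = {
    "S01": {"events": 40, "model_estimation": 1.2, "rank_diff": 3},
    "S02": {"events": 55, "model_estimation": 0.8, "rank_diff": 1},
}

def oriented(value, higher_is_better):
    """Flip the sign where needed so 'larger' always means 'better'."""
    return value if higher_is_better else -value

oriented_rows = {
    sid: {var: oriented(val, directions[var]) for var, val in vals.items()}
    for sid, vals in students.items()
}
print(oriented_rows)
```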

If the model estimations of any evaluation model are not the same, then the Students cannot be seen as parts of an equilibrium. The better ones (with greener cells) can receive their mark at once, or they should be asked less complex/long questions. The Students with more reddish cells (see Figure Nr11) can also receive their marks at once if the absolute levels of the explored competencies are high enough, or they should get more complex questions. In the business-oriented reality, the absolute levels do not play any role; only the relative positions can become information. Therefore, the marks are quasi irrelevant once they are available at all: what matters is solely the difference between the most powerful and the least powerful performance.

The descriptive aggregation of the alternative models

This way of evaluating models is also objective, although the variables are seemingly subjective. But the so-called raw variables (see Moodle activities) are also not holistic enough. It is important to declare: each variable being available and having a direction can be used at once.

The descriptive variables of the above-presented models could be:

amount of the involved weeks (the more the better)


trend effect (yes/no: yes > no)
weeks in the time series (the more the better)
absolute increasing effect (yes/no: yes > no)
relative increasing effect (yes/no: yes > no)
overlapping effect (yes/no: no > yes)
…

(Remark: the RANK function in MS Excel needs the parameter 0 for the direction the-more-the-more; the parameter 1 means: the less, the more.)
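The behaviour of this parameter can be mimicked in a few lines (a sketch of the Excel semantics as described above; ties share the same rank, as in Excel's classic RANK):

```python
def excel_rank(number, ref, order=0):
    """order=0: the largest value gets rank 1 ("the more, the more");
    order=1: the smallest value gets rank 1 ("the less, the more")."""
    if order == 0:
        return 1 + sum(1 for x in ref if x > number)
    return 1 + sum(1 for x in ref if x < number)

values = [10, 30, 20]
print([excel_rank(v, values, 0) for v in values])  # [3, 1, 2]
print([excel_rank(v, values, 1) for v in values])  # [1, 3, 2]
```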

The amount of descriptive variables is theoretically unlimited. The overlapping effects between models are quasi irrelevant if the ranking values of the alternative approaches are different. Effects that seem to overlap in the background are just specific versions of calculated variables.

The descriptive evaluation is also based on similarities; therefore, it is necessary to have objects. These objects are, here and now, the approaches (models) themselves – see Figure Nr12:

Figure Nr12: The competition of the models with and without overlapping effects (source: own presentation)

The ranking values of the approaches including overlapping effects declare model_trend to be the best model. Without overlapping effects, the best model should be the approach "final2", because it has only ranking values of 1.
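The descriptive competition can be illustrated with a simple rank-sum sketch (the attribute values below are invented, not taken from Figure Nr12, and rank-summing is only a simplification of the similarity analysis):

```python
# Descriptive OAM: models as objects; each attribute has a direction
# (True = the-more-the-better) and one value per model (yes/no coded 1/0).
attrs = {
    "weeks_involved": (True,  {"earlier": 2, "now": 9, "trend": 9, "final2": 9}),
    "trend_effect":   (True,  {"earlier": 0, "now": 0, "trend": 1, "final2": 1}),
    "overlapping":    (False, {"earlier": 0, "now": 0, "trend": 0, "final2": 1}),
}
models = ["earlier", "now", "trend", "final2"]

def rank(value, ref, higher_better):
    """1 = best; ties share the same rank."""
    if higher_better:
        return 1 + sum(1 for x in ref if x > value)
    return 1 + sum(1 for x in ref if x < value)

rank_sum = {m: 0 for m in models}
for higher_better, vals in attrs.values():
    ref = list(vals.values())
    for m in models:
        rank_sum[m] += rank(vals[m], ref, higher_better)

best = min(rank_sum, key=rank_sum.get)
print(rank_sum, "best:", best)
```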

Conclusions

The potential problem of the overlapping effects and the real integration force fields of the final model(s) can be seen as a kind of antagonism. The overlapping effect is a kind of word magic (it means a variable with a subjective/instinctive direction), because the derivation of ever newer variables may not be limited, and calculated values having the same ranking profiles in an OAM always have the same impacts. The aggregation of all partial results leads, however, to a higher equilibrium concerning the whole problem.

Assuming that each Student derives an own evaluation model about each Student, the conductors have to integrate each model into a final model, and the estimations of this closing model should be seen as the temporarily best model.


The conductors should decide about the absolute levels (that is: about the existence of the needed competences in the case of a single Student). Students with a lack of data (without any positive signs concerning at least one of the involved variables) might not be seen as persons with relevant competencies. On the other hand, if a Student has e.g. the best relative evaluation, then this person seems to compensate for the above-mentioned lack of performances/experiences. But if somebody cannot swim, then this person cannot be evaluated as a kind of pentathlon expert, because good performances in riding or shooting cannot substitute for the lack of swimming. Therefore, each Student should have trials concerning each variable.

If the Students try to reject this kind of evaluation system, then they could be present at each event, be seemingly active in Moodle, etc. These shadow performances cannot be compensated for through classic performance tests, because the absolute level would in these cases also have to be declared subjectively. Therefore, competition-based (relative) evaluation expects at least one Student wanting to be extremely good, and the evaluation should be multi-layered. The first evaluation layer can be created from the Students who do not have any lacks and who hopefully show rational efforts and results. The second layer can define a kind of selection pressure where Students can get marks if

they have better evaluation values than the lowest evaluation value of the first layer, or
they have at least the norm evaluation value of the first evaluation layer, or
they have better evaluation values than the best evaluation value of the first evaluation layer…
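The three threshold rules can be sketched as a single decision function (assuming, as in the ranking figures, that a LOWER evaluation value means a better position; the function name and the rule encoding are illustrative):

```python
def second_layer_pass(value, first_layer_values, rule="lowest"):
    """rule='lowest': better than the worst value of layer 1;
    rule='norm':   at least the layer-1 average;
    rule='best':   better than the best value of layer 1.
    Lower evaluation values are assumed to be better."""
    if rule == "lowest":
        return value < max(first_layer_values)
    if rule == "norm":
        return value <= sum(first_layer_values) / len(first_layer_values)
    if rule == "best":
        return value < min(first_layer_values)
    raise ValueError(rule)

layer1 = [1.0, 2.0, 4.0]
print(second_layer_pass(3.0, layer1, "lowest"))  # True
print(second_layer_pass(3.0, layer1, "best"))    # False
```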

The selection pressure of the business world is closest to the last option, where a new product and/or service should mostly be better than the best alternative here and now…

The negative spirals of performance-reductive approaches should not be evaluated based on the above-outlined concept. This kind of behaviour pattern should be stopped before evolutive evaluation methods are used. How to stop a negative spiral is not part of this paper…