
2009 International Conference on Engineering Education (ICEED 2009), December 7-8, 2009, Kuala Lumpur, Malaysia

Ingredients for High Citation Index

Hu Sze Yi
School of Engineering
Monash University Sunway Campus
Bandar Sunway, Malaysia
[email protected]

Rajendran Parthiban
School of Engineering
Monash University Sunway Campus
Bandar Sunway, Malaysia
[email protected]

Abstract-A citation is a reference to a book, article, web page, or other published item. A citation index, on the other hand, is an index of citations between publications and can be used as a measure of the quality of a paper. In general, a recently published paper tends to be cited less than a paper published 5-10 years ago. Similarly, a journal paper or a book chapter is cited more than a conference paper. In this paper, we investigate whether any other ingredients contribute to a paper being cited more. The scope of this research is limited to journals in the Institute of Electrical and Electronics Engineers (IEEE) Xplore database. We use two statistical tests (correlation and significance) to identify the ingredients that affect the citation count.

I. INTRODUCTION

In this modern era, the academic community places great emphasis on paper quality. In some cases, the impact factor of a paper is used to decide how much funding a department receives [1]. Similarly, government and private research organizations fund researchers who are likely to produce many highly cited papers.

In this paper, we study the ingredients that contribute to a research paper receiving a high citation count. Background information is first presented in the literature survey.

II. LITERATURE SURVEY

In this section, we describe previous work related to our research, along with the statistical methods used in this paper.

A. Previous Work

Lawrence [3] found that putting papers online can considerably increase the number of citations a paper receives. Cole and Cole [4] discuss the use of citation counts as a measure of quality, including whether all citations can be treated as equal units. They conducted an experiment to determine whether a citation from a first-ranked scientist should be weighted more heavily than a citation from other scientists. The results showed little difference, so all citations can be treated as equal. Based on their experiments, they conclude that straight citation counts can be used with reasonable confidence to judge the quality of a paper. Thus, by identifying the ingredients that affect the citation count, we can also judge the quality of a paper.

978-1-4244-4844-9/09/$25.00 ©2009 IEEE

B. Correlation

As mentioned earlier, we aim to identify the ingredients that affect the citation count. To do so, a correlation test can be carried out. Correlation is an attempt to express the strength of the relationship between two variables as a single number, the correlation coefficient [5]. The correlation coefficient is calculated using Equation (1), where X = (X_1, X_2, X_3, ..., X_N) and Y = (Y_1, Y_2, Y_3, ..., Y_N) represent the variables to be studied and N is the total number of samples.

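$$ r = \frac{N\sum_{i=1}^{N} X_i Y_i - \left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} Y_i\right)}{\sqrt{\left[N\sum_{i=1}^{N} X_i^{2} - \left(\sum_{i=1}^{N} X_i\right)^{2}\right]\left[N\sum_{i=1}^{N} Y_i^{2} - \left(\sum_{i=1}^{N} Y_i\right)^{2}\right]}} \qquad (1) $$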

The correlation coefficient lies between -1 and 1. A value closer to 1 indicates a stronger positive correlation between the variables, a value closer to -1 indicates a stronger negative (inverse) correlation, and a value near 0 indicates little linear relationship.
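Equation (1) can be evaluated directly from the five sums it contains. The short Python sketch below is our own illustration (the paper does not say which software computed r), and the function name pearson_r is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation coefficient r of Equation (1) for paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length N >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One of the variables is constant, so r is undefined.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator
```

For example, pearson_r([1, 2, 3, 4], [2, 4, 6, 8]) returns 1.0, and reversing the second list gives -1.0, the two extremes of the range stated above.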

C. T-Testing

After determining the correlation coefficient, a significance test can be carried out to determine the level of confidence that the correlation holds for real-life data. The correlation coefficient calculated using Equation (1) has a magnitude between 0 and 1, which indicates the strength of the correlation. However, judging whether a given strength is meaningful is subjective. To overcome this problem, a significance test (t-test) is introduced. The t-value is evaluated using the following equation, where r is the correlation coefficient and N is the number of samples [6].


$$ t = \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}} \qquad (2) $$

The t-value is then compared with a t-distribution table to determine the significance of the correlation at a given confidence level. This test requires around 30 observations for the conclusion to be valid [7].
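Equation (2) and the table lookup can be sketched in Python as below. The names t_value and two_sided_confidence are hypothetical. Instead of a printed table, the sketch evaluates the two-sided Student's t probability P(|T| < t) with df = N - 2 degrees of freedom, using the finite series for integer degrees of freedom given in Abramowitz and Stegun (Sec. 26.7). The paper does not say whether it read the table one-sided or two-sided; two-sided is an assumption here.

```python
import math


def t_value(r: float, n: int) -> float:
    """t statistic of Equation (2) for correlation coefficient r over n samples."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def two_sided_confidence(t: float, df: int) -> float:
    """P(|T| < |t|) for Student's t distribution with integer df >= 1.

    Uses the finite series for integer degrees of freedom
    (Abramowitz and Stegun, Sec. 26.7) in place of a printed t-table.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if df % 2 == 0:
        # Even df: sin(theta) * [1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ... up to c^(df-2)]
        term = total = 1.0
        for k in range(1, df // 2):
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        return s * total
    if df == 1:
        return 2 * theta / math.pi
    # Odd df >= 3: (2/pi) * [theta + sin*cos*(1 + (2/3)c^2 + (2*4)/(3*5)c^4 + ...)]
    term = total = 1.0
    for k in range(1, (df - 1) // 2):
        term *= c2 * (2 * k) / (2 * k + 1)
        total += term
    return 2 / math.pi * (theta + s * c * total)
```

For example, two_sided_confidence(2.179, 12) is about 0.95, in line with 2.179 being the two-sided 95% critical value for 12 degrees of freedom in standard t-tables. A one-sided reading of the same t-value would be (1 + p) / 2, where p is the two-sided result.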

III. RESEARCH PROCEDURE

In this section, we present the approach used to identify the ingredients that affect citation count. First, we state two assumptions and justify them. We then describe, in the following subsections, each step of the research procedure shown in Fig. 1.

[Fig. 1 flowchart: Identify Target of Research → Identify Ingredients to Study → Sampling → Storing in Database → Study Papers → Perform Correlation and Significance Test]

Fig. 1: Flowchart of Research Procedure

A. Assumptions

To start this research, two assumptions are made. The first assumption is that journal papers are cited more often than conference papers. To justify this assumption, 5 journal papers and 5 conference papers were collected every 5 years from 1976 to 2005, giving a total of 60 papers. The mean citation count of the journal papers is 319.2333, while that of the conference papers is 7.6333, which supports the first assumption.


The second assumption is that a recently published paper receives fewer citations than a paper published 5 to 10 years ago. To check this, we plot the trend of citation counts for journal papers published from 1996 to 2005, shown in Fig. 2. The figure shows that the average citation count decreases as the year of publication becomes more recent.

[Fig. 2 plot: Average Citation Count Vs Time Frame (x-axis: Time Frame; y-axis: average citation count, 0 to 900)]

Fig. 2: Trend of Citation Count of Journals

B. Target of Research

After justifying the assumptions, we decided to study only journal papers. To minimize variance, journals from the IEEE Xplore database were chosen, and Google Scholar was used to obtain the citation count of each paper. Only the engineering fields of Communications, Controls and Electronics are considered in this paper.

C. Identifying Ingredients

A list of potential ingredients is identified and the justifications for them are as follows:

1) Basic Maths Application: Mathematics and engineering go hand in hand, so an engineering paper generally contains some form of maths. This ingredient is considered present if a paper contains basic maths such as algebraic manipulation, which a person with basic engineering knowledge should be able to follow.

2) Advanced Maths Application: Although engineering papers are expected to contain some basic maths, advanced maths is sometimes needed, because highly technical research is almost impossible to explain using basic maths alone. As a result, the use of advanced maths could indicate the level of technicality of a paper.

3) Modeling Application: Some authors introduce the theory behind their research and then model their findings. Modeling may be an important factor in a paper receiving a high citation index because it enables readers to view a problem at an abstract level.


4) Design Application: In today's modern world, new ideas and inventions are created at a rapid pace, and some researchers present a new design in their papers. Such a design gives readers a different view built on an existing concept, so more people may be interested in reading these papers, improving their chances of being cited.

5) Real Life Data: Any research requires input data, and in certain research areas the use of real data from industry is considered important. If a paper uses random data as the input, the concept brought forward may only be shown to apply to that data, whereas the use of real data shows that the concept is applicable in real life and is therefore more significant.

6) Optimization: This ingredient covers papers that present a new way of optimizing a design. Even if the design itself is not new, optimizing it is a form of advancement in that field and is therefore treated as an ingredient.

7) Simulation: Researchers use simulation to verify a design or an optimization. This checks whether the concept or solution brought forward works without actually building a prototype.

8) Readability: The purpose of writing a research paper is to disseminate the research that has been done. If a paper is easy to understand, its content can reach a wider range of readers, which could also affect its citation index.

9) Experimental Result: If a researcher conducts experiments and presents the findings, the paper arguably carries more weight in its field than a paper without any experiment, because simulation alone is sometimes not reliable enough to verify a concept. A real implementation may involve additional parameters or factors that were not accounted for in a simulation.

10) Catchy Title: The title creates the first impression of any research paper. If the title sounds interesting and catchy, people are generally more inclined to read the paper, increasing the probability that it is cited.

11) Interfields: Research is not confined to a single area, and some research combines more than one engineering discipline. Such a paper can generally reach a wider range of readers because more than one discipline is involved.


D. Data Storage

A database was created using Microsoft Access 2007. It stores the title, authors, citation count, other paper information, year of publication and the ingredients present in each paper. A paper may contain one or more of the ingredients identified in the previous section.
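The authors used Microsoft Access 2007. As a hedged illustration of the same idea, the sketch below uses Python's built-in sqlite3 module with a schema of our own; the table, column and function names are hypothetical. It keeps one row per paper and one row per paper-ingredient pair, so a paper can carry any number of the eleven ingredients. The observations helper counts how many papers in a field contain a given ingredient, which appears to be what the Total Observations columns of the results tables report.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema (not the authors' Access database).
SCHEMA = """
CREATE TABLE paper (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT,
    field TEXT NOT NULL,          -- Communications, Controls or Electronics
    year INTEGER,
    citation_count INTEGER NOT NULL
);
CREATE TABLE paper_ingredient (
    paper_id INTEGER NOT NULL REFERENCES paper(id),
    ingredient TEXT NOT NULL,     -- e.g. 'Design Application'
    PRIMARY KEY (paper_id, ingredient)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_paper(conn: sqlite3.Connection, title: str, authors: str, field: str,
              year: int, citations: int, ingredients: Iterable[str]) -> int:
    """Insert one paper and the ingredients identified in it; return its id."""
    cur = conn.execute(
        "INSERT INTO paper (title, authors, field, year, citation_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, authors, field, year, citations),
    )
    conn.executemany(
        "INSERT INTO paper_ingredient (paper_id, ingredient) VALUES (?, ?)",
        [(cur.lastrowid, ing) for ing in ingredients],
    )
    return cur.lastrowid


def observations(conn: sqlite3.Connection, field: str, ingredient: str) -> int:
    """Number of papers in a field that contain the given ingredient."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM paper p JOIN paper_ingredient pi ON pi.paper_id = p.id "
        "WHERE p.field = ? AND pi.ingredient = ?",
        (field, ingredient),
    ).fetchone()
    return count
```

For example, after adding two placeholder papers in Controls that both contain Simulation, observations(conn, "Controls", "Simulation") returns 2.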

[Fig. 3 plot: Distribution for Communications (x-axis: Citation Count, in bins 1-25, 26-50, ..., 326-350)]

Fig. 3: Distribution for Communications

[Fig. 4 plot: Distribution for Controls (x-axis: Citation Count, in bins 1-25, 26-50, ..., 326-350)]

Fig. 4: Distribution for Controls

[Fig. 5 plot: Distribution for Electronics (x-axis: Citation Count, in bins 1-25, 26-50, ..., 326-350)]

Fig. 5: Distribution for Electronics

E. Data Sampling

We randomly chose a sample of 100 papers for each engineering field (Communications, Controls and Electronics). We require these samples to be normally distributed with respect to citation count, and we ensured that this is the case, as shown in Fig. 3, Fig. 4 and Fig. 5.
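Figs. 3 to 5 group citation counts into fourteen 25-wide bins (1-25 up to 326-350). The Python sketch below (hypothetical names, standard library only) tabulates such a histogram from a list of citation counts. It does not test normality, which the paper judges from the figures. How papers with 0 or more than 350 citations were treated is not stated, so the sketch simply leaves them out.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BIN_WIDTH = 25
NUM_BINS = 14  # 1-25, 26-50, ..., 326-350, as on the x-axes of Figs. 3-5


def bin_label(i: int) -> str:
    """Label of bin i, e.g. bin 0 -> '1-25'."""
    low = i * BIN_WIDTH + 1
    return f"{low}-{low + BIN_WIDTH - 1}"


def citation_histogram(citations: Iterable[int]) -> List[Tuple[str, int]]:
    """Count papers per citation-count bin; counts outside 1-350 are skipped."""
    counts: Counter = Counter()
    for c in citations:
        if 1 <= c <= BIN_WIDTH * NUM_BINS:
            counts[(c - 1) // BIN_WIDTH] += 1
    return [(bin_label(i), counts[i]) for i in range(NUM_BINS)]
```

Feeding in the citation counts of one field's 100 sampled papers would give the bar heights of the corresponding figure.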

F. Data Correlation and Significance Test

Each collected paper was studied, and the software Foxit was used to document the ingredients identified. A correlation test was then performed between the citation count and the number of observations of each ingredient to determine its correlation coefficient. Any ingredient whose correlation coefficient had a magnitude below the chosen threshold of 0.25 was dropped. Because a correlation coefficient alone does not guarantee that an ingredient is significant when applied to real data, a t-test was then carried out to determine the level of confidence.
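The screening step can be sketched as follows, using the Communications values from Table I. This is our illustration, not the authors' code; screen and TABLE_I are hypothetical names. The paper does not state the N used in Equation (2), but back-solving the T-Testing column of Table I gives N = 14 (so that sqrt(N - 2) = sqrt(12)), and that value is assumed here. With it, the sketch reproduces the tabulated t-values to four decimal places and keeps exactly the four ingredients named in Section IV.A.

```python
import math
from typing import Dict, List, Tuple

THRESHOLD = 0.25  # ingredients with |r| below this are dropped (Section III.F)


def screen(results: Dict[str, Tuple[int, float]], n: int) -> List[Tuple[str, float, float]]:
    """Keep ingredients with |r| >= THRESHOLD and attach the Equation (2) t-value.

    results maps ingredient name -> (total observations, correlation coefficient r);
    n is the sample count N used in Equation (2) (assumed, see lead-in).
    Returns (name, r, t) tuples, strongest correlation first.
    """
    kept = []
    for name, (_observations, r) in results.items():
        if abs(r) >= THRESHOLD:
            t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
            kept.append((name, r, t))
    kept.sort(key=lambda item: abs(item[1]), reverse=True)
    return kept


# (total observations, r) copied from Table I (Communications)
TABLE_I = {
    "Maths Application": (88, 0.182867),
    "Adv. Maths Application": (25, -0.00602),
    "Modeling Application": (97, 0.065104),
    "Design Application": (56, 0.478474),
    "Simulation": (84, 0.12055),
    "Experimental": (7, 0.227341),
    "Optimization": (43, 0.075892),
    "Interfield": (11, 0.378412),
    "Real Life Data": (7, 0.307722),
    "Catchy Title": (1, 0.378412),
    "Readability": (11, -0.00943),
}

for name, r, t in screen(TABLE_I, n=14):
    print(f"{name}: r = {r:.4f}, t = {t:.4f}")
```

Sorting by |r| is only a presentation choice; the t-test that follows treats each kept ingredient independently.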

IV. RESULTS AND DISCUSSION

In this section, we present and discuss the results of the correlation and significance tests. Tables I, III, V and VII show the correlation coefficients obtained through the correlation test, together with the t-values from the t-test. Tables II, IV, VI and VIII show the confidence levels of the ingredients that passed the correlation threshold. Since the results are interpreted in the same way for Communications, Controls, Electronics and the Engineering field as a whole, we discuss Communications in detail and the other fields briefly.

A. Communications

Table I shows the correlation coefficients (third column). Based on the threshold of 0.25 set in the previous section, only design application, interfield, real life data and catchy title are deemed to have some correlation with the citation count. The remaining ingredients (maths application, advanced maths application, modeling application, simulation, experimental, optimization and readability) are dropped.

TABLE I. EXPERIMENTAL RESULTS FOR COMMUNICATIONS

| Ingredient | Total Observations | Correlation Coefficient | T-Testing (t-value) |
|---|---|---|---|
| Maths Application | 88 | 0.182867 | 0.644337 |
| Adv. Maths Application | 25 | -0.00602 | -0.02084 |
| Modeling Application | 97 | 0.065104 | 0.226005 |
| Design Application | 56 | 0.478474 | 1.887577 |
| Simulation | 84 | 0.12055 | 0.420667 |
| Experimental | 7 | 0.227341 | 0.808707 |
| Optimization | 43 | 0.075892 | 0.263659 |
| Interfield | 11 | 0.378412 | 1.416166 |
| Real Life Data | 7 | 0.307722 | 1.120342 |
| Catchy Title | 1 | 0.378412 | 1.416166 |
| Readability | 11 | -0.00943 | -0.03267 |

TABLE II. SIGNIFICANCE OF INGREDIENT

| Ingredient | Confidence Level | Number of Observations | Conclusion |
|---|---|---|---|
| Design Application | 90% | 56 | Significant |
| Interfield | 80% | 11 | Insignificant |
| Catchy Title | 75% | 1 | Insignificant |
| Real Life Data | 70% | 7 | Insignificant |

The fourth column of Table I shows the t-values obtained through Equation (2). These t-values are compared with a t-distribution table to obtain the confidence level, and the results are shown in Table II.

The second column of Table II shows the level of confidence that the ingredient is significant when applied to real data. As shown in Table I, only design application, interfield, catchy title and real life data showed some correlation with the citation count. However, of these four ingredients, only design application can be considered significant, at the 90% confidence level. The other ingredients had to be dropped because they did not meet the requirement of 30 observations mentioned in Section II.C. The same interpretation applies to Controls, Electronics and the Engineering field as a whole.

B. Controls

TABLE III. EXPERIMENTAL RESULTS FOR CONTROLS

| Ingredient | Total Observations | Correlation Coefficient | T-Testing (t-value) |
|---|---|---|---|
| Maths Application | 100 | 0.075099 | 0.260889 |
| Adv. Maths Application | 25 | 0.679872 | 3.211575 |
| Modeling Application | 100 | 0.075099 | 0.260889 |
| Design Application | 53 | 0.137444 | 0.48068 |
| Simulation | 47 | 0.074503 | 0.258807 |
| Experimental | 3 | -0.06478 | -0.22486 |
| Optimization | 22 | 0.320701 | 1.172892 |
| Interfield | 2 | -0.10127 | -0.35264 |
| Real Life Data | 0 | 0 | 0 |
| Catchy Title | 1 | 0.172005 | 0.604858 |
| Readability | 5 | 0.05547 | 0.19245 |

TABLE IV. SIGNIFICANCE OF INGREDIENT

| Ingredient | Confidence Level | Number of Observations | Conclusion |
|---|---|---|---|
| Adv. Maths Application | 99.80% | 25 | Plausible |
| Optimization | 70% | 22 | Plausible |

For the field of Controls, advanced maths application and optimization are considered plausible: they do not have the 30 observations needed for a firm conclusion, but each has between 20 and 30 observations. If more papers are added, these ingredients may prove significant.
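Reading Tables II, IV, VI and VIII together, the Conclusion column appears to depend only on the number of observations of an ingredient that has already passed the correlation threshold: 30 or more is labeled significant, 20 to 29 plausible, and fewer than 20 insignificant. The confidence level does not seem to enter (Table VIII, for instance, labels experimental work significant at 65%). This rule is our inference, not one the paper states. The Python sketch below encodes it (conclusion and REPORTED are hypothetical names) and checks it against every row of the four tables.

```python
from typing import List, Tuple


def conclusion(observations: int) -> str:
    """Hypothetical reconstruction of the Conclusion column (an inference)."""
    if observations >= 30:   # enough observations for a valid t-test [7]
        return "Significant"
    if observations >= 20:   # "around 20 to 30 samples" (Section IV.B)
        return "Plausible"
    return "Insignificant"


# (ingredient, number of observations, reported conclusion)
REPORTED: List[Tuple[str, int, str]] = [
    ("Design Application", 56, "Significant"),      # Table II
    ("Interfield", 11, "Insignificant"),
    ("Catchy Title", 1, "Insignificant"),
    ("Real Life Data", 7, "Insignificant"),
    ("Adv. Maths Application", 25, "Plausible"),    # Table IV
    ("Optimization", 22, "Plausible"),
    ("Maths Application", 30, "Significant"),       # Table VI
    ("Experimental", 72, "Significant"),
    ("Adv. Maths Application", 2, "Insignificant"),
    ("Readability", 5, "Insignificant"),
    ("Optimization", 20, "Plausible"),
    ("Catchy Title", 2, "Insignificant"),           # Table VIII
    ("Adv. Maths Application", 52, "Significant"),
    ("Design Application", 171, "Significant"),
    ("Optimization", 85, "Significant"),
    ("Experimental", 82, "Significant"),
    ("Interfield", 13, "Insignificant"),
    ("Readability", 21, "Plausible"),
]

mismatches = [row for row in REPORTED if conclusion(row[1]) != row[2]]
print(f"{len(REPORTED) - len(mismatches)} of {len(REPORTED)} rows match")
```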

C. Electronics

For the field of Electronics, maths application and experimental work show a negative correlation with citation count, which indicates that these ingredients are associated with lower citation counts. Advanced maths application and readability are dropped due to insufficient observations, while optimization is considered plausible at the 65% confidence level.

TABLE V. EXPERIMENTAL RESULTS FOR ELECTRONICS

| Ingredient | Total Observations | Correlation Coefficient | T-Testing (t-value) |
|---|---|---|---|
| Maths Application | 30 | -0.41106 | -1.56202 |
| Adv. Maths Application | 2 | -0.37841 | -1.41617 |
| Modeling Application | 77 | -0.0465 | -0.16125 |
| Design Application | 62 | 0.126137 | 0.44047 |
| Simulation | 36 | -0.12328 | -0.43032 |
| Experimental | 72 | -0.4189 | -1.59808 |
| Optimization | 20 | 0.289477 | 1.047633 |
| Interfield | 0 | 0 | 0 |
| Real Life Data | 11 | -0.07874 | -0.27361 |
| Catchy Title | 0 | 0 | 0 |
| Readability | 5 | -0.37841 | -1.41617 |

TABLE VI. SIGNIFICANCE OF INGREDIENT

| Ingredient | Confidence Level | Number of Observations | Conclusion |
|---|---|---|---|
| Maths Application | 85% | 30 | Significant |
| Experimental | 85% | 72 | Significant |
| Adv. Maths Application | 80% | 2 | Insignificant |
| Readability | 80% | 5 | Insignificant |
| Optimization | 65% | 20 | Plausible |

D. Engineering Field

Another case study was carried out where all the papers studied earlier were grouped together and viewed as a whole under the Engineering field.

TABLE VII. GENERAL EXPERIMENTAL FINDINGS

| Ingredient | Total Observations | Correlation Coefficient | T-Testing (t-value) |
|---|---|---|---|
| Maths Application | 218 | 0.008936 | 0.030956 |
| Adv. Maths Application | 52 | 0.388886 | 1.462239 |
| Modeling Application | 274 | 0.041498 | 0.143878 |
| Design Application | 171 | 0.360857 | 1.340357 |
| Simulation | 167 | 0.062472 | 0.216832 |
| Experimental | 82 | -0.29005 | -1.04989 |
| Optimization | 85 | 0.309418 | 1.127172 |
| Interfield | 13 | 0.290801 | 1.052865 |
| Real Life Data | 18 | 0.222155 | 0.789289 |
| Catchy Title | 2 | 0.405096 | 1.534871 |
| Readability | 21 | -0.27273 | -0.98199 |

TABLE VIII. SIGNIFICANCE OF INGREDIENT

| Ingredient | Confidence Level | Number of Observations | Conclusion |
|---|---|---|---|
| Catchy Title | 85% | 2 | Insignificant |
| Adv. Maths Application | 80% | 52 | Significant |
| Design Application | 75% | 171 | Significant |
| Optimization | 70% | 85 | Significant |
| Experimental | 65% | 82 | Significant |
| Interfield | 65% | 13 | Insignificant |
| Readability | 65% | 21 | Plausible |

The results indicate that advanced maths application, design application and optimization are positively correlated with citation count (correlation coefficients of about 0.39, 0.36 and 0.31 respectively in Table VII). Experimental work, by contrast, shows a negative correlation (about -0.29), suggesting that it is unfavorable for citations.

V. FUTURE WORK

To improve the results of this research, the ingredients deemed plausible should be studied with a larger sample, since they already pass the correlation threshold and fall only slightly short of the 30 observations required, so they could become significant ingredients.

In addition, a different time frame could be considered to see whether this trend holds. Another database, such as that of the Association for Computing Machinery (ACM), could also be studied to see whether the same results are obtained, allowing a more general conclusion to be drawn.

VI. CONCLUSION

In this paper, a list of potential ingredients which might contribute to a paper receiving a high citation count has been identified. A statistical approach was used to narrow the list down to successful ingredients as well as ingredients which have a tendency to be significant.

For the field of Communications, design application is a successful ingredient. For Controls, advanced maths application and optimization are plausible ingredients. In the case of Electronics, optimization is a plausible ingredient.

If all papers are grouped together, advanced maths application, design application and optimization are identified as significant ingredients for receiving a high citation index.

REFERENCES

[1] W. Kuo and J. Rupe, "R-Impact: Reliability-Based Citation Impact Factor," IEEE Transactions on Reliability, vol. 56, pp. 336-367, 2007.

[2] M. K. McBurney and P. L. Novak, "What Is Bibliometrics and Why Should You Care?," Proceedings of the IEEE Professional Communication Conference, pp. 108-114, 2002.

[3] S. Lawrence, "Online or Invisible?," Nature, vol. 411, pp. 521-524, 2001.

[4] J. Cole and S. Cole, "Measuring the Quality of Sociology Research: Problems in the Use of the Science Citation Index," The American Sociologist, vol. 6, pp. 23-29, 1971.

[5] R. E. Walpole, R. H. Myers, S. L. Myers, and K. Ye, Probability & Statistics for Engineers & Scientists, p. 432, 2007.

[6] D. A. Lind, W. G. Marchal, and S. A. Wathen, Statistical Techniques in Business & Economics, pp. 438-439, 2005.

[7] J. Lawson and J. Erjavec, Modern Statistics for Engineering and Quality Improvement, pp. 155-157, 2001.