Alternative bibliometrics from the web of knowledge surpasses the impact factor in a 2-year ahead annual citation calculation: Linear mixed-design models' analysis of neuroscience journals
Correspondence Address: Source of Support: None, Conflict of Interest: None DOI: 10.4103/0028-3886.222880
Context: The decision about which journal to choose for the publication of research deserves further investigation.
Keywords: Algorithms, bibliometrics, citation, impact factor, predictive value
In the process of writing a scientific article, submission to a journal is considered the last primary stage in publishing a manuscript. At that stage, researchers very commonly look for journals with the highest impact factor (IF) rather than journals with the best audience for their research. The IF is part of a set of bibliometric tools used by the Web of Knowledge (WOK), managed by Thomson Reuters. The WOK encompasses six additional metrics: 5-year impact factor, immediacy index, number of articles (published), cited half-life, Eigenfactor™ score (ES), and article influence score. The IF has become an increasingly used metric for decisions on tenure and promotion, as well as for budget and resource planning within universities, research institutions, and colleges. However, this behavior runs counter to the opinions accumulated year after year (including views published in high-IF journals such as Nature) stating that the IF is not a perfect metric and has its limitations. As early as 1997, the IF was considered far from being a quality indicator and was not recommended for evaluating research. The IF is not associated with factors such as the quality of the peer review process or the quality of the journal's content. It is a measure that reflects the relative importance of a journal within its field and quantifies the frequency with which an “average article” in a journal has been cited in a particular period. In 2014, the American Society for Cell Biology, along with journal editors, publishers, and other stakeholders, issued a pledge to move away from an over-reliance on the journal IF and to seek new ways of assessing research output. A similar opinion appeared in 2017 in an editorial in BMC Medicine.
Some publications consider that alternative bibliometrics in the WOK, such as the ES, can assess the real dissemination of an article (i.e., its use as well as the category of journals that include it in their reference lists). Thus, the ES would capture the prestige of a journal better than the IF. A recent study using linear mixed models found that the ES and cited half-life were the two most significant metrics for predicting citation rates in a 2-year-ahead citation calculation in gastroenterology and hepatology journals.
To test whether the findings mentioned above apply to other fields, we aimed to evaluate the ability of the bibliometric tools included in the WOK to predict the annual total cites of journals in the Neurosciences category over a 7-year period. Our findings might help authors answer two questions: which journals within my field should I select for the submission of my manuscripts, and which bibliometrics should I use to rank my preferred journals for submission?
This was a retrospective study that did not require approval by the Institutional Review Board. We selected the bibliometrics from the WOK that are used to evaluate journals' performance in the Journal Citation Reports (JCR). The JCR includes medical journals that are categorized in the Science Citation Index (SCI) and the SCI Expanded (SCIE). The WOK website allowed us to record bibliometric values over a 7-year period. Bibliometrics were defined as follows:
Selection of journals and measured periods
We recorded the bibliometric values of 275 journals in the Neurosciences category of the JCR Science Edition [Table 1]. We chose this group because journals in this field have published several of our articles. For the analysis, only journals that appeared in every JCR Science Edition from 2007 to 2013 were included.
We recorded the bibliometric values offered by the WOK, comprising five sets of bibliometrics for each journal and each selected year, each matched to the journal's total citations 2 years ahead:
Sample size calculation
We followed the recommendation of Tabachnick and Fidell for multivariate normality: we included seven measurements (for each of the 275 journals) because, even with unequal n and only a few dependent variables (DVs), a sample size of about 20 in the smallest cell ensures robustness in the analysis.
Design of the mixed-effects model
Our predictive model of 2-year-ahead total cites combined the overall effect of the bibliometrics while taking into account within-journal cites in five repeated measures. The use of random slopes and intercepts allowed us to test the hypothesis that alternative bibliometrics of the WOK surpass the IF as predictors in a total-cites calculation. Our mixed model comprised a three-level hierarchy: level 1, selected bibliometrics (continuous variables) consecutively measured during five years; level 2, the year of citation; and level 3, the journal ID. [Figure 1] shows the three-level hierarchical model; it is evident how levels 1 and 2 are nested within each journal.
Justification for multilevel modeling
Experts consider that multilevel modeling is needed when a high intraclass correlation coefficient (ICC) points to variation in the mean level of a selected bibliometric across time (year-over-year data). The ICC was used to estimate the variance of each selected bibliometric that occurs both across journals and across times of measurement.
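To make the ICC concrete, the following minimal sketch computes a one-way random-effects ICC for a single bibliometric measured repeatedly per journal. The text does not say which ICC form was used, so the one-way form, the function name `icc_oneway`, and the sample values are all assumptions made for illustration.

```python
from statistics import mean

def icc_oneway(groups):
    """One-way random-effects ICC for equal-sized groups.

    groups: list of lists, one inner list per journal holding that
    journal's yearly values of a single bibliometric.
    """
    n = len(groups)        # number of journals
    k = len(groups[0])     # measurements per journal
    if any(len(g) != k for g in groups):
        raise ValueError("all journals need the same number of years")
    grand = mean(v for g in groups for v in g)
    # Between-journal and within-journal mean squares (one-way ANOVA).
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical impact factors for three journals over four years
# (illustrative numbers, not data from the study).
impact_factors = [
    [2.1, 2.2, 2.0, 2.3],
    [7.9, 8.1, 8.0, 8.2],
    [14.8, 15.1, 15.0, 15.3],
]
icc = icc_oneway(impact_factors)
```

A value close to 1 means that almost all of the variance lies between journals rather than within a journal over time, which is the situation in which a multilevel model is usually preferred.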
Selection of dependent and independent variables
The 2-year-ahead total cites was the dependent variable (DV). The independent variables (IVs) comprised seven continuous variables (impact factor, 5-year impact factor, immediacy index, number of articles, cited half-life, Eigenfactor score, and article influence score) and one categorical variable, the year of measurement (time-set of citations).
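As an illustration of how such a DV can be assembled, the sketch below pairs each journal-year's predictors with the total cites recorded two years later. The record layout, the field names, and the function `pair_with_future_cites` are hypothetical; the study does not describe its data preparation at this level of detail.

```python
def pair_with_future_cites(records, lag=2):
    """Pair each journal-year's predictors with total cites `lag` years later.

    records: dict mapping (journal, year) -> dict of metrics that
    includes a "total_cites" entry.
    Journal-years without a matching future year are skipped.
    """
    rows = []
    for (journal, year), metrics in sorted(records.items()):
        future = records.get((journal, year + lag))
        if future is None:
            continue
        # Keep the predictors, drop the same-year total cites.
        predictors = {k: v for k, v in metrics.items() if k != "total_cites"}
        rows.append({"journal": journal, "year": year, **predictors,
                     "total_cites_ahead": future["total_cites"]})
    return rows

# Hypothetical values for one journal (not data from the study).
records = {
    ("Journal A", 2007): {"impact_factor": 3.1, "eigenfactor": 0.020, "total_cites": 5000},
    ("Journal A", 2008): {"impact_factor": 3.3, "eigenfactor": 0.021, "total_cites": 5400},
    ("Journal A", 2009): {"impact_factor": 3.2, "eigenfactor": 0.022, "total_cites": 5900},
}
rows = pair_with_future_cites(records)
```

With a lag of two years, seven annual editions (2007 to 2013) give five predictor years per journal (2007 to 2011), since the last two editions have no future value to pair with.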
Mixed-model effects analysis
We used maximum likelihood (ML) estimation. ML is an appropriate approach for studying individual changes as well as the fixed and random effects of the entire model. With ML, we fitted a hierarchical model with repeated measures nested within journals across six consecutive years. To specify the within-individual error covariance structure that best fit the data, we evaluated the four most common covariance matrix types (unstructured, scaled identity, compound symmetry, and diagonal). The fit of the model was assessed with the −2 log likelihood, Akaike's Information Criterion (AIC), and the Bayesian Information Criterion (BIC).
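The three fit statistics are related by simple formulas, sketched below using the textbook definitions (statistical packages can differ in how they count parameters and observations, so the values need not match SPSS output exactly). The log-likelihoods, parameter counts, and model names are toy inputs chosen for illustration only.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (-2 log likelihood, AIC, BIC); smaller values mean better fit."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    return deviance, aic, bic

# Toy candidates: name -> (log-likelihood, number of estimated parameters).
candidates = {
    "model_a": (-520.0, 12),
    "model_b": (-515.0, 20),
}
n_obs = 100  # toy number of observations

scores = {name: information_criteria(ll, k, n_obs)
          for name, (ll, k) in candidates.items()}
# Pick the candidate with the smallest BIC (index 2 of each tuple).
best_by_bic = min(scores, key=lambda name: scores[name][2])
```

Both criteria add a penalty for extra parameters to the −2 log likelihood, so a richer covariance structure is preferred only when it improves the likelihood by more than its penalty.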
Our model considered the fixed effects of the selected IVs and a random effect for the repeated measures to characterize the variation caused by individual differences. A scatter plot depicted the fit of the predicted versus the observed values of total cites for the whole model, labeled by subgroup (year of cites). A linear regression (LR) analysis calculated the R² and P values.
Measure of the effect size
The effect size was defined as the proportion of the variance in the DV explained by the IVs and was interpreted with the R thresholds proposed by Cohen: 0.10 to 0.29, small effect; 0.30 to 0.49, moderate effect; and ≥0.50, large effect. We computed a pseudo-R² (the squared correlation between the observed and predicted scores) as a global effect size statistic.
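The pseudo-R² described above can be sketched as follows: correlate observed and predicted scores, square the result, and label the correlation with the thresholds given in the text. The function names, the "negligible" label for values below 0.1, and the sample scores are assumptions added for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cohen_label(r):
    """Label |r| using the thresholds quoted in the text."""
    r = abs(r)
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "moderate"
    if r >= 0.1:
        return "small"
    return "negligible"  # label for r < 0.1 is an assumption

# Hypothetical observed and model-predicted total cites.
observed = [1200, 3400, 560, 9800, 150]
predicted = [1180, 3500, 600, 9700, 200]

r = pearson_r(observed, predicted)
pseudo_r2 = r ** 2  # squared correlation between observed and predicted
label = cohen_label(r)
```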
Evolution of IF and ES over time
We plotted the evolution of the IF and ES (separately) in the Neurosciences category from 2007 to 2013 by sorting the journals into five groups based on their values of IF (level 1, 0 to 6.0; level 2, 6.1 to 12.0; level 3, 12.1 to 18.0; level 4, 18.1 to 24.0; level 5, >24.0) and ES (level 1, 0 to 0.090; level 2, 0.091 to 0.180; level 3, 0.181 to 0.270; level 4, 0.271 to 0.360; level 5, >0.361), and performed a split-plot factorial analysis of variance (ANOVA).
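The grouping step can be sketched as a lookup against the upper bound of each level. The helper name `level_for` is hypothetical, and the handling of values that fall between the stated bounds (for example, an ES between 0.360 and 0.361) is an assumption: this sketch assigns them to the higher level.

```python
from bisect import bisect_left

# Upper bounds of levels 1 to 4, taken from the ranges in the text;
# anything above the last bound falls into level 5.
IF_UPPER_BOUNDS = [6.0, 12.0, 18.0, 24.0]
ES_UPPER_BOUNDS = [0.090, 0.180, 0.270, 0.360]

def level_for(value, upper_bounds):
    """Return the 1-based level whose range contains `value`."""
    # bisect_left counts the bounds strictly below `value`, so a value
    # equal to a bound stays in the lower level.
    return bisect_left(upper_bounds, value) + 1

# Hypothetical journals (illustrative values only).
if_levels = [level_for(v, IF_UPPER_BOUNDS) for v in (3.2, 6.0, 6.1, 30.0)]
es_levels = [level_for(v, ES_UPPER_BOUNDS) for v in (0.02, 0.091, 0.4)]
```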
Displacement of the ranking place
We drew a line graph to show how the top 25 journals initially ranked by the IF were re-ordered based on their ES values. All analyses were carried out using IBM SPSS software (version 22.214.171.124, IBM Corporation, Armonk, NY, USA). Statistical significance was indicated by P < 0.05 (two-tailed).
Justification for multilevel modeling
We found significant ICC values in all selected bibliometrics: impact factor (ICC = 0.993, P < 0.001); 5-year impact factor (ICC = 0.989, P < 0.001); immediacy index (ICC = 0.970, P < 0.001); number of articles (ICC = 0.992, P < 0.001); cited half-life (ICC = 0.985, P < 0.001); Eigenfactor score (ICC = 0.998, P < 0.001); and article influence score (ICC = 0.998, P < 0.001).
Overall fit for models
The diagonal matrix type represented the best model by depicting the smallest values in the information criteria table. [Table 2] shows the assessment of the overall fit of the multivariate models. [Figure 2] shows graph lines identifying each selected journal in the final model; the existence of random slopes and intercepts is evident.
Significant predictors of total cites and beta coefficients of the regression model
The by-subject random slopes and intercepts for the effect of repeated measures (time) showed a substantial effect (Wald Z = 6.331, P < 0.001) for four IVs: immediacy index, number of articles, cited half-life, and Eigenfactor score. The impact factor, 5-year impact factor, and article influence score were not significant. [Table 3] shows the main effects for the selected model.
The largest coefficient corresponded to the ES, followed by the cited half-life (CHL); both variables had a positive, meaningful effect on the outcome. [Table 4] shows the unstandardized beta coefficients for each variable and their 95% confidence intervals.
A linear equation representing our assembled model was structured as:
Total cites (number) = −1063.40 + 51.95 (impact factor) + 65.37 (5-year impact factor) + 257.59 (immediacy index) + 3.22 (number of articles) + 899.57 (cited half-life) + 59690.66 (Eigenfactor score) + 191.37 (article influence score)
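The equation can be applied directly, as in the minimal sketch below. The coefficients are those reported in the equation; the dictionary keys, the function name `predict_total_cites`, and the example journal are hypothetical. The sketch covers only the fixed-effects part of the model and ignores any journal-specific random slopes and intercepts.

```python
# Coefficients copied from the reported linear equation.
INTERCEPT = -1063.40
COEFFICIENTS = {
    "impact_factor": 51.95,
    "five_year_impact_factor": 65.37,
    "immediacy_index": 257.59,
    "number_of_articles": 3.22,
    "cited_half_life": 899.57,
    "eigenfactor_score": 59690.66,
    "article_influence_score": 191.37,
}

def predict_total_cites(metrics):
    """Predicted 2-year-ahead total cites from the fixed effects only."""
    return INTERCEPT + sum(COEFFICIENTS[name] * metrics[name]
                           for name in COEFFICIENTS)

# Hypothetical journal (values invented for illustration).
example_journal = {
    "impact_factor": 3.0,
    "five_year_impact_factor": 3.5,
    "immediacy_index": 0.5,
    "number_of_articles": 200,
    "cited_half_life": 6.0,
    "eigenfactor_score": 0.02,
    "article_influence_score": 1.2,
}
predicted = predict_total_cites(example_journal)
```

Because the predictors are on very different scales (an ES is typically a small fraction, whereas the number of articles can be in the hundreds), the raw size of a coefficient should be read together with the typical range of its variable.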
Global effect size of the model
The global effect size showed an R² value of 0.999 (P < 0.001); this value corresponded to a large effect size. [Figure 3] represents the regression line between the observed and predicted values for total cites.
Evolution of IF and ES over time
There was no significant interaction between the IF or ES of the journals when grouped by a five-level rank. The IF showed an increasing trend from 2008 to 2011, with a plateau from 2011 to 2013 (for journals with an IF > 24). Conversely, the ES showed a decreasing trend from 2008 to 2013 (for journals with an ES > 0.361). [Figure 4]a and b show the IF and ES patterns.
Comparison of journals' ranking using IF, ES, and CHL
Only 13 of the 25 journals ranked in the top 25 by the IF stayed in this group after they were re-sorted by decreasing values of ES, with striking differences in their new positions. For example, J Neurosci climbed from 24th to 1st place, and Mol Psychiatr dropped from 5th to 24th. An additional re-sorting using the CHL showed that only 2 of the first 25 journals stayed in the top 25 places. [Figure 5] depicts the displacements in rankings based on decreasing values of IF, ES, and CHL.
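The re-sorting comparison can be sketched as two rankings of the same journals followed by a set intersection. The function names, field names, and the four toy journals are hypothetical and serve only to show the mechanics.

```python
def top_n(journals, metric, n):
    """Names of the n journals with the highest value of `metric`."""
    ranked = sorted(journals, key=lambda j: j[metric], reverse=True)
    return [j["name"] for j in ranked[:n]]

def retained_in_top(journals, first_metric, second_metric, n):
    """Journals that are in the top n under both metrics, sorted by name."""
    first = set(top_n(journals, first_metric, n))
    second = set(top_n(journals, second_metric, n))
    return sorted(first & second)

# Toy journals (invented values, not data from the study).
journals = [
    {"name": "A", "impact_factor": 30.0, "eigenfactor": 0.05},
    {"name": "B", "impact_factor": 15.0, "eigenfactor": 0.40},
    {"name": "C", "impact_factor": 10.0, "eigenfactor": 0.20},
    {"name": "D", "impact_factor": 5.0, "eigenfactor": 0.01},
]
kept = retained_in_top(journals, "impact_factor", "eigenfactor", n=2)
```

In this toy set, the journal with the highest IF drops out of the top two once the list is re-sorted by ES, which mirrors the kind of displacement described above.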
Active investigators not only will remain apprised of the latest discoveries in their field but also will improve their understanding of which audience might be more interested in a particular research topic, ideally before a manuscript submission. Recently, journals from diverse fields have been publishing reports about the apparent limitations of the IF. It has been accepted that neither the IF nor the total number of citations is, per se, the metric of the overall influence of a journal. All journals have a diverse set of citations, and even the best publications contain some papers that are never cited. That is, citations are not equally distributed, with fewer than 20% of the articles accounting for more than 50% of the total number of citations. Despite these facts, the misuse of the journal IF for judging the value of science persists because it confers significant benefits on individual scientists and journals. In this scenario, becoming familiar with bibliometric indices is a substantial concern for individual researchers because these indices may influence their teaching and research activities. Nowadays, measuring the impact of publications is mandatory not only for the individual but also for universities and research institutes; thus, the prestige and ranking of journals and institutions are closely linked.
In this study, we considered the “2-year-ahead total cites” as our DV. Some readers might ask why that metric is not used directly to gauge the reputation of a journal; we did not use it because, at the time (chosen year) a researcher makes a bibliometric analysis of their target journals, the total cites for the years ahead are unknown. Our finding that four metrics (cited half-life [CHL], immediacy index [ImIn], ES, and number of articles [AN]) have significant predictive ability for a total-cites calculation 2 years ahead matches the results of a similar study evaluating bibliometrics in the Gastroenterology and Hepatology category.
Although our results might seem obvious, we were compelled to assess the predictive ability of the same bibliometrics in Neuroscience journals, considering how strikingly IFs differ between specialties. For example, a journal in the oncology field might have an IF up to 30 times as high as the corresponding figure in the forensic medicine category.
One explanation for our findings starts from the observation that the IF, although a per-article measure, has not been replaced by any other means of rating the quality of journals. Most people still use it, although several editors are gradually removing the IF from their journals' websites. On the other hand, the ES is a per-journal measure that scales with journal size, as does “total cites”; accordingly, the superiority of the ES in evaluating the quality of journals has been reported previously. The nonsignificance of the IF and the 5-year IF (5-yIF) as predictive bibliometrics in our study may be explained by the fact that they measure citations “per article” and are therefore poor indicators of total cites, given that scholarly journals vary in size over multiple orders of magnitude; the IF, 5-yIF, and article influence score (AIS) are all normalized by the volume of the journal. The number of articles (AN) at least scales with journal size but does not account for quality; hence, the ES is the winner in an unbalanced competition.
An additional feature of the ES that helps to explain its predictive ability is that it ranks journals in a way similar to how Google ranks websites: the ES algorithm uses the structure of the entire network, not just citation counts, to evaluate the importance of each journal. The ES provides a 5-year analysis of published articles and citations rather than a 2-year evaluation, and it does not include self-citations in its calculation. Furthermore, the ES formula has no denominator; journals that publish many articles have a higher ES than those that publish few, provided that the average quality of the published reports is comparable among these journals.
Some advantages of using a mixed-model design are worth mentioning. In contrast to conventional multivariate analysis techniques, which focus only on group differences in patterns of change, mixed models allow the study of both intra- and interindividual variation (e.g., slopes and intercepts). In addition, multilevel linear models do not require the assumption of homogeneity of regression slopes, do not require the assumption of independence (between cases), and tolerate missing data.
Several limitations of this study need to be addressed. A detailed explanation of each of the presented bibliometric indices is beyond the scope of this article. We acknowledge that, besides numerical markers, authors also use the perceived reputation of a journal to decide on the target journal for their articles. This reputation depends on the publisher, affiliation with a reputed society, the editor, and the editorial board members; we did not include these variables because their measurement is not standardized and they are not included in the WOK database. We observed bibliometrics with high correlations in our data; in such situations, the usual solution would be to delete the redundant variable. However, we decided to retain all variables in the model, considering that, in a linear mixed-effects model, the assumption of independence (between cases) is cast aside and correlation among variables is expected. Our predictive analysis was limited to time-sets in a 2-year comparison period (five repeated measures of bibliometrics from 2007 until 2013), which might seem somewhat limited. Readers should be aware that the first publications by Bergstrom et al. about the ES methodology appeared in 2008. We did not include additional confounding factors such as longer time frames, the number of articles published in each issue, and the circulation of each journal because the WOK does not consider them. The JCR includes approximately 171 categories in the sciences and about 54 in the social sciences; therefore, the publication of future studies validating our model in other specialties would be desirable.
In conclusion, it is in the researchers' hands to stop the misuse of IF alone to evaluate journals; we found evidence that alternative bibliometrics from the WOK provide a better assessment of the influence and importance of scientific journals in a particular discipline. An integrative model using a set of metrics could represent a useful tool for researchers in their decision-making during the manuscript submission phase; it may even supplement new “standards” of the quality and validity of the research.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
[Figure 1], [Figure 2], [Figure 3], [Figure 4], [Figure 5]
[Table 1], [Table 2], [Table 3], [Table 4]