In factor analysis (FA) and principal component analysis (PCA), practitioners must decide how many factors or components to retain. The scree plot, introduced by Raymond B. Cattell in 1966, helps analysts identify the “elbow”: the point at which the eigenvalues level off and each additional component explains little further variance. This article covers what scree plots are used for, how they are constructed and interpreted, their limitations, and the alternatives.
A scree plot displays the eigenvalues of the factors or principal components from an analysis as points joined by a line. Each eigenvalue measures the amount of variance accounted for by its component, so a larger eigenvalue indicates that the component carries more of the information in the data. Eigenvalues are plotted on the vertical axis and component numbers on the horizontal axis, with the eigenvalues decreasing from left to right.
Scree plots help analysts choose which factors or components to keep in FA and PCA. The analyst looks for the point where the eigenvalues level off and the curve forms an elbow. Components before the elbow are treated as significant, while those after it can be discarded. The name comes from the geological term scree, the loose rock that accumulates at the base of a mountain. The plot drops steeply at first and then flattens, much as a mountainside gives way to a scree field. The important components correspond to the “mountain,” and the remaining, less important ones to the “scree.”
The first step in building a scree plot is to run PCA or FA on the data. These methods transform the original variables into a set of uncorrelated components or factors. The analysis yields one eigenvalue per component or factor, which gives the amount of variance that component explains. The eigenvalues are then plotted against their component numbers. In a standard plot the eigenvalues descend from the largest on the left to the smallest on the right, and a line joining the points forms a curve that shows how the eigenvalues are distributed.
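The construction steps above can be sketched in Python. This is a minimal illustration that assumes the eigenvalues have already been produced by a PCA or FA routine. The eigenvalues, the function names `scree_table` and `text_scree_plot`, and the text-bar rendering are all hypothetical choices made for the example, not part of any statistics package.

```python
def scree_table(eigenvalues):
    """Return (component number, eigenvalue, share of variance) rows, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [(k, ev, ev / total) for k, ev in enumerate(ordered, start=1)]


def text_scree_plot(eigenvalues, width=40):
    """Render the scree plot as text bars, one line per component."""
    rows = scree_table(eigenvalues)
    largest = rows[0][1]
    lines = []
    for k, ev, share in rows:
        # Bar length is proportional to the eigenvalue; always draw at least one mark.
        bar = "#" * max(1, round(width * ev / largest))
        lines.append(f"{k:>2} | {bar} {ev:.2f} ({share:.0%})")
    return "\n".join(lines)


# Hypothetical eigenvalues, deliberately given out of order.
example_eigenvalues = [0.45, 3.10, 0.30, 1.60, 0.55]
print(text_scree_plot(example_eigenvalues))
```

Reading the bars from top to bottom corresponds to reading a conventional scree plot from left to right: a steep drop followed by a nearly flat tail.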
Statistical software packages like R and Stata offer built-in functions for conducting PCA or FA and generating scree plots. In R, the ‘prcomp()’ function can be used to perform PCA, and the ‘plot()’ function can then be used to create the scree plot. In Stata, the ‘screeplot’ command can be used to generate a scree plot after commands like ‘pca’ or ‘factor’.
Interpreting a scree plot means finding the point where the eigenvalues level off. This elbow marks the stage at which adding further components or factors contributes little additional explained variance. The components or factors to the left of the elbow are retained for further analysis. Identifying the elbow is subjective, because a scree plot may show several elbows or a smooth curve with no obvious elbow at all. In such cases analysts rely on their own judgment and on additional criteria to settle on the number of components or factors.
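One way to make the visual search for an elbow concrete is to look for the sharpest bend in the eigenvalue sequence. The sketch below uses the discrete second difference as a stand-in for "sharpest bend". This is only one possible heuristic, chosen for illustration rather than taken from Cattell's procedure, and the function names and eigenvalues are hypothetical.

```python
def elbow_by_second_difference(eigenvalues):
    """Return the 1-based component number where the curve bends most sharply."""
    ev = sorted(eigenvalues, reverse=True)
    if len(ev) < 3:
        raise ValueError("need at least three eigenvalues to locate a bend")
    # Second difference at each interior point: large positive values mean the
    # curve drops steeply before the point and flattens after it.
    bends = {i + 1: ev[i - 1] - 2 * ev[i] + ev[i + 1] for i in range(1, len(ev) - 1)}
    return max(bends, key=bends.get)


def components_left_of_elbow(eigenvalues):
    """Number of components strictly to the left of the detected elbow."""
    return elbow_by_second_difference(eigenvalues) - 1


# Hypothetical eigenvalues from a five-variable analysis.
eigenvalues = [3.10, 1.60, 0.55, 0.45, 0.30]
print(elbow_by_second_difference(eigenvalues))
print(components_left_of_elbow(eigenvalues))
```

A smooth curve or one with several bends of similar size produces second differences that are close together, which mirrors the ambiguity an analyst faces when reading the plot by eye.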
Although scree plots are widely used, they are criticized for their subjectivity, which can lead to inaccurate results. Locating the elbow by eye is unreliable because different analysts may read the same plot differently. The plot may also have no clear elbow, leaving the analyst without a firm basis for choosing the number of components or factors. Reliance on the scree plot can lead to retaining too few factors or components, so that important parts of the data are disregarded. In addition, because there is no standard for scaling the x and y axes, different statistical programs can draw the same dataset differently.
The limitations of scree plots have prompted researchers to develop other methods for deciding how many components or factors to keep. These include:
Kaiser Criterion: This method retains components or factors with eigenvalues greater than 1. It is often criticized for retaining too many factors.
Parallel Analysis: This method compares the eigenvalues from the actual data with eigenvalues obtained from random data. Components are retained when their eigenvalues exceed the corresponding eigenvalues from the random data.
Elbow Method: Treating the “elbow” as the point of maximum curvature allows it to be located algorithmically. This definition led to the development of the Kneedle algorithm for knee detection.
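Two of the criteria listed above are simple enough to sketch directly. The code below is an illustration under stated assumptions: the cutoff of 1 presumes eigenvalues from a correlation matrix, and `knee_by_chord_distance` is a simplified heuristic in the spirit of maximum-curvature knee detection, not the published Kneedle algorithm. The function names and eigenvalues are hypothetical.

```python
def kaiser_retained(eigenvalues, cutoff=1.0):
    """Number of components whose eigenvalue exceeds the cutoff."""
    return sum(1 for ev in eigenvalues if ev > cutoff)


def knee_by_chord_distance(eigenvalues):
    """Return the 1-based component lying farthest below the first-to-last chord.

    Both axes are rescaled to [0, 1] so the result does not depend on units.
    """
    ev = sorted(eigenvalues, reverse=True)
    n = len(ev)
    if n < 3 or ev[0] == ev[-1]:
        raise ValueError("need at least three eigenvalues that are not all equal")
    span = ev[0] - ev[-1]
    gaps = []
    for i, value in enumerate(ev):
        x = i / (n - 1)                # component position scaled to [0, 1]
        y = (value - ev[-1]) / span    # eigenvalue scaled to [0, 1]
        gaps.append((1.0 - x) - y)     # vertical gap between the chord and the curve
    return gaps.index(max(gaps)) + 1


# Hypothetical eigenvalues from a five-variable analysis.
eigenvalues = [3.10, 1.60, 0.55, 0.45, 0.30]
print(kaiser_retained(eigenvalues))
print(knee_by_chord_distance(eigenvalues))
```

For these values the Kaiser rule keeps two components and the chord heuristic places the knee at the third component, so the two criteria agree once the components to the left of the knee are counted. On real data they can disagree.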
These alternatives give analysts quantitative criteria for deciding how many factors to retain, which can reduce the subjective judgment involved in reading a scree plot.
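Parallel analysis can also be sketched with the Python standard library alone. The example below is a rough illustration under several assumptions that do not come from the text above: random data are drawn from a standard normal distribution, the threshold for each component is the mean random-data eigenvalue over the replications (a percentile is another common choice), and eigenvalues are computed with a small Jacobi rotation routine written for this example. All function names and the simulated dataset are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix for a list of observations (one list per row)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)] for i in range(p)]


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(rows, replications=50, seed=0):
    """Count leading components whose eigenvalue beats the mean random-data eigenvalue."""
    n, p = len(rows), len(rows[0])
    rng = random.Random(seed)
    observed = symmetric_eigenvalues(correlation_matrix(rows))
    totals = [0.0] * p
    for _ in range(replications):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        for j, ev in enumerate(symmetric_eigenvalues(correlation_matrix(noise))):
            totals[j] += ev
    thresholds = [t / replications for t in totals]
    retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break  # stop at the first component that does not beat random data
        retained += 1
    return retained, observed, thresholds


# Hypothetical data: six variables driven by two independent latent factors plus noise.
data_rng = random.Random(42)
data = []
for _ in range(300):
    f1, f2 = data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    data.append([f1 + data_rng.gauss(0, 0.3) for _ in range(3)]
                + [f2 + data_rng.gauss(0, 0.3) for _ in range(3)])

retained, observed, thresholds = parallel_analysis(data)
print(retained)
```

Because the simulated data are built from two strong latent factors, the first two observed eigenvalues sit well above their random-data counterparts and the rest fall well below. In practice a dedicated statistics package would normally replace the hand-written eigenvalue routine.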
The scree plot is a visual aid in multivariate statistics that helps researchers decide how many components or factors to retain in PCA or FA. It remains valuable when combined with other criteria and with professional judgment. Analysts who understand how scree plots are constructed and interpreted, and who are aware of their criticisms and alternatives, are better placed to make sound decisions about data reduction.