Nuts and Bolts of Data Mining: Correlation & Scatter Plots

In this article, I continue the “Nuts and Bolts of Data Mining” series. We will tackle two intertwined topics this time: correlation and scatter plots. These tools are fundamental for gauging the relationship (if any) between pairs of data elements. For instance, you might want to view the relationship between the age and income of your customers as a scatter plot. Or you might compute a single number: the correlation between these two customer demographics. As we’ll soon see, there are good, bad, and ugly things that can happen when you apply a purely computational method like correlation. My goal is to help you avoid the usual pitfalls, so that you can use correlation and scatter plots effectively in your own work.

Consider the graph of income versus age shown in Figure 1. In this scatter plot, each dot represents a different customer. Note also the dotted line in the plot, which is the “best fit” of a straight line to the age-income data. Finally, the legend on the plot reports the correlation measure, r=0.98. (“R” or “r” is the traditional letter designation for correlation.)

Correlation is a statistical measure that indicates how well, or how poorly, a straight line describes the relationship between a pair of data elements. By design, the correlation value ranges from -1 to +1. A positive correlation is associated with a best-fit line that slants upward to the right, like the one in Figure 1. A best-fit line slanting downward to the right, as depicted in Figure 2, indicates a negative correlation.
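
To make the link between the correlation value and the best-fit line concrete, here is a minimal Python sketch. It assumes that “correlation” means the usual Pearson correlation coefficient and that the best-fit line is an ordinary least-squares fit; the article does not say how its figures were produced. The ages and incomes are made-up illustrative values, not the data behind Figure 1, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def best_fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical customer data (illustrative only, not the Figure 1 data).
ages = [25, 32, 41, 47, 53, 60]
incomes = [31000, 42000, 50000, 61000, 66000, 78000]

r = pearson_r(ages, incomes)
slope, intercept = best_fit_line(ages, incomes)
print(f"r = {r:.2f}, slope = {slope:.1f}, intercept = {intercept:.1f}")

# Flipping the sign of one variable flips the sign of the correlation,
# which corresponds to a best-fit line slanting downward to the right.
r_negative = pearson_r(ages, [-v for v in incomes])
print(f"r with incomes negated = {r_negative:.2f}")
```

In this sketch, the sign of the slope and the sign of r always agree, because both come from the same sum of cross-products. That is why an upward-slanting line goes with a positive correlation and a downward-slanting line with a negative one.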