Testbed for ColdFusion Linear Regression

Data Points

This set of test data holds world population (in billions) for six different years, measured relative to 1980; for example, "5" represents the year 1985. The ultimate source of this data is the United States Census Bureau. The example to be developed was taken from page 267 (Section 5.4) of Elementary Linear Algebra, Seventh Edition, by Ron Larson (Brooks/Cole Publishing, 2013). This example was used to develop the software because it gives the details of each intermediate step along with the final solution.

Year (since 1980)    World Population (in billions)
 5                   4.9
10                   5.3
15                   5.7
20                   6.1
25                   6.5
30                   6.9

Problem: find the least squares regression quadratic polynomial

$$y = c_2 x^2 + c_1 x + c_0$$

for the data and use the model to estimate the population for the year 2015.

Solution

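By substituting the data points (5, 4.9), (10, 5.3), (15, 5.7), (20, 6.1), (25, 6.5) and (30, 6.9) into the quadratic equation above, you obtain the following system of linear equations:

$$
\begin{aligned}
c_0 + 5c_1 + 25c_2 &= 4.9 \\
c_0 + 10c_1 + 100c_2 &= 5.3 \\
c_0 + 15c_1 + 225c_2 &= 5.7 \\
c_0 + 20c_1 + 400c_2 &= 6.1 \\
c_0 + 25c_1 + 625c_2 &= 6.5 \\
c_0 + 30c_1 + 900c_2 &= 6.9
\end{aligned}
$$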

Now you have six equations and three unknowns: c0, c1 and c2. Two techniques for solving two equations in two unknowns, or three equations in three unknowns, called "substitution" and "elimination", are usually covered in a ninth-grade algebra class. First- and second-year college algebra courses for business majors teach easier methods that use matrices. Matrix algebra is convenient to implement in computer programs and pocket calculators because its core operations, such as matrix multiplication and elimination, can be written as three nested loops. Because the loop bounds are just variables, the same program handles two equations with two unknowns or a very large system without any change to the code, although the running time grows roughly with the cube of the number of unknowns.
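The three-nested-loop pattern can be sketched in a few lines. The example below is written in Python purely for illustration; it is not the ColdFusion component, and the function name `mat_mul` is a hypothetical choice made for this sketch.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows, using three nested loops."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    product = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):               # each row of a
        for j in range(cols):           # each column of b
            for k in range(inner):      # accumulate one dot product
                product[i][j] += a[i][k] * b[k][j]
    return product


print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

The loop bounds come from the sizes of the inputs, so the same function works for any compatible pair of matrices; only the running time changes.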

The one tricky part of linear regression is that there must be at least as many equations as unknowns. If there are fewer equations than unknowns, a unique solution cannot be determined. If there are at least as many data points with distinct x values as unknowns, the least squares solution is unique. The fitted function is not necessarily a straight line; it may be a curve. Its shape is set by the model you choose: a quadratic has three unknowns and in general gives a parabola, which reduces to a straight line when the coefficient c2 comes out as zero.

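The set of six equations with three unknowns shown immediately above produces this matrix equation:

$$
\begin{bmatrix}
1 & 5 & 25 \\
1 & 10 & 100 \\
1 & 15 & 225 \\
1 & 20 & 400 \\
1 & 25 & 625 \\
1 & 30 & 900
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}
=
\begin{bmatrix} 4.9 \\ 5.3 \\ 5.7 \\ 6.1 \\ 6.5 \\ 6.9 \end{bmatrix}
$$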
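As an illustration of how the coefficient matrix follows mechanically from the data points, here is a minimal Python sketch. It is not the ColdFusion component; the names `points` and `design_matrix` are assumptions made for this example.

```python
# Data points from the table: (years since 1980, population in billions).
points = [(5, 4.9), (10, 5.3), (15, 5.7), (20, 6.1), (25, 6.5), (30, 6.9)]


def design_matrix(xs, degree):
    """Return one row [1, x, x**2, ..., x**degree] for each x value."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


xs = [x for x, _ in points]
ys = [y for _, y in points]
A = design_matrix(xs, 2)   # degree 2 gives the quadratic model

for row, y in zip(A, ys):
    print(row, "->", y)
# first line printed: [1, 5, 25] -> 4.9
```

Each row holds the multipliers of c0, c1 and c2 for one data point, so changing `degree` changes only the number of columns.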

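Left-multiply both sides of the matrix equation above by the transpose of the coefficient matrix to get the normal equations. Don't worry about having to do any of this matrix math. Just feed the raw two-column set of data points into the ColdFusion linear regression component, method="FindCoefficients", and it will do all the matrix arithmetic for you and return the three coefficients. The normal equations for the matrix equation above are:

$$
\begin{bmatrix}
6 & 105 & 2275 \\
105 & 2275 & 55125 \\
2275 & 55125 & 1421875
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}
=
\begin{bmatrix} 35.4 \\ 654.5 \\ 14647.5 \end{bmatrix}
$$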
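The arithmetic behind the normal equations can be checked with a short Python sketch. This is only an illustration of the transpose multiplication, not the internals of the ColdFusion component; the variable names `ata` and `aty` are hypothetical.

```python
# Data points from the table: (years since 1980, population in billions).
points = [(5, 4.9), (10, 5.3), (15, 5.7), (20, 6.1), (25, 6.5), (30, 6.9)]

A = [[1, x, x * x] for x, _ in points]   # coefficient matrix, one row per point
y = [pop for _, pop in points]           # right-hand side
n_unknowns = 3

# A^T A: entry (i, j) is the dot product of columns i and j of A.
ata = [[sum(row[i] * row[j] for row in A) for j in range(n_unknowns)]
       for i in range(n_unknowns)]

# A^T y: entry i is the dot product of column i of A with y.
aty = [sum(row[i] * yk for row, yk in zip(A, y)) for i in range(n_unknowns)]

print(ata)
# [[6, 105, 2275], [105, 2275, 55125], [2275, 55125, 1421875]]
print([round(v, 6) for v in aty])
# [35.4, 654.5, 14647.5]
```

The right-hand side is rounded before printing only to hide floating-point noise in the sums.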

As the final step, Gauss-Jordan elimination is used to find the three unknowns in this system of equations:

$$c_0 = 4.5, \quad c_1 = 0.08, \quad c_2 = 0$$
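For readers who want to see the elimination step itself, the following Python sketch applies Gauss-Jordan elimination with partial pivoting to the normal equations for this data set. It is an assumption-laden illustration, not the code inside the ColdFusion component; the function name `gauss_jordan` and the singularity threshold are choices made for this example.

```python
def gauss_jordan(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Work on an augmented copy [matrix | rhs] so the inputs are not modified.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The last column of the reduced matrix now holds the solution.
    return [row[n] for row in aug]


# Normal equations for the population data (A^T A and A^T y).
ata = [[6, 105, 2275], [105, 2275, 55125], [2275, 55125, 1421875]]
aty = [35.4, 654.5, 14647.5]

c0, c1, c2 = gauss_jordan(ata, aty)
print(c0, c1, c2)
```

Because of floating-point rounding, the printed values should be very close to 4.5, 0.08 and 0 rather than exactly equal to them.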

Therefore, the solution to the problem of population growth just happens to be a straight line, with a slope of 0.08 and a y-intercept of 4.5, and no curve to it at all:

$$y = 0.08x + 4.5$$

This linear regression predicts that world population in 2015 will be 7.3 billion. The United States Census Bureau reports that the world population at this moment is 7,236,953,554, but it is only April 14 of 2015. Stanford University Professor Paul R. Ehrlich was full of nonsense in 1968 when he wrote in his book, The Population Bomb, that world population is growing exponentially. The truth is that it is in fact rising along a straight line with a very small slope.

Practical Application

Once the initial data array has been produced, the following ColdFusion should produce the three coefficients:

The coefficients of the function are: 4.5, 0.08 and 0

The resulting equation is: $y = c_2 x^2 + c_1 x + c_0$
$y = 0x^2 + 0.08x + 4.5$

The world population for 2015 is: 7.3 billion.
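The 2015 estimate can be reproduced by evaluating the fitted polynomial at x = 35, that is, 2015 minus 1980. The Python sketch below is illustrative only and takes the three coefficients reported above as given; the function name `evaluate` is hypothetical.

```python
def evaluate(coefficients, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... using Horner's method."""
    result = 0.0
    for c in reversed(coefficients):   # start from the highest power
        result = result * x + c
    return result


coefficients = [4.5, 0.08, 0.0]   # c0, c1, c2 as reported above
year = 2015
estimate = evaluate(coefficients, year - 1980)
print(f"{estimate:.1f} billion")
# 7.3 billion
```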