Thursday, May 24, 2012

Application of Pascal’s Triangular plus Monte Carlo Analysis to Find the Least Squares Fitting for a Limited Area


The purpose of this article is to prove the accuracy of the simulation model made by the combination of Pascal’s Triangular method and Monte Carlo Simulation model included in the article of “Application of Pascal’s Triangular plus Monte Carlo Analysis to Appraise the Wisdom of Crowds” and posted on link: http://emfps.blogspot.com/2012/05/application-of-pascals-triangular-plus_08.html in which I am willing to prove this simulation model as well as works. The most important findings are to find the least squares fitting and linear function for a limited area (XY) and so to obtain the point which has the deviation approximately equal to zero where we have the infinity points on the plane of XY. I think that we can also generate the best numerical solution for 3D Laplace Equation and so we will be able to find the best fitting plane given a set of N points in a 3D space by using of this simulation model.
Nowadays, all fields of engineering and science such as Social science, medicine, finance, strategic management and so on are using the method of the Last Squares Fitting. One of the most important applications of two variables function’s minimizing is to utilize the least squares method for fitting of direct line (y = mx + b).
Assume you have infinity points (x1, y1), (x2, y2),……(xn, yn) that you have found them by your observation or survey questionnaire in which all points are limited into the area of x = a and y = b and “n” reach to infinity. There are two amounts of “y” for each observed amount of “x” (xobs) which are the observed amount of “y” (yobs) and the amount of “y” that it is predicted by linear function of y = mxobs + b where we have:

Predicted amount – observed amount = (mxobs + b) - yobs

We call this amount a deviation (d) in which there are the infinity deviations of “d1, d2,…. dn”  as follows:

Set (A) = [(d1= (mx1 + b) – y1), (d2= (mx2 + b) – y2)……… (dn= (mxn + b) – yn)] and “n” reach to infinity.

We can say the line will be the perfect fitting if and only if all deviations will be equal to zero. Of course, it is impossible. Therefore, we will find the best fitting line by minimizing below formula:

SUM (d^2) = d1^2 + d2^2+…..+ dn^2         (1) 

Or  


SUM (d^2) = [(mx1 + b) – y1]^2 + [(mx2 + b) – y2]^2+…+ (mxn + b) – yn]^2    (2)


Methodology

The methodology is the same with my previous article.

Note: “Since I want to prove my simulation model in this article, I do not explain more about it.  Therefore, please read carefully my article of “Application of Pascal’s Triangular plus Monte Carlo Analysis to Appraise the Wisdom of Crowds” and posted on link: http://emfps.blogspot.com/2012/05/application-of-pascals-triangular-plus_08.html before the continuation of this article.”

Finding and discussion

Let me start my discussion by using an example as follows:
Assume we want to get the best fitting line among infinity points into a rectangular area where: 


x = 0  to  x = 4    and     y = 1   to   y = 5

I apply all steps mentioned in my previous article in which I want to compare the final result between 200 trials (n = 200) and 1000 trials (n =1000) as follows:

1) For n = 200, we will have below algorithm to find all outcomes:

n
200
Number of outcomes of "x"
201
Range of  "X"
201
Amount of "X"(min)
0
Amount of "X" (max)
4
max - min
4
delta
0.02
n
200
Number of outcomes of "y"
201
Range of  "Y"
201
Amount of "Y"(min)
1
Amount of "Y" (max)
5
max - min
4
delta
0.02

Therefore, the outcomes for “x” will be started by 0 then 0.02 and will be finished by 4 and so the outcomes for “y” will be started by 1 then 1.02 and will be finished by 5.
After all steps, we will have below Cut – offs (accumulative probability) for “x” and “y” as follows:

Cut Offs
      x    
     0
     0.02
7.72E-45
1.68
  0.01
1.82
    0.1
1.96
    0.4
2.18
    0.9
3.08
      1
4

Cut Offs
           y
0
1.02
7.72E-45
2.68
0.01
2.82
0.1
2.96
0.4
3.18
0.9
4.08
1
   5

For Least Squares Fitting, I used RAND formula accompanied by VLOOKUP for “x” and “y” simultaneously where we will have 201 cells on row filled by them.
Then, I applied below algorithm to calculate “SUM (d^2):

(xi)^2
SUM (xi)
SUM(xi)^2
SUM (yi)
xi * yi
SUM(xi*yi)
m
b
d0
d0^2
SUM(d^2)

Where I calculated “m” and “b” by using of below formulas:

[(SUM(xi)^2) *m ] + [(SUM (xi)) *b = SUM(xi*yi)

[(SUM (xi)) * m] + (n*b) = SUM (yi)

To calculate “d0”, I utilized from below formula:

d0 = ((m*x0) + b) – y0

The final results of two – ways sensitivity analysis were as follows:

    b
-
3.167558
Ave
3.166511
STDEV
0.159445
AVE
0.158023
CV
0.049904

     m
  -
0.002501
AVE
0.000201
STDEV
0.073112
AVE
0.07178


  SUM (d^2)
-
  22.8858892
AVE
22.88723508
STDEV
1.721733542
AVE
1.665178887
CV
0.072755791

Finally, I obtained below point which generated the least SUM (d^2) among all points:

“x” for min[SUM (d^2)] = 1.96
“y” for min[SUM (d^2)] = 3.18

2) For n = 1000, we will have below algorithm to find all outcomes:

n
1000
Number of outcomes of "X"
1001
Range of  "X"
1001
Amount of "X"(min)
0
Amount of "X" (max)
4
max - min
4
delta
0.004
n
1000
Number of outcomes of "Y"
1001
Range of  "Y"
1001
Amount of "Y"(min)
1
Amount of "Y" (max)
5
max - min
4
delta
0.004

Therefore, the outcomes for “x” will be started by 0 then 0.004 and will be finished by 4 and so the outcomes for “y” will be started by 1 then 1.004 and will be finished by 5.
After all steps, we will have below Cut – offs (accumulative probability) for “x” and “y” as follows:

Cut Offs
           x
0
0.004
6.6E-219
1.852
0.01
1.92
0.1
1.984
0.4
2.08
0.9
2.496
1
      1

Cut Offs
          y
0
1.004
6.6E-219
2.852
0.01
2.92
0.1
2.984
0.4
3.08
0.9
3.496
1
     5

For Least Squares Fitting, I used RAND formula accompanied by VLOOKUP for “x” and “y” simultaneously where we will have 1001 cells on row filled by them.
Then, I applied the same algorithm above mentioned to calculate “SUM (d^2) where the final results were as follows:

        b
   -
3.074636
AVE
3.076022
STDEV
0.06411
AVE
0.066586

       m
   -
0.002325
AVE
0.000143
STDEV
0.031485
AVE
0.03185



SUM(d^2)
ave
21.87308
AVE
21.74568
STDEV
3.527124
AVE
3.45824
CV
0.159031

Finally, I obtained below point which generated the least SUM (d^2) among all points:

“x” for min[SUM (d^2)] = 2.08
“y” for min[SUM (d^2)] = 3.08

How can we prove above results?

There are two ways to prove above results and finally to be proved this simulation model as follows:

      A) First way: Referring to formula (2), we have:

SUM (d^2) = [(mx1 + b) – y1]^2 + [(mx2 + b) – y2]^2+……+ (mxn + b) – yn]^2  

Let me consider below function:

f (x, y) = [(mx + b) – y]^2 

Total sum of this curve on below area will be equal double integral f (x,y) as follows:  


x = 0  to  x = 4    and     y = 1   to   y = 5

I made an algorithm on my excel spreadsheet to solve this double integral as follows:

1) For n = 200, we will have:

p1
1
p2
5
q1
0
q2
4
m
0.002501
b
3.167558
Delta (p)
4
Delta (q)
4
(p1)^3
1
(p2)^3
125
(q1)^3
0
(q2)^3
64
(p1)^2
1
(p2)^2
25
(q1)^2
0
(q2)^2
16
m^2
6.25E-06
b^2
10.03342
f1
0.000534
f2
160.5347
f3
165.3333
f4
0.507006
f5
-0.48019
f6
-304.086
SUM(d^2)
21.8099

2) For n = 1000, we will have:
p1
1
p2
5
q1
0
q2
4
m
0.000143
b
3.076022
Delta (p)
4
Delta (q)
4
(p1)^3
1
(p2)^3
125
(q1)^3
0
(q2)^3
64
(p1)^2
1
(p2)^2
25
(q1)^2
0
(q2)^2
16
m^2
2.04E-08
b^2
9.461912
f1
1.74E-06
f2
151.3906
f3
165.3333
f4
0.028133
f5
-0.02744
f6
-295.298
SUM(d^2)
21.4265

By using of try and error around m = 0.000143 and b = 3.076022, we can find the least squares fitting by calculating double integral where we will have:

m = 0
b = 3
p1
1
p2
5
q1
0
q2
4
m
0
b
3
Delta (p)
4
Delta (q)
4
(p1)^3
1
(p2)^3
125
(q1)^3
0
(q2)^3
64
(p1)^2
1
(p2)^2
25
(q1)^2
0
(q2)^2
16
m^2
0
b^2
9
f1
0
f2
144
f3
165.3333
f4
0
f5
0
f6
-288
SUM(d^2)
21.33333

You can see that 1000 trials (n = 1000) is closer to the result of double integral. (Please check red colour)

  B) Second way: Referring to formula (2), we have:

SUM (d^2) = [(mx1 + b) – y1]^2 + [(mx2 + b) – y2]^2+……+ (mxn + b) – yn]^2 
 
Let me consider below function for deviation square of each random point:

f (x, y) = [(mx + b) – y]^2

 If we want to have the least amount of f (x, y), we should simultaneously solve below equations: 



I replaced “m”, “b”, and “x” for min[SUM (d^2)]  and “y” for min[SUM (d^2)]  in each trial into above formulas and I calculated the final result by using of excel spreadsheet as follows:

v  n = 200 trials:



n
200
x
1.96
1.96
y
3.18
3.18
d1
1.59E-07
6.36
d2
0.001275
-0.00079
d3
-0.00128
-6.33302
SUM
-5.3E-06
0.026189

v  n = 1000 trials

n
1000
x
2.08
2.08
y
3.08
3.08
d1
8.5E-08
6.16
d2
0.000879
-0.00059
d3
-0.00088
-6.15204
SUM
-1.1E-06
0.007361


You can see that the final results for n = 1000 trials are very closer to zero.



Note:  “All spreadsheets and calculation notes are available. The people, who are interested in having my spreadsheets of this method as a template for further practice, do not hesitate to ask me by sending an email to: soleimani_gh@hotmail.com or call me on my cellphone: +989109250225. Please be informed these spreadsheets are not free of charge.”