The law is a trend throughout, not a pass/fail based on if the number of polls is "large enough" - Edit 1
Before modification by Burr at 06/10/2012 03:30:33 PM
The Law of Large Numbers is not dependent on what your brain conceives as a large number.
Whatever the margin of error of each poll and whatever the total number of polls, averaging more than one poll provides, on average, a better information-to-noise ratio than any one of those polls does by itself.
This would be true even if we were only averaging a mere two polls, each with a horrible margin of error.
You can do an experiment in Excel if you like. Put =Rand()*100 into cells A1 and B1 to produce two random numbers between 0 and 100. The true average is 50, so any difference between 50 and either of the two random cells is error.
Now, in cell C1, put "=average(A1, B1)". This is of course the average of the two poll cells. It may or may not have hit closer to 50 in a single instance, but I promise you it will do better than either of the two poll cells, on average. We're going to test my promise.
We can make three more cells in that row (D1, E1, and F1) to measure the error of each of the first three cells we made. You can do this as standard error if you want, but I'll keep things simple by using absolute error. Use these three formulas:
=abs(50-A1)
=abs(50-B1)
=abs(50-C1)
Now copy all six cells that we've made in that row and paste them down their respective columns a gazillion times. Say, from row 1 to row 500,000.
When you've done that, create three more cells below (D500002, E500002, F500002). These cells will measure the average error that each poll and the average of the two polls produced. So put them below their respective "measure of error" cells and give them the following formulas:
=average(D1:D500000)
=average(E1:E500000)
=average(F1:F500000)
Lastly, average the two average errors for the two polls with "=average(D500002, E500002)". Put this somewhere close to the average error of the average of the polls (perhaps in G500002) so that you can compare them.
Now compare the average error of the average of the polls (F500002) with the average error of the two polls unaveraged (the number you just created in G500002). Whichever is lower had less error. I guarantee you, F500002 will be the lower one.
In fact, averaging the two polls together reduces average absolute error by a THIRD of the original average absolute error (from about 25 to about 16.7 in this experiment).
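If you don't have Excel handy, here is a rough Python sketch of the same experiment. The function name simulate, the seed, and the variable names are my own choices for illustration and are not part of the spreadsheet; Python's random.uniform stands in for =Rand()*100. It should land near 25 for a single poll and near 16.67 for the average of two, give or take a little random noise.

```python
import random

def simulate(trials=500_000, seed=0):
    """Mimic the spreadsheet: two random 'polls' per row, true value 50.

    Returns (average error of a single poll, average error of the
    two-poll average), both measured as absolute error.
    """
    rng = random.Random(seed)  # fixed seed only so reruns match
    single_err = 0.0
    averaged_err = 0.0
    for _ in range(trials):
        a = rng.uniform(0, 100)   # cell A: first poll
        b = rng.uniform(0, 100)   # cell B: second poll
        c = (a + b) / 2           # cell C: average of the two polls
        # Cells D and E averaged together (what G500002 holds)
        single_err += (abs(50 - a) + abs(50 - b)) / 2
        # Cell F: error of the averaged poll
        averaged_err += abs(50 - c)
    return single_err / trials, averaged_err / trials

single, averaged = simulate()
print("average error, polls taken alone:", round(single, 2))
print("average error, two polls averaged:", round(averaged, 2))
print("fraction of error removed:", round(1 - averaged / single, 2))
```

The last line printed is the fraction of the original error that averaging removed, which is the "third" mentioned above.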
Why? Because the Law of Large Numbers is not dependent on what your brain conceives as a large number. It doesn't matter that we are only averaging two polls. On average, two polls are better than one poll. Their individual errors are not equal to the error of the average. It makes a difference, no matter how few or how many polls we average together.
(Furthermore, in terms of efficiency, the law of large numbers applies even better to a small number of polls averaged together than it does to a large number of polls averaged together. The first poll we add to the average will make a bigger improvement than the next poll we add, and so on and so forth.)
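The shrinking payoff from each extra poll can be checked the same way. This is only a sketch under the same assumptions as the spreadsheet (every poll is a uniform random number between 0 and 100, true value 50); mean_abs_error and the trial count are names and values I picked for illustration.

```python
import random

def mean_abs_error(n_polls, trials=50_000, seed=1):
    """Average absolute error of the mean of n_polls random polls."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        avg = sum(rng.uniform(0, 100) for _ in range(n_polls)) / n_polls
        total += abs(50 - avg)
    return total / trials

# Error of the average when 1, 2, 3, 4 or 5 polls are combined
errors = {n: mean_abs_error(n) for n in range(1, 6)}

previous = None
for n, err in errors.items():
    gain = "" if previous is None else f"  (improvement: {previous - err:.2f})"
    print(f"{n} poll(s): average error {err:.2f}{gain}")
    previous = err
```

Each added poll should still lower the error, but the improvement column should get smaller as you go down the list.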