
Reply to "Social Scientists and their Struggle with Statistics"


This is actually a reply to a post that claims to have improved statistics. It comes with a bundle of software that will be discussed here. We not only put the statistics straight but also offer a deep psychoanalytical analysis of the thought behind the critique.

Software Implementation in Python

Simulation

The software is described here. It is written in C and will be analyzed step by step. The copyright notice in the file header at least allows us to show the pieces of the software before rewriting them in a more comfortable Python version.

/*
 * Calculator by method of brute force Monte-Carlo
 *
 * Problem proposed by Mato Nagel (2010) regarding the consequence of
 * a "Dunning / Kruger Effect" on elections in a Democracy. This program
 * by Telford Tendys, Copyright (C) 2012, gives a significantly different result
 * to the conclusion of Mato Nagel, suggesting that one of us is wrong.
 *
 * This program may be re-distributed without modification.
 *
 * See also:
 *
 *          A Mathematical Model of Democratic Elections
 *                           Mato Nagel
 * Center for Nephrology and Metabolic Disorders, A.-Schweitzer-Ring 32,
 *                   D-02943 Weisswasser, Germany
 *
 */

The declaration of libraries and variables looks a bit different in Python. Variables in particular can be created on the fly, and libraries can be imported when needed.

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <math.h>

#define NUM_PEOPLE     1000000
#define NUM_ELECTIONS  1000000
#define REFIL_CQ       1000

double cq[ NUM_PEOPLE ];
int vote[ NUM_PEOPLE ];

We don't need this function in Python either, because the standard library already provides normally distributed random numbers.

/*
 * Just take one value for convenience, inefficient but never mind.
 * I'm very impressed with my quad-core "AMD Phenom(tm) II", especially the price!
 *
 * See also:
 *       http://en.wikipedia.org/wiki/Normal_distribution#Generating_values_from_normal_distribution
 */
double Box_Muller( void )
{
	for(;;)
	{
		double bm;

		double u = random();
		double v = random();
		u /= RAND_MAX;
		v /= RAND_MAX;

		bm = sqrt( -2 * log( u )) * cos( 2 * M_PI * v );
		/*
		 * Note that standard normal has mean of 0 and SD of 1,
		 * but Psychologist normal is mean 100, and SD of 15.
		 * Also for no particular reason, reject the possibility
		 * of anything being below 0 nor greater than 200.
		 */
		bm *= 15;
		bm += 100;
		if( bm >= 0 && bm <= 200 ) { return( bm ); }
	}
}
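
Should the rejection of values outside 0 to 200 be wanted in Python nonetheless, it could be sketched as follows. This is only an illustration: it assumes random.normalvariate as a stand-in for the Box-Muller transform, and the name truncated_cq and its parameters do not appear in the original program.

```python
import random

def truncated_cq(mean=100.0, sd=15.0, low=0.0, high=200.0):
    """Draw one CQ value, redrawing until it lies within [low, high]."""
    while True:
        value = random.normalvariate(mean, sd)
        if low <= value <= high:
            return value

# A small sample drawn the same way the C loop fills its array.
sample = [truncated_cq() for _ in range(1000)]
```

With a mean of 100 and an SD of 15, a value outside 0 to 200 lies more than six standard deviations away, so the rejection branch is almost never taken.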

This function, which helps to sort the array, can also be omitted.

int cq_compar( const void *p1, const void *p2 )
{
	register const double *dp1;
	register const double *dp2;

	dp1 = p1;
	dp2 = p2;
	if( *dp1 > *dp2 ) { return( 1 ); }
	if( *dp1 < *dp2 ) { return( 0 ); }
	/* Not expecting any NaN, Inf, etc */
	return( 0 );
}

The following function initializes the sample array with normally distributed CQs.

/*
 * Very simple array fill with normal random numbers.
 * Then sort the array from lowest to highest.
 */
int fill_cq()
{
	int i;

	/*
	 * Put a bunch of normally distributed random numbers together
	 */
	for( i = 0; i < NUM_PEOPLE; i++ )
	{
		cq[ i ] = Box_Muller();
	}

	/*
 	 * Sort the stack (standard library sort function)
	 */
	qsort( cq, NUM_PEOPLE, sizeof( cq[ 0 ]), &cq_compar );
	return( 0 );
}

All this looks much simpler in Python. For manageability, I chose a sample of only 1000 people. This does not matter at all; the result is always the same.

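import random
import numpy as np

def getCQ(n):
    # Draw one normally distributed CQ (mean 100, SD 15) per entry of n,
    # then sort ascending so that the list position corresponds to the rank.
    # Unlike the C version, values outside 0..200 are not rejected here.
    r = []
    for i in n:
        r.append(random.normalvariate(100, 15))
    r.sort()
    return r

voters = 1000
X = np.arange(1,voters+1,1)   # ranks 1..voters
C = getCQ(X)                  # sorted CQ sample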

Just a quick check that this is working properly. Below you can find a simple histogram plot of the sample.

Histogram of a Normal CQ-distribution
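
Where no plotting library is at hand, the same sanity check can be sketched with the standard library alone. The bin width of 10, the fixed seed and the helper name text_histogram are assumptions made for this illustration only.

```python
import random
import statistics
from collections import Counter

def text_histogram(values, width=10):
    """Count values per bin of the given width, keyed by the bin's lower edge."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))

random.seed(1)  # fixed seed only to make the check repeatable
cq_sample = [random.normalvariate(100, 15) for _ in range(1000)]

# Mean and SD should come out close to 100 and 15.
print("mean:", round(statistics.mean(cq_sample), 1))
print("sd:  ", round(statistics.stdev(cq_sample), 1))

# One row per bin, one '#' per five people.
for lower, count in text_histogram(cq_sample).items():
    print(f"{lower:4d} {'#' * (count // 5)}")
```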

Now the actual voting takes place. The software discussed here employs a GNU random function, described here, that simply generates a uniformly distributed integer between 0 and RAND_MAX. By means of the modulo operator, this random number is projected into the interval between the position of the current voter and that of the participant with the highest CQ. This standard method is described here, for instance.

/*
 * Note that vote is an integer -- we only accept whole votes.
 *
 * Thus, we need many trials to get a suitable spread of winners,
 * based on uniform random voting. We presume the random() function
 * is good to go on any modulus (people are allowed to vote for
 * themselves, and the top guy ALWAYS votes for himself). Actually,
 * I'm not sure GNU's random() can really hold up to this type of
 * usage but my results are so massively different to Nagel's
 * calculation that really the fine details of the random generator
 * don't freak me out.
 */
int fill_vote( void )
{
	int i;
	double x = 0;

	/* Empty out the ballot box */
	bzero( vote, sizeof( vote[ 0 ]) * NUM_PEOPLE );

	for( i = 0; i < ( NUM_PEOPLE - 1 ); i++ )
	{
		int j = random();
		/* Lowest CQ casts completely random vote */
		j %= ( NUM_PEOPLE - i );
		j += i;
		/* Put a ballot against the candidate */
		vote[ j ]++;
	}
	vote[ NUM_PEOPLE - 1 ]++; /* Always votes for self */
	return( 0 );
}

In Python, the voting as conceived by the author of that software is accomplished by the following routine.

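import numpy as np
import random

def getVote(n,v):
    # n: ranks of the voters (1..voters), v: CQ values sorted in ascending order
    r = []
    for i in n:
        # each voter picks uniformly among the ranks from his own up to the top
        index = random.randint(i,voters)-1
        r.append(v[index])
    return r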
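
The C routine does not return the chosen CQ values but counts the ballots per candidate. A minimal sketch of that tally in Python, assuming 0-based ranks sorted by ascending CQ; the function name tally_votes is illustrative, and random.randrange replaces the modulo construction of the original.

```python
import random

def tally_votes(num_people):
    """Return a ballot count per candidate; index = rank by ascending CQ."""
    ballots = [0] * num_people
    for i in range(num_people):
        # Voter i picks uniformly among ranks i .. num_people - 1,
        # so the top-ranked voter always votes for himself.
        ballots[random.randrange(i, num_people)] += 1
    return ballots

ballots = tally_votes(1000)
winner = max(range(len(ballots)), key=ballots.__getitem__)
print("winning rank:", winner, "with", ballots[winner], "votes")
```

Repeating this over many elections and recording the CQ of each winner is what the original program does on a much larger scale.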

Just a quick check that this is working properly. A simple plot of the sample.

Histogram of a Shifted CQ-distribution

The similarity between the two histograms and the curves provided by the author is obvious. He gets smoother curves, of course, but this is because he uses a larger sample (1,000,000 vs. 1,000) and then further amplifies the sample by repeatedly selecting only the winner. As a rule in statistics, the larger the sample, the closer the data come to what is expected.

Calculating Expectations

Thus, instead of simulating millions of samples, the same result can be obtained more elegantly by simply calculating expectations.


The critical part of the software is the function that defines the vote. As can be seen in the simple Python code, the whole distribution is shifted to the right and slightly deformed.

The software of the author uses a uniform distribution to define the index of the elected candidate. The expectation of a uniform distribution is simply the mean of the two extremes.

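def getVote(n):
    # n: presumably the CQ values of the voters; max_CQ: upper CQ limit of the sample.
    # Note: getCQ is called here with a single CQ value, unlike the
    # sampling routine of the same name shown earlier.
    r = []
    for i in n:
        shift = (max_CQ-i)/2   # half the distance from the voter's CQ to the upper limit
        r.append(getCQ(i-shift))
    return r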
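
The statement that the expectation of a uniform distribution is the mean of its two extremes can be checked numerically. The sketch below assumes 1-based ranks and the same random.randint(i, voters) draw as in the simulation routine; the names expected_rank and simulated_rank are illustrative.

```python
import random
import statistics

def expected_rank(i, voters):
    """Mean of a uniform distribution on the integers i .. voters."""
    return (i + voters) / 2

def simulated_rank(i, voters, trials=20000):
    """Average rank actually drawn by random.randint(i, voters)."""
    return statistics.mean(random.randint(i, voters) for _ in range(trials))

voters = 1000
for i in (1, 500, 900):
    # The two columns should agree up to sampling noise.
    print(i, expected_rank(i, voters), round(simulated_rank(i, voters), 1))
```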

Following the author's idea, the majority of voters would simply have selected a candidate with a CQ of 150, which is almost exactly halfway between 100 and 200, the upper CQ limit the author allows in his sample. He would soon have realized that something is wrong with his software if he had allowed a CQ of 500, as in the sample below.

Result with an upper CQ limit of 500

The software produces an elected CQ of well above 200 simply by changing the upper limit. If it were so easy...
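
The dependence on the upper limit follows directly from the midpoint rule. A small sketch of the arithmetic, taking a voter with a CQ of 100 and the two limits mentioned in the text; the function name midpoint_cq is made up for this illustration.

```python
def midpoint_cq(voter_cq, upper_limit):
    """Mean of a uniform distribution between the voter's CQ and the upper limit."""
    return (voter_cq + upper_limit) / 2

for limit in (200, 500):
    print("upper limit", limit, "-> expected choice", midpoint_cq(100, limit))
```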

Critique

The reason for the failure of the software is the assumption of a uniform distribution of selected candidates. According to this assumption, for a CQ-100 voter all CQ values above 100 have the same probability of being chosen. That is not true. It is not the CQ values but the candidates that have the same probability, and only a few of them have a CQ of 150 while millions have a mere 100, so it is not a CQ-150 candidate but rather a CQ-100 candidate who is likely to be chosen.
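
How unevenly the candidates are spread over the CQ levels can be estimated with a few lines of Python. The sketch assumes an untruncated normal CQ distribution with mean 100 and SD 15 and a voter of CQ 100 who chooses among all candidates at or above his own level; the band width of 10 points and all names are choices made for this illustration.

```python
import math
from statistics import NormalDist

cq = NormalDist(mu=100, sigma=15)

def share_between(low, high):
    """Fraction of the population with a CQ between low and high."""
    return cq.cdf(high) - cq.cdf(low)

near_100 = share_between(95, 105)
near_150 = share_between(145, 155)
print(f"share with CQ 95-105:  {near_100:.4f}")
print(f"share with CQ 145-155: {near_150:.6f}")
print(f"ratio: about {near_100 / near_150:.0f} to 1")

# If every candidate at or above CQ 100 is equally likely to be chosen,
# the expected choice is the mean of the upper half of the normal curve.
expected_choice = cq.mean + cq.stdev * math.sqrt(2 / math.pi)
print(f"expected CQ of the chosen candidate: {expected_choice:.1f}")
```

Under these assumptions the expected choice stays much closer to 100 than to 150.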

Consider the picture below.

The selection process depends on the number of candidates at each level.

It can easily be shown that the probability of selecting a red ball from the left box is merely 4.7%, while for the right box the probability is 80%.

The same is true when voters who possess a higher CQ choose a candidate.

Discussion

As a matter of fact, from the perspective of the majority of voters there is no difference between my model and the one proposed by the author, as the majority of voters are blind to that difference (I mean in practice, not in the plot, of course). The difference only matters for those at the upper end of the CQ spectrum.

Ideologically, this author's model fits even better, and it was probably a cognitive inhibition against accepting democracy's flaws that kept the author from scrutinizing his results more critically.


Tags: Software


Categories: Psychology Sociology Software

 
   

(c) Mato Nagel, Weißwasser 2004-2013, Disclaimer