
Reply to "Social Scientists and their Struggle with Statistics"

This is a reply to a post that claims to have improved statistics. The post comes with a bundle of software, which is discussed here. We not only set the statistics straight but also offer a deep psychoanalytical analysis of the thinking behind the critique.

Software Implementation in Python

Simulation

The software is described here. It is written in C and will be analyzed step by step. The copyright notice in the file header at least allows us to show the individual pieces of code before rewriting them in a more comfortable Python version.


The declaration of libraries and variables looks a bit different in Python. Variables in particular can be created on the fly, and libraries can be imported when needed.


We also don't need this function in Python.


This helper function for sorting an array can be omitted as well.
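The C listing is not reproduced here; presumably the helper is a comparison callback for the C library's `qsort` (an assumption). In Python, sorting is built in, so no such helper is needed. A minimal illustration with a hypothetical list `cqs`:

```python
# Python's built-in sorting replaces a hand-written C comparison function.
cqs = [112, 87, 100, 135, 94]  # hypothetical CQ values

ascending = sorted(cqs)                 # returns a new sorted list
descending = sorted(cqs, reverse=True)  # reverse order without a custom comparator
cqs.sort()                              # sorts the list in place
print(ascending, descending, cqs)
```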


The following function initializes the sample array with normally distributed CQs.


All of this looks much simpler in Python. For manageability, I chose a sample of only 1,000 people. This does not matter at all; the result is always the same.
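A minimal sketch of such a sample in Python. The mean of 100, the standard deviation of 15 and the lower bound of 0 are assumptions for illustration; the upper limit of 200 is the one the author allows in his sample. Values outside that range are simply redrawn. The function name `make_sample` is hypothetical.

```python
import random

def make_sample(n=1000, mean=100.0, sd=15.0, upper=200, seed=None):
    """Return n integer CQs drawn from a normal distribution.

    mean, sd and the range [0, upper] are assumed parameters;
    draws outside the range are discarded and redrawn.
    """
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        cq = round(rng.gauss(mean, sd))  # round to an integer CQ
        if 0 <= cq <= upper:
            sample.append(cq)
    return sample

sample = make_sample(seed=1)
print(len(sample), min(sample), max(sample))
```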


Just a quick check that this works properly: below is a simple histogram of the sample.

[Figure: histogram of the simulated sample]
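Without a plotting library, a rough text histogram serves the same purpose. This is a sketch assuming bins of 10 CQ points and the same assumed parameters (mean 100, SD 15); `text_histogram` is a hypothetical helper, not part of the author's code.

```python
import random
from collections import Counter

def text_histogram(values, bin_width=10):
    """Group values into bins of bin_width and draw one text bar per bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        bar = "#" * (counts[start] // 5)  # one '#' per 5 people
        lines.append(f"{start:3d}-{start + bin_width - 1:3d} {counts[start]:4d} {bar}")
    return lines, counts

rng = random.Random(1)
sample = [round(rng.gauss(100, 15)) for _ in range(1000)]  # assumed mean 100, SD 15
lines, counts = text_histogram(sample)
print("\n".join(lines))
```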

Now the actual voting takes place. The software discussed here employs a GNU random function, described here, that simply generates a uniformly distributed integer between 0 and RAND_MAX. Using the modulo operator, this random number is projected into the interval between the CQ of the current voter and the CQ of the topmost participant. This standard method is described here, for instance.
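The projection can be mimicked in Python as follows. This is a sketch, not the author's C code: `c_style_rand` stands in for the GNU `rand()` (RAND_MAX is 2,147,483,647 in the GNU C library), and `pick_cq` is a hypothetical name for the modulo step.

```python
import random

RAND_MAX = 2**31 - 1  # value of RAND_MAX in the GNU C library

def c_style_rand(rng):
    """Stand-in for rand(): a uniformly distributed integer in [0, RAND_MAX]."""
    return rng.randint(0, RAND_MAX)

def pick_cq(voter_cq, top_cq, rng):
    """Project a raw random number into [voter_cq, top_cq] with the modulo operator."""
    span = top_cq - voter_cq + 1  # number of admissible CQ values
    return voter_cq + c_style_rand(rng) % span

rng = random.Random(7)
picks = [pick_cq(100, 200, rng) for _ in range(10_000)]
print(min(picks), max(picks), sum(picks) / len(picks))
```

Because the interval of a few hundred values is tiny compared with RAND_MAX, the slight bias that the modulo operator introduces is negligible here.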


In Python, the voting according to the author of that software is accomplished by the following routine.
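A minimal sketch of such a routine, reconstructed from the description above rather than copied from the original listing: each voter elects a CQ drawn uniformly between his own CQ and the upper CQ limit. `vote_uniform_cq` is a hypothetical name, and the sample parameters (mean 100, SD 15, values clipped to 0 to 200) are assumptions.

```python
import random
from statistics import mean

def vote_uniform_cq(sample, top_cq, rng):
    """Each voter elects a CQ drawn uniformly from [own CQ, top_cq].

    top_cq stands for the upper CQ limit (200 in the author's sample).
    """
    return [rng.randint(cq, top_cq) for cq in sample]  # randint includes both ends

rng = random.Random(3)
# Assumed sample: 1,000 normally distributed CQs (mean 100, SD 15), clipped to [0, 200].
sample = [min(max(round(rng.gauss(100, 15)), 0), 200) for _ in range(1000)]
elected = vote_uniform_cq(sample, 200, rng)
print(f"mean voter CQ: {mean(sample):.1f}, mean elected CQ: {mean(elected):.1f}")
```

Python's `randint` includes both end points, which matches the C expression `voter + r % (top - voter + 1)`.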


Again, a quick check that this works properly: a simple plot of the resulting sample.

[Figure: histogram of the sample after the vote]

The similarity between the two histograms and the curves provided by the author is obvious. His curves are smoother, of course, but only because he uses a larger sample (1,000,000 vs. 1,000) and then further amplifies it by repeatedly selecting only the winner. As a rule in statistics, the larger the sample, the closer the data come to what is expected.
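A small sketch of this rule (the law of large numbers), again assuming normally distributed CQs with mean 100 and SD 15: the sample mean drifts less and less from 100 as the sample grows. The typical deviation is the standard error 15/sqrt(n), about 0.47 for n = 1,000 and about 0.047 for n = 100,000.

```python
import random

rng = random.Random(42)

def sample_mean(n, mu=100.0, sd=15.0):
    """Mean CQ of n people drawn from a normal distribution (mu and sd are assumed)."""
    return sum(rng.gauss(mu, sd) for _ in range(n)) / n

results = {n: sample_mean(n) for n in (1_000, 100_000)}
for n, m in results.items():
    print(f"n = {n:>7,}: mean = {m:.3f}, deviation from 100 = {abs(m - 100):.3f}")
```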

Calculating Expectations

Thus, instead of simulating millions of samples, the same result can be obtained more elegantly by simply calculating expectations.
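A sketch of such a calculation, assuming the uniform-CQ rule described above and a discretized normal distribution of voter CQs on 0 to 200 (mean 100 and SD 15 are assumptions): a voter with CQ c spreads his vote evenly over the M - c + 1 values from c to M, so the distribution of elected CQs follows exactly, without any random numbers. The function names are hypothetical.

```python
import math

def normal_pmf(upper=200, mu=100.0, sd=15.0):
    """Discretized normal distribution of CQs over 0..upper (assumed parameters)."""
    weights = [math.exp(-((k - mu) ** 2) / (2 * sd ** 2)) for k in range(upper + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def elected_pmf_uniform_cq(voter_pmf):
    """Distribution of the elected CQ if a voter with CQ c picks uniformly in [c, upper]."""
    upper = len(voter_pmf) - 1
    elected = [0.0] * (upper + 1)
    for c, p in enumerate(voter_pmf):
        share = p / (upper - c + 1)  # voter mass spread evenly over c..upper
        for k in range(c, upper + 1):
            elected[k] += share
    return elected

voters = normal_pmf()
elected = elected_pmf_uniform_cq(voters)
expected = sum(k * p for k, p in enumerate(elected))
print(f"expected elected CQ: {expected:.2f}")
```

Since the assumed voter distribution is symmetric around 100, the script prints an expected elected CQ of 150.00, the value the simulation approaches as the sample grows.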


The critical part of the software is the function that defines the vote. As the simple Python code shows, the whole distribution is shifted to the right and slightly deformed.

The author's software uses a uniform distribution to determine the index of the elected candidate, and the expectation of a uniform distribution is simply the mean of its two extremes.
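Written out for the discrete uniform distribution on the integers from a to b, with a voter CQ c and the upper CQ limit M (symbols introduced here for illustration):

$$E[X] = \frac{1}{b-a+1}\sum_{k=a}^{b} k = \frac{a+b}{2}$$

$$E[\text{elected CQ} \mid c] = \frac{c+M}{2}, \qquad \text{e.g. } \frac{100+200}{2} = 150$$

Averaging over all voters gives $(\bar{c} + M)/2$, where $\bar{c}$ is the mean voter CQ, so the expectation rises by half of any increase in $M$.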


According to the author's idea, the majority of voters would simply have selected a candidate with a CQ of 150, which is almost exactly halfway between 100 and 200, the upper CQ limit the author allows in his sample. He would soon have realized that something is wrong with his software had he allowed a CQ of 500, as in the sample below.

[Figure: the author's model with an upper CQ limit of 500]

The software then produces an elected CQ well above 200 simply because the upper limit was changed. If only it were so easy...
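The dependence on the upper limit can be checked directly. A sketch under the same assumptions as before (discretized normal voter distribution, mean 100, SD 15, uniform-CQ rule); `mean_elected_cq` is a hypothetical helper:

```python
import math

def mean_elected_cq(upper, mu=100.0, sd=15.0):
    """Population average of the elected CQ under the uniform-CQ rule.

    Voter CQs follow a discretized normal distribution on 0..upper (mu and sd
    are assumed); a voter with CQ c elects on average (c + upper) / 2.
    """
    weights = [math.exp(-((k - mu) ** 2) / (2 * sd ** 2)) for k in range(upper + 1)]
    total = sum(weights)
    return sum(w * (k + upper) / 2 for k, w in enumerate(weights)) / total

for upper in (200, 500):
    print(f"upper limit {upper}: mean elected CQ = {mean_elected_cq(upper):.1f}")
```

It prints 150.0 for an upper limit of 200 and 300.0 for 500: the elected CQ follows the arbitrary upper limit, not the voters.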

Critique

The software fails because it assumes that selected candidates are uniformly distributed over CQ values. According to this assumption, for a CQ-100 voter all CQ values above 100 have the same probability of being chosen. That is not true. It is not the CQ values but the candidates that have the same probability, and only a few candidates have a CQ of 150 while millions have a mere 100. So the chosen candidate is not likely to be a CQ-150 candidate but rather a CQ-100 one.
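The two assumptions can be compared numerically. In this sketch (assumed discretized normal distribution of CQs, mean 100, SD 15; hypothetical function names), `expected_if_values_equal` follows the author's rule, while `expected_if_candidates_equal` gives every person with a CQ at or above the voter's the same chance, so each CQ value is weighted by how many people have it:

```python
import math

def cq_distribution(upper, mu=100.0, sd=15.0):
    """Share of the population at each integer CQ 0..upper (discretized normal, assumed)."""
    weights = [math.exp(-((k - mu) ** 2) / (2 * sd ** 2)) for k in range(upper + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_if_values_equal(voter_cq, upper):
    """Author's assumption: every CQ value in [voter_cq, upper] is equally likely."""
    return (voter_cq + upper) / 2

def expected_if_candidates_equal(voter_cq, upper):
    """Every candidate with CQ >= voter_cq is equally likely, so frequent CQs dominate."""
    tail = cq_distribution(upper)[voter_cq:]
    return sum((voter_cq + i) * p for i, p in enumerate(tail)) / sum(tail)

for upper in (200, 500):
    print(upper,
          expected_if_values_equal(100, upper),
          round(expected_if_candidates_equal(100, upper), 1))
```

For a CQ-100 voter the first rule gives 150 or 300 depending on the upper limit, while the second stays at about 112 for both limits, because people with a CQ far above 200 practically do not exist.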

Consider the picture below.

[Figure: ElectionSample, a left and a right box of balls]

It can easily be proved that the probability of drawing a red ball from the left box is merely 4.7%, while in the right box it is 80%.

The same is true when voters who possess a higher CQ choose a candidate.

Discussion

As a matter of fact, from the perspective of the majority of voters there is no difference between my model and the one proposed by the author, as most voters are blind to it (in practice, I mean, not in the plot, of course). The difference only matters for those at the upper end of the CQ spectrum.

Ideologically, this author's model fits even better, and it was probably this cognitive inhibition against accepting democracy's flaws that kept the author from scrutinizing his results more critically.


(c) Mato Nagel, Weißwasser 2004-2024, Disclaimer