By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 1.33; (* of zipf.p 2016 January 26}
(* begin module describe.zipf *)
(*
name
zipf: Monte Carlo simulation for Peter Shenkin's problem
synopsis
zipf(zipfp: in, data: out, xyin: out, output: out)
files
zipfp: parameters to control the program
first line: integer, number of correlation coefficients to create
second line: integer, number of symbols for each correlation coefficient.
e.g., 20 for the amino acids.
third line: character. 't' means use Tom's method, 'p' means use Peter's.
fourth line: character. 'g' means to graph the simplex.
data: a list of correlation coefficients. This is to be input
to the genhis program.
xyin: data for graphing the simplex. The graph is generated with the
xyplo program.
output: messages to the user
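As an illustration of the zipfp layout described above, here is a minimal Python sketch of a reader for the four parameter lines. It is not the program's own Pascal input code. It assumes that the first whitespace-delimited token on each line is the value and that anything after it is a comment, as the examples section suggests. The names ZipfParams and parse_zipfp are hypothetical, and any method character other than 't' is treated as Peter's method.

```python
from collections import namedtuple

# Hypothetical container for the four zipfp parameters.
ZipfParams = namedtuple("ZipfParams", "count symbols method graph")


def parse_zipfp(text):
    # Assumption: the value is the first token on each line;
    # the rest of the line is a free-form comment.
    lines = text.splitlines()
    count = int(lines[0].split()[0])
    symbols = int(lines[1].split()[0])
    method_char = lines[2].split()[0][0]
    graph_char = lines[3].split()[0][0]
    method = "tom" if method_char == "t" else "peter"
    return ZipfParams(count, symbols, method, graph_char == "g")
```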
description
1992 Jan 13 Returned call to Stephen Altschul 496-2475. He suggested that
Peter Shenkin's results of rank versus log of probability are due to random
effects. This is easy to test with a Monte Carlo simulation:
Tom's method
choose s (e.g. 20) random numbers
find their sum
divide each number by the sum to produce s random numbers which
sum to 1.
sort the numbers
take the log versus the rank
determine the correlation coefficient
repeat to get distribution of correlation coefficients.
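The steps of Tom's method can be sketched in Python as follows. This is a minimal illustration, not the zipf.p source. It assumes uniform random draws, natural logarithms (the correlation coefficient does not depend on the log base), and a descending sort so that rank 1 is the largest number. The names pearson_r, tom_correlation and tom_distribution are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tom_correlation(s, rng):
    # Assumption: uniform draws. 1 - random() lies in (0, 1],
    # so the logarithm below never receives zero.
    raw = [1.0 - rng.random() for _ in range(s)]
    total = sum(raw)
    # Normalize so the numbers sum to 1, then sort largest first.
    probs = sorted((v / total for v in raw), reverse=True)
    ranks = list(range(1, s + 1))
    logs = [math.log(p) for p in probs]
    return pearson_r(ranks, logs)


def tom_distribution(count, s, seed=0):
    # Repeat the experiment to collect a list of correlation coefficients.
    rng = random.Random(seed)
    return [tom_correlation(s, rng) for _ in range(count)]
```

With a descending sort the log values fall as the rank rises, so every coefficient is negative.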
Peter's method
choose s-1 random numbers between 0 and 1
sort the numbers
take the differences between successive values (with 0 and 1 as end points) to produce s numbers that sum to 1
re-sort the numbers
take the log versus the rank
determine the correlation coefficient
repeat to get distribution of correlation coefficients.
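Peter's method can be sketched in Python in the same spirit. This is an illustration under stated assumptions, not the zipf.p source. It reads "take the differences" as the gaps between successive sorted values with 0 and 1 added as end points, which is what yields s numbers summing to 1. It also redraws in the very unlikely case of a zero-length gap, so that the logarithm is defined. The names pearson_r, peter_probabilities and peter_correlation are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def peter_probabilities(s, rng):
    while True:
        # s-1 sorted cut points split the unit interval into s pieces.
        cuts = sorted(rng.random() for _ in range(s - 1))
        edges = [0.0] + cuts + [1.0]
        pieces = [b - a for a, b in zip(edges, edges[1:])]
        # Assumption: redraw if any piece has zero length.
        if min(pieces) > 0.0:
            return sorted(pieces, reverse=True)


def peter_correlation(s, rng):
    # Correlate rank (1 = largest piece) with the log of each piece.
    probs = peter_probabilities(s, rng)
    ranks = list(range(1, s + 1))
    logs = [math.log(p) for p in probs]
    return pearson_r(ranks, logs)
```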
Graph of simplex. The numbers all add to 1 for either method. They are
points in an s-dimensional space. Because they sum to 1 and are non-negative, they lie in a
bounded region of an (s-1)-dimensional hyperplane, called a simplex. The distribution
of the points can be visualized by projecting onto a plane and graphing with
the xyplo program. The projection is done by using polar coordinates.
There is a vector P from the center of the simplex to each point to graph.
There is a vector, A, from the center of the simplex to the point where the
first coordinate has value 1 and all others are zero. The magnitude of P
gives the radius, and the angle between P and A gives the polar angle. These
polar coordinates are converted to rectangular coordinates and written
to the xyin file. If s = 3, then the simplex is a flat triangle
reaching between the three points A=(1,0,0), B=(0,1,0) and C=(0,0,1). The
projection takes this equilateral triangle onto the xy plane. In higher
dimensions, the points are collapsed to the xy plane, so high dimensional
effects are expected. This means that the center should tend to become
empty, and the distribution will become spherical.
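The polar projection described above can be sketched in Python as follows. This is a reading of the description, not the program's code. It assumes s is at least 2 and takes the angle between P and A from the arc cosine of their normalized dot product. That angle is unsigned (0 to pi), so in this sketch every point lands in the upper half of the xy plane; the program itself may resolve the sign differently. The name project_to_plane is hypothetical.

```python
import math


def project_to_plane(point):
    # point: s non-negative numbers that sum to 1 (s >= 2 assumed).
    s = len(point)
    center = 1.0 / s
    # P: vector from the simplex center to the point.
    p = [v - center for v in point]
    # A: vector from the center to the vertex (1, 0, ..., 0).
    a = [1.0 - center] + [-center] * (s - 1)
    radius = math.sqrt(sum(v * v for v in p))
    if radius == 0.0:
        return (0.0, 0.0)
    a_len = math.sqrt(sum(v * v for v in a))
    cos_angle = sum(pv * av for pv, av in zip(p, a)) / (radius * a_len)
    # Clamp against rounding error before taking the arc cosine.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    angle = math.acos(cos_angle)
    # Polar (radius, angle) to rectangular (x, y).
    return (radius * math.cos(angle), radius * math.sin(angle))
```

For s = 3, the vertex (1,0,0) maps onto the positive x axis and the vertex (0,1,0) maps to an angle of 120 degrees from it, at the same radius.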
examples
zipfp file:
***********************************************************
10000 10000 1000 Number of correlation coefficients to print out
3 16 Number of symbols being simulated
p t= tom's, else peter's
g g = graph the simplex, otherwise not
zipfp: parameters to control the zipf program.
***********************************************************
genhisp file for use with genhis
***********************************************************
x n 50
r -1 -0.5
***********************************************************
xyplop file for use with xyplo
***********************************************************
2 2 zerox zeroy graph coordinate center
x -1 1 zx 0 25 zx min max (character, real, real) if zx='x' then set xaxis
y -1 1 zy 0 250 zy min max (character, real, real) if zy='y' then set yaxis
10 10 xinterval yinterval number of intervals on axes to plot
6 6 xwidth ywidth width of numbers in characters
1 1 xdecimal ydecimal number of decimal places
16.5 22.0 xsize ysize size of axes in cm
x
y
c zc 'c' crosshairs, axXyYnN
n 2 zxl base if zxl='l' then make x axis log to the given base
n 2 zyl base if zyl='l' then make y axis log to the given base
*********************************************************************
1 2 xcolumn ycolumn columns of xyin that determine plot location
0 symbol column the xyin column to read symbols from
0 0 xscolumn yscolumn columns of xyin that determine the symbol size
0 0 0 hue saturation brightness columns for color manipulation
*********************************************************************
p symbol-to-plot c(circle)bd(dotted box)x+Ifgpr(rectangle)
0 symbol-flag character in xyin that indicates that this symbol
0.05 symbol sizex side in inches on the x axis of the symbol.
0.05 symbol sizey as for the x axis, get size from yscolumn
nl 0.05 no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n 0.05 linetype size linetype l.-in and size of dashes or dots
*********************************************************************
.
*********************************************************************
***********************************************************
documentation
see also
genhis.p, xyplo.p
author
Thomas Dana Schneider
bugs
The revision of 2016 Jan 26 replaced the non-standard random(0) with a
standard procedure. Because the seed is fixed, every run gives the same results. For
actual use, add parameters and the timeseed function to base the
initial seed on the date and time.
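The difference between a fixed seed and a time-based seed can be sketched in Python as follows. This only illustrates the idea and is not the timeseed function mentioned above. The name make_rng is hypothetical, and the clock reading from time.time_ns stands in for a seed built from the date and time.

```python
import random
import time


def make_rng(seed=None):
    # With an explicit seed the sequence is reproducible.
    # Without one, take the seed from the clock so each run differs.
    if seed is None:
        seed = time.time_ns()
    # Return the seed too, so a run can be logged and repeated later.
    return random.Random(seed), seed
```

Returning the seed alongside the generator keeps time-seeded runs repeatable, since the logged value can be passed back in.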
technical notes
The non-standard random(0) generator was replaced by a portable one, at the
risk that the portable generator gives lower-quality random numbers.
*)
(* end module describe.zipf *)
{This manual page was created by makman 1.45}