LeoStatistic
software for data presentation, statistical analysis, marketing and prediction.

Statistics.

Statistics is the most fundamental scientific discipline. Statistics, not philosophy, is the very base of the scientific method. It is not intuitively obvious, but there is a direct connection between statistics and natural selection, and between statistics and the development of instinctive responses in any living creature. The everyday life of the average person does not offer many examples of the statistical method, save for gambling. That is the result of the highly organized and multifaceted modern life we live, which contains a vast number of factors that all mesh together seamlessly. Yet behind any statement one can originally find reasoning that is backed up by statistical analysis.

Sometimes this statistical analysis is wrong in its interpretation, as it was, for example, for the ancient statement that force is the cause of movement at a constant speed. Statistically this idea was very broadly supported by the undeniable necessity of pushing a cart to keep it moving. Millions of observations added up to a wrong physical law but a correct statistical statement. Other observations, such as the continued forward motion of a stone thrown by hand, contradict the supposedly universal necessity of pushing an object to keep it moving. But it took the genius of Newton to summarize all the known facts in three simply formulated laws.

The role of statistics is to supply the researcher with an initial compression of numerous observations into a relatively short statement.

Let's consider how statistics can be used to analyze radar measurements of car speeds. The first thing to do is to build a histogram of the distribution of the measured speeds. Then, if the shape of the histogram happens to be bell-like, the average speed and its standard deviation can be calculated:

 $$\bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i \qquad (1)$$

 $$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(v_i - \bar{v}\right)^2} \qquad (2)$$

where v_i is an individual measured speed and n is the number of individual measurements. These two values are already very useful for making educated predictions about the speed at which a car could hit a pedestrian, innocent or not so innocent, who is reckless enough to run across the street at this spot.
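The average and the standard deviation can be tried out with a few lines of Python. This is only a sketch: the radar readings are made-up numbers, the function name mean_and_std is ours, and the standard deviation is assumed to be the sample form with n - 1 in the denominator.

```python
import math

def mean_and_std(values):
    """Return the average and the sample standard deviation of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: sample form of the variance, divided by n - 1.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)

# Hypothetical radar readings in km/h (invented for illustration only).
speeds = [48.0, 52.0, 50.0, 55.0, 45.0, 50.0, 51.0, 49.0]
mean, std = mean_and_std(speeds)
print(f"average = {mean:.2f} km/h, standard deviation = {std:.2f} km/h")
```

For a bell-shaped histogram roughly 95% of the readings fall within two standard deviations of the average, which is what makes these two numbers useful for prediction.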

But this is not our only option for improving such an analysis. Besides the speed of each car and the time of the measurement, we can also record the weather conditions, the color of the car and its plate number, and through the plate number associate with any given car full information about its technical specifications. From a picture of the driver we can also obtain his or her race, age, gender and education data. So for the one value of primary interest, the speed of the car, we can collect associated data on many, even dozens of, arguments: the other parameters of the event. It is quite possible that some of them, like the color of the car, will have no statistically significant influence on the speed (or, who knows, perhaps they will), while others, like the intensity of precipitation, will have a really strong effect. Statistics will help, with the matrix of correlations for example, to establish the mutual influence of all parameters. We can also build a model, either as a single mathematical formula that includes all arguments, or as an algorithm that produces the most probable value depending on the numerical values of all measured parameters.
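A matrix of correlations can be sketched as follows. We assume here the usual Pearson correlation coefficient; the column names and all numbers are invented for illustration, and a real data set would of course be much larger.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return a nested dict with the correlation of every column with every other."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical records: speed of the car, rain intensity, and a numeric color code.
data = {
    "speed": [62.0, 58.0, 50.0, 45.0, 40.0, 36.0],
    "rain_mm_per_h": [0.0, 1.0, 3.0, 5.0, 7.0, 9.0],
    "color_code": [1.0, 3.0, 2.0, 3.0, 1.0, 2.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In this invented sample the speed is strongly and negatively correlated with the rain intensity and almost uncorrelated with the color code.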

It is important to note that there is no universal statistical method magically applicable to any set of data.

LeoStatistic implements the most useful statistical methods for data analysis and modeling. Here these methods are briefly described in general terms, leaving the specific "how to do it" to other pages.

Distribution of one variable.

One can divide the domain of the variable into smaller subintervals, so-called bins, and count how many cases fall into each of them. Then, for each bin, draw a rectangle whose width is the width of the bin and whose height equals the number of cases. Such a picture, called a histogram, represents the probability distribution for the variable to be found in any given interval. There are numerous theoretical representations of such distributions, including two of the most popular, the T-probability (Student) and Poisson distributions. These can be calculated and displayed on the screen along with their coefficients of correlation with the histogram. The value of the correlation for a perfect fit is 1.0, and it decreases as the mismatch between the theoretical curve and the histogram grows.
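The binning and the comparison with a theoretical curve can be sketched in Python. The data are made up, the function names are ours, and we assume that the "correlation" between histogram and curve is the ordinary Pearson coefficient between observed and expected counts per bin; how LeoStatistic computes it internally is not stated here.

```python
import math

def histogram(values, lo, hi, bins):
    """Count how many values fall into each of `bins` equal subintervals of [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            # min() guards against rounding pushing a value into a non-existent bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def poisson_expected(lam, total, kmax):
    """Expected number of cases with value k = 0..kmax-1 under a Poisson distribution."""
    return [total * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax)]

def correlation(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# Hypothetical data: number of cars passing the radar in each of 20 one-minute intervals.
cars_per_minute = [2, 3, 1, 4, 2, 3, 2, 5, 3, 2, 1, 3, 4, 2, 3, 0, 2, 3, 4, 3]
observed = histogram(cars_per_minute, 0, 7, 7)           # bins of width 1 for values 0..6
lam = sum(cars_per_minute) / len(cars_per_minute)        # Poisson parameter = average
expected = poisson_expected(lam, len(cars_per_minute), 7)
r = correlation(observed, expected)
print("observed counts:", observed)
print("correlation with Poisson curve:", round(r, 3))
```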

A conditional distribution of a variable can be built too. To do so, just build the histogram using only the records that match given conditions on the other parameters. This option is especially fruitful for revealing a detailed, even tiny, influence of one variable on another, which is especially applicable in marketing for discovering non-functional dependencies.
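A conditional histogram is simply a histogram of a filtered set of records. The sketch below uses invented records and a hypothetical helper, conditional_histogram, to compare the speed distribution in dry and in rainy weather.

```python
def conditional_histogram(records, variable, condition, lo, hi, bins):
    """Histogram of `variable` over only those records for which condition(record) is true."""
    selected = [r[variable] for r in records if condition(r)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in selected:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

# Hypothetical records (speed in km/h).
records = [
    {"speed": 62, "raining": False},
    {"speed": 58, "raining": False},
    {"speed": 55, "raining": False},
    {"speed": 47, "raining": True},
    {"speed": 44, "raining": True},
    {"speed": 52, "raining": True},
    {"speed": 66, "raining": False},
    {"speed": 41, "raining": True},
]

# Three bins: [40, 50), [50, 60), [60, 70).
dry = conditional_histogram(records, "speed", lambda r: not r["raining"], 40, 70, 3)
wet = conditional_histogram(records, "speed", lambda r: r["raining"], 40, 70, 3)
print("dry:", dry)
print("wet:", wet)
```

Comparing the two lists bin by bin shows how the condition shifts the distribution even when no formula connects the two variables.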

Let us presume that the experimental results can be described by the formula:

 $$y = f(x_1, x_2, \ldots, x_n) \qquad (3)$$

During experimental research and data modeling one quite often meets the situation where the structure of the formula describing the data is known from basic principles, and the task is to find the coefficients that best fit the data of the particular experiment. The standard method for calculating the coefficients is the least squares method. In this method the fit is based on the criterion of minimizing the sum of squares of deviations between calculated and experimental values:

 $$\mathrm{dev}(a_0, a_1, a_2, \ldots, a_n) = \sum \left( y_t(a_0, a_1, a_2, \ldots, a_n, x_1, x_2, \ldots, x_n) - y_e \right)^2 \to \min \qquad (4)$$

The task is to find the set of coefficients a0, a1, a2, ... an for which function (4) has its minimum value. In the general case, for an arbitrary form of the approximating formula, there is no analytical solution for the best set of fitting coefficients. In LeoStatistic one can use one of numerous algorithms for the numerical approximation of a free-format formula.
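To give a feeling for what a numerical search for the minimum of (4) looks like, here is a minimal sketch. It uses a simple compass (coordinate) search, which is our choice for illustration and not necessarily one of the algorithms used in LeoStatistic. The exponential model, the starting point and the synthetic noise-free data are all assumptions.

```python
import math

def sum_sq_dev(model, coeffs, xs, ys):
    """Criterion (4): sum of squared deviations between calculated and experimental values."""
    return sum((model(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))

def compass_search(model, start, xs, ys, step=1.0, tol=1e-8):
    """Try +/- step on each coefficient; keep improvements, halve the step when stuck."""
    coeffs = list(start)
    best = sum_sq_dev(model, coeffs, xs, ys)
    while step > tol:
        improved = False
        for k in range(len(coeffs)):
            for delta in (step, -step):
                trial = coeffs.copy()
                trial[k] += delta
                value = sum_sq_dev(model, trial, xs, ys)
                if value < best:
                    coeffs, best, improved = trial, value, True
                    break
        if not improved:
            step /= 2.0
    return coeffs, best

def exponential(coeffs, x):
    """Free-format formula chosen for the example: y = a0 * exp(a1 * x)."""
    a0, a1 = coeffs
    return a0 * math.exp(a1 * x)

# Synthetic "experimental" data generated from a0 = 2, a1 = 0.5 without noise.
xs = [0.5 * i for i in range(9)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

coeffs, dev = compass_search(exponential, [1.0, 0.0], xs, ys)
print("a0 = %.4f, a1 = %.4f, dev = %.2e" % (coeffs[0], coeffs[1], dev))
```

Such a search needs only the ability to evaluate the formula, which is why it works for a formula of any form, but like any numerical method it depends on a reasonable starting point.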

For the special case when the fitting equation has a quasi-polynomial structure:

 $$F(y) = a_0 + a_1 f_1(x_1, x_2, \ldots, x_n) + a_2 f_2(x_1, x_2, \ldots, x_n) + \ldots + a_n f_n(x_1, x_2, \ldots, x_n) \qquad (5)$$

there is an analytical solution for the coefficients a0, a1, a2 ... an that give the best fit to the experimental data in the sense of least squares deviation.
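The analytical solution comes from the fact that a formula which is linear in its coefficients leads to a system of linear equations (the normal equations). The sketch below builds and solves that system in plain Python. It assumes F(y) = y, a single argument x, and noise-free synthetic data; the function names are ours.

```python
def solve_linear(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_quasipolynomial(basis, xs, ys):
    """Least squares coefficients a0..an for y = a0 + a1*f1(x) + ... + an*fn(x)."""
    funcs = [lambda x: 1.0] + list(basis)      # the constant term belongs to a0
    rows = [[f(x) for f in funcs] for x in xs]
    k = len(funcs)
    # Normal equations: (A^T A) a = A^T y, where A holds the basis function values.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(ata, aty)

# Synthetic data generated from y = 1 + 2*x - 0.5*x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x - 0.5 * x ** 2 for x in xs]
coeffs = fit_quasipolynomial([lambda x: x, lambda x: x ** 2], xs, ys)
print([round(c, 6) for c in coeffs])
```

Any functions of the arguments can be placed in the basis list, for example logarithms or products of several variables, as long as the coefficients enter linearly.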

One can also calculate the standard deviations of the found coefficients σ0, σ1, σ2 ... σn, their coefficients of variation σ0/a0, σ1/a1, σ2/a2 ... σn/an, and the correlation coefficient that characterizes the match as a whole.
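For the simplest case, a straight line y = a0 + a1*x, these quantities have well-known textbook expressions. The sketch below uses those textbook formulas, which are an assumption on our part and not a description of LeoStatistic internals; the data points are invented.

```python
import math

def line_fit_with_errors(xs, ys):
    """Fit y = a0 + a1*x and return a0, a1, their standard deviations and the correlation."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    a1 = sxy / sxx
    a0 = my - a1 * mx
    residual = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = residual / (n - 2)                    # residual variance, two fitted coefficients
    sigma1 = math.sqrt(s2 / sxx)
    sigma0 = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    r = sxy / math.sqrt(sxx * syy)
    return a0, a1, sigma0, sigma1, r

# Hypothetical measurements lying close to the line y = 2*x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a0, a1, sigma0, sigma1, r = line_fit_with_errors(xs, ys)
print("a0 = %.3f +/- %.3f (variation %.2f)" % (a0, sigma0, sigma0 / abs(a0)))
print("a1 = %.3f +/- %.3f (variation %.2f)" % (a1, sigma1, sigma1 / abs(a1)))
print("correlation = %.4f" % r)
```

A coefficient of variation close to or above 1 suggests that the corresponding coefficient is not reliably different from zero and that the term could be dropped from the formula.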

The best-known form of function (5) is the polynomial equation:

 $$Y(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots + a_n x^n \qquad (7)$$

When the arguments x1, x2 ... xn are independent parameters we can talk about multivariate regression; LeoStatistic implements linear and parabolic forms of the fitting formula.

Near neighbors method.

This method is based on the presumption that we have no advance knowledge about the mutual dependence between variables. One can assume that, for non-sporadic data, the closer a point in multidimensional space is to other points, the more reason there is to suggest that their values will be approximately the same. Another way to describe this method is to say that the most probable value at a point in n-dimensional space is a weighted average of the values at the closest points around it. The formula for the calculation looks like this:

 $$y_p = \frac{\sum_i w(x_{0p} \ldots x_{np},\, x_{0i} \ldots x_{ni})\, y_i}{\sum_i w(x_{0p} \ldots x_{np},\, x_{0i} \ldots x_{ni})} \qquad (8)$$

The LeoStatistic software application implements the following schemes to calculate w(x0p...xnp, x0i...xni):

The distance dpi in n-dimensional space between the probe point and the i-th point is calculated by the formula:

 $$d_{pi} = \left( \sum_{l} \left( x_{lp} - x_{li} \right)^2 \right)^{1/2} \qquad (12)$$
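Formulas (8) and (12) can be combined into a short sketch. The weighting scheme used here, the inverse of the distance over the k closest points, is only one common choice that we assume for illustration; it is not claimed to be one of the schemes implemented in LeoStatistic. The points and values are invented.

```python
import math

def distance(p, q):
    """Formula (12): Euclidean distance between two points in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def near_neighbors_estimate(probe, points, values, k=3, eps=1e-12):
    """Formula (8) with assumed weights w = 1 / (distance + eps) over the k closest points."""
    ranked = sorted(zip(points, values), key=lambda pv: distance(probe, pv[0]))[:k]
    weights = [1.0 / (distance(probe, p) + eps) for p, _ in ranked]
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / sum(weights)

# Hypothetical known points in two-dimensional space and their values.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
values = [1.0, 2.0, 2.0, 3.0, 10.0]

center = near_neighbors_estimate((0.5, 0.5), points, values, k=4)
on_point = near_neighbors_estimate((1.0, 1.0), points, values, k=3)
print("estimate at (0.5, 0.5):", center)
print("estimate at (1.0, 1.0):", on_point)
```

The small constant eps only prevents division by zero when the probe coincides with a known point; in that case the estimate is practically the value of that point.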

Scoring.

For the task of optimizing a marketing campaign, the common problem is to rank all potential clients by their expected response to a direct advertisement. Since mailing costs money, it makes sense to get more responses per advertising dollar. A common approach is to construct an algorithm that calculates and assigns a score to every client; the score could, for example, be normalized from 0 to 100% to represent the probability of a positive response.

LeoStatistic has tools to solve this problem, such as building conditional distributions as well as directly creating a scoring algorithm.
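One very simple way to turn conditional distributions into a score is to take the response rate observed in past mailings for each segment of clients. The sketch below follows that idea. The segmentation rule, the field names and all records are hypothetical, and a real scoring algorithm would normally be more elaborate.

```python
from collections import defaultdict

def build_scores(history, segment_of):
    """Response rate in percent (0..100) for every segment seen in past mailings."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for client in history:
        seg = segment_of(client)
        sent[seg] += 1
        responded[seg] += client["responded"]
    return {seg: 100.0 * responded[seg] / sent[seg] for seg in sent}

def rank_clients(clients, scores, segment_of, default=0.0):
    """Sort clients from the highest to the lowest score; unknown segments get `default`."""
    return sorted(clients, key=lambda c: scores.get(segment_of(c), default), reverse=True)

def segment_of(client):
    # Hypothetical segmentation: age group and home ownership.
    return (client["age"] >= 40, client["owns_home"])

# Invented results of a previous mailing (responded: 1 = yes, 0 = no).
history = [
    {"age": 25, "owns_home": False, "responded": 0},
    {"age": 30, "owns_home": False, "responded": 0},
    {"age": 28, "owns_home": False, "responded": 1},
    {"age": 45, "owns_home": True, "responded": 1},
    {"age": 52, "owns_home": True, "responded": 1},
    {"age": 60, "owns_home": True, "responded": 0},
    {"age": 48, "owns_home": True, "responded": 1},
    {"age": 35, "owns_home": True, "responded": 0},
    {"age": 38, "owns_home": True, "responded": 1},
]
prospects = [
    {"name": "A", "age": 50, "owns_home": True},
    {"name": "B", "age": 22, "owns_home": False},
    {"name": "C", "age": 33, "owns_home": True},
]

scores = build_scores(history, segment_of)
ranked = rank_clients(prospects, scores, segment_of)
for client in ranked:
    print(client["name"], "%.0f%%" % scores[segment_of(client)])
```

Mailing only to the clients at the top of such a ranking is what raises the number of responses per advertising dollar.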

Screenshots of the LeoStatistic software:

Building histogram

Distribution of two variables.

Approximation
(constructor style interface).

3D view.

DOW trend.

Signals revealing.

Near neighbors method.

Harmonic analysis.

Fit with free format formula.

Curve fit of crystal growth rate.

Get data from image file.

Copyright © by LeoKrut