LeoStatistic
software for data presentation, statistical analysis, marketing and prediction.

Free download:
LeoStatistic.zip or
LeoStatistic.exe
(self-extracting WinZip file)


Registration

  • Introduction
  • Data
  • Statistics
  • Results presentation
  • Samples
  • Popular statistics and data analysis
    Statistics.

    Statistics is the most fundamental scientific discipline. Not philosophy but statistics is the very base of the scientific method. It is not intuitively obvious that there is a direct connection between statistics, natural selection and the development of instinctive responses in any living creature, but there is indeed. The everyday life of the average person does not offer too many examples of the statistical method, save for gambling. That is the result of the highly organized and multifaceted modern life we live, which contains an enormous number of factors that all mesh together seamlessly. Ultimately, behind any statement you can find reasoning that is backed up by statistical analysis.

    Sometimes this statistical analysis is wrong in the sense of its interpretation, as it was, for example, for the ancient statement that force is the reason for movement at a constant rate. Statistically this idea was very broadly supported by the undeniable connection between the need to push a cart and its motion. Millions of observations summed up into a wrong physical law but a correct statistical statement. Other observations, like the continued forward motion of a stone thrown by hand, contradicted the universal necessity of pushing an object to keep it moving. But it took the genius of Newton to summarize all known facts into three simply formulated laws.

    The role of statistics is to supply the researcher with an initial compression of numerous observations into a relatively short statement.

    Let's consider how statistics can be used to analyze radar measurements of car velocities. The first thing to do is to build a histogram of the distribution of the measured velocities. Then, if the shape of the histogram happens to be bell-like, the average value of the rate and its standard deviation can be calculated:

    vavg = (1/n) * Σ vi

    (1)

    σ = ((1/n) * Σ (vi - vavg)^2)^(1/2)

    (2)

    where n is the number of individual measurements and vi are the individual measured rates. These two values are already very useful for making educated predictions about the rate at which a car could hit an innocent (or not so innocent) pedestrian who is reckless enough to run across the street here.
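    As a minimal sketch of this first step (in Python with NumPy, which is not part of LeoStatistic), formulas (1) and (2) and the histogram could be computed as follows; the speed values here are hypothetical sample data:

        import numpy as np

        # Hypothetical radar measurements of car speeds, km/h
        speeds = np.array([52.1, 48.7, 55.3, 61.0, 49.5, 57.8, 50.2, 53.9, 46.4, 58.6])

        n = speeds.size
        mean_speed = speeds.sum() / n                                 # formula (1)
        std_speed = np.sqrt(((speeds - mean_speed) ** 2).sum() / n)   # formula (2)

        # Histogram: counts of measurements falling into equal-width bins
        counts, bin_edges = np.histogram(speeds, bins=5)

        print(f"mean = {mean_speed:.1f}, std = {std_speed:.1f}")
        print("bin edges:", bin_edges)
        print("counts:   ", counts)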

    Analyzing the data in more detail, one can notice that, depending on the time of day, there are systematic variations of the average rate of the cars. In the morning rush hours it could be lower than in other periods. One can try to fit these data with a curve on a chart with coordinates rate vs. time using some periodic formula. Another, more reasonable approach could be to create an additional parameter, the frequency of measurements of different cars, a value reflecting the intensity of traffic, and to fit a curve between these two parameters, which most probably can be done with a much simpler, non-periodic formula. By doing such analysis we can significantly improve the predictability of the rate of a car at any given moment of the day.
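    A rough sketch of this second approach, assuming hypothetical arrays of traffic intensity (cars measured per minute) and average rate, fitted with a simple linear formula by least squares using NumPy (LeoStatistic's own interface for this is described on other pages):

        import numpy as np

        # Hypothetical data: traffic intensity and the average rate observed at that intensity
        intensity = np.array([2.0, 5.0, 8.0, 12.0, 15.0, 20.0])     # cars measured per minute
        avg_rate  = np.array([64.0, 58.0, 55.0, 49.0, 44.0, 38.0])  # km/h

        # Fit the simple, non-periodic formula: rate = a0 + a1 * intensity
        a1, a0 = np.polyfit(intensity, avg_rate, deg=1)

        def predicted_rate(cars_per_minute):
            return a0 + a1 * cars_per_minute

        print(f"fitted formula: rate = {a0:.1f} + ({a1:.2f}) * intensity")
        print("predicted rate at 10 cars/min:", round(predicted_rate(10.0), 1))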

    But that is not our only option for improving such analysis. Besides the rates of the cars and the times of the measurements, we can also record weather conditions, the color of the car, its plate number (and through it full information about its technical specifications), and, from a picture of the driver, his or her race, age, gender and education. So for one value of primary interest, the rate of the car, we can collect data for many associated arguments, dozens of other parameters of the event. Quite possibly some of them, like the color of the car, will produce no statistically significant influence on the rate, while others, like the intensity of precipitation, will have a really strong effect. Statistics will help, with a matrix of correlations for example, to establish the mutual influence of all parameters. We can also build a model, either in the form of a single mathematical formula that includes all arguments or with some algorithm, that will produce the most probable value depending on the numerical values of all measured parameters.
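    How such a screening with a matrix of correlations might look in outline (a sketch with hypothetical columns for rate, traffic intensity and precipitation, using NumPy rather than LeoStatistic itself):

        import numpy as np

        # Hypothetical records: each position across the arrays is one measured event
        rate          = np.array([62.0, 55.0, 47.0, 58.0, 41.0, 50.0])
        intensity     = np.array([ 3.0,  8.0, 14.0,  6.0, 18.0, 11.0])
        precipitation = np.array([ 0.0,  0.0,  2.5,  0.5,  6.0,  1.0])

        # Matrix of pairwise correlation coefficients between the parameters
        corr_matrix = np.corrcoef([rate, intensity, precipitation])

        labels = ["rate", "intensity", "precipitation"]
        for label, row in zip(labels, corr_matrix):
            print(label.ljust(14), np.round(row, 2))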

    It is important to note that there is no universal statistical method magically applicable to any set of data.

    LeoStatistic implements the most useful statistical methods for data analysis and modeling. Here these methods will be briefly described in general terms, leaving the specific "how to do it" to other pages.

    Distribution of one variable.

    One can divide the domain of the variable into smaller subintervals, so-called bins, and count how many cases fall into each of them. Then draw a rectangle whose width is the width of the bin and whose height equals the number of cases. Such a picture, named a histogram, represents the probability distribution for the variable to be found in any given interval. There are numerous theoretical representations of such distributions, including two of the most popular, the T-probability (Student) and Poisson distributions; these can be calculated and displayed on the screen along with their correlation coefficients with the histogram. The value of the correlation for a perfect fit is 1.0 and decreases with the mismatch between the theoretical curve and the histogram.
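    A minimal sketch of this kind of comparison, assuming SciPy is available (it is not part of LeoStatistic): a histogram of hypothetical integer-valued data is compared against a Poisson distribution with the same mean, and the correlation between the two curves is reported:

        import numpy as np
        from scipy import stats

        # Hypothetical integer-valued data (e.g., events counted per time interval)
        data = np.array([2, 3, 1, 4, 2, 5, 3, 2, 4, 3, 1, 2, 6, 3, 2, 4, 3, 5, 2, 3])

        # Histogram: one bin per integer value
        values = np.arange(data.min(), data.max() + 1)
        counts = np.array([(data == v).sum() for v in values])

        # Theoretical Poisson distribution with the same mean, scaled to the sample size
        lam = data.mean()
        theoretical = stats.poisson.pmf(values, lam) * data.size

        # Correlation between histogram and theoretical curve (1.0 would be a perfect fit)
        correlation = np.corrcoef(counts, theoretical)[0, 1]
        print("lambda =", round(lam, 2), " correlation =", round(correlation, 3))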

    A conditional distribution of some variable can be built too. To do it, just build a histogram only for the part of the records that match conditions on the other parameters. This option is especially fruitful for revealing a detailed, even tiny influence of one variable on another, which is especially applicable in marketing to discover non-functional dependencies.
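    A sketch of a conditional histogram on hypothetical records (purchase amount conditioned on customer age), with NumPy standing in for LeoStatistic's own dialogs:

        import numpy as np

        # Hypothetical records: purchase amount and customer age
        amount = np.array([12.0, 45.0, 30.0, 80.0, 22.0, 65.0, 15.0, 95.0, 40.0, 55.0])
        age    = np.array([22,   35,   28,   52,   24,   47,   31,   60,   38,   45])

        # Unconditional distribution of purchase amounts
        counts_all, edges = np.histogram(amount, bins=4)

        # Conditional distribution: only records where the age condition is matched
        mask = age >= 40
        counts_cond, _ = np.histogram(amount[mask], bins=edges)

        print("bin edges:         ", edges)
        print("all customers:     ", counts_all)
        print("customers aged 40+:", counts_cond)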

    Approximation (curve and surface fitting, multivariate, nonlinear regression).

    Let us presume that the experimental results can be described by the formula:

    y = f(x1, x2...xn)

    (3)

    where y corresponds to the value of the data series and x1, x2 ... xn to its arguments. If the exact mathematical expression for f(x1, x2 ... xn) is known, one can calculate theoretical values with this formula and compare them with the experimental results.

    During experimental research and data modeling one quite often meets the situation when the structure of the formula describing the data is known from basic principles and the task is to find the coefficients that best fit the data of the particular experiment. The standard method for calculating the coefficients is the least squares method. In this method the fitting is based on the criterion of minimizing the sum of squares of deviations between calculated and experimental values:

    min dev(a0, a1, a2, ..., an) = Σ (yt(a0, a1, a2, ..., an, x1, x2 ... xn) - ye)^2

    (4)

    The task is to find the collection of coefficients a0, a1, a2, ..., an for which function (4) has its minimum value. In the general case, for an arbitrary form of the approximating formula, there is no analytical solution for the best collection of fitting coefficients. In LeoStatistic one can use one of the numerous algorithms for the numerical approximation of a free format formula.
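    How such a numerical search might look in outline (the algorithms inside LeoStatistic are not shown here); a sketch with scipy.optimize.curve_fit and a hypothetical free format formula:

        import numpy as np
        from scipy.optimize import curve_fit

        # Hypothetical free format formula: y = a0 * exp(-a1 * x) + a2
        def model(x, a0, a1, a2):
            return a0 * np.exp(-a1 * x) + a2

        # Hypothetical experimental data
        x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0])
        y = np.array([5.1, 3.9, 3.1, 2.5, 2.1, 1.6, 1.3, 1.2])

        # Numerical search for the coefficients minimizing the sum of squared deviations (4)
        coeffs, covariance = curve_fit(model, x, y, p0=[5.0, 1.0, 1.0])
        print("a0, a1, a2 =", np.round(coeffs, 3))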

    For the special situation when the fitting equation has a quasi-polynomial structure:

    F(y) = a0 + a1*f1(x1, x2 ... xn) + a2*f2(x1, x2 ... xn) + ... + an*fn(x1, x2 ... xn)

    (5)

    there is an analytical solution for the coefficients a0, a1, a2 ... an that correspond to the best fit to the experimental data in the sense of least squares deviation.
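    For form (5) the analytical least squares solution reduces to solving a linear system; a minimal sketch with NumPy, using two hypothetical basis functions f1 and f2:

        import numpy as np

        # Hypothetical data with two arguments x1, x2 and an observed value y
        x1 = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
        x2 = np.array([1.0, 0.8, 1.2, 0.9, 1.1, 1.0])
        y  = np.array([2.1, 3.0, 4.4, 5.1, 6.3, 7.2])

        # Basis functions of form (5): y = a0 + a1*f1(x1,x2) + a2*f2(x1,x2)
        f1 = x1 * x2         # hypothetical choice of f1
        f2 = np.sqrt(x1)     # hypothetical choice of f2

        # Design matrix with a column of ones for a0; lstsq gives the analytical solution
        design = np.column_stack([np.ones_like(x1), f1, f2])
        coeffs, residuals, rank, sv = np.linalg.lstsq(design, y, rcond=None)
        print("a0, a1, a2 =", np.round(coeffs, 3))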

    One can also calculate the standard deviations of the found coefficients σ0, σ1, σ2 ... σn, their coefficients of variation σ0/a0, σ1/a1, σ2/a2 ... σn/an, and the correlation coefficient that characterizes the match as a whole:

    Cr.Cf = 1 - (1/n) * Σ ((F(ye) - F(yt)) / σy)^2

    (6)
    where σy is the standard deviation of the F(y) variable itself. The correlation coefficient characterizes the quality of the fitting in general and equals 1 for a perfect fit. It is important to be aware that a correlation coefficient near 1 does not by itself guarantee the comprehensiveness of the approximation formula. At least two other criteria have to be taken into consideration to decide on the best formula: the physical sense of the formula, and small values of the coefficients of variation of the found coefficients. Coefficients of variation σi/ai with unreasonably large values (greater than 1) usually have to attract attention.
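    These diagnostics could be estimated, for example, as in the following self-contained sketch (a hypothetical straight-line fit; the covariance-based estimate of the coefficient errors is a standard least squares result, not necessarily the exact procedure inside LeoStatistic):

        import numpy as np

        # Hypothetical linear fit y = a0 + a1*x used to illustrate the diagnostics
        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
        y = np.array([2.9, 5.1, 6.8, 9.2, 11.1, 12.8, 15.2, 16.9])

        design = np.column_stack([np.ones_like(x), x])
        coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
        y_fit = design @ coeffs

        # Standard deviations of the coefficients from the covariance matrix
        n, p = design.shape
        residual_var = ((y - y_fit) ** 2).sum() / (n - p)
        cov = residual_var * np.linalg.inv(design.T @ design)
        coeff_std = np.sqrt(np.diag(cov))

        variation = coeff_std / np.abs(coeffs)                   # coefficients of variation
        corr_cf = 1.0 - np.mean(((y - y_fit) / y.std()) ** 2)    # correlation coefficient (6)

        print("coefficients:             ", np.round(coeffs, 3))
        print("coefficients of variation:", np.round(variation, 3))
        print("correlation coefficient:  ", round(corr_cf, 4))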

    The best known form of function (5) is the polynomial equation:

    Y(x) = a0 + a1*x + a2*x^2 + a3*x^3 + ... + an*x^n

    (7)

    Nevertheless, the range of forms available for the analytical solution is much broader, and some of them (including the simple polynomial) are implemented in the scope of LeoStatistic. Despite the fact that a universal method for the approximation of any imaginable set of data does not exist, LeoStatistic offers a variety of different schemes, with a corresponding specific user interface for each of them, that cover most common situations.
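    For the plain polynomial form (7) such a fit can be obtained in a single call; a minimal sketch with NumPy on hypothetical data roughly following a quadratic trend:

        import numpy as np

        # Hypothetical data close to y = x^2 + 1
        x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([1.1, 2.0, 5.2, 10.1, 17.3, 26.0])

        # Coefficients an ... a0 of form (7), here for degree 2
        coefficients = np.polyfit(x, y, deg=2)
        print("a2, a1, a0 =", np.round(coefficients, 3))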

    In the case when the arguments x1, x2 ... xn are independent parameters we can talk about multivariate regression; LeoStatistic implements linear and parabolic forms of the fitting formula.

    Near neighbors method.

    This method is based on the presumption that we have no advance knowledge about the mutual dependence between variables. One can assume that for non-sporadic data, the closer a point in multidimensional space is located to other points, the more reason there is to suggest that their values will be approximately the same. Another way to describe this method is to say that the estimated, most probable value at a point in n-dimensional space is the weighted average of the values of the closest points around it. The formula for the calculation looks like this:

    yp = Σw(x0p...xnp, x0i...xni)*yi/ Σw(x0p...xnp, x0i...xni) (8)

    where w(x0p...xnp, x0i...xni) is the weight coefficient of the i-th point with coordinates x0i...xni in the calculation of the value for the probe point with coordinates x0p...xnp.

    The LeoStatistic software application implements the following schemes to calculate w(x0p...xnp, x0i...xni):

    The distance dpi in n-dimensional space between the probe point and the i-th point is calculated by the formula:

    dpi = (Σ (xlp - xli)^2)^(1/2) (12)

    where the summation runs over all n arguments.
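    A sketch of formula (8) with the distance (12) and an inverse-distance weighting scheme; both the weighting and the data are hypothetical and not necessarily the scheme LeoStatistic applies:

        import numpy as np

        # Hypothetical known points: coordinates in 2-dimensional argument space and their values
        points = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [5.0, 4.0]])
        values = np.array([10.0, 12.0, 15.0, 21.0])

        def estimate(probe, points, values, eps=1e-9):
            # Distances d_pi between the probe point and every known point, formula (12)
            d = np.sqrt(((points - probe) ** 2).sum(axis=1))
            # Hypothetical weighting scheme: inverse distance
            w = 1.0 / (d + eps)
            # Weighted average of the known values, formula (8)
            return (w * values).sum() / w.sum()

        probe = np.array([2.5, 2.5])
        print("estimated value at", probe, "=", round(estimate(probe, points, values), 2))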

    Scoring.

    For the task of optimizing a marketing campaign, the common problem is to rank all potential clients by their expected response to a direct advertisement. Since mailing costs money, getting more responses per advertising dollar makes perfect sense. The common approach is to construct an algorithm that calculates and assigns a score to every client, which can, for example, be normalized from 0 to 100% to represent the probability of a positive response.

    LeoStatistic has tools to solve this problem, such as building conditional distributions as well as directly creating a scoring algorithm.
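    One possible scoring scheme, sketched below on hypothetical past-campaign records (an illustration built from conditional response rates, not necessarily the algorithm LeoStatistic creates): the score of a client is the historical response rate of clients in the same age bin, rescaled to 0-100%:

        import numpy as np

        # Hypothetical past campaign: client age and whether the client responded (1/0)
        age      = np.array([22, 25, 31, 34, 38, 42, 47, 51, 55, 63, 67, 70])
        response = np.array([ 0,  0,  0,  1,  0,  1,  1,  1,  0,  1,  1,  0])

        # Response rate within each age bin serves as the score for that bin
        bins = np.array([20, 35, 50, 65, 80])
        bin_index = np.digitize(age, bins) - 1
        scores = np.array([response[bin_index == i].mean() * 100.0
                           for i in range(len(bins) - 1)])

        # Score a new client by the rate of his or her age bin
        new_client_age = 44
        new_bin = np.digitize(new_client_age, bins) - 1
        print("score (%) per age bin:", np.round(scores, 1))
        print("score for age", new_client_age, "=", round(scores[new_bin], 1), "%")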

    Screenshots of the LeoStatistic software:

    Building histograms
    Distribution of two variables
    Approximation (constructor style interface)
    3D view
    DOW trend
    Signals revealing
    Near neighbors method
    Harmonic analysis
    Fit with free format formula
    Curve fit of crystal growth rate
    Get data from image file

    Copyright by LeoKrut