LeoStatistic
software for data presentation, statistical analysis, marketing and prediction.

Multivariate regression.

When the fitting function used to model experimental data has more than one independent argument, we speak of multivariate regression.

LeoStatistic offers linear and parabolic approximation, as well as the near neighbors method, for performing multivariate analysis, along with fitting by a user defined formula.

Assuming the status of the parameters in the data series has been set so that the independent parameters are assigned as arguments and the modeled parameter as the value, go to the "Statistics" tab of the control panel:

Check the corresponding control to select the modeling method used to approximate the data:

Linear multivariate equation:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n \qquad (1)$$

where a_i are the fitted approximation coefficients and x_i is the i-th argument. LeoStatistic also reports the standard deviation of each fitted coefficient.
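The coefficients of equation (1) and their standard deviations can be obtained by ordinary least squares. The following is a minimal sketch in pure Python, assuming the textbook normal-equation solution; it is not taken from LeoStatistic, and the names `invert` and `fit_linear` as well as the sample data are invented for illustration.

```python
import math

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_linear(xs, ys):
    """Fit y = a0 + a1*x1 + ... + an*xn; return (coefficients, standard deviations)."""
    rows = [[1.0] + list(x) for x in xs]          # leading 1.0 carries a0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    inv = invert(xtx)
    coeffs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [y - sum(c * v for c, v in zip(coeffs, r))
                 for r, y in zip(rows, ys)]
    dof = len(rows) - k                            # degrees of freedom
    s2 = sum(e * e for e in residuals) / dof if dof > 0 else float("nan")
    std = [math.sqrt(max(s2 * inv[i][i], 0.0)) for i in range(k)]
    return coeffs, std

# Synthetic data generated from y = 1 + 2*x1 - 3*x2 (illustration only).
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]
coeffs, std = fit_linear(xs, ys)
```

Because the synthetic data is noise-free, the recovered coefficients match the generating ones and the standard deviations are close to zero; with real measurements the standard deviations indicate how well each coefficient is determined.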

Parabolic multivariate equation:

$$y = a_0 + a_1 (x_1 + b_1)^2 + a_2 (x_2 + b_2)^2 + \dots + a_n (x_n + b_n)^2 \qquad (2)$$

where a_i and b_i are the fitted approximation coefficients and x_i is the i-th argument. The coefficients b_i give the position of the extremum along each argument, which lies at x_i = -b_i. It is important to understand that the point (-b_1, -b_2, ... -b_n) is a global extremum in n-dimensional space only if all coefficients a_1 ... a_n have the same sign: a global minimum if they are all positive, a global maximum if they are all negative. If the signs are mixed, the variable y has no global extremum.
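The sign rule above is easy to check mechanically. The helper below is a small illustrative sketch (the name `classify_extremum` and the returned labels are assumptions, not part of LeoStatistic); it takes the coefficients a_1 ... a_n and b_1 ... b_n of equation (2), without a_0.

```python
def classify_extremum(a, b):
    """Return (kind, point) for the parabolic model of equation (2)."""
    point = [-bi for bi in b]            # stationary point at x_i = -b_i
    if any(ai == 0 for ai in a):
        kind = "degenerate"              # y does not depend on some x_i
    elif all(ai > 0 for ai in a):
        kind = "global minimum"
    elif all(ai < 0 for ai in a):
        kind = "global maximum"
    else:
        kind = "no global extremum"      # mixed signs: saddle shape
    return kind, point

kind, point = classify_extremum([2.0, 0.5], [1.0, -3.0])
```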

In the case of two arguments, multivariate approximation with a linear equation is analogous to fitting a plane in three dimensional space, and it is shown as a plane mesh in the result panel. A parabolic approximation is likewise analogous to fitting a surface, shaped either as a paraboloid or as a saddle.

For more than two arguments, LeoStatistic presents multivariate regression visually as an x-y chart with the experimental values along the x-axis and the theoretical values along the y-axis. For an ideal fit by the theoretical formula, all points on the chart have to lie on the 0 - 1 diagonal that is shown on the chart. The larger the dispersion around this diagonal, the worse the approximation.
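The dispersion around the diagonal can also be summarized as a single number. One common choice, used here as an assumption since the page does not say which measure LeoStatistic reports, is the root-mean-square deviation between experimental and theoretical values:

```python
import math

def rms_deviation(observed, predicted):
    """Root-mean-square distance of the points from the ideal diagonal y = x."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(total / len(observed))
```

A value of zero means every point lies exactly on the diagonal; larger values mean a poorer fit.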

Near neighbors estimation.

This method is based on the presumption that we have no advance knowledge about the mutual dependence between the variables. One can assume that, for non-sporadic data, the closer a point in multidimensional space lies to other points, the more reason there is to expect that their values will be approximately the same. Another way to describe the method is to say that the most probable value at a point in n-dimensional space is a weighted average of the values at the closest points around it. The formula for the calculation looks like this:

$$y_p = \frac{\sum_i w(x_{0p} \dots x_{np},\, x_{0i} \dots x_{ni})\, y_i}{\sum_i w(x_{0p} \dots x_{np},\, x_{0i} \dots x_{ni})} \qquad (3)$$

LeoStatistic implements the following schemes for calculating the weight w(x0p...xnp, x0i...xni), written w_pi below:

• inverse distance: $w_{pi} = 1/d_{pi}$ (4)
• inverse square of distance: $w_{pi} = 1/d_{pi}^2$ (5)
• inverse exponent of distance: $w_{pi} = e^{-d_{pi}}$ (6)

The distance d_pi in n-dimensional space between the probe point and the i-th point is calculated by the formula:

$$d_{pi} = \left( \sum_l (x_{lp} - x_{li})^2 \right)^{1/2} \qquad (7)$$

There is also an option to take into consideration only a specified fraction of the closest records.
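Equations (3) to (7) and the option of using only the closest records can be combined into a short routine. This is an illustrative sketch, not LeoStatistic's code: the function and parameter names (`near_neighbors_estimate`, `scheme`, `fraction`) are invented, and the handling of a record that coincides exactly with the probe point is an assumption made to avoid division by zero.

```python
import math

WEIGHTS = {
    "inverse": lambda d: 1.0 / d,                # equation (4)
    "inverse_square": lambda d: 1.0 / (d * d),   # equation (5)
    "exponential": lambda d: math.exp(-d),       # equation (6)
}

def distance(p, q):
    """Euclidean distance between two points, equation (7)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def near_neighbors_estimate(probe, points, values,
                            scheme="inverse", fraction=1.0):
    """Weighted average of the values of the closest records, equation (3)."""
    pairs = sorted(((distance(probe, x), y) for x, y in zip(points, values)),
                   key=lambda t: t[0])
    # Keep only the requested fraction of the closest records (at least one).
    keep = max(1, math.ceil(fraction * len(pairs)))
    pairs = pairs[:keep]
    if scheme != "exponential":
        # Assumption: records at zero distance determine the estimate directly,
        # because weights (4) and (5) are undefined for d = 0.
        exact = [y for d, y in pairs if d == 0.0]
        if exact:
            return sum(exact) / len(exact)
    w = WEIGHTS[scheme]
    numerator = sum(w(d) * y for d, y in pairs)
    denominator = sum(w(d) for d, _ in pairs)
    return numerator / denominator

# Two records at distances 1 and 2 from the probe (illustration only).
points = [(1.0, 0.0), (0.0, 2.0)]
values = [10.0, 20.0]
estimate = near_neighbors_estimate((0.0, 0.0), points, values, "inverse")
```

With inverse-distance weights the nearer record counts twice as much as the farther one, so the estimate lies closer to 10 than to 20.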

In general terms, the closest everyday analog of the near neighbors method is estimating the height of a point from numerous measurements taken all around it. One can presume that it will be close to the average of the heights of the nearby measured points. For a plain-like landscape this is quite reasonable; for mountains it is too, but the measured points have to be much denser.

The visual presentation of the results is the same as for the linear and parabolic regression described above.

User defined formula.

Clicking the "User defined formula" button opens the free format interface. It is almost identical to the one used for curve fitting with a user defined formula of one argument, except that instead of "x" as a placeholder argument name, the fitting formula in the multivariate case has to contain the actual names of the arguments. This adds a natural limitation: only one data set can be used. A typical user interface for a free format formula in the multivariate case is shown in the image:

The user has to enter a fitting formula that contains the names of the arguments and the fitting coefficients. Fitting coefficients can be added, edited and deleted.
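To make the idea concrete, the sketch below writes a formula in terms of named arguments and named fitting coefficients and evaluates the sum of squared deviations that a fit tries to minimize. Everything here is hypothetical: the formula, the argument names `time` and `pressure`, the modeled parameter `rate`, and the coefficient names `A`, `k`, `B` are invented for illustration and do not reproduce LeoStatistic's formula syntax.

```python
import math

def user_formula(row, coeffs):
    """Hypothetical fitting formula: A * exp(-k * time) + B * pressure."""
    return (coeffs["A"] * math.exp(-coeffs["k"] * row["time"])
            + coeffs["B"] * row["pressure"])

def sum_of_squares(formula, records, target, coeffs):
    """Sum of squared deviations between formula values and the target column."""
    return sum((formula(row, coeffs) - row[target]) ** 2 for row in records)

# Synthetic records generated from known coefficients (illustration only).
true_coeffs = {"A": 2.0, "k": 0.5, "B": 0.1}
records = [{"time": t, "pressure": p}
           for t, p in [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]]
for row in records:
    row["rate"] = user_formula(row, true_coeffs)

perfect = sum_of_squares(user_formula, records, "rate", true_coeffs)
worse = sum_of_squares(user_formula, records, "rate",
                       {"A": 1.0, "k": 0.5, "B": 0.1})
```

The deviation is zero for the generating coefficients and grows as the coefficients move away from them, which is the quantity a fitting scheme drives down.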

Clicking the "Run fitting" button starts one of the built-in algorithms for finding the best values of the coefficients, in the sense of the minimum sum of squared deviations between the values calculated by the user formula and the experimental values of the parameter it defines. The following "Fitting schemes" are built into LeoStatistic. All of them are based on the same idea: we search around a given set of fitting coefficients in n-dimensional space, where n is the number of fitting coefficients. As soon as one of the attempts gives a better fit, we take that point as the new vantage point and continue the search. The schemes differ in how the next point to check is chosen:

• Random linear - for each coefficient the next value is chosen uniformly at random in the interval around its present Value plus or minus Step. If the next value falls outside the physical borders (Min - Max) established by the user, it is set to the corresponding border value.
• Random Gauss - the next trial value is drawn around the vantage point from a normal distribution with standard deviation equal to the current Step.
• Slope local - the next point is obtained by adding the current Step, with its sign, to the current Value of every coefficient. As long as the next point gives a better fit, the procedure repeats. If the tried point does not give a better fit, the step of the coefficient whose turn it is has its sign reversed and its absolute value reduced, and the search is tried again from the last best fitting point.
• Leo method - our proprietary algorithm. The choice of the next trial points is based on the history of the search procedure.

Depending on the profile of the deviation(C1,C2...Cn) surface in multidimensional space, any of the presented schemes can have advantages or disadvantages in finding the global minimum, in terms of both the rate of arrival at a local minimum and the probability of settling in the global valley. It is really impossible to offer a general recommendation other than to try all of them with different Fetch settings: Try (fast), Normal, Detail (slow). In general terms, these settings differ in the number of unsuccessful attempts to find the next best vector of fitting coefficients that are made before the steps are shrunk, and in the ratio by which the steps are shrunk.

The fitting procedure stops automatically when the absolute values of all Steps are less than the Stop condition given by the user, or the user can break the process manually.
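The "Random linear" and "Random Gauss" schemes, together with the step shrinking and the Stop condition, can be sketched as a simple loop. This is one reading of the description above, not LeoStatistic's implementation: the function name `random_search`, the parameters `max_failures` and `shrink` (standing in for whatever the Fetch settings control), and their default values are assumptions.

```python
import random

def random_search(objective, start, steps, bounds, stop=1e-6,
                  max_failures=50, shrink=0.5, gauss=False, seed=0):
    """Minimize objective(coefficients) by random trials around the best point."""
    rng = random.Random(seed)
    best = list(start)
    steps = list(steps)
    best_err = objective(best)
    failures = 0
    # Stop once the absolute values of all steps are below the stop condition.
    while max(abs(s) for s in steps) >= stop:
        if gauss:
            # Random Gauss: normal distribution with sigma equal to Step.
            trial = [rng.gauss(v, s) for v, s in zip(best, steps)]
        else:
            # Random linear: uniform in [Value - Step, Value + Step].
            trial = [rng.uniform(v - s, v + s) for v, s in zip(best, steps)]
        # Values outside (Min, Max) are set to the corresponding border.
        trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
        err = objective(trial)
        if err < best_err:
            best, best_err, failures = trial, err, 0   # new vantage point
        else:
            failures += 1
            if failures >= max_failures:               # too many misses:
                steps = [s * shrink for s in steps]    # shrink all steps
                failures = 0
    return best, best_err

def deviation(c):
    """Toy sum-of-squares surface with its minimum at (1, -2)."""
    return (c[0] - 1.0) ** 2 + (c[1] + 2.0) ** 2

best, err = random_search(deviation, start=[0.0, 0.0], steps=[1.0, 1.0],
                          bounds=[(-10.0, 10.0), (-10.0, 10.0)], stop=1e-4)
```

A larger `max_failures` or a `shrink` ratio closer to 1 makes the search slower but more thorough, which is the kind of trade-off the Try, Normal and Detail settings are described as controlling.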

Screenshots of the LeoStatistic software: building histogram; distribution of two variables; approximation (constructor style interface); 3D view; DOW trend; signals revealing; near neighbors method; harmonic analysis; fit with free format formula; curve fit of crystal growth rate; get data from image file.

Copyright © by LeoKrut