Saturday, December 13, 2014

Systematic Error

The problem that we encountered fitting the distribution for the standard deviation is a  systematic error due to the fact that the Theory of Errors does not work out exactly because the estimates of the moments are off. Bessel's correction is needed for the standard deviation and we saw that a second correction was needed for the standard deviation's own distribution function. One can plot the residuals for the original plot, the corrected plot and plot with the standard deviation calculated using its known mean.

The original plot shows evidence of a shift of the standard deviation distribution to left while the correction has greater standard deviations towards the sides and lower standard deviations near the peak as already mentioned. The plot using the known mean of the standard deviation shows a non-uniform distribution with greater fluctuations at the center of the plot. The probability distribution is responsible for reducing the counts on the sides.

Wednesday, December 10, 2014

Error Distribution for the Standard Deviation 2

The two extra degrees of freedom in the standard deviation discussed in the last blog appear to be due to the use of the average of the xk in the datasets for the computation of the observed values. If one uses the known value, μx, instead the observed and calculated curves fit quite well with n as the number of degrees of freedom. The observed values shift to the left when the averages are used to estimate the distribution mean.

Monday, December 8, 2014

The Error Distribution for the Standard Deviation

One can also look more closely at the error distribution for the standard deviation by generating m datasets of n random numbers for a known probability distribution and compute a standard deviation for each dataset.

The computed distribution assuming the standard deviation of the sum is reduced by the square root of n gives a calculated distribution that is slightly offset from the observed distribution.

Using Bessel's correction and assuming two additional degrees of freedom gives a better fit to the observed distribution. The observed distribution appears to have a slightly different shape with a small reduction in the peak value and slightly larger values at the sides sides indicating a broadening of the peak. The sum of the probabilities for histogram intervals in both cases is 1 as expected. One has to use a large number of datasets to notice the differences.

Bessel's Correction

Student published a paper on The Probable Error of a Mean in 1908 in the journal Biometrika. In it he discusses the definition of mean and standard deviation and the distribution of their errors. It is based on Airy's Theory of Errors which contains a discussion of the error for the sum of two random numbers. So what has been presented in these blogs makes use of Airy's approach to the problem. The use of n-1 instead of n in the formula for the standard deviation is known as Bessel's correction.

It is necessary to make two assumptions in order to derive formulas for the mean, μ, and standard deviation, σ. Assuming that a set of random numberss has a mean value and a purely random component imposes a constraint on possible values for them. If we take the average data set, xk, it will be equal to μ plus σ times the average of the z-values. We have two unknowns and the average of the z-values which is subject to some error. The same is true for the mean square of the xk. Both assumptions yield and equation with an unknown sum involving z-values, Eqns (2) and (3) below.

For a normal distribution of errors the expected values of the sum of the z-values and their squares are approximately 0 and n. Making these substitutions we get the usual formulas for the mean and standard deviations. Near the peak the sum of the squares is approximately n-1 instead of n and we get Bessel's correction for the standard deviation.

Using random numbers to check of these two formulas we see that the average for Bessel's correction is closer to chosen value for the standard deviation but its variation is slightly larger.

Friday, December 5, 2014

Least Squares Estimates of the Mean and Standard Deviation and a Sample Calculation

Determining the mean and standard deviation a set of measurements is very difficult to do exactly. One has to make an additional assumption in order to get an estimate of the mean but that adds a little more error to the measurement errors. The assumption used in least squares is that variance V, the sum of the square of the errors, is a minimum. This gives the average as the best estimate of μ. Once we have μ we can then estimate the errors which in turn the variance and standard deviation, σ.

I generated 2000 sets of 20 random numbers to test the formulas used in statistics and the procedure gives the mean, standard deviation and z-values. We also find that the root mean square of the z-values for each set of numbers is exactly 1. Using n-1 in the denominator of the standard deviation formula causes this to deviate slightly. Setting the rms z value equal to 1 is another starting point for the determination of μ and σ.

Each set of numbers will have an mean and standard deviation and there is a little variation among the results for the 2000 data sets but the over-all average is close to the chosen values for μ and σ. The rms variation in the mean is approximately equal to the standard deviation of x divided by the square root of n, the number of values in each data set.

Using the theory of errors from in the last couple of blogs we can calculate the first two moments and standard deviation for the data sets. The σ standard deviation is very close to the μ standard deviation divided by 2 as predicted.

Here is a comparison of the 2000 σ standard deviations with the two estimates of σ.

Wednesday, December 3, 2014

Best Choice for the Expected Error

In the last blog we found two values for the estimate of σ, the expected value, μ1, and the rms error, μ2 = n, and we have to ask which is the best choice. The expected value is the mean that one would get for an estimate of the error but it has some uncertainty, sd, associated with it. If we want to combine these two uncertainties we have to add their squares and the result is the second moment, μ2 = n.

If the sum involves a short string of numbers then we are most likely to random values for z near the peak of the distribution where
zpeak = n-1.

The rms error, s, for the sum is then equal to σ√n-1. We can turn things around to get an estimate of σ for the probability distribution from that of s and we conclude that σ = s/√n-1.

Monday, December 1, 2014

Expected Error For the Sum of a Number of Trials

A scientist like any other witness can be impeached for using poor practices.  So it is important for him to maintain his reputation for accuracy. When one publishes the results of an experiment it is customary to include the error bounds for numerical estimates. Averaging a number of trials gives a better estimate of an observation and involves the sum of a number of individual observations, xk. What error error would one expect for this sum? The formula that one usually uses is that the error for the sum is equal σ√n, where σ is the standard deviation for xk. The following proof is based on the expected values involved. The xk are assumed to be equal to the mean value, μ, plus a random error which can be expressed in terms of z-values.

The square of the difference between the sum and its mean value, nμ, involves a sum of the products of the individual z-values which can be split up into parts where the indices j and k are equal and unequal. The term that is crossed out makes a small contribution to the sum and on average is zero since the two z-values are uncorrelated.

A more sophisticated analysis makes better use of probability theory.  We start by determining the probability distribution for the sum of two and three trials assuming normal distributions for each of the random variables and we need only consider the sum of the z-values, x + y and x + y + z.  The square of the difference between the sum and its mean involves a sum of squares which is also found in the exponential of the joint probability distribution and so we can just consider the distribution for the radial length of a vector whose components are the z-values involved in the sum.

Notice that the area and volume contribute powers of r to the joint probability distribution and for n terms in the sum the contribution of the spatial unit space's differential element will be some constant times rn-1 which can be determined from the condition that the integral of the probability density is 1. Having found the probability function we can determine formulas the 1st and 2nd moments and the standard deviation for r.

To test these formulas we can we can generate two sets of random numbers, create a set of sums and observe the resulting distributions. The formulas give a very good fit for the generated data. The mean value for r (or z below) is slightly less than √n.

For a sum of more numbers we can combine a new set of random numbers with the previous sum. For the sum of three numbers we get:

Continuing the process we get for the sum of 10 numbers:

We got n for the sum of the squares vindicating the previous result. Note that the joint probability functions are similar in shape to the Poisson distribution and may be considered continuous analogs. The standard deviation of the mean values is small and levels off after the sum of about 20 numbers and becomes approximately equal to √2/2.

Supplemental (Dec 2): The definite integral above is evaluated in Reif, Fundamentals of Statistical and Thermal Physics (p. 608). It is also found in Boltzmann, Lectures on Gas Theory (p. 64).

Edit (Dec 3): The differential elements are surface lamina of nD hyperspheres.

Wednesday, November 12, 2014

Using Grid Searches to Optimize a Fit

We were able to estimate the reaction rate constants from functions of three unknown parameters and to improve the fit with a simple search for k0. For a better fit we need to search for all three unknown parameters. We can choose X0, Xeq and k0 as the unknown parameters with X = A - A0 being the amount of product at some particular time. X simplifies the formulas which is the reason why it was chosen. We have estimates for Xeq and k0 determined by the data points and can search the neighborhood of the values for the best fit. So a three dimensional grid search will allow us to improve on our initial estimates. Here is a sample of how one might start out. The first three lines determine the bounds of the search and the next three compute the grid points. X0 was initially assumed to be a small positive number.

I wrote a Mathcad program to do the search for the integers λ, μ and ν which give the lowest value for the variance, V, of A and A'.

Three such searches zoomed in on values very close to the original values.

They give a relative rms error for the fit of about three tenth of one percent. The fit of the "lower left corner" is also much better.

An error propagation formula will help us determine how errors in the grid search parameters will affect the final results. We can estimate how two independent variables, x and y, affect a third dependent variable, z, starting with the chain rule. If dx and dy represent small errors then the chain rule gives us a good estimate of dz. The formula can be modified to give the rms error of dz but we need to use an inequality to express it in terms of the rms errors of dx and dy. One can choose the right hand side of the last equation to represent a bound for the rms error of dz. One usually sees this propagation of error formula written as an equality.

The formula for the propagation of error can be used to give the expected relative errors of the computed parameters for the fit based on the relative errors of the fit variables. The matrix Λ was found by applying the propagation of error formula to each of the computed variables and evaluating it using estimated values. The differences between the fit values and original values can be used to get approximations for the relative errors of the fit variables. The bottom left shows the expected bounds and can be thought of as standard deviations. The bottom right shows the z-scores for the the actual relative errors.

The grid search we used isn't optimal but it allows us to improve on the original estimate which a plot showed to be slightly off. The final result is what can be expected from a grid search.

Supplemental (Nov 12): The program p_min(V) puts the rows of a table containing the pointer p and corresponding V values in the order of the V values. The resulting location for the minimum value of V is the first p value on the new table.

Saturday, November 8, 2014

Fit For When Two Reactants are Consumed

The fit for problem worked in the blog Increasing the Rate of Dissociation by Adding Acid is similar to that in the last blog. It's what one would have to do for the case where the two reactants are consumed in the reaction. There was a minor complication with the complex κ.

The estimate value for κ turned out to be close to the complex conjugate of the original κ. For z = x + i y the complex conjugate is z* = x - i y. It turns out that tanh(κ*) = tanh(κ) when the result is real. There is another formula for the tanh of a complex number involving ordinary and hyperbolic sines and cosines that helps explain the situation. Either the cosine or the sine of the imaginary part in the result has to be equal to zero.

Some data sets are more difficult to fit than others. One has to play with the numbers of terms in the empirical fit for best results. The empirical fit for the initial rate of change might be good but that doesn't guarantee that the fit for the formula will be good. The rounded corner at the lower left of the curve below seems to be hardest to get right. The procedure is sensitive to random errors in the first few data points so some method of adjusting errors might prove useful.

Supplemental (Nov 8): A weakness of the fit procedure is that an error in k'0 propagates through as one calculates all the other fit constants so if one searches the neighborhood of k'0 for the value which gives a minimum for the variance of the difference between A' and A then one will get consistently good fits to the data points. The resulting fit is a least squares fit of the data.

Friday, November 7, 2014

Fitting a Simple Catalyzed Reaction

In a catalyzed reaction the catalyst is not used up in the reaction. With the dissociation of sucrose in an acid solution water takes part in the reaction but the quantity of acid and its ions remain constant. The quantity of hydrogen ions in solution is maintained by the dissociation of water. We can take this into account and it simplifies the equations for the constants in the formula for the amount of sucrose as a function of time. We proceed as we did before determining the coefficients for the quadratic equation for the reaction rate but let B remain unchanged. Integrating the reaction rate equation again produces a formula with a hyperbolic tangent function but the constants in it are slightly different due to changes in relative magnitude. We can assume some initial conditions for the sake of argument.

We then generate some random data to work with and allow sufficient time for equilibrium to be reached.

If one checks the initial reaction rate one gets dA/dt = −k0B0A which agrees with the formula that Wilhelmy used so the initial solution is approximately an exponential decay function. To make the change easier to fit we can use the natural log of A instead of A and fit a polynomial equation. F(t) below is a vector function containing the powers of t and is needed by linfit to find the coefficients of the polynomial. We can estimate the initial value for A by averaging the values of A divided by powers of e with x values corresponding to the data points. A check of the standard deviation shows that the rms error is about one unit.

Once we have the fit we can draw a smooth curve through the data points using the empirical power law.

By taking the derivative of the fit polynomial at t = 0 we get the initial rate constant and this gives us an estimate of the first rate constant, k0, on division by B0. To find the second rate constant, k1, we need an estimate of the equilibrium value for A and the equilibrium condition k0B0Aeq = k1(A0 - Aeq)2 which states that the amount of product that is produced is canceled by the reverse reaction.

With the estimates for the rate constants we can now estimate the constants needed to fit the formula to the data. Using an auxiliary constant γ makes it easier to determine an estimate for κ. The fit agrees fairly well with the initial assumptions.

Tuesday, November 4, 2014

Increasing the Rate of Dissociation by Adding Acid

Sucrose is a disaccharide produced by the hydrolysis of glucose and fructose molecules in which an hydroxyl group (-OH) in each sugar molecule join producing an oxygen bridge between them by releasing an H2O water molecule. One can see the oxygen atom linking the hexagonal ring (glucose) to the pentagonal ring (fructose) in the .gif below.

animation credit: Wikipedia article Sucrose

Dissociating a disaccharide molecule requires a hydration reaction to replace the missing atoms of the simple sugars. There are always hydrogen and hydroxyl radicals normally present in a sugar solution which can react with the sucrose molecules but adding an acid as Wilhelmy did acts as a catalyst to increase the reaction rate.

To study the effect of adding the acid to the solution we need to modify the equation for the reaction and deduce the corresponding reaction rate equation. In the reaction below we can let A correspond to the sucrose molecule, B to the hydrogen molecules present in the solution resulting from the addition of acid and C and D to the glucose and fructose products. We can also let X represent the progress of the reaction in order to determine the amount of sucrose and other molecules present at any given time. The acid molecules easily dissociate so the amount of hydrogen present is essentially equal to the concentration of acid. Plugging the modified amounts into the reaction rate formula again reduces it to a first order differential equation involving the sucrose concentration [A].

The reaction again follows a binomial rate law and we can simplify it by "completing the square" and rewriting the equation using the auxiliary parameters u and v.

The result is a similar formula involving the hyperbolic tangent function but here λ is replaced by βc so the net effect is that the reaction rate constants are multiplied by the factor β increasing the reaction rate considerably. We can check our math by working an example. From the positive polynomial coefficients a, b and c in the rate equation we can compute the constant parameters α and β and κ. Here κ is complex but that isn't a problem since we can use tan(iθ) = i tanh(θ) with a complex θ. The complex κ is easily handled by Mathcad. The complex value is the results from ([A]0 + α)/β being greater than 1.

The results from an iterative evaluation of the reaction rate formula and those from the derived formula produce identical curves for the reaction, justifying the use of the derived formula. Setting the reaction rate polynomial to zero allows us to compute the equilibrium sucrose concentration Aeq which agrees with the result of the numerical computation.

Supplemental (Nov 4): Try k0=1×10-4, k1=1.234567901234567×10-6

Suppemental (Nov 5): For the hyperbolic tangent of a complex number one can use the following identity which is similar to the identity for tanh(u+v).

Even though κ is complex, tanh(κ) is real and consequentially, A is real.

Note also that since tanh(κ + βct) > 0, the new formula for A subtracts 1 from the tanh instead of subtracting the tanh from 1.