## 8. DATA ANALYSIS
The graphical analysis of data, described in chapter 7, is most useful for communicating results in reports and for gaining an intuitive understanding of phenomena. When accurate results are required, however, analytic methods are preferred over graphical ones. This chapter introduces some of these methods; advanced texts may be consulted for additional details.

The purpose of data analysis is to use all of the data to calculate one or more results. This is usually done by averaging large amounts of data. The averaging method must be chosen carefully so that it actually uses all of the data in a consistent way. Some averaging methods that "look good" on superficial analysis may actually cancel out some of the data, or may emphasize the least accurate data. The methods described below avoid such pitfalls and give results that are the best obtainable from the data used. Implicit in all of these methods is the assumption that the individual data values have Gaussian distributions. If there is good reason to believe that the distributions are not Gaussian, modified methods are required.

The student should not treat data analysis as something that can be "left for later," to be ignored until the laboratory work is finished. Good experimental strategy requires that the experiment be "thought through" even before data is taken, so that the data-taking procedure will produce sufficient data, of adequate quality and sufficient range, for the intended method of analysis. Thus concern with the methods of data analysis permeates the entire experimental process, from experimental design, through data collection, to the final calculations. Data analysis (which includes error analysis) can show which quantities must be measured most precisely. It can also show that some experimental designs are unsuitable for good measurement of certain quantities, suggesting a search for better designs.
The student who leaves the analysis to be done "later" may spend several hours in lab taking data totally unsuited to calculation of an accurate result. The student who plans a strategy in advance, knowing what must be done to obtain the desired accuracy, will spend lab time more efficiently, and obtain better results.
The simplest curve fitting problem is that of fitting a straight line to a set of data, as illustrated in Fig. 7.4. The problem is to find the slope of the line and its x and y intercepts. Two simpler cases frequently occur: (1) the line may be known to be horizontal, and only the y intercept is required; (2) the intercepts may be of no interest, and only the slope needs to be calculated. The methods for fitting linear relations are of great importance because nonlinear problems may often be reduced to linear ones by an appropriate change of variable.
Thus a relation of the form Y = bX^n becomes linear when logarithms are taken of both sides:

    log Y = log b + n log X

so a plot of log Y against log X is a straight line with slope n and intercept log b.
The simple average of n measured values Q_i is

    <Q> = (1/n) Σ Q_i

This method gives equal weight to each Q_i. When the individual values have different reliability, a weighted average should be used instead:

    <Q> = ( Σ W_i Q_i ) / ( Σ W_i )

The summations are from i=1 to n. In an elementary treatment of errors, the weighting factor W_i is taken to be the inverse square of the standard deviation of Q_i, that is, W_i = 1/s_i².
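The weighted average can be sketched in a few lines of code. This is a minimal illustration, not code from the text; the function name and sample data are my own, and each measurement is assumed to have a known standard deviation.

```python
# Weighted average of measurements of differing precision.
# Each weight is the inverse square of that measurement's standard
# deviation, so more precise values count more heavily.

def weighted_average(values, sigmas):
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * q for w, q in zip(weights, values)) / sum(weights)

# Three measurements of the same quantity, with different precision:
q = [9.78, 9.83, 9.80]
s = [0.05, 0.02, 0.10]
print(weighted_average(q, s))  # about 9.822, dominated by the middle value
```

Notice that the result lies closest to 9.83, the value with the smallest standard deviation, as the weighting is designed to ensure.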
Consider, for example, a case where the data points are equally spaced along the x-axis, with spacing L, and the y values are a, b, c, d, e, f, g, and h. The slope in the first interval is given by (b-a)/L, and the average slope is

    slope = (1/7) [ (b-a)/L + (c-b)/L + (d-c)/L + (e-d)/L + (f-e)/L + (g-f)/L + (h-g)/L ]

But notice that the intermediate data points in the numerator cancel out and the equation reduces to

    slope = (h-a)/(7L)

Only the first and last data points contributed to the average. The result is merely the slope of the line between points a and h. This probably is not the best use of the data, for the intermediate readings are wasted. The method of successive differences avoids this waste. Divide the data into two halves and pair each point in the first half with the corresponding point in the second half. Each pair spans the same interval, 4L, so the average slope is

    slope = [ (e-a) + (f-b) + (g-c) + (h-d) ] / (4)(4L)

The intermediate readings do not cancel out of this equation. In general,

    slope = (4/(n²L)) Σ ( y_(i+n/2) - y_i )    summed from i=1 to n/2

where n is the number of data points y_i (n even) and L is their spacing along the x-axis.
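The method of successive differences can be sketched as a short program. This is a minimal illustration rather than code from the text; the function name and the sample data are my own.

```python
# Method of successive differences: pair each point in the first half
# of the data with the corresponding point in the second half, so every
# reading contributes to the slope estimate.

def successive_differences_slope(y, spacing):
    n = len(y)        # number of data points; must be even
    half = n // 2
    diffs = [y[i + half] - y[i] for i in range(half)]
    # Each difference spans (n/2)*spacing along the x-axis, and there
    # are n/2 differences to average, giving the factor 4/(n^2 * L).
    return sum(diffs) / (half * half * spacing)

# Eight equally spaced readings with spacing L = 1, lying on y = 2x:
y = [0, 2, 4, 6, 8, 10, 12, 14]
print(successive_differences_slope(y, 1.0))  # → 2.0
```

For these exact data any sensible method gives the same slope; the advantage of successive differences shows up when the readings are noisy, since no reading is cancelled out.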
The reader may see in the formulation of this rule a hint as to why it is intimately connected with the standard deviation as a measure of error. It might seem that the application of the rule to curve fitting would be difficult, if not impossible, for there are an infinity of possible curves to test! But one usually has a good idea in advance whether the best curve should be straight, parabolic, exponential or whatever, so all that remains is to determine its parameters. It is worth remarking that if there are n parameters to determine, there must be at least n data points--preferably quite a few more than n to get a better fit. Furthermore, the methods of calculus allow the derivation of standard formulae for the parameters. We now state these formulae without proof, for the straight line case. Let the data points be (x_i, y_i), and let the fitted line be y = mx + b. The slope is

    m = [ n Σ x_i y_i - (Σ x_i)(Σ y_i) ] / [ n Σ x_i² - (Σ x_i)² ]    (8-7)

the summations being over i from 1 to n. The y intercept is given by

    b = [ (Σ y_i)(Σ x_i²) - (Σ x_i)(Σ x_i y_i) ] / [ n Σ x_i² - (Σ x_i)² ]    (8-8)
Notice that the denominators are the same in Eqs. 8-7 and 8-8; they need only be calculated once. The standard deviations of the slope and the intercept may also be found. Let s_y be the standard deviation of the y values about the fitted line,

    s_y² = Σ ( y_i - m x_i - b )² / (n - 2)

Then the standard deviation of the y intercept is

    s_b = s_y [ Σ x_i² / ( n Σ x_i² - (Σ x_i)² ) ]^(1/2)    (8-9)

The standard deviation of the slope is given by

    s_m = s_y [ n / ( n Σ x_i² - (Σ x_i)² ) ]^(1/2)    (8-10)
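Equations 8-7 through 8-10 can be collected into a short program. This is a sketch in Python rather than a definitive implementation; the function name is my own, and the sample data are those of problem 8.2.

```python
import math

# Least squares fit of a straight line y = m*x + b, returning the slope,
# the y intercept, and their standard deviations (Eqs. 8-7 to 8-10).

def least_squares_line(x, y):
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx              # common denominator of 8-7 and 8-8
    m = (n * sxy - sx * sy) / d        # slope, Eq. 8-7
    b = (sy * sxx - sx * sxy) / d      # y intercept, Eq. 8-8
    # Scatter of the data about the fitted line:
    sy2 = sum((yi - m * xi - b) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    sm = math.sqrt(n * sy2 / d)        # std. dev. of slope, Eq. 8-10
    sb = math.sqrt(sxx * sy2 / d)      # std. dev. of intercept, Eq. 8-9
    return m, b, sm, sb

x = [12, 13, 14, 15, 16, 17, 18, 19, 20]
y = [4.5, 10.0, 19.0, 25.5, 37.0, 44.0, 49.0, 53.0, 61.5]
m, b, sm, sb = least_squares_line(x, y)
print(m, b, sm, sb)  # slope about 7.26
```

Note that the common denominator d is computed once and reused, exactly as the remark about Eqs. 8-7 and 8-8 suggests.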
If the origin of the x values is chosen so that Σ x_i = 0, the least squares curve fit equations become much simpler:

    m = ( Σ x_i y_i ) / ( Σ x_i² )        b = ( Σ y_i ) / n
(8.1) Write a compact formula for the successive differences method, using the summation symbol, and compare it with the formula for the weighted successive differences method.

(8.2) An experiment gives the data:

| X  | Y    |
|----|------|
| 12 | 4.5  |
| 13 | 10.0 |
| 14 | 19.0 |
| 15 | 25.5 |
| 16 | 37.0 |
| 17 | 44.0 |
| 18 | 49.0 |
| 19 | 53.0 |
| 20 | 61.5 |

Find the slope of this straight line by the method of differences. An electronic calculator or computer is very desirable for these lengthy calculations.

(8.3) Use the least squares formulae on the data of problem 2 to find all parameters: slope, y intercept, and their standard deviations.

(8.4) Write and execute a BASIC or FORTRAN or PASCAL computer program to do the calculations of problem 3. Try to make the input routine general enough so you could use the program on any size set of data you might obtain in lab.
Another approach is to transform the relation by an appropriate change of variable, so that it takes the form of a linear relation. This, in effect, straightens out the curve. The procedure is often used in graphical curve fitting, by plotting the data on special graph paper with nonlinear scales, such as log, log-log, polar, or other types of graph paper, as described in chapter 7. Carrying the graph analogy a bit further, note that if the original curve had error bars, they too will transform when the curve is "straightened out," and this will change the weighting factors.

Suppose, for example, that the relation has the form A = B C^(2q). Taking logarithms of both sides gives

    log A = log B + 2q log C

This is of the form y = mx + b if we use the transformation relations

    y = log A        b = log B
    x = log C        m = 2q        (8-17)

This can be fitted by equations (8-7) through (8-10) if C is an independent variable of negligible error, and the error is all in the variable A. But if the standard deviations of the A_i are known, they must be transformed as well: since y = log A, the standard deviation of y_i is s(A_i)/(2.303 A_i).
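The change of variable can be sketched in code. This is a minimal illustration with made-up exact data, not code from the text; the function name is my own, and all the error is assumed to be in A.

```python
import math

# Fit A = B * C**(2q) by straightening the curve:
# log A = log B + 2q log C, i.e. y = m*x + b with
# y = log A, x = log C, m = 2q, b = log B.

def fit_power_law(C, A):
    x = [math.log10(c) for c in C]
    y = [math.log10(a) for a in A]
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx
    m = (n * sxy - sx * sy) / d        # slope of the straightened line
    b = (sy * sxx - sx * sxy) / d      # intercept of the straightened line
    # Transform back: q = m/2 and B = 10**b.
    return m / 2.0, 10.0 ** b

# Exact data generated from A = 3 * C**4, i.e. B = 3 and q = 2:
C = [1.0, 2.0, 3.0, 4.0]
A = [3.0 * c**4 for c in C]
q, B = fit_power_law(C, A)
print(q, B)  # → 2.0 and 3.0, up to rounding
```

Because the sample data are exact, the fit recovers B and q essentially perfectly; with real data the standard deviations of Eqs. 8-9 and 8-10 would indicate the uncertainty in the straightened line's parameters.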
The analysis now proceeds as for a straight line fit, and values of m and b are determined. Transforming back by Eqs. (8-17) gives values for B and q.

© 1996, 2004 by Donald E. Simanek.