Normality in Data (Chemists and Pharmacists) - The Economist

Thursday, 27 April 2017

WHAT IS MEANT BY "NORMAL"?

The Normal distribution model. "Normal" data are data drawn from a population that has a normal distribution. This distribution is arguably the most important and the most frequently used distribution in both the theory and application of statistics.
The normal distribution is symmetric and unimodal. Because of its shape, it is often called the bell curve.
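
As a small illustration of the symmetric, unimodal shape, the sketch below evaluates the density of a normal distribution at a few points using Python's standard-library statistics.NormalDist. The mean of 0 and standard deviation of 1 are assumptions chosen only for the example.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1): an illustrative choice.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Evaluate the density at points placed symmetrically around the mean.
points = [-3, -2, -1, 0, 1, 2, 3]
densities = [standard_normal.pdf(x) for x in points]

for x, d in zip(points, densities):
    print(f"x = {x:+d}  density = {d:.4f}")
```

The printed densities rise to a single peak at the mean and are equal at matching distances on either side of it, which is what "unimodal" and "symmetric" mean here.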


WHY NORMALITY?

The normality assumption is one of the most misunderstood in all of statistics. In multiple regression, the assumption requiring a normal distribution applies only to the disturbance term, not to the independent variables as is often believed. Perhaps the confusion about this assumption derives from difficulty understanding what the disturbance term refers to: simply put, it is the random error in the relationship between the independent variables and the dependent variable in a regression model. Each case in the sample has its own random disturbance, which encompasses all the “noise” that accounts for the difference between the observed value and the value predicted by the regression equation. It is the distribution of this disturbance term across all cases in the sample that should be normal.

There are few consequences associated with a violation of the normality assumption, as it does not contribute to bias or inefficiency in regression models. It is only important for the calculation of p values for significance testing, and even then only when the sample size is very small. When the sample size is sufficiently large (>200), the normality assumption is not needed at all, because the Central Limit Theorem ensures that the sampling distribution of the coefficient estimates will approximate normality even when the disturbance term itself is not normal.
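
The large-sample argument can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the true slope of 2.0, the sample size of 200, the 500 replicates, and the exponential error distribution are all hypothetical choices, not values from any real study. It draws strongly skewed errors, fits a one-predictor least-squares line each time, and compares the shape of the errors with the shape of the slope estimates.

```python
import random
from statistics import mean, pstdev

def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def standardized_moment(values, k):
    """k-th standardized moment: k=3 is skewness, k=4 is kurtosis (3 for a normal)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** k for v in values) / len(values)

rng = random.Random(42)      # fixed seed so the run is repeatable
TRUE_SLOPE = 2.0             # hypothetical value used only for the simulation
n, replicates = 200, 500
x = [i / (n - 1) for i in range(n)]

all_errors, slopes = [], []
for _ in range(replicates):
    # Exponential errors shifted to mean 0: strongly right-skewed, not normal.
    errors = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
    y = [1.0 + TRUE_SLOPE * xi + e for xi, e in zip(x, errors)]
    slopes.append(ols_slope(x, y))
    all_errors.extend(errors)

print("errors: skewness", round(standardized_moment(all_errors, 3), 2),
      " excess kurtosis", round(standardized_moment(all_errors, 4) - 3, 2))
print("slopes: skewness", round(standardized_moment(slopes, 3), 2),
      " excess kurtosis", round(standardized_moment(slopes, 4) - 3, 2))
print("mean of slope estimates:", round(mean(slopes), 2))
```

In a run like this the errors should show large skewness and excess kurtosis, while the slope estimates should be centered near the true slope with both measures close to zero.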
When dealing with very small samples, it is important to check for a possible violation of the normality assumption. This can be accomplished through an inspection of the residuals from the regression model (some programs will perform this automatically, while others require that you save the residuals as a new variable and examine them using summary statistics and histograms). There are several statistics available to examine the normality of variables, including skewness and kurtosis, as well as numerous graphical depictions, such as the normal probability plot. Unfortunately, these statistics are unstable in small samples, so their results should be interpreted with caution. When the distribution of the disturbance term is found to deviate from normality, a simple remedy is to use a more conservative significance level (.01 rather than .05) when conducting significance tests and constructing confidence intervals.
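
The residual check described above can be sketched in a few lines. This is a minimal illustration for a single predictor, not the procedure of any particular statistics package: the function names are hypothetical, the twelve data points are made up, and the skewness and kurtosis formulas are the simple moment-based versions.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

def residuals(x, y):
    """Observed value minus the value predicted by the fitted line."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric set of values."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3; 0 for a normal distribution."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3.0

# Hypothetical small sample of 12 cases; the numbers are made up for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3, 13.8, 16.2, 18.1, 19.7, 22.4, 23.9]

res = residuals(x, y)
print("residual skewness:       ", round(skewness(res), 3))
print("residual excess kurtosis:", round(excess_kurtosis(res), 3))
```

Values near zero are consistent with normal residuals, but with only a dozen cases both statistics can swing widely from sample to sample, which is the instability noted above.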

WHY NORMALITY FOR MEDICAL-RELATED DATA?

Perhaps the best way to explain my position is to draw an analogy with medical practice. A patient presents with an earache and a low-grade fever. It's an obvious inner ear infection, and surely antibiotics are the cure. No matter how obvious the cure, no doctor in their right mind would prescribe drugs without an examination, including a medical history. Why?
The same is true of process improvement. Ninety-nine times out of 100, you may fix the special cause as a byproduct of your other efforts, by accident, so to speak. Finding and removing that special cause guards against the 1 time out of 100 when your efforts and the company's resources would otherwise be wasted. Your career may depend on taking that simple step. To skip it seems irresponsibly reckless to me.

Note For Chemists and Pharmacists:
In chemistry, normality is a measure of concentration equal to the number of gram equivalents of solute per liter of solution. The gram equivalent weight is a measure of the reactive capacity of a molecule. The solute's role in the reaction determines the solution's normality. Normality is also known as the equivalent concentration of a solution.

NORMALITY EQUATION:

Normality (N) is the molar concentration ci divided by an equivalence factor feq:
N = ci / feq
Another common equation is normality (N) equal to the number of gram equivalents of solute divided by liters of solution:
N = number of gram equivalents / liters of solution (expressed in eq/L)
or it may be the molarity multiplied by the number of equivalents per mole of solute:
N = molarity x equivalents
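
The three forms of the equation can be written as small helper functions. The sketch below is only an illustration: the function names are hypothetical, and the 0.5 M diprotic acid used in the usage lines is a made-up example.

```python
def normality_from_equivalence_factor(molarity, f_eq):
    """N = ci / feq, with ci the molar concentration in mol/L."""
    return molarity / f_eq

def normality_from_equivalents(gram_equivalents, liters_of_solution):
    """N = number of gram equivalents / liters of solution."""
    return gram_equivalents / liters_of_solution

def normality_from_molarity(molarity, equivalents_per_mole):
    """N = molarity x equivalents."""
    return molarity * equivalents_per_mole

# Hypothetical 0.5 M solution of an acid that supplies 2 H+ per molecule,
# so feq = 1/2 and there are 2 equivalents per mole.
print(normality_from_equivalence_factor(0.5, 0.5))   # 1.0
print(normality_from_molarity(0.5, 2))               # 1.0
# The same solution described as 2 gram equivalents dissolved in 2 liters.
print(normality_from_equivalents(2.0, 2.0))          # 1.0
```

All three routes describe the same quantity, so they agree whenever the equivalence factor and the number of equivalents refer to the same reaction.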


UNITS OF NORMALITY
The capital letter N is used to indicate concentration in terms of normality. It may also be expressed as eq/L (equivalents per liter) or meq/L (milliequivalents per liter, where 1 meq/L = 0.001 N; typically reserved for medical reporting).



EXAMPLES OF NORMALITY
For acid reactions, a 1 M H2SO4 solution will have a normality (N) of 2 N because 2 moles of H+ ions are present per liter of solution.

For sulfate precipitation reactions, where the SO4^2- ion is the important part, the same 1 M H2SO4 solution will have a normality of 1 N.

EXAMPLE PROBLEM:
Find the normality of 0.1 M H2SO4 (sulfuric acid) for the reaction:
H2SO4 + 2 NaOH → Na2SO4 + 2 H2O
According to the equation, each mole of sulfuric acid supplies 2 moles of H+ ions (2 equivalents), which react with sodium hydroxide (NaOH) to form sodium sulfate (Na2SO4) and water. Using the equation:
N = molarity x equivalents
N = 0.1 x 2
N = 0.2 N
Don't be confused by the number of moles of sodium hydroxide and water in the equation.
Since you've been given the molarity of the acid, you don't need the additional information. All you need to figure out is how many moles of hydrogen ions are participating in the reaction. In this neutralization, both hydrogen ions of each sulfuric acid molecule react with the sodium hydroxide.
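
The same answer can be cross-checked through the mass-based form of the equation. The sketch below assumes a molar mass of about 98.08 g/mol for H2SO4, a value that is not given in the text; the variable names are chosen only for illustration.

```python
# Assumed value, not stated in the text: approximate molar mass of H2SO4.
MOLAR_MASS_H2SO4 = 98.08      # g/mol
EQUIVALENTS_PER_MOLE = 2      # 2 H+ per H2SO4 in this neutralization

molarity = 0.1                # mol/L, from the problem statement

# Route 1: N = molarity x equivalents
normality_by_molarity = molarity * EQUIVALENTS_PER_MOLE

# Route 2: grams of acid per liter divided by the gram equivalent weight
grams_per_liter = molarity * MOLAR_MASS_H2SO4                 # g/L
equivalent_weight = MOLAR_MASS_H2SO4 / EQUIVALENTS_PER_MOLE   # g/eq
normality_by_mass = grams_per_liter / equivalent_weight       # eq/L

print(round(normality_by_molarity, 3), round(normality_by_mass, 3))
```

Both routes give 0.2 N, because the molar mass cancels out of the second calculation.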

POTENTIAL ISSUES USING N FOR CONCENTRATION
Although normality is a useful unit of concentration, it can't be used in all situations because its value depends on an equivalence factor that can change based on the type of chemical reaction of interest. As an example, a solution of magnesium chloride (MgCl2) may be 1 N for the Mg2+ ion, yet 2 N for the Cl- ion. While N is a good unit to know, it's not used as much as molarity or molality in actual lab work. It has value for acid-base titrations, precipitation reactions, and redox reactions. In acid-base reactions and precipitation reactions, 1/feq is an integer value. In redox reactions, 1/feq may be a fraction.

"HOW DO I TEST MY DATA FOR NORMALITY?"

Many statistical tests and procedures assume that data follows a normal distribution (figure 1).
Figure 1: Histogram depicting a normal (bell-shaped) distribution in WinSPC

For example, all of the following statistical tests, statistics, or methods assume that data is normally distributed:

•    Hypothesis tests such as t tests, Chi-Square tests, F tests
•    Analysis of Variance (ANOVA)
•    Least Squares Regression
•    Control Charts of Individuals with 3-sigma limits
•    Common formulas for process capability indices such as Cp and Cpk

Before applying statistical methods that assume normality, it is necessary to perform a normality test on the data (with some of the above methods we check residuals for normality).  We hypothesize that our data follows a normal distribution, and only reject this hypothesis if we have strong evidence to the contrary.

While it may be tempting to judge the normality of the data by simply creating a histogram of the data, this is not an objective method to test for normality – especially with sample sizes that are not very large.  With small sample sizes, discerning the shape of the histogram is difficult.  Furthermore, the shape of the histogram can change significantly by simply changing the interval width of the histogram bars.
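
The sensitivity to interval width is easy to see with a text histogram. The sketch below is only an illustration: the sample of 20 values is generated from a hypothetical normal population with mean 50 and standard deviation 5, and the three bin widths are arbitrary choices.

```python
import random
from collections import Counter

def histogram_counts(data, bin_width):
    """Count observations per bin; bin k covers [k * width, (k + 1) * width)."""
    counts = Counter(int(value // bin_width) for value in data)
    return dict(sorted(counts.items()))

def print_histogram(data, bin_width):
    """Print one row of '#' marks per non-empty bin."""
    for k, count in histogram_counts(data, bin_width).items():
        low, high = k * bin_width, (k + 1) * bin_width
        print(f"{low:6.1f} to {high:6.1f} | {'#' * count}")

rng = random.Random(1)                            # fixed seed for repeatability
sample = [rng.gauss(50, 5) for _ in range(20)]    # hypothetical small sample

for width in (2.0, 5.0, 10.0):
    print(f"bin width = {width}")
    print_histogram(sample, width)
    print()
```

The same 20 values can look ragged with narrow bins and smooth with wide ones, which is why the shape of a small-sample histogram is weak evidence either way.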

Normal probability plotting may be used to objectively assess whether data comes from a normal distribution, even with small sample sizes. On a normal probability plot, data that follows a normal distribution will appear linear (a straight line). For example, a random sample of 30 data points from a normal distribution results in the first normal probability plot (Figure 2). Here, the data points fall close to the straight line. The second normal probability plot (Figure 3) illustrates data that does not come from a normal distribution.
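
The coordinates of a normal probability plot can be computed without any plotting software. The sketch below is a minimal illustration under stated assumptions: it uses the plotting positions (i - 0.5) / n, which is one common convention among several, and it summarizes straightness with the correlation between the two axes, a numeric summary added here for illustration. The function names and both samples are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each theoretical standard normal quantile with the sorted data value."""
    n = len(data)
    ordered = sorted(data)
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

def plot_correlation(data):
    """Correlation between the plot's two axes; values near 1 mean a straight line."""
    points = normal_plot_points(data)
    qs = [q for q, _ in points]
    vs = [v for _, v in points]
    q_bar, v_bar = mean(qs), mean(vs)
    sqv = sum((q - q_bar) * (v - v_bar) for q, v in points)
    sqq = sum((q - q_bar) ** 2 for q in qs)
    svv = sum((v - v_bar) ** 2 for v in vs)
    return sqv / math.sqrt(sqq * svv)

rng = random.Random(7)                                       # fixed seed
normal_sample = [rng.gauss(100, 15) for _ in range(30)]      # hypothetical data
skewed_sample = [rng.expovariate(1.0) for _ in range(30)]    # hypothetical data

print("normal sample correlation:", round(plot_correlation(normal_sample), 3))
print("skewed sample correlation:", round(plot_correlation(skewed_sample), 3))
```

Plotting the pairs returned by normal_plot_points gives the probability plot itself. A correlation close to 1 corresponds to points hugging the straight line, while skewed data typically bends away from the line and gives a lower value.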














