Wednesday, January 30, 2008

Validity & Statistics

Important Characteristics of Measures
• Validity
• Reliability
• Objectivity
• Usability

Validity vs. Reliability

Validity = appropriateness, correctness, meaningfulness, and usefulness of the inferences researchers make from the data collected with the instruments used in a study
Reliability = consistency of the scores obtained, across individuals, administrators, and sets of items

Relationship Between Reliability and Validity
Suppose I have a faulty measuring tape and I use it to measure each student’s height. It gives the same (wrong) reading every time: my tool is invalid, but it’s still reliable.
On the other hand, if I have a correctly printed measuring tape, my tool is both valid and reliable.

Something can be valid & reliable.
Something can be invalid but reliable.
But if something is unreliable, it is always invalid.

Types of Validity
Content Validity - the measure adequately covers the content domain it is intended to measure
Criterion Validity - scores on the measure relate to scores on some external criterion
Predictive Validity - ability of the measure to predict future performance
Concurrent Validity - scores on the measure relate to scores on a criterion measured at the same time
• Convergent vs. Discriminant Validity
Convergent Validity - trying to show that one measure is showing the same thing as another measure
Discriminant Validity - showing that one measure is actually showing something quite different than another measure
• Construct Validity - the measure actually reflects the theoretical construct it is intended to measure
• Internal Validity - How well is your study designed?

Threats to Internal Validity:
Subject characteristics
Mortality threat (attrition)
Location
Instrumentation
Data Collectors
Testing
History
Maturation
Attitude of subjects
Regression threat
Implementation

Ways That Threats to Internal Validity Can be Minimized:
a. Standardized study conditions - The "Bus Test" - If you walked out the door and got hit by a bus, someone else could pick up right where you left off with your research.
b. Obtain more information on individuals in the sample
c. Obtain more information about details of study
d. Choice of appropriate design

Reliability Checks
Test-Retest (aka Stability) - the same test given to the same individuals at two different times yields consistent scores
Equivalent Forms - Multiple forms of the same test - If one individual takes both forms of the test, the scores should be highly correlated
Internal Consistency
Split-half - compare 1/2 of the items on the test to the other 1/2 to ensure that all items on the test are reliable - NEVER compare the 1st half to the last half of the items because fatigue or not completing the test can greatly affect the answers on the 2nd half; instead you could compare odd #s to even #s
–Kuder–Richardson (KR-20)
–Cronbach’s Alpha
• Inter-Rater (Agreement)
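The internal-consistency checks above can be sketched in Python with only the standard library. The 5-person × 6-item score matrix below is made-up data for illustration:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# rows = test takers, columns = items (scored 0/1)
scores = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

# Split-half: compare odd-numbered items to even-numbered items
# (never first half vs. last half), then step the half-test
# correlation up to full length with the Spearman-Brown formula.
odd  = [sum(row[i] for i in range(0, 6, 2)) for row in scores]
even = [sum(row[i] for i in range(1, 6, 2)) for row in scores]
r_half = pearson_r(odd, even)
split_half = 2 * r_half / (1 + r_half)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
k = len(scores[0])
item_vars = [pstdev([row[i] for row in scores]) ** 2 for i in range(k)]
total_var = pstdev([sum(row) for row in scores]) ** 2
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

The Spearman–Brown step corrects for the fact that each half-test is only half as long as the full test, so its raw correlation understates full-test reliability.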

Analyzing Data
Frequency Polygon = a line graph of a frequency distribution
Normal Distribution
Descriptive Statistics = describe a sample
Inferential Statistics = describe a sample, and are inferred to a larger (target) population

•Measures of Central Tendency:

–Mean = statistical average - the best, most stable measure of central tendency
–Median = middle score
–Mode = most frequent score

• Measures of Variability

–Range = highest score minus the lowest score
–Standard deviation = average deviation from the mean
–Standard error of measurement = range in which the “true score” is likely to fall
–Standardized scores (or z-scores) = transform raw scores into standard deviation units on the normal distribution; z = (raw score – mean) / stand. dev.
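As a quick sketch, the measures above can all be computed with Python’s standard library (the score list is made-up sample data):

```python
from statistics import mean, median, mode, pstdev

scores = [70, 75, 80, 80, 85, 90, 95]

m  = mean(scores)    # statistical average
md = median(scores)  # middle score
mo = mode(scores)    # most frequent score

rng = max(scores) - min(scores)  # range = highest minus lowest
sd  = pstdev(scores)             # (population) standard deviation

# z-score: (raw score - mean) / standard deviation
z = [(s - m) / sd for s in scores]
```

By construction the z-scores have mean 0 and standard deviation 1, which is what puts every raw score on the same normal-distribution scale.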

Correlational Data -- plotted on scatterplots
• Correlation Coefficients
–“r” can range from -1 to +1
–Negative correlation = as one variable increases, the other decreases (r is close to -1)
–Positive correlation = as one variable increases, the other also increases (r is close to +1)
–Zero correlation = no relationship between the two variables (the closer r is to 0, the weaker the correlation)
*You cannot imply causation from correlation.*
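A minimal sketch of computing r for two made-up variables, spelling out the formula (Python 3.10+ also provides `statistics.correlation`):

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Made-up data: exam score rises with study hours, so r is near +1
study_hours = [1, 2, 3, 4, 5]
exam_score  = [55, 60, 70, 75, 90]

r = pearson_r(study_hours, exam_score)
```

Even with r this close to +1, the warning above still holds: the calculation says nothing about whether studying *caused* the higher scores.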

Hypothesis Testing
Null Hypothesis (H0) = set up to state that there is no effect
Alternative Hypothesis (H1) = set up to state that there is an effect
These two hypotheses must be:
• Mutually Exclusive - they can't overlap - either there is no effect or there is an effect
• Exhaustive

Test by doing statistics to determine the probability that the result was due to chance:
• If the probability that the result was due to chance is > 5%, the null hypothesis cannot be rejected
• 5% level => alpha level => .05
So, a researcher wants the probability (p) that their results were due to chance to be less than 5% (0.05).
If p is < 0.05, there is a statistically significant effect.
If p is > 0.05, there is a non-significant effect.
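One way to make “the probability that the result was due to chance” concrete is a simple permutation test: shuffle the group labels many times and see how often chance alone produces a difference as large as the one observed. The data, the one-tailed direction, and the fixed seed below are illustrative assumptions:

```python
import random

random.seed(0)

# Made-up group scores
treatment = [12, 14, 15, 16, 18, 20]
control   = [8, 9, 10, 11, 12, 13]
observed = sum(treatment) / len(treatment) - sum(control) / len(control)

pooled = treatment + control
n_t = len(treatment)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)  # relabel the scores at random
    diff = (sum(pooled[:n_t]) / n_t
            - sum(pooled[n_t:]) / (len(pooled) - n_t))
    if diff >= observed:    # chance produced a difference this large
        extreme += 1

p = extreme / trials
significant = p < 0.05  # reject the null at the .05 alpha level
```

Here almost no random relabeling matches the observed gap, so p comes out far below .05 and the null hypothesis is rejected.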

If my null hypothesis is true, but I reject the null, that is a Type I Error.
If my null hypothesis is true, and I fail to reject the null, that is a correct decision.
If my null hypothesis is false, and I reject the null, that is a correct decision.
If my null hypothesis is false, and I fail to reject the null, that is a Type II Error.

Correctly rejecting a false null is the one I want! I will do anything I can to increase my POWER (the probability of that outcome).

Ways Researchers May try to Increase Likelihood of Rejecting Null Hypothesis:
• Increase sample size.
• Control for extraneous variables (confounds).
• Increase the strength of the treatment.
• Use a one-tailed test when justifiable.
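The first point, that larger samples raise the odds of rejecting a false null, can be shown with a small Monte Carlo sketch. The effect size, the normal distributions, and the rough |t| > 2 cutoff are all illustrative assumptions, not an exact test procedure:

```python
import random
from statistics import mean, stdev

random.seed(1)

def significant(n, effect=0.5):
    """Draw two groups of size n whose true means differ by `effect`
    SDs, and apply a rough two-sample t criterion (|t| > 2)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(effect, 1) for _ in range(n)]
    se = ((stdev(a) ** 2 + stdev(b) ** 2) / n) ** 0.5
    return abs(mean(b) - mean(a)) / se > 2

def power(n, trials=2000):
    """Proportion of simulated studies that reject the (false) null."""
    return sum(significant(n) for _ in range(trials)) / trials

low_n_power  = power(10)   # small samples: the false null often survives
high_n_power = power(100)  # large samples: it is rejected far more often
```

With the same true effect, moving from 10 to 100 participants per group takes the rejection rate from a minority of studies to the large majority, which is exactly why sample size is the first lever on this list.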

How do you know how “big”an effect really is?
• Effect Sizes = an estimate of the magnitude of an effect between two groups or variables
–Cohen’s d - an estimate of effect size
–η² (eta-squared) or partial η²
–Coefficient of determination (R²)

Interpreting Cohen’s d (Cohen’s conventional benchmarks):
Small: d ≈ .2 (may be statistically significant without being practically significant)
Medium: d ≈ .5
Large: d ≈ .8
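A minimal sketch of Cohen’s d using the common pooled-SD formula (the group scores are made up, and several pooling variants exist; this is one of them):

```python
from statistics import mean, stdev

# Made-up scores for two groups
group_a = [80, 85, 88, 90, 92, 95]
group_b = [70, 74, 78, 80, 82, 86]

# Pool the two sample variances, weighted by degrees of freedom
n_a, n_b = len(group_a), len(group_b)
pooled_sd = (((n_a - 1) * stdev(group_a) ** 2 +
              (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)) ** 0.5

# d = difference in means, in pooled-SD units
d = (mean(group_a) - mean(group_b)) / pooled_sd
```

Because d is expressed in standard-deviation units rather than raw points, it can be compared across studies that used different instruments.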

NEXT WEEK
• Moving into different Research Designs
–Everybody read:
• Kavale article
• Rosen & Solomon article
–Starting with Meta-Analyses
• I’ll discuss the Kavale article
• Staci, Randy, and Katie will lead discussion on Rosen & Solomon article
• Initial Article Analyses are Due
–Use guidelines on Initial Analysis handout
–Consider “What you Know to Ask So Far”
–Turn in your review and a complete copy of the article you reviewed
1st Quiz is due Tuesday at midnight.
