PSY 4550 Week 2 Assignment 2 Reliability and Validity
Overview of Reliability and Validity
Outside of statistical research, reliability and validity are often used interchangeably. In research and testing, they mean different things. Reliability implies consistency: if you take the ACT five times, you should get roughly the same results every time. A test is valid if it measures what it's supposed to.
Tests that are valid are also reliable. The ACT is valid (and reliable) because it measures what a student learned in high school. However, tests that are reliable aren't always valid. For example, let's say your thermometer always read one degree too high. It would be reliable (giving you the same result each time) but not valid (because the thermometer wasn't recording the correct temperature).
What is Reliability?
Reliability is a measure of the stability or consistency of test scores. You can also think of it as the ability of a test or research finding to be repeatable. For example, a medical thermometer is a reliable tool if it gives the same reading each time it is used under the same conditions. In the same way, a reliable math test will measure mathematical knowledge consistently for every student who takes it, and reliable research findings can be replicated over and over.
Of course, it's not quite as simple as saying you think a test is reliable. There are many statistical tools you can use to measure reliability. For example:
Kuder-Richardson 20: a measure of internal reliability for a binary test (i.e. one with right or wrong answers).
Cronbachs alpha: measures internal reliability for tests with multiple possible answers.
Internal vs. External Reliability
Internal reliability, or internal consistency, is a measure of how well the items on your test measure the same thing as one another. External reliability means that your test or measure can be generalized beyond what you're using it for. For example, a claim that individual tutoring improves test scores should apply to more than one subject (e.g. to English as well as math). A test for depression should be able to detect depression in different age groups, in people of different socio-economic statuses, and in introverts.
One specific type of reliability is parallel forms reliability, where two equivalent versions of a test are given to the same students a short time apart. If the forms are parallel, each student should earn nearly the same score on both, so the two sets of scores are highly correlated.
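One common way to put a number on parallel forms reliability is to correlate the scores from the two forms. The sketch below is a minimal illustration, assuming a Pearson correlation is the statistic of interest; the function name and the scores for `form_a` and `form_b` are made up for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five students on two forms of a test.
form_a = [12, 15, 9, 20, 17]
form_b = [13, 14, 10, 19, 18]

print(round(pearson_r(form_a, form_b), 2))
```

A value close to 1 suggests the two forms rank students in nearly the same way. A low value suggests the forms are not really equivalent.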
The Reliability Coefficient
A reliability coefficient is a measure of how consistently a test measures achievement. It is the proportion of variance in observed scores (i.e. scores on the test) attributable to true scores (the theoretical real score that a person would get if a perfect test existed). The closer the coefficient is to 1, the less of the variation in scores is due to measurement error.
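The definition above can be made concrete with a toy calculation. True scores can never be observed in practice, so the numbers below are invented purely to show the arithmetic: each observed score is a made-up true score plus a made-up error, with the errors chosen so they are uncorrelated with the true scores.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors for four test takers.
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]

# Observed score = true score + error
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability = variance of true scores / variance of observed scores
reliability = pvariance(true_scores) / pvariance(observed)

print(round(reliability, 3))
```

Here the true-score variance is 25 and the observed-score variance is 26, so about 96% of the variation in observed scores comes from real differences between test takers and the rest from error.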
