
When you formulate your research question or statement, certain components of that question or statement need to be measured objectively to increase the generalizability of your results.
Subjective data are not a concrete representation of the results from the sample under study, so results based on them cannot be generalized to the greater population. The major point of a research study is to use the data to generalize the results to the greater population.
The measurement of a certain construct within your study can be done objectively with an instrument that has established psychometric properties. These psychometric properties can demonstrate the strength and consistency of that instrument. The strength and consistency of an instrument are determined, respectively, by the validity and reliability of that instrument.

Validity measures show that you are measuring what you intend to evaluate with an instrument. A straightforward example is using the Sensory Profile 2 to measure sensory processing or the Peabody 2 to measure fine motor skills. Using the Peabody 2 to measure sensory processing would not be a valid measure of sensory processing.
Reliability is the consistency of the results of an instrument. Components of reliability can lie either in the instrument or in the individual who is administering the instrument.
In short, you want your instrument to be a valid measure of your construct or variable, in addition to being reliable. Having an instrument that measures what you want to measure, and does so consistently, allows you to have confidence in your data and allows for more generalizability.
Ensuring that an instrument measures what you intend to measure depends on the strength of the instrument, and that strength is based on the different types of validity established for the instrument.

Face: the instrument appears to test what it is supposed to measure. An instrument that measures sensory processing has good face validity if it has questions about reactions to certain sensory stimuli.
Content: the items that make up an instrument adequately represent the content that defines a certain variable or construct. This type of validity is somewhat subjective in that the content is determined by “experts” in the field.
Criterion-related: the ability of one instrument (the target instrument) to predict results obtained on an external criterion (the criterion measure). The target instrument is validated by comparing its results with those of a criterion measure that has already been validated. For example, the validity of an observational gait analysis instrument can be established by comparing its results with those of a computerized motion analysis system that has shown consistent gait analysis measurement capability [Portney, 2013]. There are two types of criterion-related validity:
- Concurrent: establishes validity when the two instruments are completed at the same time. Most often used when the target instrument is considered more efficient than the gold standard and, therefore, can be used instead of the gold standard. For example, using an established screening questionnaire in addition to a parental interview (target instrument) to establish a certain diagnosis.
- Predictive: establishes that the outcome of the target instrument can be used to predict a future criterion score or outcome.
Construct: establishes the ability of an instrument to measure an abstract construct and the degree to which the instrument reflects the theoretical components of the construct. These measures are determined by existing knowledge.
In addition to measuring the correct construct (validity), you want to ensure that your measurements are consistent (reliability). Inconsistent (unreliable) results decrease generalizability. The reliability of an instrument is determined by the consistency of its results under varying conditions and across individuals.

Test-retest: consistency of the results of an instrument administered under the same conditions at two different points in time.
- Test-retest reliability can be analyzed using the Pearson product-moment coefficient of correlation (for interval-ratio data) or Spearman rho (for ordinal data).
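To make the two test-retest statistics concrete, here is a minimal Python sketch that computes both from scratch. The scores, the two-week interval, and the function names (pearson_r, average_ranks, spearman_rho) are hypothetical and chosen only for illustration; in practice a statistics package would normally be used.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical scores for six participants tested twice, two weeks apart.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

print(f"Pearson r    = {pearson_r(time_1, time_2):.3f}")
print(f"Spearman rho = {spearman_rho(time_1, time_2):.3f}")
```

Values close to 1 indicate that participants kept roughly the same standing across the two administrations. The sketch does not handle the edge case where every score in a list is identical, since the correlation is undefined there.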
Intrarater reliability: consistency of the results of an instrument administered by one individual across two or more trials.
Interrater reliability: consistency of the results of an instrument administered by two or more raters who measure the same group of subjects.
- The intraclass correlation coefficient (ICC) should be used to evaluate rater reliability.
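The ICC comes in several forms, and the text above does not say which one to use. As an illustration only, the sketch below computes one common form, often written ICC(2,1): a two-way random-effects model, absolute agreement, single rater. The choice of form, the function name icc_2_1, and the ratings are all assumptions made for this example.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows: one row per subject, one column per rater.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: five children (rows) each rated by three raters (columns).
ratings = [
    [7, 8, 7],
    [5, 5, 6],
    [9, 9, 8],
    [4, 5, 4],
    [6, 7, 7],
]

print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```

Because this form measures absolute agreement, a rater who scores everyone consistently higher than the others lowers the coefficient even if the rank order of subjects is identical. Other ICC forms treat that situation differently, so the form used should be reported alongside the value.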
Internal consistency: reflects the extent to which items measure various aspects of the same characteristic and nothing else (homogeneity). For example, placing a question about psychological characteristics in a test of physical characteristics will reduce the internal consistency, or homogeneity, of the instrument. The internal consistency of the items of an instrument can be determined by:
- Split-half reliability: a reliability measure of internal consistency based on dividing the items on an instrument into two halves and correlating the results.
- Cronbach’s coefficient alpha: the statistic most often used for internal consistency. This statistic evaluates whether the items in a scale are measuring the same construct or whether items in the scale are redundant, suggesting which items could be discarded to improve the homogeneity of the scale.
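Both internal-consistency measures can be sketched in a few lines of Python. The questionnaire data and the function names (cronbach_alpha, split_half) are hypothetical. The odd/even split and the Spearman-Brown step, which adjusts the half-test correlation to full test length, are common conventions added here as assumptions; the description above mentions only dividing the items and correlating the halves.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` holds one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def split_half(items):
    """Correlate odd-item and even-item totals; also return the
    Spearman-Brown corrected value (an assumed, commonly used adjustment)."""
    odd = [sum(s) for s in zip(*items[0::2])]   # items 1, 3, 5, ...
    even = [sum(s) for s in zip(*items[1::2])]  # items 2, 4, 6, ...
    mo, me = mean(odd), mean(even)
    cov = sum((a - mo) * (b - me) for a, b in zip(odd, even))
    ss_odd = sum((a - mo) ** 2 for a in odd)
    ss_even = sum((b - me) ** 2 for b in even)
    r = cov / (ss_odd * ss_even) ** 0.5
    return r, 2 * r / (1 + r)


# Hypothetical 4-item questionnaire (1 to 5 scale) answered by six respondents.
items = [
    [4, 3, 5, 2, 4, 1],
    [5, 3, 4, 2, 4, 2],
    [4, 2, 5, 1, 3, 1],
    [3, 3, 4, 2, 5, 2],
]

alpha = cronbach_alpha(items)
r_half, r_corrected = split_half(items)
print(f"Cronbach's alpha         = {alpha:.2f}")
print(f"Split-half r             = {r_half:.2f}")
print(f"Spearman-Brown corrected = {r_corrected:.2f}")
```

Recomputing alpha with one item left out at a time is a simple way to see which items, if discarded, would change the homogeneity of the scale.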
Having a general understanding of psychometric properties of an instrument is an important component of establishing the strength and consistency of that instrument. This knowledge will help you decipher the information provided in the methods section of research articles where researchers explain the measures used to assess participants to establish the generalizability of the data obtained from that research study. While there are many other components to the psychometric properties of an instrument, having a small bite of that information can go a long way when reading or preparing a research study!