MacArthur SES & Health Network

Salivary Cortisol Measurement

Summary prepared by Judith Stewart in collaboration with Teresa Seeman. Last revised June, 2000.

This chapter summarizes the results of a meeting organized by the MacArthur Research Network on SES and Health at Rockefeller University in December 1999. The aim of the meeting was to explore the conceptual issues relevant to daytime cortisol measurement and to reach agreement on the daytime cortisol measurement protocol best supported by currently available data. The participants at the meeting were: Per Björntorp, Sheldon Cohen, Elissa Epel, Clemens Kirschbaum, Bill Lovallo, Bruce McEwen, Greg Miller, Joseph Schwartz, Teresa Seeman, David Spiegel, Judith Stewart, Arthur Stone, Eve Van Cauter, and Elizabeth Young.

Chapter Contents

  1. Measures of Cortisol in Field Studies
  2. Association with Psychological and Social Characteristics
  3. Number of Samples Needed
  4. Techniques for Analyzing Data
  5. Maximizing Adherence
  6. Micro Level Issues
  7. People in the Laboratory
  8. Selected Bibliography

We provide an overview of the answers to questions about cortisol measurement on which meeting participants reached consensus, but we also review areas where consensus was not reached and note areas where future research is needed.

Measures of Cortisol in Field Studies

1) What are the most appropriate measures of cortisol for epidemiological or field type studies focusing on stress and disease? For example, should we be looking at the shape of the daytime rhythm? Area under the curve? Awakening challenge? Lunch Challenge? Dexamethasone challenge?

This is the area where it was most difficult to reach group consensus. Although the goal of the meeting was to settle on an optimal measure of cortisol for field studies, the complexity of the parameters involved in cortisol measurement prevented the meeting from reaching this goal. Despite the lack of consensus, the MacArthur Network on SES and Health benefited greatly from the meeting, and the many discussions during the meeting have contributed to the development of a protocol for a large epidemiological study the Network is undertaking.

General factors: It is important to note that the "appropriateness" of a cortisol measure may depend at least in part on the research question. For example, a project looking at stress reactivity may optimally use a measure quite different from one aimed at identifying the association between cortisol and health outcomes. The distinction between treating cortisol as an indicator of disease (for which evening cortisol may be more sensitive) and treating it as a marker of stress reactivity (for which the awakening challenge may be more important) is central to establishing a measurement protocol.

The population under study may affect the choice of cortisol measure. In studying a normal population, area-under-the-curve may be the best measure, whereas rhythm profiles may be the optimal measure in the study of diseased or psychologically burned-out populations. Any teleological interpretation of cortisol patterns, however measured, is risky (e.g., a rise may have very different meanings depending on the stress level or physiological health of the subject). Thus, a sharp rise to a morning challenge followed by a robust, short-lived decline may indicate health in many populations, but a similar rise in a chronically stressed population may indicate a less functional cortisol response (e.g., the decline may be more gradual, indicating continued HPA over-stimulation). The "flatness" of a cortisol rhythm may be indicative of a long-term response to chronic stress (e.g., Kirschbaum's "burned-out" subjects) or may be a variant within the normal range.

Discussion of salivary ("free") versus plasma ("total") cortisol measurement enters into any decision about measurement methods. In many research settings salivary cortisol may be the only choice. The primary question then becomes which assay kit to use and which sampling protocol to follow. Salivary cortisol kits vary significantly in how closely they parallel plasma cortisol (presumably the "gold standard"); the ORION kit correlates significantly better with plasma measurements than other kits do. More salivary measurements are better, with 49 being the "gold standard." However, an agreed-upon optimal daily number and schedule of samples that maximizes the validity and reliability of measuring diurnal cortisol production can be calculated (see Question 3 below).

Total cortisol concentration over the day (Area-under-the-curve, AUC): This measure seemed less controversial than the rhythm/profile. AUC was valued as a promising measure, most notably for establishing a link between cortisol levels and psychological functioning.

Rhythm dysfunction: Although there is substantial variation in rhythm shapes, there is little evidence that this variation is associated with stress. In a review paper, Art Stone, in collaboration with Clemens Kirschbaum, Joe Schwartz and Sheldon Cohen, shows that a proportion of profiles in a normal, healthy population are flat: 51% showed typical cycles, 17% had flat cycles and 34% had inconsistent cycles. Individual differences are large, and day-to-day stability is only modest. Curves vary over time and with the place and mode of awakening. What the profile means is not clear.

David Spiegel has shown that metastatic breast cancer patients with relatively flat rhythms showed earlier mortality. These "flat" rhythms were "high" flats, in contrast to the "low" flats observed in non-patient populations. Among Spiegel's patients, split at the median cortisol slope, 77% of those with flat rhythms died (average survival 2.9 years), while only 52% of those with steep rhythms died (average survival 4.1 years). It is important to note that those with flatter diurnal cortisol profiles experienced more sleep disturbance and had diminished natural killer cell numbers. However, the flat rhythm pattern was not associated with questionnaire measures of adjustment, and mean diurnal cortisol in this group was negatively associated with reported positive interpersonal connections (e.g., appraisal, belonging, tangible support, and a cohesive, expressive family environment). The picture of the environmental space these patients inhabited is complex. These flat rhythms may be a marker of disease progression rather than a cause of it. The relationship of these cortisol patterns to the group-support psychotherapeutic intervention that is the primary focus of Spiegel's study is as yet unclear.

Stone et al., in their review article, suggest that cycle type is associated with subsequent self-reported upper respiratory infection (URI) rates (i.e., flat cyclers had lower URI rates). But this may be confounded by the difficulty of interpreting what "flat cycler" means: are "normal" flat cyclers functionally different from those who are flat cycling because of a systemic response to chronic stress, as in "burn-out" or PTSD? If so, how do we differentiate the healthiness of flatness from its unhealthiness?

Wake-up challenge: Across a large group of "normal" subjects, 50% show a morning rise in cort with a peak at 30-45 min, 8 in 10 are responders to an awakening challenge (a rise of 2.5 nmol/l or more), and 75% show a consistent response over 2 days. Because the morning rise appears to serve useful functions (e.g., activating appetite and cognitive functioning), a lack of this rise may be indicative of dysfunction. Although some researchers construe the magnitude of the rise itself as pathological, this may not be a wise interpretation. Clemens Kirschbaum emphasizes that the "crispness" of the response is of central importance (i.e., an abrupt response to the awakening challenge followed by a quick and elegant decline that sets the stage for the diurnal movement toward a healthy nadir and a period of nocturnal quiescence).
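The responder criterion (a rise of 2.5 nmol/l or more) reduces to a simple rule on two samples. The sketch below is a minimal illustration, assuming the rise is computed as a single post-awakening sample (taken 30-45 min after waking) minus the awakening sample, both in nmol/l, and assuming that a "consistent" response means the same responder classification on every sampled day. The function names are hypothetical, not part of any published protocol.

```python
from typing import Sequence, Tuple

RESPONDER_THRESHOLD_NMOL_L = 2.5  # rise criterion quoted in the text


def awakening_rise(wake_nmol_l: float, post_wake_nmol_l: float) -> float:
    """Rise from the awakening sample to the 30-45 min post-awakening sample."""
    return post_wake_nmol_l - wake_nmol_l


def is_responder(wake_nmol_l: float, post_wake_nmol_l: float,
                 threshold: float = RESPONDER_THRESHOLD_NMOL_L) -> bool:
    """True if the awakening rise meets or exceeds the threshold (nmol/l)."""
    return awakening_rise(wake_nmol_l, post_wake_nmol_l) >= threshold


def consistent_across_days(days: Sequence[Tuple[float, float]],
                           threshold: float = RESPONDER_THRESHOLD_NMOL_L) -> bool:
    """Assumed definition: same responder status on every (wake, post-wake) day."""
    statuses = {is_responder(w, p, threshold) for w, p in days}
    return len(statuses) == 1
```

With illustrative values, `is_responder(10.0, 14.5)` is true (a rise of 4.5 nmol/l) and `is_responder(12.0, 13.0)` is false. Note that this rule captures only the size of the rise, not the "crispness" of the subsequent decline that Kirschbaum emphasizes.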

Lunch challenge: There was agreement that a cortisol response to the presentation and ingestion of food at noon is universal. The magnitude of the response to the lunch challenge depends in part on the composition of the meal (e.g., carbohydrates elevate cort levels), which complicates the use of lunch as a challenge within a naturalistic (uncontrolled) study.

Dexamethasone test: It is generally thought that as the level of reported stress goes up, the degree of cortisol suppression in response to the dexamethasone test is reduced.

Association with Psychological and Social Characteristics

2) What evidence is there that the measures listed above are associated with psychological and social characteristics? With health characteristics?

Total cortisol concentration over the day (Area-under-the-curve): Sheldon Cohen finds that AUC is related to almost all psychosocial measures (elevated "anxiety," "hostility," "not calm"). Its relationship to health characteristics is less clear.

Rhythm dysfunction: This is an area of significant controversy. Does an unexpected cortisol profile mean anything in terms of reactivity, or in terms of health? Could it be the total output of cortisol, rather than the cortisol profile, that is related to health outcomes? What does it mean to be "burned-out"?

Eve Van Cauter reported that analysis of multiple samples shows that the nadir, rather than the peak, of diurnal cortisol is associated with disease progression. Relatedly, sleep deprivation is associated with elevations in evening cort. Sleep debt flattens the slope and enlarges the area-under-the-curve. It also produces a pronounced response to a breakfast challenge and large changes in glucose tolerance, and appears to have profound and wide-ranging effects on cortisol functioning within the individual. In the breast cancer patients, flatter diurnal cortisol profiles were associated with diminished natural killer cell numbers.

Wake-up challenge: Clemens Kirschbaum notes that the response to morning challenge may be tied to the state characteristics of the subject, and that many factors for sorting subjects need to be monitored. Chronically stressed individuals may show a quite different challenge response than do "normals."

Data from the Perceived Stress Scale show that high perceived stress is associated with a larger response to awakening, and that the number of stressful life events is associated with a higher rise on awakening. However, data also show that flat cycles are associated with shift work and other circumstances that affect the quality and quantity of participants' sleep.

Lunch challenge: There is some evidence that the lunch challenge is associated with chronic stress state. However, in the course of the meeting the lunch challenge received the least attention as a useful salivary cortisol measurement tool, particularly in a field (uncontrolled) study.

Dexamethasone test: Sheldon Cohen has demonstrated that individuals reporting the lowest perceived stress and who demonstrate stable/normal cortisol curves show the most suppression to a dexamethasone challenge. Those reporting mid-range stress and showing stable flat curves show somewhat less suppression, and those reporting high stress and showing an unstable profile show the least suppression.

Number of Samples Needed

3) How many samples need to be collected to get a reasonable assessment of each of these measures? When (what times) should they be collected? What would be the absolute minimum measurement required for large epidemiological studies where a greater number of participants can compensate for less precise measurement?

Agreement on this question was rather good. In general, more samples per day, more days per measurement period, and more widely distributed measurement episodes are all better. Art Stone's ecological momentary assessment, with its attention to both physiological reactivity and environmental precipitants, was very well received; however, its feasibility with a large, SES-diverse population was questioned.

Analyses done by Cohen's lab and by Joe Schwartz agree that the minimum number of samples needed for a one-day cortisol measurement protocol is 4 or 5. Researchers agree that, to get an accurate area-under-the-curve for a day, measurements at 1, 4, 9, and 11 hours after wakening provide good coverage. Single-day assessments are very weak approaches to this problem because measures are affected by many day-to-day variations; this is especially problematic when the shape of the rhythm is of interest, since the shape seems rather sensitive to the influence of stress.
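A common way to turn a handful of timed samples into a daily area-under-the-curve is the trapezoidal rule. The sketch below assumes that method (the source does not say which AUC formula was used), with sample times expressed in hours since waking and concentrations in nmol/l; the cortisol values in the example are invented for illustration.

```python
from typing import Sequence


def auc_trapezoid(hours: Sequence[float], nmol_l: Sequence[float]) -> float:
    """Area under the cortisol curve (nmol/l x h) by the trapezoidal rule.

    Only the interval between the first and last samples is covered;
    nothing is extrapolated before or after them.
    """
    if len(hours) != len(nmol_l) or len(hours) < 2:
        raise ValueError("need at least two paired time/concentration values")
    if any(later <= earlier for earlier, later in zip(hours, hours[1:])):
        raise ValueError("sample times must be strictly increasing")
    total = 0.0
    for i in range(1, len(hours)):
        width = hours[i] - hours[i - 1]
        total += width * (nmol_l[i] + nmol_l[i - 1]) / 2.0
    return total


# Samples at 1, 4, 9 and 11 hours after waking (illustrative values only).
print(auc_trapezoid([1, 4, 9, 11], [15.0, 8.0, 5.0, 3.0]))  # 75.0
```

Because the first sample in this schedule is at 1 hour, the resulting AUC leaves out the awakening response; adding a sample at hour 0 would extend the interval the area covers.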

Other researchers, possibly more interested in the rhythm profile, suggested that the morning rise and the evening corts are most important. They would recommend obtaining five samples: (1) immediately on wakening, before getting out of bed; (2) 45 min after awakening; (3) 4-6 pm; (4) 6-9 pm; and (5) 9 pm to bedtime. The actual time of the later three samples would be randomly determined (e.g., using Palm Pilot prompts) so that, across the sample of participants, data would be obtained covering the full interval. These parameters provide the morning awakening increase, the afternoon-evening slope, the evening nadir, and the area-under-the-curve.
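The randomized timing of the three later samples can be sketched as drawing one prompt time per window. This is a minimal illustration, assuming uniform sampling to the nearest minute within each window and a bedtime before midnight; the function names are hypothetical, and the awakening and +45 min samples are left out because they are tied to waking rather than randomized.

```python
import random
from datetime import time
from typing import List


def _to_minutes(t: time) -> int:
    return t.hour * 60 + t.minute


def _from_minutes(m: int) -> time:
    return time(m // 60, m % 60)


def random_evening_prompts(bedtime: time, rng: random.Random) -> List[time]:
    """Draw one prompt time uniformly (to the minute) within each later window.

    Windows follow the text: 4-6 pm, 6-9 pm, and 9 pm to bedtime.
    Assumes bedtime falls before midnight.
    """
    windows = [(time(16, 0), time(18, 0)),
               (time(18, 0), time(21, 0)),
               (time(21, 0), bedtime)]
    prompts = []
    for start, end in windows:
        lo, hi = _to_minutes(start), _to_minutes(end)
        if hi <= lo:
            raise ValueError(f"empty window {start}-{end}")
        # randint is inclusive, so hi - 1 keeps each prompt before the window end
        prompts.append(_from_minutes(rng.randint(lo, hi - 1)))
    return prompts


rng = random.Random(42)  # fixed seed so a participant's schedule can be reproduced
print(random_evening_prompts(time(23, 0), rng))
```

Seeding a separate generator per participant is one way to keep each schedule reproducible for later checking against the logbook.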

The cortisol profiles of patients with Cushing's disease and of patients with unipolar depression are differentiated by the afternoon readings. If one is looking for abnormal cortisol profiles as a pathway to the metabolic syndrome, evening corts are probably the most important.

Joe Schwartz calculated that data should be collected over 3-4 days to get a reliable assessment of a "trait" daily concentration (area-under-the-curve), and over 6 or more days to get a reliable assessment of a "trait" rhythm. The advantage of using multiple days is that it helps offset the unreliability of a single day's data, which can lead to underestimating the relationship between cortisol and outcomes. For example, if nine samples per day are collected, collection over 4 days will give an estimate of area-under-the-curve with .80 reliability, and collection over 8 days will give an estimate of the slope with .80 reliability. The effect of under-sampling on estimates of the slope, and in particular whether under-sampling leads to underestimation of the slope, is a topic that needs further research.
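One standard way to reason about how many days are needed is the Spearman-Brown prophecy formula, which predicts the reliability of an average over k days from the reliability r of a single day: R = k·r / (1 + (k - 1)·r). The source does not state that Schwartz used this formula; the sketch below simply assumes it, which in turn assumes that days behave as parallel measurements (equal reliability, independent errors). Under that assumption, the figures quoted (.80 after 4 days for AUC, .80 after 8 days for slope) imply a single-day reliability of about .50 for AUC and about .33 for slope; these are derived here, not reported values.

```python
def spearman_brown(single_day_r: float, days: float) -> float:
    """Reliability of a mean over `days` parallel days, given one-day reliability."""
    return days * single_day_r / (1 + (days - 1) * single_day_r)


def implied_single_day_r(multi_day_r: float, days: float) -> float:
    """Invert Spearman-Brown: one-day reliability implied by a multi-day value."""
    return multi_day_r / (days - (days - 1) * multi_day_r)


def days_needed(single_day_r: float, target_r: float) -> float:
    """Days required to reach target_r (round up in practice)."""
    return target_r * (1 - single_day_r) / (single_day_r * (1 - target_r))


# Figures quoted above: .80 reliability after 4 days (AUC) and 8 days (slope).
auc_r1 = implied_single_day_r(0.80, 4)    # about 0.50
slope_r1 = implied_single_day_r(0.80, 8)  # about 0.33
print(round(auc_r1, 2), round(slope_r1, 2))

# How AUC reliability would grow with each added day under this model.
print([round(spearman_brown(auc_r1, k), 2) for k in (1, 2, 3, 4)])
```

The model makes the trade-off explicit: because a single day's slope is assumed to be less reliable than a single day's AUC, roughly twice as many days are needed to reach the same .80 target.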

The MacArthur Network has settled on a one-day, six-sample protocol for a large epidemiologic study it is undertaking. The timing for the samples is: 1) awakening, 2) 45 minutes after wakening, 3) 2.5 hours after wakening, 4) 8 hours after wakening, 5) 12 hours after wakening, and 6) bedtime.
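Because this schedule is anchored to waking rather than to clock time, each participant's sampling times have to be computed from his or her own wake time. A minimal sketch, assuming the wake time and the planned bedtime are known in advance (for example, from the participant's typical wake-up time) and that bedtime should fall after the 12-hour sample; the names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Offsets from waking for the first five samples; the sixth is taken at bedtime.
SIX_SAMPLE_OFFSETS = [
    ("awakening", timedelta(0)),
    ("wake + 45 min", timedelta(minutes=45)),
    ("wake + 2.5 h", timedelta(hours=2.5)),
    ("wake + 8 h", timedelta(hours=8)),
    ("wake + 12 h", timedelta(hours=12)),
]


def six_sample_schedule(wake: datetime, bedtime: datetime) -> List[Tuple[str, datetime]]:
    """Clock times for a wake-linked six-sample day."""
    last_timed = wake + SIX_SAMPLE_OFFSETS[-1][1]
    if bedtime <= last_timed:
        # Assumed rule: bedtime must come after the 12-hour sample.
        raise ValueError("bedtime falls before the 12-hour sample")
    schedule = [(label, wake + offset) for label, offset in SIX_SAMPLE_OFFSETS]
    schedule.append(("bedtime", bedtime))
    return schedule


for label, when in six_sample_schedule(datetime(2000, 6, 1, 6, 30),
                                       datetime(2000, 6, 1, 22, 45)):
    print(f"{label:>14}: {when:%H:%M}")
```

For a participant who wakes at 06:30, this yields 06:30, 07:15, 09:00, 14:30, 18:30, and the reported bedtime (22:45 in the example). Using full `datetime` values rather than times of day lets a bedtime after midnight work without special handling.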

Techniques for Analyzing Data

4) What are the most appropriate techniques for analyzing these data? What kinds of measures need to be considered as possible control factors (e.g., menstrual cycle, age, gender, food intake)?

As with measurement itself, the most appropriate analysis techniques depend on the question being addressed (e.g., the relationship of cortisol to stress versus to disease outcome). Area-under-the-curve, or total cortisol concentration, appears to be the most universally accepted measure, whereas rhythm analyses (i.e., slope) are more controversial. Rhythm and magnitude analyses may benefit from hierarchical linear modeling.
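The text names slope as the usual rhythm measure and hierarchical linear modeling as a possible refinement. The sketch below computes only a per-day ordinary least-squares slope, a simpler swapped-in technique rather than HLM. The natural-log transform and the option to drop the 45-min awakening-response sample are common analytic choices but are assumptions here, not recommendations from the meeting. In a multi-day study, per-day slopes like this could be modeled hierarchically (days nested within participants); that step is not shown.

```python
import math
from typing import Iterable, Sequence


def diurnal_slope(hours_since_wake: Sequence[float], nmol_l: Sequence[float],
                  log_transform: bool = True,
                  exclude_hours: Iterable[float] = ()) -> float:
    """Ordinary least-squares slope of cortisol on hours since waking.

    With log_transform=True the slope is in ln(nmol/l) per hour.
    Samples at any time listed in exclude_hours (e.g. 0.75 for the
    45-min awakening-response sample) are dropped before fitting.
    """
    excluded = list(exclude_hours)
    xs, ys = [], []
    for h, c in zip(hours_since_wake, nmol_l):
        if any(math.isclose(h, e) for e in excluded):
            continue
        if log_transform and c <= 0:
            raise ValueError("log transform needs positive concentrations")
        xs.append(h)
        ys.append(math.log(c) if log_transform else c)
    if len(set(xs)) < 2:
        raise ValueError("need samples at two or more distinct times")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Illustrative six-sample day, dropping the 45-min sample before fitting.
print(diurnal_slope([0, 0.75, 2.5, 8, 12, 16],
                    [12.0, 18.0, 9.0, 5.0, 3.5, 2.0], exclude_hours=[0.75]))
```

A more negative value indicates a steeper daytime decline; a value near zero corresponds to the "flat" profiles discussed under Question 1.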

As with the question above, the agreement on possible control factors was good. The factors thought important to consider in cortisol measurement are of six types:

  • Stable characteristics of individuals: age, gender
  • State characteristics: menstrual cycle stage, contraceptive and other medication use
  • Disease/"chronic" condition characteristics: liver disease, PTSD, malnutrition or fasting, "voluntary" flattening of cort as a result of lifestyle (e.g., jet lag or shift work)
  • Dynamic characteristics: food intake (e.g., carbohydrates increase cortisol), sleep status (e.g., assess sleep quality and quantity on night prior to cortisol measurement), exercise (e.g., level and timing), wake-up time
  • Psychological characteristics: positive and negative affect, passivity of coping
  • Other exposures: whether it is advisable to screen out, or probe subjects for, other factors that affect cortisol patterns (e.g., smoking, alcohol use) is debatable, and may depend on the issue under study (i.e., relationship to psychosocial factors versus to disease).

In the MacArthur salivary cortisol protocol, a log has been developed for the participant to fill out during the day of cortisol collection. This log is in the piloting stage. At the time of each sample, it elicits information about the most recent food ingestion and psychological state. The psychological probes are: 1) How much did you feel happy, excited, or content when you woke up? (Not at all, Somewhat, Very much, Extremely). 2) How much did you feel worried, anxious, or fearful when you woke up? (Not at all, Somewhat, Very much, Extremely). In addition, at the conclusion of the sampling, the participant is asked to report in the logbook on a number of control factors. These include cigarette smoking, alcohol consumption, drugs or medicines taken, vigorous exercise, time of usual awakening, and the most stressful event of the day (time, duration, and degree of stress: not at all stressed, somewhat, moderately, very stressed, the most stressed I've ever felt). The participant is also asked about the typicality of the day in terms of how busy, pressured, or stressed he or she felt during the sampling day.

Maximizing Adherence

5) Can we agree on methods of maximizing adherence to collection procedures, particularly the timing of samples? This is an especially difficult problem with "wake up" samples: because the most rapid changes in cortisol levels happen in the hour or so after waking, the amount of time that elapses between waking and the first sample can have a substantial impact on the shape of the daytime rhythm.

General points about adherence: 1) Subjects need to be convinced that what they are doing is crucial to science and to the medical care of a large majority of people. 2) Subjects need to believe that if they cheat on their cortisol measurements (e.g., sample late or fake samples), they will be discovered. 3) Multiple, timed (Palm Pilot or watch prompted) salivary measurements are the ideal. 4) Ecological momentary assessments are of interest when multiple interactive data points are possible (i.e., biologic measurements coupled with environmental events).

Various technologies were suggested to increase adherence. The most widely endorsed involve keying into a Palm Pilot or watch, either to record a code or to obtain a special code from the instrument that is then written on the cortisol sample. Clemens Kirschbaum has developed a system that records when the salivette is removed for use; "smartcaps" is another version of this system. Even with these systems some data will be lost; in the case of watches, it was noted that 15-17% of data may be lost.

The MacArthur epidemiologic protocol uses watches that vibrate at the programmed sampling times. At the time of the vibration a code appears on the watch, which the subject is required to write in the logbook and on the salivette container. The logbook serves both to gather data (as noted under Question 4) and to boost adherence (e.g., the participant is asked to record the time of the next sample at the bottom of each page of the logbook).
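The watch-code check can be combined with electronic monitoring of when a salivette is opened. A minimal sketch, assuming each sample record holds the scheduled prompt time, the code the watch displayed, the code the participant wrote down, and (optionally) an electronic timestamp; the field names, the code format, and the 15-minute tolerance are hypothetical, since the source does not specify how codes are generated or how much deviation is acceptable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SampleRecord:
    label: str
    scheduled: datetime             # when the watch vibrated
    watch_code: str                 # code displayed by the watch at that prompt
    logged_code: Optional[str]      # code the participant wrote down, if any
    opened_at: Optional[datetime]   # electronic salivette timestamp, if any


def adherence_flags(rec: SampleRecord,
                    tolerance: timedelta = timedelta(minutes=15)) -> List[str]:
    """Return human-readable problems with one sample (empty list = no flags)."""
    flags = []
    if rec.logged_code is None:
        flags.append("code missing")
    elif rec.logged_code.strip() != rec.watch_code:
        flags.append("code does not match watch")
    if rec.opened_at is None:
        flags.append("no electronic timestamp")
    elif abs(rec.opened_at - rec.scheduled) > tolerance:
        flags.append("sample taken outside tolerance window")
    return flags


# Hypothetical record: correct code, but the salivette was opened 40 min late.
rec = SampleRecord("wake + 8 h", datetime(2000, 6, 1, 14, 30), "4817", "4817",
                   datetime(2000, 6, 1, 15, 10))
print(adherence_flags(rec))  # ['sample taken outside tolerance window']
```

Returning a list of flags rather than a single pass/fail keeps the information needed to decide later whether a sample should be excluded, down-weighted, or analyzed with its actual collection time.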

Micro Level Issues

6) On a micro level: What do we mean by awake? Opening eyes first time? Deciding to be awake? Getting out of bed? And other procedural issues that are so crucial to valid and reliable measurement?

There was good consensus that, if one is interested in the awakening rise, it is crucial to ensure compliance with a set protocol. For most researchers, "awake" seemed to mean the conjunction of opening one's eyes and being alert enough to insert a salivette into the mouth. Some researchers at the meeting held that the first sample should be taken before any major physical movement, such as getting out of bed. Additionally, the individual should not carry out normal morning activities such as brushing teeth, eating breakfast, or exercising until any 30-, 45-, or 60-min samples have been completed, since all of these activities would affect cortisol levels. Essentially, the ideal for assessing the "awakening challenge" is an in-bed awakening sample followed by a 45-minute sample, with no teeth brushing, food ingestion, or vigorous movement intervening.
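The procedural rules for awakening samples can be turned into a screening check applied to each participant's logbook or device data. A minimal sketch, assuming wake time, sample times, the time of getting out of bed, and timestamped activities are all recorded; the 10-minute limit on the wake-to-first-sample delay and the activity labels are hypothetical, because the meeting did not set a numeric threshold. The "got out of bed" check encodes a view held by some, not all, participants.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Activities the text says should not occur before the post-awakening samples.
RESTRICTED_ACTIVITIES = {"brushing teeth", "eating", "exercising"}


def awakening_protocol_problems(
        wake: datetime,
        first_sample: datetime,
        post_sample: datetime,
        got_out_of_bed: datetime,
        activities: Sequence[Tuple[str, datetime]],
        max_first_delay: timedelta = timedelta(minutes=10)) -> List[str]:
    """List deviations from an in-bed awakening sample plus a later sample."""
    if not wake <= first_sample < post_sample:
        raise ValueError("expected wake <= first sample < post-awakening sample")
    problems = []
    if first_sample - wake > max_first_delay:
        problems.append("first sample delayed after waking")
    if got_out_of_bed < first_sample:
        problems.append("got out of bed before first sample")
    for name, when in activities:
        if name in RESTRICTED_ACTIVITIES and when < post_sample:
            problems.append(f"{name} before post-awakening sample")
    return problems


wake = datetime(2000, 6, 1, 6, 30)
print(awakening_protocol_problems(
    wake, wake + timedelta(minutes=2), wake + timedelta(minutes=45),
    wake + timedelta(minutes=5), [("brushing teeth", wake + timedelta(minutes=20))]))
# ['brushing teeth before post-awakening sample']
```

A check like this only catches deviations that are recorded; it depends on the same logbook and timing technology discussed under Question 5.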

When cortisol samples taken in the home setting (with a natural wake-up regimen, either spontaneous or routine alarm-clock awakening) are compared to those obtained in a controlled setting where the timing of sampling is standardized (e.g., wake-up in Sheldon Cohen's hotel), the natural routines of the subjects can play a role. For example, late risers are still low in their cortisol cycle when awakened in the hotel and thus show a later peak. However, early risers and late risers look no different when measurement is done at home, presumably because each follows a "normal" awakening routine. This underscores the importance of establishing the participant's typical wake-up time, and the utility of sampling according to the participant's own diurnal cycle (e.g., sampling times linked to wake-up time rather than to fixed clock times).

People in the Laboratory

7) What are the most appropriate measures of cortisol (HPA) response if we can get people in the laboratory?

This is an area where the contrast between salivary "free" and plasma "total" cortisol may be most active. Although total cortisol in overnight urine samples may be included in large epidemiologic studies (as is the case in the current MacArthur project), such more invasive measurements are more typically limited to intensive laboratory studies. The informativeness of such total cortisol measurements was not questioned at the meeting, except on practical grounds. Furthermore, while the possibility of taking blood samples in large epidemiological or field studies is quite small, this is quite feasible within the lab setting. Again, there is a need for research to better understand the relationship among salivary cortisol, blood plasma cortisol, and total urine-based cortisol measurements.

Various challenge studies (e.g., the Trier Social Stress Test, TSST) are suitable for laboratory use. Challenge tests are a very important means of assessing the reactivity and endogenous activity of the HPA axis, and salivary cortisol represents one of the easiest and most informative endpoints of HPA activity. Basal measurements of cortisol at wakening and in the evening provide an estimate of the diurnal rhythm, and responses to morning wakening and to the TSST provide estimates of reactivity that reflect ongoing life stress as well as the intrinsic potential of the HPA axis to respond.

The MacArthur Network is in the process of developing a reactivity protocol to be used in conjunction with the larger epidemiologic study that will soon be underway. Questions the Network is considering in developing a challenge study include:

  • Who should be studied (what sampling strategy)?
    • SES (what measure to use, and what is reasonable stratification)
    • Psychosocial characteristics (current characterization or include historical factors; include social integration or relationship quality); possible attempt to fill a 2 x 2 design (Low/High SES by Low/High psychosocial resources/reserves)
    • Childhood abuse history as parameter
    • Other characteristics (e.g., gender, ethnicity, obesity, depression)
    • Exclusion criteria (e.g., certain meds, health issues such as uncontrolled hypertension, depression and PTSD, liver disease, etc.)

  • What should the challenge be?
    • Issue of SES biases (e.g., meaning/impact of challenge given SES-based history; impact of education and attendant life experiences on challenge response)
    • Possible challenges
      • video game (Atari Breakout; distribution of novelty?)
      • mirror image star-tracing
      • cold pressor (found to be highly aversive by many participants)
      • cognitive tasks (likely to be highly influenced by educational differences)
      • TSST (public speaking experiences biased along educational lines)
      • Driving challenge (potential issue of non-drivers and lack of familiarity with "task")
      • Interpersonal challenge (discussion with "provocative confederate," can it be made "real" enough to act as effective challenge)

  • What outcome (reactivity) measurements should be obtained?
    • Saliva
      • Cortisol – repeated sampling? (see above)
      • Oxytocin
      • Other parameters?

    • Cardiovascular
      • BP, pulse
      • Heart rate variability
      • Other (e.g., impedance cardiography)

    • Blood
      • Baseline values useful even if can't get "reactivity" biologic measures?
        • IGF-1
        • Oxytocin
        • DHEA(S)
        • Testosterone

      • Multiple samples
        • ACTH
        • Growth Hormone

    • Immune assessment (baseline or repeated)
    • Urine (for period of challenge and recovery)
      • Integrated catecholamines (NE, EPI, other parameters)

  • What are the sample size requirements?
    • 20 per cell a likely requirement
    • sample size may be affected by the scope and structure of larger epidemiologic study from which challenge population is drawn (i.e., practical issue of willingness to expose a longitudinal sample to more intensive experimental protocols)

Selected Bibliography

Fuchs E., Kirschbaum C., Benisch D. & Bieser A. (1997). Salivary cortisol: a non-invasive measure of hypothalamo-pituitary-adrenocortical activity in the squirrel monkey, Saimiri sciureus. Laboratory Animals, 31(4):306-11.

Hellhammer D.H., Buchtal J., Gutberlet I. & Kirschbaum C. (1997). Social hierarchy and adrenocortical stress reactivity in men. Psychoneuroendocrinology, 22(8):643-50.

Kirschbaum C., Kudielka B.M., Gaab J., Schommer N.C. & Hellhammer D.H. (1999). Impact of gender, menstrual cycle phase, and oral contraceptives on the activity of the hypothalamus-pituitary-adrenal axis. Psychosomatic Medicine, 61(2):154-62.

Pruessner J.C., Hellhammer D.H. & Kirschbaum C. (1999). Burnout, perceived stress, and cortisol responses to awakening. Psychosomatic Medicine, 61(2):197-204.

Rosmond R. & Björntorp P. (1998). Endocrine and metabolic aberrations in men with abdominal obesity in relation to anxio-depressive infirmity. Metabolism: Clinical & Experimental, 47(10):1187-93.

Rosmond R. & Björntorp P. (1998). The interactions between hypothalamic-pituitary-adrenal axis activity, testosterone, insulin-like growth factor I and abdominal obesity with metabolism and blood pressure in men. International Journal of Obesity & Related Metabolic Disorders, 22(12):1184-96.

Rosmond R., Dallman M.F. & Björntorp P. (1998). Stress-related cortisol secretion in men: relationships with abdominal obesity and endocrine, metabolic and hemodynamic abnormalities (see comments). J. Clinical Endocrinology & Metabolism, 83(6):1853-9. Comment in: J. Clinical Endocrinology & Metabolism, 83(6):1842-5.

Schmidt-Reinwald A., Pruessner J.C., Hellhammer D.H., Federenko I., Rohleder N., Schurmeyer T.H. & Kirschbaum C. (1999). The cortisol response to awakening in relation to different challenge tests and a 12-hour cortisol rhythm. Life Sciences, 64(18):1653-60.

Sephton S.E., Sapolsky R.M., Kraemer H.C. & Spiegel D. (2000). Diurnal cortisol rhythm as a predictor of breast cancer survival. Journal of the National Cancer Institute, 92:994-1000.

Smyth J., Ockenfels M.C., Porter L., Kirschbaum C., Hellhammer D.H. & Stone A.A. (1998). Stressors and mood measured on a momentary basis are associated with salivary cortisol secretion. Psychoneuroendocrinology, 23(4):353-70.

Steptoe A., Wardle J., Lipsey Z., Mills R., Oliver G., Jarvis M. & Kirschbaum C. (1998). A longitudinal study of work load and variations in psychological well-being, cortisol, smoking, and alcohol consumption. Annals of Behavioral Medicine, 20(2):84-91.

Stone A.A., Schwartz J.E., Smyth J., Kirschbaum C., Cohen S., Hellhammer D. & Grossman S. (2001). Individual differences in the diurnal cycle of salivary free cortisol: a replication of flattened cycles for some individuals. Psychoneuroendocrinology, 26:295-303.

Turner-Cobb J., Sephton S.E., et al. (2000). Social support and salivary cortisol in women with metastatic breast cancer. Psychosomatic Medicine, 62:337-345.
