While we try to provide an overview of answers to questions
about cortisol measurement where there was consensus among meeting participants, we also
review areas where consensus was not reached, and note areas where future research is
needed.

1) What are the most appropriate measures of cortisol for
epidemiological or field type studies focusing on stress and disease? For example, should
we be looking at the shape of the daytime rhythm? Area under the curve? Awakening
challenge? Lunch Challenge? Dexamethasone challenge?
This is the area where it was the most difficult to reach group consensus. Although
the goal of the meeting was to settle on an optimal measure of cortisol for field studies,
the complexity of the parameters involved in cortisol measurement worked against the
meeting's reaching this goal. Despite the lack of consensus, the MacArthur Network on
SES and Health benefited greatly from the meeting, and the many discussions during the
meeting have contributed to the development of a protocol for a large epidemiological
study the Network is undertaking.
General factors: It is important to note that the "appropriateness" of
a cortisol measure may be at least in part based on the research question. For example, a
project to look at stress reactivity may optimally use a measure quite different from one
aimed at identifying the association between cortisol and health outcomes. The difference
between looking at cortisol as an indicator of disease (possibly evening cort more
sensitive) versus as a marker of stress reactivity (awakening challenge more important) is
central to establishing a measurement protocol.
The population under study may affect the choice of cortisol measure. In studying a
normal population, area-under-the-curve may be the best measure, whereas rhythm profiles may
be the optimum measure in the study of diseased or psychologically burned-out populations.
Any teleological interpretation of cortisol patterns, however measured, is risky (e.g., a
rise may have very different meanings dependent on the stress level or physiological
health of the subject). Thus, a sharp rise to a morning challenge with a robust and
short-term decline may indicate health in many populations, but a similar rise to morning
challenge in a chronically stressed population may indicate a less functional cortisol
response (e.g., the decline may be more gradual, indicating continued HPA
over-stimulation). The "flatness" of a cortisol rhythm may be indicative of a
long-term response to chronic stress (i.e., Kirschbaum's burn-outs) or a variant
within the normal range.
Discussion of salivary ("free") versus plasma ("total") cortisol
measurement enters into any decision about measurement methods. In many research settings
salivary cortisol may be the only choice. The primary question then becomes what assay kit
to use and what sampling protocol to follow. Salivary cort kits vary significantly in
how closely they parallel plasma cort (presumably the "gold standard"); ORION is significantly
better correlated with plasma measurements than are other kits. More salivary measurements
are better, with 49 being the "gold standard". But an agreed-upon optimum daily
number and schedule that maximizes the valid and reliable measurement of diurnal cortisol
production can be calculated (see Q 3 below).
Total cortisol concentration over the day (Area-under-the-curve): This measure
seemed less controversial than did the rhythm/profile. The AUC was valued as a promising
measure most notably in establishing a link between cortisol levels and psychological
functioning.
Rhythm dysfunction: Although there is substantial variation in rhythm shapes,
there is little evidence that this is associated with stress. In a review paper, Art Stone,
in collaboration with Clemens Kirschbaum, Joe Schwartz and Sheldon Cohen, shows that a
proportion of profiles in a normal, healthy population is flat. They find that 51% show
typical cycles, 17% show flat cycles and 34% show inconsistent cycles. Individual
differences are large, and day-to-day stability is only modest. Curves vary with time and
with place and mode of awakening. What the profile means is not clear.

David Spiegel has shown that metastatic breast cancer patients with relatively flat
rhythms showed earlier mortality. These "flat" rhythms were "high"
flats as compared to the low "flats" observed in non-patient populations. Among
Spiegel's patients, split at the median cortisol slope, 77% of those with flat rhythms
died (average survival 2.9 years), while only 52% of those with steep rhythms died (average
survival 4.1 years). It is important to note that those with flatter diurnal cortisol
profiles experienced more sleep disturbance and had diminished natural killer cell
numbers. However, the flat rhythm pattern was not associated with questionnaire measures of
adjustment, and mean diurnal cortisol in this group was negatively associated with
reported positive interpersonal connections (e.g., appraisal, belonging, tangible support
and cohesive, expressive family environment). The picture of the environmental space these
patients inhabited is complex. These flat rhythms may be a marker of disease progression
as opposed to a cause of disease progression. The relationship of these cortisol patterns
to the psychotherapeutic intervention of group support that is the primary focus of
Spiegel's study is as yet unclear.
Stone et al., in their review article, suggest that cycle type is associated with
subsequent self-reported URI rates (i.e., flat cyclers had lower upper respiratory
infection rates). But this may be confounded by the difficulty of interpreting what
"flat cyclers" means (i.e., are "normal" flat cyclers functionally
different from "flat cyclers" who are flat cycling due to a systemic response to
chronic stress, as in "burn-out" or PTSD?). If so, how do we differentiate the
healthiness of flatness from the unhealthiness of flatness?
Wake-up challenge: Across a large group of "normal" subjects, 50% show
a morning rise in cort with a peak at 30-45 min., 8/10 are responders to an awakening
challenge (+2.5 nmol/l or more), and 75% show a consistent response over 2 days. Lack of
a morning rise may not be healthy (e.g., morning rise activates appetite and cognitive
functioning), so a lack of this rise may be indicative of dysfunction. Although some
researchers construe the magnitude of the rise itself as pathological, this may not be a
wise interpretation. Clemens Kirschbaum emphasizes that the "crispness" of the
response is of central importance (i.e., an abrupt response to the awakening challenge
followed by a quick and elegant decline that sets the stage for the diurnal movement
towards a healthy nadir and period of nocturnal quiescence).
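As a minimal sketch of the responder criterion mentioned above, the following Python snippet flags a participant as a responder when the highest post-awakening sample exceeds the waking sample by at least 2.5 nmol/l. The function names, the choice of comparing against the maximum of the post-awakening samples, and the example values are illustrative assumptions, not part of any protocol discussed at the meeting.

```python
# Threshold taken from the text: a rise of +2.5 nmol/l or more.
RESPONDER_THRESHOLD_NMOL_L = 2.5


def awakening_rise(waking_nmol_l, post_waking_nmol_l):
    """Return the largest rise above the waking sample (nmol/l).

    Assumption: the rise is measured from the waking sample to the
    highest of the samples taken in the 30-45 minutes afterwards.
    """
    if not post_waking_nmol_l:
        raise ValueError("at least one post-awakening sample is required")
    return max(post_waking_nmol_l) - waking_nmol_l


def is_responder(waking_nmol_l, post_waking_nmol_l,
                 threshold=RESPONDER_THRESHOLD_NMOL_L):
    """True if the awakening rise meets or exceeds the threshold."""
    return awakening_rise(waking_nmol_l, post_waking_nmol_l) >= threshold


# Hypothetical values for one participant (nmol/l).
rise = awakening_rise(10.0, [14.0, 13.0])
responder = is_responder(10.0, [14.0, 13.0])
```

This captures only the magnitude of the rise. The "crispness" of the response that Kirschbaum emphasizes would also require samples describing the subsequent decline.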
Lunch challenge: It is agreed that there is universally a cortisol response to
the presentation and ingestion of food at noon. The magnitude of the cortisol response to
lunch challenge is in part dependent upon the composition of the lunch meal (e.g.,
carbohydrates elevate cort levels) which complicates the lunch challenge as a stressor
within a naturalistic (uncontrolled) study.
Dexamethasone test: It is generally thought that as the level of reported stress goes up,
the degree of suppression in the dex test is reduced.

2) What evidence is there that the measures listed above are
associated with psychological and social characteristics? With health characteristics?
Total cortisol concentration over the day (Area-under-the-curve): Sheldon Cohen
finds that AUC is related to almost all psychosocial measures (elevated
"anxiety", "hostility", "not calm"). Its relationship to
health characteristics is less clear.
Rhythm dysfunction: This is an area of significant controversy. Does an
unexpected cortisol profile mean anything in terms of reactivity, in terms of health? May
it be the total output of cortisol that is related to health outcomes, rather than the
cortisol profile? What does it mean to be "burned-out"?
Eve Van Cauter reported that the analysis of multiple samples shows that the nadir
rather than the peak of diurnal cortisol is associated with disease progression. Related
to this, sleep deprivation is associated with elevations in evening cort. Sleep debt
flattens the slope and the area-under-the-curve is enlarged. Sleep debt produces a
pronounced response to breakfast challenge and big changes in glucose tolerance, and
appears to have profound and wide-ranging effects on cortisol functioning within the
individual. In the breast cancer patients, flatter diurnal cortisol profiles were
associated with diminished natural killer cell numbers.
Wake-up challenge: Clemens Kirschbaum notes that the response to morning
challenge may be tied to the state characteristics of the subject, and that many factors
for sorting subjects need to be monitored. Chronically stressed individuals may show a
quite different challenge response than do "normals".
The Perceived Stress Scale shows that high stress is associated with a larger response
to awakening, and that the number of stressful life events is associated with a higher
rise to awakening. But, data are also available that show that flat cycles are associated
with shift work and other scenarios that affect the quality and quantity of participant
sleep.
Lunch challenge: There is some evidence that the lunch challenge is associated
with chronic stress state. However, in the course of the meeting the lunch challenge
received the least attention as a useful salivary cortisol measurement tool, particularly
in a field (uncontrolled) study.
Dexamethasone test: Sheldon Cohen has demonstrated that individuals reporting
the lowest perceived stress and who demonstrate stable/normal cortisol curves show the
most suppression to a dexamethasone challenge. Those reporting mid-range stress and
showing stable flat curves show somewhat less suppression, and those showing high stress
and an unstable profile show the least suppression.

3) How many samples need to be collected to get a reasonable
assessment of each of these measures? When (what times) should they be collected? What
would be the absolute minimum measurement required for large epidemiological studies where
a greater number of participants can compensate for less precise measurement?
The agreement on this question was rather good. Generally, more measurements per
day, more days per measurement segment, and more distributed episodes of
measurement are better. Art Stone's ecological momentary assessment, with its
attention to both physiological reactivity and environmental precipitants, was very well
received; however, its feasibility with a large and SES-diverse population was questioned.
Analyses done by Cohen's lab and by Joe Schwartz agree that the minimal number of
samples needed for a one-day cortisol measurement protocol is 4 or 5. Researchers agree
that, to get an accurate area-under-the-curve for a day, measurements at 1, 4, 9, and
11 hours after wakening provide good coverage. Single day assessments are very weak
approaches to this problem since measures are affected by many day-to-day variations, and
this is especially difficult when the shape of the rhythm is of interest, since this seems
rather sensitive to the influence of stress.
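To make the area-under-the-curve measure concrete, here is a minimal sketch that applies the trapezoidal rule to samples taken at 1, 4, 9, and 11 hours after wakening. The trapezoidal rule is a common choice but is an assumption here, since the text does not say how the AUC is computed; the function name and cortisol values are hypothetical.

```python
def auc_trapezoid(hours_after_waking, cortisol_nmol_l):
    """Area under the cortisol curve by the trapezoidal rule.

    hours_after_waking: strictly increasing sample times (hours).
    cortisol_nmol_l: cortisol value at each time (nmol/l).
    Returns the area in nmol/l x hours.
    """
    if len(hours_after_waking) != len(cortisol_nmol_l):
        raise ValueError("times and values must have the same length")
    if len(hours_after_waking) < 2:
        raise ValueError("at least two samples are required")

    total = 0.0
    for i in range(1, len(hours_after_waking)):
        width = hours_after_waking[i] - hours_after_waking[i - 1]
        if width <= 0:
            raise ValueError("sample times must be strictly increasing")
        mean_height = (cortisol_nmol_l[i] + cortisol_nmol_l[i - 1]) / 2.0
        total += width * mean_height
    return total


# Sampling times from the text; the cortisol values are made up.
times = [1, 4, 9, 11]
values = [20.0, 10.0, 6.0, 4.0]
auc = auc_trapezoid(times, values)  # 45 + 40 + 10 = 95.0
```

Because the intervals between samples are unequal, each segment is weighted by its own width; simply averaging the four values would over-weight the closely spaced evening samples.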
Other researchers, possibly with more interest in the rhythm profile, suggested that
the morning rise and the evening corts were most important. They would recommend that five
samples be obtained: 1) immediately on wakening and before getting out of bed; 2) 45 min. after
awakening; 3) 4-6 pm; 4) 6-9 pm; and 5) 9 pm-bedtime. The actual time of the later three samples would
be randomly determined (e.g., using Palm Pilot prompts) so that across the sample of
participants data would be obtained to cover the full interval. These parameters provide:
the morning awakening increase, the afternoon-evening slope, the evening nadir and the
area-under-the-curve.
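The random timing of the three later samples could be generated along the following lines. This is only a sketch: it assumes a uniform draw within each window, clock times expressed as decimal hours, and a known bedtime later than 9 pm. None of these details, nor the function name, comes from the meeting.

```python
import random


def draw_evening_prompts(bedtime_hour, rng):
    """Draw one prompt time in each of the three later windows.

    Windows from the text: 4-6 pm, 6-9 pm, and 9 pm to bedtime.
    Times are decimal hours on a 24-hour clock (e.g., 16.5 = 4:30 pm).
    Assumption: times are drawn uniformly within each window.
    """
    if bedtime_hour <= 21.0:
        raise ValueError("this sketch assumes bedtime is after 9 pm")
    windows = [(16.0, 18.0), (18.0, 21.0), (21.0, bedtime_hour)]
    return [rng.uniform(start, end) for start, end in windows]


# A seeded generator makes one participant's schedule reproducible.
rng = random.Random(12345)
prompts = draw_evening_prompts(bedtime_hour=23.0, rng=rng)
```

Across many participants, independent draws of this kind spread the samples over the full afternoon-to-bedtime interval, which is the stated purpose of randomizing.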
Cushing's Disease and Unipolar Depression patient cortisol profiles are
differentiated by the afternoon readings. If looking for abnormal cortisol profiles as a
pathway to the metabolic syndrome, evening corts are probably the most important.
Joe Schwartz calculated that data should be collected over 3-4 days to get a reliable
assessment of a "trait" daily concentration (area-under-the-curve), and for 6 or
more days to get a reliable assessment of a "trait" rhythm. The advantage of
using multiple days is that it helps to control the unreliability of one day's data,
which can underestimate the cortisol relationship to outcomes. For example, if nine
samples per day are collected, collection over 4 days will give an estimate of
area-under-the-curve with .80 reliability, and 8 days will give an estimate of the slope
with .80 reliability. The effect of under-sampling on estimates of the slope and the
possibility that under-sampling may lead to underestimation of the slope is a topic that
needs further research.
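One standard way to reason about how reliability grows with the number of days is the Spearman-Brown formula, under which the reliability of a k-day average is k r / (1 + (k - 1) r) for single-day reliability r. Whether Schwartz's calculation used this formula is not stated, so the sketch below is an assumption; it simply shows what single-day reliabilities would be implied by the figures quoted above if the formula applied.

```python
def spearman_brown(single_day_reliability, n_days):
    """Reliability of the mean of n_days parallel daily measurements."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    r = single_day_reliability
    return n_days * r / (1 + (n_days - 1) * r)


def implied_single_day_reliability(target_reliability, n_days):
    """Invert Spearman-Brown: single-day r that yields the target."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    big_r = target_reliability
    return big_r / (n_days - (n_days - 1) * big_r)


# Figures from the text: .80 reliability after 4 days (AUC)
# and after 8 days (slope), with nine samples per day.
auc_single_day = implied_single_day_reliability(0.80, 4)    # about 0.50
slope_single_day = implied_single_day_reliability(0.80, 8)  # about 0.33
```

Read this way, a single day's slope would be noticeably less reliable than a single day's area-under-the-curve, which is consistent with the recommendation to collect more days when the rhythm is of interest.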
The MacArthur Network has settled on a one-day, six-sample protocol for a large
epidemiologic study it is undertaking. The timing for the samples is: 1) awakening, 2) 45
minutes after wakening, 3) 2.5 hours after wakening, 4) 8 hours after wakening, 5) 12
hours after wakening, and 6) bedtime.
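The six-sample schedule can be derived from a participant's wake-up time and bedtime, as in the sketch below. The offsets come from the protocol described above; the function name and the example date and times are hypothetical, and the sketch does not handle a bedtime that falls before the 12-hour sample.

```python
from datetime import datetime, timedelta

# Offsets from awakening for samples 1-5, as listed in the text.
WAKE_LINKED_OFFSETS = [
    timedelta(0),                      # 1) awakening
    timedelta(minutes=45),             # 2) 45 minutes after wakening
    timedelta(hours=2, minutes=30),    # 3) 2.5 hours after wakening
    timedelta(hours=8),                # 4) 8 hours after wakening
    timedelta(hours=12),               # 5) 12 hours after wakening
]


def sampling_schedule(wake_time, bedtime):
    """Return the six planned sample times for one collection day.

    Assumption: bedtime is later than wake_time + 12 hours; no
    reordering is attempted if it is not.
    """
    times = [wake_time + offset for offset in WAKE_LINKED_OFFSETS]
    times.append(bedtime)              # 6) bedtime
    return times


# Hypothetical participant waking at 7:00 and going to bed at 23:00.
schedule = sampling_schedule(datetime(2000, 1, 1, 7, 0),
                             datetime(2000, 1, 1, 23, 0))
```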

4) What are the most appropriate techniques for analyzing
these data? What kinds of measures need to be considered as possible control factors
(e.g., menstrual cycle, age, gender, food intake)?
The most appropriate analysis techniques, as with measurement itself, are dependent
upon the question being addressed (e.g., cortisol relationship to stress versus to disease
outcome). Area-under-the-curve, or total cortisol concentration, appears to be the most
universally accepted technique, whereas rhythm analyses (i.e., slope) are more
controversial. Rhythm and magnitude analyses may benefit from hierarchical linear
modeling.
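For the rhythm (slope) analysis, one simple per-day summary is the ordinary least-squares slope of cortisol on hours since waking, sketched below. This is an illustrative assumption rather than a method endorsed at the meeting: analysts differ on whether to include the awakening samples and whether to log-transform cortisol first, and a hierarchical linear model would estimate such slopes jointly across days and participants instead of one day at a time. The function name and values are hypothetical.

```python
def diurnal_slope(hours_since_waking, cortisol_nmol_l):
    """Least-squares slope of cortisol on time (nmol/l per hour).

    A more negative value means a steeper daytime decline; a value
    near zero corresponds to a "flat" profile.
    """
    n = len(hours_since_waking)
    if n != len(cortisol_nmol_l):
        raise ValueError("times and values must have the same length")
    if n < 2:
        raise ValueError("at least two samples are required")

    mean_t = sum(hours_since_waking) / n
    mean_c = sum(cortisol_nmol_l) / n
    sxx = sum((t - mean_t) ** 2 for t in hours_since_waking)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    sxy = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(hours_since_waking, cortisol_nmol_l))
    return sxy / sxx


# Made-up values declining by 1.25 nmol/l per hour.
slope = diurnal_slope([0, 4, 8, 12], [20.0, 15.0, 10.0, 5.0])  # -1.25
```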
As with the question above, the agreement on possible control factors was good. The
factors thought important to consider in cortisol measurement are of six types:
- Stable characteristics of individuals: age, gender
- State characteristics: menstrual cycle stage, contraceptive and other medication use
- Disease/"chronic" condition characteristics: liver disease, PTSD, malnutrition
or fasting, "voluntary" flattening of cort as result of lifestyle (e.g., jet lag
or shift work)
- Dynamic characteristics: food intake (e.g., carbohydrates increase cortisol), sleep
status (e.g., assess sleep quality and quantity on night prior to cortisol measurement),
exercise (e.g., level and timing), wake-up time
- Psychological characteristics: positive and negative affect, passivity of coping
- Other factors that affect cortisol patterns (e.g., smoking, alcohol use): whether it is
advisable to screen out subjects for these, or to probe for them, is debatable, and possibly
relates to the issue under study (i.e., relationship to psychosocial factors versus to disease).
In the MacArthur salivary cortisol protocol a log has been developed for the participant to
fill out during the day of cortisol collection. This log is in the piloting stage. It
elicits information at the time of each sample about most recent food ingestion and
psychological state. The psychological probes: 1) How much did you feel happy, excited, or
content when you woke up? (Not at all, Somewhat, Very much, Extremely). 2) How much did
you feel worried, anxious, or fearful when you woke up? (Not at all, Somewhat, Very much,
Extremely). In addition, at the conclusion of the sampling, the participant is asked to
report in the logbook about a number of control factors. These include cigarette smoking,
alcohol consumption, drugs or medicines taken, vigorous exercise, time of usual awakening,
and the most stressful event of the day (time, duration, degree of stress: not at all
stressed, somewhat, moderately, very stressed, the most stressed I've ever felt). The
participant is also asked about the typicality of the day in terms of how busy, pressured
or stressed the participant felt during the sampling day.

5) Can we agree on methods of maximizing adherence to
collection procedures, particularly timing of samples? This is a particularly difficult
problem with "wake up" samples. Because the most rapid changes in cortisol
levels happen in the hour or so after waking, the amount of time that elapses between
waking and the first sample can have a substantial impact on the shape of the day-time
rhythm.
General points about adherence: 1) Subjects need to be entrained into believing that
what they are doing is crucial to science and the medical care of a large majority of
people. 2) Subjects need to believe that if they cheat on their cortisol measurements
(e.g., measure late or fake measurements) they will be discovered. 3) Multiple, timed
(Palm Pilot or watch prompted) salivary measurements are the ideal. 4) Ecological
momentary assessments are of interest when multiple interactive data points are possible
(i.e., biologic measurements coupled with environmental events).
Various technologies were suggested to increase adherence. Those that seem most
universally applauded have some method for either keying into a Palm Pilot or watch, to
record a code or to get a special code from the instrument which is written on the
cortisol sample. Clemens Kirschbaum has developed a system that records when the salivette
is removed for use; "smartcaps" is another version of this system. Even with
these systems data will be lost; in the case of watches, it was noted that 15-17% of data
may be lost.
The MacArthur epidemiologic protocol uses watches which vibrate at the programmed
sampling times. At the time of the vibration a code appears on the watch which the subject
is required to write in the logbook and on the salivette container. The logbook serves
both as a way to gather data (as noted in Question 4), and also as an adherence booster
(e.g., the participant is asked to record the time of the next sample at the bottom of
each page of the logbook, potentially increasing adherence).

6) On a micro level: What do we mean by awake? Opening eyes
first time? Deciding to be awake? Getting out of bed? And other procedural issues that are
so crucial to valid and reliable measurement?
There was good consensus that, if one is interested in the awakening rise, it
is crucial to ensure compliance with a set protocol. For most researchers,
"awake" seemed to mean the conjunction of opening eyes and being alert enough to
insert a salivette into the mouth. Some researchers at the meeting held that it is
important that the first sample taken be done prior to any major physical movement such as
getting out of bed. Additionally, it is important that the individual not do normal
morning activities such as brushing teeth, eating breakfast, or exercising until any 30-,
45-, or 60-minute samples are completed (since all these activities would affect cortisol
level sampling). Essentially, the ideal for assessing the "awakening challenge"
is an in-bed awakening sample followed by a 45-minute sample, with no teeth brushing, food
ingestion or vigorous movement intervening.
When cortisol samples taken in the home setting (with natural wake-up regimen, either
with spontaneous awakening or routine alarm clock awakening) are compared to those
obtained in a controlled setting where timing of sampling is standardized (e.g., wake-up
in Sheldon Cohen's hotel), the natural routines of the subjects can play a role. For
example, late risers are well down on their cortisol cycle when awakened in the hotel, thus
showing a later peak. However, early risers and late risers look no different when
measurement is done at home, presumably due to each following a "normal"
awakening routine. This underscores the importance of establishing the participant's
typical wake-up time, and the utility of using participant diurnal cycle sampling (e.g.,
sampling times linked to wake-up time rather than sampling at clock-linked points in the
day).

7) What are the most appropriate measures of cortisol
(HPA) response if we can get people in the laboratory?
This is an area where the salivary "free" versus the plasma
"total" contrast may be most active. Although total cortisol in overnight urines
may be included in large epidemiologic studies (i.e., as is the case in the current
MacArthur project), it is more typical that such more invasive measurements would be
limited to more intensive laboratory studies. The informativeness of obtaining such total
cortisol measurements was not questioned at the meeting, except on practicality grounds.
Further, in large epidemiological studies or field studies, the possibility of taking blood
samples is quite small, whereas within the lab setting this is quite possible. Again, there is a
research need to better understand the relationship between the measurement of salivary
cortisol, blood plasma cortisol levels, and total urine-based cortisol measurements.
Various challenge studies (e.g., the Trier Social Stress Test) are suitable for
laboratory use. Challenge tests are a very important means to assess reactivity and
endogenous activity of the HPA axis, and salivary cortisol represents one of the easiest
and most informative endpoints of HPA activity. Basal measurements of cortisol at wakening
and in the evening provide an estimate of the diurnal rhythm, and responses to morning
wakening and the TSST provide estimates of reactivity that reflect ongoing life stress as
well as the intrinsic potential of the HPA axis to respond.
The MacArthur Network is in the process of developing a reactivity protocol to be used
in conjunction with the larger epidemiologic study that will soon be underway. Questions
the Network is considering in developing a challenge study include:
1. Who should be studied (what sampling strategy)?
- SES (what measure to use, and what is reasonable stratification)
- Psychosocial characteristics (current characterization or include historical factors;
include social integration or relationship quality); possible attempt to fill 2 x 2
(Low/High SES and Low Psychosocial resources/reserves//High Psychosocial
resources/reserves)
- Childhood abuse history as parameter
- Other characteristics (e.g., gender, ethnicity, obesity, depression)
- Exclusion criteria (e.g., certain meds, health issues such as uncontrolled hypertension,
depression and PTSD, liver disease, etc.)
2. What should the challenge be?
- Issue of SES biases (e.g., meaning/impact of challenge given SES-based history; impact
of education and attendant life experiences on challenge response)
- Possible challenges
  - video game (Atari Breakout; distribution of novelty?)
  - mirror image star-tracing
  - cold pressor (found to be highly aversive by many participants)
  - cognitive tasks (likely to be highly influenced by educational differences)
  - TSST (public speaking experiences biased along educational lines)
  - Driving challenge (potential issue of non-drivers and lack of familiarity with
    "task")
  - Interpersonal challenge (discussion with "provocative confederate", can it be
    made "real" enough to act as effective challenge)
3. What outcome (reactivity) measurements should be obtained?
- Saliva
  - Cortisol repeated sampling? (see above)
  - Oxytocin
  - Other parameters?
- Cardiovascular
  - BP, pulse
  - Heart rate variability
  - Other (e.g., impedance cardiography)
- Blood
  - Baseline values useful even if can't get "reactivity" biologic measures?
  - IGF-1
  - Oxytocin
  - DHEA (S)
  - Testosterone
  - Multiple samples
  - ACTH
  - Growth Hormone
  - Immune assessment (baseline or repeated)
- Urine (for period of challenge and recovery)
  - Integrated catecholamines (NE, EPI, other parameters)
4. What are the sample size requirements?
- 20 per cell a likely requirement
- sample size may be affected by the scope and structure of larger epidemiologic study
from which challenge population is drawn (i.e., practical issue of willingness to expose a
longitudinal sample to more intensive experimental protocols)