Patient Health Questionnaire(s) (PHQ-2, PHQ-4, PHQ-8)
By Maggie Bowman
Psychology Research Assistant
✅ Coming soon on Bravely Connect
The PHQ-2 and PHQ-8 are self-report measures used to assess depression symptoms. The PHQ-4 is also a self-report measure, but it assesses both depression and anxiety symptoms. The PHQ-2, PHQ-4, and PHQ-8 consist of two, four, and eight questions respectively, all of which ask clients how often in the past two weeks they were bothered by various depression symptoms (and, in the PHQ-4’s case, anxiety symptoms). Total scores are calculated by summing the item scores. Higher scores indicate higher depression (and for the PHQ-4, anxiety) symptom severity. Limitations of the PHQ scales include their inability to provide diagnoses, the small number of symptoms assessed, and underreporting of depression symptoms in some cultures. The PHQ-2 and PHQ-4 may be best suited for use as basic screening tools, while the PHQ-8 may be more appropriate for tracking depression symptoms over the course of treatment.
📏 Lengths: 2 questions (PHQ-2), 4 questions (PHQ-4), and 8 questions (PHQ-8)
📋 Administration: Self-administered
🎯 Uses: Assessing presence and severity of depression symptoms (and, in the PHQ-4’s case, anxiety symptoms)
⚠️ Important Caveats: Good as quick screeners, less good for actual diagnosis
✅ Available in Bravely Connect? Yes
🌏 Culturally Applicable? Lots of validated translations, but some cultural stigmas surrounding depression may lead to respondents underreporting symptoms
💬 Translations? Over 50 translations available via the Pfizer PHQ Screener website
The PHQ-2, PHQ-4, PHQ-8 Question types and lengths
PHQ-2: The client is presented with two questions asking how often in the past two weeks they were bothered by depressed mood and anhedonia (loss of pleasure or interest). Both questions use a 4-point Likert-type scale ranging from 0 (not at all) to 3 (nearly every day).
PHQ-4: The client is presented with four questions asking how often in the past two weeks they were bothered by two depression symptoms and two anxiety symptoms. All four questions use a 4-point Likert-type scale ranging from 0 (not at all) to 3 (nearly every day).
PHQ-8: The client is presented with eight questions asking how often in the past two weeks they were bothered by different symptoms of depression. All eight questions use a 4-point Likert-type scale ranging from 0 (not at all) to 3 (nearly every day).
For the full lists of questions, check out the measures on Bravely Connect, or follow the links to the original unautomated versions:
What do the PHQ-2, PHQ-4, and PHQ-8 measure?
The PHQ-2 consists of two items that assess symptoms of depression. One question measures the respondent’s feelings of anhedonia, or loss of pleasure or interest. The other question measures depressed mood—feeling down, depressed, or hopeless. Higher scores indicate higher feelings of depression.
The PHQ-4 combines the two-item depression assessment of the PHQ-2 with the two anxiety assessing items from the Generalized Anxiety Disorder 2-item (GAD-2) scale. As with the PHQ-2, the PHQ-4 contains two items assessing a respondent’s feelings of anhedonia and depressed mood and two items assessing feelings of nervousness and anxiety, and a sense of not being able to control feelings of worry. Higher scores indicate higher feelings of depression and anxiety.
The PHQ-8 consists of eight items that assess most symptoms of depression as listed in the DSM-IV. These items include those in the PHQ-2 as well as items assessing sleep, energy levels, appetite, low self-esteem, concentration, and altered physical behaviors. Higher scores indicate higher feelings of depression.
The PHQ-2 and PHQ-4 have been found to be sensitive to change, making these measures suitable for tracking treatment progress in individuals struggling with symptoms of depression; the PHQ-4 may also be appropriate for tracking changes in those being treated for anxiety. Although there is a lack of research on the PHQ-8’s sensitivity to change, the PHQ-8 is highly correlated with the PHQ-9*, which has research backing its ability to track changes in depressive symptoms.
(*Please note that the PHQ-9 contains an item assessing suicidality, which makes it unsuitable for asynchronous symptom monitoring. Research indicates that the PHQ-8, which contains the same items as the PHQ-9 excluding the item assessing suicidality, is still operationally very similar to the PHQ-9.)
PHQ-2, PHQ-4, PHQ-8 Factor structures
The PHQ-2 and PHQ-8 both set out to unidimensionally assess symptoms of depression. Research generally supports the single-factor structure of the PHQ-8, but some studies have found a potential two-factor structure which distinguishes between bodily/physical “somatic” symptoms of depression and the more emotional “non-somatic” or “affective” depression symptoms.
The PHQ-4 assesses symptoms of depression and symptoms of anxiety. In studies conducting factor analysis, the questions addressing each condition map onto their own respective factors; the first two questions, originally taken from the GAD-2, cluster together for an “anxiety” factor, while the second two, taken from the PHQ-2, cluster for a “depression” factor.
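To make the two-factor split concrete, here is a minimal Python sketch that separates PHQ-4 responses into anxiety and depression subscores, assuming the item order described above (items 1 and 2 from the GAD-2, items 3 and 4 from the PHQ-2). The function name `phq4_subscales` and the returned dictionary keys are hypothetical, chosen for illustration only; this is not how Bravely Connect or any official PHQ tool implements it.

```python
def phq4_subscales(responses):
    """Split four PHQ-4 item scores (each 0-3) into subscores.

    Assumes items 1-2 are the anxiety (GAD-2) items and
    items 3-4 are the depression (PHQ-2) items.
    """
    if len(responses) != 4:
        raise ValueError("PHQ-4 expects exactly four item scores")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item score must be 0, 1, 2, or 3")

    anxiety = responses[0] + responses[1]      # items 1-2
    depression = responses[2] + responses[3]   # items 3-4
    return {
        "anxiety": anxiety,
        "depression": depression,
        "total": anxiety + depression,
    }


# Example: "nearly every day" and "more than half the days" on the
# anxiety items, lower responses on the depression items.
print(phq4_subscales([3, 2, 1, 0]))
```

Each subscore ranges from 0 to 6 and the total from 0 to 12, which follows directly from four items scored 0 to 3.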
The history and theory behind the PHQ-2, PHQ-4, and PHQ-8
Depression is one of the most common psychiatric ailments seen in primary care settings along with anxiety, somatoform, substance use, and eating disorders (Spitzer et al., 1994). In 1994, Dr. Robert L. Spitzer, Dr. Janet B.W. Williams, and colleagues at the New York State Psychiatric Institute created the Primary Care Evaluation of Mental Disorders (PRIME-MD), a physician-administered instrument which assessed these prevalent mental health concerns. Items were created based on diagnostic criteria from the DSM-III-R. The Patient Health Questionnaire (PHQ), a self-report version of the PRIME-MD, was later developed in 1999 and validated in a study of over 3000 participants in American primary healthcare settings (Spitzer et al., 1999).
The PHQ-9 consists of the nine questions of the PHQ’s scale assessing depression symptoms. The PHQ-8 contains the same items as the PHQ-9 with one item dropped to make it more appropriate for research settings while still retaining sensitivity for measuring depression symptom severity. For busy healthcare settings intending to preliminarily screen for depression symptoms without necessarily assessing severity, the two-item PHQ-2 debuted in 2003. Since depression often occurs alongside anxiety disorders, the PHQ-4 emerged in 2009 as a means of screening for both depression and anxiety in a single measure.
PHQ-2, PHQ-4, and PHQ-8 Scoring Interpretations
The PHQ-2, PHQ-4, and PHQ-8 all have similar scoring methods, which involve the summation of item scores. Because they are continuous measures, higher sums indicate overall higher depression (and in the PHQ-4’s case, anxiety) symptom severity. The 2003 pilot study of the PHQ-2 found that a score of three or higher had an 83% sensitivity for patients with major depressive disorder (Kroenke et al., 2003). For the PHQ-8, a cutoff point of 10 had a 100% sensitivity for those with major depressive disorder (Kroenke, Strine, et al., 2009). Higher scores on the PHQ-4 were found to be correlated with increased functional impairment (Kroenke, Spitzer, et al., 2009).
Please note that the PHQ-2, PHQ-4, and PHQ-8 are all brief measures that are meant to screen for symptoms and assess symptom severity, not diagnose. They should be used in conjunction with other clinical assessments like clinical interviews and a thorough examination of an individual's symptoms and history.
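The scoring described in this section can be sketched in a few lines of Python. This is an illustrative sketch only: the names `ITEM_COUNTS`, `SCREENING_CUTOFFS`, and `score_phq` are hypothetical, and the only cutoffs included are the two mentioned in this article (3 for the PHQ-2 and 10 for the PHQ-8). No PHQ-4 cutoff is given here, so the sketch returns `None` for that flag. A score at or above a cutoff is a prompt for further assessment, not a diagnosis.

```python
# Number of items per measure, as described in this article.
ITEM_COUNTS = {"PHQ-2": 2, "PHQ-4": 4, "PHQ-8": 8}

# Screening cutoffs reported in the studies cited in this article.
# No PHQ-4 cutoff is stated here, so none is listed (assumption: leave it out).
SCREENING_CUTOFFS = {"PHQ-2": 3, "PHQ-8": 10}


def score_phq(measure, responses):
    """Return (total, at_or_above_cutoff) for one completed questionnaire.

    `responses` is a list of item scores, each 0 (not at all) to
    3 (nearly every day). The second value is None when this sketch
    has no cutoff for the measure.
    """
    if measure not in ITEM_COUNTS:
        raise ValueError(f"unknown measure: {measure}")
    if len(responses) != ITEM_COUNTS[measure]:
        raise ValueError(f"{measure} expects {ITEM_COUNTS[measure]} item scores")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item score must be 0, 1, 2, or 3")

    total = sum(responses)  # total score is the sum of the item scores
    cutoff = SCREENING_CUTOFFS.get(measure)
    flag = None if cutoff is None else total >= cutoff
    return total, flag


print(score_phq("PHQ-2", [2, 1]))
print(score_phq("PHQ-8", [1, 1, 1, 1, 1, 1, 1, 1]))
```

Keeping the cutoffs in a separate table makes it easy to adjust or remove them for a given setting without touching the summation itself.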
Who developed the measures, licensing and how to obtain the PHQ-2, PHQ-4, and PHQ-8
All items from the various PHQ scales are based on items from the PRIME-MD (Primary Care Evaluation of Mental Disorders) which was developed by Dr. Robert L. Spitzer, Dr. Janet B.W. Williams, and colleagues at the New York State Psychiatric Institute in 1994. The PHQ-2, PHQ-4, and PHQ-8 were further refined by Dr. Spitzer and Dr. Kurt Kroenke.
All versions of the PHQ are available without any licensing requirements. No prior permission is required to reproduce, translate, display or distribute any of the PHQ measures.
The PHQ-2, PHQ-4, and PHQ-8 are available on Bravely Connect as part of our automated measures.
See the PHQ-2, PHQ-4, and PHQ-8 on Bravely Connect →
There are currently over 50 translations and localisations of the PHQ items. If you find a version you’d like adding to Bravely Connect then just let us know here.
Limitations, biases and when you should/shouldn’t use the PHQ-2, PHQ-4, or PHQ-8
The PHQ-2, PHQ-4, and PHQ-8 are all brief measures that are meant to screen for symptoms and assess symptom severity; they alone are not to be used for diagnoses. The PHQ-2 in particular may not be appropriate for tracking symptom severity as it only assesses two symptoms of depression and is intended to be used to quickly identify people who may benefit from more detailed screenings. Meanwhile, the PHQ-8 assesses a wider scope of depression symptoms, which may make it more suitable for tracking client progress over the course of treatment.
Studies have shown that the PHQ-8 and PHQ-2 have good reliability and validity as assessments of depression symptoms in several cultures, including Western, Asian, and Middle Eastern cultures. However, there have been concerns about the cultural and linguistic specificity of the PHQ-8 and PHQ-2, as depression may be expressed differently in different cultures. Certain cultural or linguistic nuances may be lost in translation, leading to difficulties in accurately interpreting the results.
A person’s cultural background may also contribute to underreporting of depression symptoms if expressing such symptoms is stigmatized, as seen in a 2013 study where the PHQ-9’s sensitivity was lower for the elderly, especially elderly people from Asian cultures, who were more hesitant to report symptoms (Inagaki et al., 2013). Comprehensive clinical assessments that include cultural and linguistic factors should be used in conjunction with the PHQ-8 and PHQ-2 to ensure accurate and culturally sensitive measurement of depression symptoms.
Research on the cross-cultural validity of the PHQ-4 has been limited, but some studies have been conducted in various regions including North America, Europe, Asia, and Latin America. Overall, the results suggest that the PHQ-4 has acceptable cross-cultural validity, but there are some cultural differences that can affect the interpretation and use of the measure. Again, comprehensive clinical assessments taking clients’ cultural backgrounds into consideration are likely to yield the best results.
As always, if you’ve found a measure you would like adding to Bravely Connect as an automated measure, just drop us a measure request here.