PHQ-9 and PHQ-2 both screen for depression. Both are validated, widely used, and free. So which should you use?
The answer depends on your clinical context. These aren't competing tools. They're complementary instruments designed for different purposes.
The instruments
The PHQ-2 consists of two questions, the first two items of the PHQ-9: loss of interest or pleasure in doing things, and feeling down, depressed, or hopeless. Each item is rated 0-3, for a total score of 0-6. Its purpose is quick identification of patients who may have depression and warrant further assessment.
The PHQ-9 covers all nine DSM-5 symptom criteria for major depressive disorder: anhedonia, depressed mood, sleep disturbance, fatigue, appetite changes, worthlessness/guilt, concentration difficulties, psychomotor changes, and suicidal ideation. Each item is rated 0-3, for a total score of 0-27, and an additional unscored question rates functional impairment. It works as a screening tool, severity measure, and treatment monitoring instrument.
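As a rough illustration of the scoring described above, the sketch below sums item responses and rejects out-of-range input. The function names are hypothetical and not part of any official PHQ tooling; the only facts taken from the instruments are the item counts and the 0-3 rating per item.

```python
def score_items(responses, expected_count):
    """Sum item responses after checking the count and the 0-3 range."""
    if len(responses) != expected_count:
        raise ValueError(f"expected {expected_count} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(responses)


def score_phq2(responses):
    """Total for the two PHQ-2 items (range 0-6)."""
    return score_items(responses, 2)


def score_phq9(responses):
    """Total for the nine PHQ-9 items (range 0-27)."""
    return score_items(responses, 9)


# The PHQ-2 items are the first two PHQ-9 items, so one set of
# nine responses yields both totals.
phq9_answers = [2, 1, 0, 3, 1, 0, 2, 0, 0]
print(score_phq2(phq9_answers[:2]))  # 3
print(score_phq9(phq9_answers))      # 9
```

Because the PHQ-2 is a subset of the PHQ-9, a patient who completes the full questionnaire never needs to answer the first two items twice.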
Psychometric comparison
A 2020 JAMA meta-analysis of 100 studies (44,318 participants) reported the following accuracy estimates:
| Measure | Threshold | Sensitivity | Specificity |
|---|---|---|---|
| PHQ-2 | ≥3 | 76% | 87% |
| PHQ-9 | ≥10 | 88% | 85% |
| PHQ-2 (≥2) → PHQ-9 (≥10) | Tiered | 85% | 89% |
PHQ-2 is designed for high specificity at the ≥3 threshold, but using a ≥2 cutoff followed by PHQ-9 captures more cases. The tiered approach reduced the number of patients needing the full PHQ-9 by 57% while maintaining sensitivity comparable to PHQ-9 alone.
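What this means clinically: at the ≥3 cutoff, PHQ-2 flags relatively few patients who don't have depression (87% specificity) but misses about a quarter of those who do (76% sensitivity). At the lower ≥2 cutoff, it is unlikely to miss depression but will flag more patients who don't have it. PHQ-9 provides better balance. Together, PHQ-2 at ≥2 casts a wide net and PHQ-9 confirms the catch.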
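To see what the sensitivity and specificity figures in the table mean for an individual result, it can help to convert them into predictive values. The sketch below does that with Bayes' rule. The 10% prevalence is purely an assumption for illustration (it is not from the meta-analysis), and predictive values shift considerably as prevalence changes.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumption for illustration only: 10% of screened patients have depression.
ASSUMED_PREVALENCE = 0.10

# (sensitivity, specificity) pairs copied from the table above.
strategies = {
    "PHQ-2 >= 3": (0.76, 0.87),
    "PHQ-9 >= 10": (0.88, 0.85),
    "PHQ-2 >= 2 then PHQ-9 >= 10": (0.85, 0.89),
}

for name, (sens, spec) in strategies.items():
    ppv, npv = predictive_values(sens, spec, ASSUMED_PREVALENCE)
    print(f"{name}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

At low prevalence, even a reasonably specific screen produces many false positives relative to true positives, which is why a positive result calls for follow-up assessment.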
When to use PHQ-2
Universal screening where efficiency matters: primary care wellness visits, ED triage, annual health assessments, large population screening. PHQ-2's 30-second administration makes universal screening practical.
High-volume settings where throughput matters: busy primary care clinics, community health centers, intake processes with many components. Two questions fit where nine might not.
Non-mental health specialty settings where depression screening isn't the primary focus: cardiology (depression is common in heart disease), oncology, chronic disease management, obstetric care. Brief screening fits more easily into specialty workflows.
Resource-limited contexts with limited behavioral health access, constrained follow-up capacity, or staff not trained in mental health assessment. Better to screen briefly than not screen at all.
When to use PHQ-9
Confirming positive PHQ-2 screens is the classic tiered approach. PHQ-9 filters out PHQ-2's false positives, particularly at the lower ≥2 cutoff, while overall sensitivity stays high.
Assessing depression severity. PHQ-9 provides severity levels that guide treatment intensity:
- 0-4: Minimal
- 5-9: Mild
- 10-14: Moderate
- 15-19: Moderately severe
- 20-27: Severe
PHQ-2 doesn't differentiate severity.
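The severity bands above map directly onto a small lookup. This is a minimal sketch with a hypothetical function name, using only the score ranges listed in this article:

```python
# Upper bound of each band paired with its label, in ascending order.
SEVERITY_BANDS = [
    (4, "Minimal"),
    (9, "Mild"),
    (14, "Moderate"),
    (19, "Moderately severe"),
    (27, "Severe"),
]


def phq9_severity(total):
    """Map a PHQ-9 total score (0-27) to its severity label."""
    if not isinstance(total, int) or not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be an integer from 0 to 27")
    for upper_bound, label in SEVERITY_BANDS:
        if total <= upper_bound:
            return label


print(phq9_severity(12))  # Moderate
```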
Treatment monitoring. PHQ-9 is the standard for tracking treatment response because it covers all symptom domains, is sensitive to change, and provides meaningful severity tracking. PHQ-2 is too brief to capture treatment nuance.
Mental health specialty settings where depression is the focus: psychiatry practices, psychotherapy practices, depression treatment programs. You want the full picture, not just a quick screen.
Suicide risk identification. PHQ-9 item 9 specifically asks about suicidal ideation. PHQ-2 doesn't include this. For populations at elevated suicide risk, PHQ-9 provides safety screening that PHQ-2 cannot. (Note: item 9 has limited sensitivity for acute suicide risk and any positive response should trigger clinical evaluation, not substitute for a full safety assessment.)
Quality measurement requirements. Many metrics specify PHQ-9: HEDIS depression measures, MIPS quality measures, value-based care contracts.
The tiered screening approach
The tiered approach works like this: administer PHQ-2 to all patients, score immediately, administer PHQ-9 if score ≥3 (or ≥2 for higher sensitivity), then use PHQ-9 results to guide clinical response.
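The decision flow can be sketched as a single function. The function name, parameter names, and returned strings are hypothetical; the cutoffs are the ones given in this article, with the PHQ-2 cutoff left configurable so a practice can choose ≥3 or ≥2.

```python
def tiered_screen(phq2_total, phq9_total=None, phq2_cutoff=3, phq9_cutoff=10):
    """Return the next step in a PHQ-2 -> PHQ-9 tiered screening workflow.

    phq9_total is None until the PHQ-9 has actually been administered.
    """
    if phq2_total < phq2_cutoff:
        return "negative screen: routine care"
    if phq9_total is None:
        return "positive PHQ-2: administer PHQ-9"
    if phq9_total >= phq9_cutoff:
        return "PHQ-9 at or above cutoff: clinical evaluation"
    return "PHQ-9 below cutoff: monitor"


print(tiered_screen(1))                  # negative screen: routine care
print(tiered_screen(4))                  # positive PHQ-2: administer PHQ-9
print(tiered_screen(4, phq9_total=13))   # PHQ-9 at or above cutoff: clinical evaluation
print(tiered_screen(2, phq2_cutoff=2))   # positive PHQ-2: administer PHQ-9
```

Keeping the cutoff as a parameter makes the sensitivity trade-off an explicit configuration choice rather than a value buried in the workflow.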
This is effective because roughly 75-80% of patients screen negative on PHQ-2, saving time. PHQ-2's sensitivity means few cases are missed, and PHQ-9 filters false positives.
Example implementation: In a primary care practice screening 100 patients, about 75 screen PHQ-2 negative (done). Of the 25 who screen positive, roughly half will have PHQ-9 ≥10, warranting clinical evaluation. The other half are likely false positives to monitor. Result: about 12 patients identified for depression workup, with minimal clinician time spent on the 75 negative screens.
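The arithmetic in this example can be reproduced, and adapted to a different patient volume, with a short calculation. The positivity rates are the rough figures from the example; the timing uses this article's 30 seconds for PHQ-2 and, as an assumption, the 150-second midpoint of its 2-3 minute range for PHQ-9. All names are hypothetical.

```python
PHQ2_SECONDS = 30
PHQ9_SECONDS = 150  # assumed midpoint of the 2-3 minute range


def tiered_workload(n_patients, phq2_positive_rate, phq9_positive_share):
    """Expected counts and questionnaire time for tiered vs. PHQ-9-only screening."""
    phq2_positive = n_patients * phq2_positive_rate
    return {
        "phq2_negative": n_patients - phq2_positive,
        "phq9_administered": phq2_positive,
        "clinical_evaluation": phq2_positive * phq9_positive_share,
        "tiered_minutes": (n_patients * PHQ2_SECONDS + phq2_positive * PHQ9_SECONDS) / 60,
        "phq9_only_minutes": n_patients * PHQ9_SECONDS / 60,
    }


for key, value in tiered_workload(100, 0.25, 0.5).items():
    print(f"{key}: {value}")
```

With these inputs the expected number of clinical evaluations is 12.5, which the example rounds to 12, and total questionnaire time is 112.5 minutes for the tiered approach versus 250 minutes if every patient completed the PHQ-9.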
When to skip PHQ-2 and go straight to PHQ-9
Mental health settings where depression is expected to be common. The efficiency of PHQ-2 offers little advantage here.
Treatment monitoring where you need PHQ-9 anyway.
High-risk populations with expected high positivity rates, since many patients need PHQ-9 regardless.
Settings where suicide screening is needed, since item 9 provides safety information.
Quality measure requirements that specify PHQ-9.
PHQ-9 takes 2-3 minutes vs. 30 seconds for PHQ-2. In some contexts the extra time is acceptable: patients completing electronically before appointments, mental health intake processes, or already-identified depression patients. Time matters more in high-volume brief encounters than in scheduled mental health appointments.
Combining with other measures
PHQ-2 + GAD-2 (or the integrated PHQ-4) screens for both depression and anxiety in under a minute. It is useful for general mental health screening, primary care triage, and population health programs. Positive screens are followed by the full PHQ-9 and/or GAD-7.
PHQ-9 + GAD-7 gives a thorough assessment of depression and anxiety in 4-5 minutes. Standard in many mental health settings.
For perinatal populations, consider the EPDS, which is specifically validated for that context, though the PHQ-2/PHQ-9 tiered approach also works.
Scoring quick reference
PHQ-2: Score 0-2 is a negative screen (routine care). Score 3-6 is positive (administer PHQ-9).
PHQ-9: 0-4 minimal (watchful waiting), 5-9 mild (consider intervention), 10-14 moderate (treatment planning), 15-19 moderately severe (active treatment), 20-27 severe (immediate treatment, referral consideration).
PHQ-9 also includes a functional impairment question asking how difficult these problems have made work, home responsibilities, and relationships. High symptom score with minimal impairment differs clinically from high score with severe impairment.
Billing considerations
Both PHQ-2 and PHQ-9 qualify for billing under CPT code 96127 when administered, scored, and documented with clinical interpretation. With the tiered approach, you can bill for the PHQ-2 and, if it is positive, for the PHQ-9 as well (typically up to 2 units total per visit).