Glossary
Cumulative Gain
A result can be relevant but still useless if it’s buried too deep in the list.
Cumulative gain measures total relevance in a set of results. It does not account for position. That’s why metrics like discounted cumulative gain (DCG) and normalized DCG (nDCG) are used. They give more weight to items that show up earlier in the list.
Relevance alone isn't enough. Order is what makes it useful.
What Is Cumulative Gain?
Cumulative gain helps us understand the value of ranked content. It adds up the relevance scores of items in a result list, whether it's from a search engine or recommender system.
It works with binary relevance (relevant or not) or graded relevance (like a scale from 0 to 3).
But it doesn’t care where the item appears. A highly relevant result at the top gets the same score as one at the bottom.
This limits its usefulness. In practice, users tend to focus on top results. Cumulative gain ignores that.
It’s a helpful starting point, but not a full picture. To go further, we need to include position. That’s where DCG and nDCG come in.
How Cumulative Gain Works
To calculate cumulative gain, assign a relevance score to each item in a result list, then add the scores.
For example, if the scores are:
[3, 2, 3, 0, 1, 2]
Then the cumulative gain is:
3 + 2 + 3 + 0 + 1 + 2 = 11
This score does not change if you shuffle the order. That’s why it's limited for systems where order matters.
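A minimal Python sketch of this calculation. The function name `cumulative_gain` is just an illustrative choice, not part of any library. It shows that shuffling the list leaves the score untouched.

```python
import random

def cumulative_gain(relevances):
    """Sum the relevance scores; position plays no role."""
    return sum(relevances)

scores = [3, 2, 3, 0, 1, 2]

# Copy and shuffle the list to show that order does not affect the sum.
shuffled = scores[:]
random.shuffle(shuffled)

print(cumulative_gain(scores))    # 11
print(cumulative_gain(shuffled))  # 11, whatever order the shuffle produced
```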
Why Cumulative Gain Falls Short
Cumulative gain tells you how much relevance the returned results carry in total. It doesn't tell you how well they were ranked.
You might return the best answers but place them at the bottom. Users won’t see them, and the experience will suffer.
From a user perspective, results ranked higher are more valuable. Cumulative gain doesn't reflect this.
That’s why it's not used alone in serious evaluation. It's a base metric. For ranking quality, we need to account for position.
From Cumulative to Discounted Cumulative Gain (DCG)
DCG adds ranking into the equation. The idea is simple: a relevant result is worth more when it appears early.
To reflect that, DCG divides each relevance score by log₂(position + 1). At position 1 the divisor is log₂(2) = 1, so the first item keeps its full value. Lower-ranked items get less.
For example:
- Rank 1: score / log₂(2) = score
- Rank 2: score / log₂(3)
- Rank 3: score / log₂(4)
- and so on
If a score of 3 appears in position 1, it counts as 3. If it appears in position 5, it's divided by log₂(6), giving about 1.16.
This helps reward systems that bring useful results to the top.
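Here is a small Python sketch of the discount, assuming positions start at 1 and the divisor is log₂(position + 1) as in the list above. The helper names `discounted_gain` and `dcg` are made up for this example.

```python
import math

def discounted_gain(score, position):
    """Discount one score by log2(position + 1); positions start at 1."""
    return score / math.log2(position + 1)

def dcg(relevances):
    """Sum the discounted scores of a ranked list."""
    return sum(discounted_gain(s, i) for i, s in enumerate(relevances, start=1))

print(discounted_gain(3, 1))            # 3.0
print(round(discounted_gain(3, 5), 2))  # 1.16

# Same items, different order: the early placement scores higher.
print(dcg([3, 0, 0]))  # 3.0
print(dcg([0, 0, 3]))  # 1.5
```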
Why DCG Needs Normalization
DCG alone can’t be used to compare across different lists. A longer list with more results will often get a higher DCG, even if it’s not better.
To fix this, we calculate the best possible DCG for the same items (the ideal DCG, or IDCG), and divide the actual DCG by it. With non-negative relevance scores, this gives us a normalized score from 0 to 1.
This is called nDCG.
A value of 1 means the results were ranked in the best possible order. Lower values show room for improvement.
This makes it easier to compare performance across queries and systems.
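The sketch below illustrates the problem with two made-up lists that are both perfectly ordered. It assumes the log₂(position + 1) discount described earlier. Returning 0.0 when the ideal DCG is zero is a convention chosen for this example, not a fixed rule.

```python
import math

def dcg(relevances):
    """Sum of scores discounted by log2(position + 1), positions from 1."""
    return sum(s / math.log2(i + 1) for i, s in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the same scores sorted best-first."""
    ideal = dcg(sorted(relevances, reverse=True))
    # Assumption: a list with no relevant items scores 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0

short_list = [3, 2]
long_list = [3, 2, 1, 1]

# Raw DCG favors the longer list, even though both are ideally ordered.
print(round(dcg(short_list), 3), round(dcg(long_list), 3))  # 4.262 5.193
# After normalization both lists get the same score.
print(ndcg(short_list), ndcg(long_list))                    # 1.0 1.0
```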
How to Compute Cumulative Gain, DCG, and nDCG
Let’s go through a worked example. Suppose a search returns six results with these relevance scores:
[3, 2, 3, 0, 1, 2]
Step 1: Cumulative Gain
Add the scores:
CG@6 = 3 + 2 + 3 + 0 + 1 + 2 = 11
Step 2: DCG
Divide each score by log₂(position + 1). At position 1 the divisor is log₂(2) = 1, so the discount only takes effect from position 2:
DCG@6 = 3
+ 2 / log₂(3)
+ 3 / log₂(4)
+ 0 / log₂(5)
+ 1 / log₂(6)
+ 2 / log₂(7)
Numerically:
DCG@6 ≈ 3 + 1.262 + 1.5 + 0 + 0.387 + 0.712 ≈ 6.861
Step 3: IDCG
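Sort the scores in descending order, [3, 3, 2, 2, 1, 0], and apply the same discounts:
IDCG@6 = 3
+ 3 / log₂(3)
+ 2 / log₂(4)
+ 2 / log₂(5)
+ 1 / log₂(6)
+ 0 / log₂(7)
Numerically:
IDCG@6 ≈ 3 + 1.893 + 1 + 0.861 + 0.387 + 0 ≈ 7.141
Step 4: nDCG
Now divide:
nDCG@6 = 6.861 / 7.141 ≈ 0.961
That means the actual ranking reached about 96.1% of the ideal.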
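The four steps can be checked with a short Python sketch. The function names are illustrative. The ideal ranking is assumed to be the same six scores sorted from highest to lowest, with no extra documents from outside the list.

```python
import math

def cg(relevances):
    """Step 1: plain sum of relevance scores."""
    return sum(relevances)

def dcg(relevances):
    """Step 2: each score divided by log2(position + 1), positions from 1."""
    return sum(s / math.log2(i + 1) for i, s in enumerate(relevances, start=1))

def idcg(relevances):
    """Step 3: DCG of the same scores sorted best-first."""
    return dcg(sorted(relevances, reverse=True))

def ndcg(relevances):
    """Step 4: actual DCG as a fraction of the ideal DCG."""
    ideal = idcg(relevances)
    # Assumption: a list with no relevant items scores 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0

scores = [3, 2, 3, 0, 1, 2]
print(cg(scores))               # 11
print(round(dcg(scores), 3))    # 6.861
print(round(idcg(scores), 3))   # 7.141
print(round(ndcg(scores), 3))   # 0.961
```

If the ideal ranking is instead built from every relevant document in the collection, including ones the system failed to return, IDCG grows and nDCG drops accordingly.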
When to Use Cumulative Gain Metrics
Each metric serves a different purpose.
Use CG when:
- You care only about whether relevant content was returned
- The order of results doesn’t matter
- You’re testing retrieval, not ranking
Use DCG when:
- You want to give more credit to items ranked higher
- You're evaluating changes to ranking models
- User behavior depends on order
Use nDCG when:
- You want a fair comparison across queries or lists
- You're tracking performance over time
- You need a single score that reflects both relevance and order
nDCG is usually the most practical choice for large-scale systems that rely on ranked outputs, because its scores stay comparable across queries and list lengths.
FAQ: Cumulative Gain and Ranking Metrics
What is cumulative gain? It’s the total sum of relevance scores in a result list. It shows how much relevant content was returned, regardless of order.
Why doesn’t cumulative gain care about position? It just adds up scores. So it treats a relevant result at the top the same as one at the bottom.
How is DCG different from CG? DCG discounts results based on position. The lower the rank, the smaller the score.
What does nDCG mean? nDCG is DCG divided by the ideal DCG. It gives a score from 0 to 1 so you can compare different queries.
When should I use each metric? Use CG for basic checks. Use DCG when ranking matters. Use nDCG when comparing across different lists.
Can I use nDCG with binary relevance? Yes. It works with both binary and graded scores.
What’s IDCG? The highest possible DCG for a list. It’s used to normalize DCG into nDCG.
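Does nDCG penalize irrelevant results? Only partly. An irrelevant result placed above a relevant one lowers the score, because it pushes the relevant item into a more heavily discounted position. Irrelevant results at the end of the list add zero and leave nDCG unchanged. To penalize those, pair nDCG with a metric such as precision.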
Is there a best cutoff (k)? It depends on your use case. Use nDCG@5, nDCG@10, etc., depending on how far users look.
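To make the cutoff and binary-relevance answers concrete, here is a hedged Python sketch of nDCG@k. The names `dcg_at_k` and `ndcg_at_k` and both sample lists are invented for illustration. The ideal ranking is assumed to come from the judged items in the list itself.

```python
import math

def dcg_at_k(relevances, k):
    """DCG over the first k positions only."""
    return sum(s / math.log2(i + 1) for i, s in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the best-first ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # Assumption: a list with no relevant items scores 0.0.
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

graded = [3, 2, 3, 0, 1, 2]  # graded scale, 0 to 3
binary = [1, 0, 1, 0, 0, 1]  # relevant or not

for k in (3, 6):
    print(k, round(ndcg_at_k(graded, k), 3), round(ndcg_at_k(binary, k), 3))
```

The same function handles both scales. Only the input scores change.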
Summary
Cumulative gain metrics help you understand how well a system retrieves and ranks content.
- CG tells you what was returned
- DCG tells you how it was ranked
- nDCG lets you compare results across queries
They work together to turn raw relevance into actionable insight. Used well, they help improve search engines, recommender systems, and any tool that ranks content.
In a world where attention is limited, ranking matters. These metrics help make sure your results meet that need.