Glossary

Pointwise Mutual Information (PMI)

Pointwise Mutual Information (PMI) is a statistical measure widely used in natural language processing and information retrieval. It quantifies the strength of the association between two words in a corpus of text: the extent to which the occurrence of one word depends on the occurrence of the other within a specific context, such as a sentence or a fixed-size window.

PMI compares the observed probability of two words co-occurring in a text with the probability of co-occurrence that would be expected if the two words were independent. The PMI value is the logarithm of the ratio between the observed and expected probabilities.

The formula for calculating PMI is as follows:

PMI(word1, word2) = log2(P(word1, word2) / (P(word1) * P(word2)))

In this formula, P(word1, word2) is the probability that word1 and word2 occur together in the same context, while P(word1) and P(word2) are the probabilities of each word occurring on its own. The product P(word1) * P(word2) is the co-occurrence probability that would be expected if the two words were independent.
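As a minimal sketch of the formula, the Python function below estimates each probability as a relative frequency over a set of contexts. The function name `pmi_from_counts`, its parameters, and the counts in the usage lines are illustrative assumptions, not part of any particular library.

```python
import math

def pmi_from_counts(pair_count, count1, count2, total):
    """Estimate PMI (in bits) from co-occurrence counts.

    pair_count: number of contexts containing both words
    count1, count2: number of contexts containing each word
    total: total number of contexts observed
    """
    if min(pair_count, count1, count2) <= 0 or total <= 0:
        # log2(0) is undefined, so this sketch rejects zero counts outright.
        raise ValueError("all counts must be positive")
    p_joint = pair_count / total   # P(word1, word2)
    p1 = count1 / total            # P(word1)
    p2 = count2 / total            # P(word2)
    return math.log2(p_joint / (p1 * p2))

# Hypothetical counts: 1,000 contexts, each word appears in 100 of them,
# and both appear together in 40.
score = pmi_from_counts(40, 100, 100, 1000)
print(round(score, 3))  # 2.0, i.e. four times more often than chance
```

Rejecting zero counts is one design choice among several; other treatments of unseen pairs are possible and are not covered here.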

The sign and magnitude of the PMI value indicate how the two words relate. A positive value means the words co-occur more often than they would if they were independent, so the occurrence of one word increases the likelihood of the other appearing; the larger the value, the stronger the association. A value of zero means the words co-occur exactly as often as independence predicts. A negative value means the words co-occur less often than expected, so the occurrence of one word decreases the likelihood of the other appearing.
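To illustrate how the sign of a PMI value can be read, the sketch below applies the formula directly to made-up probabilities. The helper name `pmi` and all the numbers are assumptions chosen for illustration only.

```python
import math

def pmi(p_joint, p1, p2):
    """PMI in bits, computed directly from probabilities."""
    return math.log2(p_joint / (p1 * p2))

# Assume each word alone has probability 0.1, so the joint probability
# expected under independence is 0.1 * 0.1 = 0.01.
more_than_chance = pmi(0.04, 0.1, 0.1)    # observed > expected: positive
at_chance = pmi(0.01, 0.1, 0.1)           # observed = expected: about zero
less_than_chance = pmi(0.0025, 0.1, 0.1)  # observed < expected: negative

for label, value in [("more", more_than_chance),
                     ("equal", at_chance),
                     ("less", less_than_chance)]:
    print(label, round(value, 3))
```

With these inputs the three values come out at roughly 2, 0, and -2 bits, corresponding to a pair seen four times as often as chance, exactly at chance, and a quarter as often as chance.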

PMI is commonly used in word sense disambiguation, information retrieval, and text classification. In these tasks, a measure of how strongly words are associated can help improve the accuracy and relevance of language-based algorithms and models.

In conclusion, PMI quantifies the strength of the association between two words in a corpus by comparing how often they actually co-occur with how often they would co-occur if they were independent, which makes it a useful tool across natural language processing and information retrieval.
