Hello from The Everyday Series!

One of the key reasons AI can look like magic is its knack for finding words with similar meanings. In a way, we have built a magical thesaurus that runs on numbers. One of the main tools behind it is the co-occurrence matrix.

A co-occurrence matrix is a table that displays the frequency with which two words or phrases appear together in a text. It can be used to identify patterns of word usage and help to understand the meaning of a text. For example, if you wanted to analyse the lyrics of a song, you could create a co-occurrence matrix to see which words are most commonly used together. This could give you an idea of the song's theme or mood.
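To make the idea concrete, here is a minimal sketch of counting co-occurrences. The made-up lyric line, the function name `cooccurrence_counts`, and the window size of 2 are all assumptions chosen for illustration; real projects usually clean the text and pick the window more carefully.

```python
from collections import defaultdict


def cooccurrence_counts(tokens, window=2):
    """Count how often two words appear within `window` positions of each other."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        # Look only at the words that follow, up to `window` positions ahead,
        # so each pair of positions is counted exactly once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            # Sort the pair so ("me", "with") and ("with", "me") share one cell.
            pair = tuple(sorted((word, tokens[j])))
            counts[pair] += 1
    return dict(counts)


# A made-up lyric line, used only as toy data.
lyrics = "sing with me dance with me stay with me tonight".split()
counts = cooccurrence_counts(lyrics, window=2)

# Show the pairs that appear together most often.
for pair, n in sorted(counts.items(), key=lambda item: -item[1])[:3]:
    print(pair, n)
```

Each key is a pair of words and each value is one cell of the matrix. Pairs that never fall inside the same window simply do not appear, which is the same as a zero in the table.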

There are different types of count-based matrices, depending on how they are constructed. The simplest is the word frequency table. It lists all of the words in a text and assigns each one a number indicating how often it appears, usually sorted from highest to lowest frequency. Strictly speaking, it counts single words rather than pairs, so it is better seen as a building block than as a true co-occurrence matrix. It is useful for finding the most common words in a text; relationships between words only show up once we count pairs.
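A word frequency count like the one described above takes only a few lines. This is a sketch on a toy sentence, assuming simple whitespace splitting; the variable names are illustrative.

```python
from collections import Counter

text = "the cat sat on the mat and the dog sat too"

# Count how often each word appears.
word_counts = Counter(text.split())

# most_common() returns (word, count) pairs from highest to lowest count.
ranked = word_counts.most_common()
print(ranked[:2])
```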

There are two kinds of co-occurrence between two words [Schütze and Pedersen, 1993]:

First-order co-occurrence (syntagmatic association):
• The words typically appear near each other.
For example, "wrote" is a first-order associate of "book" or "poem".

Second-order co-occurrence (paradigmatic association):
• The words have similar neighbors.
For example, "wrote" is a second-order associate of words like "said" or "remarked".
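The difference between the two kinds can be sketched in code. The four toy sentences below are made up, each sentence is treated as one context, and cosine similarity between neighbor counts is just one common way to measure "similar neighbors"; none of these choices come from the cited paper.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# Toy corpus (an assumption for illustration).
sentences = [
    "she wrote a book",
    "he wrote a poem",
    "she said a word",
    "he said a thing",
]

# neighbors[w][v] = how many times w and v appear in the same sentence.
neighbors = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for w, v in permutations(words, 2):
        neighbors[w][v] += 1


def cosine(a, b):
    """Cosine similarity between two neighbor-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


# First-order: do the two words appear together?
wrote_book = neighbors["wrote"]["book"]
wrote_said = neighbors["wrote"]["said"]

# Second-order: do the two words keep similar company?
similarity = cosine(neighbors["wrote"], neighbors["said"])

print(wrote_book, wrote_said, round(similarity, 2))
```

In this toy corpus "wrote" and "said" never share a sentence, so their first-order count is zero, yet their neighbor vectors overlap heavily ("she", "he", "a"), which is what gives them a high second-order similarity.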

Another related measure is inverse document frequency (IDF). It scores how distinctive a word is by counting how many documents in a collection contain it: the fewer documents a word appears in, the higher its IDF. Words with a high IDF are rare and tend to be informative, while words with a low IDF, such as "the", appear almost everywhere. This makes IDF useful for identifying keywords.

Finally, there is the tf–idf (term frequency–inverse document frequency) matrix, which combines word frequency and IDF into a single score. A word scores highly when it appears often in one document but rarely in the rest of the collection, so this matrix is useful for identifying the important terms in a text.
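Here is a small sketch of how a tf–idf score can be computed by hand. It assumes one common textbook variant, tf = count / document length and idf = log(N / number of documents containing the word); libraries often use smoothed versions that give slightly different numbers. The three toy documents and all names are made up for illustration.

```python
from collections import Counter
from math import log

# Toy collection of documents (an assumption for illustration).
documents = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog chased the cat",
    "doc3": "the bird sang",
}
tokenized = {name: text.split() for name, text in documents.items()}
n_docs = len(tokenized)

# Document frequency: in how many documents does each word appear?
doc_freq = Counter()
for words in tokenized.values():
    doc_freq.update(set(words))

# Inverse document frequency: rarer words get a higher value.
idf = {word: log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(name):
    """Return a {word: score} dict for one document."""
    words = tokenized[name]
    tf = Counter(words)
    return {word: (tf[word] / len(words)) * idf[word] for word in tf}


scores = tf_idf("doc1")
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word:5s} {score:.3f}")
```

Note how "the" ends up with a score of zero even though it is the most frequent word in the document: it appears in every document, so its IDF is log(1) = 0.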

Beyond these applications, co-occurrence matrices can also be used for research. For example, if you were studying language development, you could use co-occurrence matrices to see how children use language at different stages of development. You could also use them to study how different cultures express themselves linguistically.

Tomorrow we will demonstrate this by building a co-occurrence matrix step by step.
