Reducing the dimension

In the last post, we studied how a matrix can be used to process information, and how each element of the matrix can represent some parameter of whatever we are describing.

Let's say we want to analyse how the age of a city's population relates to its purchasing power, but we also want to take into account a few other parameters of the population, like neighbourhood, gender and monthly earnings. We can represent all of these in the form of a matrix, as explained in the previous post. As the number of parameters or the size of the population grows, however, matrix computations become expensive. In some cases we may even have to convert text to numbers, something that is very common in AI-related applications, which makes the various operations more complex still. This conversion can leave us with a sparse matrix.
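To see how converting text to numbers can produce a sparse matrix, here is a minimal sketch using one-hot encoding, one common way (assumed here; the post does not name a method) of turning a text parameter such as the neighbourhood into numbers. The people, ages and neighbourhood names are hypothetical, and `one_hot` is an illustrative helper, not a library function.

```python
# Hypothetical toy data: one numeric parameter (age) and one text parameter per person.
people = [
    {"age": 34, "neighbourhood": "Riverside"},
    {"age": 51, "neighbourhood": "Old Town"},
    {"age": 27, "neighbourhood": "Hillcrest"},
    {"age": 45, "neighbourhood": "Riverside"},
]

def one_hot(values):
    """Hypothetical helper: turn text labels into 0/1 columns, one column per distinct label."""
    categories = sorted(set(values))
    index = {name: i for i, name in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is "hot" per person
        rows.append(row)
    return categories, rows

categories, encoded = one_hot([p["neighbourhood"] for p in people])
# Each matrix row: the age followed by the one-hot neighbourhood columns.
matrix = [[p["age"]] + row for p, row in zip(people, encoded)]

total = sum(len(r) for r in matrix)
zeros = sum(1 for r in matrix for x in r if x == 0)
print(categories)  # ['Hillcrest', 'Old Town', 'Riverside']
print(matrix)      # [[34, 0, 0, 1], [51, 0, 1, 0], [27, 1, 0, 0], [45, 0, 0, 1]]
print(f"{zeros} of {total} entries are zero")  # 8 of 16 entries are zero
```

With only three neighbourhoods half the entries are already zero. With, say, 200 neighbourhoods, each row would hold 199 zeros in those columns alone, which is how text parameters quickly make the matrix sparse.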

A sparse matrix is one in which most of the entries are zero. If a large sparse matrix is stored in full, most of the disk space it occupies goes to those zeros, and using it for any kind of training and validation is also computationally expensive.
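A common remedy for the storage problem is to keep only the non-zero entries. The sketch below assumes a simple "dictionary of keys" layout, mapping `(row, column)` to value; the matrix size and the three non-zero entries per row are hypothetical. Libraries such as SciPy offer more compact sparse formats, but the idea is the same.

```python
import random

def to_sparse(dense):
    """Keep only the non-zero entries, keyed by (row, column)."""
    return {(i, j): x
            for i, row in enumerate(dense)
            for j, x in enumerate(row)
            if x != 0}

def to_dense(sparse, n_rows, n_cols):
    """Rebuild the full matrix, filling every missing position with zero."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for (i, j), x in sparse.items():
        dense[i][j] = x
    return dense

# Hypothetical 1,000 x 500 matrix in which each row has exactly 3 non-zero entries.
random.seed(0)
n_rows, n_cols = 1000, 500
dense = [[0] * n_cols for _ in range(n_rows)]
for row in dense:
    for j in random.sample(range(n_cols), 3):  # 3 distinct columns per row
        row[j] = 1

sparse = to_sparse(dense)
print(n_rows * n_cols, "values stored densely")  # 500000 values stored densely
print(len(sparse), "values stored sparsely")     # 3000 values stored sparsely
assert to_dense(sparse, n_rows, n_cols) == dense  # nothing is lost
```

Each stored entry also carries its two indices, so the real saving is somewhat smaller than the raw count suggests, but for a matrix this empty it is still large. Note that this only addresses storage; the data still has just as many dimensions.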

For anyone analysing the data, it is also difficult to understand and visualize it when there are many dimensions. So it is better to compress or transform the data into a smaller dataset with fewer dimensions.
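One widely used way to transform data into fewer dimensions is principal component analysis (PCA), a technique this post has not named so far and which is used here only as an illustration. The minimal pure-Python sketch below reduces each row to a single number: its coordinate along the direction in which the data varies most. It finds that direction with power iteration on the covariance matrix. The three columns (age, monthly earnings in thousands, a purchasing-power score) and their values are hypothetical.

```python
import math

def mean_center(data):
    """Subtract each column's mean so the data is centred on the origin."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data], means

def covariance(centered):
    """Sample covariance matrix of the centred columns (needs at least 2 rows)."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by the covariance matrix and renormalise.
    The vector converges to the eigenvector with the largest eigenvalue, i.e. the
    first principal component (assumes the data is not constant)."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reduce_to_1d(data):
    """Replace each row by its coordinate along the first principal component."""
    centered, _ = mean_center(data)
    pc = top_component(covariance(centered))
    scores = [sum(x * p for x, p in zip(row, pc)) for row in centered]
    return scores, pc

# Hypothetical rows: [age, monthly earnings (thousands), purchasing-power score]
data = [
    [25, 2.1, 1.0],
    [32, 3.0, 1.6],
    [41, 3.9, 2.1],
    [50, 4.8, 2.4],
    [58, 5.2, 2.9],
]
scores, pc = reduce_to_1d(data)
print("direction:", [round(p, 3) for p in pc])
print("1-D data: ", [round(s, 2) for s in scores])
```

Three columns become one, which is far easier to plot or feed into later computations. Because age is measured on a much larger scale than the other columns here, it dominates the direction found; in practice each column is often standardised first so that no parameter outweighs the others just because of its units.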