# Statistics & Linear Algebra: Top Interview Questions

Updated: Jan 21

**1. What is a covariance matrix?**

A covariance matrix is a square matrix that describes the covariance between multiple variables. It is commonly used in statistics and probability theory to describe the relationships between different random variables. The diagonal elements of a covariance matrix represent the variances of the individual variables, while the off-diagonal elements represent the covariances between pairs of variables, so the matrix is symmetric. It is the starting point for computing correlations between variables, for principal component analysis, and for other multivariate statistical methods.
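As a minimal sketch of the description above, the following uses NumPy's `np.cov` on a small, made-up data set (the numbers are hypothetical and chosen only for illustration) and checks that the diagonal holds the variances.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 3 variables (columns).
data = np.array([
    [2.0, 8.0, 1.0],
    [4.0, 6.0, 3.0],
    [6.0, 4.0, 2.0],
    [8.0, 2.0, 5.0],
    [10.0, 0.0, 4.0],
])

# rowvar=False tells NumPy that each column is a variable.
cov = np.cov(data, rowvar=False)

# The diagonal entries are the per-variable sample variances (ddof=1).
variances = data.var(axis=0, ddof=1)

# Rescaling by the standard deviations gives the correlation matrix.
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)

print(cov)
print(corr)
```

In this toy data the second column is 10 minus the first, so the correlation between those two variables comes out as -1.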

**2. What is the Central Limit Theorem and its significance?**

The Central Limit Theorem (CLT) is a fundamental result in probability and statistics. It states that, for independent, identically distributed random variables with finite mean and variance, the distribution of the sample average, once centered and scaled by the square root of the sample size, converges to a normal (Gaussian) distribution as the sample size grows, regardless of the underlying distribution of the individual variables.

In other words, if you average (or sum) a large number of such variables, the result is approximately normally distributed, with mean equal to the population mean and variance equal to the population variance divided by the sample size.

This theorem is important because it lets us approximate the sampling distribution of a mean with the normal distribution, which is simple and has well-known properties, even when the data themselves are far from normal. This is what justifies many confidence intervals and hypothesis tests on means.
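A small simulation can make the theorem concrete. The sketch below is illustrative only: the choice of an exponential distribution, the sample size of 50, and the number of trials are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50            # size of each sample
trials = 20_000   # number of sample means to collect

# Exponential(1) is strongly skewed, with mean 1 and variance 1.
samples = rng.exponential(scale=1.0, size=(trials, n))
sample_means = samples.mean(axis=1)

# The CLT suggests the sample means are roughly Normal(1, 1/n),
# so standardizing them should give something close to Normal(0, 1).
z = (sample_means - 1.0) / np.sqrt(1.0 / n)

# For a standard normal, about 95% of values fall within +/- 1.96.
coverage = np.mean(np.abs(z) < 1.96)

print(f"mean of sample means: {sample_means.mean():.4f}")
print(f"std of standardized means: {z.std():.4f}")
print(f"fraction within 1.96: {coverage:.4f}")
```

Even though individual exponential draws are far from normal, the standardized sample means should land close to the 95% figure expected for a normal distribution.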

**3. What is the Gambler's Fallacy?**

The Gambler's Fallacy is a cognitive bias: the belief that the likelihood of an event in a sequence of independent random trials is affected by previous outcomes. Examples include the sequence of heads or tails in a series of coin tosses, or the results of a series of spins on a slot machine.

For example, a person who believes in the gambler's fallacy may think that after a series of heads in a coin toss, the next toss is more likely to be tails. This is not the case: each coin toss is an independent event, so for a fair coin the probability of heads or tails on the next toss is still 50-50.

This bias is often seen in gambling, hence the name. It can lead to irrational decision making and can cause people to make poor decisions in situations involving probability. Understanding and recognizing this bias can help one to make more rational decisions in such situations.
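The independence of coin tosses can be checked empirically. The following sketch simulates a fair coin (the seed, the number of flips, and the streak length of three are arbitrary choices for illustration) and compares the overall frequency of heads with the frequency of heads right after three heads in a row.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1 = heads, 0 = tails, for a simulated fair coin.
flips = rng.integers(0, 2, size=1_000_000)

# True at position i when flips i, i+1 and i+2 are all heads.
three_heads = (flips[:-3] == 1) & (flips[1:-2] == 1) & (flips[2:-1] == 1)

# The flip that immediately follows each run of three heads.
next_flip = flips[3:][three_heads]

p_heads_overall = flips.mean()
p_heads_after_streak = next_flip.mean()

print(f"P(heads) overall:            {p_heads_overall:.4f}")
print(f"P(heads) after three heads:  {p_heads_after_streak:.4f}")
```

Both estimates should sit near 0.5, which is what independence predicts: the streak carries no information about the next toss.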

**4. What is the Law of Large Numbers?**

The law of large numbers is a fundamental concept in probability and statistics. It states that as the number of independent and identically distributed observations in a sample increases, the sample mean converges to the expected value (the true mean) of the underlying distribution, provided that mean exists.

In other words, as the sample size grows, the sample mean tends to get closer and closer to the true mean of the population. The convergence holds with probability 1 (the strong law), but it is not immediate: for any finite sample, the sample mean can still differ from the true mean.
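As an illustration, the sketch below tracks the running mean of simulated rolls of a fair six-sided die, whose true mean is 3.5. The seed and the number of rolls are arbitrary assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rolls of a fair six-sided die; the true mean is (1 + ... + 6) / 6 = 3.5.
rolls = rng.integers(1, 7, size=100_000)

# running_mean[k] is the mean of the first k + 1 rolls.
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {k:>6}: sample mean = {running_mean[k - 1]:.4f}")
```

The early values can wander noticeably, while the later ones should settle close to 3.5.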

The law of large numbers is an important concept in statistics and is used in many different fields, including finance, insurance, and quality control. It is also the basis for many statistical methods, because it justifies estimating population quantities, such as a mean or a probability, from the averages observed in a sufficiently large sample.

The law of large numbers is related to, but distinct from, the central limit theorem. The law of large numbers says that the sample mean converges to the true mean; the central limit theorem describes the shape of the fluctuations around it, stating that for a large enough sample the distribution of the sample mean is approximately normal.

**5. What is the difference between a t-test, ANOVA, and a z-test?**

The t-test, ANOVA, and z-test are all statistical tests for comparing means, either between groups or against a reference value.

A t-test is used when the population standard deviation is unknown and must be estimated from the data. Common variants are the one-sample t-test, which compares a sample mean with a reference value; the independent two-sample t-test (Student's t-test, or Welch's t-test when the group variances differ), which compares the means of two independent groups; and the paired t-test, which compares the means of two dependent groups, such as before-and-after measurements on the same subjects.

ANOVA (analysis of variance) is used to compare the means of three or more groups of data. It tests whether at least one group mean differs significantly from the others, but it does not say which one.

A z-test compares the mean of a sample with a known population mean, using the standard normal distribution. It is used when the population standard deviation is known, or when the sample is large enough for the sample standard deviation to serve as a good estimate.

In summary, a t-test compares one or two means when the population standard deviation is unknown, ANOVA compares three or more means, and a z-test is used when the population standard deviation is known or the sample size is large.
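To show how the three test statistics differ, here is a sketch that computes them directly with NumPy. The function names (`student_t`, `z_stat`, `anova_f`) and the data are hypothetical, and only the statistics are computed; turning them into p-values needs the t, normal, and F distributions, for which a library such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.f_oneway`) would normally be used.

```python
import numpy as np

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance, independent groups)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (
        a.size + b.size - 2
    )
    return (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / a.size + 1 / b.size))

def z_stat(sample, mu0, sigma):
    """One-sample z statistic for a known population standard deviation."""
    sample = np.asarray(sample, float)
    return (sample.mean() - mu0) / (sigma / np.sqrt(sample.size))

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    groups = [np.asarray(g, float) for g in groups]
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))

# Hypothetical measurements for three groups.
a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [6.2, 6.8, 7.1, 6.5, 7.4]
c = [5.0, 5.5, 5.2, 4.8, 5.4]

t_ab = student_t(a, b)
f_abc = anova_f(a, b, c)
z_a = z_stat(a, mu0=5.0, sigma=0.5)
print(t_ab, f_abc, z_a)
```

One useful sanity check: with only two groups, the ANOVA F statistic equals the square of the pooled t statistic, which is why ANOVA is usually described as the generalization of the t-test to more than two groups.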

**6. What is cross entropy?**

Cross-entropy is a measure of the difference between two probability distributions. It is commonly used in machine learning, specifically in the training of neural networks and deep learning models, as a loss function to optimize the model's parameters.

In a classification network, the output of the model is a probability distribution over the possible classes. The cross-entropy loss compares this predicted distribution q with the true distribution p (also known as the target or label distribution) and measures the dissimilarity between the two:

H(p, q) = -sum_i p_i * log(q_i)

When the target is a one-hot label, this reduces to the negative log likelihood of the true label under the predicted probabilities. The goal of the training process is to minimize the cross-entropy loss, so that the predicted probabilities become as close as possible to the true labels.

Cross-entropy is a popular loss function in deep learning because it has a natural probabilistic interpretation (minimizing it is maximum likelihood estimation) and because, combined with a softmax or sigmoid output, it gives gradients that stay large when the prediction is confidently wrong. It does not by itself prevent overfitting, which still calls for regularization or early stopping. It is used across machine learning applications such as natural language processing and computer vision.
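As a rough sketch, cross-entropy between a target distribution p and a prediction q can be computed as -sum_i p_i * log(q_i). The function name `cross_entropy`, the clipping constant `eps`, and the example probabilities below are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), measured in nats."""
    p_true = np.asarray(p_true, dtype=float)
    # Clip to avoid log(0) when a predicted probability is exactly zero.
    q_pred = np.clip(np.asarray(q_pred, dtype=float), eps, 1.0)
    return float(-np.sum(p_true * np.log(q_pred)))

label = [0.0, 1.0, 0.0]          # one-hot target: the true class is index 1
confident = [0.05, 0.90, 0.05]   # prediction that favors the true class
unsure = [0.30, 0.40, 0.30]      # prediction that is nearly uniform

loss_confident = cross_entropy(label, confident)  # equals -log(0.90)
loss_unsure = cross_entropy(label, unsure)        # equals -log(0.40)

print(loss_confident, loss_unsure)
```

With a one-hot label only the predicted probability of the true class contributes, so the loss shrinks as that probability approaches 1.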

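**7. What are PDF and PMF?**

The probability mass function (PMF) describes discrete probability distributions: it gives the probability that the random variable takes each specific value. In contrast, the probability density function (PDF) describes continuous probability distributions: its value at a point is a density rather than a probability, and probabilities are obtained by integrating it over an interval.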
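The contrast can be sketched numerically. The example below writes out a binomial PMF and a normal PDF by hand (the helper names `binomial_pmf` and `normal_pdf` and the parameter values are illustrative assumptions), then shows that PMF values sum to 1 while PDF values must be integrated to obtain a probability.

```python
import math
import numpy as np

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of Normal(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# PMF values are probabilities and sum to 1 over all outcomes.
pmf_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# PDF values are densities; probabilities come from integrating them.
xs = np.linspace(-1.0, 1.0, 2001)
ys = normal_pdf(xs)
# Trapezoidal rule for P(-1 <= X <= 1) under a standard normal.
prob_within_1sd = float(np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs)))

# A density can exceed 1, which a probability never can.
peak_density = normal_pdf(0.0, mu=0.0, sigma=0.1)

print(pmf_total, prob_within_1sd, peak_density)
```

The integral over one standard deviation either side of the mean should come out near the familiar 68%.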

**8. What is the norm of a vector?**

The norm of a vector is a scalar value that represents the size or magnitude of the vector. There are different types of norms, but the most common one is the Euclidean norm, also known as the L2 norm (its analogue for matrices is the Frobenius norm). The Euclidean norm of a vector v with n elements is calculated as the square root of the sum of the squares of the elements of the vector:

||v|| = sqrt(v1^2 + v2^2 + ... + vn^2)

In other words, it's the square root of the dot product of the vector with itself. This norm is often represented by the double vertical bars ||v||.

The Euclidean norm is always non-negative, and it is zero only for the zero vector. It is also the length of the vector when the vector is represented geometrically in an n-dimensional space.

There are other types of norms, such as the L1 norm (the sum of the absolute values of the elements) and the L-infinity norm, also called the max norm (the largest absolute value among the elements). Each has different properties and is useful for different types of problems.
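For a quick illustration, NumPy's `np.linalg.norm` computes these norms through its `ord` argument. The vector below is an arbitrary example.

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)                 # sqrt(3^2 + (-4)^2) = 5
l1 = np.linalg.norm(v, ord=1)          # |3| + |-4| = 7
linf = np.linalg.norm(v, ord=np.inf)   # max(|3|, |-4|) = 4

# The L2 norm is the square root of the dot product of v with itself.
l2_from_dot = np.sqrt(v @ v)

print(l2, l1, linf, l2_from_dot)
```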

**9. What is singular value decomposition?**

Singular Value Decomposition (SVD) is a powerful tool in linear algebra that decomposes a matrix into the product of three simpler matrices: an orthogonal (unitary) matrix, a diagonal matrix, and another orthogonal (unitary) matrix. The decomposition is written as:

A = U * Sigma * V^T

where A is the m x n matrix being decomposed, U (m x m) and V (n x n) are orthogonal matrices (i.e. their transpose is their inverse; for complex matrices they are unitary and the conjugate transpose is used instead), and Sigma is an m x n diagonal matrix.

The diagonal elements of Sigma are called the singular values of A. They are always non-negative and are conventionally arranged in descending order. The columns of U are called the left-singular vectors and the columns of V are called the right-singular vectors.

SVD has many useful properties and applications:

- It can be used to find the rank of a matrix.
- It can be used to find the best low-rank approximation of a matrix.
- It can be used to solve linear equations and perform least-squares fitting.
- It can be used in image and signal processing, such as image compression and denoising.
- It can be used in natural language processing, such as Latent Semantic Analysis.
- It underlies Principal Component Analysis (PCA), which can be computed from the SVD of the centered data matrix.

For these reasons SVD is widely used in many areas of science and engineering, such as computer vision, natural language processing, and data mining.
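The decomposition and the low-rank approximation property can be sketched with `np.linalg.svd`. The matrix below is an arbitrary small example; `full_matrices=False` requests the compact form, in which U and V^T are trimmed to match the number of singular values.

```python
import numpy as np

A = np.array([
    [3.0, 1.0, 1.0],
    [-1.0, 3.0, 1.0],
])

# s holds the singular values in descending order; Vt is V transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A.
A_rebuilt = U @ np.diag(s) @ Vt

# Rank-1 approximation: keep only the largest singular value.
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0])

# The spectral-norm error equals the first discarded singular value.
error = np.linalg.norm(A - A_rank1, ord=2)

print(s)
print(error)
```

For this matrix, A * A^T has eigenvalues 12 and 10, so the singular values are their square roots.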

**10. What is the rank of a matrix?**

The rank of a matrix is the maximum number of linearly independent columns (or, equivalently, rows) of the matrix. It can be thought of as the dimension of the space spanned by the columns or rows.

There are different ways to calculate the rank of a matrix, but one of the most common methods is to use the Singular Value Decomposition (SVD): the rank of a matrix A is equal to the number of non-zero singular values of A. In floating-point arithmetic, singular values below a small tolerance are treated as zero.

Another way is to reduce the matrix to row echelon form by Gaussian elimination and count the non-zero rows. The column rank and the row rank are always equal, so either can be used.

It's important to note that the rank of an m x n matrix can never exceed min(m, n). A matrix whose rank equals min(m, n) is said to have full rank; otherwise it is rank-deficient.

The rank of a matrix is valuable information: it can be used to check whether a square matrix is invertible (its rank must equal its size), to determine whether a system of linear equations has solutions and whether they are unique, and to check the linear independence of the columns or rows.
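A minimal sketch of the SVD-based approach, using `np.linalg.matrix_rank` and, for comparison, a manual count of singular values above a tolerance. The matrix is a made-up example, and the tolerance formula is one common choice rather than the only one.

```python
import numpy as np

# The third row is the sum of the first two, so only two rows are independent.
A = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [5.0, 7.0, 9.0],
])

rank = np.linalg.matrix_rank(A)

# Same answer from counting singular values above a small tolerance.
s = np.linalg.svd(A, compute_uv=False)
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank_from_svd = int(np.sum(s > tol))

print(rank, rank_from_svd)
```

Because the rank is 2 rather than 3, this matrix is rank-deficient and therefore not invertible.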

**11. What are the various types of data distributions?**

Commonly used probability distributions include the normal, uniform, binomial, and Poisson distributions.

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric around the mean and fully described by its mean and variance.

The uniform distribution is a probability distribution where all outcomes are equally likely. In its continuous form, every value in an interval has the same density; there is also a discrete form, such as the roll of a fair die.

The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent trials, each with the same probability of success.

The Poisson distribution is a discrete probability distribution that describes the number of times an event occurs within a fixed interval of time or space, when events occur independently at a constant average rate.
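To get a feel for these four distributions, the sketch below draws samples from each with NumPy's random generator and prints the sample mean and variance. The parameter values and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
size = 100_000

samples = {
    "normal(mu=0, sigma=1)": rng.normal(loc=0.0, scale=1.0, size=size),
    "uniform(0, 1)": rng.uniform(low=0.0, high=1.0, size=size),
    "binomial(n=10, p=0.3)": rng.binomial(n=10, p=0.3, size=size),
    "poisson(lam=4)": rng.poisson(lam=4.0, size=size),
}

for name, x in samples.items():
    print(f"{name:<24} mean = {x.mean():.3f}  variance = {x.var():.3f}")
```

The theoretical means are 0, 0.5, n*p = 3, and lam = 4 respectively; for the Poisson distribution the variance also equals lam.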

Read more about the distributions: https://www.mltutor.com/post/probability-distributions

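**12. What is the trace of a matrix?**

The trace of a square matrix is the sum of the elements on its main diagonal. It also equals the sum of the matrix's eigenvalues.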
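A short sketch with `np.trace` on an arbitrary 2x2 example, also checking the standard fact that the trace equals the sum of the eigenvalues:

```python
import numpy as np

A = np.array([
    [2.0, 1.0],
    [1.0, 3.0],
])

trace = np.trace(A)  # 2 + 3 = 5

# The trace also equals the sum of the eigenvalues of A.
eigenvalue_sum = np.linalg.eigvals(A).sum()

print(trace, eigenvalue_sum)
```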

**13. What is a singular matrix?**

A singular matrix is a square matrix that is not invertible: no matrix exists that multiplies it to give the identity matrix. This happens exactly when the determinant is zero, or equivalently when the rows (or columns) are linearly dependent, meaning that at least one row or column can be written as a linear combination of the others. A linear system Ax = b with a singular A does not have a unique solution; it has either no solution or infinitely many. Singular matrices come up in practice in linear regression, for example, where perfectly collinear features make X^T X singular, so the normal equations cannot be solved by inverting it.
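The sketch below builds a small singular matrix (an illustrative example whose second row is twice the first) and checks its determinant and rank, and what happens when NumPy is asked to invert it.

```python
import numpy as np

# The second row is twice the first, so the rows are linearly dependent.
S = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
])

det = np.linalg.det(S)
rank = np.linalg.matrix_rank(S)

try:
    np.linalg.inv(S)
    inverse_exists = True
except np.linalg.LinAlgError:
    # NumPy raises LinAlgError when the factorization hits a zero pivot.
    inverse_exists = False

print(det, rank, inverse_exists)
```

Note that matrices that are only nearly singular may not raise an error at all and can instead return an inverse with very large entries, so checking the rank or the condition number (`np.linalg.cond`) is usually the safer test.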

**14. What is an invertible matrix?**

An invertible matrix, also known as a non-singular matrix, is a square matrix that has a unique inverse. In other words, it can be multiplied by its inverse to produce the identity matrix. This occurs when the matrix's determinant is non-zero. A matrix is invertible if and only if its rows or columns are linearly independent, meaning that no row or column can be represented as a linear combination of the other rows or columns. Invertible matrices are used to solve linear equations and they play an important role in linear algebra and applied mathematics.

An invertible matrix A satisfies A^(-1) * A = A * A^(-1) = I, where A^(-1) is the inverse matrix of A and I is the identity matrix.
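As an illustration of solving linear equations with an invertible matrix, the sketch below uses `np.linalg.solve` on a made-up 2x2 system. Solving directly is generally preferred over forming the inverse explicitly.

```python
import numpy as np

A = np.array([
    [2.0, 1.0],
    [1.0, 3.0],
])
b = np.array([3.0, 5.0])

# det(A) = 2*3 - 1*1 = 5 is non-zero, so A x = b has exactly one solution.
x = np.linalg.solve(A, b)

print(x)        # solution of 2x + y = 3 and x + 3y = 5
print(A @ x)    # reproduces b
```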

**15. What are linearly independent vectors?**

Linearly independent vectors are a set of vectors that cannot be represented as a linear combination of other vectors in the set. In other words, if you have a set of n vectors, {v1, v2, ..., vn}, they are linearly independent if none of the vectors in the set can be expressed as a linear combination of the other vectors in the set.

Formally, two vectors v and w are linearly independent if and only if the equation a*v + b*w = 0, where a and b are scalars, has only the trivial solution a = 0 and b = 0.

For two vectors, this means that neither is a scalar multiple of the other, so they are not proportional or parallel. The same definition extends to n vectors: they are linearly independent if a1*v1 + a2*v2 + ... + an*vn = 0 forces every coefficient to be zero.

Linearly independent vectors are important in linear algebra: they are used to define bases and dimension in the study of vector spaces, and they play a key role in the theory of linear equations and linear transformations.
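One practical way to test linear independence is to stack the vectors as columns of a matrix and compare its rank with the number of vectors. The helper name `linearly_independent` and the vectors below are hypothetical, chosen for illustration.

```python
import numpy as np

def linearly_independent(*vectors):
    """Return True if the given 1-D vectors are linearly independent."""
    M = np.column_stack(vectors)
    # Independent exactly when the rank equals the number of vectors.
    return bool(np.linalg.matrix_rank(M) == len(vectors))

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([2.0, 3.0, 0.0])  # equals 2*v1 + 3*v2

print(linearly_independent(v1, v2))       # v1 and v2 are independent
print(linearly_independent(v1, v2, v3))   # v3 depends on v1 and v2
```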

**16. What are orthogonal vectors?**

Orthogonal vectors are a set of vectors that are perpendicular to each other. In other words, the dot product of two orthogonal vectors is equal to zero.

Formally, two vectors v and w are orthogonal if and only if their dot product is zero, i.e., v · w = v1w1 + v2w2 + ... + vnwn = 0.

Orthogonal vectors are important in geometry, physics, signal processing and engineering applications because they carry no component along one another: the projection of one onto the other is zero, and the angle between two nonzero orthogonal vectors is always 90 degrees. As a result, any set of nonzero, mutually orthogonal vectors is linearly independent, and the Pythagorean relation ||v + w||^2 = ||v||^2 + ||w||^2 holds.

Orthogonal vectors can be normalized to form an orthonormal set, which is a set of vectors that are orthogonal and have a length of 1.
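A brief sketch of both ideas, using two arbitrary example vectors: the dot product confirms orthogonality, and dividing each vector by its length produces an orthonormal pair.

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([2.0, 1.0, -2.0])

dot = v @ w  # 1*2 + 2*1 + 2*(-2) = 0, so v and w are orthogonal

# Dividing each vector by its length gives an orthonormal pair.
v_hat = v / np.linalg.norm(v)
w_hat = w / np.linalg.norm(w)

print(dot, np.linalg.norm(v_hat), np.linalg.norm(w_hat))
```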

**17. What are orthonormal vectors?**

Orthonormal vectors are a set of vectors that are both orthogonal and normalized. In other words, they are perpendicular to each other and have a length of 1.

Formally, two vectors v and w are orthonormal if and only if their dot product is zero, i.e., v · w = v1w1 + v2w2 + ... + vnwn = 0, and the length of each vector is 1, i.e., ||v|| = sqrt(v1^2 + v2^2 + ... + vn^2) = 1 and ||w|| = sqrt(w1^2 + w2^2 + ... + wn^2) = 1.

Orthonormal vectors are important in many areas of mathematics and physics because they make computations simple. The dot product of any two distinct vectors in the set is 0 and of any vector with itself is 1, so a matrix Q whose columns are orthonormal satisfies Q^T * Q = I. Multiplying by such a matrix preserves lengths and angles, and the coordinates of a vector in an orthonormal basis are just its dot products with the basis vectors.

Orthonormal sets of vectors are also used in many fields such as linear algebra, quantum mechanics, signal processing and engineering.
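One common way to obtain an orthonormal set from linearly independent vectors is the QR factorization. The sketch below uses `np.linalg.qr` on an arbitrary example matrix and checks that the resulting columns are orthonormal and that multiplying by them preserves length.

```python
import numpy as np

# Columns are linearly independent but neither orthogonal nor unit length.
A = np.array([
    [1.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
])

# The reduced QR factorization returns Q (3x2) with orthonormal columns.
Q, R = np.linalg.qr(A)

gram = Q.T @ Q  # dot products of the columns: the 2x2 identity

# Multiplying by Q preserves the length of a vector.
x = np.array([3.0, 4.0])
length_before = np.linalg.norm(x)
length_after = np.linalg.norm(Q @ x)

print(gram)
print(length_before, length_after)
```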

**18. What is the inverse of a matrix?**

The inverse of a square matrix is the matrix that, when multiplied by the original matrix, results in the identity matrix. The identity matrix is a square matrix with 1's along the main diagonal and 0's everywhere else. The inverse of a matrix A is denoted by the superscript "-1", as in A^-1. Not all square matrices have an inverse; those that do not are called non-invertible or singular matrices.
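As a small worked example, the sketch below inverts an arbitrary 2x2 matrix with `np.linalg.inv` and compares the result with the closed-form 2x2 inverse.

```python
import numpy as np

A = np.array([
    [4.0, 7.0],
    [2.0, 6.0],
])

# det(A) = 4*6 - 7*2 = 10 is non-zero, so the inverse exists.
A_inv = np.linalg.inv(A)

# For a 2x2 matrix [[a, b], [c, d]], the inverse is
# 1 / (a*d - b*c) * [[d, -b], [-c, a]].
expected = np.array([[6.0, -7.0], [-2.0, 4.0]]) / 10.0

print(A_inv)
print(A @ A_inv)  # the identity matrix, up to rounding
```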