Machine learning, data science, and applied analytics are buzzwords at present. They give businesses a competitive edge, and the skills are in high demand and short supply.
Here is a list of some frequently asked interview questions (both basic and advanced level), each with an associated YouTube link.
Some basic general questions
- What is Data Science?
Data science usually has four components
- Execution skills: SAS, R, Python, etc., for applying statistical and machine learning concepts to data
- Basic statistics – which forms the basis of all analysis
- Advanced statistics / applied statistics (machine learning)
- Domain knowledge / Business knowledge
You can refer to this video for more details
Link – https://www.youtube.com/watch?v=lKsRyjqt5gI
- What is the relationship between analytics, machine learning, deep learning, etc.?
Please note
- Deep learning is a specific type of artificial neural network (ANN) with many hidden layers (deep layers), where the weights and biases are set up and trained carefully enough that the network still converges fast.
- An artificial neural network is an assembly of neurons arranged so that it can solve many kinds of problems.
- Artificial neural networks are one family of machine learning algorithms.
- Machine learning algorithms are part of artificial intelligence.
- Everything in AI runs on concepts of statistics.
You can refer to this video (first 4 minutes): https://www.youtube.com/watch?v=vaSyVT_ULas
Artificial intelligence has three components
- Statistical concepts
- Business context
- Optimal programming for execution of statistical concepts for the given business scenario
- What are the types of machine learning algorithms?
Machine learning algorithms are of three types
- Supervised learning – when the outcome is known and you are trying to find the patterns associated with that outcome. Logistic regression, support vector machines, linear regression, and classification trees are all examples of supervised learning. See the table below to understand which technique to apply where.
- Unsupervised learning – where there is no dependent variable. Cluster analysis is the typical tool for unsupervised learning.
- Reinforcement learning – where the algorithm observes a situation, takes an action, and learns from the reward or penalty that follows, as in autonomous car driving. See a simplistic example of reinforcement learning (autonomous car) here: https://www.youtube.com/watch?v=jMj-3bUYMQk
Some basic technical questions
- What is time series analysis? Where is it used?
Time series analysis takes a single series (only a dependent variable, generally with no independent variables) and forecasts future values based on the given data of the series. TCSI decomposition (trend, cyclical, seasonal, and irregular components) is one of the easiest techniques for this. It requires detecting the trend, the seasonal component, and so on.
Here is the YouTube link to learn about time series analysis using Excel: https://www.youtube.com/watch?v=ptDezc0ZojM
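The trend and seasonal steps can be sketched in a few lines of Python. This is only a minimal illustration, assuming a multiplicative model and a four-quarter cycle; the `sales` figures and the function names are made up for the example and do not come from the video.

```python
from statistics import mean

# Hypothetical quarterly sales for three years (illustrative numbers only).
sales = [80, 105, 143, 103.5, 96, 125, 169, 121.5, 112, 145, 195, 139.5]
PERIOD = 4  # assumed seasonal cycle length: four quarters per year


def centered_moving_average(series, period):
    """Trend estimate via a centered moving average (valid for an even period)."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        # The two end points get half weight so the window is centered on t.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_indices(series, trend, period):
    """Average ratio of actual value to trend for each position in the cycle."""
    ratios = [[] for _ in range(period)]
    for t, (actual, level) in enumerate(zip(series, trend)):
        if level is not None:
            ratios[t % period].append(actual / level)
    raw = [mean(r) for r in ratios]
    scale = period / sum(raw)  # normalize so the indices average to 1
    return [r * scale for r in raw]


trend = centered_moving_average(sales, PERIOD)
indices = seasonal_indices(sales, trend, PERIOD)
print("Seasonal indices by quarter:", [round(i, 3) for i in indices])
```

A forecast for a future quarter would then multiply an extrapolated trend value by that quarter's seasonal index.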
- What is logistic regression? Where is it applicable?
Logistic regression is one of the most widely used predictive modeling techniques. It is applicable when predicting a binary outcome variable, such as risk (who will default), response (who will take an offer), or collectibility (who will pay a delinquent balance). SAS is one of the leading tools for developing predictive models.
It has several aspects like
- Model design
- Variable treatment
- Variable selection
- Dealing with multicollinearity
- Score / KS / Gini generation
You can learn more about the same in this YouTube video
link – https://www.youtube.com/watch?v=w0MqLCHhR_c
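To make the "Score / KS / Gini generation" step concrete, here is a rough Python sketch of how KS and Gini could be computed once a model has produced scores. The scores and outcomes are invented for illustration, and the function names are hypothetical; Gini is taken as 2 × AUC − 1, which is the usual convention in scorecard work.

```python
def ks_statistic(scores, labels):
    """Largest gap between cumulative event and non-event shares, by descending score."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_event = sum(labels)
    n_non_event = len(labels) - n_event
    cum_event = cum_non_event = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            cum_event += 1
        else:
            cum_non_event += 1
        # Only compare at the end of a run of tied scores.
        last_of_tie = i + 1 == len(pairs) or pairs[i + 1][0] != score
        if last_of_tie:
            gap = abs(cum_event / n_event - cum_non_event / n_non_event)
            best = max(best, gap)
    return best


def gini_coefficient(scores, labels):
    """Gini = 2 * AUC - 1, with AUC computed by comparing every event/non-event pair."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    auc = wins / (len(events) * len(non_events))
    return 2 * auc - 1


# Hypothetical model scores and actual outcomes (1 = event, e.g. default).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

ks = ks_statistic(scores, labels)
gini = gini_coefficient(scores, labels)
print(f"KS = {ks:.3f}, Gini = {gini:.3f}")
```

Higher KS and Gini both indicate that the score separates events from non-events more cleanly.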
- What is a classification and regression tree (decision tree) / the random forest method?
A decision tree is one of the easiest machine learning techniques, and it can yield tremendous benefit to the business. It can work on a binary outcome (classification tree) as well as a numeric outcome (regression tree). It is usually quite fast to develop because it has built-in data processing and a high tolerance for data issues.
It has several aspects like
- how to develop the decision tree,
- how to interpret the output,
- the algorithm, to understand how it works,
- how to understand its business benefit.
Random forest, which combines many decision trees, is one of the most widely used tree-based techniques at present.
Please see the videos on decision trees (https://www.youtube.com/watch?v=VaQdEK_ivIo) and random forest methods (https://www.youtube.com/watch?v=LIPtRVDmj1M).
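The core step of a classification tree, choosing where to split, can be sketched as follows. This is a simplified illustration for one numeric input and a binary outcome, using Gini impurity as the split measure; the `ages` and `defaulted` values are hypothetical, and a real tree would repeat this search over every variable and every node.

```python
def gini_impurity(labels):
    """Gini impurity of a node with binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # same as 1 - p^2 - (1 - p)^2


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' split."""
    best = (None, gini_impurity(labels))  # no split: impurity of the parent node
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2  # candidate cut halfway between neighbours
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (
            len(left) * gini_impurity(left) + len(right) * gini_impurity(right)
        ) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical data: customer age and whether the customer defaulted.
ages = [22, 25, 30, 35, 40, 45, 50, 55]
defaulted = [1, 1, 1, 0, 0, 0, 0, 0]

threshold, impurity = best_split(ages, defaulted)
print(f"Best split: age <= {threshold} (weighted Gini {impurity:.3f})")
```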
- What is cluster analysis?
Cluster analysis is the bread and butter of marketing. It is all about developing segments when there is no dependent variable.
That is why it is also called unsupervised learning, whereas logistic regression and decision trees are examples of supervised learning (as they have a defined dependent variable).
It has many aspects like
- hierarchical clustering,
- non-hierarchical clustering (k-means clustering), along with associated terminology such as dendrogram, scree plot, Euclidean distance, single linkage, and Ward's method.
Read more about the same
link – https://www.youtube.com/watch?v=4Q0kUCvhmAk
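As a rough illustration of k-means clustering with Euclidean distance, here is a small Python sketch. The points and the starting centroids are hypothetical, and real projects would normally pick starting centroids at random and rely on a library implementation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),  # Euclidean distance
            )
            clusters[nearest].append(point)
        # New centroid = mean of the members; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members
            else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers described by two standardized measures.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
initial_centroids = [(1, 1), (9, 9)]  # assumed starting guesses

centroids, clusters = kmeans(points, initial_centroids)
for centroid, members in zip(centroids, clusters):
    print("Segment centre", centroid, "members", members)
```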
- What is a neural network?
A neural network is a method that tries to mimic the human brain. It works on the concept of neurons, each of which takes certain inputs, processes them, and passes the result on as output. Often there are several layers, and the output of one layer becomes the input of the next layer.
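The idea of neurons arranged in layers can be sketched as a forward pass in Python. The weights and biases below are arbitrary placeholder values, and the sigmoid activation is an assumption made for the example; a real network would learn its weights from data.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each neuron takes a weighted sum of its inputs, adds a bias, and applies sigmoid."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs


# Hypothetical weights: a hidden layer with two neurons, then one output neuron.
hidden_weights = [[0.5, -0.2], [0.3, 0.8]]
hidden_biases = [0.1, -0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

inputs = [1.0, 2.0]
hidden = layer_forward(inputs, hidden_weights, hidden_biases)
output = layer_forward(hidden, output_weights, output_biases)  # hidden output feeds the next layer
print("Hidden layer:", hidden, "Output:", output)
```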
- What is a word cloud?
A word cloud gives a graphical representation of the words present in free-flowing text such as comments. More frequently occurring words appear in a bigger font and less frequently occurring words appear in a smaller font.
You can learn more about word clouds and how to develop one using R here: link – https://www.youtube.com/watch?v=DYtIN_gf0Ns
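The video uses R, but the logic behind a word cloud (count the words, then scale the font by frequency) can be sketched in Python as well. The comments, the stop-word list, and the font-size range below are all assumptions chosen for illustration; drawing the cloud itself is left out.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real projects use a much longer one.
STOP_WORDS = frozenset({"the", "is", "a", "and", "to", "but"})


def word_frequencies(comments):
    """Count how often each non-stop word occurs across all comments."""
    words = re.findall(r"[a-z']+", " ".join(comments).lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def font_sizes(freqs, min_size=10, max_size=40):
    """Scale counts linearly so the rarest word gets min_size and the commonest max_size."""
    low, high = min(freqs.values()), max(freqs.values())
    span = max(high - low, 1)  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in freqs.items()
    }


# Hypothetical customer comments.
comments = ["Great service and great price", "Service was slow", "Great app"]

freqs = word_frequencies(comments)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
```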
- What is a support vector machine and the kernel trick?
The support vector machine (SVM) is a very popular classification algorithm. It looks for the boundary that separates the classes with the widest possible margin, which keeps classification error low.
The kernel trick lets an SVM classify data that is not linearly separable, by implicitly mapping it into a higher-dimensional space where a linear boundary works.
You can learn more about SVM on link – https://www.youtube.com/watch?v=ikt7Qze0czE
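A small Python sketch can show what the kernel trick buys. It assumes a degree-2 polynomial kernel, which is only one of several common kernels, and uses made-up points: the kernel value computed in the original two dimensions equals a plain dot product in a three-dimensional mapped space, and in that space a simple linear rule separates an inner group from an outer ring.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel: K(x, z) = (x . z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit 3-D mapping whose dot product equals poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def radius_squared(x):
    """A linear function of the mapped features: first plus third coordinate."""
    phi = feature_map(x)
    return phi[0] + phi[2]


# Hypothetical points: an inner group surrounded by an outer ring.
# No straight line separates them in 2-D, but radius_squared < 1 does after mapping.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z), dot(feature_map(x), feature_map(z)))
```

The SVM never has to build the mapped features; it only needs the kernel values, which is why the trick stays cheap even when the mapped space is very large.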
- What is digital analytics?
Digital analytics is simply the application of analytics to data generated by digital platforms.
You can learn more about digital analytics on link – https://www.youtube.com/watch?v=W041P0nACqs
- What is conjoint analysis?
Conjoint analysis is used for survey analysis related to product design. It delves into the customer's decision-making process: how much value does the customer place on each attribute of the product specification?
For example, if mobile phones are being developed, how much does the customer value 16 GB memory vs 32 GB memory, or 3 GB RAM vs 4 GB RAM?
You can learn more about conjoint analysis on link – https://www.youtube.com/watch?v=TxzSuOIBBRo
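As a rough sketch of the mobile phone example, the Python below estimates how much each attribute level is worth. It assumes a single respondent who rated all four memory/RAM combinations (a balanced full-factorial design), in which case a level's part-worth can be taken as its average rating minus the overall average. The ratings are invented, and real studies usually fit a regression model instead.

```python
from statistics import mean

# Hypothetical ratings (1-10) given by one respondent to four phone profiles.
profiles = [
    ({"memory": "16 GB", "ram": "3 GB"}, 4),
    ({"memory": "16 GB", "ram": "4 GB"}, 6),
    ({"memory": "32 GB", "ram": "3 GB"}, 7),
    ({"memory": "32 GB", "ram": "4 GB"}, 9),
]


def part_worths(profiles):
    """Part-worth of a level = mean rating of profiles with that level - overall mean."""
    overall = mean(rating for _, rating in profiles)
    worths = {}
    for attributes, _ in profiles:
        for attribute, level in attributes.items():
            if (attribute, level) not in worths:
                ratings = [r for a, r in profiles if a[attribute] == level]
                worths[(attribute, level)] = mean(ratings) - overall
    return worths


def importance(worths):
    """Relative importance = an attribute's part-worth range as a share of all ranges."""
    ranges = {}
    for (attribute, _), worth in worths.items():
        low, high = ranges.get(attribute, (worth, worth))
        ranges[attribute] = (min(low, worth), max(high, worth))
    total = sum(high - low for low, high in ranges.values())
    return {a: (high - low) / total for a, (low, high) in ranges.items()}


worths = part_worths(profiles)
shares = importance(worths)
print(worths)
print(shares)
```

With these made-up ratings, memory drives more of the preference than RAM.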
- What is the gradient boosting technique?
Gradient boosting is a type of ensemble modeling that builds models sequentially, each one working on the errors of the models before it, to arrive at a final model that does a better job of prediction.
Please take a look at a graphical presentation of the same on link https://www.youtube.com/watch?v=ErDgauqnTHk
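The "work on the errors of the previous models" idea can be sketched in Python as below. This is a bare-bones illustration for a numeric outcome with squared-error loss, where each small model is a one-split tree (a stump) fitted to the current residuals. The data, the number of rounds, and the learning rate are assumptions for the example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single split that best fits the residuals (needs >= 2 distinct x values)."""
    best = None
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left)
        sse += sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to what is still unexplained."""
    predictions = [mean(ys)] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]  # current errors
        stump = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return predictions


def mse(ys, predictions):
    return mean((y - p) ** 2 for y, p in zip(ys, predictions))


# Hypothetical data with a roughly linear pattern.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]

baseline_mse = mse(ys, [mean(ys)] * len(ys))
one_round_mse = mse(ys, gradient_boost(xs, ys, rounds=1))
boosted_mse = mse(ys, gradient_boost(xs, ys, rounds=20))
print(baseline_mse, one_round_mse, boosted_mse)
```

Each added round shrinks the training error a little further, which is the sequential improvement the definition describes.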
Some advanced technical questions
At this level you encounter questions about comparing techniques.
- Principal component analysis vs factor analysis
Principal component analysis deals with the total variance in the data and is not concerned with the meaning of the components, whereas factor analysis tries to find the meaning associated with the factors.
Here are the YouTube links to learn more
What is PCA – https://www.youtube.com/watch?v=8BKFd9izEXM
PCA vs factor analysis – https://www.youtube.com/watch?v=Lzv7GpGpbSM
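To illustrate what "dealing with total variance" means for PCA, here is a small Python sketch for just two variables. It uses the closed-form eigenvalues of a 2×2 covariance matrix, so no external library is needed; the data values are hypothetical.

```python
import math


def covariance_matrix_2d(xs, ys):
    """Sample variances and covariance of two variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def eigenvalues_2x2_symmetric(sxx, sxy, syy):
    """Eigenvalues of [[sxx, sxy], [sxy, syy]]: the variances of the two components."""
    centre = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return centre + radius, centre - radius


# Hypothetical, strongly correlated measurements.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

sxx, sxy, syy = covariance_matrix_2d(xs, ys)
first, second = eigenvalues_2x2_symmetric(sxx, sxy, syy)
total_variance = sxx + syy
share_first = first / total_variance
print(f"First component explains {share_first:.1%} of total variance")
```

The component variances always add up to the total variance of the original variables; PCA only redistributes it, without attaching any meaning to the components.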
- CART vs CHAID
Though both are decision tree techniques, they use different splitting criteria. CART uses the Gini index for classification, whereas CHAID uses the chi-square test. CART can also handle a numeric outcome, which is not possible with chi-square-based CHAID.
Here is the YouTube link to learn more: https://www.youtube.com/watch?v=CxFRrNVBIFQ
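The two splitting criteria can be compared on the same candidate split with a short Python sketch. The counts are hypothetical, and the functions are simplified to a binary outcome and a two-way split; actual CHAID also merges categories and works with p-values, which is not shown here.

```python
def gini(events, non_events):
    """Gini impurity of a node from its event and non-event counts."""
    p = events / (events + non_events)
    return 2 * p * (1 - p)


def gini_gain(left, right):
    """CART-style criterion: drop in impurity from parent to weighted children."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    parent = gini(left[0] + right[0], left[1] + right[1])
    children = (n_left * gini(*left) + n_right * gini(*right)) / n
    return parent - children


def chi_square(left, right):
    """CHAID-style criterion: chi-square statistic of the split-by-outcome table."""
    observed = [left, right]
    row_totals = [sum(row) for row in observed]
    col_totals = [left[0] + right[0], left[1] + right[1]]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical split: (events, non_events) counts in each child node.
left_node = (30, 10)
right_node = (10, 50)

gain = gini_gain(left_node, right_node)
chi2 = chi_square(left_node, right_node)
print(f"Gini gain = {gain:.4f}, chi-square = {chi2:.4f}")
```

Both numbers grow as the split separates events from non-events more sharply, so the two methods often, though not always, favour similar splits.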
- Regression tree vs linear regression
Both have a numeric outcome variable. However, a regression tree can take a categorical variable as an input as it is, which linear regression cannot. To use a categorical variable in linear regression, one has to create dummy variables.
Here is the YouTube link for linear regression vs regression tree: https://www.youtube.com/watch?v=RrOrGQKkHu4
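Creating dummy variables is simple enough to sketch in a few lines of Python. The `regions` values are hypothetical; one level is dropped as the reference category so that the dummies are not perfectly collinear with the intercept in a linear regression.

```python
def dummy_encode(values, drop_first=True):
    """Turn a categorical variable into 0/1 columns, one per kept level."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels  # first level becomes the reference
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Hypothetical categorical input variable.
regions = ["north", "south", "east", "north"]

columns, encoded = dummy_encode(regions)
print(columns)
for region, row in zip(regions, encoded):
    print(region, row)
```

A regression tree would not need this step, since it can split directly on the category.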
- LDA vs PCA
LDA (linear discriminant analysis) is very similar in usage to a classification tree or logistic regression. It can be extended easily to cases where there are more than two categories of the outcome variable (like high risk / medium risk / low risk). It has slightly more stringent assumptions about the numeric independent variables than logistic regression.
Here is the YouTube link to learn more: https://www.youtube.com/watch?v=M4HpyJHPYBY
- Bagging vs boosting
See an example of the same here: https://www.youtube.com/watch?v=AiePAlZy_t8
- ANOVA vs MANOVA
See it through graphs here: https://www.youtube.com/watch?v=jUksjmKvwos