I have seen many theoretical explanations of the difference between big data, machine learning, data science, etc. What follows is based on what I see in practice. Let me elaborate on why roughly 18 months of effort is required to become a data scientist.

1 month for SAS / R, to be able to:
o Read and write data
o Filter / sort / merge / append data
o Derive / format new fields
o Apply a set of commands together for various use cases

6 months for basic statistics, to become comfortable with:
o Probabilities, central tendency and dispersion around the centre
o Normal distribution and the Central Limit Theorem
o Sampling distribution of the mean and of a proportion
o Hypothesis testing (p-value, one- / two-tailed tests, Type I / Type II errors, power of a test)
o Linear regression (coefficient of determination, regression coefficients)
o ANOVA (one-way / two-way)
o Categorical data analysis (chi-square tests of contingency tables)
o Non-parametric tests (runs test / Spearman rank correlation)

6 months on machine learning:
o Decision tree techniques
    - CHAID (chi-square automatic interaction detector)
    - CART: classification trees (for a categorical outcome) and regression trees (for a continuous outcome variable)
    - ID3: another decision tree algorithm, which uses entropy
    - C4.5
    - Random forest method
o Logistic regression
    - Model design
    - Variable selection
    - Dealing with multicollinearity
    - Strength of the model (KS / Gini, etc.)
    - Model validation
o Cluster analysis
    - Hierarchical clustering
    - Non-hierarchical (k-means) clustering
    - Dealing with practical challenges of clustering

6 months for knowing the applicability of different techniques, valid model design, proper model validation, etc.
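
To make one of the items above concrete, here is a minimal sketch of the KS / Gini check mentioned under logistic regression. It is written in Python purely for illustration (the list above names SAS / R). The function name `ks_and_gini` and the sample scores are my own assumptions, not part of any library. KS is taken here as the largest gap between the cumulative share of events and the cumulative share of non-events when records are ranked by model score, and Gini as 2 * AUC - 1.

```python
def ks_and_gini(scores, labels):
    """Return (KS, Gini) for model scores and binary labels (1 = event, 0 = non-event).

    Hypothetical helper for illustration only.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")

    # Rank records from highest score to lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])

    ks = 0.0
    auc = 0.0
    cum_pos = cum_neg = 0
    i = 0
    while i < len(pairs):
        # Collect all records sharing the same score so ties are handled together.
        j = i
        tie_pos = tie_neg = 0
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tie_pos += 1
            else:
                tie_neg += 1
            j += 1

        # Each non-event in this group is outranked by every event seen so far,
        # and counts as half a win against events tied with it.
        auc += tie_neg * (cum_pos + tie_pos / 2.0)

        cum_pos += tie_pos
        cum_neg += tie_neg

        # KS: largest gap between the two cumulative distributions.
        ks = max(ks, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j

    auc /= n_pos * n_neg
    return ks, 2 * auc - 1


# Made-up scores and outcomes, only to show the call.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
ks, gini = ks_and_gini(scores, labels)
print(ks, gini)  # 0.5 0.625
```

A model that separates events from non-events perfectly gives KS = 1 and Gini = 1, while a model whose scores carry no information gives values near 0. In practice these would be computed on a validation sample, not on the data used to fit the model.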