Data Science – Key Probability & Statistics Topics to Master

Table of contents for probability & statistics
This article presents a list of key probability & statistics topics that one may need to master when aiming to become a data scientist. These are the topics that have worked for me so far when working on data science problems. One could also see the list below as a table of contents for key probability and statistics topics for data science. That said, there are likely some topics I have not mentioned. Please feel free to comment or suggest if I have missed one or more important points. Also, apologies for any typos.
Probability & Statistics Topics

The following are some of the key topics, grouped under the categories Probability and Statistics, that one would want to master to get good at data science.

  • Probability: The following probability topics, once mastered, prove very helpful when working with various machine learning algorithms:
    • Introduction to Probability: This topic covers the basics of probability, including the basic formulae.
    • Probability concepts: This topic covers the fundamentals of the concepts below. These concepts prove very handy in classification algorithms such as logistic regression and naïve Bayes classification:
      • Union (Probability of the union of two or more events)
      • Intersection (Probability of the intersection of two or more events)
      • Complement (Probability that the event does not occur)
      • Bayes' rule (Probability that an event occurs given that another event has occurred, obtained by reversing the conditioning: P(A|B) = P(B|A) P(A) / P(B))
    • Random variables: This topic defines random variables and covers the following related concepts:
      • Types of variables (Discrete, Continuous)
      • Mean, Median, Variance
    • Probability Distributions: This topic, one of the most important, covers the fundamentals of the probability distributions that prove handy when working with different machine learning algorithms.
      • Probability distribution types (Discrete, Continuous)
      • Discrete Probability Distributions: The following are some of the key discrete probability distributions:
        • Binomial, Negative Binomial
        • Poisson
      • Continuous Probability Distributions: The following are some of the key continuous probability distributions, which help in evaluating machine learning models such as linear regression (T-value, F-value) and logistic regression (Z-value, Chi-square):
        • Normal distribution
        • Z-distribution (standard normal)
        • T-distribution
        • F-distribution
        • Chi-Square distribution
        • Gamma distribution
    • Sampling theory (Sampling methods such as simple random sampling (SRS), stratified sampling and cluster sampling; sampling distributions)
  • Statistics: The following are some of the key statistics topics that prove very helpful when working with different machine learning algorithms:
    • Quantitative data analysis: The following are key concepts that one can hardly do without in statistical analysis:
      • Mean, Median, Mode
      • Variance
    • Plots: The following plots are useful for understanding patterns in data based on center, spread, shape, etc.:
      • Histogram
      • Boxplot
      • Scatterplot
    • Estimation (Standard error, Margin of error, Confidence intervals)
    • Hypothesis testing: The following sub-topics are covered as part of this topic. Understanding them is key to understanding the evaluation techniques for machine learning models such as linear regression and logistic regression.
      • Null hypothesis, Alternative hypothesis
      • Type I & Type II error
      • Region of acceptance, Statistical significance, P-value
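The Probability concepts item (union, intersection, complement, Bayes' rule) reduces to a few one-line formulas. As a minimal sketch, assuming exact arithmetic with Python's standard `fractions` module, the helpers below apply them to a playing-card example; the function names and the card example are illustrative choices, not part of the original list.

```python
from fractions import Fraction


def union(p_a: Fraction, p_b: Fraction, p_a_and_b: Fraction) -> Fraction:
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


def intersection(p_a: Fraction, p_b_given_a: Fraction) -> Fraction:
    """Multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


def complement(p_a: Fraction) -> Fraction:
    """Complement rule: P(not A) = 1 - P(A)."""
    return 1 - p_a


def bayes(p_b_given_a: Fraction, p_a: Fraction, p_b: Fraction) -> Fraction:
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Example: draw one card from a standard 52-card deck.
# A = "card is a heart", B = "card is a face card (J, Q, K)".
p_heart = Fraction(13, 52)
p_face = Fraction(12, 52)
p_face_given_heart = Fraction(3, 13)  # 3 face cards among the 13 hearts

p_heart_and_face = intersection(p_heart, p_face_given_heart)
print(p_heart_and_face)                            # → 3/52
print(union(p_heart, p_face, p_heart_and_face))    # → 11/26
print(complement(p_heart))                         # → 3/4
print(bayes(p_face_given_heart, p_heart, p_face))  # → 1/4, i.e. P(heart | face)
```

Using `Fraction` keeps every result exact, which makes it easy to check each rule by hand.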
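For the Random variables and Discrete Probability Distributions items, here is a small Python sketch of the binomial, negative binomial and Poisson probability mass functions, plus the mean and variance of a discrete random variable computed directly from its PMF. The helper names are hypothetical, and the negative binomial follows one common convention (counting failures before the r-th success); other texts parameterize it differently.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): number of successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(X = k) where X counts failures before the r-th success (one common convention)."""
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def mean_and_variance(pmf: dict[int, float]) -> tuple[float, float]:
    """E[X] = sum of x * p(x); Var[X] = E[X^2] - (E[X])^2."""
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    return mean, second_moment - mean**2


# Number of heads in 10 fair coin flips.
coin = {k: binomial_pmf(k, 10, 0.5) for k in range(11)}
print(mean_and_variance(coin))  # → (5.0, 2.5)

# Poisson with rate 4, truncated at k = 100 (the remaining tail is negligible).
arrivals = {k: poisson_pmf(k, 4.0) for k in range(101)}
print(mean_and_variance(arrivals))
```

The binomial result matches the closed forms np = 5 and np(1 - p) = 2.5, and the Poisson mean and variance both come out at (approximately) the rate 4.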
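The Normal distribution and Z-distribution items are linked by standardization: z = (x - mu) / sigma maps any normal variable onto the standard normal. The sketch below, assuming Python's standard library is enough (no SciPy), writes the normal density and CDF out explicitly and cross-checks them against `statistics.NormalDist`; the helper names and the example scores are hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x so it can be read off the Z (standard normal) distribution."""
    return (x - mu) / sigma


# Hypothetical scores with mean 100 and standard deviation 15.
z = z_score(130, 100, 15)
print(z)                         # → 2.0
print(normal_cdf(130, 100, 15))  # equals the standard normal CDF at z = 2, about 0.977

# Cross-check against the standard library's NormalDist.
print(NormalDist(100, 15).cdf(130))
```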
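The T, F, Chi-Square and Gamma distribution items are all built from the normal distribution: a chi-square variable with k degrees of freedom is a sum of k squared standard normals, t and F are ratios involving chi-square variables, and chi-square(k) is a Gamma distribution with shape k/2 and scale 2. The simulation below is a minimal sketch of those textbook constructions, assuming only Python's `random` module; the helper names are hypothetical, and it is meant to illustrate the relationships, not to replace a statistics library.

```python
import random
import statistics


def chi_square_variate(k: int, rng: random.Random) -> float:
    """Sum of k squared independent standard normals, i.e. a Chi-square(k) draw."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def t_variate(k: int, rng: random.Random) -> float:
    """Z / sqrt(V / k) with Z ~ N(0, 1) and V ~ Chi-square(k): a t(k) draw."""
    return rng.gauss(0.0, 1.0) / (chi_square_variate(k, rng) / k) ** 0.5


def f_variate(d1: int, d2: int, rng: random.Random) -> float:
    """(U / d1) / (V / d2) with independent U ~ Chi-square(d1), V ~ Chi-square(d2)."""
    return (chi_square_variate(d1, rng) / d1) / (chi_square_variate(d2, rng) / d2)


rng = random.Random(42)  # fixed seed so the run is reproducible
N = 20_000
chi = [chi_square_variate(5, rng) for _ in range(N)]
t = [t_variate(10, rng) for _ in range(N)]
f = [f_variate(5, 10, rng) for _ in range(N)]
# Chi-square(k) is also Gamma(shape = k / 2, scale = 2); gammavariate takes (shape, scale).
gamma = [rng.gammavariate(2.5, 2.0) for _ in range(N)]

# Theoretical means: Chi-square(5) -> 5, t(10) -> 0, F(5, 10) -> 10 / 8 = 1.25, Gamma(2.5, 2) -> 5.
for name, draws in [("chi2", chi), ("t", t), ("F", f), ("gamma", gamma)]:
    print(name, round(statistics.fmean(draws), 3))
```

The simulated means should sit close to the theoretical values listed in the comment; the chi-square and Gamma columns agreeing is the Gamma relationship in action.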
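The Sampling theory item names three sampling methods. A minimal sketch of each, assuming an in-memory population and Python's `random.Random.sample`, is shown below; the function names, the population, the strata and the "stores" used as clusters are all hypothetical.

```python
import random


def simple_random_sample(population: list, n: int, rng: random.Random) -> list:
    """SRS: every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)


def stratified_sample(strata: dict, fraction: float, rng: random.Random) -> list:
    """Draw the same fraction from every stratum (proportional allocation)."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def cluster_sample(clusters: dict, n_clusters: int, rng: random.Random) -> list:
    """Pick whole clusters at random and keep every member of the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical population of 1,000 customer IDs.
rng = random.Random(0)
population = list(range(1000))
strata = {"region_a": population[:600], "region_b": population[600:]}
clusters = {f"store_{i}": population[i * 100:(i + 1) * 100] for i in range(10)}

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(strata, 0.1, rng)  # 60 from region_a, 40 from region_b
clus = cluster_sample(clusters, 3, rng)      # 3 whole stores, 300 customers
print(len(srs), len(strat), len(clus))       # → 100 100 300
```

Stratified sampling guarantees each group is represented in proportion, while cluster sampling trades some precision for the convenience of surveying whole groups at once.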
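The Quantitative data analysis item maps directly onto Python's standard `statistics` module. The sketch below uses a small hypothetical sample and highlights the distinction that most often trips people up: population variance divides by n, while sample variance divides by n - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

print(statistics.mean(data))       # → 5
print(statistics.median(data))     # → 4.5
print(statistics.mode(data))       # → 4
print(statistics.pvariance(data))  # population variance: sum of squared deviations / n
print(statistics.variance(data))   # sample variance: sum of squared deviations / (n - 1)
```

Here the squared deviations from the mean sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, about 4.571.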
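The numbers behind two of the plots in the Plots item can be computed without a plotting library: a boxplot is drawn from the quartiles and an outlier rule, and a histogram is a count per bin. This is a sketch under stated assumptions: quartiles use `statistics.quantiles(..., method="inclusive")` and outliers use the common 1.5 * IQR fences, but plotting tools differ in both conventions, and the helper names are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample


def boxplot_summary(values):
    """Quartiles plus the usual 1.5 * IQR whisker fences used by many boxplot tools."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < low or x > high]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each half-open bin [left, left + bin_width)."""
    counts = {}
    for x in values:
        left = start + ((x - start) // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


print(boxplot_summary(data))
# → {'q1': 4.0, 'median': 4.5, 'q3': 5.5, 'iqr': 1.5, 'outliers': [9]}

width = 2
for left, count in histogram_counts(data, width).items():
    print(f"[{left}, {left + width}) {'#' * count}")  # a text-mode histogram bar
```

The value 9 lies above the upper fence 5.5 + 1.5 * 1.5 = 7.75, so a boxplot would draw it as a separate point.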
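For the Estimation item, the standard error of the mean is s / sqrt(n), the margin of error is a critical value times the standard error, and the confidence interval is the estimate plus or minus that margin. The sketch below assumes a large-sample normal (z) critical value from `statistics.NormalDist`; for a sample as small as the hypothetical one used here, a t critical value with n - 1 degrees of freedom (for example `scipy.stats.t.ppf(0.975, df=n - 1)`) would normally be preferred, which widens the interval slightly. The function name is hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample: list[float], confidence: float = 0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # sample SD / sqrt(n)
    z_critical = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin_of_error = z_critical * standard_error
    interval = (mean - margin_of_error, mean + margin_of_error)
    return mean, standard_error, margin_of_error, interval


# Hypothetical measurements.
sample = [2.1, 2.5, 1.9, 2.8, 2.2, 2.6, 2.4, 2.0, 2.3, 2.7]
mean, se, moe, (lo, hi) = mean_confidence_interval(sample)
print(f"mean={mean:.3f}  SE={se:.3f}  margin={moe:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```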
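The Hypothesis testing concepts fit together in a one-sample z-test: the p-value is the probability, under the null hypothesis, of a test statistic at least as extreme as the one observed, and the result is statistically significant when p falls below the significance level alpha (equivalently, when z falls outside the region of acceptance, |z| < 1.96 for alpha = 0.05). The sketch below assumes the population standard deviation is known; the function names and data are hypothetical. Its simulation draws data for which the null hypothesis is true, so every rejection is a Type I error, and the rejection rate should land near alpha.

```python
import math
import random
import statistics
from statistics import NormalDist


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mean == mu0, assuming a known population sigma."""
    n = len(sample)
    z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def type_i_error_rate(mu0=0.0, sigma=1.0, n=30, alpha=0.05, trials=5_000, seed=0):
    """Simulate data where H0 is true and measure how often it is wrongly rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        _, p = one_sample_z_test(sample, mu0, sigma)
        if p < alpha:
            rejections += 1
    return rejections / trials


sample = [5.4, 4.9, 5.8, 5.6, 5.1, 5.9, 5.3, 5.7]  # hypothetical measurements
z, p = one_sample_z_test(sample, mu0=5.0, sigma=0.5)
alpha = 0.05
print(f"z={z:.2f}, p={p:.4f}, reject H0: {p < alpha}")
print("simulated Type I error rate:", type_i_error_rate())
```

Lowering alpha reduces the Type I error rate but, for a fixed sample size, raises the chance of a Type II error (failing to reject a false null hypothesis).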
Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, Julia, etc., and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. For the latest updates and blogs, follow us on Twitter. I would love to connect with you on LinkedIn. Check out my latest book titled First Principles Thinking: Building winning products using first principles thinking. Check out my other blog, Revive-n-Thrive.com
Posted in Big Data.
