Posts

Showing posts from July, 2021

Inferential Statistics in Data Science

Experiment → An uncertain situation that can have multiple outcomes. A coin toss is an experiment.
Outcome → The result of a single trial. If the coin lands heads up, the outcome of the coin toss experiment is “Heads”.
Event → One or more outcomes of an experiment. “Tails” is one of the possible events for this experiment.

Basic Probability
The chance of something happening or, in more formal terms, the “likelihood of an event or sequence of events occurring”. Examples: tossing a coin, rolling a die.

Conditional Probability
The probability of an event occurring given that another event has already occurred. For example, consider picking 3 blue balls, one after another and without putting any back, from a box that has 5 red and 5 blue balls. The probability of picking the first blue ball is 5/10 = 1/2. We’re left with 9 balls in total, 4 of them blue, so the probability of picking the second blue ball is 4/9. Similarly, the probability of picking the third blue ball is 3/8. The final probability is 1/2 * 4/9 * 3/8 = 1/12 ≈ 0.0833, or about 8.3%.

Probability Density function and Prob...
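The ball-picking example can be checked with a few lines of Python. This is only a minimal sketch: the function name `prob_all_blue` and its parameters are made up for illustration, and it assumes the balls are drawn without replacement, as in the example.

```python
from fractions import Fraction


def prob_all_blue(blue, red, picks):
    """Probability that `picks` draws without replacement are all blue.

    Hypothetical helper: multiplies the conditional probability of each
    draw, given that every earlier draw was blue.
    """
    prob = Fraction(1)
    for already_drawn in range(picks):
        blue_left = blue - already_drawn
        total_left = blue + red - already_drawn
        prob *= Fraction(blue_left, total_left)
    return prob


p = prob_all_blue(blue=5, red=5, picks=3)
print(p)         # exact fraction
print(float(p))  # decimal approximation
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with the hand calculation of 1/2 * 4/9 * 3/8.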

Descriptive Statistics in Data Science

Topics: Measure of Central Tendency, Measure of Spread, Dependence.

Measure of Central Tendency:
Mean → The average of a set of data points.
Median → The middle element of the data points once they are sorted in ascending order.
Mode → The data point that appears most often in a set of data points.

Measure of Spread:
Standard Deviation (SD) → A measure of the typical distance between the mean and each data point, computed as the square root of the variance.
Variance → A measure of how far the values in the dataset are from the mean: the average of the squared deviations from the mean (the square of the SD).
Range → The maximum value minus the minimum value in a set of data points.
Percentile → A representation of the position of a value within a dataset (the dataset should be sorted in ascending order).
Quartiles (Q1, Q2, Q3) → Divide a complete dataset into 4 quarters (the dataset should be sorted in ascending order). Q1, Q2, and Q3 are the 25th, 50th, and 75th percentiles of the dataset. Q2 is the median value of the dataset (fig 1).
fig 1: Quartiles and Percentiles
Interquartile Range (IQR) → ...
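The measures of central tendency and spread listed above can be tried out with Python's standard `statistics` module. The sketch below uses a made-up sample (`data` is not from the post) and treats it as a full population, so it calls `pvariance` and `pstdev`; for a sample drawn from a larger population, `variance` and `stdev` would be the usual choice. `statistics.quantiles` needs Python 3.8 or later, and its default interpolation is only one of several quartile conventions.

```python
import statistics

# Hypothetical data set, chosen so the results are easy to verify by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # 5
median = statistics.median(data)        # 4.5 (average of the two middle values)
mode = statistics.mode(data)            # 4 (appears three times)

variance = statistics.pvariance(data)   # 4 (mean squared deviation from the mean)
sd = statistics.pstdev(data)            # 2.0 (square root of the variance)
data_range = max(data) - min(data)      # 7

# Quartiles: the three cut points that split the sorted data into 4 quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(mean, median, mode)
print(variance, sd, data_range)
print(q1, q2, q3)
```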

Hypothesis Testing in Data Science

Other names are A/B testing, confirmatory analysis, and significance testing. Generally, population parameters (standard deviation, maximum, minimum, and so on) are unknown in practice. However, we do have hypotheses about what the true values are. Hypothesis testing is a set of methods for evaluating a hypothesis about a population parameter based on the available sample statistics. A hypothesis test involves 2 statements: the null hypothesis and the alternate hypothesis.

Null Hypothesis (H0): A general statement about the population parameters that is assumed to be true unless there is strong evidence for the opposite statement. The default statement is that there is no difference between the measured phenomena, or no association among groups.

Alternate Hypothesis (H1 or Ha): The opposite of the null hypothesis. The default statement is that there is a difference between the groups.

Testing types:
T-Test
Also called Student’s T-Test. If the sample size is less than 30 ...
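To make the null and alternate hypotheses concrete, here is a minimal sketch of the textbook one-sample t statistic, t = (sample mean - mu0) / (s / sqrt(n)), where H0 says the population mean equals `mu0`. The function name, the sample values, and `mu0` are all assumptions made for illustration; the excerpt above is cut off, so this is not necessarily the variant of the T-Test the post goes on to describe.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0.

    Hypothetical helper using the standard textbook formula.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (s / math.sqrt(n))


# Made-up measurements; H0: the population mean is 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7]
t_stat = one_sample_t(sample, mu0=5.0)
degrees_of_freedom = len(sample) - 1

print(t_stat, degrees_of_freedom)
```

The statistic would then be compared against a critical value of the t distribution with n - 1 degrees of freedom. The standard library does not provide that distribution, so the lookup step is left out here.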