A histogram, also called a frequency histogram, is just like a bar graph, except that we collect the data into buckets or bins, and then sketch a bar for each bucket. Each bucket needs to cover a range of the same width, so that the heights of the bars can be compared fairly. A stem-and-leaf plot (also called a stem plot) is just another way to summarize data. It’s similar to a histogram, because both types of charts group data points together, and both are good ways to show how many data points fall into a certain category or range.
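As a rough illustration of the bucketing idea, the sketch below sorts values into equal-width bins and prints a text bar for each one. The data, the starting value, the bin width, and the function name `histogram_counts` are all made up for this example.

```python
def histogram_counts(data, start, width, n_bins):
    """Count how many values fall into each equal-width bin [lo, lo + width)."""
    counts = [0] * n_bins
    for x in data:
        idx = int((x - start) // width)  # which bin this value lands in
        if 0 <= idx < n_bins:            # ignore values outside the bins
            counts[idx] += 1
    return counts

# Hypothetical test scores, grouped into bins 50-59, 60-69, ..., 90-99
scores = [52, 57, 61, 64, 68, 70, 71, 75, 79, 83, 88, 94]
counts = histogram_counts(scores, start=50, width=10, n_bins=5)

for i, c in enumerate(counts):
    lo = 50 + 10 * i
    print(f"{lo}-{lo + 9}: {'#' * c}")
```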
We’re familiar now with displaying data in a two-way table. But sometimes it’s helpful to express the data in a two-way table as percentages. If we want to express percentages, then we just need to change the two-way table into a relative frequency table, which is a table that shows percentages instead of actual counts.
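One way to picture the conversion is to divide every count by the grand total. This is only a sketch: the table below (pet type against opinion) is invented, and relative frequencies can also be taken by row or by column instead of by the grand total.

```python
# Hypothetical two-way table of counts: rows are pets, columns are opinions
counts = {
    "cats": {"likes": 30, "dislikes": 10},
    "dogs": {"likes": 45, "dislikes": 15},
}

# Grand total of all counts in the table
total = sum(sum(row.values()) for row in counts.values())

# Relative frequency table: each count as a fraction of the grand total
relative = {
    pet: {opinion: n / total for opinion, n in row.items()}
    for pet, row in counts.items()
}

for pet, row in relative.items():
    print(pet, {opinion: f"{share:.0%}" for opinion, share in row.items()})
```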
We looked at measures of central tendency, which we saw were various ways of representing the “middle” of a data set. But central tendency isn’t the only thing we’re interested in when it comes to data. We also want to know about spread, which is how, and by how much, our data set is spread out around its center. We also call measures of spread measures of dispersion, or scatter.
We’ve seen how we can express data in a two-way table, and then translate that data into bar and line graphs. But we can also express data from a two-way table in a different visualization, called a Venn diagram. Venn diagrams help us show the overlap, or lack of overlap, in categorical data.
A joint distribution is a table of percentages similar to a relative frequency table. The difference is that, in a joint distribution, we show the distribution of one set of data against the distribution of another set of data. In this lesson we’ll look at joint, marginal, and conditional distributions.
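To make the three terms concrete, here is a small sketch with an invented joint distribution (class year against drink preference). The marginal distributions come from summing across a row or column, and a conditional distribution comes from dividing a joint value by the matching marginal.

```python
# Hypothetical joint distribution: each entry is a share of the grand total
joint = {
    "junior": {"tea": 0.20, "coffee": 0.30},
    "senior": {"tea": 0.10, "coffee": 0.40},
}

# Marginal distribution of class year: sum across each row
marginal_year = {year: sum(row.values()) for year, row in joint.items()}

# Marginal distribution of drink: sum down each column
marginal_drink = {}
for row in joint.values():
    for drink, share in row.items():
        marginal_drink[drink] = marginal_drink.get(drink, 0.0) + share

# Conditional distribution of drink, given that the student is a junior
drink_given_junior = {
    drink: share / marginal_year["junior"]
    for drink, share in joint["junior"].items()
}

print(marginal_year, marginal_drink, drink_given_junior)
```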
Box-and-whisker plots (also called box plots) are a great way to represent a data set when we want to show the median and spread of the data at the same time. The median is represented by the line inside the box, and the first and third quartiles are represented by the edges of the box.
A density curve is technically the smooth line that encloses a distribution. We call it a distribution because the area under the curve shows us the distribution of our data. In this lesson we’ll look at distributions with different shapes, like symmetric and normal distributions, and skewed distributions.
Sometimes we’ll need to find the probability that at least one of two events occurs within one experiment. Remember that an event is a specific collection of outcomes from the sample space. For example, what’s the probability that we roll a pair of 6-sided dice and get either at least one 1, or an even sum when we add the dice together? This is when we use the addition rule for probability.
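The dice question can be checked by listing all 36 outcomes and applying the addition rule, P(A or B) = P(A) + P(B) - P(A and B). The sketch below assumes two fair dice; the names `A`, `B`, and `prob` are just labels chosen for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of fair 6-sided dice
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if 1 in o}           # at least one die shows a 1
B = {o for o in outcomes if sum(o) % 2 == 0}  # the sum of the dice is even

def prob(event):
    """Probability of an event as an exact fraction of the sample space."""
    return Fraction(len(event), len(outcomes))

# Addition rule: subtract the overlap so it isn't counted twice
p_union = prob(A) + prob(B) - prob(A & B)
print(prob(A), prob(B), prob(A & B), p_union)  # 11/36 1/2 5/36 2/3
```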
We can do more than just calculate the probability of pulling exactly 3 red marbles in 5 total pulls. For any binomial random variable, we can also calculate something like the probability of pulling at least 3 red marbles, or the probability of pulling no more than 3 red marbles.
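"At least" and "no more than" probabilities are sums of "exactly" probabilities. The sketch below assumes 5 pulls with replacement and a made-up chance of 0.5 of pulling red on each pull; neither number comes from the lesson.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.5  # hypothetical: 5 pulls, half the marbles are red

# P(X >= 3): add up the chances of exactly 3, 4, and 5 red marbles
p_at_least_3 = sum(binom_pmf(k, n, p) for k in range(3, n + 1))

# P(X <= 3): add up the chances of exactly 0, 1, 2, and 3 red marbles
p_at_most_3 = sum(binom_pmf(k, n, p) for k in range(0, 4))

print(p_at_least_3, p_at_most_3)  # 0.5 0.8125
```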
Bayes’ theorem, also known as Bayes’ law or Bayes’ rule, tells us the probability of an event, given prior knowledge of related events that occurred earlier. To simplify Bayes’ theorem problems, it can be really helpful to create a tree diagram. If you’re ever having trouble figuring out a conditional probability problem, a tree diagram is a great tool to fall back on, because it shows all of the sample space of the problem.
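A tree diagram for a two-stage problem has one branch per combination of outcomes, and Bayes’ theorem divides one branch by the total of the branches that match what we observed. The sketch below uses invented numbers for a screening test (1% prevalence, 95% true positive rate, 5% false positive rate) purely as an illustration.

```python
from fractions import Fraction

# Hypothetical first-stage probabilities
p_disease = Fraction(1, 100)
p_healthy = 1 - p_disease

# Hypothetical second-stage (conditional) probabilities
p_pos_given_disease = Fraction(95, 100)
p_pos_given_healthy = Fraction(5, 100)

# Multiply along each branch of the tree that ends in a positive test
branch_disease_pos = p_disease * p_pos_given_disease
branch_healthy_pos = p_healthy * p_pos_given_healthy

# Bayes' theorem: P(disease | positive) = one branch / all positive branches
posterior = branch_disease_pos / (branch_disease_pos + branch_healthy_pos)
print(posterior, float(posterior))
```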
Now we want to look at what happens when we combine two data sets, either by adding them or subtracting them. When we’re forming a linear combination of independent random variables, we can find the mean and standard deviation of the combination using the means and standard deviations of the individual variables.
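For two independent random variables, the means add or subtract, while the variances always add, so the standard deviation of the combination is the square root of the sum of the squared standard deviations. The sketch below assumes independence; the function name `combine_independent` and the input values are hypothetical.

```python
from math import sqrt

def combine_independent(mean_x, sd_x, mean_y, sd_y, subtract=False):
    """Mean and standard deviation of X + Y (or X - Y), assuming independence."""
    mean = mean_x - mean_y if subtract else mean_x + mean_y
    # Variances add whether we add or subtract the variables
    sd = sqrt(sd_x**2 + sd_y**2)
    return mean, sd

# Hypothetical variables: X has mean 10 and sd 3, Y has mean 4 and sd 4
print(combine_independent(10, 3, 4, 4))                 # (14, 5.0)
print(combine_independent(10, 3, 4, 4, subtract=True))  # (6, 5.0)
```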
Remember that for a binomial random variable X, we’re looking for the number of successes in a fixed number of trials. For a geometric random variable, most of the conditions we put on the binomial random variable still apply: 1) each trial must be independent, 2) each trial can be called a “success” or “failure,” and 3) the probability of success on each trial is constant.
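Under one common convention, a geometric random variable counts the trials up to and including the first success, so P(X = k) = (1 - p)^(k - 1) · p. The sketch below assumes that convention and uses rolling a fair die until a 6 appears as a made-up example.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(first success happens on trial k), with success chance p per trial."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # k - 1 failures in a row, followed by one success
    return (1 - p) ** (k - 1) * p

p_six = Fraction(1, 6)  # hypothetical: chance of rolling a 6 on a fair die
print(geometric_pmf(3, p_six))  # 25/216
```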
In the last section we talked about the regression line, and how it was the line that best represented the data in a scatterplot. In this section, we’re going to get technical about different measurements related to the regression line.
Line graphs are really similar to bar graphs. In fact, to turn a bar graph into a line graph, all you have to do is connect the middle of the top of each bar to the middle of the top of the bar beside it with a straight line, and you’ll form the line graph. Ogives are like cumulative line graphs.
Remember that “bi” means two, so a binomial variable is a variable that can take on exactly two values. A coin is the most obvious example of a binomial variable because flipping the coin can only result in two values: heads or tails.
Shifting the data set by a constant k means adding k to every value in the data set, or subtracting k from every value in the data set. On the other hand, scaling the data set by a constant k means multiplying or dividing every value in the data set by k.
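A quick experiment shows what each operation does to the center and spread: shifting moves the mean by k and leaves the standard deviation alone, while scaling multiplies both by k (for positive k). The data set and the value of k below are invented for the example.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
k = 3

shifted = [x + k for x in data]  # shift: add k to every value
scaled = [x * k for x in data]   # scale: multiply every value by k

print(mean(data), pstdev(data))        # 5 2.0
print(mean(shifted), pstdev(shifted))  # 8 2.0
print(mean(scaled), pstdev(scaled))    # 15 6.0
```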
The simplest kind of information we’ll work with in this course is a set of individuals with one or more properties, called variables. The individuals are the items in the data set and can be cases, things, people, etc. When we construct a table, we want to think about whether we have more individuals or more variables. We’ll usually put whichever we have more of down the side of the table, so that the table is taller rather than wider.
Up to now we’ve been talking about statistics, which is all about data, and how to display, summarize, and analyze data. Now we’ll transition into probability, which is all about how likely it is that some event will occur.
The reason we study statistics and probability together is that when we collect data as part of a statistical study, we want to be able to use what we know about probability to say how likely it is that our results are reliable. So in that way, statistics and probability go hand-in-hand.
Like the binomial distribution, the Poisson distribution models a discrete random variable, and it’s particularly useful for finding the probability that a specific number of events will occur in a given period of time.
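The Poisson probability of seeing exactly k events, when the average number of events per period is λ, is P(X = k) = e^(-λ) · λ^k / k!. The sketch below evaluates that formula directly; the rate of 3 events per period is a made-up value.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in a period) when the average rate is lam per period."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # hypothetical average of 3 events per period

# Probability of each count from 0 to 5 events in one period
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```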
To find RMSE, you’ll find the residual for each data point, then square it. You’ll add up all of those squared residuals, and then divide by n-1, just like when we were taking sample standard deviation, instead of population standard deviation. Then you’ll take the square root of that result, and you’ll get the standard deviation of the residuals.
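The steps above can be sketched as follows. The data points and the line ŷ = 2x are invented for the example, and the n-1 divisor follows the description in this lesson; other references divide by n or by n-2, so treat the divisor as a convention to confirm.

```python
from math import sqrt

def rmse(actual, predicted):
    """Standard deviation of the residuals, using the n - 1 divisor."""
    # Residual = actual value minus the value the line predicts
    residuals = [a - p for a, p in zip(actual, predicted)]
    sum_sq = sum(r**2 for r in residuals)  # add up the squared residuals
    return sqrt(sum_sq / (len(actual) - 1))

# Hypothetical data and a hypothetical regression line y-hat = 2x
x = [1, 2, 3, 4, 5]
y = [3, 3, 7, 7, 10]
y_hat = [2 * xi for xi in x]

print(rmse(y, y_hat))  # 1.0
```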