WEBVTT
1
00:00:00.000 --> 00:00:06.640
Cumulative frequency curves. This screencast assumes that you know what a
2
00:00:06.640 --> 00:00:10.680
grouped frequency distribution is and that you're familiar with terms like
3
00:00:10.680 --> 00:00:17.040
range and median. I will show how to find the cumulative frequencies from the
4
00:00:17.040 --> 00:00:23.560
frequency table, how to write down the upper class boundaries, how to plot the
5
00:00:23.560 --> 00:00:28.440
cumulative frequency curve. Then I'll show how to find the median from the
6
00:00:28.440 --> 00:00:32.520
cumulative frequency curve and also how to find the upper and lower
7
00:00:32.520 --> 00:00:37.920
quartiles, and finally we'll calculate the interquartile range and I'll try and
8
00:00:37.920 --> 00:00:44.240
explain what it means. Cumulative frequency is simply the running total of the
9
00:00:44.240 --> 00:00:49.640
frequencies. To explain what I mean, let's look at a concrete example. Here's some
10
00:00:49.640 --> 00:00:56.120
data on the results from some cress seedlings grown on soil. There were seven
11
00:00:56.120 --> 00:01:01.640
cress seedlings between 40 and 45 millimeters in height. To find the
12
00:01:01.640 --> 00:01:06.520
cumulative frequencies you just add a row for the cumulative frequencies. The
13
00:01:06.520 --> 00:01:11.200
first cumulative frequency is seven because it's zero plus seven. The next
14
00:01:11.200 --> 00:01:16.920
cumulative frequency is 19 because that's seven plus 12. The third
15
00:01:16.920 --> 00:01:22.440
cumulative frequency is 7 plus 12 plus 15, which gives 34, but you could work that
16
00:01:22.440 --> 00:01:27.080
out by taking 15 and adding it to 19 which itself is the total of the
17
00:01:27.080 --> 00:01:34.680
previous two numbers. And finally, just add three to the 34 to get 37.
18
00:01:34.680 --> 00:01:40.280
That 37 should actually be equal to the total of the
19
00:01:40.280 --> 00:01:46.000
frequencies and that's a way of checking the result. Next we need to identify
20
00:01:46.000 --> 00:01:49.400
upper class boundaries, as that's what we plot the cumulative frequencies
21
00:01:49.400 --> 00:01:54.280
against when you draw the cumulative frequency curve. The upper class
22
00:01:54.280 --> 00:01:58.880
boundary of the first interval is the largest value that can be allocated to
23
00:01:58.880 --> 00:02:03.440
the first interval. Notice that the first interval is x less than or equal to 45. That
24
00:02:03.440 --> 00:02:08.960
makes the first upper class boundary 45. By the same token, the next upper class
25
00:02:08.960 --> 00:02:17.320
boundary is 50. The next one is 55 and then the final one is 60. These
26
00:02:17.320 --> 00:02:20.080
intervals are written in a particularly straightforward fashion. Some
27
00:02:20.080 --> 00:02:23.280
intervals can be written in ways that make it slightly harder to work out
28
00:02:23.280 --> 00:02:29.000
the upper class boundaries. There are examples of that on the handout. Now we
29
00:02:29.000 --> 00:02:32.320
need to actually plot the cumulative frequencies against the upper class
30
00:02:32.320 --> 00:02:37.000
boundaries to be able to draw the curve. This is what the data we're plotting
31
00:02:37.000 --> 00:02:40.920
looks like. You don't need to draw a new table in exams. There's normally a
32
00:02:40.920 --> 00:02:45.120
table you fill in. The upper class boundaries are plotted on the horizontal axis
33
00:02:45.120 --> 00:02:50.680
and the cumulative frequency is always plotted on the vertical axis. A
34
00:02:50.680 --> 00:02:55.640
suitable graph scale might look like this. Notice how the height axis goes from 35
35
00:02:55.640 --> 00:02:59.960
up to 60 millimeters because that suits the data but the cumulative frequency
36
00:02:59.960 --> 00:03:06.720
axis always starts at zero. To plot the first point I go along to 45
37
00:03:06.720 --> 00:03:12.160
millimeters on the horizontal axis and up to seven on the vertical axis. I'm
38
00:03:12.160 --> 00:03:16.800
using enormous blobs to get around YouTube's resolution. You'd be using small
39
00:03:16.800 --> 00:03:24.280
neat crosses. Along to 50 and up to 19 gives us the second blob. Along to 55
40
00:03:24.280 --> 00:03:31.600
and up to 34 gives us the third blob, and the last blob is along to 60 and up to
41
00:03:31.600 --> 00:03:39.800
37. Then we add a point at 40 and zero cumulative frequency. I chose
42
00:03:39.800 --> 00:03:45.600
40 millimeters as the zero point because that's 45 minus 5 or an interval
43
00:03:45.600 --> 00:03:51.240
width of 5 before the first upper class boundary. This extra point provides us
44
00:03:51.240 --> 00:03:56.320
with a starting point for the curve. Finally you draw a smooth curve through the
45
00:03:56.320 --> 00:04:03.920
points. Don't join the points dot to dot. I'm now going to remove my big orange
46
00:04:03.920 --> 00:04:10.960
blobs. You can see the smooth S-shaped curve clearly. Most data will give you
47
00:04:10.960 --> 00:04:17.320
an S-shaped curve of some kind. Once you have a cumulative frequency curve you
48
00:04:17.320 --> 00:04:22.960
can use it to read off various values, including the median. Remember that the
49
00:04:22.960 --> 00:04:28.000
median is the value of the middle data item. Now 37 was our total frequency and
50
00:04:28.000 --> 00:04:36.680
37 divided by 2 is 18.5. You can see that from this table of the data. So we
51
00:04:36.680 --> 00:04:42.280
can find 18.5 on the cumulative frequency axis. Then we draw a line along
52
00:04:42.280 --> 00:04:48.280
until it meets the curve. Then draw a line down until it crosses the horizontal
53
00:04:48.280 --> 00:04:53.640
axis. You read off the value on the horizontal axis. I estimate it to be 49
54
00:04:53.640 --> 00:05:00.080
millimeters. Remember it is the value on the horizontal axis corresponding to
55
00:05:00.080 --> 00:05:04.160
half the frequency that gives the median. Don't write down half the frequency
56
00:05:04.160 --> 00:05:09.240
and pretend it's the median. That's a very common mistake in tests. Having found
57
00:05:09.240 --> 00:05:12.920
the median, we can go on and find the quartiles. Quartiles as the name
58
00:05:12.920 --> 00:05:18.960
suggests have to do with quarters. The quartiles can give you some information on
59
00:05:18.960 --> 00:05:22.960
how spread out your data is and they can also tell you something about the
60
00:05:22.960 --> 00:05:30.080
middle 50% of your data items. The lower quartile tells us what value 25% of the
61
00:05:30.080 --> 00:05:35.880
sample are less than and therefore 75% are more than. The upper quartile is just the
62
00:05:35.880 --> 00:05:40.160
reverse of that. It tells us what value 75% of the sample are less than and
63
00:05:40.160 --> 00:05:50.040
therefore where the top 25% begin. There are 37 cress seedlings in total, so you
64
00:05:50.040 --> 00:05:59.360
can probably work out what a quarter and three quarters of 37 is. A quarter is
65
00:05:59.360 --> 00:06:07.920
9.25 and three quarters is 27.75. So to recap we now use the graph to find what
66
00:06:07.920 --> 00:06:12.200
lengths or heights of the seedlings correspond to cumulative frequencies of
67
00:06:12.200 --> 00:06:21.400
9.25 and 27.75. To find the lower quartile here's our blank curve again. To find
68
00:06:21.400 --> 00:06:25.760
the lower quartile, just draw a line along from 9.25 until it meets the
69
00:06:25.760 --> 00:06:31.680
curve and drop a line down to the x axis. I make it to be about 45
70
00:06:31.680 --> 00:06:40.040
millimeters. In the same way, to find the upper quartile you draw a line across from
71
00:06:40.040 --> 00:06:45.120
27.75 on the vertical cumulative frequency axis until it meets the curve and
72
00:06:45.120 --> 00:06:50.360
you drop a line down and read off the corresponding upper quartile value. I
73
00:06:50.360 --> 00:06:58.680
make that to be 52 millimeters to the nearest millimeter. The interquartile range, as
74
00:06:58.680 --> 00:07:03.480
the name suggests, is simply the upper quartile minus the lower quartile, and
75
00:07:03.480 --> 00:07:07.840
it tells you how wide the middle half of your frequency distribution would be if you plotted a
76
00:07:07.840 --> 00:07:12.760
histogram or bar chart. The interquartile range is just the difference between
77
00:07:12.760 --> 00:07:17.840
the upper and lower quartiles. Recapping our values so far, we've got
78
00:07:17.840 --> 00:07:25.200
a median of 49 millimeters, a lower quartile of 45 and an upper quartile of 52, so the interquartile
79
00:07:25.200 --> 00:07:33.720
range is 52 minus 45, or 7 millimeters. The interquartile range tells you how the
80
00:07:33.720 --> 00:07:38.720
middle 50% of your distribution is spread out, so you know that roughly half the cress
81
00:07:38.720 --> 00:07:48.560
seedlings lie in the range 45 up to 52 millimeters in height or length. You can
82
00:07:48.560 --> 00:07:53.240
also use the interquartile range to compare distributions for spread. There'll be
83
00:07:53.240 --> 00:07:59.760
more about comparing distributions in a later screencast. In summary, if you're
84
00:07:59.760 --> 00:08:04.800
asked to draw a cumulative frequency chart in the exam, you need to calculate a
85
00:08:04.800 --> 00:08:09.840
running total of the frequencies and plot those against the upper class boundaries,
86
00:08:09.840 --> 00:08:14.920
making sure you've identified them correctly. Draw a smooth curve through the points. That
87
00:08:14.920 --> 00:08:17.800
curve should always be going up. It should be an S shape going up, and it should
88
00:08:17.800 --> 00:08:21.520
never come back down again. If it comes back down again you've probably plotted
89
00:08:21.520 --> 00:08:28.640
the frequencies by mistake. You can read off the median value from your graph,
90
00:08:28.640 --> 00:08:32.280
remember to use the value on the horizontal axis corresponding to half the
91
00:08:32.280 --> 00:08:37.200
total frequency, not half the total frequency itself. Then you can
92
00:08:37.200 --> 00:08:41.720
calculate the lower and upper quartiles and you can use that to find the
93
00:08:41.720 --> 00:08:46.160
interquartile range by subtracting. The interquartile range gives you some
94
00:08:46.160 --> 00:08:51.600
information on how spread out your data is. Your turn: if you download the
95
00:08:51.600 --> 00:08:56.400
handout that goes with this screencast, you'll find a range of different kinds of
96
00:08:56.400 --> 00:09:02.360
frequency table, so you can practice finding the upper class boundaries. Any GCSE
97
00:09:02.360 --> 00:09:07.720
textbook will also have piles of revision examples for this topic. And that's the
98
00:09:07.720 --> 00:09:17.720
end of this screencast.
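
As a numerical cross-check on the readings in this screencast, here is a minimal Python sketch. It assumes the four classes are 40 < x ≤ 45, 45 < x ≤ 50, 50 < x ≤ 55 and 55 < x ≤ 60 mm with frequencies 7, 12, 15 and 3, as in the worked example. One technique is swapped in: instead of reading values off a hand-drawn smooth curve, it joins the plotted points with straight lines and interpolates linearly. That is fine for a check, but it is not how you should draw the curve in an exam. Because of that swap, its estimates come out slightly above the 49, 45 and 52 mm read off the curve, although the interquartile range still comes out at about 7 mm. The helper names `cumulative_frequencies` and `value_at` are hypothetical, chosen for illustration.

```python
from bisect import bisect_left
from itertools import accumulate

# Worked example from the screencast; class intervals assumed to be
# 40 < x <= 45, 45 < x <= 50, 50 < x <= 55 and 55 < x <= 60 (millimetres).
UPPER_BOUNDARIES = [45, 50, 55, 60]
FREQUENCIES = [7, 12, 15, 3]
START = 40  # one class width below the first upper class boundary; plotted at cumulative frequency 0


def cumulative_frequencies(freqs):
    """Running total of the frequencies, e.g. [7, 12, 15, 3] -> [7, 19, 34, 37]."""
    return list(accumulate(freqs))


def value_at(target, xs, cfs):
    """Return the x value where the cumulative frequency reaches `target`,
    joining the plotted points with straight lines (linear interpolation)."""
    if not 0 < target <= cfs[-1]:
        raise ValueError("target must lie between 0 and the total frequency")
    i = bisect_left(cfs, target)  # first point whose cumulative frequency is >= target
    x0, c0 = xs[i - 1], cfs[i - 1]
    x1, c1 = xs[i], cfs[i]
    return x0 + (target - c0) / (c1 - c0) * (x1 - x0)


cfs = cumulative_frequencies(FREQUENCIES)
total = cfs[-1]
assert total == sum(FREQUENCIES)  # the last cumulative frequency must equal the total frequency

xs = [START] + UPPER_BOUNDARIES  # horizontal axis: upper class boundaries (plus the start point)
ys = [0] + cfs                   # vertical axis: cumulative frequency, starting at zero

median = value_at(total / 2, xs, ys)       # half of 37 = 18.5
lower_q = value_at(total / 4, xs, ys)      # a quarter of 37 = 9.25
upper_q = value_at(3 * total / 4, xs, ys)  # three quarters of 37 = 27.75
iqr = upper_q - lower_q

print("cumulative frequencies:", cfs)
print(f"median about {median:.1f} mm, LQ about {lower_q:.1f} mm, "
      f"UQ about {upper_q:.1f} mm, IQR about {iqr:.1f} mm")
```

Running it prints `cumulative frequencies: [7, 19, 34, 37]` followed by `median about 49.8 mm, LQ about 45.9 mm, UQ about 52.9 mm, IQR about 7.0 mm`. The straight-line version always passes exactly through the plotted points, so the gap from the curve readings reflects the curve bending between points, not an arithmetic error.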