WEBVTT
1
00:00:00.000 --> 00:00:04.600
We're going to talk about a whole selection of different ways to solve recurrence equations.
2
00:00:04.600 --> 00:00:15.000
You're going to add them to your set of tools and you'll be able to solve pretty much anything that will come up in most computer science applications.
3
00:00:15.000 --> 00:00:27.400
Recurrence equations come up most often in algorithms, and they come up most often in particular kinds of algorithms where you split the problem into two halves or three thirds, where you divide it.
4
00:00:27.400 --> 00:00:33.400
Not where you break it up into n minus one and one, or n minus two and two. That's usually a bad way to break up a problem.
5
00:00:33.400 --> 00:00:42.400
The best way is to divide it. That generally makes the algorithms faster, and those kinds of algorithms, recursive algorithms that split a problem into fractions of itself.
6
00:00:42.400 --> 00:00:51.400
Those are called divide and conquer algorithms. There's a whole two or three weeks we'll spend on this in algorithms, and we'll talk about how to solve these recurrence equations.
7
00:00:51.400 --> 00:01:02.400
When I get up to it in algorithms I'll say, hey, remember how to do this? You'll say no, but you'll be able to review it at five times the speed you'd need if you'd never seen it before.
8
00:01:02.400 --> 00:01:17.400
That's the idea. You're going to see it now first. All right, questions so far. Let me give you the big picture. I'm going to write down all the different methods that I'm going to go through with you to solve recurrence equations in the order that we're going to do them and the order that we have done them.
9
00:01:17.400 --> 00:01:34.400
These are methods for solving recurrence equations. In doing this today we're going to run into a lot of algebra. Don't get too bogged down in the algebra. If you lose track of the algebra, don't lose track of the technique that's behind it.
10
00:01:34.400 --> 00:01:43.400
Methods for solving recurrence equations. One, the one that I call repeated substitution.
11
00:01:43.400 --> 00:01:59.400
You've got to be good at adding sums and knowing what sums add up to.
12
00:01:59.400 --> 00:02:05.400
I call this the master theorem. It's kind of developed that name over the last few years because people have called it that in a book.
13
00:02:05.400 --> 00:02:15.400
The first time I ever saw this theorem, it wasn't called the master theorem. I'm not sure anybody really knows who created this. I think it just kind of grew with lots of different people contributing to it. I'll tell you what this means.
14
00:02:15.400 --> 00:02:33.400
But basically, somebody went ahead, picked a whole class of recurrence equations, did repeated substitution, and wrote down and proved what the general complexity of the answer was, based on things that you could look at before you even bothered doing the substitution.
15
00:02:33.400 --> 00:02:41.400
So it's a shortcut that saves you from doing all the sums, and it takes care of a whole class of problems that you normally would have to do with method one.
16
00:02:41.400 --> 00:02:47.400
I'll talk about that later. This is usually used for the class of algorithms called divide and conquer.
17
00:02:47.400 --> 00:02:52.400
This is the one that you see most in computer science. That's why they call it the master theorem in algorithms.
18
00:02:52.400 --> 00:02:56.400
Next method.
19
00:02:56.400 --> 00:03:10.400
A change of variables. Sometimes when you change the variables of a recurrence equation, you take something that looked ugly at first and then it's something that's a lot easier to work with.
20
00:03:10.400 --> 00:03:18.400
You can solve the second thing that you came up with and then convert it back to a solution in the original thing by reversing the change of variables.
21
00:03:18.400 --> 00:03:36.400
And I'll show you a couple of examples of that. Four: guess the answer, prove it by induction.
22
00:03:36.400 --> 00:03:44.400
I'm going to check the ones that concentrate primarily not on the perfect closed form for the answer, but on the overall complexity.
23
00:03:44.400 --> 00:03:51.400
The master theorem, and guessing the answer and proving it by induction, is really guessing the overall complexity and proving it by induction.
24
00:03:51.400 --> 00:03:57.400
So I have a recurrence equation. I don't have the slightest idea what it would sum up to, but I have a good feeling that it's a linear time recurrence equation.
25
00:03:57.400 --> 00:04:04.400
I think it's order n. So I go ahead and guess that, and I can prove it's true by induction, without actually calculating the sum.
26
00:04:04.400 --> 00:04:12.400
So that's a very nice powerful technique. There's two more.
27
00:04:12.400 --> 00:04:21.400
There's a whole class of recurrence equations that don't come up that much in computer science. They come up less frequently than divide and conquer, but they do come up nevertheless.
28
00:04:21.400 --> 00:04:27.400
And they're a class that is completely well understood and has a mathematical theory, just like differential equations have a mathematical theory.
29
00:04:27.400 --> 00:04:35.400
And we'll go through this. This will be the standard kind of calculus stuff where you just learn mechanics and you do a few examples and you repeat them.
30
00:04:35.400 --> 00:04:42.400
This is relatively easy from a math point of view. You don't have to discover much. You learn the technique and repeat it.
31
00:04:42.400 --> 00:04:47.400
And it's called linear homogeneous equations.
32
00:04:47.400 --> 00:05:08.400
Close enough. All right. Linear homogeneous equations. And then finally, as if you thought you'd never see it again.
33
00:05:08.400 --> 00:05:20.400
You can solve recurrence equations by doing diagonalization. So you all are going to have to go back and remember how to do that.
34
00:05:20.400 --> 00:05:30.400
And I'll do one example of this here in class. We're actually going to solve the Fibonacci recurrence equation using linear algebra like you did last semester.
35
00:05:30.400 --> 00:05:36.400
Two semesters ago. All right. So here are the methods. We've done this one. Check.
36
00:05:36.400 --> 00:05:43.400
We're going to do this one right now. And if I have time today, I'm going to go through these two because the ideas there are easy.
37
00:05:43.400 --> 00:05:52.400
And then on Monday, I'm going to try to finish these two. And then Tuesday, I can move on to some different things. That's my goal.
38
00:05:52.400 --> 00:05:57.400
Worst case, we get through these by the end of Tuesday. Okay. That's our big picture for the next couple of days.
39
00:05:57.400 --> 00:06:02.400
And you'll get all the intuition behind the proof that you ever wanted through these two examples.
40
00:06:02.400 --> 00:06:09.400
So in some sense, it'll be proof by example, which is no proof at all. But if the example is general enough, it really is a proof.
41
00:06:09.400 --> 00:06:19.400
So here are the two examples that we'll start with. We'll actually make it three. Here's one.
42
00:06:19.400 --> 00:06:34.400
Here's two. These are divide and conquer algorithms that turned up.
43
00:06:34.400 --> 00:06:59.400
These are divide and conquer algorithms that turned up with these recurrence equations. You can tell they're divide and conquer because the smaller case is a fraction of what you started with, rather than something subtracted from it.
44
00:06:59.400 --> 00:07:07.400
Because it's divide by two. Divide by two. Could be divide by three or divide by ten. There's a lot of other things it could be. But in these cases, it's divide by two.
45
00:07:07.400 --> 00:07:16.400
The numbers in front here, for divide and conquer algorithms, need to be constants. Not n's. Otherwise, there are just too many subproblems.
46
00:07:16.400 --> 00:07:24.400
So you'd like to keep this to two and three. And then there's this combining factor, which is the amount of time it takes to take your subproblems and combine them together.
47
00:07:24.400 --> 00:07:34.400
So if you're doing recursion, these represent the recursive calls. And this represents the amount of work your algorithm does in between recursive calls or before or after the recursive call.
48
00:07:34.400 --> 00:07:43.400
Keep in mind, every recursive algorithm you write has a recurrence equation associated with it that says how much time it takes.
49
00:07:43.400 --> 00:07:57.400
Every recursive algorithm you write has a proof by mathematical induction that it actually does what it's supposed to do. You assume that the smaller case does what it's supposed to do, and your connecting code makes sure the bigger case does what it's supposed to do.
50
00:07:57.400 --> 00:08:07.400
So induction is the proof of correctness of any recursive algorithm. The recurrence equation is the time complexity of any recursive algorithm. They all go together. And these are all divide and conquer algorithms.
51
00:08:07.400 --> 00:08:21.400
What about the situation where n is not even, where you're combining... Yes, for the purposes of doing the computation we'll confine ourselves to powers of two, but it won't actually affect the overall complexity.
52
00:08:21.400 --> 00:08:32.400
We could just put in floors and ceilings and it would just look messier. But let's assume that the size of our thing is always 2 to the k, so we can just go ahead and do n over 2, n over 4, and never worry about fractions.
53
00:08:32.400 --> 00:08:44.400
That's a good question. All right. This comes from a real life algorithm. Actually, so does that, but this one's very well known. This comes from an algorithm sometimes called merge sort.
54
00:08:44.400 --> 00:08:57.400
In merge sort, you're going to sort a whole bunch of elements. Okay. And here's how I do it. I split up in half. I give half to Teresa. And I say, go sort this for me, please.
55
00:08:57.400 --> 00:09:08.400
And I give half to Tony. I say, go sort this for me, please. They each sort it. They come back to me. And now I have two sorted lists, each half the size. And what do I do with them?
56
00:09:08.400 --> 00:09:19.400
I give them to Michael. And I say, Michael, merge these for me. And he takes these two sorted lists and he merges them, which is a much faster process than sorting, because merging...
57
00:09:19.400 --> 00:09:38.400
It's kind of just like dealing cards. You compare, you keep a pointer at the top of each list, and whichever one wins the comparison goes next, interleaving the cards as you go. It's proportional to the time that it takes to run through both lists. It takes linear time to merge.
58
00:09:38.400 --> 00:09:55.400
So this is the merging step. And here are the two recursive calls each half the size. This comes from merge sort. There's lots of lots of algorithms that have the same recurrence equation.
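A minimal sketch of the merge sort just described: split in half, sort each half recursively, then merge in linear time, giving T(n) = 2T(n/2) + n. The function names are mine, not from the lecture.

```python
def merge(left, right):
    """Merge two sorted lists, walking a pointer down each one.
    Runs in time proportional to the total length of both lists."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # one side is exhausted; append the rest
    result.extend(right[j:])
    return result

def merge_sort(xs):
    """Split in half, recursively sort each half, merge the results."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))
```

The two recursive calls are the 2T(n/2) term; the merge is the + n "gluing" term.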
59
00:09:55.400 --> 00:10:04.400
Here's another one with the same recurrence equation. We'll call this "I made it up on the spot" sort. All right. We take a list of numbers. We want to sort it.
60
00:10:04.400 --> 00:10:14.400
So I find the median. That's the middle number. Not the average, the middle number. I take all the numbers that are bigger than that.
61
00:10:14.400 --> 00:10:23.400
And I give them to Peter and I say, sort these for me. I take all the numbers that are equal or smaller than that. And I give them to Neon and I say, sort these for me.
62
00:10:23.400 --> 00:10:31.400
And they come back with sorted lists. And what do I do with them? I just stick them together and I'm done. So where's the work? Still two subproblems.
63
00:10:31.400 --> 00:10:45.400
Each one is half the size. Still, this part is the same. But where does this n come from? Right. Finding the median. It's the work that I have to do at the beginning to find that middle spot, to find that center element.
64
00:10:45.400 --> 00:10:58.400
It's not obvious at all how to do that in linear time. But you can do it in linear time. A very clever algorithm does it in linear time. So that's another algorithm that has the same recurrence equation.
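A sketch of that improvised median-split sort. Two assumptions of mine: `statistics.median_low` stands in for the linear-time selection algorithm the lecture alludes to (it is not linear time itself), and ties with the median go in the middle so the recursion always shrinks.

```python
import statistics

def median_split_sort(xs):
    """Sort by splitting around the median, then recursing on each side.
    The median step here is a stand-in: a real implementation would use
    a linear-time selection algorithm (e.g. median of medians)."""
    if len(xs) <= 1:
        return list(xs)
    m = statistics.median_low(xs)            # stand-in for linear-time selection
    smaller = [x for x in xs if x < m]
    equal   = [x for x in xs if x == m]      # keep ties out of the recursion
    bigger  = [x for x in xs if x > m]
    return median_split_sort(smaller) + equal + median_split_sort(bigger)
```

Same shape as merge sort: two half-size subproblems, plus linear work per level, so the same recurrence T(n) = 2T(n/2) + n.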
65
00:10:58.400 --> 00:11:06.400
We're going to solve this and see what it equals. You should still know how to do what we're about to do. But it would help to be able to short-circuit it if you can.
66
00:11:06.400 --> 00:11:15.400
We're going to do the same thing here. I'm not going to talk about algorithms anymore. But this comes from a problem where you can split it into halves. But you have to give it to three different people.
67
00:11:15.400 --> 00:11:27.400
It's a little bit like that Towers of Hanoi problem, right? You split it into halves but hand them to three separate people, half a Towers of Hanoi each, and then do a little work connecting them.
68
00:11:27.400 --> 00:11:39.400
Okay. Let's do this one first. The technique again is to get rid of these capital T's. I'm going to do this just enough to get a pattern and then we're going to write down what our pattern is.
69
00:11:39.400 --> 00:11:49.400
So T of n over 2 is 2 T of n over 4 plus n over 2. That's using this equation again in order to get a value for T of n over 2.
70
00:11:49.400 --> 00:11:59.400
And now we're going to take that value and substitute it in place of T of n over 2 and get a new equation for T of n. It's going to be 2 times all of this.
71
00:11:59.400 --> 00:12:15.400
So it's 2 squared T of n over 4, plus n, plus 2 times n over 2. That's our pattern.
72
00:12:15.400 --> 00:12:27.400
Okay. So far. If I do it many many times, who thinks they can see the pattern already? Anybody see the pattern already? I know Sam sees the pattern already.
73
00:12:27.400 --> 00:12:36.400
Even though he's not telling me. Who else sees the pattern? Who thinks they know it? Christy, you know it?
74
00:12:36.400 --> 00:12:48.400
If you do it r times, this becomes 2 to the r times T of n over 2 to the r. And what does the sum look like?
75
00:12:48.400 --> 00:13:03.400
Well, let's try to leave in all the details we can. So n plus 2 times n over 2 plus... how far does it go? What happens next? Am I messing up?
76
00:13:03.400 --> 00:13:21.400
No, I just need 2 squared n over 2. 2 squared n over 2 or 2 squared n over 2 squared. Let's do it maybe.
77
00:13:21.400 --> 00:13:33.400
Tony, you're way ahead of me. Right. If you do this twice you've got 2 n's. If you do it three times you're going to have 3 n's.
78
00:13:33.400 --> 00:13:43.400
So, let's actually rewrite this. Easy: 2 to the r times T of n over 2 to the r, plus r times n.
79
00:13:43.400 --> 00:13:54.400
And we'd like to make this t disappear. How do we make this t disappear? We need n divided by 2 to the r to be equal to 1.
80
00:13:54.400 --> 00:14:04.400
That means n equals 2 to the r and r equals log base 2 of n. There's a convention that if I mean base 2 I'm just going to write lg.
81
00:14:04.400 --> 00:14:16.400
It saves me from writing the little subscript 2 every time. So r equals log base 2 of n. Let's put it in. This becomes 1.
82
00:14:16.400 --> 00:14:40.400
2 to the log base 2 of n is n. So when r equals log of n I get n plus n log n. And that's big theta of what?
83
00:14:40.400 --> 00:14:55.400
Big theta of n log n, because the n part is smaller than the n log n part. You should know that theorem: if you have the sum of a lot of different functions, then the big theta of the whole sum is the big theta of the biggest one.
84
00:14:55.400 --> 00:15:00.400
You can prove that theorem and you probably know it intuitively already. Just reminding you.
85
00:15:00.400 --> 00:15:10.400
This is big theta n log n. And that's no surprise because as you know sorting algorithms that are good generally take n log n. That's the best you can really do.
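The repeated-substitution result just derived can be checked numerically. One assumption of mine: a base case T(1) = 1, which the transcript doesn't state; with it, the closed form is exactly n + n lg n.

```python
def T(n):
    """The merge sort recurrence T(n) = 2*T(n/2) + n with T(1) = 1,
    restricted to n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

# Closed form from the repeated substitution: T(n) = n + n * lg(n).
for k in range(11):
    n = 2 ** k
    assert T(n) == n + n * k   # here k = lg n
```

The n term comes from winding up the recursion, the n lg n term from the lg n levels of linear-time merging, so the whole thing is big theta of n log n.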
86
00:15:10.400 --> 00:15:17.400
You just said that you only have to look at your biggest term when you have a series.
87
00:15:17.400 --> 00:15:38.400
If you have a sum, I should say a finite sum... Well, you're talking about this, or this. And you're saying this is big theta n squared and this is big theta n cubed.
88
00:15:38.400 --> 00:15:55.400
And well. Good point. I wasn't talking about things like this. I was talking about things that look like this.
89
00:15:55.400 --> 00:16:09.400
Because these things are not in terms of n. So there's an implication here that there's something else going on. But good. Right. OK. Questions about this n log n example.
90
00:16:09.400 --> 00:16:21.400
Try to imagine that there was an n squared here also because that's the third example I want to do. So that when we get to that third example we don't have to go through the whole thing from scratch but we can change the appropriate places more quickly.
91
00:16:21.400 --> 00:16:28.400
We're going to do one example here with n. Then we're going to do an example here with n squared. OK.
92
00:16:28.400 --> 00:16:40.400
Here's the first step, as before: T of n over 2 here. Let me get some help. What's T of n over 2 going to be?
93
00:16:40.400 --> 00:16:51.400
We're going to substitute in for this so we need to know what T n over 2 is. 3 T n over 4 plus n over 2.
94
00:16:51.400 --> 00:17:01.400
Every place you see an n you put in your n over 2. And now we're going to take this value and put it back into here.
95
00:17:01.400 --> 00:17:10.400
So, Anthony, it's good to see you here instead of watching me virtually. Tell me what I get here.
96
00:17:10.400 --> 00:17:34.400
Good. I'm waiting. This n over 2 also gets multiplied by that 3. So you get 3 squared T of n over 4, plus n, plus 3 n over 2. Good.
97
00:17:34.400 --> 00:18:03.400
What happens if I did it again? What do I get? 3 cubed T of n over 2 cubed, plus n, plus 3 n over 2, plus 3 squared n over 2 squared.
98
00:18:03.400 --> 00:18:21.400
This new term comes from the fact that we multiply these old terms by 3 over 2. So this became 3 n over 2, and this became 3 over 2 squared times n.
99
00:18:21.400 --> 00:18:28.400
And then we had the original n from our equation. The next time it would multiply each of these again by 3 over 2.
100
00:18:28.400 --> 00:18:40.400
So let's write down our general result here in terms of r. T of n is 3 to the r times T of n over 2 to the r, plus n times a sum.
101
00:18:40.400 --> 00:18:47.400
And I'm going to factor out the n because it appears in every term exactly once.
102
00:18:47.400 --> 00:18:58.400
Let's write down what we get. 1 plus 3 halves plus 3 halves squared all the way up to what's the last term.
103
00:18:58.400 --> 00:19:15.400
This exponent here is 1 less than the exponent there. So we go all the way up to 3 halves to the r minus 1. Good. That's an r.
104
00:19:15.400 --> 00:19:22.400
All right. So your geometric series experts should be able to add this up.
105
00:19:22.400 --> 00:19:33.400
It goes up to the next exponent, 3 halves to the r. And you divide by the difference between 1 and 3 halves.
106
00:19:33.400 --> 00:19:54.400
The difference between 1 and 3 halves is a half. You divide by a half, and that's multiplying by 2. So this whole part here is 2 times 3 halves to the r, minus this.
107
00:19:54.400 --> 00:20:06.400
If you're not sure where this turned into this, we've done it a few times and you can go back and review it separately.
108
00:20:06.400 --> 00:20:14.400
Make a note to yourself. Review geometric series. The trick is to call this x. Multiply both sides by 3 halves.
109
00:20:14.400 --> 00:20:24.400
Do a subtraction, solve for x, and you'll get this. You could all do it, I know, if you had the time and focused on it.
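The geometric-series shortcut just described can be sketched directly; `Fraction` keeps the arithmetic exact, and the function name is mine.

```python
from fractions import Fraction

def geometric_sum(q, r):
    """Sum 1 + q + q**2 + ... + q**(r-1) via the closed form
    (q**r - 1) / (q - 1), which is what the multiply-by-q-and-subtract
    trick from the lecture produces."""
    return (q ** r - 1) / (q - 1)

q = Fraction(3, 2)
for r in range(1, 10):
    # Matches the term-by-term sum exactly...
    assert geometric_sum(q, r) == sum(q ** i for i in range(r))
    # ...and for q = 3/2, dividing by (q - 1) = 1/2 is multiplying by 2,
    # giving the 2 * ((3/2)**r - 1) form on the board.
    assert geometric_sum(q, r) == 2 * (q ** r - 1)
```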
110
00:20:24.400 --> 00:20:33.400
Most of you. All right.
111
00:20:33.400 --> 00:20:43.400
So here again, we want r to be equal to log n, same thing as before. Because we want this to turn into T of 1, and we want the T of 1 to disappear.
112
00:20:43.400 --> 00:21:00.400
So you get, with r as log n: 3 to the log n, plus n times 2 times the quantity 3 halves to the log n minus 1.
113
00:21:00.400 --> 00:21:07.400
So you can just assume the logs are always base 2, or... Yeah, lg means base 2, and they're all in base 2. Definitely. They're definitely in base 2.
114
00:21:07.400 --> 00:21:17.400
And logs in different bases are all within a constant factor of each other, so it probably won't matter.
115
00:21:17.400 --> 00:21:30.400
Something I mentioned yesterday, I'm going to prove right now in a couple of seconds, and remind you that you learned it before, a couple of months ago.
116
00:21:30.400 --> 00:21:39.400
This is a good thing to know: if you have a number raised to the log base 2 of some other number, then you can basically swap those two numbers and the results are identical.
117
00:21:39.400 --> 00:21:50.400
So how do I convince you this is true? I'm going to resort to something else that I think you will remember. You probably all remember this because I made a big deal about it.
118
00:21:50.400 --> 00:22:06.400
What is log base a of b times log base b of c? Anybody remember that identity? Log base a of c. The b's in some ways cancel. That's an a.
119
00:22:06.400 --> 00:22:14.400
Good. You can prove this with the rules of logarithms, or you can just remember it. But this is going to immediately imply that this one is true.
120
00:22:14.400 --> 00:22:23.400
And there's a lot of ways to see it. This is one way. Let's look at these equations and take the log base 3 of both sides.
121
00:22:23.400 --> 00:22:29.400
The reason we're going to do that is because log base 3 of 3 to the something equals just the exponent.
122
00:22:29.400 --> 00:22:38.400
So take the log base 3 of this left side. What do you get? Log base 2 of n. Taking the log base 3 of both sides.
123
00:22:38.400 --> 00:22:46.400
The log base 3 here is log base 3 of n to the log base 2 of 3.
124
00:22:46.400 --> 00:23:02.400
Question mark. The log of something raised to a power is the power times the log: log base 2 of 3 times log base 3 of n.
125
00:23:02.400 --> 00:23:13.400
Okay, I just brought the exponent out in front. Log base 2 of 3 times log base 3 of n equals log base 2 of n. Yes, that's true.
126
00:23:13.400 --> 00:23:22.400
The 3's cancel, yes, in a way. So the fact that these two are equal depends completely on the fact that this equation is true.
127
00:23:22.400 --> 00:23:31.400
And this equation is true because you all remember that from two months ago. Whatever you remember is true.
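The swap identity on the board, a to the log base 2 of b equals b to the log base 2 of a, can be spot-checked numerically. This is floating-point arithmetic, so I compare with `isclose` rather than exact equality.

```python
import math

# Check a**(log2(n)) == n**(log2(a)) for several bases and arguments.
for a, n in [(3, 10), (3, 1000), (5, 64), (7, 81)]:
    lhs = a ** math.log2(n)
    rhs = n ** math.log2(a)
    assert math.isclose(lhs, rhs, rel_tol=1e-9)

# The identity behind it: log_a(b) * log_b(c) == log_a(c).
for a, b, c in [(2, 3, 10), (3, 5, 7)]:
    assert math.isclose(math.log(b, a) * math.log(c, b), math.log(c, a),
                        rel_tol=1e-9)
```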
128
00:23:31.400 --> 00:23:39.400
Get used to the fact that when you see stuff like this, you can switch the numbers. I'm completely serious because you want to think of it notation wise.
129
00:23:39.400 --> 00:23:46.400
The fact that you can switch these numbers is going to make your manipulation fast. Don't go into the tank and rediscover this every single time you use it.
130
00:23:46.400 --> 00:23:52.400
Make sure that if you have to convince yourself it's true you can. But once you do, turn that into a black box.
131
00:23:52.400 --> 00:24:00.400
Turn the neurons off as far as why it works. And just start thinking about this mechanically. Get it to be automatic.
132
00:24:00.400 --> 00:24:08.400
Because then you'll recognize patterns. So now we're going to go back to here and try to put our automatic gears in gear and use this information.
133
00:24:08.400 --> 00:24:15.400
Let's see. 3 to the log base 2 of n becomes what?
134
00:24:15.400 --> 00:24:26.400
N to the log 3. Well, that's something that looks reasonable: n to some number. You can do log base 2 of 3 on your calculator and figure out what it is.
135
00:24:26.400 --> 00:24:41.400
Plus 2n times... what is this going to be? N to the log of 3 halves.
136
00:24:41.400 --> 00:24:58.400
If you're just looking for big O, you can now drop the minus 1. What about these two? Which one is going to be bigger?
137
00:24:58.400 --> 00:25:19.400
The same? Let's check. Well we'll check it in a minute. Before we check it, I want you to notice the following thing that happens when you do these problems.
138
00:25:19.400 --> 00:25:31.400
You'll always get two pieces in your final answer. You get the piece that came from this part and you get the piece that came from this part.
139
00:25:31.400 --> 00:25:40.400
This is the part that winds up your recursions that keeps track of the structure and this is the part of all the work that it takes to glue those pieces back together.
140
00:25:40.400 --> 00:25:53.400
Sometimes the gluing is the dominant factor. I do a lot of work when all my friends come back and give me the answers and if I include all the gluing that their friends did when they got their answers it all adds up to this.
141
00:25:53.400 --> 00:26:03.400
Sometimes the gluing is trivial and nobody does anything except just pass results back and then the actual construction of the tree and how many people are actually involved becomes the dominant factor.
142
00:26:03.400 --> 00:26:09.400
This is the number of people and the number of recursive calls and the structure of it. This is the gluing that gets done at each step.
143
00:26:09.400 --> 00:26:17.400
The master theorem is a way of deciding which one of these is dominant if either one and then quickly deciding what the big theta should be.
144
00:26:17.400 --> 00:26:22.400
We're doing it by substitution here. We haven't hit the master theorem yet but we'll hit it soon.
145
00:26:22.400 --> 00:26:25.400
We're going to compare these two in this case and see which one is bigger.
146
00:26:25.400 --> 00:26:29.400
So this is n to the log 3 and let's simplify this a little bit.
147
00:26:29.400 --> 00:26:44.400
Sam's right, we don't need the minus one. That's noise in the big theta picture. This is 2n times n to the log of 3 halves. So it's 2 times n to the log of 3 halves plus 1.
148
00:26:44.400 --> 00:27:05.400
Okay. Log of 3 halves is log 3 minus log 2, and log 2 is equal to 1. So the minus 1 and the plus 1 are going to cancel, and these two are exactly the same.
149
00:27:05.400 --> 00:27:14.400
So what's the overall complexity of this algorithm? Neither one dominates. They're both the same. We know what the master theorem is going to tell you to do in these situations.
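The conclusion that both pieces are the same, n to the log base 2 of 3, can be checked against the recurrence itself. One assumption of mine again: T(1) = 1.

```python
def T(n):
    """T(n) = 3*T(n/2) + n with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 3 * T(n // 2) + n

# The substitution gave 3**lg(n) + n * 2 * ((3/2)**lg(n) - 1).
# With n = 2**k this is 3**k + 2 * (3**k - n): the recursion piece and
# the gluing piece are both Theta(3**k) = Theta(n ** lg 3).
for k in range(12):
    n = 2 ** k
    assert T(n) == 3 ** k + 2 * (3 ** k - n)
```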
150
00:27:14.400 --> 00:27:23.400
It's going to tell you, hey, forget about this sum. Why bother adding it together if this is going to be the same thing? Where did this come from?
151
00:27:23.400 --> 00:27:32.400
Here's the original equation.
152
00:27:32.400 --> 00:27:44.400
Where did that number come from? Came from there. Where did that number go? It's in the base here. It just disappeared. But it's supposed to be there. It's log base 2.
153
00:27:44.400 --> 00:27:55.400
If you knew that this was going to be the dominant factor, you could just say the complexity overall is n to the log, base this number, of this number.
154
00:27:55.400 --> 00:28:04.400
That's what the master theorem does. It checks it all in advance and it says in this kind of situation, you can just look at these two numbers and that's your big theta.
155
00:28:04.400 --> 00:28:12.400
And that's what happens here. And I'll give you the criterion, and once you have it you can do it; we'll do a few examples quickly, just by looking at the numbers.
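The lecture hasn't stated the master theorem yet; this is my sketch of the common simple-exponent version, under the assumption that the combining cost is Theta(n to the d), just to show the "look at the numbers" decision procedure.

```python
import math

def master_theorem(a, b, d):
    """For T(n) = a*T(n/b) + Theta(n**d), return a string describing
    Theta(T(n)). Compares d against the critical exponent log_b(a):
    recursion dominates below it, gluing dominates above it, and the
    two tie exactly at it (picking up a log factor)."""
    crit = math.log(a, b)
    if math.isclose(d, crit):
        return f"Theta(n^{d} * log n)"      # the two pieces tie
    if d < crit:
        return f"Theta(n^log_{b}({a}))"     # recursion tree dominates
    return f"Theta(n^{d})"                  # gluing dominates

# The lecture's examples:
assert master_theorem(2, 2, 1) == "Theta(n^1 * log n)"    # merge sort
assert master_theorem(3, 2, 1) == "Theta(n^log_2(3))"     # 3T(n/2) + n
assert master_theorem(3, 2, 2) == "Theta(n^2)"            # 3T(n/2) + n^2
```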
156
00:28:12.400 --> 00:28:18.400
But right now I want you to see the summation. So you see the distinction. All right, questions?
157
00:28:18.400 --> 00:28:29.400
So can you really ignore the 2n there? Yeah, any power that's greater than one, however small.
158
00:28:29.400 --> 00:28:36.400
I mean, you'd have to get to an enormous n for it to be insignificant.
159
00:28:36.400 --> 00:28:39.400
You're saying can I ignore this two here?
160
00:28:39.400 --> 00:28:53.400
The minus one? Oh, absolutely. And yes, it takes an enormous n. But n is strictly little-o of n to the log 3.
161
00:28:53.400 --> 00:28:59.400
This is really bigger than this asymptotically. And I agree that you'd need big n's to see that happen.
162
00:28:59.400 --> 00:29:11.400
But theoretically, this is a worse algorithm than this. Linear is better than this. I should mention one thing just because this is coming up now before I go on to the n squared case and show you what happens.
163
00:29:11.400 --> 00:29:19.400
I want to say this because this gives you a big picture about algorithms and maybe the difficulty that theory sometimes has for catching up with practice.
164
00:29:19.400 --> 00:29:29.400
This is good theory and it gives you a way of organizing algorithms into a hierarchy. But let me give you an example of an algorithm that's a real life algorithm that has a particular complexity.
165
00:29:29.400 --> 00:29:38.400
And the complexity is worse than linear. But for all intents and purposes in a real physical life universe, it will be completely linear.
166
00:29:38.400 --> 00:29:45.400
Even though the theory says it's worse than linear. So that means the theory and the practice are kind of breaking down there. But it's still interesting.
167
00:29:45.400 --> 00:29:57.400
And I'll show you this in just a couple of seconds. This function takes a number and raises two to the power, in a stack, as many times as that number says.
168
00:29:57.400 --> 00:30:18.400
So for example, f of one means you have just one stack of twos on top, so that's four. F of two is going to be... good. F of three is going to be... no, two to the 16.
169
00:30:18.400 --> 00:30:36.400
Just the way I defined it, it's the number of twos in the stack. F of three is two to the 16. It means I've got two and then three more twos in a tower: two to the two to the two to the two. So two to the 16. 64,000 something.
170
00:30:36.400 --> 00:30:56.400
65,536. All right. Powers of two, people. All right. F of four? Yeah, I'll just get my recursive friends to help me. This number is a huge number. This number is bigger than the number of atoms in the universe, probably. It's a big number.
171
00:30:56.400 --> 00:31:08.400
Okay. You're never going to make a computer that has that much memory, even if it's a Gateway. Right. Pentium inside. But two to the 65,000 is a big number.
172
00:31:08.400 --> 00:31:18.400
So once you get to four or five, this is just a huge function. Now here's the inverse of this function. It's called log star n.
173
00:31:18.400 --> 00:31:39.400
If I put in 65,536 for n here, this tells me three. If I put in two to the 65,536 for n here, it tells me four. This number, for all practical size inputs in the universe or in any conceivable universe, is less than or equal to four.
174
00:31:39.400 --> 00:31:53.400
It's just never bigger than that. I know conceivably it grows without bound. If you put in two to the gazillion, this number will sooner or later get as big as you want, but not in real life. In real life, it's four or less.
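The tower function and its inverse can be sketched directly. The function names and the integer-only lg (via `bit_length`, exact for powers of two) are my choices; the indexing follows the board, where f(1) = 4 and log star of 65,536 is 3.

```python
def f(n):
    """Tower of twos as defined on the board: f(1) = 2**2 = 4,
    and each step stacks another two underneath: f(k+1) = 2**f(k)."""
    return 4 if n == 1 else 2 ** f(n - 1)

def log_star(n):
    """The inverse: how many times you apply lg before n drops to 2.
    Uses n.bit_length() - 1 as an exact integer lg for powers of two,
    so it works even on astronomically large inputs like f(4)."""
    count = 0
    while n > 2:
        n = n.bit_length() - 1
        count += 1
    return count
```

For any input you could physically represent, `log_star` returns 4 or less, which is the point of the aside.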
175
00:31:53.400 --> 00:32:01.400
There is an algorithm that uses a set data structure that I talked about briefly when we talked about sets called a union find data structure.
176
00:32:01.400 --> 00:32:10.400
You construct sets, and the two things you're able to do with this data structure are: given an element, find the set that it's in; and given two sets, merge them together.
177
00:32:10.400 --> 00:32:19.400
It's a data structure that you could have as an assignment in Scheme. You could store it in lists if you wanted to. But the way this data structure is implemented, it's very fast.
178
00:32:19.400 --> 00:32:33.400
When you go ahead and solve the minimum spanning tree problem with this data structure, instead of the usual n log n time algorithm for minimum spanning tree, you get an n log star n algorithm for minimum spanning tree.
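A minimal union-find sketch. The class name and the particular heuristics (path compression on find, union by size) are my choices; they are the standard ones that give the near-linear amortized behavior the lecture is describing.

```python
class UnionFind:
    """Disjoint sets over elements 0..n-1, supporting the two operations
    from the lecture: find the set an element is in, and merge two sets."""

    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.size = [1] * n

    def find(self, x):
        # Walk to the root, then compress the path so later finds are fast.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Merge the two sets, hanging the smaller tree under the larger.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
```

With both heuristics, a sequence of m operations costs close to m times a log-star-like factor, which is where the n log star n bound for minimum spanning tree comes from.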
179
00:32:33.400 --> 00:32:49.400
So somebody writes this up, shows a good use of union-find, gets a PhD, says, I've improved the n log n to n log star n, good job. Did they come up with a linear time algorithm? No, this is theoretically worse than linear time.
180
00:32:49.400 --> 00:33:03.400
So practically speaking, we all know that this is four or less for anything in the universe. So practically speaking, it really is linear time. So this begs the question of what a better theory might be when you get into strange things like this.
181
00:33:03.400 --> 00:33:12.400
But it's something you should be aware of. There are weird functions where the asymptotic analysis really doesn't describe what actually happens.
182
00:33:12.400 --> 00:33:28.400
Back to this problem again. We solved this and we said it was big theta n to the log three. And now I want to know what happens.
183
00:33:28.400 --> 00:33:44.400
What happens when the function that glues it together instead of being n turns out to be n squared. I warned you I was going to do this in advance. Maybe some of you actually did it. I am getting a little tired of going through all the details of the algebra. You're probably getting tired of it too.
184
00:33:44.400 --> 00:33:56.400
So if you don't mind me short-circuiting this, I'll tell you what you get in that T of n expression after the r substitutions, and we'll compare it to what we got before.
185
00:33:56.400 --> 00:34:10.400
I get the same thing for the recursive part. That part doesn't change at all. The only part that changes is the gluing part. But here's my new sum for the gluing part.
186
00:34:10.400 --> 00:34:39.400
Instead of n on the outside, I get n squared. And on the inside, instead of 1 plus 3 halves plus 3 halves squared plus 3 halves cubed, all the way up to 3 halves to the r minus 1, what I get is 1 plus 3 fourths plus 3 fourths squared plus 3 fourths cubed, all the way up to 3 fourths to the r minus 1.
187
00:34:39.400 --> 00:34:58.400
Now without going ahead and solving this geometric series, which one of these terms is dominant? This one is. It doesn't matter what this sum ends up being, as long as it's a constant.
188
00:34:58.400 --> 00:35:11.400
This adds up to some constant times n squared. The other part is less than n squared, right? It's n to the one point something. So we don't care about it. It gets dominated by the gluing part.
189
00:35:11.400 --> 00:35:30.400
The master theorem keeps track of which part is going to be dominant and tells you in advance and mentions therefore that in this case where the gluing part is dominant, the complexity has nothing to do with these numbers, but is simply the same complexity as a single gluing stage, big theta n squared.
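You can watch the gluing part dominate numerically. This is my own sketch, assuming a base case of T(1) = 1; the ratio T(n)/n² settles toward the geometric-series constant 1/(1 - 3/4) = 4:

```python
def T(n):
    """The lecture's recurrence with the n^2 gluing term:
    T(n) = 3*T(n//2) + n^2, with an assumed base case T(1) = 1."""
    if n <= 1:
        return 1
    return 3 * T(n // 2) + n * n

# As n grows, T(n)/n^2 approaches 1 + 3/4 + (3/4)^2 + ... = 4,
# so T(n) really is big theta of n squared.
for n in [2**6, 2**10, 2**14]:
    print(n, T(n) / n**2)
```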
190
00:35:30.400 --> 00:35:44.400
Okay, who has a question so far? So there's two possibilities. Either this part's dominant or this part's dominant. So far we have those two possibilities. The third possibility was the first thing I did today.
191
00:35:44.400 --> 00:35:59.400
When I did 2 T of n over 2 and I added n, what was the answer to this recurrence equation? It was big theta of what? N log n.
192
00:35:59.400 --> 00:36:20.400
What happens there is that neither one really dominates. You kind of get the perfect balance of splitting and gluing. And what results from a perfect balance of splitting and gluing is this gluing function gets an extra factor of log n.
193
00:36:20.400 --> 00:36:42.400
Okay, when you design a recursive algorithm and you know all these things in advance, you should know how much slack you have as far as how many sub problems you're going to divide. If you know what the complexity is going to turn out to be based on this number and this number, then you know if I divided into two parts, how many different pieces I can have and still keep the algorithm under n squared.
194
00:36:42.400 --> 00:36:58.400
How much my gluing procedure is allowed to take before it ends up being so dominant that it doesn't make a difference how carefully I split the problems. Knowing this in advance helps you design algorithms, not just analyze them when you're done, but it helps you plan how you want them to be. It's really important.
195
00:36:58.400 --> 00:37:12.400
So that's what the master theorem is going to summarize and I'm going to put it up in the board right now and you got to be careful because there's a lot of conditions in this theorem. Let me be very specific now. I said that one of these is going to dominate and the master theorem tells us which one.
196
00:37:12.400 --> 00:37:29.400
When everything's perfectly balanced, it turns out that the sum still dominates the recursive part, but it doesn't dominate it by being equal to itself. It dominates it with an extra factor of log n, and in some sense that's, I think, the ideal kind of balance.
197
00:37:29.400 --> 00:37:42.400
You could find a case where both terms are equal, where neither one dominates the other. In those cases, we know they're the same, so we just pick one of them; they could be equal the whole time.
198
00:37:42.400 --> 00:37:43.400
Yes.
199
00:37:43.400 --> 00:37:47.400
What was your final answer on the recurrence?
200
00:37:47.400 --> 00:38:00.400
This is big theta n squared because that's dominating that. You don't like that?
201
00:38:00.400 --> 00:38:19.400
It takes the sum in the square brackets. That's four times, three quarters to the r, minus one. This is what I get exactly.
202
00:38:19.400 --> 00:38:34.400
Did I mess up? I don't know either. I thought this just ended up being something smaller, but maybe I'm wrong. Let's check that in a second. Here's the master theorem.
203
00:38:34.400 --> 00:38:57.400
The master theorem takes care of recurrence equations that look like this: some constant number a of subproblems, each some fraction n over b of the size, and some function f of n for gluing the answers together.
204
00:38:57.400 --> 00:39:08.400
It divides it into three categories. One, two, and three.
205
00:39:08.400 --> 00:39:17.400
The categories compare this function to some combination of these numbers, and here's what it compares. Case one.
206
00:39:17.400 --> 00:39:34.400
f of n equals big theta of n to the log base b of a. If f of n and n to the log base b of a are the same, that's one case. The other two cases are when one's bigger and when one's smaller.
207
00:39:34.400 --> 00:39:43.400
We'll do an example in a second. Let me just write down the other cases.
208
00:39:43.400 --> 00:39:55.400
By bigger and smaller here, I mean asymptotically bigger and smaller.
209
00:39:55.400 --> 00:40:04.400
Three cases. Either they're the same asymptotically or one's smaller or one's bigger.
210
00:40:04.400 --> 00:40:18.400
In this case, the overall complexity is n to the log base b of a, times log n. That's like the first example I did today, where you get that extra log n factor.
211
00:40:18.400 --> 00:40:36.400
In this case, it's just plain, that's right, big theta. In this case, it's just plain big theta of n to the log base b of a. That's like the second example I did. That's like the 3 T of n over 2 plus n example.
212
00:40:36.400 --> 00:40:57.400
In the third case, the other one is dominant, namely f of n; that's like the example I did with n squared. Those are the three possibilities. This saves you all the boring work and summarizes it all in one nice result.
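The case analysis can be wrapped up in a little function. This is my own simplification, assuming the gluing function is exactly n to the c, so the polynomial comparison reduces to comparing exponents; it ignores the log-factor gap and the extra condition on f of n that come up later:

```python
import math

def master_theorem(a: int, b: int, c: float) -> str:
    """Classify T(n) = a*T(n/b) + Theta(n^c) by comparing c to log_b(a)."""
    crit = math.log(a, b)  # the critical exponent, log base b of a
    if math.isclose(c, crit):
        return f"Theta(n^{crit:.3g} * log n)"   # balanced: extra log factor
    if c < crit:
        return f"Theta(n^{crit:.3g})"           # recursive part dominates
    return f"Theta(n^{c:g})"                    # gluing part dominates

print(master_theorem(9, 3, 1))   # T(n) = 9T(n/3) + n    -> Theta(n^2)
print(master_theorem(3, 3, 1))   # T(n) = 3T(n/3) + n    -> Theta(n^1 * log n)
print(master_theorem(3, 2, 2))   # T(n) = 3T(n/2) + n^2  -> Theta(n^2)
```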
213
00:40:57.400 --> 00:41:10.400
Let's do some examples. Which one of these is bigger? n squared is bigger. Therefore, this should be big theta of n squared.
214
00:41:10.400 --> 00:41:30.400
The answer to your question is, probably where I said take my word for it, you shouldn't have taken my word for it. I probably got that sum wrong. I did something incorrect there, but it should have worked out to n squared. I'm not sure what I did wrong.
215
00:41:30.400 --> 00:41:45.400
The answer was three. You said you had a minus one; it should have been one minus. And then the one times n squared would be the dominant term.
216
00:41:45.400 --> 00:42:03.400
Oh, yeah, that would have been more helpful. Yeah, I understand what you're saying, Baruch. Right. Baruch said I reversed the minus one. It should have been one minus. And you may very well be right. Whatever it was, the mistake was an algebra mistake.
217
00:42:03.400 --> 00:42:18.400
I know, these are the kind of mistakes you all come to me about and say, well, why did you take off for that? Well, I make these mistakes too, and I apologize. I should have been more careful writing the sum.
218
00:42:18.400 --> 00:42:38.400
Okay, let's try each of these. We need to compare the function to n to the log, base this number, of this number. So let's try to do it. Hey, Joe, what do you know? Help me out.
219
00:42:38.400 --> 00:42:50.400
n to the log of what? n to the log base three of, right. What's the log base three of nine?
220
00:42:50.400 --> 00:42:53.400
Two, three. Two, three.
221
00:42:53.400 --> 00:42:56.400
Sorry.
222
00:42:56.400 --> 00:43:02.400
Three, two. Right. So three.
223
00:43:02.400 --> 00:43:10.400
Thou shalt count to three. Neither shalt thou count to four. We're going to get to that movie next time.
224
00:43:10.400 --> 00:43:19.400
Log base three of nine is two. So that's n squared. So n squared's bigger. That means which case are we talking about?
225
00:43:19.400 --> 00:43:34.400
It's talking about case two. This guy's bigger. That means that the order of this algorithm is n squared. Painless, right? Not only is it painless, but you should know when you're designing the algorithm.
226
00:43:34.400 --> 00:43:45.400
Oh, I've got this built in. You know, if it was four and two, it'd also be n squared. Okay, get used to it. If these were equal, if it was three and three, what would happen then?
227
00:43:45.400 --> 00:43:59.400
Three and three. Log base three of three is one, so it's n. That means the two are the same. You don't average them. The two are the same, so you get n log n.
228
00:43:59.400 --> 00:44:10.400
Okay. If you have three subproblems of size n over three each, with an n gluing factor, you get n log n as your result. If you have nine subproblems, it ends up being n squared.
229
00:44:10.400 --> 00:44:25.400
What if you had five problems? Five subproblems. It'd be n to the log base three of five. Is that bigger or smaller than n asymptotically? It's bigger.
230
00:44:25.400 --> 00:44:39.400
It's bigger. And what would the asymptotic complexity of your algorithm be? It'd be n to the one point god-knows-something. And that's why, if you ever read in a textbook or in a paper that the complexity of this algorithm is n to the one point seven, three,
231
00:44:39.400 --> 00:44:55.400
two, and you're wondering, well, what kind of weird fraction gave this guy n to the one point seven three two? It comes from these kinds of recurrence equations. Right? Not every complexity is just a loop in a loop, so that it's n or n squared or n cubed. You get it from here.
232
00:44:55.400 --> 00:45:06.400
One point six plus one point six. Something. All right. This one. What's the deal here? Can you use this?
233
00:45:06.400 --> 00:45:29.400
The bottom is five over three. Right? You have to write it as n over b. So if you write this as n over a number, it's n divided by five thirds. So it's n to the log, base five thirds, of one.
234
00:45:29.400 --> 00:45:44.400
How do you get one? Anything to the zero is one. So the fact that this is three fifths is kind of a red herring; it could be any fraction at all.
235
00:45:44.400 --> 00:46:13.400
Log, base anything, of one is zero. n to the zero is a constant. This is a constant. They're the same. So the complexity of this algorithm is a constant times log n. This one is big theta of log n, like that one was big theta of n squared.
236
00:46:13.400 --> 00:46:30.400
This is just as good as binary search. You don't have to divide something exactly in half and glue it together with one guess; that's what binary search is. You can divide it into something a little bigger than half. In fact, if this was nine tenths n, it would still be log n.
237
00:46:30.400 --> 00:46:38.400
As long as you're taking a constant fraction away, even if it's nowhere near half, you're still getting a log n complexity.
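A tiny sketch (mine, not from the lecture) of why any constant fraction gives logarithmically many steps; the fraction only changes the constant in front of the log:

```python
def steps_to_one(n: float, fraction: float) -> int:
    """How many times you can shrink n by `fraction` before reaching 1."""
    count = 0
    while n > 1:
        n *= fraction
        count += 1
    return count

# Halving and taking 9/10 both give Theta(log n) steps;
# only the constant factor in front of the log changes.
print(steps_to_one(10**6, 1/2))    # about 20 steps
print(steps_to_one(10**6, 9/10))   # about 132 steps, still logarithmic
```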
238
00:46:38.400 --> 00:46:56.400
So this is the last one. Yeah, because you have to write it as n divided by a number. You have to match that picture. So three fifths n is the same as n divided by five thirds.
239
00:46:56.400 --> 00:47:10.400
That's a good question. I should have made that clear. So you have to get the number down there somehow. Although you should keep in mind that in this particular case, it wouldn't really matter either way because any number to the zero is one. So you get constant and constant.
240
00:47:10.400 --> 00:47:15.400
What about here? What do we compare in here?
241
00:47:15.400 --> 00:47:33.400
We compare n to the log base four of three, against n log n. Log base four of three is what? Between one and two, right? No, smaller than one. Less than linear. And this is bigger than linear.
242
00:47:33.400 --> 00:47:49.400
So this is bigger. So the complexity of this algorithm is going to be n log n. Question? All right, good question. Now I have to tell you what I really mean by these less than and greater than signs.
243
00:47:49.400 --> 00:48:06.400
I kind of fudged one little thing, the less than and the greater than. I kind of fudged it, and I said, well, asymptotically, and I let it go. And nobody needled me, because it wasn't obviously wrong. But now we're getting up to this problem, and I want to tell you exactly what I mean and what the theorem says.
244
00:48:06.400 --> 00:48:19.400
The theorem says that if you're greater than or less here, it's got to be by some polynomial amount. It can't be just bigger by a log n factor. It's got to be bigger by an n to the something.
245
00:48:19.400 --> 00:48:29.400
Okay. Otherwise the theorem won't work. The theorem only works when this greater or less than is by a polynomial factor. So let's go back and compare this one.
246
00:48:29.400 --> 00:48:43.400
Here we get n to the one point something, say one point three. And we have n times log n. Is it okay to use our trick? Is it polynomially bigger?
247
00:48:43.400 --> 00:49:11.400
Sorry, this is not one point three; it's point eight or something. Right. So is this polynomially bigger? Yes or no? Well, it's certainly log n bigger. But it's also n to the point two bigger. Right. It's n to the something bigger.
248
00:49:11.400 --> 00:49:25.400
It's the difference between log base four of three and one. So you do have that little bit extra that you can still grab on to. If this were exactly n, then you couldn't use the master theorem, because you'd only have a log n extra factor.
249
00:49:25.400 --> 00:49:44.400
But we still have enough to get a polynomial improvement. Let me write it up here: n to the point eight versus n log n. This is n to the point two, times log n, bigger.
250
00:49:44.400 --> 00:49:57.400
That's the polynomial and that's okay. The difference between point eight and one.
251
00:49:57.400 --> 00:50:15.400
Okay. If it was just log n, you couldn't do it. So let me give you an example that you can't use the master theorem for and then I'll tell you what you do when you can't use it.
252
00:50:15.400 --> 00:50:31.400
So do we have a question? You're thinking about something? No. Okay. Let me do one example where you can't use it. So everything we did so far is right. You've got to know what that means. There's also, by the way, another condition here that I'm not even going to tell you, some particular condition about f of n.
253
00:50:31.400 --> 00:50:45.400
It's an obscure condition that you rigorously need to have be true for this theorem to work, but it's almost always true for any functions that you'll see. And I won't bother even telling you what it is.
254
00:50:45.400 --> 00:50:54.400
And it'll never come up. But you should realize that the real theorem has some peculiar condition on f of n and that exists too.
255
00:50:54.400 --> 00:51:09.400
I'm not even sure our book goes into it. I know the Rivest algorithms book would tell you what the condition on f of n has to be. But I think our book just says assume f of n is some polynomial, because then they're all the right functions.
256
00:51:09.400 --> 00:51:19.400
This is just n. This is n log n. This is a log n factor larger asymptotically, but that's not enough for us to use the theorem.
257
00:51:19.400 --> 00:51:29.400
It does not fit into the big theta case. It does not fit into the less than or greater than by a polynomial cases. There are gaps in this theorem. This theorem does not divide every one of these recurrence equations into one of three categories.
258
00:51:29.400 --> 00:51:37.400
You can fit into one of these three, or you could be out in limbo. This is an example that's out in limbo. The two functions are neither big theta of each other,
259
00:51:37.400 --> 00:51:54.400
nor is one polynomially greater than the other. It's just log n greater. So this doesn't work. And what do you do? You've got to go back to your substitution drawing board and actually go ahead and calculate it.
260
00:51:54.400 --> 00:52:09.400
In doing this, you get a really, really interesting sum, which comes up in a lot of other areas and it's good technique. So I think it's worth actually doing this one, seeing what it gets. And when we're done doing it, I'm going to give you a theorem that actually works for when you're a log n factor bigger.
261
00:52:09.400 --> 00:52:16.400
And then you don't have to do this sum again. But you should at least know that the master theorem doesn't cover this.
262
00:52:16.400 --> 00:52:32.400
Questions about it? Who recognizes this recurrence equation? Towers of Hanoi. Remember how we solved this? We did repeated substitution and then we got a long geometric series. And that's when I first told you how to sum a geometric series.
263
00:52:32.400 --> 00:52:47.400
So here's how we're going to do it today. We're going to change variables. I'm going to create a new variable relative to the T's here. And I'm going to get a recurrence equation. That's the simplest one you've ever seen.
264
00:52:47.400 --> 00:52:54.400
Just like the compound interest one we did in the very first day. So here's my new change of variable.
265
00:52:54.400 --> 00:53:16.400
S of n, it's my new variable, equals T of n plus one. How is this going to help? Let's go ahead and change this equation in terms of S of n.
266
00:53:16.400 --> 00:53:26.400
S of n minus one. That's the same as T of n. Equals.
267
00:53:26.400 --> 00:53:48.400
Two times S of n minus one. Here. I don't think so. T of n is S of n minus one. So every place you see a T of n here you put an S of n minus one.
268
00:53:48.400 --> 00:54:11.400
So S of n minus one equals two times, S of n minus one, minus one, plus one. Let me do it again, since if I flub it with the chalk, you're going to think this is harder than it really is.
269
00:54:11.400 --> 00:54:21.400
We changed the variable. And if you're thinking why the heck did he change the variable that way. And you're probably wondering why I haven't explained why.
270
00:54:21.400 --> 00:54:30.400
And I will in a minute because this is a magic trick. I change this variable. I get T of n equal S of n minus one. In every place I see T of n. I put an S of n minus one.
271
00:54:30.400 --> 00:54:48.400
Except here I messed it up a little, because here I have T of n minus one, not T of n. T of n minus one is S of n minus one, minus one. So I get two times, S of n minus one minus one.
272
00:54:48.400 --> 00:55:06.400
That's in place of T of n minus one. And then I add the one from that last term. Now let's fiddle with this a little, putting that over there.
273
00:55:06.400 --> 00:55:25.400
This is what you get. If you fiddle with this just a little bit, all the constants go away and you get the simplest recurrence equation you ever saw in your life. This is just like compound interest. It says that S of n is just twice the one before it.
274
00:55:25.400 --> 00:55:43.400
What's the solution to this, once I have a base case? By the way, what's T of zero? Zero, right. What's S of zero?
275
00:55:43.400 --> 00:55:55.400
You can figure out your base case just by translating it through your change of variable. If T of zero is zero, then S of zero is zero plus one, so S of zero is one. How do I solve this recurrence?
276
00:55:55.400 --> 00:56:08.400
The repeated substitution here doesn't even develop a sum. It's the easiest possible one. You just keep multiplying by two and by two and by two. So what's the final answer?
277
00:56:08.400 --> 00:56:25.400
S of n equals two to the n. So what's T of n? Two to the n, minus one. So which is easier, doing a geometric series or doing this? This is easier. But how did I know that this would be so pleasant?
278
00:56:25.400 --> 00:56:42.400
Well the real way I knew is because I knew the answer is two to the n minus one right. And I know that two to the n comes from a simple recurrence equation. So I just go backwards. And I know the change of variables is going to work and I run it through and I make it look like magic.
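The magic trick is easy to check mechanically. This is my own sketch, assuming the Towers of Hanoi base case T(0) = 0 used in the lecture:

```python
def T(n):
    """Towers of Hanoi recurrence: T(0) = 0, T(n) = 2*T(n-1) + 1."""
    return 0 if n == 0 else 2 * T(n - 1) + 1

def S(n):
    """Change of variable: S(n) = T(n) + 1, so S(0) = 1."""
    return T(n) + 1

for n in range(1, 8):
    assert S(n) == 2 * S(n - 1)   # the simplest recurrence you ever saw
    assert T(n) == 2 ** n - 1     # hence the closed form
print("checks pass: T(n) = 2^n - 1")
```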
279
00:56:42.400 --> 00:56:55.400
But Sam's right. I can do this backwards the first time because I knew what the ending was. I know the end of the story. So I make the story look good. But if you get good at this. You can start guessing in advance.
280
00:56:55.400 --> 00:57:06.400
That the end is going to look a certain way, and changing the variables before you get into the crazy sum. And that takes ingenuity and it takes practice and it takes experience.
281
00:57:06.400 --> 00:57:18.400
So how do I expect you guys to do it if you're just beginners. Just take these pills.
282
00:57:18.400 --> 00:57:40.400
The kind of problem I expect you to do in this style is, I'll say, here's a ridiculously hard looking recurrence equation; try this change of variable and see what happens. You'll see things like that. And then you'll learn to appreciate this technique without being in the position of, gee, how do I guess it? Because it does take experience, and it really isn't so easy to guess the change of variable.
283
00:57:40.400 --> 00:57:51.400
But you should at least get the intuition here. This particular change of variable what it's really doing is just kind of doing the geometric series in advance.
284
00:57:51.400 --> 00:58:01.400
The geometric series is: double everything, with an extra one term on the end. So let's just add the one term now, so everything would just be doubled. That's kind of what's going on.
285
00:58:01.400 --> 00:58:11.400
So let's say this was plus four. And you do the same exact change of variable. What do you end up getting?
286
00:58:11.400 --> 00:58:33.400
Yes: S of n minus one equals... You tell me, Joe, you thought of it. You're stuck now, right? So now let's try to solve for S of n.
287
00:58:33.400 --> 00:58:47.400
S of n equals two S of n minus one, minus two, plus four, plus one.
288
00:58:47.400 --> 00:59:04.400
Minus two plus four plus one is three. Yeah, geez, we're a little better off. So you'd repeat?
289
00:59:04.400 --> 00:59:11.400
Good idea. You can do change of variables more than once. Right. And if you notice this gets you to something a little better.
290
00:59:11.400 --> 00:59:19.400
Maybe we could change the variable again and change the variable again. And then you might wonder, well, maybe we could do all these change the variables all at once.
291
00:59:19.400 --> 00:59:27.400
Oh, good. Good. You will be mature. Give me back the pills. We did change of variables. We did the master theorem. We did repeated substitution.
292
00:59:27.400 --> 00:59:39.400
We're not going to do linear algebra stuff today. We're not going to do linear homogeneous today. I still owe you that one sum for the one that didn't work for master theorem. Maybe I'll ask Tara to do it in recitation as a review.
293
00:59:39.400 --> 00:59:48.400
The good news is recitations from now for the next few days are going to be nothing new. They're going to be let's do lots of these until everybody feels they can completely do it.
294
00:59:48.400 --> 01:00:00.400
All right. So they're going to be a lot more relaxed than recitations typically are; we don't have to catch up, we're right here. So bring in a problem you don't know how to do. Bring in a sum that's tough. You can talk about it. It'll be nice.
295
01:00:00.400 --> 01:00:12.400
I don't know how long that'll last, but at least for the next few days we're set. This will be a 10, 15 minute topic, and then we're done for today.
296
01:00:12.400 --> 01:00:19.400
What did you whisper in my ear?
297
01:00:19.400 --> 01:00:27.400
Let's have chock-flices.
298
01:00:27.400 --> 01:00:35.400
That's what I got from my sixth graders. I got a I got that guy and they get real excited when they're flake wins.
299
01:00:35.400 --> 01:00:40.400
Jeez.
300
01:00:40.400 --> 01:00:55.400
Guessing an answer to a recurrence equation and proving it by induction is a great example of the power of induction. It lets you use your intuition. You get used to recurrence equations; after a while you get a sense of, oh, I think I know what that one is.
301
01:00:55.400 --> 01:01:03.400
But I don't have the slightest idea of how to prove it. So I'm going to take a darn good guess. And maybe I can just verify my guess by induction.
302
01:01:03.400 --> 01:01:14.400
And you don't have to get the exact close formula. All you have to do is guess a big theta and verify that by induction. And you get what you want with a lot less effort.
303
01:01:14.400 --> 01:01:18.400
And you can rely on your intuition. So it's cool.
304
01:01:18.400 --> 01:01:26.400
The example I'm going to do is an example that comes from algorithms. A real life example. I'm going to put a recurrence equation up on the board and I'll tell you the algorithm it's from.
305
01:01:26.400 --> 01:01:35.400
We're going to look at it. Scratch your head and say God knows I don't want to do this with any techniques because we don't have any techniques that are going to work. We're going to guess an answer because I'm going to tell you what I think it should be.
306
01:01:35.400 --> 01:01:48.400
And then we're going to figure out how to prove it by induction. The big picture: we're going to try to prove it by induction. Now, if I did it the right way the first time, you'd say, I think I get that. And then when you tried to do it yourself, you'd say, I got it when you did it in class, but now that I try to do it myself, I can't.
307
01:01:48.400 --> 01:01:57.400
So I'm going to do it the way I think you would try to do it the first time, which is I'm going to do it wrong. I'm serious. I'm going to do it and it's going to mess up.
308
01:01:57.400 --> 01:02:06.400
And right when it messes up, instead of you having to come to me, you'll be there already. And then I'll fix it and we'll do it right. And I think that way you'll learn it better.
309
01:02:06.400 --> 01:02:24.400
Now it's not really wrong. It'll just be a bad guess the first time. And we'll improve our guess. So it'll be a good experience.
310
01:02:24.400 --> 01:02:40.400
Here's an equation that comes out of an algorithm for finding the median of n elements. Don't ask about the details of the algorithm. It's quite a complicated algorithm.
311
01:02:40.400 --> 01:02:48.400
We'll spend a day on it or so in February whenever we get there. But it's an algorithm that takes n elements and finds the middle guy.
312
01:02:48.400 --> 01:03:00.400
Now you can always find the middle guy by sorting the list, right, and pulling out the middle guy. That takes n log n. So if you've got an idea of how to pull out the middle guy, it better be better than n log n, or I could just do it by sorting.
313
01:03:00.400 --> 01:03:12.400
This algorithm, therefore, we hope is going to be linear time. Proportional to looking at every single element exactly once. It's not obvious how to do that.
314
01:03:12.400 --> 01:03:28.400
It's not clear at all how to do better than sorting, but you can. And it turns out, in the details of this algorithm, we solve a problem of one fifth the size, we solve another problem of seven tenths the size, and then we do 15 n steps to glue them together.
315
01:03:28.400 --> 01:03:45.400
Well, for one thing, we don't have just one kind of subproblem, you know, three subproblems of size n over two. We have two different ones. For another thing, if we tried substituting, we'd get these long ugly summations where two different things would start developing.
316
01:03:45.400 --> 01:03:50.400
We just don't want to go in that route. It's just going to be ugly.
317
01:03:50.400 --> 01:04:00.400
So instead we're just going to guess that this is linear time and try to prove it by induction. So our proof is going to be by induction on the size of the problem.
318
01:04:00.400 --> 01:04:11.400
We're going to assume that it's linear for smaller problems for say things less than n and we're going to try to show that it's actually linear for size n using the inductive assumption working our way up.
319
01:04:11.400 --> 01:04:24.400
And the only thing we know to help us, besides the inductive assumption that it's linear, is this: we know that T of n is less than or equal to T of n over 5, plus T of 7 n over 10, plus 15 n.
320
01:04:24.400 --> 01:04:38.400
So I'm going to guess a wrong guess just because I have no clue what this complexity is and I'm going to guess that this t of n is less than or equal to 60 times n.
321
01:04:38.400 --> 01:04:50.400
And don't ask why I picked 60, because I picked it for no good reason. I just picked it. I'm hoping that this is less than some constant times n, so I picked a number, 60, and I'm going to try to prove this by induction.
322
01:04:50.400 --> 01:04:58.400
I'm going to try to prove that T of n is less than or equal to 60 n, by induction, assuming this is true for smaller sizes.
323
01:04:58.400 --> 01:05:08.400
Everybody with me so far? It turns out that you can't do this; we're going to get stuck. But the point where we get stuck is a point where we can backtrack and use a better number and get it to work.
324
01:05:08.400 --> 01:05:18.400
Here's my base case: T of n is less than or equal to 5, as long as n is less than or equal to 6.
325
01:05:18.400 --> 01:05:28.400
And this is part of the definition; this is given by the algorithm.
326
01:05:28.400 --> 01:05:31.400
If it's got 6 or fewer elements it can do it in 5 steps.
327
01:05:31.400 --> 01:05:34.400
You give me 6 or 5.
328
01:05:34.400 --> 01:05:51.400
I'm going to try to do this. Here's what I know.
329
01:05:51.400 --> 01:06:12.400
n over 5 is a smaller problem. Everyone agree? I'm trying to prove T of n is less than or equal to 60 n, so by induction T of n over 5 is less than or equal to 60 n divided by 5, or 12 n.
330
01:06:12.400 --> 01:06:22.400
By the same inductive assumption, T of 7 n over 10 is less than or equal to 42 n.
331
01:06:22.400 --> 01:06:26.400
Those two things I get for free by my inductive assumption.
332
01:06:26.400 --> 01:06:35.400
And now I'm going to go ahead and use the one thing I know for how to combine things and relate them to t of n to see if I get a conclusion.
333
01:06:35.400 --> 01:06:51.400
Hello, Donna. Oh no. We're just lecturing backwards today. We'll be finishing at 9. If I can spin like he's fast enough for the world to go around at the speed of life.
334
01:06:51.400 --> 01:07:07.400
This is smaller than 12 n plus 42 n plus 15 n, and that equals 69 n. And is that smaller than 60 n? Crap.
335
01:07:07.400 --> 01:07:21.400
It would have worked perfectly if that 69 had turned out to be smaller than 60. But 69 is bigger than 60 this week, so it doesn't work. So we have to come up with a better constant.
336
01:07:21.400 --> 01:07:30.400
So one thing that might happen is this: we might end up not being able to find any constant. Maybe this isn't linear time. Maybe we guessed wrong, and you'd find that out after a few more tries.
337
01:07:30.400 --> 01:07:41.400
But rather than keep trying vaguely and randomly, let's do this whole trick again and instead of using 60 n, I'll just call it K n.
338
01:07:41.400 --> 01:07:48.400
And I'll make the K whatever I need it to be at the very end to make it work, instead of guessing.
339
01:07:48.400 --> 01:07:59.400
So if I had started this whole thing by saying, okay, let's let T of n be less than or equal to K n, maybe you would have gotten it, maybe you wouldn't. But I think this is clear.
340
01:07:59.400 --> 01:08:09.400
But now everything gets more complicated. My inductive assumption t n over 5 is less than what now? K n over 5.
341
01:08:09.400 --> 01:08:18.400
The minute the specific example goes away and things get general, you'll lose it unless you have some kind of picture of the specific example in your head that you can graph the generality on.
342
01:08:18.400 --> 01:08:32.400
Here you go. 7K n over 10. And let's go through the same steps. So now let's figure out what this is. Let's add it together. K n over 5 is the same as 2K n over 10. True?
343
01:08:32.400 --> 01:08:38.400
So I get 9K n over 10 plus 15 n.
344
01:08:38.400 --> 01:08:56.400
Which is, I hope, less than or equal to K n. Yes, you're right, I'm hoping. I'm hoping that this is less than or equal to K n.
345
01:08:56.400 --> 01:09:14.400
So if I want this to be true, I've got complete control over K. Right? I picked you. So I can pick them to be anything I want them to be or her.
346
01:09:14.400 --> 01:09:38.400
I'm hungry too. Let's move all the K ns on one side. All right. I don't usually let Sam tell me what to do, but today it's such a good idea.
347
01:09:38.400 --> 01:09:54.400
All right. Take all the ns out. I agree. Take the ns out. Let's put the Ks together. So I get how many Ks on this side? K over 10. And here I get 15. So K's got to be bigger than or equal to 150.
348
01:09:54.400 --> 01:10:13.400
So if I had just picked 150 at the beginning and run through the whole proof, everything would have worked, and I'd say, see, therefore T of n is less than or equal to 150 times n, and it's linear time. And you wouldn't have bothered asking me how I came up with that number 150 until two days later, when you hit it on a problem set.
349
01:10:13.400 --> 01:10:27.400
And you'd say, well, how did you ever come up with that 150? So now you know, right? You don't really come up with it.
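If you'd rather check the bound than just take it, here is my own numerical sketch of the recurrence with the lecture's constants (one fifth, seven tenths, 15 n, and the base case T(n) at most 5 for n at most 6), confirming that K = 150 works:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n: int) -> int:
    """Worst-case bound from the median-finding recurrence:
    T(n) <= T(n/5) + T(7n/10) + 15n, with T(n) <= 5 for n <= 6."""
    if n <= 6:
        return 5
    return T(n // 5) + T(7 * n // 10) + 15 * n

# The inductive bound T(n) <= 150n holds at every size we try,
# matching the K >= 150 that falls out of the algebra.
for n in [10, 100, 1000, 10**5]:
    print(n, T(n), T(n) <= 150 * n)
```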