CPUs for Dummies: Part 2
Brace yourself for some really fun gastronomic analogies and to learn more about CPUs with Junior Software Developer Carlos Pinto.

In the previous Part 1 we overviewed how a CPU works, without getting too technical. In this Part 2 we’ll see how engineers have streamlined this fine piece of hardware to become faster and more efficient.
Speed, speed, speed!
Now, the main goal is to do all the processing as fast and as efficiently as possible. I’ll go through some of the tricks that engineers have come up with to speed up processing, without necessarily just turning up the clock knob or increasing the transistor count.
Staying Busy with Pipelining
Imagine you run a fried chicken stand (we’ll be sticking with this analogy throughout, so bear with me). A customer comes in and places their order; you turn your back and get to frying. Only after the chicken’s ready do you serve the customer and take their payment. In the meantime, more customers have shown up. You could, of course, have taken the next order while waiting for the chicken to finish frying.
That’s what CPU engineers realized early on: while we’re decoding an instruction, we can already fetch the next one, and while we’re executing an instruction, we can already decode the next, and so on… Now, this is not as trivial as it sounds. Different instructions take different amounts of time, and this calls for some major logistics, including computing, ahead of time, how long an instruction will take from the moment it’s decoded to the moment its output is ready, and deciding whether the next instructions can proceed or not.
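If it helps to see it laid out, here’s a minimal C++ sketch (my own toy illustration, assuming an idealized three-stage pipeline where every stage takes exactly one cycle) that prints which stage each instruction occupies on each clock cycle:

```cpp
#include <cstdio>

// Toy 3-stage pipeline: while instruction i executes, instruction i+1 is
// being decoded and instruction i+2 is being fetched. Each row is one
// instruction; each column is one clock cycle.
int main() {
    const char* stages[3] = {"F", "D", "E"};  // Fetch, Decode, Execute
    const int kInstructions = 4;
    const int kCycles = kInstructions + 2;    // cycle in which the last one finishes

    for (int i = 0; i < kInstructions; ++i) {
        std::printf("instr %d: ", i);
        for (int cycle = 0; cycle < kCycles; ++cycle) {
            int stage = cycle - i;  // which stage instr i occupies this cycle
            if (stage >= 0 && stage < 3)
                std::printf(" %s ", stages[stage]);
            else
                std::printf(" . ");
        }
        std::printf("\n");
    }
    // Without pipelining, 4 instructions would take 4 * 3 = 12 cycles;
    // pipelined, they finish in 6.
}
```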
Doing More of the Same with Multithreading
Your fried chicken stand is a success! Unfortunately, too many customers and long waiting times mean impatient customers are leaving. But you’re already pipelining, so you decide to hire an extra person (let’s call him Gus) and buy an extra fryer. You split the queue in two, and now both you and Gus do exactly the same thing without caring what the other is doing, and things are back on track.
We can apply the same rationale to the CPU: create a replica of the CPU that runs exactly the same way, but receives its own stream of instructions. Now, the term “CPU” starts getting ambiguous here, so I’ll make this distinction: we’ll call these “replica CPUs” physical cores (the word physical will come into play later), and, thus, a CPU is made up of multiple physical cores, which are not necessarily exact copies, each with its own resources, like the ALU, CU, LSU, etc. This allows your computer not only to run multiple programs (multiprocessing), but also lets developers write programs that run on multiple cores (multithreading), while praying that they don’t have concurrency issues in their code ;)
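To make “multithreading” a bit more concrete, here’s a minimal C++ sketch (the serve_queue function and the order counts are made up for the analogy) where two threads each work through their own queue, and the operating system is free to schedule them on different physical cores:

```cpp
#include <functional>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// You and Gus each work through your own queue of orders, completely
// independently, just like two physical cores each running their own
// stream of instructions.
void serve_queue(const std::vector<int>& orders, long long& total) {
    total = std::accumulate(orders.begin(), orders.end(), 0LL);
}

int main() {
    std::vector<int> my_queue(1'000'000, 1), gus_queue(1'000'000, 1);
    long long mine = 0, gus = 0;

    // Each thread may be scheduled on a different physical core.
    std::thread t1(serve_queue, std::cref(my_queue), std::ref(mine));
    std::thread t2(serve_queue, std::cref(gus_queue), std::ref(gus));
    t1.join();
    t2.join();

    std::cout << "orders served: " << (mine + gus) << "\n";
    // No shared mutable state between the threads, so nothing to pray about.
}
```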
Hyperthreading
Let’s take a break from chicken (for now). You work at a post office, and, from time to time, clients don’t bring their forms filled in, so you ask them to fill the forms out, which usually takes quite some time. However, they need to sit in front of you, should they have any questions. So, you decide to take in two queues at a time, with the rationale that clients are likely to have unfilled forms from time to time, even if, at times, one client is just sitting, waiting for you to finish talking with the other. There’s a bit of a tradeoff, but overall, the process should be faster.
Hyperthreading is Intel’s proprietary version of this situation. AMD’s Bulldozer-era chips used a related approach called Clustered Multithreading, and the generic name for the technique is Simultaneous Multithreading, or SMT. Sometimes, your CPU runs into an instruction that takes a very long time. So, we can define so-called virtual cores: the Operating System sees them as separate cores it can run different programs on, but they actually belong to the same physical core. That core takes in instructions from two separate streams, under the assumption that, eventually, some of those instructions will stall execution for a while, and it can switch over to the other thread. This, of course, means that if none of the incoming instructions had significant delays, this technology would theoretically bring no benefit, but that’s rarely the case.
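You can actually peek at those virtual cores from a program. Here’s a quick C++ check (exactly what it reports, and whether it’s double the physical core count, depends on your machine):

```cpp
#include <iostream>
#include <thread>

int main() {
    // hardware_concurrency() reports the number of *logical* cores, i.e. the
    // virtual cores the operating system sees and schedules programs onto.
    // On a hyperthreaded CPU this is typically twice the number of physical
    // cores (e.g. a 4-core part shows up as 8).
    unsigned logical = std::thread::hardware_concurrency();
    std::cout << "logical cores visible to the OS: " << logical << "\n";
}
```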
Getting things done with Out-of-Order Execution
Back to fried chicken! You’ve realized that it’s a bit weird that you literally only serve chicken, so you add a fridge for beverages and also get a waffle maker (Gus gets a waffle maker of his own). You’ve also realized that, while the majority of people want chicken, some come just for the waffles and a cold one. So, whenever the fryers are completely full, you start asking down the queue who wants waffles or beverages, so that you can get those orders done ahead of time. Also, in your head, “I want a piece of chicken and a waffle with hot chocolate topping” no longer sounds like a single order, but rather two smaller orders, which you complete separately and then put back together on a single tray. So, even if someone down the line places such an order, you can’t do anything about the chicken yet, but you can get to work on those waffles!
There are two things going on here: first, engineers realized that complex instructions could be broken down into simpler ones; second, they realized that they didn’t necessarily need to execute instructions in the order they came in, they could “ask further down the line”. The first point is about so-called micro-operations, or μops: instructions that are rudimentary in what they do (but usually bigger in size, so the naming might not have been the best). It consists of an extra step in the decoding phase, where incoming instructions are first translated into μops, which are what’s effectively executed.

The second point is out-of-order execution, which relates to how the CPU tries to assign any μop to any available resource, even if that μop is not the next one in line. The results are then forwarded to a temporary “box” called the reorder buffer, which makes sure that, even though the μops executed out of order, their results are committed in order (even though we can make the waffle ahead of time, we first store it in a warm place, and only when that customer reaches the head of the queue do we hand it over, along with the chicken). Out-of-order execution and micro-ops aren’t necessarily two sides of the same coin; you can have either one without the other. But μops make out-of-order execution more promising, and our little waffle/chicken example tells us why: you wouldn’t make the waffle ahead of time if you thought of the “waffle and chicken” combo as a single request that you can’t yet work on.
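You can even feel out-of-order execution from ordinary code. In the toy sketch below, both functions compute the same sum, but the second splits the work into two independent chains of additions; on an out-of-order core the second version typically runs faster, because the CPU can keep two adders busy at once (compile without flags like -ffast-math, which would let the compiler reorder the sums itself and blur the comparison):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// One long dependency chain: every add must wait on the previous sum, so
// out-of-order hardware has nothing independent to run ahead on.
double sum_one_chain(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Two independent chains: the adds into s0 and s1 don't depend on each
// other, so the core can execute both at once, "making the waffles while
// the chicken fries".
double sum_two_chains(const std::vector<double>& v) {
    double s0 = 0.0, s1 = 0.0;
    for (std::size_t i = 0; i + 1 < v.size(); i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    return s0 + s1;
}

int main() {
    std::vector<double> v(1 << 22, 1.0);  // even size, so no leftover element
    std::printf("%.0f %.0f\n", sum_one_chain(v), sum_two_chains(v));
}
```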
Predicting the Future with Speculative Execution
Gus, our trusty coworker from the fried chicken stand, has applied to two colleges. He says that if he’s accepted into one of them, he won’t be able to work at your fried chicken stand anymore, as that college is pretty far away; the other college is fairly close, and he still needs the money, so he’ll stay on the job if he gets into that one. The catch is, the closer college is harder to get into. You still need a helping hand, so you decide to ask around to see if anyone’s interested in working with you in case Gus isn’t accepted into the nearer college. Eventually, you find someone who’s fine with it. Some time goes by, and Gus tells you with a smile that he’s been admitted to the nearer college. You tell your other friend you won’t be needing them after all; they shrug (it wasn’t really a dream job for them). All your work has been for naught, but it wasn’t much work, and it would have been far worse to have to find someone else had Gus left.
This is a personal favorite of mine. There are instructions that essentially tell your processor “if A, jump to instruction B; else, jump to instruction C”. These are called conditional branches. The problem comes from the fact that the “A” in “if A” might not be known immediately. It might depend on an operation that has stalled, or that simply hasn’t been computed yet as it patiently waits in a long queue of instructions. So the “fetch” part of the CPU, rather than sitting idly by, just takes a guess, i.e. it speculates, and starts fetching/decoding/executing from there. As it does this, it “remembers” what state it was in when it started speculating. If the guess happened to be correct, the CPU keeps its changes and carries on. If the guess was incorrect, the CPU throws away all those computations and goes back to the “remembered” state.

Three cool things to point out about this: first, it can actually happen multiple times; say you take a guess, but the speculated path itself contains another conditional branch that you can’t resolve quite yet, then you can speculate on that one as well, and so on! Second, the “guess” is not random: the CPU keeps a record of previous occurrences of that branch and decides which outcome is most likely (“for” and “while” loops appreciate this very much), and it can also identify patterns such as “branch taken” alternating with “branch not taken”. Third, the processor can predict not only whether a branch will be taken, but also which instruction it should jump to (think function calls), keeping a record of those targets as well!
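A classic way to catch the branch predictor in the act is to run the exact same loop over random data and then over sorted data. The toy benchmark below (timings vary, and an aggressive optimizer can sometimes compile the branch away entirely, so treat it as a sketch) usually shows the sorted run finishing several times faster:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Sums only the "large" elements. The `if` is a conditional branch: with
// random data it's unpredictable (about 50% mispredictions), but once the
// data is sorted the predictor learns "all small first, then all large".
long long sum_large(const std::vector<int>& v) {
    long long s = 0;
    for (int x : v)
        if (x >= 128) s += x;  // the branch being speculated on
    return s;
}

int main() {
    std::mt19937 rng(42);
    std::vector<int> data(1 << 22);
    for (int& x : data) x = static_cast<int>(rng() % 256);

    auto time_it = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        long long s = sum_large(data);
        auto t1 = std::chrono::steady_clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("%-8s sum=%lld  %lld us\n", label, s, us);
    };

    time_it("random");  // lots of mispredicted branches
    std::sort(data.begin(), data.end());
    time_it("sorted");  // the predictor is almost always right
}
```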
Conclusion
There’s quite a lot going on inside CPUs, what with an entire system depending on them for the most generic computations and all. There’s a lot more to be said, but I hope this was enough to keep you appreciative of the amazing lil’ guy working full time to bring you kitten content and whatnot!
Working at Exaud is more than just a job. Want to come along for the ride? We’re always looking for great people to join us. Check our current openings on our Careers Page.