Jump to content

Multi-processor Scheduling


Glenn9999

Recommended Posts

How does Windows XP (or any Windows) handle having multiple processors/cores when a program is run? I have a program that places load onto a CPU. I figured that since this program is not thread-oriented that it would pick a processor (the one with the least load from documentation I have read) and then run the program on it. But I'm noticing that when I observe the CPU load under task manager that it's putting load on both CPUs (50% total), and the graphs for both CPUs are bouncing around wildly.

So, help me to understand what is happening to cause load on both CPUs from a single process. Is this an OS function, or a CPU function (other CPU isn't busy so shunt instructions to it as appropriate)? Any good references are appreciated as well, since I was unable to find anything with a good answer upon searching.

Note #1: I realize this could fit in one of 2 or 3 forums here (Windows XP, Hardware, Programming), so hopefully there isn't a problem that I selected this one.

Note #2: I know all about how to set code to run on a specific processor and change its priority, so that's no issue - I rewrote the program to be able to place 100% load on a specific CPU and it does that wonderfully.

Edited by Glenn9999
Link to comment
Share on other sites


As to threads on multple CPUs, if the application is threadsafe and multi-threaded, it can run any thread on any CPU unless designed (or directed with processor affinity) otherwise. Also, the OS itself is threadsafe, so it (generally) tries to schedule it's load across multiple CPUs as well. Some of the caveats are DPC handling and the actual scheduler, which generally run off of CPU 0 in XP, but for the most part a multithreaded threadsafe app will run any thread on any available CPU. Also, you'll find that the XP scheduler changes between real cores and hyperthreaded (virtual) CPUs as well, and will (in general) actively try to schedule a thread on a real processor, falling back to a virtual processor only if no physicals are available and the application dev hasn't explicitly written the app to deny running on a virtual CPU. There's also "favorite" CPU execution to consider, where a thread that starts running on a specific CPU will tend to keep running on that CPU as it's time slice comes up over and over unless certain conditions are met (CPU's current queue, priority levels for objects in the queue, whether or not another CPU is less busy, or even idle, etc). Then there's the balance set manager...

There's a lot more that goes into thread affinity and the CPU scheduler in XP, and I'm realizing it's probably a bit more than I can think to type at 2AM. Do you have access to the Windows Internals book(s) by Mark Russinovich? He explains it far better than I will, although if you want an online explanation I can try again tomorrow ;). If you've got the 4th edition of the book, an explanation starts on page 323 ;).

Link to comment
Share on other sites

Very simplified (it is an OS function):

All the threads on the system in the "ready (to run)" state sit in a single list on XP - this is where the scheduler will pick the next thread to run from when the currently running thread enters the "wait" state or reaches the end of its allotted time slice (quantum).

So if the number of threads "ready to run" is greater than the number of logical processors, threads will bounce around processors as they reach the head of the list and the processors become available through pre-emptive scheduling.

(With Windows Server 2003 this single list was replaced with per-processor lists, which makes better use of the CPU cache and reduces delays by having to lock the system-wide dispatcher database - this does not mean that processors can go idle when their list alone is empty, however, as they can yank threads from other processors' lists.)

For an excellent, detailed description of the whole design, see Windows Internals 4th Edition - chapter 6 : Processes, Threads and Jobs.

(The 4th edition covers up to Windows Server 2003, and the new 5th edition covers Windows Vista/Server 2008 as there have been major redesigns under the hood.)

Edit: What cluberti said :)

*mental note, preview posts before submitting*

Link to comment
Share on other sites

I understand about the thread processing part and all of that. What I'm not understanding is that it's going to both CPUs with a single-threaded, single process, which has no OS API calls (of course it could have something in the background I'm not thinking of) or any other interaction. The code load is basically:

while 1 = 1 do

begin

i := i + 1;

i := i -1;

end;

Maybe I'm just not seeing something that's going on?

Link to comment
Share on other sites

Your single thread is one of many that is (most of the time) "ready to run", so it pre-emptively shares all the available CPU time with every other thread in the same state - it doesn't just get associated with a processor and run forever until it decides to stop.

(Physical) processor affinity can have some value because of the ability to get high L2 cache hits, but ultimately if your thread's turn comes around to get some processor time, you want it to use whichever processor is available right now, not wait around for a specific one.

You have to understand that the system has threads of its own, then there are services running in the background, then the shell itself has threads of course... your apps' threads are not the only ones that need love :)

Link to comment
Share on other sites

(Physical) processor affinity can have some value because of the ability to get high L2 cache hits, but ultimately if your thread's turn comes around to get some processor time, you want it to use whichever processor is available right now, not wait around for a specific one.

I suppose this could be the case (and yes I understand all that other stuff) of why a single thread is putting load on two processors. Hopefully I can get some confirmation, but one thing I am realizing quick is that special measures have to be used to get the most out of a multi-core processor, especially since this one thread is only managing a sporadic load on each processor at 50% total. I look to do some benchmarks when I can get around to it when it comes to performance of such things.

Link to comment
Share on other sites

The nice thing with Windows is that it's a symmetric multi-processing (SMP) OS, so you know every thread on the system is treated equally (taking into exception process priorities) - it sounds like you would find the history of the thread scheduler (and possily memory manager) an interesting read, I recommend strongly that you take a look at Windows Internals.

One common misconception is that dual-proc systems are twice as fast as uni-proc counterparts - the performance can't scale linearly due to controlled access to shared data structures and synchronization between processors - but Windows does take advantage of multiple processors itself, even if none of your applications do.

Up to Windows Server 2003, the OS loaded either the uni-proc (UP) or multi-proc (MP) kernel at startup - from Vista onwards there is only 1 kernel now, and with Windows 7 there have been some cool advances like getting rid of the dispatcher lock entirely.

This webcast from Mark Russinovich is interesting viewing too.

Link to comment
Share on other sites

One common misconception is that dual-proc systems are twice as fast as uni-proc counterparts - the performance can't scale linearly due to controlled access to shared data structures and synchronization between processors - but Windows does take advantage of multiple processors itself, even if none of your applications do.

Indeed. I know I've found that misconception rampant within all the "should I get the dual-core machine" threads I've encountered, when I've gone in and tried to explain that they wouldn't get much of a performance increase in anything over an equivalent single-core CPU. While there is a responsiveness bump given that Windows would have multiple processors to call on in thread scheduling, multi-processing only works if a certain set of criteria are met.

Most algorithms are dependent & linear, which makes having the multi-processing a moot proposition. But it is a performance boost for those things where all the criteria are met for multiprocessing through the design of the algorithm. Consequently, you do find multi-processing being used where the algorithms meet the criteria, such as image processing and database work. I'm sure it will be headed to games eventually if it hasn't already. But seriously, most applications won't benefit from it.

FWIW, I did the benchmark that I was talking about. Truthfully it seems to make no difference (statistically) whether you let Windows send the thread where ever it wants or you isolate it to a specific CPU. I suppose the only advantage in allowing it to go to *any* CPU is to allow the OS some flexibility in thread scheduling.

Here's the times in sorting 100M Double Words using Quicksort.

ANY: 25234 ms
CPU1: 25343 ms
CPU2: 25266 ms

When I get to it, I plan on trying out a multi-processing version. (I selected Quicksort because the algorithm is almost insanely suited to multi-processing environments)

Definitely fun stuff...

Edited by Glenn9999
Link to comment
Share on other sites

When I get to it, I plan on trying out a multi-processing version. (I selected Quicksort because the algorithm is almost insanely suited to multi-processing environments)

Okay I did that, and in the process got the answer to my original question. You have to explicitly set things up to use both processors within the API and within your algorithm in order to get a performance pop out of it. I ended up getting around 12000ms-24000ms on my Quicksort benchmark when I did this. The specific time for each case is dependent on how much the Quicksort kept both processors busy.

Question answered. :thumbup

Link to comment
Share on other sites

While I haven't been able to see any noticeable performance increase from locking a single-threaded apps's thread affinity to one CPU, I do wonder if it might have other advantages - like, power management on more recent CPUs?

Link to comment
Share on other sites

While I haven't been able to see any noticeable performance increase from locking a single-threaded apps's thread affinity to one CPU, I do wonder if it might have other advantages - like, power management on more recent CPUs?

Good question. I would definitely be interested in knowing how power consumption would change depending on how the CPUs get used. I would think though, as long as both remain powered up & fully available, that both would be consuming power. I wonder, though, if there's a way to throttle down a CPU/core to conserve power if you put load on a certain set of cores more than others (i.e. core 1&2 normally used, core 3&4 seldom used in a 4-core processor). But then again, that scenario I described would almost invite some time in task manager/whatever to set some process affinities to those two cores.

I just did read in some MS-connected there would be some benefit to tying an S.-T. app to a single processor in heavy loads, since if a process switches processors it would have to reload all its flags. My guess is that doing that is so negligible in performance cost as to not be a concern, though.

A process thread can migrate from processor to processor, with each migration reloading the processor cache. Specifying a processor for a thread can improve performance under heavy system loads by reducing the number of times the processor cache is reloaded."
Link to comment
Share on other sites

There is already FSB throttling to have dynamic power consumption under load, but in effect what you are talking about here is core parking, and it's already in Windows 7 / Server 2008 R2:

http://blogs.technet.com/mattmcspirit/arch...-in-action.aspx

A lot of work was done in NT 6.0 for power management, and improved even further in NT 6.1.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...