[sles-beta] Tasks get stuck with RT priority

Mon May 12 03:55:37 MDT 2014

Hi Mike, 

Thanks for the comments, which solved several of our problems. 
Happy to hear that there is ongoing work to provide near-bare-metal behaviour.
Can you please provide a pointer or resource with more information on the ongoing development of that feature?

Regards,
Mahendra. 

-----Original Message-----
From: Mike Galbraith [mailto:mgalbraith at novell.com] 
Sent: 09 May 2014 11:19
To: Laxman, Mahendra
Cc: 'Libor Pechacek'; Indika Prasad Kumara; Michael Galbraith; sles-beta at lists.suse.com
Subject: RE: [sles-beta] Tasks get stuck with RT priority

On Fri, 2014-05-09 at 04:27 +0000, Laxman, Mahendra wrote:
> Hi Libor,
> 
> Thanks for the suggestions. We have tested them and still see the
> kworker get stuck in the run queue. Afaik, keeping a kworker
> unscheduled for a long time is not good. Further, we found that gpg
> gets stuck whenever the kworker on any one core is stuck.

Yes.  Starving your own kernel indefinitely is a bad idea.  As things stand, you simply cannot safely saturate any core.  No matter what you do, workqueues may need to run even on an isolated core, so taking a brief breath to let them run is a must.  That will change, but in the here and now, 100.0% saturation is dangerous.

You have already met one aspect of that: tasks on non-isolated cores may depend upon a workqueue running on the core you are monopolizing to make progress, so your rt hog can block other tasks all over the box indefinitely.
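
The "brief breath" described above can be sketched as a duty-cycled polling loop. This is only an illustration, not code from this thread: `poll_with_breather`, `poll_once`, `run_s` and `rest_s` are hypothetical names, and the timing values are placeholders rather than recommendations. The sketch uses a blocking sleep rather than `sched_yield()`, on the assumption that the poller runs SCHED_FIFO, where yielding only gives way to tasks of equal priority.

```python
import time


def poll_with_breather(poll_once, run_s=0.010, rest_s=0.0001, rounds=3,
                       clock=time.monotonic, sleep=time.sleep):
    """Busy-poll for run_s seconds, then block for rest_s; repeat `rounds` times.

    Returns the number of breathers taken. All names and defaults here are
    hypothetical; tune them for the real workload.
    """
    breathers = 0
    for _ in range(rounds):
        deadline = clock() + run_s
        while clock() < deadline:
            poll_once()          # the latency-critical polling work
        # Blocking sleep: the task leaves the run queue, so lower-priority
        # work pinned to this CPU (e.g. a kworker) gets a chance to run.
        sleep(rest_s)
        breathers += 1
    return breathers
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real delays; in normal use the defaults apply.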

> By the way, we had already tested several of the settings you
> suggested, other than 'echo -1 > /proc/sys/kernel/sched_rt_runtime_us'.
> This looks promising: stress now monopolizes the given core.
> We see zero context switches for stress with this setting.

Yes, if you really want to saturate with rt tasks, turning the throttle off is a must.  You also want it turned off because it is a global throttle.  When the timer fires, the CPU running that timer will traverse the entire box, perturbing all.  Should the timer happen to run on your critical CPU, it will spend quite a few cycles doing that instead of doing your latency-critical work.
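
To check whether the throttle is actually off on a given box, something along these lines may help. It is a minimal sketch that reads the two standard sysctl files, `sched_rt_runtime_us` and `sched_rt_period_us`; the function name `rt_throttle_state` and its directory parameter are made up for illustration.

```python
from pathlib import Path


def rt_throttle_state(proc_sys_kernel="/proc/sys/kernel"):
    """Describe the global RT throttle from the two sysctl files.

    The directory is a parameter (hypothetical convenience) so the function
    can be pointed at a copy of the files instead of the live /proc.
    """
    base = Path(proc_sys_kernel)
    runtime = int((base / "sched_rt_runtime_us").read_text())
    period = int((base / "sched_rt_period_us").read_text())
    if runtime < 0:
        # -1 is the value written by 'echo -1 > .../sched_rt_runtime_us'
        return "RT throttling disabled (runtime = -1)"
    share = 100.0 * runtime / period
    return f"RT tasks may use {share:.1f}% of each {period} us period"
```

Calling `rt_throttle_state()` with no argument reads the live values; after the `echo -1` above it should report that throttling is disabled.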
> 
> The issue appears when running gpg. It doesn't matter whether we
> isolated the core: as long as the kworker is stuck, gpg will be stuck.
> See the stack of gpg below when it is stuck. Afaik, flush_work is
> waiting on work that was submitted to the work queues (kworkers) of
> all cores on behalf of gpg; since that work is not served on the
> stressed core, gpg gets stuck.
> 
> # cat /proc/`pidof gpg`/stack
> [<ffffffff8107c102>] flush_work+0x22/0x30
> [<ffffffff8107c223>] schedule_on_each_cpu+0xb3/0xf0
> [<ffffffff81123055>] sys_mlock+0x45/0x130
> [<ffffffff81466712>] system_call_fastpath+0x16/0x1b
> [<00007fbf0f286aa7>] 0x7fbf0f286aa7
> [<ffffffffffffffff>] 0xffffffffffffffff

That is the exact situation I mentioned above.  Until you allow the kworker thread your rt hog is blocking to run, that synchronization will never complete; that gpg task is as good as dead.
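
For spotting other tasks caught in the same wait, the stack dump quoted above can be scanned for the `flush_work` frame. The following is a rough sketch only: `kernel_frames` and `waits_on_workqueue` are hypothetical helper names, and the parser assumes the `[<address>] symbol+0xOFF/0xLEN` layout shown in the quoted output.

```python
import re

# Matches frames such as "[<ffffffff8107c102>] flush_work+0x22/0x30";
# bare addresses without a symbol name are skipped.
_FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x")


def kernel_frames(stack_text):
    """Return the symbol names found in /proc/<pid>/stack text, innermost first."""
    return _FRAME.findall(stack_text)


def waits_on_workqueue(stack_text):
    """True if the task is sitting in flush_work, as gpg is in this thread."""
    return "flush_work" in kernel_frames(stack_text)
```

Feeding it the contents of `/proc/<pid>/stack` for each suspect pid (reading that file normally requires root) gives a quick yes/no per task.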

What you want, but currently cannot have, is bare metal.  A feature to get as close to that as possible is underway, but not ready for prime time.  When that effort is complete, you will no longer need to run your polling task with rt policy/priority; the improved isolation will remove the competition you are trying in vain to keep completely away from your precious core.
> 
-Mike
