[sles-beta] Tasks get stuck with RT priority

Libor Pechacek lpechacek at suse.com
Tue May 6 06:36:18 MDT 2014


Hello Indika,

On Wed 30-04-14 08:29:55, Indika Prasad Kumara wrote:
> I have an X3650 M4 (16 cores, 64 GB RAM, ...). After a fresh installation of
> SLES12, I ran the following command (as root).
> 
> taskset -c 2 chrt 99 ./stress -c 1
> 
> This command will run the stress binary on CPU 2 with RT priority. stress -c
> 1 will spawn one thread that runs a while(true); endless loop, taking 100%
> CPU.
> 
> Note that there are 15 other CPUs free. After 10 to 15 seconds, we can see a
> couple of kworkers come and hang in "R" state on CPU 2.
> 
> I know running at 100% with RT priority is considered bad, but here I'm running
> a controlled setup in which one thread runs a purely CPU-bound load. Why do
> kworkers get scheduled on this core when there are 15 other free cores
> available?

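Because RT processes are not allowed to monopolize a CPU by default: the
scheduler's RT throttling keeps a small share of every CPU for non-RT tasks,
and per-CPU kworkers are bound to their CPU, so they cannot move to one of the
idle cores.  But the system can be adjusted.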
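
To see how much CPU time is held back for non-RT tasks, here is a minimal
sketch. It reads sched_rt_runtime_us and sched_rt_period_us (both in
microseconds) and prints the share of each period left to non-RT tasks. The
helper name non_rt_percent is made up for illustration, and the fallback
values 950000 and 1000000 are the usual kernel defaults, assumed here so the
sketch also runs where the files are missing.

```shell
#!/bin/sh
# Percentage of each period left to non-RT tasks, given the RT runtime and
# the period in microseconds. A runtime of -1 means throttling is disabled,
# so nothing is reserved. (Hypothetical helper, not part of any tool.)
non_rt_percent() {
    rt=$1
    per=$2
    if [ "$rt" -lt 0 ]; then
        echo 0
    else
        echo $(( (per - rt) * 100 / per ))
    fi
}

# Read the live values; fall back to the assumed defaults if unavailable.
runtime=$(cat /proc/sys/kernel/sched_rt_runtime_us 2>/dev/null || echo 950000)
period=$(cat /proc/sys/kernel/sched_rt_period_us 2>/dev/null || echo 1000000)
echo "non-RT share per CPU: $(non_rt_percent "$runtime" "$period")%"
```

With the assumed defaults this reports 5%, which is the window the kworkers
get on a CPU otherwise fully occupied by an RT task.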

We looked into it with a colleague and came up with the following set of
recommendations for the scenario you describe:
1) set "isolcpus=2" on the kernel command line during boot, or update your boot
   loader configuration to include this parameter
   (adjust the value if you want to reserve a different CPU or more CPUs)
2) allow RT tasks to run for an unlimited time:
   "echo -1 > /proc/sys/kernel/sched_rt_runtime_us"

At this stage you should see the process run almost undisturbed.
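
To confirm that steps 1 and 2 took effect after a reboot, something along
these lines may help. It is only a sketch: the helper isolcpus_value is a
hypothetical name, and it simply looks for an "isolcpus=" word in
/proc/cmdline and prints the current sched_rt_runtime_us. It reads values
only and changes nothing.

```shell
#!/bin/sh
# Print the value of the isolcpus= parameter found in a kernel command line
# string, or print nothing if the parameter is absent.
# (Hypothetical helper for illustration.)
isolcpus_value() {
    for word in $1; do
        case $word in
            isolcpus=*) echo "${word#isolcpus=}" ;;
        esac
    done
}

# Read the command line of the running kernel; empty if not available.
cmdline=$(cat /proc/cmdline 2>/dev/null || echo "")
isolated=$(isolcpus_value "$cmdline")
echo "isolcpus: ${isolated:-not set}"
echo "sched_rt_runtime_us: $(cat /proc/sys/kernel/sched_rt_runtime_us 2>/dev/null || echo unknown)"
```

For the setup described here the expected report would be "isolcpus: 2" and
"sched_rt_runtime_us: -1".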

Further fine tuning may be done as follows:
3) "echo 0 > /proc/sys/kernel/nmi_watchdog" and
   "echo 0 > /proc/sys/kernel/watchdog" disable the watchdog functionality on
   the system
4) "echo 0 > /sys/devices/system/machinecheck/machinecheck2/check_interval"
   disables MCE checking on the CPU
5) "echo 9999999 > /proc/sys/vm/stat_interval" increases the interval between
   VM stats collection runs
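
Steps 3 to 5 can be collected in a small script. The following is a sketch
under a few assumptions of mine: the variables CPU and APPLY and the helpers
mce_path and set_knob are hypothetical names, and by default the script only
prints what it would write. Run it as root with APPLY=1 to actually change
the settings. None of these changes survive a reboot.

```shell
#!/bin/sh
# CPU to shield; 2 matches the example in this thread (assumed variable).
CPU=${CPU:-2}
# Set APPLY=1 (as root) to really write; the default only prints the plan.
APPLY=${APPLY:-0}

# Build the per-CPU machine check path used in step 4.
mce_path() {
    echo "/sys/devices/system/machinecheck/machinecheck$1/check_interval"
}

# Write value $1 to file $2 if APPLY=1 and the file is writable,
# otherwise just report what would be done.
set_knob() {
    if [ "$APPLY" = 1 ] && [ -w "$2" ]; then
        echo "$1" > "$2" && echo "set $2 = $1"
    else
        echo "would set $2 = $1"
    fi
}

set_knob 0 /proc/sys/kernel/nmi_watchdog      # step 3
set_knob 0 /proc/sys/kernel/watchdog          # step 3
set_knob 0 "$(mce_path "$CPU")"               # step 4
set_knob 9999999 /proc/sys/vm/stat_interval   # step 5
```

Defaulting to a dry run is a deliberate choice here, since disabling the
watchdogs and MCE checking is only appropriate on a controlled setup.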

HTH,
	Libor

> This is a major problem for us and needs to be looked at. We want to run our
> low latency application on SLES 12 with RT priority, and the app will utilize
> 100% CPU on some known cores. Currently, when we do this, a bunch of kworkers
> get stuck even on cores where the load is 100% CPU bound.
> 
> How can we resolve this?
> 
> Thanks,
> Indika


-- 
Libor Pechacek
Project Manager SUSE Labs, Prague

