[sles-beta] Tasks get stuck with RT priority
Libor Pechacek
lpechacek at suse.com
Tue May 6 06:36:18 MDT 2014
Hello Indika,
On Wed 30-04-14 08:29:55, Indika Prasad Kumara wrote:
> I have an X3650 M4 (16 cores, 64 GB RAM). After a fresh installation of SLES12,
> I ran the following command (as root).
>
> taskset -c 2 chrt 99 ./stress -c 1
>
> This command runs the stress binary on CPU 2 with RT priority. stress -c 1
> spawns one thread that does a while(true); endless loop, taking 100% of the
> CPU.
>
> Note that there are 15 other CPUs free. After 10 to 15 seconds, we can see
> a couple of kworkers arrive and hang in "R" state on CPU 2.
>
> I know running at 100% with RT priority is considered bad, but here I'm running
> a controlled setup in which one thread runs a purely CPU-bound load. Why do
> kworkers get scheduled on this core when there are 15 other free cores
> available?
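Because RT processes are not allowed to monopolize a CPU by default: the
scheduler throttles them so that non-RT tasks, such as kworkers, still get some
CPU time. This behaviour can be adjusted, though.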
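To illustrate the throttling, here is a minimal sketch that computes the share of CPU time RT tasks may use per period. It assumes the usual kernel defaults of 950000 us of runtime per 1000000 us period; the helper name rt_share_percent is made up for this example.

```shell
#!/bin/sh
# Print the share of each period that RT tasks may consume, in percent.
# Arguments: runtime_us period_us (a negative runtime means "no limit").
rt_share_percent() {
    runtime=$1
    period=$2
    if [ "$runtime" -lt 0 ]; then
        echo "unlimited"
    else
        echo $(( runtime * 100 / period ))
    fi
}

# With the assumed kernel defaults:
rt_share_percent 950000 1000000   # prints 95

# With the live values, when the files are readable on this system:
if [ -r /proc/sys/kernel/sched_rt_runtime_us ] &&
   [ -r /proc/sys/kernel/sched_rt_period_us ]; then
    rt_share_percent "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
                     "$(cat /proc/sys/kernel/sched_rt_period_us)"
fi
```

With those defaults, the remaining 5% of each period is left for non-RT tasks, which would explain why kworkers still get queued on a CPU that an RT task keeps busy.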
We looked into it with a colleague and came to the following set of
recommendations for the scenario you describe:
1) Set "isolcpus=2" on the kernel command line during boot, or update your boot
   loader configuration to include this parameter (adjust the value if you
   want to reserve a different CPU or more CPUs).
2) Allow RT tasks to run for an unlimited time period:
   "echo -1 > /proc/sys/kernel/sched_rt_runtime_us"
At this stage you should see the process run almost undisturbed.
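One way to check that the first two steps took effect after a reboot is sketched below. The helper isolcpus_value is a hypothetical name; it only extracts the parameter from a command line string and does not interpret CPU lists or ranges.

```shell
#!/bin/sh
# Print the value of the isolcpus= parameter found in a kernel command line
# string, or nothing if the parameter is absent.
isolcpus_value() {
    for word in $1; do
        case "$word" in
            isolcpus=*) echo "${word#isolcpus=}" ;;
        esac
    done
}

# Example with a made-up command line:
isolcpus_value "root=/dev/sda2 isolcpus=2 quiet"   # prints 2

# Against the running system, when the files are readable:
if [ -r /proc/cmdline ]; then
    echo "isolcpus: $(isolcpus_value "$(cat /proc/cmdline)")"
fi
if [ -r /proc/sys/kernel/sched_rt_runtime_us ]; then
    # -1 here means RT throttling is switched off (step 2).
    echo "sched_rt_runtime_us: $(cat /proc/sys/kernel/sched_rt_runtime_us)"
fi
```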
Further fine tuning may be done as follows:
3) Disable the watchdog functionality on the system:
   "echo 0 > /proc/sys/kernel/nmi_watchdog" and
   "echo 0 > /proc/sys/kernel/watchdog"
4) Disable MCE checking on the CPU (machinecheck2 corresponds to CPU 2):
   "echo 0 > /sys/devices/system/machinecheck/machinecheck2/check_interval"
5) Increase the interval between VM stats collection:
   "echo 9999999 > /proc/sys/vm/stat_interval"
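The fine-tuning steps 3 to 5 could be collected in a small script along these lines. This is only a sketch: apply_setting and tune_isolated_cpu are hypothetical helper names, and paths that are missing or not writable on a given kernel are skipped rather than treated as errors.

```shell
#!/bin/sh
# Write VALUE to TARGET if TARGET exists and is writable; otherwise report
# that it was skipped.
apply_setting() {
    value=$1
    target=$2
    if [ -w "$target" ]; then
        echo "$value" > "$target" && echo "set $target = $value"
    else
        echo "skipped $target (missing or not writable)"
    fi
}

# Steps 3 to 5 for one isolated CPU (defaults to CPU 2, as in the example).
tune_isolated_cpu() {
    cpu=${1:-2}
    apply_setting 0 /proc/sys/kernel/nmi_watchdog
    apply_setting 0 /proc/sys/kernel/watchdog
    apply_setting 0 "/sys/devices/system/machinecheck/machinecheck$cpu/check_interval"
    apply_setting 9999999 /proc/sys/vm/stat_interval
}
```

Running "tune_isolated_cpu 2" as root would apply the settings. Values written this way do not survive a reboot, so the call would have to be repeated at boot time.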
HTH,
Libor
> This is a major problem for us and needs to be looked at. We want to run our
> low-latency application on SLES 12 with RT priority, and the app will utilize
> 100% CPU on some known cores. Currently when we do this, a bunch of kworkers
> get stuck, even on cores where the load is 100% CPU bound.
>
> How can we resolve this?
>
> Thanks,
> Indika
--
Libor Pechacek
Project Manager SUSE Labs, Prague