1. 24 Apr, 2011 1 commit
  2. 14 Apr, 2011 1 commit
  3. 03 Feb, 2011 1 commit
    • sched: Use a buddy to implement yield_task_fair() · ac53db59
      Rik van Riel authored
      
      Use the buddy mechanism to implement yield_task_fair.  This
      allows us to skip onto the next highest priority se at every
      level in the CFS tree, unless doing so would introduce gross
      unfairness in CPU time distribution.
      
      We order the buddy selection in pick_next_entity to check
      yield first, then last, then next.  We need next to be able
      to override yield, because it is possible for the "next" and
      "yield" tasks to be different processes in the same sub-tree
      of the CFS tree.  When they are, we need to go into that
      sub-tree regardless of the "yield" hint, and pick the correct
      entity once we get to the right level.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ac53db59
  4. 18 Jan, 2011 3 commits
  5. 30 Nov, 2010 1 commit
    • sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Mike Galbraith authored
      
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the previous task group is dropped.  Child
      processes inherit this task group thereafter, and increase its
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable by setting its nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of a nice <level> task.
      Setting nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5091faa4
  6. 23 Nov, 2010 1 commit
  7. 18 Nov, 2010 1 commit
  8. 21 Jul, 2010 1 commit
  9. 27 May, 2010 1 commit
  10. 04 May, 2010 1 commit
    • sched: Fix an RCU warning in print_task() · b629317e
      Li Zefan authored
      
      With CONFIG_PROVE_RCU=y, a warning can be triggered:
      
        $ cat /proc/sched_debug
      
      ...
      kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
      ...
      
      Both cgroup_path() and task_group() should be called with either
      rcu_read_lock or cgroup_mutex held.
      
      The rcu_dereference_check() does include cgroup_lock_is_held(), so we
      know that this lock is not held.  Therefore, in a CONFIG_PREEMPT kernel,
      to say nothing of a CONFIG_PREEMPT_RT kernel, the original code could
      have ended up copying a string out of the freelist.
      
      This patch inserts RCU read-side primitives needed to prevent this
      scenario.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b629317e
  11. 02 Apr, 2010 2 commits
  12. 11 Mar, 2010 3 commits
    • sched: Remove avg_overlap · e12f31d3
      Mike Galbraith authored
      
      Both avg_overlap and avg_wakeup had an inherent problem in that their
      accuracy was detrimentally affected by cross-cpu wakeups, because we are
      missing the necessary call to update_curr().  This can't be fixed without
      increasing overhead in our already too fat fastpath.
      
      Additionally, with recent load balancing changes making us prefer to place tasks
      in an idle cache domain (which is good for compute bound loads), communicating
      tasks suffer when a sync wakeup, which would enable affine placement, is turned
      into a non-sync wakeup by SYNC_LESS.  With one task on the runqueue,
      wake_affine() rejects the affine wakeup request, leaving the unfortunate
      task where it was placed, taking frequent cache misses.
      
      Remove it, and recover some fastpath cycles.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301121.6785.30.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e12f31d3
    • sched: Remove avg_wakeup · b42e0c41
      Mike Galbraith authored
      
      Testing the load which led to this heuristic (nfs4 kbuild) shows that it
      has outlived its usefulness.  With intervening load balancing changes, I
      cannot see any difference with/without, so recover those fastpath cycles.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268301062.6785.29.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b42e0c41
    • sched: Implement group scheduler statistics in one struct · 41acab88
      Lucas De Marchi authored
      
      Put all statistic fields of sched_entity in one struct, sched_statistics,
      and embed it into sched_entity.
      
      This change allows us to memset the sched_statistics to 0 when needed
      (for instance when forking), avoiding bugs from non-initialized fields.
      Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268275065-18542-1-git-send-email-lucas.de.marchi@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      41acab88
  13. 14 Dec, 2009 1 commit
  14. 10 Dec, 2009 1 commit
    • sched: Remove forced2_migrations stats · b9889ed1
      Ingo Molnar authored
      
      This build warning:
      
       kernel/sched.c: In function 'set_task_cpu':
       kernel/sched.c:2070: warning: unused variable 'old_rq'
      
      Made me realize that the forced2_migrations stat looks pretty
      pointless (and a misnomer) - remove it.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b9889ed1
  15. 09 Dec, 2009 2 commits
  16. 04 Nov, 2009 1 commit
  17. 17 Sep, 2009 1 commit
    • sched: Add new wakeup preemption mode: WAKEUP_RUNNING · ad4b78bb
      Peter Zijlstra authored
      
      Create a new wakeup preemption mode: preempt towards tasks that run
      shorter on average.  It sets the next buddy to make sure we actually
      run the task we preempted for.
      
      Test results:
      
       root@twins:~# while :; do :; done &
       [1] 6537
       root@twins:~# while :; do :; done &
       [2] 6538
       root@twins:~# while :; do :; done &
       [3] 6539
       root@twins:~# while :; do :; done &
       [4] 6540
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max          4750 usec
              Avg           497 usec
              Stdev         737 usec
      
       root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max            14 usec
              Avg             5 usec
              Stdev           3 usec
      
      Disabled by default - needs more testing.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
      ad4b78bb
  18. 02 Sep, 2009 1 commit
    • sched: Provide iowait counters · 8f0dfc34
      Arjan van de Ven authored
      
      For counting how long an application has been waiting for (disk) IO,
      currently only HZ-sample-driven information is available, while for
      all other counters in this class a high-resolution version is
      available via CONFIG_SCHEDSTATS.

      In order to make an improved bootchart tool possible, we also need a
      high-resolution version of the iowait time.

      This patch adds that scheduler statistic to the kernel.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4A64B813.1080506@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8f0dfc34
  19. 17 Jun, 2009 1 commit
  20. 24 Mar, 2009 1 commit
  21. 18 Mar, 2009 1 commit
  22. 15 Jan, 2009 1 commit
  23. 11 Jan, 2009 1 commit
  24. 01 Dec, 2008 1 commit
  25. 16 Nov, 2008 1 commit
    • sched: fix kernel warning on /proc/sched_debug access · 29d7b90c
      Ingo Molnar authored
      Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
      CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
      triggers when using latencytop:
      
      > [  775.663239] BUG: using smp_processor_id() in preemptible [00000000] code: latencytop/6585
      > [  775.663303] caller is native_sched_clock+0x3a/0x80
      > [  775.663314] Pid: 6585, comm: latencytop Tainted: G        W 2.6.28-rc4-00355-g9c7c3546 #1
      > [  775.663322] Call Trace:
      > [  775.663343]  [<ffffffff803a94e4>] debug_smp_processor_id+0xe4/0xf0
      > [  775.663356]  [<ffffffff80213f7a>] native_sched_clock+0x3a/0x80
      > [  775.663368]  [<ffffffff80213e19>] sched_clock+0x9/0x10
      > [  775.663381]  [<ffffffff8024550d>] proc_sched_show_task+0x8bd/0x10e0
      > [  775.663395]  [<ffffffff8034466e>] sched_show+0x3e/0x80
      > [  775.663408]  [<ffffffff8031039b>] seq_read+0xdb/0x350
      > [  775.663421]  [<ffffffff80368776>] ? security_file_permission+0x16/0x20
      > [  775.663435]  [<ffffffff802f4198>] vfs_read+0xc8/0x170
      > [  775.663447]  [<ffffffff802f4335>] sys_read+0x55/0x90
      > [  775.663460]  [<ffffffff8020c67a>] system_call_fastpath+0x16/0x1b
      > ...
      
      This breakage was caused by me via:

        7cbaef9c: sched: optimize sched_clock() a bit

      Change the calls to cpu_clock().
      Reported-by: Luis Henriques <henrix@sapo.pt>
      29d7b90c
  26. 11 Nov, 2008 1 commit
    • sched: include group statistics in /proc/sched_debug · ff9b48c3
      Bharata B Rao authored
      
      Impact: extend /proc/sched_debug info
      
      Since the statistics of a group entity aren't exported directly from
      the kernel, it is difficult to obtain some of the group statistics.
      For example, the current method to obtain the exec time of a group
      entity is not always accurate: one has to read the exec times of all
      the tasks in the group (/proc/<pid>/sched) and add them.  This method
      fails (or becomes difficult) if we want to collect stats of a group
      over a duration where tasks get created and terminated.
      
      This patch makes it easier to obtain group stats by directly including
      them in /proc/sched_debug. Stats like group exec time would help user
      programs (like LTP) to accurately measure the group fairness.
      
      An example output of group stats from /proc/sched_debug:
      
      cfs_rq[3]:/3/a/1
        .exec_clock                    : 89.598007
        .MIN_vruntime                  : 0.000001
        .min_vruntime                  : 256300.970506
        .max_vruntime                  : 0.000001
        .spread                        : 0.000000
        .spread0                       : -25373.372248
        .nr_running                    : 0
        .load                          : 0
        .yld_exp_empty                 : 0
        .yld_act_empty                 : 0
        .yld_both_empty                : 0
        .yld_count                     : 4474
        .sched_switch                  : 0
        .sched_count                   : 40507
        .sched_goidle                  : 12686
        .ttwu_count                    : 15114
        .ttwu_local                    : 11950
        .bkl_count                     : 67
        .nr_spread_over                : 0
        .shares                        : 0
        .se->exec_start                : 113676.727170
        .se->vruntime                  : 1592.612714
        .se->sum_exec_runtime          : 89.598007
        .se->wait_start                : 0.000000
        .se->sleep_start               : 0.000000
        .se->block_start               : 0.000000
        .se->sleep_max                 : 0.000000
        .se->block_max                 : 0.000000
        .se->exec_max                  : 1.000282
        .se->slice_max                 : 1.999750
        .se->wait_max                  : 54.981093
        .se->wait_sum                  : 217.610521
        .se->wait_count                : 50
        .se->load.weight               : 2
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Acked-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ff9b48c3
  27. 10 Nov, 2008 1 commit
  28. 04 Nov, 2008 1 commit
  29. 30 Oct, 2008 1 commit
  30. 10 Oct, 2008 1 commit
    • [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock() · a6bebbc8
      Lai Jiangshan authored
      
      lock_task_sighand() makes sure task->sighand is protected, so we do
      not need rcu_read_lock().
      [ exec() will take task->sighand->siglock before changing task->sighand! ]

      But code using rcu_read_lock() _just_ to protect lock_task_sighand()
      appears only in procfs.  (And some code in procfs uses
      lock_task_sighand() without such redundant protection.)

      Other subsystems may put lock_task_sighand() into an rcu_read_lock()
      critical region, but those rcu_read_lock() calls are used to protect
      "for_each_process()", "find_task_by_vpid()" etc., not to protect
      lock_task_sighand().
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ok from Oleg]
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      a6bebbc8
  31. 27 Jun, 2008 2 commits
  32. 20 Jun, 2008 1 commit
  33. 29 May, 2008 1 commit