1. 18 Oct, 2011 1 commit
    • cputimer: Cure lock inversion · bcd5cff7
      Peter Zijlstra authored
      There's a lock inversion between the cputimer->lock and rq->lock;
      notably the two callchains involved are:
      
       update_rlimit_cpu()
         sighand->siglock
         set_process_cpu_timer()
           cpu_timer_sample_group()
             thread_group_cputimer()
               cputimer->lock
               thread_group_cputime()
                 task_sched_runtime()
                   ->pi_lock
                   rq->lock
      
       scheduler_tick()
         rq->lock
         task_tick_fair()
           update_curr()
             account_group_exec()
               cputimer->lock
      
      Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
      the second one is keeping it up-to-date.
      
      This problem was introduced by e8abccb7 ("posix-cpu-timers: Cure
      SMP accounting oddities").
      
      Cure the problem by removing the cputimer->lock and rq->lock nesting.
      This leaves concurrent enablers doing duplicate work, but the time
      wasted should be of the same order as the time otherwise wasted spinning
      on the lock, and the greater-than assignment filter should ensure we
      preserve monotonicity.
      Reported-by: Dave Jones <davej@redhat.com>
      Reported-by: Simon Kirby <sim@hostway.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
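The "greater-than assignment filter" named in the message above can be pictured with a small userspace model. This is only a sketch under assumptions: `struct group_cputime_model` and `group_cputime_merge()` are hypothetical names, and a pthread mutex stands in for cputimer->lock. The sample is computed before the lock is taken, so concurrent enablers may apply their samples in any order; raising each field only when the new value is larger keeps the cached totals monotonic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached thread-group totals. */
struct group_cputime_model {
	pthread_mutex_t lock;		/* plays the role of cputimer->lock */
	uint64_t utime;
	uint64_t stime;
	uint64_t sum_exec_runtime;
};

/*
 * Merge a sample that was computed WITHOUT holding cache->lock.
 * Concurrent callers may arrive here in any order, so each field
 * is only ever raised, never lowered.
 */
void group_cputime_merge(struct group_cputime_model *cache,
			 uint64_t utime, uint64_t stime,
			 uint64_t sum_exec_runtime)
{
	assert(cache != NULL);

	pthread_mutex_lock(&cache->lock);
	if (utime > cache->utime)
		cache->utime = utime;
	if (stime > cache->stime)
		cache->stime = stime;
	if (sum_exec_runtime > cache->sum_exec_runtime)
		cache->sum_exec_runtime = sum_exec_runtime;
	pthread_mutex_unlock(&cache->lock);
}
```

A stale sample that arrives late (smaller values) is simply ignored field by field, which is why duplicate work by concurrent enablers is harmless in this model.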
  2. 30 Sep, 2011 1 commit
    • posix-cpu-timers: Cure SMP wobbles · d670ec13
      Peter Zijlstra authored
      
      David reported:
      
        Attached below is a watered-down version of rt/tst-cpuclock2.c from
        GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
        similar.
      
        Run it several times, and you will see cases where the main thread
        will measure a process clock difference before and after the nanosleep
        which is smaller than the cpu-burner thread's individual thread clock
        difference.  This doesn't make any sense since the cpu-burner thread
        is part of the top-level process's thread group.
      
        I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
        64-bit binaries).
      
        For example:
      
        [davem@boricha build-x86_64-linux]$ ./test
        process: before(0.001221967) after(0.498624371) diff(497402404)
        thread:  before(0.000081692) after(0.498316431) diff(498234739)
        self:    before(0.001223521) after(0.001240219) diff(16698)
        [davem@boricha build-x86_64-linux]$ 
      
        The diff of 'process' should always be >= the diff of 'thread'.
      
        I make sure to wrap the 'thread' clock measurements the most tightly
        around the nanosleep() call, and that the 'process' clock measurements
        are the outer-most ones.
      
        ---
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <fcntl.h>
        #include <string.h>
        #include <errno.h>
        #include <pthread.h>
      
        static pthread_barrier_t barrier;
      
        static void *chew_cpu(void *arg)
        {
      	  pthread_barrier_wait(&barrier);
      	  while (1)
      		  __asm__ __volatile__("" : : : "memory");
      	  return NULL;
        }
      
        int main(void)
        {
      	  clockid_t process_clock, my_thread_clock, th_clock;
      	  struct timespec process_before, process_after;
      	  struct timespec me_before, me_after;
      	  struct timespec th_before, th_after;
      	  struct timespec sleeptime;
      	  unsigned long diff;
      	  pthread_t th;
      	  int err;
      
      	  err = clock_getcpuclockid(0, &process_clock);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_init(&barrier, NULL, 2);
      	  err = pthread_create(&th, NULL, chew_cpu, NULL);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(th, &th_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_wait(&barrier);
      
      	  err = clock_gettime(process_clock, &process_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(th_clock, &th_before);
      	  if (err)
      		  return 1;
      
      	  sleeptime.tv_sec = 0;
      	  sleeptime.tv_nsec = 500000000;
      	  nanosleep(&sleeptime, NULL);
      
      	  err = clock_gettime(th_clock, &th_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(process_clock, &process_after);
      	  if (err)
      		  return 1;
      
      	  diff = process_after.tv_nsec - process_before.tv_nsec;
      	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 process_before.tv_sec, process_before.tv_nsec,
      		 process_after.tv_sec, process_after.tv_nsec, diff);
      	  diff = th_after.tv_nsec - th_before.tv_nsec;
      	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 th_before.tv_sec, th_before.tv_nsec,
      		 th_after.tv_sec, th_after.tv_nsec, diff);
      	  diff = me_after.tv_nsec - me_before.tv_nsec;
      	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 me_before.tv_sec, me_before.tv_nsec,
      		 me_after.tv_sec, me_after.tv_nsec, diff);
      
      	  return 0;
        }
      
      This is due to us using p->se.sum_exec_runtime in
      thread_group_cputime() where we iterate the thread group and sum all
      data. This does not take time since the last schedule operation (tick
      or otherwise) into account. We can cure this by using
      task_sched_runtime() at the cost of having to take locks.
      
      This also means we can (and must) do away with
      thread_group_sched_runtime() since the modified thread_group_cputime()
      is now more accurate and would deadlock when called from
      thread_group_sched_runtime().
      
      Aside from that, it makes the function safe on 32-bit systems. The old
      code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
      64-bit value and could be changed on another cpu at the same time.
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
      Tested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
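The quoted test program derives each "diff" from tv_nsec alone, which is adequate for a 0.5 s sleep that starts near zero CPU time but would wrap once a measurement crosses a second boundary. A minimal sketch of a difference helper that also accounts for tv_sec follows; `timespec_diff_ns()` is a hypothetical name and is not part of the original test.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Elapsed nanoseconds from *before to *after, taking both the
 * seconds and the nanoseconds fields into account.
 */
int64_t timespec_diff_ns(const struct timespec *before,
			 const struct timespec *after)
{
	assert(before != NULL && after != NULL);

	return (int64_t)(after->tv_sec - before->tv_sec) * 1000000000LL
		+ (int64_t)(after->tv_nsec - before->tv_nsec);
}
```

With such a helper, the invariant from the report reads as `timespec_diff_ns(&process_before, &process_after) >= timespec_diff_ns(&th_before, &th_after)`; the sample output quoted in the report violates it.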
  3. 13 Sep, 2011 1 commit
  4. 08 Sep, 2011 1 commit
    • posix-cpu-timers: Cure SMP accounting oddities · e8abccb7
      Peter Zijlstra authored
      
      David reported:
      
        Attached below is a watered-down version of rt/tst-cpuclock2.c from
        GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
        similar.
      
        Run it several times, and you will see cases where the main thread
        will measure a process clock difference before and after the nanosleep
        which is smaller than the cpu-burner thread's individual thread clock
        difference.  This doesn't make any sense since the cpu-burner thread
        is part of the top-level process's thread group.
      
        I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
        64-bit binaries).
      
        For example:
      
        [davem@boricha build-x86_64-linux]$ ./test
        process: before(0.001221967) after(0.498624371) diff(497402404)
        thread:  before(0.000081692) after(0.498316431) diff(498234739)
        self:    before(0.001223521) after(0.001240219) diff(16698)
        [davem@boricha build-x86_64-linux]$
      
        The diff of 'process' should always be >= the diff of 'thread'.
      
        I make sure to wrap the 'thread' clock measurements the most tightly
        around the nanosleep() call, and that the 'process' clock measurements
        are the outer-most ones.
      
        ---
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <fcntl.h>
        #include <string.h>
        #include <errno.h>
        #include <pthread.h>
      
        static pthread_barrier_t barrier;
      
        static void *chew_cpu(void *arg)
        {
      	  pthread_barrier_wait(&barrier);
      	  while (1)
      		  __asm__ __volatile__("" : : : "memory");
      	  return NULL;
        }
      
        int main(void)
        {
      	  clockid_t process_clock, my_thread_clock, th_clock;
      	  struct timespec process_before, process_after;
      	  struct timespec me_before, me_after;
      	  struct timespec th_before, th_after;
      	  struct timespec sleeptime;
      	  unsigned long diff;
      	  pthread_t th;
      	  int err;
      
      	  err = clock_getcpuclockid(0, &process_clock);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_init(&barrier, NULL, 2);
      	  err = pthread_create(&th, NULL, chew_cpu, NULL);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(th, &th_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_wait(&barrier);
      
      	  err = clock_gettime(process_clock, &process_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(th_clock, &th_before);
      	  if (err)
      		  return 1;
      
      	  sleeptime.tv_sec = 0;
      	  sleeptime.tv_nsec = 500000000;
      	  nanosleep(&sleeptime, NULL);
      
      	  err = clock_gettime(th_clock, &th_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(process_clock, &process_after);
      	  if (err)
      		  return 1;
      
      	  diff = process_after.tv_nsec - process_before.tv_nsec;
      	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 process_before.tv_sec, process_before.tv_nsec,
      		 process_after.tv_sec, process_after.tv_nsec, diff);
      	  diff = th_after.tv_nsec - th_before.tv_nsec;
      	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 th_before.tv_sec, th_before.tv_nsec,
      		 th_after.tv_sec, th_after.tv_nsec, diff);
      	  diff = me_after.tv_nsec - me_before.tv_nsec;
      	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 me_before.tv_sec, me_before.tv_nsec,
      		 me_after.tv_sec, me_after.tv_nsec, diff);
      
      	  return 0;
        }
      
      This is due to us using p->se.sum_exec_runtime in
      thread_group_cputime() where we iterate the thread group and sum all
      data. This does not take time since the last schedule operation (tick
      or otherwise) into account. We can cure this by using
      task_sched_runtime() at the cost of having to take locks.
      
      This also means we can (and must) do away with
      thread_group_sched_runtime() since the modified thread_group_cputime()
      is now more accurate and would deadlock when called from
      thread_group_sched_runtime().
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  5. 23 May, 2011 1 commit
  6. 31 Mar, 2011 1 commit
  7. 02 Feb, 2011 7 commits
  8. 10 Nov, 2010 1 commit
    • posix-cpu-timers: Rcu_read_lock/unlock protect find_task_by_vpid call · c0deae8c
      Sergey Senozhatsky authored
      Commit 4221a991 "Add RCU check for
      find_task_by_vpid()" introduced rcu_lockdep_assert to find_task_by_pid_ns.
      Add rcu_read_lock/rcu_read_unlock around the find_task_by_vpid call.
      
      Tetsuo Handa wrote:
      | Quoting from one of the posts in that thread
      | http://kerneltrap.org/mailarchive/linux-kernel/2010/2/8/4536388
      |
      || Usually tasklist gives enough protection, but if copy_process() fails
      || it calls free_pid() lockless and does call_rcu(delayed_put_pid()).
      || This means, without rcu lock find_pid_ns() can't scan the hash table
      || safely.
      
      Thomas Gleixner wrote:
      | We can remove the tasklist_lock while at it. rcu_read_lock is enough.
      
      Patch also replaces thread_group_leader with has_group_leader_pid
      in accordance with a comment by Oleg Nesterov:
      
      | ... thread_group_leader() check is not reliable without
      | tasklist. If we race with de_thread() find_task_by_vpid() can find
      | the new leader before it updates its ->group_leader.
      |
      | perhaps it makes sense to change posix_cpu_timer_create() to use 
      | has_group_leader_pid() instead, just to make this code not look racy
      | and avoid adding new problems.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      LKML-Reference: <20101103165256.GD30053@swordfish.minsk.epam.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 16 Jul, 2010 1 commit
  10. 18 Jun, 2010 3 commits
    • sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check() · 8d1f431c
      Oleg Nesterov authored
      
      fastpath_timer_check()->thread_group_cputimer() is racy and
      unneeded.
      
      It is racy because another thread can clear ->running before
      thread_group_cputimer() takes cputimer->lock. In this case
      thread_group_cputimer() will set ->running = true again and call
      thread_group_cputime(). But since we do not hold tasklist or
      siglock, we can race with fork/exit and copy the wrong results
      into cputimer->cputime.
      
      It is unneeded because if ->running == true we can just use
      the numbers in cputimer->cputime we already have.
      
      Change fastpath_timer_check() to copy cputimer->cputime into
      the local variable under cputimer->lock. We do not re-check
      ->running under cputimer->lock, run_posix_cpu_timers() does
      this check later.
      
      Note: we can add more optimizations on top of this change.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100611180446.GA13025@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
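The change described above (copy cputimer->cputime into a local variable under cputimer->lock, with ->running checked only outside the lock) can be modelled in userspace roughly as follows. This is a sketch under assumptions: the struct and function names are hypothetical, a pthread mutex stands in for the spinlock, and an atomic flag is used for the unlocked ->running read so the model itself stays race-free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

struct cputime_sample {
	uint64_t utime;
	uint64_t stime;
	uint64_t sum_exec_runtime;
};

/* Hypothetical model of the per-process cputimer. */
struct cputimer_model {
	pthread_mutex_t lock;		/* plays the role of cputimer->lock */
	atomic_int running;		/* plays the role of ->running */
	struct cputime_sample cputime;	/* totals maintained while running */
};

/*
 * Fast-path read: if the timer is running, copy the totals we already
 * have into *out under the lock and return 1. Never restart the timer
 * and never recompute the totals from here. Returns 0 if not running.
 */
int cputimer_snapshot(struct cputimer_model *timer,
		      struct cputime_sample *out)
{
	assert(timer != NULL && out != NULL);

	if (!atomic_load(&timer->running))
		return 0;

	pthread_mutex_lock(&timer->lock);
	*out = timer->cputime;
	pthread_mutex_unlock(&timer->lock);
	return 1;
}
```

As in the commit message, ->running is not re-checked under the lock; a caller that gets a snapshot from a timer that has just stopped is expected to notice that on its slower path.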
    • sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand() · 0bdd2ed4
      Oleg Nesterov authored
      run_posix_cpu_timers() doesn't work if current has already passed
      exit_notify(). This was needed to prevent the races with do_wait().
      
      Since ea6d290c ->signal is always valid and can't go away. We can
      remove the "tsk->exit_state == 0" check in fastpath_timer_check() and
      convert run_posix_cpu_timers() to use lock_task_sighand().
      
      Note: it makes sense to take group_leader's sighand instead, the
      sub-thread still uses CPU after release_task(). But we need more
      changes to do this.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100610231018.GA25942@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: thread_group_cputime: Simplify, document the "alive" check · bfac7009
      Oleg Nesterov authored
      thread_group_cputime() looks as if it is rcu-safe, but in fact this
      was wrong until ea6d290c which pins task->signal to task_struct.
      It checks ->sighand != NULL under rcu, but this can't help if ->signal
      can go away. Fortunately the caller either holds ->siglock, or it is
      fastpath_timer_check() which uses current and checks exit_state == 0.
      
      - Since commit ea6d290c tsk->signal is stable, we can read it first
        and avoid the initialization from INIT_CPUTIME.
      
      - Even if tsk->signal is always valid, we still have to check it
        is safe to use next_thread() under rcu_read_lock(). Currently
        the code checks ->sighand != NULL, change it to use pid_alive()
        which is commonly used to ensure the task wasn't unhashed before
        we take rcu_read_lock().
      
        Add the comment to explain this check.
      
      - Change the main loop to use the while_each_thread() helper.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100610230956.GA25921@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 27 May, 2010 1 commit
    • posix-cpu-timers: avoid "task->signal != NULL" checks · d30fda35
      Oleg Nesterov authored
      
      Preparation to make task->signal immutable, no functional changes.
      
      posix-cpu-timers.c checks task->signal != NULL to ensure this task is
      alive and didn't pass __exit_signal().  This is correct but we are going
      to change the lifetime rules for ->signal and never reset this pointer.
      
      Change the code to check ->sighand instead, it doesn't matter which
      pointer we check under tasklist, they both are cleared simultaneously.
      
      As Roland pointed out, some of these changes are not strictly needed and
      probably it makes sense to revert them later, when ->signal will be pinned
      to task_struct.  But this patch tries to ensure the subsequent changes in
      fork/exit can't make any visible impact on posix cpu timers.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 10 May, 2010 1 commit
    • posix-cpu-timers: Optimize run_posix_cpu_timers() · 29f87b79
      Stanislaw Gruszka authored
      
      We can optimize and simplify things by taking into account that
      signal->cputimer is always running when any process-wide cpu timer is
      configured.
      
      In check_process_timers(), we don't have to check whether the newly
      updated value of signal->cputime_expires is smaller, since we maintain the
      new first expiration time ({prof,virt,sched}_expires) in the code flow and
      all other writes to the expiration cache are protected by sighand->siglock.
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  13. 12 Mar, 2010 6 commits
    • cpu-timers: Avoid iterating over all threads in fastpath_timer_check() · c2873937
      Stanislaw Gruszka authored
      
      Extend the p->sighand->siglock locking scope to make sure that
      fastpath_timer_check() never iterates over all threads. Without
      locking there is a small possibility that signal->cputimer will stop
      running while we write values to signal->cputime_expires.
      
      Calling thread_group_cputime() from fastpath_timer_check() is bad not
      only because it is slow; it is also racy with __exit_signal(), which can
      lead to invalid signal->{s,u}time values.
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • cpu-timers: Change SIGEV_NONE timer implementation · 1f169f84
      Stanislaw Gruszka authored
      
      When a user sets up a timer without an associated signal, and the process
      does not use any other cpu timers and does not exit, tsk->signal->cputimer
      is enabled and keeps running forever.
      
      Avoid running the timer for no reason.
      
      I used the program below to check that the patch does not break current
      user-space-visible behavior.
      
       #include <sys/time.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <time.h>
       #include <unistd.h>
       #include <assert.h>
      
       void consume_cpu(void)
       {
      	int i = 0;
      	int count = 0;
      
      	for(i=0; i<100000000; i++)
      		count++;
       }
      
       int main(void)
       {
      	int i;
      	struct sigaction act;
      	struct sigevent evt = { };
      	timer_t tid;
      	struct itimerspec spec = { };
      
      	evt.sigev_notify = SIGEV_NONE;
      	assert(timer_create(CLOCK_PROCESS_CPUTIME_ID, &evt,  &tid) == 0);
      
      	spec.it_value.tv_sec = 10;
      	assert(timer_settime(tid, 0, &spec,  NULL) == 0);
      
      	for (i = 0; i < 30; i++) {
      		consume_cpu();
      		memset(&spec, 0, sizeof(spec));
      		assert(timer_gettime(tid, &spec) == 0);
      		printf("%lu.%09lu\n",
      			(unsigned long) spec.it_value.tv_sec,
      			(unsigned long) spec.it_value.tv_nsec);
      	}
      
      	assert(timer_delete(tid) == 0);
      	return 0;
       }
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • cpu-timers: Return correct previous timer reload value · ae1a78ee
      Stanislaw Gruszka authored
      
      According to POSIX, we need to correctly return the old timer's it_interval
      value when the user requests it in timer_settime().  Tested using the
      program below.
      
       #include <sys/time.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <time.h>
       #include <unistd.h>
       #include <assert.h>
      
       int main(void)
       {
      	struct sigaction act;
      	struct sigevent evt = { };
      	timer_t tid;
      	struct itimerspec spec, u_spec, k_spec;
      
      	evt.sigev_notify = SIGEV_SIGNAL;
      	evt.sigev_signo = SIGPROF;
      	assert(timer_create(CLOCK_PROCESS_CPUTIME_ID, &evt,  &tid) == 0);
      
      	spec.it_value.tv_sec = 1;
      	spec.it_value.tv_nsec = 2;
      	spec.it_interval.tv_sec = 3;
      	spec.it_interval.tv_nsec = 4;
      	u_spec = spec;
      	assert(timer_settime(tid, 0, &spec,  NULL) == 0);
      
      	spec.it_value.tv_sec = 5;
      	spec.it_value.tv_nsec = 6;
      	spec.it_interval.tv_sec = 7;
      	spec.it_interval.tv_nsec = 8;
      	assert(timer_settime(tid, 0, &spec,  &k_spec) == 0);
      
       #define PRT(val) printf(#val ":\t%d/%d\n", (int) u_spec.val, (int) k_spec.val)
      	PRT(it_value.tv_sec);
      	PRT(it_value.tv_nsec);
      	PRT(it_interval.tv_sec);
      	PRT(it_interval.tv_nsec);
      
      	return 0;
       }
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • cpu-timers: Cleanup arm_timer() · 5eb9aa64
      Stanislaw Gruszka authored
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • cpu-timers: Simplify RLIMIT_CPU handling · f55db609
      Stanislaw Gruszka authored
      Always set the signal->cputime_expires expiration cache when setting a
      new itimer, POSIX 1.b timer, or RLIMIT_CPU.  Since we are
      initializing the prof_exp expiration cache during fork(), this allows us to
      remove the "RLIMIT_CPU != inf" check from fastpath_timer_check() and do
      some other cleanups.
      
      Checked against regression using test cases from:
      http://marc.info/?l=linux-kernel&m=123749066504641&w=4
      http://marc.info/?l=linux-kernel&m=123811277916642&w=2
      
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • posix-cpu-timers: Reset expire cache when no timer is running · 15365c10
      Stanislaw Gruszka authored
      
      When a process deletes a cpu timer or a timer expires, we do not clear
      the expiration cache sig->cputime_expires.
      
      As a result, the fastpath_timer_check() which prevents us from looping over
      all threads when no timer is active does not work, and we run the
      slow path needlessly on every tick.
      
      Zero sig->cputime_expires in stop_process_timers().
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
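Why a stale expiration cache defeats the fast path can be shown with a small model. The sketch below is an assumption-laden illustration, not the kernel code: the struct and function names are hypothetical, the field names prof_exp/virt_exp/sched_exp follow the names used elsewhere in this log, and a zero field is taken to mean "no timer of that kind".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CPU time consumed so far (hypothetical model). */
struct cputime_sample_model {
	uint64_t utime;
	uint64_t stime;
	uint64_t sum_exec_runtime;
};

/* Earliest expiry per clock; 0 means "nothing armed" in this model. */
struct expiry_cache_model {
	uint64_t prof_exp;	/* compared against utime + stime */
	uint64_t virt_exp;	/* compared against utime */
	uint64_t sched_exp;	/* compared against sum_exec_runtime */
};

/* Fast-path test: returns 1 if the slow path has to run. */
int expiry_cache_hit(const struct cputime_sample_model *sample,
		     const struct expiry_cache_model *cache)
{
	assert(sample != NULL && cache != NULL);

	if (cache->prof_exp &&
	    sample->utime + sample->stime >= cache->prof_exp)
		return 1;
	if (cache->virt_exp && sample->utime >= cache->virt_exp)
		return 1;
	if (cache->sched_exp &&
	    sample->sum_exec_runtime >= cache->sched_exp)
		return 1;
	return 0;
}

/* What stopping the process timers should also do: clear the cache. */
void expiry_cache_reset(struct expiry_cache_model *cache)
{
	assert(cache != NULL);
	memset(cache, 0, sizeof(*cache));
}
```

In this model, a cache left at an already-passed expiry makes `expiry_cache_hit()` return 1 on every call, since consumed CPU time only grows; after `expiry_cache_reset()` it returns 0 until a new timer is armed.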
  14. 06 Mar, 2010 2 commits
  15. 18 Nov, 2009 1 commit
  16. 29 Aug, 2009 1 commit
    • itimers: Add tracepoints for itimer · 3f0a525e
      Xiao Guangrong authored
      
      Add tracepoints for all itimer variants: ITIMER_REAL, ITIMER_VIRTUAL
      and ITIMER_PROF.
      
      [ tglx: Fixed comments and made the output more readable, parseable
        	and consistent. Replaced pid_vnr by pid_nr because the hrtimer
        	callback can happen in any namespace ]
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Zhaolei <zhaolei@cn.fujitsu.com>
      LKML-Reference: <4A7F8B6E.2010109@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  17. 08 Aug, 2009 1 commit
  18. 03 Aug, 2009 4 commits
  19. 30 Apr, 2009 1 commit
  20. 08 Apr, 2009 1 commit
    • posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF) · 8f2e5865
      Oleg Nesterov authored
      
      update_rlimit_cpu() tries to optimize out set_process_cpu_timer() in the
      case when we already have a CPUCLOCK_PROF timer which should expire first.
      But it uses cputime_lt() instead of cputime_gt().
      
      Test case:
      
      	#include <assert.h>
      	#include <sys/time.h>
      	#include <sys/resource.h>
      
      	int main(void)
      	{
      		struct itimerval it = {
      			.it_value = { .tv_sec = 1000 },
      		};
      
      		assert(!setitimer(ITIMER_PROF, &it, NULL));
      
      		struct rlimit rl = {
      			.rlim_cur = 1,
      			.rlim_max = 1,
      		};
      
      		assert(!setrlimit(RLIMIT_CPU, &rl));
      
      		for (;;)
      			;
      
      		return 0;
      	}
      
      Without this patch, the task is not killed as RLIMIT_CPU demands.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Peter Lojkin <ia6432@inbox.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: stable@kernel.org
      LKML-Reference: <20090327000610.GA10108@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
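The direction of the comparison is easy to get wrong, so here is a minimal sketch of the decision described above. The function names are hypothetical and the expiry values are plain integers in arbitrary but equal units; 0 is assumed to mean that no CPUCLOCK_PROF timer is armed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Intended logic: the RLIMIT_CPU expiry must be programmed when no
 * profiling timer is armed, or when the armed one expires LATER than
 * the new limit. Only an earlier-expiring timer lets us skip the work.
 */
int rlimit_needs_arming(uint64_t prof_expires, uint64_t rlimit_expires)
{
	return prof_expires == 0 || prof_expires > rlimit_expires;
}

/* The inverted comparison, shown only for contrast. */
int rlimit_needs_arming_inverted(uint64_t prof_expires,
				 uint64_t rlimit_expires)
{
	return prof_expires == 0 || prof_expires < rlimit_expires;
}
```

With the values from the test case in the message (an ITIMER_PROF timer at 1000 seconds, RLIMIT_CPU at 1 second), the first function reports that the limit must be armed, while the inverted one skips it, which matches the observation that the task is not killed.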
  21. 01 Apr, 2009 1 commit
    • posixtimers, sched: Fix posix clock monotonicity · c5f8d995
      Hidetoshi Seto authored
      
      Impact: Regression fix (for the bug where clock_gettime() goes backwards)
      
      This patch re-introduces a couple of functions, task_sched_runtime
      and thread_group_sched_runtime, which were removed around
      2.6.28-rc1.
      
      These functions protect the sampling of the thread/process clock with
      the rq lock.  The rq lock is required so that rq->clock is not updated
      during the sampling.
      
      Without it, clock_gettime() may return
         ((accounted runtime before update) + (delta after update))
      which is less than what it should be.
      
      v2 -> v3:
      	- Rename static helper function __task_delta_exec()
      	  to do_task_delta_exec() since -tip tree already has
      	  a __task_delta_exec() of different version.
      
      v1 -> v2:
      	- Revises comments of function and patch description.
      	- Add note about accuracy of thread group's runtime.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org	[2.6.28.x][2.6.29.x]
      LKML-Reference: <49D1CC93.4080401@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
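The need to read the accounted runtime and the pending delta as one consistent pair can be sketched in userspace. Everything below is a hypothetical model: a pthread mutex stands in for the rq lock, the caller passes `now` where the kernel would read its clock, and the names are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct rq_model {
	pthread_mutex_t lock;	/* plays the role of the rq lock */
	uint64_t clock;		/* plays the role of rq->clock */
};

struct task_model {
	uint64_t sum_exec_runtime;	/* runtime accounted so far */
	uint64_t exec_start;		/* clock value at last accounting */
};

/* Tick path: advance the clock and fold the delta into the total. */
void rq_account(struct rq_model *rq, struct task_model *task, uint64_t now)
{
	assert(rq != NULL && task != NULL);

	pthread_mutex_lock(&rq->lock);
	rq->clock = now;
	if (now > task->exec_start)
		task->sum_exec_runtime += now - task->exec_start;
	task->exec_start = now;
	pthread_mutex_unlock(&rq->lock);
}

/*
 * Sampling path: total plus not-yet-accounted delta. Both values are
 * read under the lock, so an accounting update cannot slip in between
 * them and pair an old total with a new (smaller) delta.
 */
uint64_t task_runtime_sample(struct rq_model *rq,
			     const struct task_model *task, uint64_t now)
{
	uint64_t ns;

	assert(rq != NULL && task != NULL);

	pthread_mutex_lock(&rq->lock);
	rq->clock = now;
	ns = task->sum_exec_runtime;
	if (rq->clock > task->exec_start)
		ns += rq->clock - task->exec_start;
	pthread_mutex_unlock(&rq->lock);
	return ns;
}
```

In this model a sample taken just before an accounting update and one taken just after it return the same value, and later samples never decrease.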
  22. 23 Mar, 2009 1 commit
    • posix timers: fix RLIMIT_CPU && fork() · 37bebc70
      Oleg Nesterov authored
      See http://bugzilla.kernel.org/show_bug.cgi?id=12911
      
      copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost", because
      posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus
      fastpath_timer_check() returns false unless we have other cpu timers.
      
      This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not
      optimal, we need further cleanups here. With this patch update_rlimit_cpu()
      is not really needed, but I don't think it should be removed.
      
      The proper fix (I think) is:
      
      	- set_process_cpu_timer() should just start the cputimer->running
      	  logic (it does), no need to change cputime_expires.xxx_exp
      
      	- posix_cpu_timers_init_group() should set ->running when needed
      
      	- fastpath_timer_check() can check ->running instead of
      	  task_cputime_zero(signal->cputime_expires)
      Reported-by: Peter Lojkin <ia6432@inbox.ru>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: <stable@kernel.org> [for 2.6.29.x]
      LKML-Reference: <20090323193411.GA17514@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 13 Feb, 2009 1 commit
    • timers: more consistently use clock vs timer · 3997ad31
      Peter Zijlstra authored
      
      While reviewing the manpages, I noticed I'd missed some clock vs timer sites.
      
      Make sure that all timer functions call cpu_timer_sample_group() and not
      cpu_clock_sample_group(). This ensures that we enable the process wide timer
      in time, and therefore pay the O(n) thread group cost from the syscall.
      
      Not doing it here results in the first jiffy tick after setting the timer
      doing this work, which makes for a very expensive tick (but only once) and
      a delay in actually starting the timer.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>