1. 08 Dec, 2006 4 commits
  2. 07 Dec, 2006 6 commits
    • Oleg Nesterov's avatar
      [PATCH] taskstats: cleanup ->signal->stats allocation · 34ec1234
      Oleg Nesterov authored
      
      Allocate ->signal->stats on demand in taskstats_exit(), this allows us to
      remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's
      public interface.
      Signed-off-by: default avatarOleg Nesterov <oleg@tv-sign.ru>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      34ec1234
    • Roland McGrath's avatar
      [PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit · fec1d011
      Roland McGrath authored
      
      The CLONE_CHILD_CLEARTID flag is used by NPTL to have its threads
      communicate via memory/futex when they exit, so pthread_join can
      synchronize using a simple futex wait.  The word of user memory where NPTL
      stores a thread's own TID is what it passes; this gets reset to zero at
      thread exit.
      
      It is not desireable to touch this user memory when threads are dying due
      to a fatal signal.  A core dump is more usefully representative of the
      dying program state if the threads live at the time of the crash have their
      NPTL data structures unperturbed.  The userland expectation of
      CLONE_CHILD_CLEARTID has only ever been that it works for a thread making
      an _exit system call.
      
      This problem was identified by Ernie Petrides <petrides@redhat.com>.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Cc: Ernie Petrides <petrides@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      fec1d011
    • Christoph Lameter's avatar
      [PATCH] slab: remove kmem_cache_t · e18b890b
      Christoph Lameter authored
      
      Replace all uses of kmem_cache_t with struct kmem_cache.
      
      The patch was generated using the following script:
      
      	#!/bin/sh
      	#
      	# Replace one string by another in all the kernel sources.
      	#
      
      	set -e
      
      	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
      		quilt add $file
      		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
      		mv /tmp/$$ $file
      		quilt refresh
      	done
      
      The script was run like this
      
      	sh replace kmem_cache_t "struct kmem_cache"
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      e18b890b
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_KERNEL · e94b1766
      Christoph Lameter authored
      
      SLAB_KERNEL is an alias of GFP_KERNEL.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      e94b1766
    • Ashwin Chaugule's avatar
      [PATCH] new scheme to preempt swap token · 7602bdf2
      Ashwin Chaugule authored
      
      The new swap token patches replace the current token traversal algo.  The old
      algo had a crude timeout parameter that was used to handover the token from
      one task to another.  This algo, transfers the token to the tasks that are in
      need of the token.  The urgency for the token is based on the number of times
      a task is required to swap-in pages.  Accordingly, the priority of a task is
      incremented if it has been badly affected due to swap-outs.  To ensure that
      the token doesnt bounce around rapidly, the token holders are given a priority
      boost.  The priority of tasks is also decremented, if their rate of swap-in's
      keeps reducing.  This way, the condition to check whether to pre-empt the swap
      token, is a matter of comparing two task's priority fields.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: default avatarAshwin Chaugule <ashwin.chaugule@celunite.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      7602bdf2
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Use %gs as the PDA base-segment in the kernel · f95d47ca
      Jeremy Fitzhardinge authored
      
      This patch is the meat of the PDA change.  This patch makes several related
      changes:
      
      1: Most significantly, %gs is now used in the kernel.  This means that on
         entry, the old value of %gs is saved away, and it is reloaded with
         __KERNEL_PDA.
      
      2: entry.S constructs the stack in the shape of struct pt_regs, and this
         is passed around the kernel so that the process's saved register
         state can be accessed.
      
         Unfortunately struct pt_regs doesn't currently have space for %gs
         (or %fs). This patch extends pt_regs to add space for gs (no space
         is allocated for %fs, since it won't be used, and it would just
         complicate the code in entry.S to work around the space).
      
      3: Because %gs is now saved on the stack like %ds, %es and the integer
         registers, there are a number of places where it no longer needs to
         be handled specially; namely context switch, and saving/restoring the
         register state in a signal context.
      
      4: And since kernel threads run in kernel space and call normal kernel
         code, they need to be created with their %gs == __KERNEL_PDA.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      f95d47ca
  3. 25 Nov, 2006 1 commit
  4. 14 Nov, 2006 1 commit
    • Linus Torvalds's avatar
      Revert "[PATCH] fix Data Acess error in dup_fd" · 9a3a04ac
      Linus Torvalds authored
      This reverts commit 0130b0b3
      
      .
      
      Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was
      supposed to fix must be some unrelated memory corruption, and the "fix"
      actually causes more problems:
      
        "However, the new code does not look safe in all cases.  If some other
         task has opened more files while dup_fd() released oldf->file_lock, the
         new code will update open_files to the new larger value.  But newf was
         allocated with the old smaller value of open_files, therefore subsequent
         accesses to newf may try to write into unallocated memory."
      
      so revert it.
      
      Cc: Sharyathi Nagesh <sharyath@in.ibm.com>
      Cc: Sergey Vlasov <vsu@altlinux.ru>
      Cc: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      9a3a04ac
  5. 13 Nov, 2006 1 commit
    • Sharyathi Nagesh's avatar
      [PATCH] fix Data Acess error in dup_fd · 0130b0b3
      Sharyathi Nagesh authored
      
      On running the Stress Test on machine for more than 72 hours following
      error message was observed.
      
      0:mon> e
      cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
          pc: c000000000060d90: .dup_fd+0x240/0x39c
          lr: c000000000060d6c: .dup_fd+0x21c/0x39c
          sp: c00000007ce2fa70
         msr: 800000000000b032
         dar: ffffffff00000028
       dsisr: 40000000
        current = 0xc000000074950980
        paca    = 0xc000000000454500
          pid   = 27330, comm = bash
      
      0:mon> t
      [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
      [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
      [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
      [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
      [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
      [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
      
      The problem is because of race window.  When if(expand) block is executed in
      dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be
      modified.  So actual open_files in oldf may not match with open_files
      variable.
      
      Cc: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      0130b0b3
  6. 28 Oct, 2006 2 commits
  7. 17 Oct, 2006 1 commit
    • Peter Zijlstra's avatar
      [PATCH] rt-mutex: fixup rt-mutex debug code · bea493a0
      Peter Zijlstra authored
      
      BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted)
       [<c04051e3>] show_trace_log_lvl+0x58/0x16a
       [<c04057f0>] show_trace+0xd/0x10
       [<c0405900>] dump_stack+0x19/0x1b
       [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a
       [<c04224c0>] free_task+0x15/0x24
       [<c042378c>] copy_process+0x12bd/0x1324
       [<c0423835>] do_fork+0x42/0x113
       [<c04021dd>] sys_fork+0x19/0x1b
       [<c0403fb7>] syscall_call+0x7/0xb
      
      In copy_process(), dup_task_struct() also duplicates the ->pi_lock,
      ->pi_waiters and ->pi_blocked_on members.  rt_mutex_debug_task_free()
      called from free_task() validates these members.  However free_task() can
      be invoked before these members are reset for the new task.
      
      Move the initialization code before the first bail that can hit free_task().
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      bea493a0
  8. 02 Oct, 2006 6 commits
  9. 01 Oct, 2006 1 commit
    • Jay Lan's avatar
      [PATCH] csa: convert CONFIG tag for extended accounting routines · 8f0ab514
      Jay Lan authored
      
      There were a few accounting data/macros that are used in CSA but are #ifdef'ed
      inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
      CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
      kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
      include/linux/tsacct_kern.h.
      Signed-off-by: default avatarJay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      8f0ab514
  10. 29 Sep, 2006 2 commits
  11. 26 Sep, 2006 1 commit
  12. 20 Sep, 2006 1 commit
  13. 01 Sep, 2006 1 commit
    • Shailabh Nagar's avatar
      [PATCH] task delay accounting fixes · 35df17c5
      Shailabh Nagar authored
      Cleanup allocation and freeing of tsk->delays used by delay accounting.
      This solves two problems reported for delay accounting:
      
      1. oops in __delayacct_blkio_ticks
      http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1844.html
      
      Currently tsk->delays is getting freed too early in task exit which can
      cause a NULL tsk->delays to get accessed via reading of /proc/<tgid>/stats.
       The patch fixes this problem by freeing tsk->delays closer to when
      task_struct itself is freed up.  As a result, it also eliminates the use of
      tsk->delays_lock which was only being used (inadequately) to safeguard
      access to tsk->delays while a task was exiting.
      
      2. Possible memory leak in kernel/delayacct.c
      http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1389.html
      
      
      
      The patch cleans up tsk->delays allocations after a bad fork which was
      missing earlier.
      
      The patch has been tested to fix the problems listed above and stress
      tested with rapid calls to delay accounting's taskstats command interface
      (which is the other path that can access the same data, besides the /proc
      interface causing the oops above).
      Signed-off-by: default avatarShailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      35df17c5
  14. 06 Aug, 2006 1 commit
  15. 15 Jul, 2006 2 commits
    • Shailabh Nagar's avatar
      [PATCH] delay accounting taskstats interface send tgid once · ad4ecbcb
      Shailabh Nagar authored
      
      Send per-tgid data only once during exit of a thread group instead of once
      with each member thread exit.
      
      Currently, when a thread exits, besides its per-tid data, the per-tgid data
      of its thread group is also sent out, if its thread group is non-empty.
      The per-tgid data sent consists of the sum of per-tid stats for all
      *remaining* threads of the thread group.
      
      This patch modifies this sending in two ways:
      
      - the per-tgid data is sent only when the last thread of a thread group
        exits.  This cuts down heavily on the overhead of sending/receiving
        per-tgid data, especially when other exploiters of the taskstats
        interface aren't interested in per-tgid stats
      
      - the semantics of the per-tgid data sent are changed.  Instead of being
        the sum of per-tid data for remaining threads, the value now sent is the
        true total accumalated statistics for all threads that are/were part of
        the thread group.
      
      The patch also addresses a minor issue where failure of one accounting
      subsystem to fill in the taskstats structure was causing the send of
      taskstats to not be sent at all.
      
      The patch has been tested for stability and run cerberus for over 4 hours
      on an SMP.
      
      [akpm@osdl.org: bugfixes]
      Signed-off-by: default avatarShailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: default avatarBalbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ad4ecbcb
    • Shailabh Nagar's avatar
      [PATCH] per-task-delay-accounting: setup · ca74e92b
      Shailabh Nagar authored
      
      Initialization code related to collection of per-task "delay" statistics which
      measure how long it had to wait for cpu, sync block io, swapping etc.  The
      collection of statistics and the interface are in other patches.  This patch
      sets up the data structures and allows the statistics collection to be
      disabled through a kernel boot parameter.
      Signed-off-by: default avatarShailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: default avatarBalbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
      Cc: Erich Focht <efocht@ess.nec.de>
      Cc: Levent Serinol <lserinol@gmail.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ca74e92b
  16. 10 Jul, 2006 1 commit
  17. 03 Jul, 2006 5 commits
    • Ingo Molnar's avatar
      [PATCH] sched: cleanup, remove task_t, convert to struct task_struct · 36c8b586
      Ingo Molnar authored
      
      cleanup: remove task_t and convert all the uses to struct task_struct. I
      introduced it for the scheduler anno and it was a mistake.
      
      Conversion was mostly scripted, the result was reviewed and all
      secondary whitespace and style impact (if any) was fixed up by hand.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      36c8b586
    • Ingo Molnar's avatar
      [PATCH] lockdep: annotate ->mmap_sem · ad339451
      Ingo Molnar authored
      
      Teach special (recursive) locking code to the lock validator.  Has no effect
      on non-lockdep kernels.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ad339451
    • Ingo Molnar's avatar
      [PATCH] lockdep: core · fbb9ce95
      Ingo Molnar authored
      Do 'make oldconfig' and accept all the defaults for new config options -
      reboot into the kernel and if everything goes well it should boot up fine and
      you should have /proc/lockdep and /proc/lockdep_stats files.
      
      Typically if the lock validator finds some problem it will print out
      voluminous debug output that begins with "BUG: ..." and which syslog output
      can be used by kernel developers to figure out the precise locking scenario.
      
      What does the lock validator do?  It "observes" and maps all locking rules as
      they occur dynamically (as triggered by the kernel's natural use of spinlocks,
      rwlocks, mutexes and rwsems).  Whenever the lock validator subsystem detects a
      new locking scenario, it validates this new rule against the existing set of
      rules.  If this new rule is consistent with the existing set of rules then the
      new rule is added transparently and the kernel continues as normal.  If the
      new rule could create a deadlock scenario then this condition is printed out.
      
      When determining validity of locking, all possible "deadlock scenarios" are
      considered: assuming arbitrary number of CPUs, arbitrary irq context and task
      context constellations, running arbitrary combinations of all the existing
      locking scenarios.  In a typical system this means millions of separate
      scenarios.  This is why we call it a "locking correctness" validator - for all
      rules that are observed the lock validator proves it with mathematical
      certainty that a deadlock could not occur (assuming that the lock validator
      implementation itself is correct and its internal data structures are not
      corrupted by some other kernel subsystem).  [see more details and conditionals
      of this statement in include/linux/lockdep.h and
      Documentation/lockdep-design.txt]
      
      Furthermore, this "all possible scenarios" property of the validator also
      enables the finding of complex, highly unlikely multi-CPU multi-context races
      via single single-context rules, increasing the likelyhood of finding bugs
      drastically.  In practical terms: the lock validator already found a bug in
      the upstream kernel that could only occur on systems with 3 or more CPUs, and
      which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
      That bug was found and reported on a single-CPU system (!).  So in essence a
      race will be found "piecemail-wise", triggering all the necessary components
      for the race, without having to reproduce the race scenario itself!  In its
      short existence the lock validator found and reported many bugs before they
      actually caused a real deadlock.
      
      To further increase the efficiency of the validator, the mapping is not per
      "lock instance", but per "lock-class".  For example, all struct inode objects
      in the kernel have inode->inotify_mutex.  If there are 10,000 inodes cached,
      then there are 10,000 lock objects.  But ->inotify_mutex is a single "lock
      type", and all locking activities that occur against ->inotify_mutex are
      "unified" into this single lock-class.  The advantage of the lock-class
      approach is that all historical ->inotify_mutex uses are mapped into a single
      (and as narrow as possible) set of locking rules - regardless of how many
      different tasks or inode structures it took to build this set of rules.  The
      set of rules persist during the lifetime of the kernel.
      
      To see the rough magnitude of checking that the lock validator does, here's a
      portion of /proc/lockdep_stats, fresh after bootup:
      
       lock-classes:                            694 [max: 2048]
       direct dependencies:                  1598 [max: 8192]
       indirect dependencies:               17896
       all direct dependencies:             16206
       dependency chains:                    1910 [max: 8192]
       in-hardirq chains:                      17
       in-softirq chains:                     105
       in-process chains:                    1065
       stack-trace entries:                 38761 [max: 131072]
       combined max dependencies:         2033928
       hardirq-safe locks:                     24
       hardirq-unsafe locks:                  176
       softirq-safe locks:                     53
       softirq-unsafe locks:                  137
       irq-safe locks:                         59
       irq-unsafe locks:                      176
      
      The lock validator has observed 1598 actual single-thread locking patterns,
      and has validated all possible 2033928 distinct locking scenarios.
      
      More details about the design of the lock validator can be found in
      Documentation/lockdep-design.txt, which can also found at:
      
         http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
      
      
      
      [bunk@stusta.de: cleanups]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarAdrian Bunk <bunk@stusta.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      fbb9ce95
    • Ingo Molnar's avatar
      [PATCH] lockdep: irqtrace subsystem, core · de30a2b3
      Ingo Molnar authored
      
      Accurate hard-IRQ-flags and softirq-flags state tracing.
      
      This allows us to attach extra functionality to IRQ flags on/off
      events (such as trace-on/off).
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      de30a2b3
    • Ingo Molnar's avatar
      [PATCH] lockdep: better lock debugging · 9a11b49a
      Ingo Molnar authored
      Generic lock debugging:
      
       - generalized lock debugging framework. For example, a bug in one lock
         subsystem turns off debugging in all lock subsystems.
      
       - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
         the mutex/rtmutex debugging code: it caused way too much prototype
         hackery, and lockdep will give the same information anyway.
      
       - ability to do silent tests
      
       - check lock freeing in vfree too.
      
       - more finegrained debugging options, to allow distributions to
         turn off more expensive debugging features.
      
      There's no separate 'held mutexes' list anymore - but there's a 'held locks'
      stack within lockdep, which unifies deadlock detection across all lock
      classes.  (this is independent of the lockdep validation stuff - lockdep first
      checks whether we are holding a lock already)
      
      Here are the current debugging options:
      
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      
      which do:
      
       config DEBUG_MUTEXES
                bool "Mutex debugging, basic ch...
      9a11b49a
  18. 30 Jun, 2006 1 commit
  19. 28 Jun, 2006 2 commits