1. 23 Mar, 2011 1 commit
  2. 13 Jan, 2011 1 commit
  3. 08 Oct, 2010 1 commit
  4. 06 Mar, 2010 5 commits
    • coredump: pass mm->flags as a coredump parameter for consistency · 30736a4d
      Masami Hiramatsu authored
      
      Pass mm->flags as a coredump parameter for consistency.
      
       ---
      if (mm->core_state || !get_dumpable(mm)) {  <- (1)
              up_write(&mm->mmap_sem);
              put_cred(cred);
              goto fail;
      }

      [...]
      if (get_dumpable(mm) == 2) {    /* Setuid core dump mode */ <-(2)
              flag = O_EXCL;          /* Stop rewrite attacks */
              cred->fsuid = 0;        /* Dump root private */
      }
       ---
      
      Since the dumpable bits are not protected by a lock, they can change
      between (1) and (2).
      
      To solve this issue, this patch copies mm->flags to
      coredump_params.mm_flags at the beginning of do_coredump() and uses it
      instead of get_dumpable() while dumping core.
      
      This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
      dump_filter bits in mm->flags.
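      The race and the fix can be modeled in user space. This is a sketch
      under assumptions: struct mm_sketch, dumpable() and the two-bit encoding
      of the dumpable mode are simplified stand-ins, not the kernel's
      definitions, and the race callback plays the part of another thread
      changing the flags between (1) and (2).

```c
#include <assert.h>

/* Assumed, simplified encoding: the low two bits hold the dumpable mode
 * (0 = not dumpable, 1 = dumpable, 2 = setuid/root-private dump). */
#define DUMPABLE_MASK 0x3UL

struct mm_sketch {
	unsigned long flags;		/* stands in for mm->flags */
};

static int dumpable(unsigned long flags)
{
	return (int)(flags & DUMPABLE_MASK);
}

/* Simulates a concurrent writer running between check (1) and use (2). */
typedef void (*race_fn)(struct mm_sketch *mm);

/* Racy version: reads mm->flags at (1) and again at (2).
 * Returns -1 for no dump, 0 for a normal dump, 1 for a root-private dump. */
static int dump_mode_racy(struct mm_sketch *mm, race_fn race)
{
	if (!dumpable(mm->flags))		/* (1) */
		return -1;
	race(mm);
	return dumpable(mm->flags) == 2;	/* (2): may see a new value */
}

/* Fixed version: copy the flags once, like coredump_params.mm_flags,
 * and make every later decision from that copy. */
static int dump_mode_snapshot(struct mm_sketch *mm, race_fn race)
{
	unsigned long mm_flags = mm->flags;

	if (!dumpable(mm_flags))		/* (1) */
		return -1;
	race(mm);
	return dumpable(mm_flags) == 2;		/* (2): same value as (1) */
}

/* Example writer: switches the mode from 2 (root-private) to 1. */
static void drop_to_mode1(struct mm_sketch *mm)
{
	mm->flags = (mm->flags & ~DUMPABLE_MASK) | 1UL;
}
```

      With flags set to 2 and drop_to_mode1() as the racing writer,
      dump_mode_racy() returns 0 because (2) sees the new mode, while
      dump_mode_snapshot() returns 1 because both checks use the same copy.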
      
      [akpm@linux-foundation.org: fix merge]
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • elf coredump: add extended numbering support · 8d9032bb
      Daisuke HATAYAMA authored
      The current ELF dumper implementation can produce broken corefiles if
      the number of program headers exceeds 65535.  This number is determined
      by the number of vmas the process has.  In particular, some extreme
      programs may use more than 65535 vmas.  (If you google max_map_count,
      you can find some users facing this problem.)  Such programs are never
      able to generate correct coredumps.

      This patch implements ``extended numbering'', which uses the sh_info
      field of the first section header instead of the e_phnum field in order
      to represent up to 4294967295 vmas.
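      A minimal sketch of the encoding, written against the Elf64_* types in
      glibc's <elf.h> rather than the kernel's own typedefs.  Per the ELF gABI,
      when the program header count is PN_XNUM (0xffff) or larger, e_phnum
      holds PN_XNUM and the real count goes in sh_info of section header 0.
      The helper names are hypothetical, and setting e_shoff is left to the
      caller.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

#ifndef PN_XNUM
#define PN_XNUM 0xffff		/* e_phnum escape value from the ELF gABI */
#endif

/* Writer side: store the program header count, spilling it into
 * sh_info of section header 0 when it does not fit in e_phnum.
 * The caller must still point e_shoff at sh0 in the file (not shown). */
static void encode_phnum(Elf64_Ehdr *eh, Elf64_Shdr *sh0, uint32_t segs)
{
	memset(sh0, 0, sizeof(*sh0));	/* index 0 is an SHT_NULL entry */
	if (segs >= PN_XNUM) {
		eh->e_phnum = PN_XNUM;
		eh->e_shnum = 1;		/* just the index-0 entry */
		eh->e_shstrndx = SHN_UNDEF;
		sh0->sh_info = segs;		/* the real count */
	} else {
		eh->e_phnum = (Elf64_Half)segs;
	}
}

/* Reader side: what gdb or binutils must do to recover the count. */
static uint32_t decode_phnum(const Elf64_Ehdr *eh, const Elf64_Shdr *sh0)
{
	return eh->e_phnum == PN_XNUM ? sh0->sh_info : eh->e_phnum;
}
```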
      
      This is supported by AMD64-ABI (http://www.x86-64.org/documentation.html)
      and Solaris (http://docs.sun.com/app/docs/doc/817-1984/).

      Of course, we are preparing patches for gdb and binutils.
      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • elf coredump: make offset calculation process and writing process explicit · 93eb211e
      Daisuke HATAYAMA authored
      
      With the next patch, elf_core_dump() and elf_fdpic_core_dump() will
      support extended numbering and so, in a special case, will produce
      corefiles with a section header table.

      The problem is writing the file offset of the section header table into
      the e_shoff field of the ELF header.  The ELF header is positioned at the
      beginning of the corefile, while the section header table is at the end.
      So we need to take one of the following approaches:
      
       1. Seek backward to retry writing operation for ELF header
          after writing process for a whole part
      
       2. Make offset calculation process and writing process
          totally sequential
      
      Approach 1 is not always possible: one cannot assume that the file
      system supports seeking.  Consider the no_llseek case.

      Therefore, this patch adopts approach 2.
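      The sequential scheme can be sketched as two passes: first compute every
      offset, including e_shoff, then write the parts strictly in file order.
      The append_only type, write_at() and the part sizes are hypothetical; a
      real dumper also page-aligns segment data.  Going back to patch the ELF
      header, as the first option would require, is rejected by this model
      just as it would be on a no_llseek file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models a file that supports only sequential appends (no_llseek). */
struct append_only {
	uint64_t pos;
};

/* Writing anywhere but the current end would need a seek, so it fails. */
static int write_at(struct append_only *f, uint64_t off, uint64_t len)
{
	if (off != f->pos)
		return -1;
	f->pos += len;
	return 0;
}

/* Pass 1: lay out every part and return e_shoff before writing anything. */
static uint64_t plan_shoff(const uint64_t *part_sizes, size_t nparts)
{
	uint64_t off = 0;

	for (size_t i = 0; i < nparts; i++)
		off += part_sizes[i];
	return off;	/* the section header table goes after all parts */
}

/* Pass 2: emit parts in file order.  Part 0 (the ELF header) can already
 * carry the e_shoff value from pass 1, so nothing is rewritten later. */
static int write_sequentially(struct append_only *f,
			      const uint64_t *part_sizes, size_t nparts,
			      uint64_t shoff, uint64_t shdr_size)
{
	uint64_t off = 0;

	for (size_t i = 0; i < nparts; i++) {
		if (write_at(f, off, part_sizes[i]) != 0)
			return -1;
		off += part_sizes[i];
	}
	if (off != shoff)	/* pass 1 and pass 2 must agree */
		return -1;
	return write_at(f, off, shdr_size);
}
```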
      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • elf coredump: replace ELF_CORE_EXTRA_* macros by functions · 1fcccbac
      Daisuke HATAYAMA authored
      
      elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
      macros to hide _multiline_ logic in functions.  This patch removes the
      #ifdef and replaces ELF_CORE_EXTRA_* with corresponding functions.  For
      architectures not implementing ELF_CORE_EXTRA_*, we use weak functions in
      order to limit the scope of the modification.

      This cleanup is for my next patches, but I think it is worth doing on its
      own, regardless of my final purpose.
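      A minimal sketch of the weak-function pattern, using the GCC/Clang
      __attribute__((weak)) spelling (the kernel wraps it as __weak).  The
      name elf_core_extra_phdrs() is used for illustration, and the program
      header count in count_phdrs() is a simplified assumption.

```c
#include <assert.h>

/* Generic fallback.  Because it is weak, an architecture can supply a
 * strong elf_core_extra_phdrs() in another object file and the linker
 * will pick that one instead; no #ifdef is needed at the call site. */
__attribute__((weak)) int elf_core_extra_phdrs(void)
{
	return 0;	/* this "architecture" adds no program headers */
}

/* Call site after the cleanup: one PT_LOAD per vma, one PT_NOTE,
 * plus whatever the architecture adds. */
static int count_phdrs(int nr_vmas)
{
	return nr_vmas + 1 + elf_core_extra_phdrs();
}
```

      An architecture that needs extra headers would provide a strong
      definition in its own file; the call site stays unchanged.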
      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linu...
    • coredump: move dump_write() and dump_seek() into a header file · 088e7af7
      Daisuke HATAYAMA authored
      
      My next patch will replace the ELF_CORE_EXTRA_* macros with functions,
      putting them into newly created *.c files.  Each of those files would
      then contain its own dump_write(), and the copies in each pair of
      binfmt_*.c and elfcore.c would have to be identical.  So this patch moves
      dump_write(), together with dump_seek(), into a header file.  It also
      deletes the confusing DUMP_WRITE macros in each file.
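      A user-space sketch of what shared dump_write()/dump_seek() helpers in a
      header might look like, assuming a plain file descriptor instead of the
      kernel's struct file; the _fd names are hypothetical.  When the file
      cannot seek (ESPIPE, as for a pipe), the seek falls back to writing
      zeroes so that later offsets still match the headers.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if all nr bytes were written, 0 otherwise. */
static inline int dump_write_fd(int fd, const void *addr, size_t nr)
{
	return write(fd, addr, nr) == (ssize_t)nr;
}

/* Skip off bytes forward: seek when the file supports it, otherwise
 * write zeroes so the data that follows lands at the right offset. */
static inline int dump_seek_fd(int fd, off_t off)
{
	static const char zeroes[4096];

	if (lseek(fd, off, SEEK_CUR) != (off_t)-1)
		return 1;
	if (errno != ESPIPE)
		return 0;
	while (off > 0) {
		size_t n = off > (off_t)sizeof(zeroes) ? sizeof(zeroes)
						       : (size_t)off;
		if (!dump_write_fd(fd, zeroes, n))
			return 0;
		off -= (off_t)n;
	}
	return 1;
}
```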
      Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 29 Jan, 2010 1 commit
    • Split 'flush_old_exec' into two functions · 221af7f8
      Linus Torvalds authored
      
      'flush_old_exec()' is the point of no return when doing an execve(), and
      it is pretty badly misnamed.  It doesn't just flush the old executable
      environment, it also starts up the new one.
      
      Which is very inconvenient for things like setting up the new
      personality, because we want the new personality to affect the starting
      of the new environment, but at the same time we do _not_ want the new
      personality to take effect if flushing the old one fails.
      
      As a result, the x86-64 '32-bit' personality is actually done using this
      insane "I'm going to change the ABI, but I haven't done it yet" bit
      (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
      personality, but just the "pending" bit, so that "flush_thread()" can do
      the actual personality magic.
      
      This patch in no way changes any of that insanity, but it does split the
      'flush_old_exec()' function up into a preparatory part that can fail
      (still called flush_old_exec()), and a new part that will actually set
      up the new exec environment (setup_new_exec()).  All callers are changed
      to trivially comply with the new world order.
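      The resulting call order can be modeled as follows.  Everything here is
      a hypothetical stand-in (struct exec_sketch and the _sketch functions);
      the point is only that the fallible step comes first, the personality
      is set after it succeeds, and the new environment is set up last.

```c
#include <assert.h>

/* Hypothetical model of an exec in progress; not the kernel's types. */
struct exec_sketch {
	int old_env_flushed;
	int personality;
	int new_env_ready;
};

/* Preparatory part: may fail, and nothing is committed if it does. */
static int flush_old_exec_sketch(struct exec_sketch *e, int simulate_failure)
{
	if (simulate_failure)
		return -1;
	e->old_env_flushed = 1;		/* point of no return passed */
	return 0;
}

/* Second part: sets up the new environment and cannot fail. */
static void setup_new_exec_sketch(struct exec_sketch *e)
{
	e->new_env_ready = 1;
}

/* A loader can now set the personality between the two steps, so it
 * affects the new environment but only once flushing has succeeded. */
static int load_binary_sketch(struct exec_sketch *e, int personality,
			      int simulate_failure)
{
	int err = flush_old_exec_sketch(e, simulate_failure);

	if (err)
		return err;		/* old personality left untouched */
	e->personality = personality;	/* stands in for SET_PERSONALITY() */
	setup_new_exec_sketch(e);
	return 0;
}
```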
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 17 Dec, 2009 1 commit
  7. 16 Dec, 2009 1 commit
  8. 04 Dec, 2009 1 commit
  9. 24 Sep, 2009 1 commit
  10. 22 Sep, 2009 1 commit
    • mm: add get_dump_page · f3e8fccd
      Hugh Dickins authored
      
      In preparation for the next patch, add a simple get_dump_page(addr)
      interface for the CONFIG_ELF_CORE dumpers to use, instead of calling
      get_user_pages() directly.  They're not interested in errors: they
      just want to use holes as much as possible, to save space and make
      sure that the data is aligned where the headers said it would be.
      
      Oh, and don't use that horrid DUMP_SEEK(off) macro!
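      A sketch of how a dumper can use such an interface, assuming a
      hypothetical get_page callback in place of get_dump_page(): present
      pages are written, NULL pages are skipped with a seek, and either way
      the file offset advances by one page, so data stays where the program
      headers say it is.  every_other_page() is an example callback only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u

/* Hypothetical stand-in for get_dump_page(): returns the page contents,
 * or NULL for anything not worth dumping (error, hole, zero page). */
typedef const void *(*get_page_fn)(uint64_t addr);

struct dump_stats {
	uint64_t written;	/* bytes that would go through dump_write() */
	uint64_t skipped;	/* bytes that would be dump_seek()ed over */
	uint64_t file_off;	/* current offset in the core file */
};

/* Each page occupies exactly PAGE_SZ bytes in the file whether it is
 * written or skipped, so alignment matches the headers. */
static void dump_range(uint64_t start, uint64_t end, get_page_fn get_page,
		       struct dump_stats *st)
{
	for (uint64_t addr = start; addr < end; addr += PAGE_SZ) {
		if (get_page(addr) != NULL)
			st->written += PAGE_SZ;
		else
			st->skipped += PAGE_SZ;
		st->file_off += PAGE_SZ;
	}
}

/* Example callback: even-numbered pages are present, odd ones are holes. */
static const char sample_page[PAGE_SZ];

static const void *every_other_page(uint64_t addr)
{
	return (addr / PAGE_SZ) % 2 ? NULL : sample_page;
}
```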
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 10 Sep, 2009 2 commits
    • binfmt_elf: fix PT_INTERP bss handling · 9f0ab4a3
      Roland McGrath authored
      
      In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
      the PT_LOAD has no PROT_WRITE and no .bss.  This generates EFAULT.
      
      Here is a small test case.  (Yes, there are other, useful PT_INTERP
      which have only .text and no .data/.bss.)
      
      	----- ptinterp.S
      	_start: .globl _start
      		 nop
      		 int3
      	-----
      	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
      	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
      	$ ./hello
      	Segmentation fault  # during execve() itself
      
      	After applying the patch:
      	$ ./hello
      	Trace trap  # user-mode execution after execve() finishes
      
      If the ELF headers are actually self-inconsistent, then dying is fine.
      But having no PROT_WRITE segment is perfectly normal and correct if
      there is no segment with p_memsz > p_filesz (i.e. bss).  John Reiser
      suggested checking for PROT_WRITE in the bss logic.  I think it makes
      most sense to simply apply the bss logic only when there is bss.
      
      This patch looks less trivial than it is due to some reindentation.
      It just moves the "if (last_bss > elf_bss) {" test up to include the
      partial-page bss logic as well as the more-pages bss logic.
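      The shape of the fix can be modeled as below.  This is a simplified
      sketch rather than load_elf_interp() itself: struct bss_actions, the
      4096-byte page and the rounding helper are assumptions.  Both the
      partial-page step (padzero()) and the more-pages step now sit under the
      single last_bss > elf_bss test, so a segment without bss triggers
      neither.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL

/* Hypothetical record of what the loader would do for bss. */
struct bss_actions {
	int padzero_calls;	/* zero the tail of the last file-backed page */
	uint64_t brk_len;	/* extra anonymous memory mapped for bss */
};

static uint64_t page_align_up(uint64_t v)
{
	return (v + PAGE_SZ - 1) & ~(PAGE_SZ - 1);
}

/* elf_bss: end of file-backed data (p_vaddr + p_filesz);
 * last_bss: end of the segment in memory (p_vaddr + p_memsz). */
static void handle_bss(uint64_t elf_bss, uint64_t last_bss,
		       struct bss_actions *a)
{
	if (last_bss > elf_bss) {	/* only when there is bss at all */
		a->padzero_calls++;	/* partial-page bss */
		elf_bss = page_align_up(elf_bss);
		if (last_bss > elf_bss)	/* more-pages bss */
			a->brk_len += last_bss - elf_bss;
	}
}
```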
      Reported-by: John Reiser <jreiser@bitwagon.com>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
    • binfmt_elf: fix PT_INTERP bss handling · 752015d1
      Roland McGrath authored
      
      In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
      the PT_LOAD has no PROT_WRITE and no .bss.  This generates EFAULT.
      
      Here is a small test case.  (Yes, there are other, useful PT_INTERP
      which have only .text and no .data/.bss.)
      
      	----- ptinterp.S
      	_start: .globl _start
      		 nop
      		 int3
      	-----
      	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
      	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
      	$ ./hello
      	Segmentation fault  # during execve() itself
      
      	After applying the patch:
      	$ ./hello
      	Trace trap  # user-mode execution after execve() finishes
      
      If the ELF headers are actually self-inconsistent, then dying is fine.
      But having no PROT_WRITE segment is perfectly normal and correct if
      there is no segment with p_memsz > p_filesz (i.e. bss).  John Reiser
      suggested checking for PROT_WRITE in the bss logic.  I think it makes
      most sense to simply apply the bss logic only when there is bss.
      
      This patch looks less trivial than it is due to some reindentation.
      It just moves the "if (last_bss > elf_bss) {" test up to include the
      partial-page bss logic as well as the more-pages bss logic.
      Reported-by: John Reiser <jreiser@bitwagon.com>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 01 Jul, 2009 2 commits
  13. 18 Jun, 2009 1 commit
  14. 01 Apr, 2009 2 commits
  15. 07 Feb, 2009 1 commit
    • elf core dump: fix get_user use · 92dc07b1
      Roland McGrath authored
      
      The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force,
      so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely
      use get_user() for a normal user-space address.
      
      Checking for VM_READ optimizes out the case where get_user() would fail
      anyway.  The vm_file check here was already superfluous given the control
      flow earlier in the function, so that is a cleanup/optimization unrelated
      to other changes but an obvious and trivial one.
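      The save/switch/restore pattern and the VM_READ short cut can be modeled
      as follows.  Everything is hypothetical: the address limit is a plain
      global, model_get_user() only accepts reads under the user limit, and
      comparing the first word with the ELF magic is an assumed example of why
      the dumper reads user memory here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the address limit that get_fs()/set_fs()
 * manipulated; here it is just a global variable. */
enum addr_limit { MODEL_USER_DS, MODEL_KERNEL_DS };
static enum addr_limit cur_limit = MODEL_KERNEL_DS;	/* as while dumping */

#define MODEL_VM_READ 0x1u	/* stand-in for VM_READ */

/* get_user() stand-in: the model only accepts user reads under USER_DS. */
static int model_get_user(uint32_t *out, const uint32_t *uaddr)
{
	if (cur_limit != MODEL_USER_DS)
		return -1;
	memcpy(out, uaddr, sizeof(*out));
	return 0;
}

/* Reads the first word of a mapping and compares it with the ELF magic. */
static int maps_elf_header(unsigned int vm_flags, const uint32_t *start)
{
	enum addr_limit saved;
	uint32_t word;
	int ok;

	if (!(vm_flags & MODEL_VM_READ))
		return 0;			/* the read would fail anyway */
	saved = cur_limit;
	cur_limit = MODEL_USER_DS;		/* like set_fs(USER_DS) */
	ok = model_get_user(&word, start) == 0;
	cur_limit = saved;			/* restore for the rest of the dump */
	return ok && memcmp(&word, "\177ELF", 4) == 0;
}
```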
      Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Roland McGrath <roland@redhat.com>
  16. 08 Jan, 2009 1 commit
    • ELF: implement AT_RANDOM for glibc PRNG seeding · f06295b4
      Kees Cook authored
      While discussing[1] the need for glibc to have access to random bytes
      during program load, it seems that an earlier attempt to implement
      AT_RANDOM got stalled.  This implements a random 16 byte string, available
      to every ELF program via a new auxv AT_RANDOM vector.
      
      [1] http://sourceware.org/ml/libc-alpha/2008-10/msg00006.html
      
      Ulrich said:
      
      glibc needs right after startup a bit of random data for internal
      protections (stack canary etc).  What is now in upstream glibc is that we
      always unconditionally open /dev/urandom, read some data, and use it.  For
      every process startup.  That's slow.
      
      ...
      
      The solution is to provide a limited amount of random data to the
      starting process in the aux vector.  I suggested 16 bytes and this is
      what the patch implements.  If we need only 16 bytes or less we use the
      data directly.  If we need more we'll use the 16 bytes to seed a PRNG.
      This avoids the costly /dev/urandom use and it allows the kernel to use
      the most adequate source of random data for this purpose.  It might not
      be the same pool as that for /dev/urandom.
      
      Concerns were expressed about the depletion of the randomness pool.  But
      this patch doesn't make the situation worse, it doesn't deplete entropy
      more than happens now.
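      On the user-space side, the 16 bytes can be reached with glibc's
      getauxval(AT_RANDOM) (available since glibc 2.16).  The sketch below
      copies them and, for the "need more" case, seeds a PRNG.  The PRNG is a
      hypothetical xorshift64* chosen for brevity; it is not what glibc uses
      and is not suitable for cryptographic purposes.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Copy the 16 bytes that AT_RANDOM points to; returns -1 when the
 * kernel did not supply the entry (getauxval() then returns 0). */
static int read_at_random(unsigned char out[16])
{
	const void *p = (const void *)(uintptr_t)getauxval(AT_RANDOM);

	if (p == NULL)
		return -1;
	memcpy(out, p, 16);
	return 0;
}

/* Hypothetical PRNG seeded from those bytes when more than 16 are needed. */
struct prng {
	uint64_t s;
};

static void prng_seed(struct prng *g, const unsigned char seed[16])
{
	uint64_t a, b;

	memcpy(&a, seed, 8);
	memcpy(&b, seed + 8, 8);
	g->s = a ^ b;
	if (g->s == 0)			/* xorshift state must be nonzero */
		g->s = 0x9E3779B97F4A7C15ULL;
}

static uint64_t prng_next(struct prng *g)
{
	uint64_t x = g->s;

	x ^= x >> 12;
	x ^= x << 25;
	x ^= x >> 27;
	g->s = x;
	return x * 0x2545F4914F6CDD1DULL;
}
```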
      Signed-off-by: Kees Cook <kees.cook@canonical.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 25 Dec, 2008 1 commit
    • [S390] arch_setup_additional_pages arguments · fc5243d9
      Martin Schwidefsky authored
      
      arch_setup_additional_pages currently gets two arguments, the binary
      format description and an indication of whether the process uses an
      executable stack. The second argument is not used by anybody; it could
      be removed without replacement.

      What actually does make sense is to pass an indication of whether the
      process uses the ELF interpreter. The glibc code will not use anything
      from the vdso if the process does not use the dynamic linker, so for
      statically linked binaries the architecture backend can choose not
      to map the vdso.
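      A sketch of an architecture hook using the new argument.  The _sketch
      names and struct bprm_sketch are hypothetical, and the vdso mapping is
      reduced to a flag; the decision logic follows the description above.

```c
#include <assert.h>

struct bprm_sketch {
	const char *filename;	/* hypothetical stand-in for linux_binprm */
};

static int vdso_mapped;		/* records what the hook decided */

/* Hypothetical arch hook with the new second argument: per the commit
 * text, glibc does not use the vdso without the dynamic linker, so a
 * statically linked binary can go without it. */
static int arch_setup_additional_pages_sketch(struct bprm_sketch *bprm,
					      int uses_interp)
{
	(void)bprm;
	if (!uses_interp)
		return 0;	/* no dynamic linker: leave the vdso out */
	vdso_mapped = 1;	/* a real hook would install the mapping here */
	return 0;
}
```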
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  18. 13 Nov, 2008 4 commits
    • CRED: Make execve() take advantage of copy-on-write credentials · a6f76f23
      David Howells authored
      
      Make execve() take advantage of copy-on-write credentials, allowing it to set
      up the credentials in advance, and then commit the whole lot after the point
      of no return.
      
      This patch and the preceding patches have been tested with the LTP SELinux
      testsuite.
      
      This patch makes several logical sets of alteration:
      
       (1) execve().
      
           The credential bits from struct linux_binprm are, for the most part,
           replaced with a single credentials pointer (bprm->cred).  This means that
           all the creds can be calculated in advance and then applied at the point
           of no return with no possibility of failure.
      
           I would like to replace bprm->cap_effective with:
      
      	cap_isclear(bprm->cap_effective)
      
           but this seems impossible due to special behaviour for processes of pid 1
           (they always retain their parent's capability masks where normally they'd
           be changed - see cap_bprm_set_creds()).
      
           The following sequence of events now happens:
      
           (a) At the start of do_execve, the current task's cred_exec_mutex is
           	 locked to prevent PTRACE_ATTACH from obsoleting the calculation of
           	 creds that we make.
      
           (a) prepare_exec_creds() is then called to make a copy of the current
           	 task's credentials and prepare it.  This copy is then assigned to
           	 bprm->cred.
      
        	 This renders security_bprm_alloc() and security_bprm_free()
           	 unnecessary, and so they've been removed.
      
           (b) The determination of unsafe execution is now performed immediately
           	 after (a) rather than later on in the code.  The result is stored in
           	 bprm->unsafe for future reference.
      
           (c) prepare_binprm() is called, possibly multiple times.
      
           	 (i) This applies the result of set[ug]id binaries to the new creds
           	     attached to bprm->cred.  Personality bit clearance is recorded,
           	     but now deferred on the basis that the exec procedure may yet
           	     fail.
      
               (ii) This then calls the new security_bprm_set_creds().  This should
      	     calculate the new LSM and capability credentials into *bprm->cred.
      
      	     This folds together security_bprm_set() and parts of
      	     security_bprm_apply_creds() (these two have been removed).
      	     Anything that might fail must be done at this point.
      
               (iii) bprm->cred_prepared is set to 1.
      
      	     bprm->cred_prepared is 0 on the first pass of the security
      	     calculations, and 1 on all subsequent passes.  This allows SELinux
      	     in (ii) to base its calculations only on the initial script and
      	     not on the interpreter.
      
           (d) flush_old_exec() is called to commit the task to execution.  This
           	 performs the following steps with regard to credentials:
      
      	 (i) Clear pdeath_signal and set dumpable in certain circumstances that
      	     may not be covered by commit_creds().
      
               (ii) Clear any bits in current->personality that were deferred from
                   (c.i).
      
           (e) install_exec_creds() [compute_creds() as was] is called to install the
           	 new credentials.  This performs the following steps with regard to
           	 credentials:
      
               (i) Calls security_bprm_committing_creds() to apply any security
                   requirements, such as flushing unauthorised files in SELinux, that
                   must be done before the credentials are changed.
      
      	     This is made up of bits of security_bprm_apply_creds() and
      	     security_bprm_post_apply_creds(), both of which have been removed.
      	     This function is not allowed to fail; anything that might fail
      	     must have been done in (c.ii).
      
               (ii) Calls commit_creds() to apply the new credentials in a single
                   assignment (more or less).  Possibly pdeath_signal and dumpable
                   should be part of struct creds.
      
      	 (iii) Unlocks the task's cred_replace_mutex, thus allowing
      	     PTRACE_ATTACH to take place.
      
         (iv) Clears the bprm->cred pointer, as the credentials it was holding
                   are now immutable.
      
               (v) Calls security_bprm_committed_creds() to apply any security
                   alterations that must be done after the creds have been changed.
                   SELinux uses this to flush signals and signal handlers.
      
           (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
           	 to destroy the proposed new credentials and will then unlock
           	 cred_replace_mutex.  No changes to the credentials will have been
           	 made.
      
       (2) LSM interface.
      
           A number of functions have been changed, added or removed:
      
           (*) security_bprm_alloc(), ->bprm_alloc_security()
           (*) security_bprm_free(), ->bprm_free_security()
      
           	 Removed in favour of preparing new credentials and modifying those.
      
           (*) security_bprm_apply_creds(), ->bprm_apply_creds()
           (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()
      
           	 Removed; split between security_bprm_set_creds(),
           	 security_bprm_committing_creds() and security_bprm_committed_creds().
      
           (*) security_bprm_set(), ->bprm_set_security()
      
           	 Removed; folded into security_bprm_set_creds().
      
           (*) security_bprm_set_creds(), ->bprm_set_creds()
      
           	 New.  The new credentials in bprm->creds should be checked and set up
           	 as appropriate.  bprm->cred_prepared is 0 on the first call, 1 on the
           	 second and subsequent calls.
      
           (*) security_bprm_committing_creds(), ->bprm_committing_creds()
           (*) security_bprm_committed_creds(), ->bprm_committed_creds()
      
           	 New.  Apply the security effects of the new credentials.  This
           	 includes closing unauthorised files in SELinux.  This function may not
           	 fail.  When the former is called, the creds haven't yet been applied
           	 to the process; when the latter is called, they have.
      
       	 The former may access bprm->cred, the latter may not.
      
       (3) SELinux.
      
           SELinux has a number of changes, in addition to those to support the LSM
           interface changes mentioned above:
      
           (a) The bprm_security_struct struct has been removed in favour of using
           	 the credentials-under-construction approach.
      
           (c) flush_unauthorized_files() now takes a cred pointer and passes it on
           	 to inode_has_perm(), file_has_perm() and dentry_open().
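      The prepare/commit/abort life cycle described above can be modeled in a
      few lines.  This is a single-threaded sketch under stated assumptions:
      the _sketch functions are hypothetical, there is no RCU and no locking,
      and struct cred_sketch holds only a reference count and two IDs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified credentials: shared, never modified in place. */
struct cred_sketch {
	int usage;
	unsigned int uid, euid;
};

struct task_sketch {
	const struct cred_sketch *cred;
};

/* Make a private copy that the caller may modify freely. */
static struct cred_sketch *prepare_creds_sketch(const struct task_sketch *t)
{
	struct cred_sketch *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	*n = *t->cred;
	n->usage = 1;
	return n;
}

static void put_cred_sketch(const struct cred_sketch *c)
{
	struct cred_sketch *m = (struct cred_sketch *)c;

	if (--m->usage == 0)
		free(m);
}

/* Point of no return: publish the new creds with a single assignment. */
static void commit_creds_sketch(struct task_sketch *t, struct cred_sketch *n)
{
	const struct cred_sketch *old = t->cred;

	t->cred = n;
	put_cred_sketch(old);
}

/* Failure before commit: discard the proposal; the task never saw it. */
static void abort_creds_sketch(struct cred_sketch *n)
{
	put_cred_sketch(n);
}
```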
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
    • CRED: Use RCU to access another task's creds and to release a task's own creds · c69e8d9c
      David Howells authored
      
      Use RCU to access another task's creds and to release a task's own creds.
      This means that it will be possible for the credentials of a task to be
      replaced without another task (a) requiring a full lock to read them, and (b)
      seeing deallocated memory.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
    • CRED: Wrap current->cred and a few other accessors · 86a264ab
      David Howells authored
      
      Wrap current->cred and a few other accessors to hide their actual
      implementation.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
    • CRED: Separate task security context from task_struct · b6dff3ec
      David Howells authored
      
      Separate the task security context from task_struct.  At this point, the
      security data is temporarily embedded in the task_struct with two pointers
      pointing to it.
      
      Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
      entry.S via asm-offsets.
      
      With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  19. 20 Oct, 2008 1 commit
    • coredump_filter: add hugepage dumping · e575f111
      KOSAKI Motohiro authored
      
      Presently a hugepage vma has the VM_RESERVED flag set so that it is not
      swapped.  But a VM_RESERVED vma isn't core dumped, because this flag is
      also used for some kernel vmas (e.g.  vmalloc, sound related).

      Thus hugepages are never dumped, which makes them hard to debug.  Many
      developers want hugepages to be included in core dumps.

      However, we can't read generic VM_RESERVED areas, because such an area is
      often an I/O mapping, and reading it may change device state.  That is a
      definitely undesirable side effect.

      So adding a hugepage-specific bit to the coredump filter is better.  It
      enables hugepage core dumping without causing any side effect on I/O
      devices.

      In addition, libhugetlb uses hugetlb private mapping pages as anonymous
      pages, so hugepage private mapping pages should be core dumped by
      default.

      Therefore, /proc/[pid]/coredump_filter has two new bits:

       - bit 5 controls whether hugetlb private mapping pages are dumped (default: yes)
       - bit 6 controls whether hugetlb shared mapping pages are dumped  (default: no)
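      The filter is a bit mask, so 0x43, for example, selects bits 0, 1 and 6.
      The sketch below names the bits; the meanings of bits 0 to 4 come from
      the core(5) manual page rather than from this commit, and the enum names
      and set_coredump_filter() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Bit positions in /proc/[pid]/coredump_filter. */
enum {
	CDF_ANON_PRIVATE    = 1 << 0,
	CDF_ANON_SHARED     = 1 << 1,
	CDF_FILE_PRIVATE    = 1 << 2,
	CDF_FILE_SHARED     = 1 << 3,
	CDF_ELF_HEADERS     = 1 << 4,
	CDF_HUGETLB_PRIVATE = 1 << 5,	/* added by this commit, default on */
	CDF_HUGETLB_SHARED  = 1 << 6,	/* added by this commit, default off */
};

/* Write a filter mask for the calling process; returns 0 on success. */
int set_coredump_filter(unsigned int mask)
{
	FILE *f = fopen("/proc/self/coredump_filter", "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "0x%x\n", mask) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```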
      
      I tested it with the following method:
      
      % ulimit -c unlimited
      % ./crash_hugepage  50
      % ./crash_hugepage  50  -p
      % ls -lh
      % gdb ./crash_hugepage core
      %
      % echo 0x43 > /proc/self/coredump_filter
      % ./crash_hugepage  50
      % ./crash_hugepage  50  -p
      % ls -lh
      % gdb ./crash_hugepage core
      
      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/mman.h>

      #include "hugetlbfs.h"	/* libhugetlbfs: hugetlbfs_unlinked_fd(), gethugepagesize() */

      int main(int argc, char **argv)
      {
      	char *p;
      	int ch;
      	int mmap_flags = MAP_SHARED;
      	int fd;
      	int nr_pages;
      	long hpage_size;

      	while ((ch = getopt(argc, argv, "p")) != -1) {
      		switch (ch) {
      		case 'p':	/* -p: private mapping instead of shared */
      			mmap_flags &= ~MAP_SHARED;
      			mmap_flags |= MAP_PRIVATE;
      			break;
      		default:
      			/* nothing */
      			break;
      		}
      	}
      	argc -= optind;
      	argv += optind;

      	if (argc == 0) {
      		printf("need # of pages\n");
      		exit(1);
      	}

      	nr_pages = atoi(argv[0]);
      	if (nr_pages < 2) {
      		printf("nr_pages must be >= 2\n");
      		exit(1);
      	}

      	hpage_size = gethugepagesize();
      	fd = hugetlbfs_unlinked_fd();
      	if (hpage_size <= 0 || fd < 0) {
      		perror("hugetlbfs");
      		exit(1);
      	}

      	p = mmap(NULL, nr_pages * hpage_size,
      		 PROT_READ|PROT_WRITE, mmap_flags, fd, 0);
      	if (p == MAP_FAILED) {
      		perror("mmap");
      		exit(1);
      	}

      	sleep(2);

      	*(p + hpage_size) = 1; /* touch the 2nd page (COW with -p) */
      	sleep(2);

      	/* crash! deliberate NULL write to get a core dump */
      	*(volatile int *)0 = 1;

      	return 0;
      }
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: William Irwin <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 16 Oct, 2008 1 commit
  21. 14 Sep, 2008 1 commit
    • timers: fix itimer/many thread hang · f06febc9
      Frank Mayhar authored
      Overview
      
      This patch reworks the handling of POSIX CPU timers, including the
      ITIMER_PROF, ITIMER_VIRT timers and rlimit handling.  It was put together
      with the help of Roland McGrath, the owner and original writer of this code.
      
      The problem we ran into, and the reason for this rework, has to do with using
      a profiling timer in a process with a large number of threads.  It appears
      that the performance of the old implementation of run_posix_cpu_timers() was
      at least O(n*3) (where "n" is the number of threads in a process) or worse.
      Everything is fine with an increasing number of threads until the time taken
      for that routine to run becomes the same as or greater than the tick time, at
      which point things degrade rather quickly.
      
      This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF."
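      The complexity difference can be illustrated with a toy model: summing
      per-thread times on every check costs time proportional to the number
      of threads, while charging a shared group total as time is accounted
      makes the check constant-time.  This only illustrates that trade-off;
      the structure names are hypothetical and the patch's actual data
      structures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a thread group's CPU-time accounting. */
struct thread_sketch {
	uint64_t utime;
};

struct group_sketch {
	struct thread_sketch *threads;
	size_t nthreads;
	uint64_t total_utime;	/* kept up to date on every tick */
};

/* Old style: every check walks all threads, so cost grows with n. */
static uint64_t group_utime_by_sum(const struct group_sketch *g)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < g->nthreads; i++)
		sum += g->threads[i].utime;
	return sum;
}

/* Accounting charges the thread and the group total together. */
static void account_tick(struct group_sketch *g, size_t tid, uint64_t delta)
{
	g->threads[tid].utime += delta;
	g->total_utime += delta;
}

/* New style: the timer check reads one counter, independent of n. */
static int prof_timer_expired(const struct group_sketch *g, uint64_t expires)
{
	return g->total_utime >= expires;
}
```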
      
      Code Changes
      
      This rework corrects the implementation of run_posix_cpu_timers() to make it
      run in constant time for a particular machine.  (Performance may vary between
      one machine and a...
  22. 26 Jul, 2008 1 commit
    • tracehook: exec · 6341c393
      Roland McGrath authored
      
      This moves all the ptrace hooks related to exec into tracehook.h inlines.
      
      This also lifts the calls for tracing out of the binfmt load_binary hooks
      into search_binary_handler() after it calls into the binfmt module.  This
      change has no effect, since all the binfmt modules' load_binary functions
      did the call at the end on success, and now search_binary_handler() does
      it immediately after return if successful.  We consolidate the repeated
      code, and binfmt modules no longer need to import ptrace_notify().
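      The consolidation can be modeled as a dispatcher that calls the trace
      hook once after whichever handler succeeds.  All names are hypothetical,
      and the -ENOEXEC convention ("not my format, try the next handler") is a
      simplifying assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical binfmt handler: returns -ENOEXEC for "not my format",
 * another negative errno on failure, or >= 0 on success. */
typedef int (*load_binary_fn)(void *bprm);

static int exec_reports;	/* counts calls to the trace hook */

/* Stands in for the tracehook inline that now notifies the tracer. */
static void trace_report_exec_sketch(void)
{
	exec_reports++;
}

/* The hook runs once, here, right after a handler succeeds, instead of
 * at the end of every handler's load_binary(). */
static int search_binary_handler_sketch(const load_binary_fn *fmts,
					size_t n, void *bprm)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fmts[i](bprm);

		if (ret >= 0) {
			trace_report_exec_sketch();
			return ret;
		}
		if (ret != -ENOEXEC)
			return ret;	/* real failure: stop, no report */
	}
	return -ENOEXEC;
}

/* Two example handlers for illustration. */
static int not_mine(void *bprm) { (void)bprm; return -ENOEXEC; }
static int loads_ok(void *bprm) { (void)bprm; return 0; }
```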
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 25 Jul, 2008 3 commits
  24. 22 Jul, 2008 1 commit
    • execve filename: document and export via auxiliary vector · 65191087
      John Reiser authored
      The Linux kernel puts the filename argument of execve() into the new
      address space.  Many developers are surprised to learn this.  Those who
      know about it and could use it object: "But it's not documented."

      Those who want to use it dislike the expression
        (char *)(1+ strlen(env[-1+ n_env]) + env[-1+ n_env])
      because it requires locating the last original environment variable,
      and assumes that the filename immediately follows that string's
      terminating NUL.
      
      This patch documents the insertion of the filename, and makes it easier
      to find by adding a new tag AT_EXECFN in the ElfXX_auxv_t; see <elf.h>.
      
      In many cases readlink("/proc/self/exe",) gives the same answer.  But if
      all the original pages get unmapped, then the kernel erases the symlink
      for /proc/self/exe.  This can happen when a program decompressor does a
      good job of cleaning up after uncompressing directly to memory, so that
      the address space of the target program looks the same as if compression
      had never happened.  One example is UPX (http://upx.sourceforge.net).
      
      One notable use of the underlying concept (what path containED the
      executable) is glibc expanding $ORIGIN in DT_RUNPATH.  In practice for
      the near term, it may be a good idea for user-mode code to use both
      /proc/self/exe and AT_EXECFN as fall-back methods for each other.
      /proc/self/exe can fail due to unmapping; AT_EXECFN can fail because it
      is not present on older systems.  The auxvec or {AT_EXECFN}.d_val
      also can get overwritten, although in nearly all cases this would be the
      result of a bug.
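      A sketch of the suggested mutual fall-back, using glibc's getauxval()
      (glibc 2.16 and later).  exec_name() is hypothetical, and the order
      (AT_EXECFN first, then /proc/self/exe) is one reasonable choice rather
      than something this commit prescribes.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include <sys/types.h>
#include <unistd.h>

/* Stores in buf the name this program was exec'd with, using AT_EXECFN
 * and falling back to /proc/self/exe.  The two can differ: AT_EXECFN is
 * the string passed to execve() (possibly relative), while
 * /proc/self/exe is a resolved path.  Returns 0 on success. */
static int exec_name(char *buf, size_t len)
{
	const char *fn = (const char *)(uintptr_t)getauxval(AT_EXECFN);
	ssize_t n;

	if (len == 0)
		return -1;
	if (fn != NULL && strlen(fn) < len) {
		memcpy(buf, fn, strlen(fn) + 1);
		return 0;
	}
	n = readlink("/proc/self/exe", buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';		/* readlink() does not NUL-terminate */
	return 0;
}
```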
      
      The runtime cost is one NEW_AUX_ENT using two words of stack space.  The
      underlying value is maintained already as bprm->exec; setup_arg_pages()
      in fs/exec.c slides it for stack_shift, etc.
      Signed-off-by: John Reiser <jreiser@BitWagon.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 16 Jun, 2008 1 commit
  26. 16 May, 2008 2 commits
  27. 29 Apr, 2008 1 commit