1. 23 Mar, 2006 3 commits
    • [PATCH] Shrinks sizeof(files_struct) and better layout · 0c9e63fd
      Eric Dumazet authored
      
      1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32-bit
         platforms, lowering kmalloc() allocated space by 50%.
      
      2) Reduce the size of (files_struct), using a special 32-bit (or
         64-bit) embedded_fd_set instead of a 1024-bit fd_set for the
         close_on_exec_init and open_fds_init fields.  This saves some RAM (248
         bytes per task), as most tasks don't open more than 32 files.  D-Cache
         footprint for such tasks is also reduced to the minimum.
      
      3) Reduce size of allocated fdset.  Currently two full pages are
         allocated, that is 32768 bits on x86 for example, which is way too
         much.  The minimum is now L1_CACHE_BYTES.
      
      UP and SMP should both benefit from this patch, because most tasks will
      touch only one cache line when they open()/close() stdin/stdout/stderr
      (0/1/2): next_fd, close_on_exec_init, open_fds_init and fd_array[0 .. 2]
      are all in the same cache line.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
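The 248-byte figure in point 2 can be checked with a little arithmetic. The sketch below is not code from the patch: the function names are hypothetical, and the word size is passed in as a parameter so the 32-bit case can be evaluated on any host.

```c
#include <stddef.h>

/* Bits in the old fixed-size fd_set (1024 bits, per the commit message). */
#define OLD_FDSET_BITS 1024

/* Bytes occupied by one old-style 1024-bit fd_set. */
size_t old_fdset_bytes(void)
{
    return OLD_FDSET_BITS / 8;
}

/*
 * Bytes saved per files_struct when both close_on_exec_init and
 * open_fds_init shrink from a 1024-bit fd_set to a single machine word.
 * word_bytes is an assumption of this sketch: 4 on 32-bit platforms,
 * 8 on 64-bit platforms.
 */
size_t embedded_fdset_savings(size_t word_bytes)
{
    return 2 * (old_fdset_bytes() - word_bytes);
}
```

With a 4-byte word, each of the two fields drops from 128 bytes to 4, which gives the 248 bytes per task quoted in the message.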
    • [PATCH] ext3_readdir: use generic readahead · d8733c29
      Andrew Morton authored
      
      Linus points out that ext3_readdir's readahead only cuts in when
      ext3_readdir() is operating at the very start of the directory.  So for large
      directories we end up performing no readahead at all and we suck.
      
      So take it all out and use the core VM's page_cache_readahead().  This means
      that ext3 directory reads will use all of readahead's dynamic sizing goop.
      
      Note that we're using the directory's filp->f_ra to hold the readahead state,
      but readahead is actually being performed against the underlying blockdev's
      address_space.  Fortunately the readahead code is all set up to handle this.
      
      Tested with printk.  It works.  I was struggling to find a real workload which
      actually cared.
      
      (The patch also exports page_cache_readahead() to GPL modules)
      
      Cc: "Stephen C. Tweedie" <sct@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: fix duplicate line in /proc/devices · 5be0e951
      Neil Horman authored
      
      Fix a duplicate block device line printed after the "Block device" header
      in /proc/devices.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 22 Mar, 2006 6 commits
    • [PATCH] page migration reorg · b20a3503
      Christoph Lameter authored
      
      Centralize the page migration functions in anticipation of additional
      tinkering.  Creates a new file mm/migrate.c
      
      1. Extract buffer_migrate_page() from fs/buffer.c
      
      2. Extract central migration code from vmscan.c
      
      3. Extract some components from mempolicy.c
      
      4. Export pageout() and remove_from_swap() from vmscan.c
      
      5. Make it possible to configure NUMA systems without page migration
         and non-NUMA systems with page migration.
      
      I had to do some #ifdeffing in mempolicy.c that may need a cleanup.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] convert hugetlbfs_counter to atomic · bba1e9b2
      Chen, Kenneth W authored
      
      Implementation of hugetlbfs_counter() is functionally equivalent to
      atomic_inc_return().  Use the simpler atomic form.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
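As a userspace analogy only, the "increment and return the new value" operation that atomic_inc_return() provides can be sketched with C11 atomics. The counter and the function name below are hypothetical; the kernel's atomic_t API is not used here.

```c
#include <stdatomic.h>

/* Hypothetical counter standing in for the hugetlbfs one.
 * Static storage, so it starts at zero. */
static atomic_int counter;

/*
 * Return the value after incrementing, which is what the commit message
 * says atomic_inc_return() does.  atomic_fetch_add() returns the value
 * before the addition, hence the "+ 1".
 */
int next_counter(void)
{
    return atomic_fetch_add(&counter, 1) + 1;
}
```

Successive calls hand out 1, 2, 3, and so on without a separate lock, which is the simplification the commit describes.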
    • [PATCH] hugepage: Strict page reservation for hugepage inodes · b45b5bd6
      David Gibson authored
      
      These days, hugepages are demand-allocated at first fault time.  There's a
      somewhat dubious (and racy) heuristic when making a new mmap() to check if
      there are enough available hugepages to fully satisfy that mapping.
      
      A particularly obvious case where the heuristic breaks down is where a
      process maps its hugepages not as a single chunk, but as a bunch of
      individually mmap()ed (or shmat()ed) blocks without touching and
      instantiating the pages in between allocations.  In this case the size of
      each block is compared against the total number of available hugepages.
      It's thus easy for the process to become overcommitted, because each block
      mapping will succeed, although the total number of hugepages required by
      all blocks exceeds the number available.  In particular, this defeats a
      program that would otherwise detect a mapping failure and adjust its
      hugepage usage downward accordingly.
      
      The patch below addresses this problem, by strictly reserving a number of
      physical hugepages for hugepage inodes which have been mapped, but not
      instantiated.  MAP_SHARED mappings are thus "safe" - they will fail on
      mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
      trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
      only if the sysadmin explicitly reduces the hugepage pool between mapping
      and instantiation)
      
      This patch appears to address the problem at hand - it allows DB2 to start
      correctly, for instance, which previously suffered the failure described
      above.
      
      This patch causes no regressions on the libhugetlbfs testsuite, and makes a
      test (designed to catch this problem) pass which previously failed (ppc64,
      POWER5).
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
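The overcommit described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's accounting: struct pool and both functions are hypothetical, and page counts stand in for real mappings.

```c
#include <stdbool.h>

/* Toy hugepage pool; every name here is hypothetical. */
struct pool {
    long free_pages;   /* hugepages not yet instantiated */
    long reserved;     /* pages already promised to existing mappings */
};

/*
 * Old heuristic: each new mapping is compared against the whole free
 * pool, so several untouched mappings can each pass the check.
 */
bool map_heuristic(const struct pool *p, long pages)
{
    return pages <= p->free_pages;
}

/*
 * Strict reservation: a mapping succeeds only if enough unreserved
 * pages remain, and a successful mapping records its claim.
 */
bool map_reserved(struct pool *p, long pages)
{
    if (pages > p->free_pages - p->reserved)
        return false;
    p->reserved += pages;
    return true;
}
```

With a pool of 10 pages, two 8-page mappings both pass the heuristic check, while under reservation the second one fails at map time instead of failing later at fault time.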
    • [PATCH] mm: nommu use compound pages · 84097518
      Nick Piggin authored
      
      Now that compound page handling is properly fixed in the VM, move nommu
      over to using compound pages rather than rolling their own refcounting.
      
      nommu vm page refcounting is broken anyway, but there is no need to have
      divergent code in the core VM now, nor when it gets fixed.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      
      (Needs testing, please).
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: Remove SLAB_NO_REAP option · ac2b898c
      Christoph Lameter authored
      
      SLAB_NO_REAP is documented as an option that will cause this slab not to be
      reaped under memory pressure.  However, that is not what happens.  The only
      thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
      slab elements that were allocated in batch in cache_reap().  Cache_reap()
      is run every few seconds independently of memory pressure.
      
      Could we remove the whole thing?  It's only used by three slabs anyway and
      I cannot find a reason for having this option.
      
      There is an additional problem with SLAB_NO_REAP.  If set then the recovery
      of objects from alien caches is switched off.  Objects not freed on the
      same node where they were initially allocated will only be reused if a
      certain amount of objects accumulates from one alien node (not very likely)
      or if the cache is explicitly shrunk.  (Strangely __cache_shrink does not
      check for SLAB_NO_REAP)
      
      Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] v9fs: assign dentry ops to negative dentries · 5e7a99ac
      Latchesar Ionkov authored
      
      If a file is not found in v9fs_vfs_lookup, the function creates a negative
      dentry, but doesn't assign any dentry ops.  This leaves the negative entry
      in the cache (there is no d_delete to mark it for removal).  If the file is
      created outside of the mounted v9fs filesystem, the file shows up in the
      directory with weird permissions.
      
      This patch assigns the default v9fs dentry ops to the negative dentry.
      Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 21 Mar, 2006 1 commit
  4. 20 Mar, 2006 7 commits
  5. 17 Mar, 2006 2 commits
  6. 16 Mar, 2006 1 commit
    • [PATCH] Fix ext2 readdir f_pos re-validation logic · 2d7f2ea9
      Al Viro authored
      
      This fixes not one, but _two_, silly (but admittedly hard to hit) bugs
      in the ext2 filesystem "readdir()" function.  It also cleans up the code
      to avoid the unnecessary goto mess.
      
      The bugs were related to re-validating the f_pos value after somebody had
      either done an "lseek()" on the directory to an invalid offset, or when
      the offset had become invalid due to a file being unlinked in the
      directory.  The code would not only set the f_version too eagerly, it
      would also not update f_pos appropriately for when the offset fixup took
      place.
      
      When that happened, we'd occasionally subsequently fail the readdir()
      even when we shouldn't (no real harm done, but an ugly printk, and
      obviously you would end up not necessarily seeing all entries).
      
      Thanks to Masoud Sharbiani <masouds@google.com> who noticed the problem
      and had a test-case for it, and also fixed up a thinko in the first
      version of this patch.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Masoud Sharbiani <masouds@google.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. 15 Mar, 2006 3 commits
  8. 14 Mar, 2006 4 commits
  9. 11 Mar, 2006 2 commits
  10. 10 Mar, 2006 1 commit
  11. 09 Mar, 2006 2 commits
    • JFS: add uid, gid, and umask mount options · 69eb66d7
      Dave Kleikamp authored
      
      OS/2 doesn't initialize the uid, gid, or unix-style permission bits.  The
      uid, gid, & umask mount options perform pretty much like those for the fat
      file system, overriding what is stored on disk.  This is useful for users
      sharing the file system with OS/2.
      
      I implemented a little feature so that if you mask the execute bit, it
      will be re-enabled on directories when the appropriate read bit is unmasked.
      I didn't want to implement an fmask & dmask option.
      Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
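The directory rule described in the second paragraph might look roughly like the following. This is a sketch of one reading of the message, not the JFS code: apply_mount_umask() and its parameters are hypothetical names.

```c
/*
 * Apply a mount-time umask to a mode (hypothetical helper).
 * For directories, an execute bit that the mask removed is turned back
 * on whenever the matching read bit survived the mask, so a readable
 * directory stays searchable.
 */
unsigned int apply_mount_umask(unsigned int mode, unsigned int mask, int is_dir)
{
    mode &= ~mask;

    if (is_dir) {
        if (mode & 0400)    /* user read  -> user execute  */
            mode |= 0100;
        if (mode & 0040)    /* group read -> group execute */
            mode |= 0010;
        if (mode & 0004)    /* other read -> other execute */
            mode |= 0001;
    }
    return mode;
}
```

For example, a mask of 0133 turns 0777 into 0644 for a regular file and 0755 for a directory, which is the effect a separate fmask and dmask would otherwise be needed for.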
    • [NET] compat ifconf: fix limits · 1efa3c05
      Randy Dunlap authored
      
      A recent change to compat. dev_ifconf() in fs/compat_ioctl.c
      causes ifconf data to be truncated 1 entry too early when copying it
      to userspace.  The correct amount of data (length) is returned,
      but the final entry is empty (zero, not filled in).
      The for-loop 'i' check should use <= to allow the final struct
      ifreq32 to be copied.  I also used the ifconf-corruption program
      in kernel bugzilla #4746 to make sure that this change does not
      re-introduce the corruption.
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
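The off-by-one can be reproduced with a toy loop. The sketch below assumes the bound has the form "offset plus entry size compared against buffer length"; the function names are hypothetical and this is not the code in fs/compat_ioctl.c.

```c
#include <stddef.h>

/*
 * Count the entries a copy loop would visit when the bound uses '<'.
 * entry must be non-zero.  When len is an exact multiple of entry,
 * the final entry is skipped.
 */
size_t entries_copied_lt(size_t len, size_t entry)
{
    size_t n = 0;

    for (size_t i = 0; i + entry < len; i += entry)
        n++;
    return n;
}

/* Same loop with '<=': the final complete entry is included. */
size_t entries_copied_le(size_t len, size_t entry)
{
    size_t n = 0;

    for (size_t i = 0; i + entry <= len; i += entry)
        n++;
    return n;
}
```

For a 96-byte buffer of 32-byte entries, the '<' form visits two entries and the '<=' form visits all three, matching the "1 entry too early" symptom in the message.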
  12. 08 Mar, 2006 7 commits
  13. 07 Mar, 2006 1 commit