1. 29 Oct, 2010 1 commit
  2. 26 Oct, 2010 3 commits
  3. 18 Aug, 2010 1 commit
  4. 10 Aug, 2010 2 commits
    • shmem: reduce pagefault lock contention · ff36b801
      Shaohua Li authored
      
      I'm running a shmem pagefault test case (see attached file) on a 64 CPU
      system.  The profile shows shmem_inode_info->lock is heavily contended and
      100% of CPU time is spent trying to get the lock.  In the pagefault (no swap)
      case, shmem_getpage takes the lock twice; the second acquisition is
      avoidable if we preallocate a page, which saves one round of locking.  This
      is what the patch below does.
      
      The result of the test case:
      2.6.35-rc3: ~20s
      2.6.35-rc3 + patch: ~12s
      so this is a 40% reduction in run time.
      
      One might ask whether we could have better locking for shmem.  But even if
      shmem were lockless, the pagefault path would soon have the pagecache lock
      heavily contended, because shmem must add the new page to the pagecache.  So
      until we have better locking for the pagecache, improving shmem locking
      alone won't bring much improvement.  I ran a similar pagefault test against
      a ramfs file; the result was ~10.5s.
      
      [akpm@linux-foundation.org: fix comment, clean up code layout, eliminate code duplication]
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
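The locking change described in ff36b801 can be illustrated with a small userspace sketch. This is not the kernel code: `struct toy_info`, `getpage_two_locks()` and `getpage_prealloc()` are hypothetical names, a pthread mutex stands in for `shmem_inode_info->lock`, and `calloc()` stands in for page allocation. The point is only that allocating before taking the lock lets the fault path acquire the lock once instead of twice.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for shmem_inode_info: one lock guarding one page slot. */
struct toy_info {
    pthread_mutex_t lock;
    void *page;                 /* NULL until a page has been installed */
    unsigned long lock_count;   /* instrumentation: number of lock acquisitions */
};

static void toy_lock(struct toy_info *i)
{
    pthread_mutex_lock(&i->lock);
    i->lock_count++;            /* updated while holding the lock */
}

static void toy_unlock(struct toy_info *i)
{
    pthread_mutex_unlock(&i->lock);
}

/* Two-lock pattern: look up under the lock, drop it to allocate, retake it to install. */
void *getpage_two_locks(struct toy_info *i)
{
    void *page, *newpage;

    toy_lock(i);
    page = i->page;
    toy_unlock(i);
    if (page)
        return page;

    newpage = calloc(1, 4096);
    if (!newpage)
        return NULL;

    toy_lock(i);
    if (!i->page) {             /* nobody raced us: install our page */
        i->page = newpage;
        newpage = NULL;
    }
    page = i->page;
    toy_unlock(i);
    free(newpage);              /* free(NULL) is a no-op */
    return page;
}

/* Prealloc pattern: allocate first, then lookup and install under a single lock hold. */
void *getpage_prealloc(struct toy_info *i)
{
    void *page;
    void *prealloc = calloc(1, 4096);

    if (!prealloc)
        return NULL;            /* simplification: a real path could still look up */

    toy_lock(i);
    if (!i->page) {
        i->page = prealloc;
        prealloc = NULL;
    }
    page = i->page;
    toy_unlock(i);
    free(prealloc);             /* unused preallocation is simply dropped */
    return page;
}
```

The trade-off in this sketch is that the preallocated page is wasted whenever a page is already present, in exchange for one fewer acquisition of a contended lock on the fault path.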
    • tmpfs: make tmpfs scalable with percpu_counter for used blocks · 7e496299
      Tim Chen authored
      
      The current implementation of tmpfs is not scalable.  We found that
      stat_lock is contended by multiple threads when we need to get a new page,
      leading to useless spinning inside this spin lock.
      
      This patch makes use of the percpu_counter library to maintain a local count
      of used blocks to speed up getting and returning of pages.  So the
      acquisition of stat_lock is unnecessary for getting and returning blocks,
      improving the performance of tmpfs on systems with a large number of CPUs.
      On a 4 socket 32 core NHM-EX system, we saw an improvement of 270%.
      
      The implementation below has a slight chance of race between threads
      causing a slight overshoot of the maximum configured blocks.  However, any
      overshoot is small, and is bounded by the number of cpus.  This happens
      when the number of used blocks is slightly below the maximum configured
      blocks when a thread checks the used block count, and another thread
      allocates the last block before the current thread does.  This should not
      be a problem for tmpfs, as the overshoot is most likely to be a few blocks
      and bounded.  If a strict limit is really desired, then configure the max
      blocks to be the limit less the number of CPUs in the system.
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
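The overshoot described in 7e496299 comes from a check-then-allocate race. The single-threaded simulation below is a sketch under stated assumptions, not the kernel's percpu_counter API: `struct blocks_counter`, `NR_CPUS`, `BATCH` and all function names are hypothetical. It models per-CPU deltas folded into a shared total in batches, and the worst case in which every CPU passes the limit check before any of them allocates.

```c
#include <stdbool.h>

#define NR_CPUS 4   /* hypothetical CPU count for the simulation */
#define BATCH   8   /* hypothetical threshold for folding a local delta */

struct blocks_counter {
    long global;            /* shared total; the only part that would need a lock */
    long local[NR_CPUS];    /* per-CPU deltas, updated without touching shared state */
};

/* Add n on behalf of one CPU; fold into the shared total only once per batch. */
void blocks_add(struct blocks_counter *c, int cpu, long n)
{
    c->local[cpu] += n;
    if (c->local[cpu] >= BATCH || c->local[cpu] <= -BATCH) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Exact value: shared total plus every per-CPU delta. */
long blocks_sum(const struct blocks_counter *c)
{
    long sum = c->global;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}

bool below_limit(const struct blocks_counter *c, long max_blocks)
{
    return blocks_sum(c) < max_blocks;
}

/*
 * Worst-case interleaving: every CPU checks the limit before any CPU
 * allocates.  Returns by how many blocks the limit was exceeded.
 */
long simulate_race(struct blocks_counter *c, long max_blocks)
{
    bool passed[NR_CPUS];
    long over;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        passed[cpu] = below_limit(c, max_blocks);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (passed[cpu])
            blocks_add(c, cpu, 1);

    over = blocks_sum(c) - max_blocks;
    return over > 0 ? over : 0;
}
```

Starting one block below the limit, all `NR_CPUS` checks pass and the final count exceeds the limit by `NR_CPUS - 1`, which matches the commit's statement that the overshoot is bounded by the number of CPUs.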
  5. 09 Aug, 2010 4 commits
  6. 04 Jun, 2010 1 commit
    • fix truncate inode time modification breakage · af5a30d8
      Nick Piggin authored
      mtime and ctime should be changed only if the file size has actually
      changed. Patches changing ext2 and tmpfs from vmtruncate to the new truncate
      sequence have caused regressions where they always update timestamps.

      There are some strange cases in POSIX where truncate(2) must not update
      times unless the size has actually changed, see 6e656be8.
      
      This area is all still rather buggy in different ways in a lot of
      filesystems and needs a cleanup and audit (ideally the vfs will provide
      a simple attribute or call to direct all filesystems exactly which
      attributes to change). But coming up with the best solution will take a
      while and is not appropriate for rc anyway.
      
      So fix the recent regression for now.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
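The rule restored by af5a30d8, that a truncate changes mtime and ctime only when the size really changes, can be sketched as follows. `struct toy_inode` and `toy_truncate()` are hypothetical names for illustration; the real fix lives in the filesystems' setattr paths and is not reproduced here.

```c
#include <time.h>

/* Hypothetical, minimal inode: just a size and the two timestamps in question. */
struct toy_inode {
    long long size;
    time_t i_mtime;
    time_t i_ctime;
};

/*
 * Truncate to newsize.  Timestamps are touched only when the size
 * actually changes.  Returns 1 if they were updated, 0 otherwise.
 */
int toy_truncate(struct toy_inode *inode, long long newsize, time_t now)
{
    if (newsize == inode->size)
        return 0;               /* same size: leave mtime and ctime alone */

    inode->size = newsize;
    inode->i_mtime = now;
    inode->i_ctime = now;
    return 1;
}
```

The regression described above corresponds to dropping the size comparison and always assigning the timestamps.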
  7. 28 May, 2010 2 commits
  8. 27 May, 2010 1 commit
    • memcg: move charge of file pages · 87946a72
      Daisuke Nishimura authored
      
      This patch adds support for moving charge of file pages, which include
      normal file, tmpfs file and swaps of tmpfs file.  It's enabled by setting
      bit 1 of <target cgroup>/memory.move_charge_at_immigrate.
      
      Unlike the case of anonymous pages, file pages (and swaps) in the range
      mmapped by the task will be moved even if the task hasn't done a page fault,
      i.e.  they might not be the task's "RSS", but another task's "RSS" that maps
      the same file.  And the mapcount of the page is ignored (the page can be moved
      even if page_mapcount(page) > 1).  So, the conditions a page/swap must
      meet to be moved are that it is in the range mmapped by the target task
      and that it is charged to the old cgroup.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
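The two conditions stated in 87946a72 and the value that sets bit 1 of memory.move_charge_at_immigrate can be written down as a small sketch. Everything here is a hypothetical model (`struct toy_page`, the integer cgroup ids, the function names); the kernel's actual data structures are different.

```c
#include <stdbool.h>

#define MOVE_FILE_BIT 1   /* bit 1 enables moving charges of file pages (per the commit) */

/* Value to write to memory.move_charge_at_immigrate to set only bit 1. */
unsigned long move_charge_file_value(void)
{
    return 1UL << MOVE_FILE_BIT;   /* evaluates to 2 */
}

/* Hypothetical model of a file page (or its swap entry). */
struct toy_page {
    unsigned long addr;    /* address at which the target task maps it */
    int charged_cgroup;    /* id of the cgroup the page is charged to */
    int mapcount;          /* how many tasks map it; ignored on purpose */
};

/*
 * A page is a move target if it lies in the range mmapped by the
 * target task and is charged to the old cgroup.  The mapcount plays
 * no role, so pages shared with other tasks qualify too.
 */
bool file_page_is_move_target(const struct toy_page *p,
                              unsigned long vm_start, unsigned long vm_end,
                              int old_cgroup)
{
    return p->addr >= vm_start && p->addr < vm_end &&
           p->charged_cgroup == old_cgroup;
}
```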
  9. 25 May, 2010 1 commit
  10. 21 May, 2010 2 commits
  11. 17 Dec, 2009 1 commit
  12. 16 Dec, 2009 5 commits
  13. 15 Dec, 2009 1 commit
  14. 27 Sep, 2009 1 commit
  15. 25 Sep, 2009 1 commit
  16. 22 Sep, 2009 4 commits
    • shmem: initialize struct shmem_sb_info to zero · 425fbf04
      Pekka Enberg authored
      
      Fixes the following kmemcheck false positive (the compiler is using
      a 32-bit mov to load the 16-bit sbinfo->mode in shmem_fill_super):
      
      [    0.337000] Total of 1 processors activated (3088.38 BogoMIPS).
      [    0.352000] CPU0 attaching NULL sched-domain.
      [    0.360000] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (9f8020fc)
      [    0.361000] a44240820000000041f6998100000000000000000000000000000000ff030000
      [    0.368000]  i i i i i i i i i i i i i i i i u u u u i i i i i i i i i i u u
      [    0.375000]                                                          ^
      [    0.376000]
      [    0.377000] Pid: 9, comm: khelper Not tainted (2.6.31-tip #206) P4DC6
      [    0.378000] EIP: 0060:[<810a3a95>] EFLAGS: 00010246 CPU: 0
      [    0.379000] EIP is at shmem_fill_super+0xb5/0x120
      [    0.380000] EAX: 00000000 EBX: 9f845400 ECX: 824042a4 EDX: 8199f641
      [    0.381000] ESI: 9f8020c0 EDI: 9f845400 EBP: 9f81af68 ESP: 81cd6eec
      [    0.382000]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [    0.383000] CR0: 8005003b CR2: 9f806200 CR3: 01ccd000 CR4: 000006d0
      [    0.384000] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      [    0.385000] DR6: ffff4ff0 DR7: 00000400
      [    0.386000]  [<810c25fc>] get_sb_nodev+0x3c/0x80
      [    0.388000]  [<810a3514>] shmem_get_sb+0x14/0x20
      [    0.390000]  [<810c207f>] vfs_kern_mount+0x4f/0x120
      [    0.392000]  [<81b2849e>] init_tmpfs+0x7e/0xb0
      [    0.394000]  [<81b11597>] do_basic_setup+0x17/0x30
      [    0.396000]  [<81b11907>] kernel_init+0x57/0xa0
      [    0.398000]  [<810039b7>] kernel_thread_helper+0x7/0x10
      [    0.400000]  [<ffffffff>] 0xffffffff
      [    0.402000] khelper used greatest stack depth: 2820 bytes left
      [    0.407000] calling  init_mmap_min_addr+0x0/0x10 @ 1
      [    0.408000] initcall init_mmap_min_addr+0x0/0x10 returned 0 after 0 usecs
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Analysed-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
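The false positive in 425fbf04 arises because a wide load of a 16-bit field also reads the uninitialized bytes next to it; zeroing the whole structure at allocation makes every byte defined. Below is a userspace sketch of that idea, assuming `calloc()` as a stand-in for a zeroing kernel allocator such as kzalloc(). `struct toy_sb_info` and the helpers are hypothetical and do not mirror the real shmem_sb_info layout.

```c
#include <stdlib.h>

/* Hypothetical superblock info: the 16-bit mode is followed by padding on typical ABIs. */
struct toy_sb_info {
    unsigned long max_blocks;
    unsigned short mode;
    unsigned int uid;
};

/* Allocate with every byte, padding included, set to zero. */
struct toy_sb_info *toy_sb_alloc_zeroed(void)
{
    return calloc(1, sizeof(struct toy_sb_info));
}

/* Return 1 if the n bytes starting at p are all zero, else 0. */
int all_bytes_zero(const void *p, size_t n)
{
    const unsigned char *b = p;

    for (size_t i = 0; i < n; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

With a non-zeroing allocator followed by field-by-field assignment, the padding after `mode` would stay undefined, which is what a checker like kmemcheck flags when the compiler loads more than 16 bits.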
    • tmpfs: depend on shmem · 3f96b79a
      Hugh Dickins authored
      
      CONFIG_SHMEM off gives you (ramfs masquerading as) tmpfs, even when
      CONFIG_TMPFS is off: that's a little anomalous, and I'd intended to make
      more sense of it by removing CONFIG_TMPFS altogether, always enabling its
      code when CONFIG_SHMEM; but so many defconfigs have CONFIG_SHMEM on
      CONFIG_TMPFS off that we'd better leave that as is.
      
      But there is no point in asking for CONFIG_TMPFS if CONFIG_SHMEM is off:
      make TMPFS depend on SHMEM, which also prevents TMPFS_POSIX_ACL
      shmem_acl.o being pointlessly built into the kernel when SHMEM is off.
      
      And a selfish change, to prevent the world from being rebuilt when I
      switch between CONFIG_SHMEM on and off: the only CONFIG_SHMEM in the
      header files is mm.h shmem_lock() - give that a shmem.c stub instead.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: includecheck fix for mm/shmem.c · cff397e6
      Jaswinder Singh Rajput authored
      
      Fix the following 'make includecheck' warning:
      
        mm/shmem.c: linux/vfs.h is included more than once.
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add_to_swap_cache() does not return -EEXIST · 2ca4532a
      Daisuke Nishimura authored
      After commit 355cfa73 ("mm: modify swap_map and add SWAP_HAS_CACHE flag"),
      only a context which has set the SWAP_HAS_CACHE flag by swapcache_prepare()
      or get_swap_page() calls add_to_swap_cache().  So add_to_swap_cache()
      doesn't return -EEXIST any more.
      
      Even though it doesn't return -EEXIST, it's not good behavior conceptually
      to call swapcache_prepare() in the -EEXIST case, because it means clearing
      SWAP_HAS_CACHE flag while the entry is on swap cache.
      
      This patch removes redundant code and comments from its callers, and
      adds VM_BUG_ON() in the error path of add_to_swap_cache() and some comments.
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
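The argument in 2ca4532a is that once a "has cache" flag is claimed before insertion, the insertion itself can no longer find an existing entry, so that case can be asserted rather than handled. The toy model below sketches this with hypothetical names (`struct toy_slot`, `TOY_HAS_CACHE`, `toy_*` functions) and uses `assert()` where the kernel patch uses VM_BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_HAS_CACHE 0x1u   /* hypothetical analogue of SWAP_HAS_CACHE */

struct toy_slot {
    unsigned int flags;
    void *cached;            /* page currently in the (toy) swap cache */
};

/* Claim the slot for caching; fails if another context already claimed it. */
int toy_swapcache_prepare(struct toy_slot *s)
{
    if (s->flags & TOY_HAS_CACHE)
        return -EEXIST;
    s->flags |= TOY_HAS_CACHE;
    return 0;
}

/*
 * Insert the page.  Only a caller that won toy_swapcache_prepare()
 * gets here, so finding an existing entry would be a logic bug.
 */
int toy_add_to_swap_cache(struct toy_slot *s, void *page)
{
    assert(s->flags & TOY_HAS_CACHE);
    assert(s->cached == NULL);   /* the "-EEXIST" case is now impossible */
    s->cached = page;
    return 0;
}

/* Caller: -EEXIST can only come from the prepare step. */
int toy_read_swap(struct toy_slot *s, void *page)
{
    int err = toy_swapcache_prepare(s);

    if (err)
        return err;
    return toy_add_to_swap_cache(s, page);
}
```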
  17. 16 Sep, 2009 2 commits
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen authored
      
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
    • HWPOISON: shmem: call set_page_dirty() with locked page · 6746aff7
      Wu Fengguang authored
      
      The dirtying of page and set_page_dirty() can be moved into the page lock.
      
      - In shmem_write_end(), the page was dirtied while the page lock was held,
        but it's being marked dirty just after dropping the page lock.
      - In shmem_symlink(), both dirtying and marking can be moved into page lock.
      
      It's valuable for the hwpoison code to know whether one bad page can be dropped
      without losing data. It mainly judges by testing the PG_dirty bit after taking
      the page lock. So it becomes important that the dirtying of page and the
      marking of dirtiness are both done inside the page lock. Which is a common
      practice, but sadly not a rule.
      
      The noticeable exceptions are
      - mapped pages
      - pages with buffer_heads
      The above pages could go dirty at any time. Fortunately the hwpoison will
      unmap the page and release the buffer_heads beforehand anyway.
      
      Many other types of pages (e.g. metadata pages) can also be dirtied at will by
      their owners, and the hwpoison code cannot do anything meaningful with them anyway.
      Only the dirtiness of pagecache pages owned by regular files is of interest.
      
      v2: AK: Add comment about set_page_dirty rules (suggested by Peter Zijlstra)
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
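The ordering requirement in 6746aff7 is that modifying the page and marking it dirty both happen while the page lock is held, so that a checker which takes the lock sees a consistent dirty state. A userspace sketch, with a pthread mutex standing in for the page lock and hypothetical names (`struct toy_page`, `toy_write_end()`, `toy_can_drop()`):

```c
#include <pthread.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

struct toy_page {
    pthread_mutex_t lock;        /* stands in for the page lock */
    char data[TOY_PAGE_SIZE];
    int dirty;                   /* stands in for PG_dirty */
};

/* Writer: copy the data and mark the page dirty inside one lock hold. */
void toy_write_end(struct toy_page *p, const char *src, size_t len)
{
    if (len > TOY_PAGE_SIZE)
        len = TOY_PAGE_SIZE;

    pthread_mutex_lock(&p->lock);
    memcpy(p->data, src, len);
    p->dirty = 1;                /* marked before the lock is dropped */
    pthread_mutex_unlock(&p->lock);
}

/*
 * hwpoison-style check: under the lock, a clean page can be dropped
 * without losing data.  Returns 1 if the page is clean.
 */
int toy_can_drop(struct toy_page *p)
{
    int clean;

    pthread_mutex_lock(&p->lock);
    clean = !p->dirty;
    pthread_mutex_unlock(&p->lock);
    return clean;
}
```

If the writer set `dirty` only after unlocking, a checker could take the lock in between, see modified data on a page still marked clean, and wrongly conclude it is safe to drop.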
  18. 15 Sep, 2009 1 commit
    • Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev · 2b2af54a
      Kay Sievers authored
      
      Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
      very early at kernel initialization, before any driver-core device
      is registered. Every device with a major/minor will provide a
      device node in devtmpfs.
      
      Devtmpfs can be changed and altered by userspace at any time,
      and in any way needed - just like today's udev-mounted tmpfs.
      Unmodified udev versions will run just fine on top of it, and will
      recognize an already existing kernel-created device node and use it.
      The default node permissions are root:root 0600. Proper permissions
      and user/group ownership, meaningful symlinks, all other policy still
      needs to be applied by userspace.
      
      If a node is created by devtmpfs, devtmpfs will remove the device node
      when the device goes away. If the device node was created by
      userspace, or the devtmpfs created node was replaced by userspace, it
      will no longer be removed by devtmpfs.
      
      If it is requested to auto-mount it, it makes init=/bin/sh work
      without any further userspace support. /dev will be fully populated
      and dynamic, and always reflect the current device state of the kernel.
      With the commonly used dynamic device numbers, it solves the problem
      where static device nodes may point to the wrong devices.
      
      It is intended to make the initial bootup logic simpler and more robust,
      by decoupling the creation of the initial environment needed to reliably run
      userspace processes from complex userspace bootstrap logic that provides
      a working /dev.
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jan Blunck <jblunck@suse.de>
      Tested-By: Harald Hoyer <harald@redhat.com>
      Tested-By: Scott James Remnant <scott@ubuntu.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
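The node-removal rule described in 2b2af54a (devtmpfs removes a node only if it created that node and userspace has not replaced it) can be modelled as a tiny state machine. This is an illustrative sketch only: the explicit `owner` field and all names are hypothetical, and how devtmpfs really recognizes its own nodes is not stated in the commit message.

```c
/* Who last created the node at a given /dev path (hypothetical model). */
enum node_owner { NODE_NONE, NODE_KERNEL, NODE_USERSPACE };

struct toy_node {
    enum node_owner owner;
};

/* Device registered: the kernel creates the node unless one already exists. */
void toy_device_add(struct toy_node *n)
{
    if (n->owner == NODE_NONE)
        n->owner = NODE_KERNEL;
}

/* Userspace creates the node or replaces a kernel-created one. */
void toy_userspace_replace(struct toy_node *n)
{
    n->owner = NODE_USERSPACE;
}

/*
 * Device goes away: remove the node only if the kernel still owns it.
 * Returns 1 if the node was removed, 0 if it was left in place.
 */
int toy_device_remove(struct toy_node *n)
{
    if (n->owner != NODE_KERNEL)
        return 0;
    n->owner = NODE_NONE;
    return 1;
}
```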
  19. 08 Sep, 2009 1 commit
  20. 24 Jun, 2009 2 commits
  21. 17 Jun, 2009 2 commits
  22. 21 May, 2009 1 commit
    • integrity: move ima_counts_get · c9d9ac52
      Mimi Zohar authored
      
      Based on discussion on lkml (Andrew Morton and Eric Paris),
      move ima_counts_get down a layer into shmem/hugetlb__file_setup().
      Resolves drm shmem_file_setup() usage case as well.
      
      HD comment:
        I still think you're doing this at the wrong level, but recognize
        that you probably won't be persuaded until a few more users of
        alloc_file() emerge, all wanting your ima_counts_get().
      
        Resolving GEM's shmem_file_setup() is an improvement, so I'll say
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>