  1. 28 Oct, 2010 2 commits
    • Fix compile breakage with !CONFIG_BLOCK · b31d42a5
      Ingo Molnar authored
      Today's git tree fails to build on !CONFIG_BLOCK, due to upstream commit
      367a51a3 ("fs: Add FITRIM ioctl"):
      
       include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’
       include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’
       include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’
      
      The commit adds uint64_t type usage to fs.h, but linux/types.h is not included
      explicitly - it's only included implicitly via linux/blk_types.h, and there only if
      CONFIG_BLOCK is enabled.
      
      Add the explicit #include to fix this.
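      A sketch of the one-line fix to include/linux/fs.h (the exact placement
      within the header is illustrative):

              #include <linux/types.h>	/* uint64_t, used by struct fstrim_range */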
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: Add FITRIM ioctl · 367a51a3
      Lukas Czerner authored
      
      Adds a filesystem-independent ioctl to allow implementation of file
      system batched discard support. It takes an fstrim_range structure as
      an argument. fstrim_range is defined in include/linux/fs.h as follows:
      
      struct fstrim_range {
      	uint64_t start;
      	uint64_t len;
      	uint64_t minlen;
      };
      
      start	- first byte to trim
      len	- number of bytes to trim from start
      minlen	- minimum extent length to trim; free extents shorter than this
      	  number of bytes will be ignored. This will be rounded up to the fs
      	  block size.
      
      It is also possible to specify NULL as the argument. In that case the
      range is set as follows:
      
      start = 0;
      len = ULLONG_MAX;
      minlen = 0;
      
      That is, the whole file system is trimmed in one run.
      
      After FITRIM is done, the number of bytes actually discarded is stored
      in fstrim_range.len to give the user better insight into how much
      storage space has really been released for wear-leveling.
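      For illustration, a minimal userspace sketch that trims a whole mounted
      file system (the "/mnt" path is an assumption):

              #include <fcntl.h>
              #include <limits.h>		/* ULLONG_MAX */
              #include <stdio.h>
              #include <sys/ioctl.h>
              #include <linux/fs.h>		/* FITRIM, struct fstrim_range */

              int main(void)
              {
              	struct fstrim_range range = {
              		.start = 0, .len = ULLONG_MAX, .minlen = 0,
              	};
              	int fd = open("/mnt", O_RDONLY);	/* any file on the target fs */

              	if (fd < 0 || ioctl(fd, FITRIM, &range) < 0) {
              		perror("FITRIM");
              		return 1;
              	}
              	printf("discarded %llu bytes\n", (unsigned long long)range.len);
              	return 0;
              }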
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 26 Oct, 2010 16 commits
    • fs: allow for more than 2^31 files · 518de9b3
      Eric Dumazet authored
      
      Robin Holt tried to boot a 16TB system and found af_unix was
      overflowing a 32-bit value:
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to overflow a signed int.
      
      </quote>
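      The wraparound is easy to reproduce in isolation; a minimal sketch
      (signed overflow is formally undefined in C, but wraps this way on the
      usual two's-complement targets):

              #include <stdio.h>

              int main(void)
              {
              	int max_files = 1503238553;	/* value from the 16TB box */
              	int limit = 2 * max_files;	/* exceeds INT_MAX (2147483647) */

              	printf("%d\n", limit);		/* prints -1288490190 */
              	return 0;
              }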
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of atomic_t.
      
      get_max_files() is changed to return an unsigned long.  get_nr_files() is
      changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, although this
      is not strictly needed to address Robin's problem.
      
      Before the patch (on a 64-bit kernel):
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After the patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • IMA: explicit IMA i_flag to remove global lock on inode_delete · 196f5181
      Eric Paris authored
      
      Currently for every removed inode IMA must take a global lock and search
      the IMA rbtree looking for an associated integrity structure.  Instead
      we explicitly mark an inode when we add an integrity structure so we
      only have to take the global lock and do the removal if it exists.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • IMA: move read counter into struct inode · a178d202
      Eric Paris authored
      
      IMA currently allocates an inode integrity structure for every inode
      in core.  This structure is about 120 bytes long.  Most files, however
      (especially on a system which doesn't make use of IMA), will never need
      any of this space.  The problem is that if IMA is enabled we need to
      know information about the number of readers and the number of writers
      for every inode on the box.  At the moment we collect that information
      in the per inode iint structure and waste the rest of the space.  This
      patch moves those counters into the struct inode so we can eventually
      stop allocating an IMA integrity structure except when absolutely
      needed.
      
      This patch does the minimum needed to move the location of the data.
      Further cleanups, especially the location of counter updates, may still
      be possible.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: inode split IO and LRU lists · 7ccf19a8
      Nick Piggin authored
      
      The use of the same inode list structure (inode->i_list) for two
      different list constructs with different lifecycles and purposes
      makes it impossible to separate the locking of the different
      operations. Therefore, to enable the separation of the locking of
      the writeback and reclaim lists, split the inode->i_list into two
      separate lists dedicated to their specific tracking functions.
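      A sketch of the resulting change to struct inode (comments
      illustrative):

              struct inode {
              	/* ... */
              	struct list_head	i_wb_list;	/* writeback list, was i_list */
              	struct list_head	i_lru;		/* inode LRU list */
              	/* ... */
              };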
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: use percpu counter for nr_dentry and nr_dentry_unused · 312d3ca8
      Christoph Hellwig authored
      
      The nr_dentry stat is a globally touched cacheline, updated with an
      atomic operation twice over the lifetime of a dentry, and is used only
      for the benefit of userspace. Turn it into a per-cpu counter and always
      decrement it in d_free instead of doing various batching operations to
      reduce lock hold times in the callers.
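      A sketch with the kernel's percpu_counter API (placement illustrative):

              #include <linux/percpu_counter.h>

              /* initialised with percpu_counter_init() at boot */
              static struct percpu_counter nr_dentry;

              /* in d_alloc(): */
              	percpu_counter_inc(&nr_dentry);
              /* in d_free(): */
              	percpu_counter_dec(&nr_dentry);
              /* when reporting to userspace: */
              	dentry_stat.nr_dentry = percpu_counter_sum_positive(&nr_dentry);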
      
      Based on an earlier patch from Nick Piggin <npiggin@suse.de>.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: do not assign default i_ino in new_inode · 85fe4025
      Christoph Hellwig authored
      
      Instead of always assigning an increasing inode number in new_inode,
      move the call to assign it into those callers that actually need it.
      For now the set of callers that need it is estimated conservatively,
      that is, the call is added to all filesystems that do not assign an
      i_ino by themselves.  For a few more filesystems we can avoid assigning
      any inode number given that they aren't user visible, and for others it
      could be done lazily when an inode number is actually needed, but
      that's left for later patches.
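      Callers that still want the old behaviour now request it explicitly;
      roughly:

              struct inode *inode = new_inode(sb);
              if (inode)
              	inode->i_ino = get_next_ino();	/* factored out of new_inode() */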
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new helper: ihold() · 7de9c6ee
      Al Viro authored
      
      Clones an existing reference to an inode; the caller must already hold
      one.
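      The helper itself is tiny; essentially:

              void ihold(struct inode *inode)
              {
              	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
              }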
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: remove inode_add_to_list/__inode_add_to_list · 646ec461
      Christoph Hellwig authored
      
      Split up inode_add_to_list/__inode_add_to_list.  Locking for the two
      lists will be split soon so these helpers really don't buy us much
      anymore.
      
      The __ prefixes for the sb list helpers will go away soon, but until
      inode_lock is gone we'll need them to distinguish between the locked
      and unlocked variants.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: Implement lazy LRU updates for inodes · 9e38d86f
      Nick Piggin authored
      
      Convert the inode LRU to use lazy updates to reduce lock and
      cacheline traffic.  We avoid moving inodes around in the LRU list
      during iget/iput operations so these frequent operations don't need
      to access the LRUs. Instead, we defer the refcount checks to
      reclaim-time and use a per-inode state flag, I_REFERENCED, to tell
      reclaim that iget has touched the inode in the past. This means that
      only reclaim should be touching the LRU with any frequency, hence
      significantly reducing lock acquisitions and the amount of contention
      on LRU updates.
      
      This also removes the inode_in_use list, which means we now only
      have one list for tracking the inode LRU status. This makes it much
      simpler to split out the LRU list operations under its own lock.
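      A sketch of the second-chance logic in reclaim (simplified; field and
      list names as they end up after this series):

              if (inode->i_state & I_REFERENCED) {
              	/* iget touched it since we last looked: one more pass */
              	inode->i_state &= ~I_REFERENCED;
              	list_move(&inode->i_lru, &inode_lru);
              	continue;
              }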
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa
      Dave Chinner authored
      
      The number of inodes allocated does not need to be tied to the
      addition or removal of an inode to/from a list. If we are not tied
      to a list lock, we could update the counters when inodes are
      initialised or destroyed, but to do that we need to convert the
      counters to be per-cpu (i.e. independent of a lock). This means that
      we have the freedom to change the list/locking implementation
      without needing to care about the counters.
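      A sketch of the per-cpu scheme (close to what fs/inode.c ends up with,
      but illustrative):

              static DEFINE_PER_CPU(unsigned int, nr_inodes);

              static int get_nr_inodes(void)
              {
              	int i, sum = 0;

              	for_each_possible_cpu(i)
              		sum += per_cpu(nr_inodes, i);
              	return sum < 0 ? 0 : sum;
              }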
      
      Based on a patch originally from Eric Dumazet.
      
      [AV: cleaned up a bit, fixed build breakage on weird configs]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new helper: inode_unhashed() · 1d3382cb
      Al Viro authored
      
      note: for race-free use you need inode_lock held
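      The helper is a thin wrapper; essentially:

              static inline int inode_unhashed(struct inode *inode)
              {
              	return hlist_unhashed(&inode->i_hash);
              }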
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • unexport invalidate_inodes · a8dade34
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_pos · 4a3956c7
      KAMEZAWA Hiroyuki authored
      
      Currently rw_verify_area() checks whether f_pos is negative and, if it
      is, returns -EINVAL.

      But some special files such as /dev/(k)mem and /proc/<pid>/mem have
      legitimate negative offsets, so no read/write access to those
      files (devices) is possible.
      
      So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets.
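      Drivers that legitimately use negative offsets opt in at open time; a
      sketch (function name illustrative):

              static int mem_open(struct inode *inode, struct file *filp)
              {
              	filp->f_mode |= FMODE_UNSIGNED_OFFSET;
              	return 0;
              }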
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: allow for more than 2^31 files · 7e360c38
      Eric Dumazet authored
      
      Andrew,
      
      Could you please review this patch, you probably are the right guy to
      take it, because it crosses fs and net trees.
      
      Note: /proc/sys/fs/file-nr is a read-only file, so this patch doesn't
      depend on the previous patch (sysctl: fix min/max handling in
      __do_proc_doulongvec_minmax())
      
      Thanks !
      
      [PATCH V4] fs: allow for more than 2^31 files
      
      Robin Holt tried to boot a 16TB system and found af_unix was
      overflowing a 32-bit value:
      
      <quote>
      
      We were seeing a failure which prevented boot.  The kernel was incapable
      of creating either a named pipe or unix domain socket.  This comes down
      to a common kernel function called unix_create1() which does:
      
              atomic_inc(&unix_nr_socks);
              if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                      goto out;
      
      The function get_max_files() is a simple return of files_stat.max_files.
      files_stat.max_files is a signed integer and is computed in
      fs/file_table.c's files_init().
      
              n = (mempages * (PAGE_SIZE / 1024)) / 10;
              files_stat.max_files = n;
      
      In our case, mempages (total_ram_pages) is approx 3,758,096,384
      (0xe0000000).  That leaves max_files at approximately 1,503,238,553.
      This causes 2 * get_max_files() to overflow a signed int.
      
      </quote>
      
      Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
      integers, and change af_unix to use an atomic_long_t instead of
      atomic_t.
      
      get_max_files() is changed to return an unsigned long.
      get_nr_files() is changed to return a long.
      
      unix_nr_socks is changed from atomic_t to atomic_long_t, although this
      is not strictly needed to address Robin's problem.
      
      Before the patch (on a 64-bit kernel):
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      -18446744071562067968
      
      After the patch:
      # echo 2147483648 >/proc/sys/fs/file-max
      # cat /proc/sys/fs/file-max
      2147483648
      # cat /proc/sys/fs/file-nr
      704     0       2147483648
      Reported-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David Miller <davem@davemloft.net>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: mark destroy_inode static · 56b0dacf
      Christoph Hellwig authored
      
      Hugetlbfs used to need it, but after the destroy_inode and evict_inode
      changes it's not required anymore.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: add sync_inode_metadata · c3765016
      Christoph Hellwig authored
      
      Add a new helper to write out the inode using the writeback code,
      including the correct dirty-bit and list manipulation.  A few
      filesystems already open-code this, and a lot of others should be
      using it instead of write_inode_now, which also writes out the data.
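      Typical use; roughly:

              /* wait == 1: block until the inode itself is written */
              int err = sync_inode_metadata(inode, 1);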
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 05 Oct, 2010 1 commit
    • fs/locks.c: prepare for BKL removal · b89f4321
      Arnd Bergmann authored
      
      This prepares the removal of the big kernel lock from the
      file locking code. We still use the BKL as long as fs/lockd
      uses it and ceph might sleep, but we can flip the definition
      to a private spinlock as soon as that's done.
      All users outside of fs/lockd get converted to use
      lock_flocks() instead of lock_kernel() where appropriate.
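      While the BKL remains, the new interface can simply map onto it; a
      sketch:

              /* until fs/lockd and ceph are converted: */
              static inline void lock_flocks(void)
              {
              	lock_kernel();
              }

              static inline void unlock_flocks(void)
              {
              	unlock_kernel();
              }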
      
      Based on an earlier patch to use a spinlock from Matthew Wilcox, who
      has attempted this a few times before; the earliest patch, from over
      10 years ago, turned it into a semaphore, which ended up being slower
      than the BKL and was subsequently reverted.
      
      Someone should do some serious performance testing when
      this becomes a spinlock, since this has caused problems
      before. Using a spinlock should be at least as good
      as the BKL in theory, but who knows...
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
  4. 10 Sep, 2010 4 commits
    • ext3/ext4: Factor out disk addressability check · 30ca22c7
      Patrick J. LoPresti authored
      
      As part of adding support for OCFS2 to mount huge volumes, we need to
      check that the sector_t and page cache of the system are capable of
      addressing the entire volume.
      
      An identical check already appears in ext3 and ext4.  This patch moves
      the addressability check into its own function in fs/libfs.c and
      modifies ext3 and ext4 to invoke it.
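      The resulting helper takes the block size and block count; an
      ext4-style caller looks roughly like:

              /* returns -EFBIG if sector_t or the page cache can't address it */
              err = generic_check_addressable(sb->s_blocksize_bits,
              				ext4_blocks_count(es));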
      
      [Edited to -EINVAL instead of BUG_ON() for bad blocksize_bits -- Joel]
      Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
      Cc: linux-ext4@vger.kernel.org
      Acked-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • block: remove the BLKDEV_IFL_BARRIER flag · 8c555367
      Christoph Hellwig authored
      
      Remove support for barriers on discards, which is unused now.  Also
      remove the DISCARD_NOBARRIER I/O type in favour of just setting the
      rw flags up locally in blkdev_issue_discard.
      
      tj: Also remove DISCARD_SECURE and use REQ_SECURE directly.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: remove the WRITE_BARRIER flag · 31725e65
      Christoph Hellwig authored
      
      It's unused now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: implement REQ_FLUSH/FUA based interface for FLUSH/FUA requests · 4fed947c
      Tejun Heo authored
      
      Now that the backend conversion is complete, export sequenced
      FLUSH/FUA capability through REQ_FLUSH/FUA flags.  REQ_FLUSH means the
      device cache should be flushed before executing the request.  REQ_FUA
      means that the data in the request should be on non-volatile media on
      completion.
      
      The block layer will choose the correct way of implementing the
      semantics and execute it.  The request may be passed to the device
      directly if the device can handle it; otherwise, it will be sequenced
      using one or more proxy requests.  Devices will never see REQ_FLUSH
      and/or REQ_FUA flags they don't support.
      
      Also, unlike the original REQ_HARDBARRIER, REQ_FLUSH/FUA requests are
      never failed with -EOPNOTSUPP.  If the underlying device doesn't
      support FLUSH/FUA, the block layer simply makes them no-ops.  IOW, it
      no longer distinguishes between a writeback cache which doesn't
      support cache flush and writethrough/no cache.  Devices which have a
      WB cache w/o flush are very difficult to come by these days and
      there's nothing much we can do anyway, so it doesn't make sense to
      require everyone to implement -EOPNOTSUPP handling.  This will
      simplify filesystems and block drivers as they can drop -EOPNOTSUPP
      retry logic for barriers.
      
      * QUEUE_ORDERED_* are removed and QUEUE_FSEQ_* are moved into
        blk-flush.c.
      
      * REQ_FLUSH w/o data can also be directly passed to drivers without
        sequencing, but some drivers assume that zero-length requests don't
        have rq->bio, which isn't true for these requests, requiring the use
        of proxy requests.
      
      * REQ_COMMON_MASK now includes REQ_FLUSH | REQ_FUA so that they are
        copied from bio to request.
      
      * WRITE_BARRIER is marked deprecated and WRITE_FLUSH, WRITE_FUA and
        WRITE_FLUSH_FUA are added.
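      A sketch of a caller using the new flags (era-appropriate submit_bio
      signature):

              /* flush the device cache, then write this bio with FUA semantics */
              submit_bio(WRITE_FLUSH_FUA, bio);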
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  5. 18 Aug, 2010 4 commits
    • fs: scale files_lock · 6416ccb7
      Nick Piggin authored
      
      Improve scalability of files_lock by adding per-cpu, per-sb files lists,
      protected with an lglock. The lglock provides fast access to the per-cpu lists
      to add and remove files. It also provides a snapshot of all the per-cpu lists
      (although this is very slow).
      
      One difficulty with this approach is that a file can be removed from
      the list by another CPU. We must track which per-cpu list the file is
      on with a new variable in the file struct (packed into a hole on
      64-bit archs). Scalability could suffer if files are frequently
      removed from a different CPU's list.
      
      However, loads with frequent removal of files imply a short interval
      between adding and removing the files, and the scheduler attempts to
      avoid moving processes too far away. Also, even in the case of
      cross-CPU removal, the hardware has much more opportunity to
      parallelise cacheline transfers with N cachelines than with 1.
      
      A worst-case test of 1 CPU allocating files that are subsequently
      freed by N CPUs degenerates to contending on a single lock, which is
      no worse than before. When more than one CPU is allocating files, even
      if they are always freed by different CPUs, there will be more
      parallelism than in the single-lock case.
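      A sketch of the lglock discipline (the list helper is hypothetical):

              /* fast path: add/remove a file on this CPU's per-sb list */
              lg_local_lock(files_lglock);
              list_add(&file->f_u.fu_list, this_cpu_sb_list(sb));	/* hypothetical */
              lg_local_unlock(files_lglock);

              /* slow path, e.g. remount checks: snapshot every CPU's list */
              lg_global_lock(files_lglock);
              /* ... walk all per-cpu lists ... */
              lg_global_unlock(files_lglock);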
      
      Testing results:
      
      On a 2-socket, 8-core Opteron, I measure the number of times the lock is taken
      to remove the file, the number of times it is removed by the same CPU that
      added it, and the number of times it is removed by the same node that added it.
      
      Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
      kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
      dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)
      
      So a file is removed from the same CPU it was added by over 90% of the time.
      It remains within the same node 95% of the time.
      
      Tim Chen ran some numbers for a 64-thread Nehalem system performing a compile.
      
                      throughput
      2.6.34-rc2      24.5
      +patch          24.9
      
                      us      sys     idle    IO wait (in %)
      2.6.34-rc2      51.25   28.25   17.25   3.25
      +patch          53.75   18.5    19      8.75
      
      So significantly less CPU time spent in kernel code, higher idle time and
      slightly higher throughput.
      
      Single-threaded performance difference was within the noise of
      microbenchmarks. That is not to say the penalty does not exist: the
      code is larger and more memory accesses are required, so it will be
      slightly slower.
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • tty: fix fu_list abuse · d996b62a
      Nick Piggin authored
      
      tty code abuses fu_list, which causes a bug in remount,ro handling.
      
      If a tty device node is opened on a filesystem and the last link to
      the inode is then removed, the filesystem will be allowed to be
      remounted read-only. This is because fs_may_remount_ro does not find
      the zero-link tty inode on the file sb list (because the tty code
      incorrectly removed it to use for its own purposes). This can result
      in a filesystem with errors after it is marked "clean".
      
      Taking an idea from Christoph's initial patch, allocate a tty private
      struct at file->private_data and put our required list fields in
      there, linking file and tty. This makes tty nodes behave the same way
      as other device nodes, avoids meddling with the vfs, and avoids this
      bug.
      
      The error handling is not trivial in the tty code, so for this bugfix, I take
      the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
      This is not a problem because our allocator doesn't fail small allocs as a rule
      anyway. So proper error handling is left as an exercise for tty hackers.
      
      [ Arguably filesystem's device inode would ideally be divorced from the
      driver's pseudo inode when it is opened, but in practice it's not clear whether
      that will ever be worth implementing. ]
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: cleanup files_lock locking · ee2ffa0d
      Nick Piggin authored
      
      Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
      manipulate the per-sb files list; unexport the files_lock spinlock.
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • remove SWRITE* I/O types · 9cb569d6
      Christoph Hellwig authored
      
      These flags aren't real I/O types, but tell ll_rw_block to always
      lock the buffer instead of giving up on a failed trylock.
      
      Instead add a new write_dirty_buffer helper that implements these semantics
      and use it from the existing SWRITE* callers.  Note that the ll_rw_block
      code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
      this patch fixes.
      
      In the ufs code clean up the helper that used to call ll_rw_block
      to mirror sync_dirty_buffer, which is the function it implements for
      compound buffers.
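      Callers of the old SWRITE* variants switch to the new helper; roughly:

              /* was: ll_rw_block(SWRITE_SYNC, 1, &bh); always locks the buffer */
              write_dirty_buffer(bh, WRITE_SYNC);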
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>