1. 12 Jun, 2009 2 commits
    • vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815
      Jan Kara authored
      
      It is unnecessarily fragile to have two places (fsync_super() and do_sync())
      doing a data integrity sync of the filesystem. Alter __fsync_super() to
      accommodate the needs of both callers and use it. So after this patch,
      __fsync_super() is the only place where we gather all the calls needed to
      properly send all data on a filesystem to disk.
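The point of the patch is that one helper owns the whole sequence of sync steps and every caller goes through it. The toy model below sketches that idea for a single superblock. All names are hypothetical, the step order (inodes, then the superblock, then ->sync_fs, then the block device) is a simplified assumption, and the start-then-wait two-pass caller is assumed for illustration rather than copied from sys_sync().

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_STEPS 16

/* Hypothetical toy superblock: records the sync steps in the order they run. */
struct toy_sb {
	const char *steps[TOY_MAX_STEPS];
	int nsteps;
	int dirty;                        /* superblock itself needs writing */
};

static void toy_record(struct toy_sb *sb, const char *step)
{
	if (sb->nsteps < TOY_MAX_STEPS)
		sb->steps[sb->nsteps++] = step;
}

/* The single helper that knows every step of a data-integrity sync.
 * wait == 0 only starts writeback; wait == 1 also waits for completion. */
static void toy_fsync_super(struct toy_sb *sb, int wait)
{
	toy_record(sb, wait ? "sync_inodes(wait)" : "sync_inodes(nowait)");
	if (sb->dirty) {
		toy_record(sb, "write_super");
		sb->dirty = 0;
	}
	toy_record(sb, wait ? "sync_fs(wait)" : "sync_fs(nowait)");
	toy_record(sb, wait ? "sync_blockdev(wait)" : "sync_blockdev(nowait)");
}

/* sys_sync()-style caller: a non-waiting pass to start I/O, then a
 * waiting pass, both through the same helper. */
static void toy_sys_sync(struct toy_sb *sb)
{
	toy_fsync_super(sb, 0);
	toy_fsync_super(sb, 1);
}
```

Because both passes share one helper, adding a new step (for example quota sync) means editing a single function instead of two diverging copies.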
      
      A nice bonus is that we get complete livelock avoidance, and write_supers()
      is now only used for periodic writeback of superblocks.
      
      sync_blockdevs(), introduced a couple of patches ago, is gone now.
      
      [build fixes folded]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Make __fsync_super() a static function (version 4) · 429479f0
      Jan Kara authored
      
      __fsync_super() does the same thing as fsync_super(). So change the only
      caller to use fsync_super() and make __fsync_super() static. This removes
      an unnecessarily duplicated call to sync_blockdev() and prepares the ground
      for the changes to __fsync_super() in the following patches.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 11 Jun, 2009 1 commit
  3. 04 Jun, 2009 1 commit
  4. 22 May, 2009 1 commit
  5. 28 Apr, 2009 1 commit
  6. 01 Apr, 2009 1 commit
  7. 27 Mar, 2009 1 commit
  8. 10 Jan, 2009 1 commit
    • filesystem freeze: implement generic freeze feature · fcccf502
      Takashi Sato authored
      
      The ioctls for the generic freeze feature are below.
      o Freeze the filesystem
        int ioctl(int fd, int FIFREEZE, arg)
          fd: The file descriptor of the mountpoint
          FIFREEZE: request code for the freeze
          arg: Ignored
          Return value: 0 if the operation succeeds. Otherwise, -1
      
      o Unfreeze the filesystem
        int ioctl(int fd, int FITHAW, arg)
          fd: The file descriptor of the mountpoint
          FITHAW: request code for unfreeze
          arg: Ignored
          Return value: 0 if the operation succeeds. Otherwise, -1
          Error number: If the filesystem has already been unfrozen,
                        errno is set to EINVAL.
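A minimal userspace sketch of calling these two ioctls, assuming a Linux system whose <linux/fs.h> defines FIFREEZE and FITHAW. The helper name set_frozen() is hypothetical; it opens the mountpoint, issues the request with an ignored argument of 0, and returns 0 or -1 with errno set, matching the return convention described above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FIFREEZE, FITHAW */

/* Freeze (freeze != 0) or thaw (freeze == 0) the filesystem mounted at
 * `mountpoint`. Returns 0 on success or -1 with errno set.
 * Requires CAP_SYS_ADMIN and a filesystem that supports freezing. */
static int set_frozen(const char *mountpoint, int freeze)
{
	int fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The third argument is ignored by both requests; pass 0. */
	int ret = ioctl(fd, freeze ? FIFREEZE : FITHAW, 0);
	int saved_errno = errno;   /* close() must not clobber the ioctl error */
	close(fd);
	if (ret < 0) {
		errno = saved_errno;
		return -1;
	}
	return 0;
}
```

A typical use would be set_frozen("/mnt/data", 1) before taking a block-level snapshot and set_frozen("/mnt/data", 0) afterwards (the path is only an example). Per the description above, thawing a filesystem that is not frozen fails with errno set to EINVAL.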
      
      [akpm@linux-foundation.org: fix CONFIG_BLOCK=n]
      Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
      Cc: <xfs-masters@oss.sgi.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 08 Jan, 2009 1 commit
    • md: make devices disappear when they are no longer needed. · d3374825
      NeilBrown authored
      
      Currently md devices, once created, never disappear until the module
      is unloaded.  This is essentially because the gendisk holds a
      reference to the mddev, and the mddev holds a reference to the
      gendisk, thus creating a circular reference.
      
      If we drop the reference from mddev to gendisk, then we need to ensure
      that the mddev is destroyed when the gendisk is destroyed.  However it
      is not possible to hook into the gendisk destruction process to enable
      this.
      
      So we drop the reference from the gendisk to the mddev and destroy the
      gendisk when the mddev gets destroyed.  However this has a
      complication.
      Between the call
         __blkdev_get->get_gendisk->kobj_lookup->md_probe
      and the call
         __blkdev_get->md_open
      
      there is no obvious way to hold a reference on the mddev any more, so
      unless something is done, it will disappear and the gendisk will be
      destroyed prematurely.
      
      Also, once we decide to destroy the mddev, there will be a window,
      which no lock can cover, before the gendisk is unlinked
      (blk_unregister_region) during which a new reference to the gendisk
      can be created.  We need to ensure that this reference cannot be
      used, i.e. ->open must fail.
      
      So:
       1/  In md_probe we set a flag in the mddev (hold_active) which
           indicates that the array should be treated as active, even
           though there are no references, and no appearance of activity.
           This is cleared by md_release when the device is closed if it
           is no longer needed.
           This ensures that the gendisk will survive between md_probe and
           md_open.
      
       2/  In md_open we check if the mddev we expect to open matches
           the gendisk that we did open.
           If there is a mismatch we return -ERESTARTSYS and modify
           __blkdev_get to retry from the top in that case.
           In the -ERESTARTSYS case we make sure to wait until
           the old gendisk (that we succeeded in opening) is really gone so
           we loop at most once.
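Step 2/ can be pictured as a small retry loop. The sketch below is a toy userspace model, not the kernel code: every struct and function name is hypothetical, and TOY_ERESTARTSYS stands in for the kernel-internal ERESTARTSYS, which userspace never sees. The open routine rejects a gendisk that no longer matches its mddev, and the caller waits for the pending deletion before retrying, so the loop runs at most twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-internal ERESTARTSYS. */
#define TOY_ERESTARTSYS 512

struct toy_mddev;

struct toy_gendisk {
	struct toy_mddev *mddev;   /* array this gendisk was created for */
	int dying;                 /* deletion queued but not yet finished */
};

struct toy_mddev {
	struct toy_gendisk *gendisk;  /* NULL while the array is being destroyed */
};

/* What a lookup by device number currently returns. */
struct toy_registry {
	struct toy_gendisk *registered;   /* may be a dying gendisk */
	struct toy_gendisk *replacement;  /* visible once the deletion completes */
};

/* Stand-in for md_open(): refuse a gendisk that does not match its mddev. */
static int toy_md_open(struct toy_gendisk *disk)
{
	if (disk->dying || disk->mddev->gendisk != disk)
		return -TOY_ERESTARTSYS;
	return 0;
}

/* Stand-in for waiting until the old gendisk is really gone. */
static void toy_wait_for_deletion(struct toy_registry *reg)
{
	if (reg->registered && reg->registered->dying)
		reg->registered = reg->replacement;
}

/* Stand-in for __blkdev_get(): retry from the top on -ERESTARTSYS.
 * Returns the number of open attempts (1 or 2), or -1 on failure. */
static int toy_blkdev_get(struct toy_registry *reg)
{
	for (int attempt = 1; attempt <= 2; attempt++) {
		struct toy_gendisk *disk = reg->registered;
		if (disk == NULL)
			return -1;
		int ret = toy_md_open(disk);
		if (ret == 0)
			return attempt;
		if (ret != -TOY_ERESTARTSYS)
			return -1;
		toy_wait_for_deletion(reg);
	}
	return -1;
}
```

The hard cap of two attempts encodes the "loop at most once" claim; without the wait step, a mismatch would simply repeat.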
      
      Some udev configurations will always open an md device when it first
      appears.  If we allow an md device that was just created by an open
      to disappear on an immediate close, then this can race with such udev
      configurations and result in an infinite loop: the device is opened
      and closed, then re-opened due to the 'ADD' event from the first open,
      then closed again, and so on.
      So we make sure an md device, once created by an open, remains active
      at least until some md 'ioctl' has been made on it.  This means that
      all normal usage of md devices will allow them to disappear promptly
      when not needed, but the worst that an incorrect usage will do is
      cause an inactive md device to be left in existence (it can easily be
      removed).
      
      As an array can be stopped by writing to a sysfs attribute
        echo clear > /sys/block/mdXXX/md/array_state
      we need to use scheduled work for deleting the gendisk and other
      kobjects.  This allows us to wait for any pending gendisk deletion to
      complete by simply calling flush_scheduled_work().
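For reference, the sysfs write mentioned above can be issued from C as well as from the shell. This is a sketch assuming the /sys/block/<name>/md/array_state layout shown in the echo command; the helper name md_clear_array() is hypothetical, and it writes exactly the bytes echo would ("clear\n").

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* C equivalent of: echo clear > /sys/block/<name>/md/array_state
 * Returns 0 on success or -1 with errno set. Needs root. */
static int md_clear_array(const char *name)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/sys/block/%s/md/array_state", name);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	static const char cmd[] = "clear\n";   /* same bytes echo would write */
	ssize_t written = write(fd, cmd, sizeof(cmd) - 1);
	int saved_errno = errno;
	close(fd);
	if (written != (ssize_t)(sizeof(cmd) - 1)) {
		errno = written < 0 ? saved_errno : EIO;
		return -1;
	}
	return 0;
}
```

Calling md_clear_array("md0") (an example name) stops that array from an ordinary write() path, which is why the gendisk teardown has to be deferred to scheduled work rather than done inline.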
      Signed-off-by: NeilBrown <neilb@suse.de>
  10. 06 Jan, 2009 1 commit
  11. 03 Jan, 2009 1 commit
  12. 31 Dec, 2008 1 commit
  13. 04 Dec, 2008 2 commits
  14. 06 Nov, 2008 1 commit
  15. 23 Oct, 2008 1 commit
  16. 21 Oct, 2008 8 commits
  17. 17 Oct, 2008 1 commit
    • block: fix current kernel-doc warnings · 496aa8a9
      Randy Dunlap authored
      
      Fix block kernel-doc warnings:
      
      Warning(linux-2.6.27-git4//fs/block_dev.c:1272): No description found for parameter 'path'
      Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'cpu'
      Warning(linux-2.6.27-git4//block/blk-core.c:1021): No description found for parameter 'part'
      Warning(/var/linsrc/linux-2.6.27-git4//block/genhd.c:544): No description found for parameter 'partno'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  18. 09 Oct, 2008 12 commits
    • block_dev: fix kernel-doc in new functions · 57d1b536
      Randy Dunlap authored
      
      Fix kernel-doc in new functions:
      
      Error(mmotm-2008-1002-1617//fs/block_dev.c:895): duplicate section name 'Description'
      Error(mmotm-2008-1002-1617//fs/block_dev.c:924): duplicate section name 'Description'
      Warning(mmotm-2008-1002-1617//fs/block_dev.c:1282): No description found for parameter 'pathname'
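Both kinds of message above are emitted by the kernel's scripts/kernel-doc when a comment lacks an "@name:" line for a parameter or repeats a section heading. A sketch of a comment that avoids both, on a hypothetical helper that exists only for illustration: one "@pathname:" entry per parameter and a single "Description" section.

```c
#include <assert.h>
#include <stddef.h>

/**
 * path_is_absolute - check whether a path name starts at the root
 * @pathname: NUL-terminated path name to check; may be NULL
 *
 * Description: Every parameter gets its own "@name:" line, and the
 * Description section appears only once in the comment.
 *
 * Return: 1 if @pathname begins with '/', 0 otherwise.
 */
static int path_is_absolute(const char *pathname)
{
	return pathname != NULL && pathname[0] == '/';
}
```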
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      cc: Andrew Patterson <andrew.patterson@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Call flush_disk() after detecting an online resize. · 608aeef1
      Andrew Patterson authored
      
      We call flush_disk() to make sure the buffer cache for the disk is
      flushed after a disk resize. There are two resize cases, growing and
      shrinking. Given that users can shrink and then grow a disk before
      revalidate_disk() is called, we treat the grow case identically to
      shrinking. We need to flush the buffer cache after an online shrink
      because, as James Bottomley puts it,
      
           The two use cases for shrinking I can see are
      
           1. planned: the fs is already shrunk to within the new boundaries
              and all data is relocated, so invalidate is fine (any dirty
              buffers that might exist in the shrunk region are there only
              because they were relocated but not yet written to their
              original location).
           2. unplanned:  In this case, the fs is probably toast, so whether
              we invalidate or not isn't going to make a whole lot of
              difference; it's still going to try to read or write from
              sectors beyond the new size and get I/O errors.
      
      Immediately invalidating shrunk disks causes errors for outstanding
      reads/writes beyond the new end of the disk to be generated earlier
      than if we waited for the normal buffer cache operation. It also
      removes a potential security hole where we might keep old data around
      from beyond the end of the shrunk disk if the disk was not invalidated.
      Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Added flush_disk to factor out common buffer cache flushing code. · 56ade44b
      Andrew Patterson authored
      
      We need to be able to flush the buffer cache for more than
      just when a disk is changed, so we factor out common cache flush code
      in check_disk_change() to an internal flush_disk() routine.  This
      routine will then be used for both disk changes and disk resizes (in a
      later patch).
      
      Include the disk name in the text indicating that there are busy
      inodes on the device and increase the KERN severity of the message.
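The refactoring pattern can be sketched as a toy model in which one flush routine serves two callers: the media-change path and the resize path that a later patch adds. Every name and field below is hypothetical, and the warning text is illustrative rather than the kernel's actual message; it only shows that the disk name is now included.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy disk; field names are illustrative only. */
struct toy_disk {
	const char *name;
	int cached_buffers;   /* buffers held in the buffer cache */
	int busy_inodes;      /* inodes still in use on the device */
	int flushes;          /* times the cache has been flushed */
	char warning[96];     /* last warning; includes the disk name */
};

/* The factored-out routine shared by every caller that must drop cached data. */
static void toy_flush_disk(struct toy_disk *disk)
{
	if (disk->busy_inodes > 0)
		snprintf(disk->warning, sizeof(disk->warning),
		         "busy inodes on changed or resized disk %s", disk->name);
	disk->cached_buffers = 0;
	disk->flushes++;
}

/* Caller 1: removable media was swapped. */
static int toy_check_disk_change(struct toy_disk *disk, int media_changed)
{
	if (!media_changed)
		return 0;
	toy_flush_disk(disk);
	return 1;
}

/* Caller 2: an online resize; growing and shrinking are handled alike. */
static void toy_handle_resize(struct toy_disk *disk,
                              unsigned long old_sectors, unsigned long new_sectors)
{
	if (old_sectors != new_sectors)
		toy_flush_disk(disk);
}
```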
      Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Adjust block device size after an online resize of a disk. · c3279d14
      Andrew Patterson authored
      
      The revalidate_disk routine now checks if a disk has been resized by
      comparing the gendisk capacity to the bdev inode size.  If they are
      different (usually because the disk has been resized underneath the kernel)
      the bdev inode size is adjusted to match the capacity.
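The comparison relies on a unit conversion: gendisk capacity is counted in 512-byte sectors, while the inode size is in bytes, so the capacity is shifted left by 9 before comparing. A sketch with hypothetical stand-in structs (not the kernel's gendisk or inode definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: capacity in 512-byte sectors, inode size in bytes. */
struct toy_gendisk { uint64_t capacity_sectors; };
struct toy_bdev_inode { int64_t i_size; };

/* Returns 1 if the inode size was changed to match the disk capacity. */
static int toy_check_disk_size_change(const struct toy_gendisk *disk,
                                      struct toy_bdev_inode *inode)
{
	int64_t disk_size = (int64_t)(disk->capacity_sectors << 9);  /* sectors -> bytes */

	if (disk_size == inode->i_size)
		return 0;                  /* nothing changed underneath us */
	inode->i_size = disk_size;     /* follow the new capacity */
	return 1;
}
```

For example, a disk grown to 2097152 sectors reports 2097152 << 9 = 1073741824 bytes (1 GiB), so an inode still holding 536870912 bytes (512 MiB) is updated once, and a second check finds nothing to do.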
      Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Wrapper for lower-level revalidate_disk routines. · 0c002c2f
      Andrew Patterson authored
      
      This is a wrapper for the lower-level revalidate_disk call-backs such
      as sd_revalidate_disk(). It allows us to perform pre and post
      operations when calling them.
      
      We will use this wrapper in a later patch to adjust block device sizes
      after an online resize (a _post_ operation).
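The wrapper pattern described here can be sketched as follows: call the driver's callback if it has one, then run work that every caller needs afterwards. The structs, the post_ops_run counter, and fake_driver_revalidate() are hypothetical stand-ins (the last one plays the role of a driver callback such as sd_revalidate_disk()).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the wrapper pattern; not the kernel's definitions. */
struct toy_disk;

struct toy_disk_ops {
	int (*revalidate_disk)(struct toy_disk *disk);  /* driver callback, may be NULL */
};

struct toy_disk {
	const struct toy_disk_ops *fops;
	int post_ops_run;   /* counts post-operations performed by the wrapper */
};

/* Generic wrapper: call the driver's callback (if any), then do work that
 * every caller needs afterwards, such as adjusting the block device size. */
static int toy_revalidate_disk(struct toy_disk *disk)
{
	int ret = 0;

	/* (pre-operations would go here) */
	if (disk->fops && disk->fops->revalidate_disk)
		ret = disk->fops->revalidate_disk(disk);

	disk->post_ops_run++;   /* post-operation runs even without a callback */
	return ret;
}

/* A fake low-level driver callback used only for illustration. */
static int fake_driver_revalidate(struct toy_disk *disk)
{
	(void)disk;
	return 0;
}
```

Keeping the post-operation in the wrapper means drivers need no changes to benefit from it.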
      Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: always set bdev->bd_part · 0762b8bd
      Tejun Heo authored
      
      Until now, bdev->bd_part was set only if the bdev was for a partition
      other than part0.  This patch makes sure bdev->bd_part is always set so
      that code paths don't have to differentiate in common handling.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: move holder_dir from disk to part0 · 4c46501d
      Tejun Heo authored
      
      Move disk->holder_dir to part0->holder_dir.  Kill the now mostly
      superfluous bdev_get_holder().
      
      While at it, kill superfluous kobject_get/put() around holder_dir,
      slave_dir and cmd_filter creation and collapse
      disk_sysfs_add_subdirs() into register_disk().  These serve no purpose
      other than obfuscating the code.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: introduce partition 0 · b5d0b9df
      Tejun Heo authored
      
      genhd and partition code handled disk and partitions separately.  All
      information about the whole disk was in struct gendisk and partitions in
      struct hd_struct.  However, the whole disk (part0) and other
      partitions have a lot in common and the data structures end up having
      a good number of common fields and thus separate code paths doing the
      same thing.  Also, the partition array was indexed by partno - 1 which
      gets pretty confusing at times.
      
      This patch introduces partition 0 and makes the partition array
      indexed by partno.  Following patches will unify the handling of disk
      and parts piece-by-piece.
      
      This patch also implements disk_partitionable() which tests whether a
      disk is partitionable.  With the coming dynamic partition array change,
      the most common usage of disk_max_parts() will be testing whether a
      disk is partitionable and the maximum number of partitions will become
      much less important.
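A toy illustration of the two ideas above, assuming a fixed-size array and hypothetical struct names: slot 0 holds the whole disk, so a partition is found at part[partno] with no "- 1" and no special case, and partitionability reduces to "more than one minor is available".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PARTS 4   /* hypothetical: part0 plus up to 3 partitions */

/* Hypothetical stand-ins for struct hd_struct and struct gendisk. */
struct toy_part { int partno; unsigned long nr_sects; };

struct toy_disk {
	int minors;                              /* minors reserved for the disk */
	struct toy_part *part[TOY_MAX_PARTS];    /* indexed by partno; [0] = whole disk */
};

static int toy_disk_max_parts(const struct toy_disk *disk)
{
	return disk->minors;
}

/* A disk with a single minor can hold nothing but part0. */
static bool toy_disk_partitionable(const struct toy_disk *disk)
{
	return toy_disk_max_parts(disk) > 1;
}

/* With part0 in slot 0, lookup is simply part[partno]. */
static struct toy_part *toy_disk_get_part(struct toy_disk *disk, int partno)
{
	if (partno < 0 || partno >= TOY_MAX_PARTS ||
	    partno >= toy_disk_max_parts(disk))
		return NULL;
	return disk->part[partno];
}
```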
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: implement and use {disk|part}_to_dev() · ed9e1982
      Tejun Heo authored
      
      Implement {disk|part}_to_dev() and use them to access the generic device
      instead of directly dereferencing {disk|part}->dev.  To make sure no
      user is left behind, rename the generic device fields to __dev.
      
      This is in preparation for unifying partition 0 handling with other
      partitions.
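The rename-plus-accessor trick works because any leftover code that still writes ->dev stops compiling, while converted code goes through one function per type. A sketch with hypothetical toy structs (the __dev member name mirrors the text; the rest is illustrative):

```c
#include <assert.h>

/* Hypothetical sketch: the embedded device is renamed to __dev so stale
 * ->dev users fail to compile, and callers use accessors instead. */
struct toy_device { const char *name; };

struct toy_part { struct toy_device __dev; int partno; };
struct toy_disk { struct toy_device __dev; int major; };

static inline struct toy_device *toy_disk_to_dev(struct toy_disk *disk)
{
	return &disk->__dev;
}

static inline struct toy_device *toy_part_to_dev(struct toy_part *part)
{
	return &part->__dev;
}
```

Once every user goes through the accessor, the underlying storage (for example, moving the whole-disk device into partition 0) can change without touching callers.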
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: fix disk->part[] dereferencing race · e71bf0d0
      Tejun Heo authored
      
      disk->part[] is protected by its matching bdev's lock.  However,
      non-critical accesses like collecting stats and printing out sysfs and
      proc information used to be performed without any locking.  As
      partitions can come and go dynamically, partitions can go away
      underneath those non-critical accesses.  As some of those accesses are
      writes, this theoretically can lead to silent corruption.
      
      This patch fixes the race by using RCU for the partition array and the
      dev reference counter to hold partitions.
      
      * Rename disk->part[] to disk->__part[] to make sure no one outside
        genhd layer proper accesses it directly.
      
      * Use RCU for disk->__part[] dereferencing.
      
      * Implement disk_{get|put}_part() which can be used to get and put
        partitions from gendisk respectively.
      
      * Iterators are implemented to help iterate through all partitions
        safely.
      
      * Functions which require RCU readlock are marked with _rcu suffix.
      
      * Use disk_put_part() in __blkdev_put() instead of directly putting
        the contained kobject.
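The get/put discipline above can be sketched in userspace, with one loudly stated substitution: userspace has no kernel RCU, so toy_rcu_read_lock()/toy_rcu_read_unlock() are empty placeholders that only mark where the read-side section would be. What the sketch does show is the ordering: look the partition up inside the read section, pin it with a reference, and only then leave the section, so the partition stays valid even after it is removed from the array. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Placeholders only: they mark the RCU read-side section, they do not
 * provide RCU semantics. */
static void toy_rcu_read_lock(void) {}
static void toy_rcu_read_unlock(void) {}

struct toy_part {
	atomic_int refcount;   /* stand-in for the partition's device refcount */
	int partno;
};

#define TOY_NR_PARTS 4
struct toy_disk {
	struct toy_part *__part[TOY_NR_PARTS];  /* only genhd code touches this */
};

/* Look up a partition inside the read section and pin it with a reference. */
static struct toy_part *toy_disk_get_part(struct toy_disk *disk, int partno)
{
	struct toy_part *part;

	if (partno < 0 || partno >= TOY_NR_PARTS)
		return NULL;

	toy_rcu_read_lock();
	part = disk->__part[partno];   /* the kernel would use rcu_dereference() */
	if (part)
		atomic_fetch_add(&part->refcount, 1);
	toy_rcu_read_unlock();
	return part;
}

/* Drop a reference; the last put frees the partition. */
static void toy_disk_put_part(struct toy_part *part)
{
	if (part && atomic_fetch_sub(&part->refcount, 1) == 1)
		free(part);
}
```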
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: don't depend on consecutive minor space · f331c029
      Tejun Heo authored
      
      * Implement disk_devt() and part_devt() and use them to directly
        access devt instead of computing it from ->major and ->first_minor.
      
        Note that all references to ->major and ->first_minor outside of
        the block layer are used to determine the devt of the disk (part0),
        and as ->major and ->first_minor will continue to represent the devt
        for the disk, converting these users isn't strictly necessary.
        However, convert them for consistency.
      
      * Implement disk_max_parts() to avoid directly dereferencing
        genhd->minors.
      
      * Update bdget_disk() such that it doesn't assume consecutive minor
        space.
      
      * Move devt computation from register_disk() to add_disk() and make it
        the only one (all other usages use the initially determined value).
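The difference between computing a devt and storing it can be sketched as below. The sketch assumes the kernel-internal dev_t layout (20 minor bits, MKDEV(ma, mi) = (ma << 20) | mi); userspace code should use makedev() instead. Struct and function names are hypothetical. The old style derives a partition's devt from first_minor + partno, which silently assumes consecutive minors; the new style reads back a value fixed once in add_disk().

```c
#include <assert.h>

/* Hypothetical sketch using the kernel-internal dev_t layout. */
typedef unsigned int toy_dev_t;
#define TOY_MINORBITS 20
#define TOY_MKDEV(ma, mi) (((toy_dev_t)(ma) << TOY_MINORBITS) | (toy_dev_t)(mi))

struct toy_part { toy_dev_t devt; };   /* devt stored per partition */
struct toy_disk {
	int major, first_minor;
	struct toy_part part0;
};

/* Old style: compute the devt, assuming partitions use consecutive minors. */
static toy_dev_t toy_computed_devt(const struct toy_disk *disk, int partno)
{
	return TOY_MKDEV(disk->major, disk->first_minor + partno);
}

/* New style: read back the stored value. */
static toy_dev_t toy_part_devt(const struct toy_part *part) { return part->devt; }
static toy_dev_t toy_disk_devt(const struct toy_disk *disk) { return toy_part_devt(&disk->part0); }

/* Stand-in for add_disk(): the one place the disk's devt is computed. */
static void toy_add_disk(struct toy_disk *disk)
{
	disk->part0.devt = TOY_MKDEV(disk->major, disk->first_minor);
}
```

A partition given a devt from an unrelated major can only be handled by the stored-value path, which is what makes extended device numbers possible.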
      
      These changes clean up the code and will help the disk->part
      dereference fix and extended block device numbers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: make variable and argument names more consistent · cf771cb5
      Tejun Heo authored
      
      In hd_struct, @partno is used to denote partition number and a number
      of other places use @part to denote hd_struct.  Functions, however, use
      @part and @index for the partition number.  This causes confusion and
      makes it difficult to use consistent variable names for hd_struct.
      Always use @partno if a variable represents partition number.
      
      Also, print-out functions use @f or @part for the seq_file argument.
      Use @seqf uniformly instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  19. 01 Aug, 2008 2 commits