1. 31 Mar, 2009 1 commit
  2. 28 Mar, 2009 1 commit
  3. 24 Feb, 2009 1 commit
    • ext4: Automatically allocate delay allocated blocks on close · 7d8f9f7d
      Theodore Ts'o authored
      
      When closing a file that had been previously truncated, force any
      delay allocated blocks to be allocated so that, if the filesystem
      is mounted with data=ordered, the data blocks will be pushed out to
      disk along with the journal commit.  Many application programs expect
      this, so we do this to avoid zero-length files if the system crashes
      unexpectedly.
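      A minimal userspace sketch of the "truncate and rewrite in place" pattern this change protects. `rewrite_file()` is a hypothetical helper, not part of any library. It opens with O_TRUNC, writes the new contents and closes. The close-time allocation described above makes the no-fsync case behave more like ext3, but it is not a substitute for fsync(), which remains the portable way to get durability.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace a file's contents in place.
 * Returns 0 on success, -1 on error with errno set.
 */
int rewrite_file(const char *path, const void *buf, size_t len, int do_fsync)
{
    const char *p = buf;
    int fd, saved;

    assert(buf != NULL || len == 0);
    /* O_TRUNC discards the old data: the "previously truncated" case. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Without fsync(), the new blocks may still be delayed-allocated;
     * durability then depends on the filesystem's close-time behaviour. */
    if (do_fsync && fsync(fd) < 0)
        goto fail;
    return close(fd);

fail:
    saved = errno;                 /* close() must not clobber errno */
    close(fd);
    errno = saved;
    return -1;
}
```

      For example, `rewrite_file("config.txt", buf, len, 1)` replaces the file and waits for the data to reach stable storage before returning.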
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 26 Feb, 2009 1 commit
    • ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl · ccd2506b
      Theodore Ts'o authored
      
      Add an ioctl which forces all of the delay allocated blocks to be
      allocated.  This also provides a function ext4_alloc_da_blocks() which
      will be used by the following commits to force files to be fully
      allocated to preserve application-expected ext3 behaviour.
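      A hedged sketch of calling the new ioctl from userspace. We assume the request number is `_IO('f', 12)`, as defined in fs/ext4/ext4.h at the time. It was not exported in a userspace header, so the sketch defines it itself. `force_block_allocation()` is a hypothetical helper that falls back to fsync() whenever the ioctl fails, e.g. on other filesystems or older kernels.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>      /* open(), used by callers */
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumption: the value from fs/ext4/ext4.h of this era. */
#ifndef EXT4_IOC_ALLOC_DA_BLKS
#define EXT4_IOC_ALLOC_DA_BLKS _IO('f', 12)
#endif

/*
 * Hypothetical helper: ask ext4 to allocate the delayed-allocation blocks
 * of an open file. If the ioctl fails for any reason (not ext4, old kernel,
 * not permitted), fall back to fsync(), which also forces allocation but
 * additionally writes the data out.
 * Returns 0 on success, -1 on error with errno set.
 */
int force_block_allocation(int fd)
{
    assert(fd >= 0);
    if (ioctl(fd, EXT4_IOC_ALLOC_DA_BLKS) == 0)
        return 0;
    return fsync(fd);
}
```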
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 23 Feb, 2009 3 commits
  6. 27 Mar, 2009 1 commit
  7. 12 Mar, 2009 1 commit
    • ext4: New inode/block allocation algorithms for flex_bg filesystems · a4912123
      Theodore Ts'o authored
      
      The find_group_flex() inode allocator is now only used if the
      filesystem is mounted using the "oldalloc" mount option.  It is
      replaced with the original Orlov allocator that has been updated for
      flex_bg filesystems (it should behave the same way if flex_bg is
      disabled).  The inode allocator now functions by taking into account
      each flex_bg group, instead of each block group, when deciding whether
      or not it's time to allocate a new directory into a fresh flex_bg.
      
      The block allocator has also been changed so that the first block
      group in each flex_bg is preferred for use for storing directory
      blocks.  This keeps directory blocks close together, which is good for
      speeding up e2fsck since large directories are more likely to look
      like this:
      
      debugfs:  stat /home/tytso/Maildir/cur
      Inode: 1844562   Type: directory    Mode:  0700   Flags: 0x81000
      Generation: 1132745781    Version: 0x00000000:0000ad71
      User: 15806   Group: 15806   Size: 1060864
      File ACL: 0    Directory ACL: 0
      Links: 2   Blockcount: 2072
      Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009
       atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009
       mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009
      crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009
      Size of extra inode fields: 28
      BLOCKS:
      (0):7348651, (1-258):7348654-7348911
      TOTAL: 259
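      The "first block group of each flex_bg" rule reduces to shift arithmetic. This is a minimal sketch, assuming a flex group is 2^log_groups_per_flex consecutive block groups (the superblock's s_log_groups_per_flex field). The function names are hypothetical, and this is not the kernel's allocator itself.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the flex group that contains block_group. */
uint32_t flex_group_of(uint32_t block_group, unsigned log_groups_per_flex)
{
    assert(log_groups_per_flex < 32);
    return block_group >> log_groups_per_flex;
}

/* First block group of that flex group: the group preferred for
 * directory blocks under the scheme described above. */
uint32_t preferred_dir_group(uint32_t block_group, unsigned log_groups_per_flex)
{
    return flex_group_of(block_group, log_groups_per_flex) << log_groups_per_flex;
}
```

      With 16 groups per flex_bg (log_groups_per_flex = 4), block group 37 lies in flex group 2. That flex group's first group is 32, so group 32 would be preferred for directory blocks.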
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  8. 26 Mar, 2009 2 commits
  9. 26 Feb, 2009 1 commit
  10. 23 Feb, 2009 1 commit
  11. 14 Feb, 2009 1 commit
  12. 10 Feb, 2009 1 commit
    • jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215
      Jan Kara authored
      
      If we race with commit code setting i_transaction to NULL, we could
      possibly dereference it.  Proper locking requires the journal pointer
      (to access journal->j_list_lock), which we don't have.  So we have to
      change the prototype of the function so that the filesystem passes us the
      journal pointer.  Also add a more detailed comment about why the
      function jbd2_journal_begin_ordered_truncate() does what it does and
      how it should be used.
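      The locking rule above can be sketched in userspace with a pthread mutex standing in for journal->j_list_lock. All struct and function names here are hypothetical simplifications, not the jbd2 API. The point is that the inode's transaction pointer is tested and dereferenced only while holding the journal's lock, which is why the caller has to supply the journal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for jbd2 structures. */
struct transaction { int tid; };

struct journal {
    pthread_mutex_t list_lock;          /* plays the role of j_list_lock */
};

struct jinode {
    struct transaction *transaction;    /* cleared by commit under list_lock */
};

/* Tid of the inode's transaction, or -1 if it has been detached. */
int jinode_tid(struct journal *journal, struct jinode *ji)
{
    int tid = -1;

    assert(journal != NULL && ji != NULL);
    pthread_mutex_lock(&journal->list_lock);
    if (ji->transaction != NULL)        /* commit may have cleared it */
        tid = ji->transaction->tid;
    pthread_mutex_unlock(&journal->list_lock);
    return tid;
}

/* The commit side: detach the inode under the same lock. */
void jinode_detach(struct journal *journal, struct jinode *ji)
{
    pthread_mutex_lock(&journal->list_lock);
    ji->transaction = NULL;
    pthread_mutex_unlock(&journal->list_lock);
}
```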
      
      Thanks to Dan Carpenter <error27@gmail.com> for pointing out the
      suspicious code.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      CC: linux-ext4@vger.kernel.org
      CC: ocfs2-devel@oss.oracle.com
      CC: mfasheh@suse.de
      CC: Dan Carpenter <error27@gmail.com>
  13. 30 Jan, 2009 1 commit
    • ext4: Remove bogus BUG() check in ext4_bmap() · b9ec63f7
      Theodore Ts'o authored
      The code to support journal-less ext4 operation added a BUG to
      ext4_bmap() which fired if there was no journal and the
      EXT4_STATE_JDATA bit was set in the i_state field.  This caused
      running the filefrag program (which uses the FIBMAP ioctl) to trigger
      a BUG().
      
      The EXT4_STATE_JDATA bit is only used for ext4_bmap(), and it's
      harmless for the bit to be set.  We could add a check in
      __ext4_journalled_writepage() and ext4_journalled_write_end() to only
      set the EXT4_STATE_JDATA bit if the journal is present, but that adds
      an extra test and jump instruction.  It's easier to simply remove the
      BUG check.
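      For context, here is roughly how a tool like filefrag queries block mappings through FIBMAP. This is a hedged sketch, and `map_block()` is a hypothetical helper. FIBMAP itself comes from <linux/fs.h>. It takes an int holding the logical block number and overwrites it with the physical block number. It typically requires CAP_SYS_RAWIO, so unprivileged callers should expect EPERM.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>      /* FIBMAP */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: physical block backing logical_block of the file at
 * path (0 usually means a hole), or -1 with errno set if the file cannot be
 * opened or FIBMAP is refused or unsupported.
 */
long map_block(const char *path, int logical_block)
{
    int blk = logical_block;   /* in: logical block, out: physical block */
    int fd, saved;

    assert(path != NULL && logical_block >= 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, FIBMAP, &blk) < 0) {
        saved = errno;         /* keep the ioctl's errno across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    return blk;
}
```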
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12568
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  14. 20 Jan, 2009 1 commit
  15. 17 Jan, 2009 1 commit
    • ext4: only use i_size_high for regular files · 06a279d6
      Theodore Ts'o authored
      Directories are not allowed to be bigger than 2GB, so don't use
      i_size_high for anything other than regular files.  E2fsck should
      complain about these inodes, but the simplest thing to do for the
      kernel is to only use i_size_high for regular files.
      
      This prevents an intentionally corrupted filesystem from causing the
      kernel to burn a huge amount of CPU and issue error messages such
      as:
      
      EXT4-fs warning (device loop0): ext4_block_to_path: block 135090028 > max
      
      Thanks to David Maciejak from Fortinet's FortiGuard Global Security
      Research Team for reporting this issue.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12375
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  16. 06 Jan, 2009 1 commit
  17. 04 Jan, 2009 2 commits
    • fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin authored
      
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
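      The effect of AOP_FLAG_NOFS can be modeled as masking one bit out of the mapping's allocation mask. This is a hedged sketch. The MODEL_* constants are made-up stand-ins whose values differ from the real GFP and AOP flags. `write_begin_gfp()` is a hypothetical helper showing our reading of how grab_cache_page_write_begin() is meant to honour the flag.

```c
#include <assert.h>

/* Made-up values for illustration only; not the kernel's constants. */
#define MODEL_GFP_IO         0x1u
#define MODEL_GFP_FS         0x2u
#define MODEL_GFP_KERNEL     (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_AOP_FLAG_NOFS  0x4u

/* Callers only ever cared about __GFP_FS, so a single NOFS flag is enough
 * to say "reclaim must not re-enter the filesystem". */
unsigned int write_begin_gfp(unsigned int mapping_gfp, unsigned int aop_flags)
{
    unsigned int gfp_notmask = 0;

    if (aop_flags & MODEL_AOP_FLAG_NOFS)
        gfp_notmask = MODEL_GFP_FS;
    return mapping_gfp & ~gfp_notmask;
}
```

      With the flag set, a GFP_KERNEL-style mask loses only the filesystem bit. Reclaim may still do I/O, but it will not recurse into the filesystem.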
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (e.g. ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext4: Add markers for better debuggability · ba80b101
      Theodore Ts'o authored
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  18. 06 Jan, 2009 1 commit
  19. 31 Dec, 2008 1 commit
  20. 07 Nov, 2008 1 commit
  21. 22 Nov, 2008 1 commit
  22. 06 Nov, 2008 1 commit
    • ext4: calculate journal credits correctly · ac51d837
      Theodore Ts'o authored
      This fixes a 2.6.27 regression which was introduced in commit a02908f1.
      
      We weren't passing the chunk parameter down to the two subsections,
      ext4_indirect_trans_blocks() and ext4_ext_index_trans_blocks(), with
      the result that we massively overestimated the number of credits needed
      by ext4_da_writepages, especially in the non-extents case.  This causes
      failures especially on /boot partitions, which tend to be small and
      not to use extents, since GRUB doesn't handle extents.
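      To see why dropping the chunk parameter inflated the estimate, here is a hypothetical model of the worst-case credits for an indirect-mapped file. It is based on our reading of ext4_indirect_trans_blocks(), and the exact kernel formula may differ. `addr_per_block` is the number of 4-byte block pointers per block, i.e. 1024 for 4 KiB blocks.

```c
#include <assert.h>

/* Hypothetical model of worst-case journal credits for mapping nrblocks
 * data blocks in a non-extent (indirect-mapped) file. */
int indirect_trans_blocks(int nrblocks, int chunk, int addr_per_block)
{
    assert(nrblocks >= 0 && addr_per_block > 0);
    if (chunk) {
        /* Contiguous blocks share indirect blocks: about one per
         * addr_per_block data blocks, plus up to 2 double-indirect
         * blocks and 1 triple-indirect block. */
        return nrblocks / addr_per_block + 3;
    }
    /* Scattered blocks, worst case: each touches its own indirect and
     * double-indirect block, plus one triple-indirect block. */
    return nrblocks * 2 + 1;
}
```

      For 1024 contiguous blocks on a 4 KiB-block filesystem, the contiguous estimate is 1024 / 1024 + 3 = 4 credits. Treating the same blocks as scattered gives 1024 * 2 + 1 = 2049.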
      
      This patch fixes the bug reported by Joseph Fannin at:
      http://bugzilla.kernel.org/show_bug.cgi?id=11964
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  23. 05 Nov, 2008 1 commit
    • ext4: Change unsigned long to unsigned int · 498e5f24
      Theodore Ts'o authored
      
      Convert the unsigned longs that are most responsible for bloating the
      stack usage on 64-bit systems.
      
      Nearly every place in the ext3/4 code that uses "unsigned long" is
      probably a bug: on 32-bit systems a ulong is only 32 bits, so the
      code cannot depend on anything wider, and on 64-bit systems the extra
      width just wastes stack space.
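      The stack-size argument is easy to check. On LP64 systems such as 64-bit Linux, unsigned long is 8 bytes while unsigned int is 4. Here is a small sketch with two hypothetical structs, each holding four counters:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical locals grouped into structs so their sizes can be compared. */
struct counters_ulong { unsigned long a, b, c, d; };
struct counters_uint  { unsigned int  a, b, c, d; };

/* Bytes saved by the narrower type: 16 on LP64, 0 on ILP32. */
size_t bytes_saved(void)
{
    return sizeof(struct counters_ulong) - sizeof(struct counters_uint);
}
```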
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  24. 07 Jan, 2009 1 commit
    • ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar authored
      
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates, reads and writes
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  25. 06 Jan, 2009 1 commit
  26. 05 Nov, 2008 1 commit
    • ext4: tone down ext4_da_writepages warnings · 2a21e37e
      Theodore Ts'o authored
      
      If the filesystem has errors, ext4_da_writepages() will return a *lot*
      of errors, including lots and lots of stack dumps.  While it's true
      that we are dropping user data on the floor, which is unfortunate, the
      stack dumps aren't helpful, and they tend to obscure the true original
      root cause of the problem.  So in the case where the filesystem has
      aborted, return an EROFS right away.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  27. 02 Jan, 2009 1 commit
  28. 29 Oct, 2008 1 commit
  29. 17 Oct, 2008 1 commit
  30. 16 Oct, 2008 1 commit
  31. 14 Oct, 2008 1 commit
  32. 11 Oct, 2008 1 commit
  33. 07 Oct, 2008 1 commit
  34. 10 Oct, 2008 2 commits
  35. 13 Sep, 2008 1 commit
    • ext4: Properly update i_disksize. · cf17fea6
      Aneesh Kumar K.V authored
      
      With delayed allocation we use i_data_sem to update i_disksize.  We need
      to update i_disksize only if the new size specified is greater than the
      current value and we need to make sure we don't race with other
      i_disksize updates.  With delayed allocation we will switch to the
      write_begin function for non-delayed allocation if we are low on free
      blocks.  This means the write_begin function for non-delayed allocation
      also needs to use the same locking.
      
      We also need to check and update i_disksize even if the new size is less
      than inode.i_size because of delayed allocation.
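      The "only grow, and only under the lock" rule can be sketched in userspace. A pthread read-write lock stands in for i_data_sem, which is an rw_semaphore in the kernel. The struct and function names are hypothetical, so treat this as an illustration of the compare-then-store pattern rather than ext4's code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the relevant in-core inode fields. */
struct inode_sketch {
    pthread_rwlock_t i_data_sem;
    long long i_disksize;
};

/* Compare and store under the write lock, so two racing updaters can
 * never move i_disksize backwards. */
void update_i_disksize(struct inode_sketch *inode, long long newsize)
{
    assert(inode != NULL);
    pthread_rwlock_wrlock(&inode->i_data_sem);
    if (newsize > inode->i_disksize)
        inode->i_disksize = newsize;
    pthread_rwlock_unlock(&inode->i_data_sem);
}
```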
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>