- 09 Jan, 2012 8 commits
-
-
Al Viro authored
A new helper: btrfs_alloc_root(fs_info); allocates btrfs_root and sets ->fs_info. All places allocating the suckers converted to it. At that point we *never* reassign ->fs_info of btrfs_root; it's set before anyone sees the address of newly allocated struct btrfs_root and never assigned anywhere else. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
move assignments to ->fs_info in open_ctree() up, to the place just after the original allocations. Assignment for tree_root becomes a no-op - we'd obtained fs_info from tree_root->fs_info in the first place. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
lift assignment to callers of find_and_setup_root() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
take assignment of ->fs_info to callers of __setup_root() Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
pathname resolution under a global mutex, taken on some paths in ->mount() is a Bad Idea(tm) - think what happens if said pathname resolution triggers automount of some btrfs instance and walks into attempt to grab the same mutex. Deadlock - we are waiting for daemon to finish walking the path, daemon is waiting for us to release the mutex... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Do *NOT* skip doomed superblocks in btrfs_test_super(); we want sget() to wait for their shutdown to complete. Since we don't mutilate ->s_fs_info in ->put_super() anymore (or free what it used to point to until the superblock is past being findable by sget()), we can just DTRT there and report a match. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
... and move free_fs_info() to that, out of ->put_super(). Do NOT set ->s_fs_info to NULL in the latter; we need it for sget() to be able to see and wait for fs in the middle of umount if we get a mount/umount race. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
We need fs_info and root to live until the moment when the victim superblock leaves the list, so we need to postpone free_fs_info() until after ->put_super(). The call is buried in close_ctree(), though, so we need to lift it into the callers (including btrfs_put_super()) first. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 07 Jan, 2012 1 commit
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 04 Jan, 2012 8 commits
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
new helper (wrapper around mnt_drop_write()) to be used in pair with mnt_want_write_file(). Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
it's not needed anymore; we used to, back when we had to do mount_subtree() by hand, complete with put_mnt_ns() in it. No more... Apparmor didn't need it since the __d_path() fix. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
it's both faster (in case when file has been opened for write) and cleaner. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 23 Dec, 2011 2 commits
-
-
Al Viro authored
This closes races where btrfs is calling d_instantiate too soon during inode creation. All of the callers of btrfs_add_nondir are updated to instantiate after the inode is fully setup in memory. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Dan Carpenter noticed that we were doing a double unlock on the worker lock, and sometimes picking a worker thread without the lock held. This fixes both errors. Signed-off-by:
Chris Mason <chris.mason@oracle.com> Reported-by:
Dan Carpenter <dan.carpenter@oracle.com>
-
- 16 Dec, 2011 1 commit
-
-
Wu Fengguang authored
Tests show that the original large intervals can easily make the dirty limit exceeded on 100 concurrent dd's. So adapt to as large as the next check point selected by the dirty throttling algorithm. Signed-off-by:
Wu Fengguang <fengguang.wu@intel.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 15 Dec, 2011 16 commits
-
-
Chris Mason authored
The btrfs io submission threads can build up massive plug lists. This keeps things more reasonable so we don't hand over huge dumps of IO at once. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
btrfs_update_inode is sometimes called with a null reservation. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
A user reported a problem booting into a new kernel with the old format inodes. He was panicing in cow_file_range while writing out the inode cache. This is because if the block group is not cached we'll just skip writing out the cache, however if it gets dirtied again in the same transaction and it finished caching we'd go ahead and write it out, but since we set cache_generation to the transid we think we've already truncated it and will just carry on, running into cow_file_range and blowing up. We need to make sure we only set cache_generation if we've done the truncate. The user tested this patch and verified that the panic no longer occured. Thanks, Reported-and-Tested-by:
Klaus Bitto <klaus.bitto@gmail.com> Signed-off-by:
Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in a loop. This is because we will add an orphan item, do the truncate, the truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're left with an orphan item still in the fs. Then we come back later to do another truncate and it blows up because we already have an orphan item. This is ok so just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We were occasionaly leaking space when running xfstest 269. This is because if we failed to start the transaction in the truncate loop we'd just goto out, but we need to break so that the inode is removed from the orphan list and the space is properly freed. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Running xfstests 269 with some tracing my scripts kept spitting out errors about releasing bytes that we didn't actually have reserved. This took me down a huge rabbit hole and it turns out the way we deal with reserved_extents is wrong, we need to only be setting it if the reservation succeeds, otherwise the free() method will come in and unreserve space that isn't actually reserved yet, which can lead to other warnings and such. The math was all working out right in the end, but it caused all sorts of other issues in addition to making my scripts yell and scream and generally make it impossible for me to track down the original issue I was looking for. The other problem is with our error handling in the reservation code. There are two cases that we need to deal with 1) We raced with free. In this case free won't free anything because csum_bytes is modified before we dro the lock in our reservation path, so free rightly doesn't release any space because the reservation code may be depending on that reservation. However if we fail, we need the reservation side to do the free at that point since that space is no longer in use. So as it stands the code was doing this fine and it worked out, except in case #2 2) We don't race with free. Nobody comes in and changes anything, and our reservation fails. In this case we didn't reserve anything anyway and we just need to clean up csum_bytes but not free anything. So we keep track of csum_bytes before we drop the lock and if it hasn't changed we know we can just decrement csum_bytes and carry on. Because of the case where we can race with free()'s since we have to drop our spin_lock to do the reservation, I'm going to serialize all reservations with the i_mutex. We already get this for free in the heavy use paths, truncate and file write all hold the i_mutex, just needed to add it to page_mkwrite and various ioctl/balance things. With this patch my space leak scripts no longer scream bloody murder. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Now that we're properly keeping track of delayed inode space we've been getting a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is because a bunch of people call mark_inode_dirty, which is void so we can't return ENOSPC. This needs to be fixed in a few areas 1) file_update_time - this updates the mtime and such when writing to a file, which will call mark_inode_dirty. So copy file_update_time into btrfs so we can call btrfs_dirty_inode directly and return an error if we get one appropriately. 2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't setting ->setattr for symlinks, even though we should have been. This catches one of the cases where we were getting errors in mark_inode_dirty. 3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly instead of mark_inode_dirty. This lets us return errors properly for truncate and chown/anything related to setattr. 4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and print an error if we have one. The only remaining user we can't control for this is touch_atime(), but we don't really want to keep people from walking down the tree if we don't have space to save the atime update, so just complain but don't worry about it. With this patch xfstests 83 complains a handful of times instead of hundreds of times. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Al pointed out we have some random problems with the way we account for num_workers_starting in the async thread stuff. First of all we need to make sure to decrement num_workers_starting if we fail to start the worker, so make __btrfs_start_workers do this. Also fix __btrfs_start_workers so that it doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we failed to create a worker. Also check_pending_worker_creates needs to call __btrfs_start_work in it's work function since it already increments num_workers_starting. People only start one worker at a time, so get rid of the num_workers argument everywhere, and make btrfs_queue_worker a void since it will always succeed. Thanks, Signed-off-by:
Josef Bacik <josef@redhat.com>
-
Casey Schaufler authored
The Smack LSM hook for security_d_instantiate checks the inode's i_op->getxattr value to determine if the containing filesystem supports extended attributes. The BTRFS filesystem sets the inode's i_op value only after it has instantiated the inode. This results in Smack incorrectly giving new BTRFS inodes attributes from the filesystem defaults on the assumption that values can't be stored on the filesystem. This patch moves the assignment of inode operation vectors ahead of the calls to d_instantiate, letting Smack know that the filesystem supports extended attributes. There should be no impact on the performance or behavior of BTRFS. Signed-off-by:
Casey Schaufler <casey@schaufler-ca.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
If we have a constant stream of end_io completions or crc work, we can hit softlockup messages from the async helper threads. This adds a cond_resched() into the loop to avoid them. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Li Zefan authored
To reproduce the bug: # touch /mnt/tmp # stat /mnt/tmp | grep Change Change: 2011-12-09 09:32:23.412105981 +0800 # chattr +i /mnt/tmp # stat /mnt/tmp | grep Change Change: 2011-12-09 09:32:43.198105295 +0800 # umount /mnt # mount /dev/loop1 /mnt # stat /mnt/tmp | grep Change Change: 2011-12-09 09:32:23.412105981 +0800 We should update ctime of in-memory inode before calling btrfs_update_inode(). Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Arne Jansen authored
Since we have the free space caches, btrfs_orphan_cleanup also runs for the tree_root. Unfortunately this also cleans up the orphans used to mark subvol deletions in progress. Currently if a subvol deletion gets interrupted twice by umount/mount, the deletion will not be continued and the space permanently lost, though it would be possible to write a tool to recover those lost subvol deletions. This patch checks if the orphan belongs to a subvol (dead root) and skips the deletion. Signed-off-by:
Arne Jansen <sensille@gmx.net> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Miao Xie authored
When we use raid0 as the data profile, df command may show us a very inaccurate value of the available space, which may be much less than the real one. It may make the users puzzled. Fix it by changing the calculation of the available space, and making it be more similar to a fake chunk allocation. Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Miao Xie authored
Btrfsck report errors after the 83th case of xfstests was run, The error number is 400, it means the used disk space of the file is wrong. The reason of this bug is that: The file truncation may fail when the space of the file system is not enough, and leave some file extents, whose offset are beyond the end of the files. When we want to expand those files, we will drop those file extents, and put in dummy file extents, and then we should update the i-node. But btrfs forgets to do it. This patch adds the forgotten i-node update. Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Miao Xie authored
Btrfsck report error 100 after the 83th case of xfstests was run, it means the i_size of the file is wrong. The reason of this bug is that: Btrfs increased i_size of the file at the beginning, but it failed to expand the file, and failed to update the i_size to the old size because there is no enough space in the file system, so we found a wrong i_size. This patch fixes this bug by updating the i_size just when we pass the file expanding and get enough space to update i-node. Signed-off-by:
Miao Xie <miaox@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Justin P. Mattock authored
The patch below removes an extra semicolon. Signed-off-by:
Justin P. Mattock <justinmattock@gmail.com> CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by:
Jiri Kosina <jkosina@suse.cz>
-
- 09 Dec, 2011 1 commit
-
-
Chris Mason authored
btrfs_end_bio checks the number of errors on a bio against the max number of errors allowed before sending any EIOs up to the higher levels. If we got enough copies of the bio done for a given raid level, it is supposed to clear the bio error flag and return success. We have pointers to the original bio sent down by the higher layers and pointers to any cloned bios we made for raid purposes. If the original bio happens to be the one that got an io error, but not the last one to finish, it might not have the BIO_UPTODATE bit set. Then, when the last bio does finish, we'll call bio_end_io on the original bio. It won't have the uptodate bit set and we'll end up sending EIO to the higher layers. We already had a check for this, it just was conditional on getting the IO error on the very last bio. Make the check unconditional so we eat the EIOs properly. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 08 Dec, 2011 3 commits
-
-
Liu Bo authored
Drop spin lock in convert_extent_bit() when memory alloc fails, otherwise, it will be a deadlock. Signed-off-by:
Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Li Zefan authored
If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding a readonly device to a btrfs filesystem, and btrfs will write to that device, emitting kernel errors: [ 3109.833692] lost page write due to I/O error on loop2 [ 3109.833720] lost page write due to I/O error on loop2 ... Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Alexandre Oliva authored
When we find an existing cluster, we switch to its block group as the current block group, possibly skipping multiple blocks in the process. Furthermore, under heavy contention, multiple threads may fail to allocate from a cluster and then release just-created clusters just to proceed to create new ones in a different block group. This patch tries to allocate from an existing cluster regardless of its block group, and doesn't switch to that group, instead proceeding to try to allocate a cluster from the group it was iterating before the attempt. Signed-off-by:
Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-