- 24 Mar, 2009 1 commit
-
-
Chris Mason authored
btrfs_mark_buffer_dirty would set dirty bits in the extent_io tree for the buffers it was dirtying. This may require a kmalloc and it was not atomic, so anyone who called btrfs_mark_buffer_dirty had to set any btree locks they were holding to blocking first. This commit changes dirty tracking for extent buffers to just use a flag in the extent buffer. Now that we have one and only one extent buffer per page, this can be safely done without losing dirty bits along the way. This also introduces a path->leave_spinning flag that callers of btrfs_search_slot can use to indicate they will properly deal with a path returned where all the locks are spinning instead of blocking. Many of the btree search callers now expect spinning paths, resulting in better btree concurrency overall. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
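A minimal, illustrative sketch (not the patch itself) of how a btrfs_search_slot caller might use the new path->leave_spinning flag; the surrounding function and error handling are assumptions for the example.

```c
/* Illustrative caller sketch, assuming the btrfs_search_slot/btrfs_path API
 * described in the commit message; not the literal code from this patch. */
static int example_update_item(struct btrfs_trans_handle *trans,
			       struct btrfs_root *root,
			       struct btrfs_key *key)
{
	struct btrfs_path *path;
	int ret;

	path = btrfs_alloc_path();
	if (!path)
		return -ENOMEM;

	/* Caller promises to cope with a path whose locks are still spinning. */
	path->leave_spinning = 1;

	ret = btrfs_search_slot(trans, root, key, path, 0, 1);
	if (ret == 0) {
		/* Dirtying is now just a flag set on the extent buffer, so no
		 * blocking lock is needed around it. */
		btrfs_mark_buffer_dirty(path->nodes[0]);
	}

	btrfs_free_path(path);
	return ret;
}
```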
-
- 12 Feb, 2009 1 commit
-
-
Chris Mason authored
Btrfs is currently using spin_lock_nested with a nested value based on the tree depth of the block. But this doesn't quite work, because the max tree depth is bigger than what spin_lock_nested can deal with, and because locks are sometimes taken before the level field is filled in. The solution here is to use lockdep_set_class_and_name instead, and to set the class before unlocking the pages when the block is read from the disk and just after init of a freshly allocated tree block. btrfs_clear_path_blocking is also changed to take the locks in the proper order, and it also makes sure all the locks currently held are properly set to blocking before it tries to retake the spinlocks. Otherwise, lockdep gets upset about bad lock ordering. The lockdep magic came from Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Chris Mason <chris.mason@oracle.com>
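A minimal sketch of the per-level lockdep annotation described above; only lockdep_set_class_and_name and the idea of one class per tree level come from the commit message, while the array, helper name, and the eb->lock field are assumptions for the example.

```c
/* Sketch only: assign each tree level its own lockdep class instead of
 * relying on spin_lock_nested(), which cannot express this many levels. */
#define BTRFS_MAX_LEVEL 8

static struct lock_class_key example_eb_lock_key[BTRFS_MAX_LEVEL];
static const char *example_eb_lock_name[BTRFS_MAX_LEVEL] = {
	"btrfs-level-0", "btrfs-level-1", "btrfs-level-2", "btrfs-level-3",
	"btrfs-level-4", "btrfs-level-5", "btrfs-level-6", "btrfs-level-7",
};

static void example_set_buffer_lockdep_class(struct extent_buffer *eb, int level)
{
	/* Called just after a fresh tree block is initialized, or before the
	 * pages are unlocked when a block is read from disk; eb->lock is
	 * assumed to be the spinlock protecting the buffer. */
	lockdep_set_class_and_name(&eb->lock,
				   &example_eb_lock_key[level],
				   example_eb_lock_name[level]);
}
```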
-
- 21 Jan, 2009 1 commit
-
-
Yan Zheng authored
To improve performance, btrfs_sync_log merges tree log sync requests. But it wrongly merges sync requests for different tree logs. If multiple tree logs are synced at the same time, only one of them actually gets synced. This patch makes the following changes to fix the bug: Move most tree log related fields in btrfs_fs_info to btrfs_root. This allows merging sync requests separately for each tree log. Don't insert the root item into the log root tree immediately after the log tree is allocated; the root item for a log tree is inserted when the log tree gets synced for the first time. This allows syncing the log root tree without first syncing all log trees. At tree-log sync, btrfs_sync_log first syncs the log tree, then updates the corresponding root item in the log root tree, then syncs the log root tree, and finally updates the super block. Signed-off-by:
Yan Zheng <zheng.yan@oracle.com>
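A minimal sketch of the sync ordering described above; apart from btrfs_sync_log, root->log_root and fs_info->log_root_tree, the example_* helpers are hypothetical placeholders rather than the patch's real functions.

```c
/* Illustrative ordering only -- the example_* helpers are hypothetical. */
static int example_sync_log(struct btrfs_trans_handle *trans,
			    struct btrfs_root *root)
{
	struct btrfs_root *log = root->log_root;
	struct btrfs_root *log_root_tree = root->fs_info->log_root_tree;
	int ret;

	/* 1: write out and wait on this subvolume's log tree. */
	ret = example_write_and_wait_tree(trans, log);
	if (ret)
		return ret;

	/* 2: update (or insert, on the first sync) its root item in the
	 *    log root tree. */
	ret = example_update_log_root_item(trans, log);
	if (ret)
		return ret;

	/* 3: write out and wait on the log root tree itself. */
	ret = example_write_and_wait_tree(trans, log_root_tree);
	if (ret)
		return ret;

	/* 4: finally, update the super block to point at the new log root. */
	return example_write_super(root->fs_info);
}
```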
-
- 08 Dec, 2008 1 commit
-
-
Yan Zheng authored
This patch implements superblock duplication. Superblocks are stored at offsets 16K, 64M and 256G on every device. Space used by superblocks is preserved by the allocator, which uses a reverse mapping function to find the logical addresses that correspond to superblocks. Thank you, Signed-off-by:
Yan Zheng <zheng.yan@oracle.com>
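The three offsets above follow a simple shift pattern; a hedged sketch assuming a 12-bit shift per mirror (the constant and function names here are illustrative, not necessarily the patch's):

```c
/* Sketch: derive the three superblock offsets (16K, 64M, 256G). */
#define EXAMPLE_SUPER_MIRROR_MAX   3
#define EXAMPLE_SUPER_MIRROR_SHIFT 12

static inline unsigned long long example_sb_offset(int mirror)
{
	unsigned long long start = 16ULL * 1024;        /* primary copy at 16K */

	if (mirror)
		start <<= EXAMPLE_SUPER_MIRROR_SHIFT * mirror;  /* 64M, then 256G */
	return start;
}
```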
-
- 12 Nov, 2008 1 commit
-
-
Yan Zheng authored
This patch adds mount ro and remount support. The main changes in this patch are: adding btrfs_remount and related helper functions; splitting the transaction-related code out of close_ctree into btrfs_commit_super; and updating the allocator to properly handle read-only block groups. Signed-off-by:
Yan Zheng <zheng.yan@oracle.com>
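A minimal sketch of the remount shape implied above, assuming the standard VFS ->remount_fs hook and the MS_RDONLY flag of that era; only btrfs_commit_super is named in the commit message, everything else is illustrative.

```c
/* Illustrative remount handler; real option parsing and error handling
 * are omitted, and btrfs_sb() is assumed to return the fs root. */
static int example_remount(struct super_block *sb, int *flags, char *data)
{
	struct btrfs_root *root = btrfs_sb(sb);

	if ((*flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
		/* Going read-write -> read-only: flush the running
		 * transaction state, as close_ctree used to do. */
		return btrfs_commit_super(root);
	}

	/* Going read-only -> read-write would resume normal operation here. */
	return 0;
}
```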
-
- 07 Nov, 2008 1 commit
-
-
Chris Mason authored
Btrfs uses kernel threads to create async work queues for cpu intensive operations such as checksumming and decompression. These work well, but they make it difficult to keep IO order intact. A single writepages call from pdflush or fsync will turn into a number of bios, and each bio is checksummed in parallel. Once the checksum is computed, the bio is sent down to the disk, and since we don't control the order in which the parallel operations happen, they might go down to the disk in almost any order. The code deals with this somewhat by having deep work queues for a single kernel thread, making it very likely that a single thread will process all the bios for a single inode. This patch introduces an explicitly ordered work queue. As work structs are placed into the queue they are put onto the tail of a list. They have three callbacks: ->func (cpu intensive processing), ->ordered_func (order sensitive processing), and ->ordered_free (free the work struct once all processing is done). The func callback does the cpu intensive work, and when it completes the work struct is marked as done. Every time a work struct completes, the list is checked to see if the head is marked as done. If so, the ordered_func callback is used to do the order sensitive processing and the ordered_free callback is used to do any cleanup. Then we loop back and check the head of the list again. This patch also changes the checksumming code to use the ordered workqueues. On a 4 drive array, it increases streaming writes from 280MB/s to 350MB/s. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
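A minimal sketch of the ordered completion check described above; the struct layout and function are illustrative, only the three callback roles (func, ordered_func, ordered_free) come from the commit message.

```c
#include <linux/list.h>
#include <linux/spinlock.h>

/* Illustrative work struct -- not the actual btrfs_work definition. */
struct example_work {
	struct list_head order_list;                   /* tail-appended on queue   */
	int done;                                      /* set when ->func finishes */
	void (*func)(struct example_work *w);          /* cpu intensive work       */
	void (*ordered_func)(struct example_work *w);  /* order sensitive work     */
	void (*ordered_free)(struct example_work *w);  /* free, processing done    */
};

/* Run after any work's ->func completes: drain finished work from the head
 * of the list, in order, calling the ordered callbacks. */
static void example_run_ordered(struct list_head *order_list, spinlock_t *lock)
{
	struct example_work *w;

	spin_lock(lock);
	while (!list_empty(order_list)) {
		w = list_entry(order_list->next, struct example_work, order_list);
		if (!w->done)
			break;                  /* head not finished yet: stop   */
		list_del(&w->order_list);
		spin_unlock(lock);

		w->ordered_func(w);             /* order sensitive processing    */
		w->ordered_free(w);             /* cleanup; w is gone after this */

		spin_lock(lock);
	}
	spin_unlock(lock);
}
```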
-
- 29 Oct, 2008 1 commit
-
-
Chris Mason authored
This is a large change for adding compression on reading and writing, both for inline and regular extents. It does some fairly large surgery to the writeback paths. Compression is off by default and enabled by mount -o compress. Even when the -o compress mount option is not used, it is possible to read compressed extents off the disk. If compression for a given set of pages fails to make them smaller, the file is flagged to avoid future compression attempts.
* While finding delalloc extents, the pages are locked before being sent down to the delalloc handler. This allows the delalloc handler to do complex things such as cleaning the pages, marking them writeback and starting IO on their behalf.
* Inline extents are inserted at delalloc time now. This allows us to compress the data before inserting the inline extent, and it allows us to insert an inline extent that spans multiple pages.
* All of the in-memory extent representations (extent_map.c, ordered-d...
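A hedged sketch of the "stop compressing this file" decision mentioned above; the helper and its arguments are hypothetical, and BTRFS_INODE_NOCOMPRESS is assumed to be the flag used for this purpose.

```c
/* Illustrative only: flag an inode when compression did not help. */
static void example_note_compress_result(struct btrfs_inode *inode,
					 unsigned long total_in,
					 unsigned long total_compressed)
{
	/* If the compressed result is not smaller than the input, mark the
	 * file so future writeback skips the compression attempt. */
	if (total_compressed >= total_in)
		inode->flags |= BTRFS_INODE_NOCOMPRESS;
}
```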
-
- 25 Sep, 2008 23 commits
-
-
Chris Mason authored
* Pin down data blocks to prevent them from being reallocated like so: trans 1: allocate file extent; trans 2: free file extent; trans 3: free file extent during old snapshot deletion; trans 3: allocate file extent to new file; trans 3: fsync new file. Before the tree logging code, this was legal because the fsync would commit the transaction that did the final data extent free and the transaction that allocated the extent to the new file at the same time. With the tree logging code, the tree log subtransaction can commit before the transaction that freed the extent. If we crash, we're left with two different files using the extent.
* Don't wait in start_transaction if log replay is going on. This avoids deadlocks from iput while we're cleaning up link counts in the replay code.
* Don't deadlock in replay_one_name by trying to read an inode off the disk while holding paths for the directory.
* Hold the buffer lock while we mark a buffer as written. This closes a race where someone is changing a buffer while we write it. They are supposed to mark it dirty again after they change it, but this violates the cow rules. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
File syncs and directory syncs are optimized by copying their items into a special (copy-on-write) log tree. There is one log tree per subvolume and the btrfs super block points to a tree of log tree roots. After a crash, items are copied out of the log tree and back into the subvolume. See tree-log.c for all the details. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Before, the btrfs bdi congestion function was used to test for too many async bios. This keeps that check to throttle pdflush, but also adds a check while queuing bios. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
After writing out all the remaining btree blocks in the transaction, the commit code would use filemap_fdatawait to make sure it was all on disk. This means it would wait for blocks written by other procs as well. The new code walks the list of blocks for this transaction again and waits only for those required by this transaction. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
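A minimal sketch of the difference: rather than filemap_fdatawait over the whole btree mapping, wait only on the pages this transaction wrote. The page list here is illustrative; the real code tracks its dirty blocks differently.

```c
#include <linux/pagemap.h>

/* Illustrative: wait only on pages belonging to this transaction. */
static void example_wait_transaction_pages(struct page **pages, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		wait_on_page_writeback(pages[i]);
}
```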
-
Chris Mason authored
The btree defragger wasn't making forward progress because the new key wasn't being saved by the btrfs_search_forward function. This also disables the automatic btree defrag; it wasn't scaling well to huge filesystems. The auto-defrag needs to be done differently. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
The existing throttle mechanism was often not sufficient to prevent new writers from coming in and making a given transaction run forever. This adds an explicit wait at the end of most operations so they will allow the current transaction to close. There is no wait inside file_write, inode updates, or cow filling, all of which have different deadlock possibilities. This is a temporary measure until better asynchronous commit support is added. This code leads to stalls as it waits for data=ordered writeback, and it really needs to be fixed. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
When duplicate copies exist, writes are allowed to fail to one of those copies. This changeset includes a few changes that allow the FS to continue even when some IOs fail. It also adds verification of the parent generation number for btree blocks. This generation is stored in the pointer to a block, and it ensures that missed writes are detected. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
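A hedged sketch of the parent-generation check described above, using the standard btree accessors; the wrapper function itself is illustrative.

```c
/* Sketch: verify a child block against the generation stored in its
 * parent's pointer, so a missed write shows up as a stale generation. */
static int example_check_child_generation(struct extent_buffer *parent, int slot,
					  struct extent_buffer *child)
{
	u64 expected = btrfs_node_ptr_generation(parent, slot);

	if (btrfs_header_generation(child) != expected)
		return -EIO;    /* treat it like any other failed read */
	return 0;
}
```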
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
This allows checksumming to happen in parallel among many cpus, and keeps us from bogging down pdflush with the checksumming code. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Before, this was done by the bio end_io routine; the work queue code is able to scale much better with faster IO subsystems. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Checksums were only verified by btrfs_read_tree_block, which meant the functions to probe the page cache for blocks were not validating checksums. Normally this is fine because the buffers will only be in cache if they have already been validated. But, there is a window while the buffer is being read from disk where it could be up to date in the cache but not yet verified. This patch makes sure all buffers go through checksum verification before they are used. This is safer, and it prevents modification of buffers before they go through the csum code. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
This forces file data extents down the disk along with the metadata that references them. The current implementation is fairly simple, and just writes out all of the dirty pages in an inode before the commit. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
btrfs_btree_balance_dirty is changed to pass the number of pages dirtied for more accurate dirty throttling. This lets the VM make better decisions about when to force some writeback. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
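A minimal sketch of passing an explicit page count into the VM's dirty throttling, assuming the balance_dirty_pages_ratelimited_nr() helper available in kernels of this era; the wrapper name is illustrative.

```c
#include <linux/writeback.h>

/* Illustrative: throttle against the btree inode's mapping using the
 * number of metadata pages this caller actually dirtied. */
static void example_btree_balance_dirty(struct btrfs_root *root,
					unsigned long nr_dirtied)
{
	struct address_space *mapping = root->fs_info->btree_inode->i_mapping;

	balance_dirty_pages_ratelimited_nr(mapping, nr_dirtied);
}
```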
-
- 10 Sep, 2007 1 commit
-
-
Chris Mason authored
defrag. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 29 Aug, 2007 1 commit
-
-
Josef Bacik authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 27 Aug, 2007 1 commit
-
-
Chris Mason authored
instead of buffer heads. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 10 Aug, 2007 1 commit
-
-
Chris Mason authored
This allows the tree walking code to defrag only the newly allocated buffers; it seems to be a good balance between perfect defragging and the performance hit of repeatedly reallocating blocks. Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 28 Jun, 2007 1 commit
-
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 22 Jun, 2007 1 commit
-
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 12 Jun, 2007 1 commit
-
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 02 May, 2007 1 commit
-
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 01 May, 2007 1 commit
-
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-
- 12 Apr, 2007 1 commit
-
-
Chris Mason authored
Signed-off-by:
Chris Mason <chris.mason@oracle.com>
-