1. 23 Jun, 2006 3 commits
    • [PATCH] add page_mkwrite() vm_operations method · 9637a5ef
      David Howells authored
      
      Add a new VMA operation to notify a filesystem or other driver about the
      MMU generating a fault because userspace attempted to write to a page
      mapped through a read-only PTE.
      
      This facility permits the filesystem or driver to:
      
       (*) Implement storage allocation/reservation on attempted write, and so to
           deal with problems such as ENOSPC more gracefully (perhaps by generating
           SIGBUS).
      
       (*) Delay making the page writable until the contents have been written to a
           backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
           It permits the filesystem to have some guarantee about the state of the
           cache.
      
       (*) Account and limit number of dirty pages. This is one piece of the puzzle
           needed to make shared writable mapping work safely in FUSE.
      
      Needed by cachefs (Or is it cachefiles?  Or fscache? <head spins>).
      
      At least four other groups have stated an interest in it or a desire to use
      the functionality it provides: FUSE, OCFS2, NTFS and JFFS2.  Also, things like
      EXT3 really ought to use it to deal with the case of shared-writable mmap
      encountering ENOSPC before we permit the page to be dirtied.
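
The control flow described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `model_page`, `model_vm_ops`, `model_write_fault` and `reserve_block` are hypothetical names, not the kernel's real structures or signatures. The hook is consulted on a write fault against a read-only mapping and may refuse the write (for example with ENOSPC) before the page is dirtied.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are real kernel API. */
struct model_page {
    bool writable;   /* stands in for a writable PTE */
    bool dirty;
};

struct model_vm_ops {
    /* Returns 0 to allow the write, or a negative errno to refuse it. */
    int (*page_mkwrite)(struct model_page *page, void *fs_private);
};

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Model of a write fault on a page mapped through a read-only PTE. */
static enum fault_result model_write_fault(const struct model_vm_ops *ops,
                                           struct model_page *page,
                                           void *fs_private)
{
    if (ops && ops->page_mkwrite) {
        /* Give the filesystem a chance to allocate or reserve storage. */
        if (ops->page_mkwrite(page, fs_private) < 0)
            return FAULT_SIGBUS;   /* refused before the page is dirtied */
    }
    page->writable = true;
    page->dirty = true;
    return FAULT_OK;
}

/* Example hook: reserve one block per page, fail when space runs out. */
static int reserve_block(struct model_page *page, void *fs_private)
{
    size_t *free_blocks = fs_private;

    (void)page;
    if (*free_blocks == 0)
        return -ENOSPC;
    (*free_blocks)--;
    return 0;
}
```

With one free block, the first faulting page becomes writable and the second fault is refused while its page stays clean, which is the "fail before dirtying" behaviour the message asks for.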
      
      From: Peter Zijlstra <a.p.zijlstra@chello.nl>
      
        get_user_pages(.write=1, .force=1) can generate COW hits on read-only
        shared mappings; this patch traps those as page_mkwrite candidates and fails
        to handle them the old way.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Joel Becker <Joel.Becker@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Christoph Lameter authored
      
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  Migration entries can
      only be encountered when the page they are pointing to is locked.  This limits
      the number of places one has to fix.  We also check in copy_pte_range and in
      mprotect_pte_range() for migration ptes.

      We check for migration ptes in do_swap_page and call a function that will
      then wait on the page lock.  This allows us to effectively stop all accesses
      to a page.

      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
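
The bug described above comes down to passing a non-zero type constant where a boolean is expected. A minimal model, with assumptions labeled: the value 30 for SWP_MIGRATION_READ comes from the message, 31 for SWP_MIGRATION_WRITE is assumed, and `model_entry`, `protect_buggy` and `protect_fixed` are hypothetical stand-ins rather than the kernel's actual code (the real helper takes a page, not an offset).

```c
#include <stdbool.h>

/* 30 is stated in the commit message; 31 is an assumption for this model. */
#define SWP_MIGRATION_READ  30
#define SWP_MIGRATION_WRITE 31

/* Hypothetical stand-in for a swap-style migration entry. */
struct model_entry {
    unsigned type;
    unsigned long offset;
};

/* The second argument is a boolean "write", as the message points out. */
static struct model_entry make_migration_entry(unsigned long offset, int write)
{
    struct model_entry e;

    e.type = write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ;
    e.offset = offset;
    return e;
}

static bool is_write_migration_entry(struct model_entry e)
{
    return e.type == SWP_MIGRATION_WRITE;
}

/* Buggy: 30 is non-zero, so the "read" constant means "write" here. */
static struct model_entry protect_buggy(struct model_entry e)
{
    return make_migration_entry(e.offset, SWP_MIGRATION_READ);
}

/* Fixed: test for a write entry, then downgrade it with write == 0. */
static struct model_entry protect_fixed(struct model_entry e)
{
    if (is_write_migration_entry(e))
        return make_migration_entry(e.offset, 0);
    return e;
}
```

In this model the buggy path leaves a write entry writable, while the fixed path downgrades it and keeps the offset intact.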
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
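
Why the range check matters can be shown with a little arithmetic. The sketch below assumes MAX_SWAPFILES_SHIFT is 5 and that the top two swap types are reserved for migration entries; both helper functions are hypothetical and are not the code that was in sys_swapon.

```c
#include <stdbool.h>

/* Assumed for illustration: 5 bits of swap type, top two types reserved. */
#define MAX_SWAPFILES_SHIFT 5
#define MAX_SWAPFILES ((1u << MAX_SWAPFILES_SHIFT) - 2)

/* A check derived only from the bit width still admits the reserved types. */
static bool type_ok_by_shift(unsigned type)
{
    return type < (1u << MAX_SWAPFILES_SHIFT);
}

/* A direct check against MAX_SWAPFILES rejects them. */
static bool type_ok_by_count(unsigned type)
{
    return type < MAX_SWAPFILES;
}
```

Under these assumptions MAX_SWAPFILES is 30, so types 30 and 31 pass the width-based check but would index past an array sized by MAX_SWAPFILES.
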
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Page Migration: Make do_swap_page redo the fault · 4da5eda0
      Christoph Lameter authored
      
      It is better to redo the complete fault if do_swap_page() finds that the
      page is not in PageSwapCache() because the page migration code may have
      replaced the swap pte already with a pte pointing to valid memory.
      
      do_swap_page() may interpret an invalid swap entry without this patch
      because we do not reload the pte if we are looping back.  The page
      migration code may already have reused the swap entry referenced by our
      local swp_entry.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 31 Mar, 2006 1 commit
    • [PATCH] Don't pass boot parameters to argv_init[] · 9b41046c
      OGAWA Hirofumi authored
      
      The boot cmdline is parsed in parse_early_param() and
      parse_args(,unknown_bootoption).
      
      And __setup() is used in obsolete_checksetup().
      
      	start_kernel()
      		-> parse_args()
      			-> unknown_bootoption()
      				-> obsolete_checksetup()
      
      If __setup()'s callback (->setup_func()) returns 1 in
      obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
      handled.
      
      If ->setup_func() returns 0, obsolete_checksetup() tries the other
      ->setup_func() callbacks.  If every ->setup_func() that matched a parameter
      returns 0, the parameter is added to argv_init[].

      Then, when running /sbin/init or init=app, argv_init[] is passed to the app.
      If the app doesn't ignore those arguments, it will print a warning and exit.

      This patch fixes wrong usage of the return value, but only the obvious cases.
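
The return-value convention described above can be modelled in a few lines. This is a simplified sketch, not the kernel's implementation: `model_setup`, `model_checksetup`, `model_unknown_bootoption` and the two sample handlers are hypothetical names, and the real functions do more work than shown here.

```c
#include <string.h>

/* Hypothetical model of a __setup() table entry. */
struct model_setup {
    const char *str;
    int (*setup_func)(const char *arg);
};

/* Handler that follows the convention: 1 means "handled". */
static int handled_ok(const char *arg) { (void)arg; return 1; }

/* Handler that consumed the option but wrongly returns 0. */
static int handled_but_returns_0(const char *arg) { (void)arg; return 0; }

static const struct model_setup setup_table[] = {
    { "good=", handled_ok },
    { "bad=",  handled_but_returns_0 },
};

/* Returns 1 if some matching handler reported the parameter as handled. */
static int model_checksetup(const char *param)
{
    size_t i;

    for (i = 0; i < sizeof(setup_table) / sizeof(setup_table[0]); i++) {
        size_t n = strlen(setup_table[i].str);

        if (strncmp(param, setup_table[i].str, n) == 0 &&
            setup_table[i].setup_func(param + n))
            return 1;
    }
    return 0;
}

#define MODEL_MAX_INIT_ARGS 8
static const char *argv_init_model[MODEL_MAX_INIT_ARGS + 1];
static int argc_init_model;

/* Anything nobody claimed ends up in the argument list passed to init. */
static void model_unknown_bootoption(const char *param)
{
    if (model_checksetup(param))
        return;
    if (argc_init_model < MODEL_MAX_INIT_ARGS)
        argv_init_model[argc_init_model++] = param;
}
```

Feeding "good=1" and "bad=1" through the model leaves only "bad=1" in the init argument list, which is the leak the patch addresses by making such handlers return 1.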
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 26 Mar, 2006 2 commits
  4. 25 Mar, 2006 1 commit
  5. 22 Mar, 2006 4 commits
    • [PATCH] hugepage: Fix hugepage logic in free_pgtables() harder · 4866920b
      David Gibson authored
      
      Turns out the hugepage logic in free_pgtables() was doubly broken.  The
      loop coalescing multiple normal page VMAs into one call to free_pgd_range()
      had an off by one error, which could mean it would coalesce one hugepage
      VMA into the same bundle (checking 'vma' not 'next' in the loop).  I
      transferred this bug into the new is_vm_hugetlb_page() based version.
      Here's the fix.
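
The off-by-one can be illustrated with a reconstruction of the coalescing loop from the description alone. This is not the actual free_pgtables() source: `model_vma`, `MODEL_PMD_SIZE` and `coalesce` are hypothetical, and the `check_next` flag exists only to contrast the two variants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the 2 MiB PMD size is an assumption. */
#define MODEL_PMD_SIZE (2UL << 20)

struct model_vma {
    unsigned long start, end;   /* [start, end) */
    bool hugetlb;               /* stands in for is_vm_hugetlb_page() */
    struct model_vma *next;
};

/*
 * Bundle following VMAs into one range while they are close enough.
 * check_next == false models the bug: the hugepage test looks at the
 * current 'vma' instead of the candidate 'next'.
 * Returns the last VMA included in the bundle.
 */
static struct model_vma *coalesce(struct model_vma *vma, bool check_next)
{
    struct model_vma *next = vma->next;

    while (next && next->start <= vma->end + MODEL_PMD_SIZE &&
           !(check_next ? next->hugetlb : vma->hugetlb)) {
        vma = next;
        next = vma->next;
    }
    return vma;
}
```

For a normal VMA followed closely by a hugepage VMA, the buggy variant pulls the hugepage VMA into the bundle, while testing `next` stops before it.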
      
      This one didn't bite on powerpc previously for the same reason the
      is_hugepage_only_range() problem didn't: powerpc's hugetlb_free_pgd_range()
      is identical to free_pgd_range().  It didn't bite on ia64 because the
      hugepage region is distant enough from any other region that the separated
      PMD_SIZE distance test would always prevent coalescing the two together.
      
      No libhugetlbfs testsuite regressions (ppc64, POWER5).
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hugepage: Fix hugepage logic in free_pgtables() · 9da61aef
      David Gibson authored
      
      free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
      of the normal free_pgd_range() on hugepage VMAs.  However, the test it uses
      to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
      range at the start of the vma.  is_hugepage_only_range() will return true
      if the given range has any intersection with a hugepage address region, and
      in this case the given region need not be hugepage aligned.  So, for
      example, this test can return true if called on, say, a 4k VMA immediately
      preceding a (nicely aligned) hugepage VMA.
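
The false positive in that example can be checked with a small interval model. Everything here is an assumption for illustration: the 16 MiB huge page size, the region bounds, and the names `model_vma`, `model_hugepage_only_range`, `old_test` and `new_test`.

```c
#include <stdbool.h>

/* Hypothetical model; sizes and addresses are assumptions. */
#define MODEL_HPAGE_SIZE (16UL << 20)

struct model_vma {
    unsigned long start, end;   /* [start, end) */
    bool is_hugetlb;            /* stands in for is_vm_hugetlb_page() */
};

/* The single hugepage address region in this model: [start, end). */
static const unsigned long hregion_start = 0x10000000UL;
static const unsigned long hregion_end   = 0x20000000UL;

/* True if [addr, addr + len) intersects the hugepage region at all. */
static bool model_hugepage_only_range(unsigned long addr, unsigned long len)
{
    return addr < hregion_end && addr + len > hregion_start;
}

/* Old test: probe a hugepage-sized range at the start of the VMA. */
static bool old_test(const struct model_vma *vma)
{
    return model_hugepage_only_range(vma->start, MODEL_HPAGE_SIZE);
}

/* New test: ask the VMA itself whether it is a hugepage VMA. */
static bool new_test(const struct model_vma *vma)
{
    return vma->is_hugetlb;
}
```

A 4k VMA ending exactly where the hugepage region begins makes `old_test` return true, because the probed range extends a full huge page past the VMA's start; `new_test` returns false for it.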
      
      At present we get away with this because the powerpc version of
      hugetlb_free_pgd_range() is just a call to free_pgd_range().  On ia64 (the
      only other arch with a non-trivial is_hugepage_only_range()) we get away
      with it for a different reason; the hugepage area is not contiguous with
      the rest of the user address space, and VMAs are not permitted in between,
      so the test can't return a false positive there.
      
      Nonetheless this should be fixed.  We do that in the patch below by
      replacing the is_hugepage_only_range() test with an explicit test of the
      VMA using is_vm_hugetlb_page().
      
      This in turn changes behaviour for platforms where is_hugepage_only_range()
      returns false always (everything except powerpc and ia64).  We address this
      by ensuring that hugetlb_free_pgd_range() is defined to be identical to
      free_pgd_range() (instead of a no-op) on everything except ia64.  Even so,
      it will prevent some otherwise possible coalescing of calls down to
      free_pgd_range().  Since this only happens for hugepage VMAs, removing this
      small optimization seems unlikely to cause any trouble.
      
      This patch causes no regressions on the libhugetlbfs testsuite - ppc64
      POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: more CONFIG_DEBUG_VM · b7ab795b
      Nick Piggin authored
      
      Put a few more checks under CONFIG_DEBUG_VM
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: split highorder pages · 8dfcc9ba
      Nick Piggin authored
      
      Have an explicit mm call to split higher order pages into individual pages.
      This should help to avoid bugs and be more explicit about the code's intention.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 17 Mar, 2006 1 commit
    • [PATCH] fix free swap cache latency · 6f5e6b9e
      Hugh Dickins authored
      
      Lee Revell reported 28ms latency when a process with lots of swapped memory
      exits.
      
      2.6.15 introduced a latency regression when unmapping: in accounting the
      zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
      swap entry counted nothing at all.  We think of pages present as the slow
      case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
      can make a lot of work - and we could have been doing it many thousands of
      times without a latency break.
      
      Move the zap_work update up to account swap entries like pages present.
      This does account non-linear pte_file entries, and unmap_mapping_range
      skipping over swap entries, by the same amount even though they're quick:
      but neither of those cases deserves complicating the code (and they're
      treated no worse than they were in 2.6.14).
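
The accounting change can be sketched as a budget loop. This is a simplified model of the idea only: `model_pte`, `MODEL_PAGE_SIZE` and `entries_before_break` are hypothetical, and the real zap_pte_range() does far more per entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three pte kinds the message distinguishes. */
enum model_pte { PTE_NONE, PTE_PRESENT, PTE_SWAP };

#define MODEL_PAGE_SIZE 4096L   /* assumed page size */

/*
 * Walk ptes until the zap_work budget is used up and report how many
 * entries were processed before the latency break.
 * charge_swap == false models 2.6.15, where swap entries cost nothing.
 */
static size_t entries_before_break(const enum model_pte *ptes, size_t n,
                                   long zap_work, bool charge_swap)
{
    size_t i;

    for (i = 0; i < n && zap_work > 0; i++) {
        if (ptes[i] == PTE_NONE)
            zap_work -= 1;                 /* cheap: counts 1 */
        else if (ptes[i] == PTE_PRESENT || charge_swap)
            zap_work -= MODEL_PAGE_SIZE;   /* expensive: counts PAGE_SIZE */
        /* else: swap entry charged nothing, so no break is ever reached */
    }
    return i;
}
```

With a budget of eight pages and a run of 1000 swap entries, the uncharged variant walks all 1000 entries without a break, while the charged variant stops after eight.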
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. 17 Feb, 2006 1 commit
  8. 01 Feb, 2006 1 commit
  9. 09 Jan, 2006 2 commits
  10. 06 Jan, 2006 3 commits
    • [PATCH] mm: pfault optimisation · 41e9b63b
      Nick Piggin authored
      
      This atomic operation is superfluous: the pte will be added with the
      referenced bit set, and the page will be referenced through this mapping after
      the page fault handler returns anyway.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: rmap optimisation · 9617d95e
      Nick Piggin authored
      
      Optimise rmap functions by minimising atomic operations when we know there
      will be no concurrent modifications.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store · f6b3ec23
      Badari Pulavarty authored
      
      Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
      given range of pages & its associated backing store.  Current
      implementation supports only shmfs/tmpfs and other filesystems return
      -ENOSYS.
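
From userspace, the call looks roughly like the sketch below. It assumes a Linux system with glibc, and it uses a shared anonymous mapping (which is shmem-backed) rather than an explicit tmpfs file; `remove_and_probe` is a hypothetical helper name. Kernels or sandboxes that do not support the operation are reported through the -1 return value instead of being treated as a failure of the example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fill one shared page, punch it out with MADV_REMOVE, then read it back.
 * Returns 1 if the page reads as zero afterwards, 0 if the old data is
 * still visible, and -1 if mmap() or madvise() failed (for example when
 * the running kernel does not support MADV_REMOVE for this mapping).
 */
static int remove_and_probe(void)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    size_t len;
    int result;

    if (page <= 0)
        return -1;
    len = (size_t)page;

    /* The range must be a shared, writable mapping. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);

    if (madvise(p, len, MADV_REMOVE) != 0)
        result = -1;
    else
        result = (p[0] == 0 && p[len - 1] == 0) ? 1 : 0;

    munmap(p, len);
    return result;
}
```

On a kernel that supports the operation, the range and its backing store are freed and later reads of the range return zeroes, so the helper is expected to return 1.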
      
      "Some app allocates large tmpfs files, then when some task quits and some
      client disconnect, some memory can be released.  However the only way to
      release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli
      
      Databases want to use this feature to drop a section of their bufferpool
      (shared memory segments) - without writing back to disk/swap space.
      
      This feature is also useful for supporting hot-plug memory on UML.
      
      Concerns raised by Andrew Morton:
      
      - "We have no plan for holepunching!  If we _do_ have such a plan (or
        might in the future) then what would the API look like?  I think
        sys_holepunch(fd, start, len), so we should start out with that."
      
      - Using madvise is very weird, because people will ask "why do I need to
        mmap my file before I can stick a hole in it?"
      
      - None of the other madvise operations call into the filesystem in this
        manner.  A broad question is: is this capability an MM operation or a
        filesystem operation?  truncate, for example, is a filesystem operation
        which sometimes has MM side-effects.  madvise is an mm operation and with
        this patch, it gains FS side-effects, only they're really, really
        significant ones."
      
      Comments:
      
      - Andrea suggested the fs operation too but then it's more efficient to
        have it as a mm operation with fs side effects, because they don't
        immediately know fd and physical offset of the range.  It's possible to
        fixup in userland and to use the fs operation but it's more expensive,
        the vmas are already in the kernel and we can use them.
      
      Short term plan & Future Direction:
      
      - We seem to need this interface only for shmfs/tmpfs files in the short
        term.  We have to add hooks into the filesystem for correctness and
        completeness.  This is what this patch does.
      
      - In the future, plan is to support both fs and mmap apis also.  This
        also involves (other) filesystem specific functions to be implemented.
      
      - Current patch doesn't support VM_NONLINEAR - which can be addressed in
        the future.
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 16 Dec, 2005 1 commit
  12. 13 Dec, 2005 1 commit
    • get_user_pages: don't try to follow PFNMAP pages · 1ff80389
      Linus Torvalds authored
      
      Nick Piggin points out that a few drivers play games with VM_IO (why?
      who knows..) and thus a pfn-remapped area may not have that bit set even
      if remap_pfn_range() set it originally.
      
      So make it explicit in get_user_pages() that we don't follow VM_PFNMAP
      pages, since pretty much by definition they do not have a "struct page"
      associated with them.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 12 Dec, 2005 3 commits
  14. 04 Dec, 2005 1 commit
  15. 30 Nov, 2005 2 commits
  16. 29 Nov, 2005 7 commits
  17. 28 Nov, 2005 2 commits
    • [PATCH] Workaround for gcc 2.96 (undefined references) · e0f39591
      Alan Stern authored
      
        LD      .tmp_vmlinux1
      mm/built-in.o(.text+0x100d6): In function `copy_page_range':
      : undefined reference to `__pud_alloc'
      mm/built-in.o(.text+0x1010b): In function `copy_page_range':
      : undefined reference to `__pmd_alloc'
      mm/built-in.o(.text+0x11ef4): In function `__handle_mm_fault':
      : undefined reference to `__pud_alloc'
      fs/built-in.o(.text+0xc930): In function `install_arg_page':
      : undefined reference to `__pud_alloc'
      make: *** [.tmp_vmlinux1] Error 1
      
      Those missing references in mm/memory.c arise from this code in
      include/linux/mm.h, combined with the fact that __PGTABLE_PMD_FOLDED and
      __PGTABLE_PUD_FOLDED are both set and __ARCH_HAS_4LEVEL_HACK is not:
      
      /*
       * The following ifdef needed to get the 4level-fixup.h header to work.
       * Remove it when 4level-fixup.h has been removed.
       */
      #if defined(CONFIG_MMU) && !defined(__ARCH_HAS_4LEVEL_HACK)
      static inline pud_t *pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
      {
              return (unlikely(pgd_none(*pgd)) && __pud_alloc(mm, pgd, address))?
                      NULL: pud_offset(pgd, address);
      }
      
      static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
      {
              return (unlikely(pud_none(*pud)) && __pmd_alloc(mm, pud, address))?
                      NULL: pmd_offset(pud, address);
      }
      #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */
      
      With my configuration the pgd_none and pud_none routines are inlines
      returning a constant 0.  Apparently the old compiler avoids generating
      calls to __pud_alloc and __pmd_alloc but still lists them as undefined
      references in the module's symbol table.
      
      I don't know which change caused this problem.  I think it was added
      somewhere between 2.6.14 and 2.6.15-rc1, because I remember building
      several 2.6.14-rc kernels without difficulty.  However I can't point to an
      individual culprit.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • mm: re-architect the VM_UNPAGED logic · 6aab341e
      Linus Torvalds authored
      
      This replaces the (in my opinion horrible) VM_UNPAGED logic with very
      explicit support for a "remapped page range" aka VM_PFNMAP.  It allows a
      VM area to contain an arbitrary range of page table entries that the VM
      never touches, and never considers to be normal pages.
      
      Any user of "remap_pfn_range()" automatically gets this new
      functionality, and doesn't even have to mark the pages reserved or
      indeed mark them any other way.  It just works.  As a side effect, doing
      mmap() on /dev/mem works for arbitrary ranges.
      
      Sparc update from David in the next commit.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 22 Nov, 2005 4 commits
    • [PATCH] unpaged: ZERO_PAGE in VM_UNPAGED · f57e88a8
      Hugh Dickins authored
      
      It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas,
      let's not insert the ZERO_PAGE there - though whether it would matter will
      depend on what we decide about ZERO_PAGE refcounting.
      
      But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED
      area, do_no_page should never be: just BUG_ON.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] unpaged: anon in VM_UNPAGED · ee498ed7
      Hugh Dickins authored
      
      copy_one_pte needs to copy the anonymous COWed pages in a VM_UNPAGED area,
      zap_pte_range needs to free them, do_wp_page needs to COW them: just like
      ordinary pages, not like the unpaged.
      
      But recognizing them is a little subtle: because PageReserved is no longer a
      condition for remap_pfn_range, we can now mmap all of /dev/mem (whether the
      distro permits, and whether it's advisable on this or that architecture, is
      another matter).  So if we can see a PageAnon, it may not be ours to mess with
      (or may be ours from elsewhere in the address space).  I suspect there's an
      entertaining insoluble self-referential problem here, but the page_is_anon
      function does a good practical job, and MAP_PRIVATE PROT_WRITE VM_UNPAGED will
      always be an odd choice.
      
      In updating the comment on page_address_in_vma, noticed a potential NULL
      dereference, in a path we don't actually take, but fixed it.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] unpaged: COW on VM_UNPAGED · 920fc356
      Hugh Dickins authored
      
      Remove the BUG_ON(vma->vm_flags & VM_UNPAGED) from do_wp_page, and let it do
      Copy-On-Write without touching the VM_UNPAGED's page counts - but this is
      incomplete, because the anonymous page it inserts will itself need to be
      handled, here and in other functions - next patch.
      
      We still don't copy the page if the pfn is invalid, because the
      copy_user_highpage interface does not allow it.  But that's not been a problem
      in the past: can be added in later if the need arises.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] unpaged: VM_UNPAGED · 0b14c179
      Hugh Dickins authored
      
      Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
      drivers set VM_RESERVED on areas which are then populated by nopage.  The
      PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
      zap_pte_range, without changing those drivers not to set it: so their pages
      just leak away.
      
      Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
      to flag the special areas where the ptes may have no struct page, or if they
      have then it's not to be touched.  Replace most instances of VM_RESERVED in
      core mm by VM_UNPAGED.  Force it on in remap_pfn_range, and the sparc and
      sparc64 io_remap_pfn_range.
      
      Revert addition of VM_RESERVED to powerpc vdso, it's not needed there.  Is it
      needed anywhere?  It still governs the mm->reserved_vm statistic, and special
      vmas not to be merged, and areas not to be core dumped; but could probably be
      eliminated later (the drivers are probably specifying it because in 2.4 it
      kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
      don't get on).
      
      Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
      purpose whatsoever, and should be removed from drivers when we clean up.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: William Irwin <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>