Journaling FFS with WAPBL
Jörg Sonnenberger
 
   Overview 
  
    -  A short introduction to FFS 
-  WAPBL: Overview 
-  WAPBL: In-depth 
-  Performance 
-  Open issues 
-  Questions 
 
   A short introduction to FFS 
  
    -  Superblock 
-  Inodes 
-  Directories 
-  Cylinder groups 
-  Consistency requirements 
 
   The FFS superblock 
  
    -  Description of the filesystem 
-  Block size, fragment size, number of blocks, etc. 
-  Time of last mount and whether the filesystem was unmounted cleanly 
-  Summary of filesystem content 
-  Stored redundantly to protect against bad blocks etc. 
-  Different versions, some fields added, some removed 
-  dumpfs(8) tells the version (FFSv2 for WAPBL!) 
 
   Inodes 
  
    -  The file content, not the file name 
-  128 bytes for FFSv1, 256 bytes for FFSv2 
-  Link count, time stamps, size, flags, ownership, ... 
-  References to the first 12 blocks and indirect blocks for the rest 
-  Last block can be partially allocated: fragments 
-  Not all blocks have to be allocated: holes 
-  Inodes never end with holes 
-  Extended Attribute block for FFSv2 
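The block references above can be illustrated with a small sketch that maps a logical block number to the level of indirection needed to reach it. The 12 direct pointers come from the slide; the 16 KiB block size, the 8-byte block pointer (FFSv2) and the function name are assumptions chosen for illustration only.

```python
NDADDR = 12   # direct block pointers per inode (from the slide)
NIADDR = 3    # single, double and triple indirect blocks

def indirection_level(lbn, block_size=16384, ptr_size=8):
    """Return 0 for a direct block, 1..3 for the indirect level.

    block_size and ptr_size are illustrative assumptions, not values
    read from a real superblock.
    """
    if lbn < 0:
        raise ValueError("negative logical block number")
    if lbn < NDADDR:
        return 0
    pointers_per_block = block_size // ptr_size
    lbn -= NDADDR
    span = pointers_per_block          # blocks reachable at this level
    for level in range(1, NIADDR + 1):
        if lbn < span:
            return level
        lbn -= span
        span *= pointers_per_block
    raise ValueError("block number beyond triple indirect range")
```

With these assumed sizes one indirect block holds 2048 pointers, so logical blocks 12 to 2059 need a single indirect lookup.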
 
   Directories 
  
    -  Records of inode number, record len, file type, name 
-  Padded to block boundaries 
-  "." and ".." as special entries 
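A minimal sketch of how such records could be packed into one directory block. The field widths (32-bit inode, 16-bit record length, 8-bit type, 8-bit name length), little-endian byte order, 4-byte record alignment and the 512-byte directory block are assumptions for illustration; the helper names are hypothetical. The last record's length is stretched so that the records fill the block exactly.

```python
import struct

# inode number, record length, file type, name length (assumed layout)
HEADER = struct.Struct("<IHBB")

def min_reclen(raw_name):
    # header + name + NUL terminator, rounded up to a multiple of 4
    return (HEADER.size + len(raw_name) + 1 + 3) & ~3

def build_dir_block(entries, block_size=512):
    """Pack (inode, name, type) tuples into one padded directory block."""
    out = bytearray()
    for i, (ino, name, ftype) in enumerate(entries):
        raw = name.encode()
        reclen = min_reclen(raw)
        if i == len(entries) - 1:
            # the last record absorbs the padding up to the block boundary
            reclen = block_size - len(out)
        if reclen < min_reclen(raw):
            raise ValueError("entries do not fit into one block")
        out += HEADER.pack(ino, reclen, ftype, len(raw))
        out += raw.ljust(reclen - HEADER.size, b"\0")
    return bytes(out)

def parse_dir_block(block):
    """Walk the records by following each record length."""
    entries, off = [], 0
    while off < len(block):
        ino, reclen, ftype, namlen = HEADER.unpack_from(block, off)
        start = off + HEADER.size
        entries.append((ino, block[start:start + namlen].decode(), ftype))
        off += reclen
    return entries
```

A freshly created directory would then be `build_dir_block([(ino, ".", 4), (parent, "..", 4)])`, where the type value 4 stands for "directory" in this sketch.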
 
   Cylinder groups 
  
    -  Distribute files over disk, reducing fragmentation 
-  Contain fixed size inode lists 
-  Contain free space bitmaps 
-  Contain superblock copy 
 
   Consistency requirements 
  
    -  Superblocks have to stay in sync 
-  Cylinder groups need consistent summaries and bitmaps 
-  Inodes must be freed once link count reaches 0 
-  Inodes must have indirect blocks written before writing the pointer 
-  Inodes must be initialized before creating directory entries 
-  Inode reference count must be modified on link(2) and unlink(2) 
 
   Practical example: mkdir(2) 
  
    -  Allocate free inode 
-  Allocate block by marking it as used in the bitmap 
-  Write directory template with "." and ".." entries 
-  Increment reference count of parent directory 
-  Write inode to disk with allocated block referenced and ref count 2 
-  Write directory entry to parent directory 
-  Update statistics 
 
   WAPBL: Goals 
  
    -  Crash recovery without fsck 
-  Improve performance by reducing synchronisation 
-  Potentially reduce number of disk seeks by allowing aggregation 
-  Simpler and less error-prone than Soft Updates 
-  Trivial to use: mount -o log ... 
 
   WAPBL: Components 
  
    -  The generic WAPBL backend 
-  Integration into FFS 
 
   Overview: The WAPBL backend 
  
    -  Journal writing and replaying 
-  Journal records:
      
        -  Block entry 
-  Revocation of earlier journaled blocks 
-  List of unreferenced allocated inodes 
 
-  bwrite / bdwrite register the buffer and defer writing 
 
   In-depth: Journal layout 
  
    -  Circular buffer of records 
-  Header block at the start and the end of the log area 
-  Headers are written alternately with a generation counter 
-  Newer header determines newest valid and oldest active record 
-  Explicit disk synchronisation after all writes 
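The header selection described above can be sketched as follows. The `Header` fields, the `valid` flag (standing in for "this header could be read and parsed") and both function names are hypothetical; only the idea of two headers distinguished by a generation counter comes from the slide.

```python
from dataclasses import dataclass

@dataclass
class Header:
    generation: int     # incremented on every header write
    head: int           # offset just past the newest valid record
    tail: int           # offset of the oldest active record
    valid: bool = True  # assumption: header was read back intact

def pick_header(first, second):
    """Return the usable header with the highest generation, or None."""
    candidates = [h for h in (first, second) if h is not None and h.valid]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.generation)

def active_bytes(header, log_size):
    """Bytes between tail and head in the circular log.

    This sketch treats head == tail as an empty log.
    """
    return (header.head - header.tail) % log_size
```

Because only one header is rewritten per commit, a crash in the middle of a header write still leaves the older header as a consistent starting point.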
 
   In-depth: Journal layout (II) 
  
    -  Block entries: to be written to given location after crash 
-  Block revocation: when changing from metadata to data block 
-  Unreferenced allocated inode:
      
        -  During initialisation: mode = 0 
-  Unlinked, but still open: mode != 0 
 
 
   In-depth: Journal replay 
  
    -  Process all journal entries in order:
      
        -  Block entries: add to hash table 
-  Revocation entries: remove entries from hash table again 
-  Unreferenced inodes: keep last entry 
 
-  If not mounting read-only, write all blocks back to disk 
-  Call filesystem backend for unreferenced inodes 
-  Shared code between kernel and fsck 
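The replay steps above can be sketched as follows. This is an illustrative model, not the kernel code: the record encoding as `(kind, payload)` tuples, the dictionary standing in for the disk and the function names are all assumptions.

```python
def replay(records):
    """Process journal records in order.

    Returns (blocks, unreferenced): the blocks still to be written,
    keyed by disk address, and the last unreferenced inode list seen.
    """
    blocks = {}          # the "hash table": disk address -> block data
    unreferenced = []
    for kind, payload in records:
        if kind == "block":
            address, data = payload
            blocks[address] = data       # a later entry overrides an earlier one
        elif kind == "revoke":
            for address in payload:
                blocks.pop(address, None)
        elif kind == "inodes":
            unreferenced = list(payload)  # keep only the last entry
        else:
            raise ValueError("unknown record kind: %r" % (kind,))
    return blocks, unreferenced

def write_back(blocks, disk, read_only=False):
    """Copy replayed blocks to their home location unless read-only."""
    if not read_only:
        disk.update(blocks)
```

The unreferenced inode list returned here is what would be handed to the filesystem backend: entries with mode 0 were still being initialised, entries with a non-zero mode were unlinked but still open.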
 
   Overview: FFS integration 
  
    -  Journal location in superblock 
-  Registration of inode allocation and freeing 
-  Registration after freeing metadata blocks 
-  Annotate transaction borders 
-  Allocation of journal 
-  Journal replay on mount 
 
   Journal location 
  
    -  End of partition:
      
        -  Size limited only by disk space 
-  Disk address, size and block size stored in superblock 
 
-  In-filesystem:
      
        -  Limited to size of cylinder group 
-  Address, size, block size and inode number in superblock 
 
-  On mount, journal is created on-demand:
      
        -  At the end, if enough free space (1MB journal per 1GB size) 
-  Inside the filesystem (up to 64MB, at least 1MB) 
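The on-demand placement can be sketched as a small decision function. The 1MB-per-1GB ratio and the 1MB/64MB bounds come from the slide; applying the same ratio and the 1MB minimum to both placements, as well as the function and parameter names, are assumptions of this sketch.

```python
MB = 1 << 20
GB = 1 << 30

def plan_journal(fs_bytes, free_after_fs_bytes):
    """Return (placement, journal size in bytes)."""
    # 1MB of journal per 1GB of filesystem, at least 1MB (assumed rounding)
    wanted = max(MB, fs_bytes * MB // GB)
    if free_after_fs_bytes >= wanted:
        return ("end-of-partition", wanted)
    # in-filesystem journals are capped at 64MB
    return ("in-filesystem", min(wanted, 64 * MB))
```

For example, a 100GB filesystem with 200MB of unused space behind it would get a 100MB journal at the end of the partition, while a 500GB filesystem without spare space would fall back to a 64MB in-filesystem journal.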
 
 
   In-depth: mkdir(2) 
  
    -  -> sys_mkdir 
-  -> ufs_mkdir 
-  Allocate and register new inode: 
 ffs_valloc: UFS_WAPBL_BEGIN + ffs_nodealloccg + UFS_WAPBL_END
-  UFS_WAPBL_BEGIN 
-  UFS_UPDATE -> unregister inode again 
-  (write template) 
-  UFS_WAPBL_END 
 
   In-depth: mkdir(2) journal record 
  
    -  First transaction:
      
        -  Cylinder group updates (Block entry) 
-  Inode update (Block entry) 
-  Unreferenced inode list 
 
-  Second transaction:
      
        -  Inode update (Block entry) 
-  Inode update for parent (Block entry) 
-  Directory content (Block entry) 
-  Unreferenced inode list 
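One possible reading of the two transactions as data, assuming the new inode is registered as unreferenced (mode 0) in the first transaction and unregistered again in the second. The record tuples and the function name are hypothetical and only mirror the bullet lists above.

```python
def mkdir_journal_records(new_ino, parent_ino):
    """Return the journal records of both mkdir transactions."""
    unreferenced = {}   # inode number -> mode
    transactions = []

    # First transaction: allocate and register the new inode.
    unreferenced[new_ino] = 0   # mode 0: still being initialised
    transactions.append([
        ("block", "cylinder group"),
        ("block", "inode %d" % new_ino),
        ("inodes", dict(unreferenced)),
    ])

    # Second transaction: the inode becomes reachable via the parent,
    # so it is unregistered again.
    del unreferenced[new_ino]
    transactions.append([
        ("block", "inode %d" % new_ino),
        ("block", "inode %d" % parent_ino),
        ("block", "directory content"),
        ("inodes", dict(unreferenced)),
    ])
    return transactions
```

If the system crashes between the two transactions, replay sees the inode with mode 0 in the last unreferenced list and can hand it back to the filesystem to be freed.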
 
 
   In-depth: ffs_write 
  
    -  Can be called from inside the filesystem code or from sys_write/vn_write 
-  UFS_WAPBL_BEGIN if not already inside a transaction 
-  -> VOP_PUTPAGES 
-  UFS_WAPBL_END if started earlier 
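The "begin only if not already inside a transaction" pattern can be sketched with a context manager. The `Journal` class, its `log` list and the function names are hypothetical stand-ins; the real code uses the UFS_WAPBL_BEGIN and UFS_WAPBL_END macros named above.

```python
from contextlib import contextmanager

class Journal:
    """Toy journal that only records transaction borders."""
    def __init__(self):
        self.active = False
        self.log = []

    def begin(self):
        self.active = True
        self.log.append("begin")

    def end(self):
        self.active = False
        self.log.append("end")

@contextmanager
def transaction_if_needed(journal):
    started = not journal.active
    if started:
        journal.begin()
    try:
        yield started
    finally:
        if started:          # only end what this caller started
            journal.end()

def ffs_write_sketch(journal):
    with transaction_if_needed(journal):
        journal.log.append("putpages")

def caller_inside_transaction(journal):
    # e.g. filesystem code that already holds a transaction
    with transaction_if_needed(journal):
        ffs_write_sketch(journal)
```

Called directly, the write opens and closes its own transaction; called from code that is already inside one, it joins the existing transaction and leaves the border annotations to the outer caller.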
 
   Performance: test system 
  
    -  HP ProLiant ML110 
-  Xeon 3040 @1.86GHz 
-  2GB memory 
-  Test on dedicated SATA disk, write caching enabled 
-  OpenSuSE 11.1 and NetBSD 5.0 
 
   Performance (I): 10x pkgsrc.tar.bz2 
    
 
 
   Performance (II): build.sh release 
    
 
 
   Open issues 
  
    -  No checksum of journal entries 
-  Too much data flushing 
-  Too much serialisation of writes 
-  Holding the journal locked over UBC operations 
-  No data ordering 
-  Support for external journal