The internal design
[Hints and documentation for developers]

Terms used in this document

Entry

In subversion speak an entry is either a directory, a symlink or a file; In FSVS it can additionally be a block or character device.
Sockets and pipes are currently ignored, as they're typically re-created by the various applications.

WAA, CONF

Please see waa_file.

In-memory layout

In memory fsvs builds a complete tree of the needed entries (struct estat). They are referenced with the parent pointers upwards to the root, and the estat::by_inode and estat::by_name downwards.

Storage and allocation

Every directory entry can have a string space allocated, ie. space needed for the filenames in this directory (and possibly sub-directories, too.) On loading of the list in waa__input_tree() two memory ranges are allocated - one for the struct estats read, and one for the filenames.

Because of this free()ing of part of the entries is not possible; a freelist for the struct estats is used, but the string space is more or less permanent.

Algorithms and assumption in the code

Generally I tried to use fast and simple algorithms better than O(n); but it's possible that I forgot something.

Searching for an entry

Searching for an entry in a directory (in memory) is O(log2(n)), as I use bsearch().

Writing the entry list

Determining the correct order for writing the entries (in waa__output_tree()) is optimized by having all lists pre-sorted; about half the time (tested) a single compare is enough to determine the next written entry.

Note:
Of course, to keep the lists sorted, a lot of comparisons have to be made before waa__output_tree().

estat::by_inode and estat::by_name

The by_inode and by_name members are pointers to arrays of pointers to entries (:-); they must reference the same entries, only the order may differ.

by_inode must (nearly) always be valid ; by_name is optional.

The flag estat::to_be_sorted tells waa__output_tree() that the order of the by_inode array might be wrong, and has to be re-sorted before use.

While scanning for changes we use a global by_inode ordering, as this is much faster than naive traversal; the by_name array is used for comparing directories, to determine whether there are any new entries.

Both arrays must include a NULL -pointer at the end of the array.

Manber-Hash and MD5

To quickly find whether a given file has changed, and to send only the changed bytes over the wire, we take a running hash (a Manber-Hash), and whenever we find a "magic" value we take that as buffer end.

We calculate the MD5 of each buffer, and store them along with their start offset in the file. So on commit we can find identical blocks and send only those, and while comparing we can return "changed" as soon as we find a difference.

Error checking and stopping

Return codes are checked everywhere.

The return value of functions in this program is normally (int); 0 means success, something else an error.

Either this error is expected (like ENOENT for some operations) and handled, or it must be returned to the caller. Most of this is already defined in macros.

Typical function layout is like this (taken from waa.c):

  int waa__make_info_link(char *directory, char *name, char *dest)
  {
    int status;
    char *path, *eos;
  
  
    STOPIF( waa___get_waa_directory(directory, &path, &eos), NULL);
  
    strcpy(eos, name);
        ...
    if (access(path, F_OK) != 0)
      STOPIF_CODE_ERR( symlink(dest, path) == -1,
          errno, "cannot create informational symlink '%s' -> '%s'",
          path, dest);
  
  ex:
    return status;
  }

When a function gets called by subversion libraries, we have to use their return type. Here an example from commit.c:

svn_error_t *ci___set_props(void *baton, 
    struct estat *sts,
    change_any_prop_t function,
    apr_pool_t *pool)
{
  const char *ccp;
  svn_string_t *str;
  int status;
  svn_error_t *status_svn;


  status=0;
    ...
  if (sts->entry_type != FT_SYMLINK)
  {
      ...
    str=svn_string_createf (pool, "%u %s", 
        sts->st.uid, hlp__get_uname(sts->st.uid, "") );
    STOPIF_SVNERR( function, (baton, propname_owner, str, pool) );

    ...
  }

ex:
  RETURN_SVNERR(status);
}

The various STOPIF() -macros automatically print an error message and, depending on the debug- and verbosity-flags given on the command line, a back trace too.

Another special case is output to STDOUT; if we get an error EPIPE here, we pass it up to main() as -EPIPE (to avoid confusion with writing some other data), where it gets ignored. To avoid printing an error message this is hardcoded in the STOPIF() macros.

Assertions should be checked by BUG_ON(condition, format_string, ...). This will cause a segmentation violation, which (for debug builds) will automatically attach a debugger (gdb, only if present on the system).

Comments and documentation

FSVS is normalized to use doxygen format for the documentation: "/** ... */".

For non-trivial things it's practical to document the thoughts, too; such internal documentation uses the normal C-style comments ("/* ... */").

/* in documentation

In cases where a slash / and a star * have to be used in the documentation, there's a hack by putting a paragraph symbol (\c2\xa7 in UTF-8) between them, so that it doesn't break the comment block.

There's a perl hack for documentation generation, where these get removed.

Note:
For C this would not be strictly necessary; There's always the way of putting a # if 0 block around that comment block. Doxygen doesn't allow this; even if using a shell script (with comments indicated by #) doxygen doesn't allow /* or */.

About the tests

Delays after commit

There have been a lot of "sleep 1" commands in the tests, to get directories' mtime to change for new entries.

Now they are mostly changed to a simple "-o delay=yes" on the commit just above, which should give us about half a second on average.

Note:
If FSVS has to be run for the check, it must wait until the other instance has finished - else the dir-list file and so on won't be written; so parallel checking via & and wait doesn't really work.

Simply putting delay=yes in the FSVS configuration file more than doubled the run time of the tests - this was unacceptable to me.


Generated for fsvs by  doxygen 1.6.1