In subversion speak an entry is either a directory, a symlink or a file; In FSVS it can additionally be a block or character device.
Sockets and pipes are currently ignored, as they're typically re-created by the various applications.
Please see waa_file.
In memory fsvs builds a complete tree of the needed entries (struct
estat
). They are referenced with the parent
pointers upwards to the root, and the estat::by_inode
and estat::by_name
downwards.
Every directory entry can have a string space allocated, ie. space needed for the filenames in this directory (and possibly sub-directories, too.) On loading of the list in waa__input_tree()
two memory ranges are allocated - one for the struct estat
s read, and one for the filenames.
Because of this free()ing
of part of the entries is not possible; a freelist for the struct estat
s is used, but the string space is more or less permanent.
Generally I tried to use fast and simple algorithms better than O(n)
; but it's possible that I forgot something.
Searching for an entry in a directory (in memory) is O
(log2(n)), as I use bsearch()
.
Determining the correct order for writing the entries (in waa__output_tree()
) is optimized by having all lists pre-sorted; about half the time (tested) a single compare is enough to determine the next written entry.
The by_inode
and by_name
members are pointers to arrays of pointers to entries (:-); they must reference the same entries, only the order may differ.
by_inode
must (nearly) always be valid ; by_name
is optional.
The flag estat::to_be_sorted
tells waa__output_tree()
that the order of the by_inode
array might be wrong, and has to be re-sorted before use.
While scanning for changes we use a global by_inode
ordering, as this is much faster than naive traversal; the by_name
array is used for comparing directories, to determine whether there are any new entries.
Both arrays must include a NULL
-pointer at the end of the array.
To quickly find whether a given file has changed, and to send only the changed bytes over the wire, we take a running hash (a Manber-Hash), and whenever we find a "magic" value we take that as buffer end.
We calculate the MD5 of each buffer, and store them along with their start offset in the file. So on commit we can find identical blocks and send only those, and while comparing we can return "changed" as soon as we find a difference.
Return codes are checked everywhere.
The return value of functions in this program is normally (int); 0 means success, something else an error.
Either this error is expected (like ENOENT for some operations) and handled, or it must be returned to the caller. Most of this is already defined in macros.
Typical function layout is like this (taken from waa.c
):
int waa__make_info_link(char *directory, char *name, char *dest) { int status; char *path, *eos; STOPIF( waa___get_waa_directory(directory, &path, &eos), NULL); strcpy(eos, name); ... if (access(path, F_OK) != 0) STOPIF_CODE_ERR( symlink(dest, path) == -1, errno, "cannot create informational symlink '%s' -> '%s'", path, dest); ex: return status; }
When a function gets called by subversion libraries, we have to use their return type. Here an example from commit.c:
svn_error_t *ci___set_props(void *baton, struct estat *sts, change_any_prop_t function, apr_pool_t *pool) { const char *ccp; svn_string_t *str; int status; svn_error_t *status_svn; status=0; ... if (sts->entry_type != FT_SYMLINK) { ... str=svn_string_createf (pool, "%u %s", sts->st.uid, hlp__get_uname(sts->st.uid, "") ); STOPIF_SVNERR( function, (baton, propname_owner, str, pool) ); ... } ex: RETURN_SVNERR(status); }
The various STOPIF()
-macros automatically print an error message and, depending on the debug- and verbosity-flags given on the command line, a back trace too.
Another special case is output to STDOUT
; if we get an error EPIPE
here, we pass it up to main() as -EPIPE
(to avoid confusion with writing some other data), where it gets ignored. To avoid printing an error message this is hardcoded in the STOPIF()
macros.
Assertions should be checked by BUG_ON
(condition, format_string, ...). This will cause a segmentation violation, which (for debug builds) will automatically attach a debugger (gdb
, only if present on the system).
FSVS is normalized to use doxygen format for the documentation: "/** ... */"
.
For non-trivial things it's practical to document the thoughts, too; such internal documentation uses the normal C-style comments ("/* ... */"
).
In cases where a slash /
and a star *
have to be used in the documentation, there's a hack by putting a paragraph symbol (\c2\xa7
in UTF-8) between them, so that it doesn't break the comment block.
There's a perl hack for documentation generation, where these get removed.
# if 0
block around that comment block. Doxygen doesn't allow this; even if using a shell script (with comments indicated by #
) doxygen doesn't allow /*
or */
.There have been a lot of "sleep 1"
commands in the tests, to get directories' mtime to change for new entries.
Now they are mostly changed to a simple "-o delay=yes"
on the commit just above, which should give us about half a second on average.
&
and wait
doesn't really work.Simply putting delay=yes in the FSVS configuration file more than doubled the run time of the tests - this was unacceptable to me.