OpenBSD Journal

Developer blog: niallo

Contributed by dhartmei on from the opencvs dept.

Niall O'Higgins (niallo@) writes:

There has been quite a bit of interest from the user community in the OpenCVS project over the past while, and I have been thinking about writing some sort of blog entry on the subject for some time. I guess Marco's incessant blogging has finally gotten to me and inspired me to try to write something myself!
Since I have been hacking on the RCS parser lately, I thought I'd write a short article about this file format which sits at the core of CVS.

First of all, a little bit of history. RCS was first developed in the early 1980s by Walter Tichy at Purdue University, so it's over 20 years old. RCS operates on individual files. If you're not already acquainted with how the RCS tools work, take a look at the rcsintro(1) manual page. There is also a 1985 paper, "RCS - A System for Version Control", which has further information. To anybody with CVS experience, RCS may seem strangely familiar. CVS as we know it pretty much arrived on the scene around 1990. Rather than dealing with individual files, CVS operates on a directory of RCS files. This is why an RCS implementation lies at the heart of OpenCVS, and why we are using the OpenCVS code to write a replacement for GNU RCS, too (OpenRCS). This paper (PDF) from 1990 describes the implementation of CVS and its status at that time.

Essentially, the RCS file format (all those files with a ,v extension) is ASCII-based and made up of three sections. The first of these is the admin section, which contains information about locks, symbolic names, etc. E.g.:

head 1.3;
access;
symbols;
locks
        niallo:1.3; strict;
comment @# @;

Following this section come the deltas. Each entry in this section describes a particular revision of the file being tracked. It contains the date and time of the revision, the author, the state, sometimes branch information, and a pointer to the previous revision (the "next" field). E.g.:

1.3
date    2006.02.27.17.07.13;    author niallo;  state Exp;
branches;
next    1.2;

1.2
date    2006.02.27.17.06.38;    author niallo;  state Exp;
branches;
next    1.1;

Finally, there is the deltatext section. Each deltatext contains the commit log message for a revision along with that revision's file data. Typically, the HEAD deltatext (usually the first one appearing in the file) holds the complete contents of the file. The deltatext of each earlier revision on the trunk holds only a diff, in a sort of reverse ed(1) format, that turns the next newer revision into that older one. E.g.:

d1 1
a1 1
$Id$
d3 1

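This deltatext demonstrates the two operations these diffs perform: deletion and addition. "d1 1" translates to "starting at line one, delete one line of text". Conversely, "a1 1" followed by its line of data translates to "after line one, insert the following single line of text". The line numbers always refer to the text the diff is being applied to, not to the partially edited result, so the commands can be applied in a single forward pass.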

In order to retrieve a particular revision, you typically first retrieve the HEAD revision, then apply the deltatext of each older revision in turn, following the "next" pointers, until you reach the desired revision.

Well, that's it for my short introduction to the RCS file format. Hopefully, this should illustrate why certain things can be tricky or slow in CVS/RCS, namely:

  • Dealing with binary files, since deltatexts are line-based text diffs.
  • Retrieving an arbitrary revision requires parsing and applying every deltatext between HEAD and that revision, thus it is not at all a random access operation.



Comments
  1. By cycloon (62.206.217.131) on http://cycloon.org

    These Blog-like entries are getting really nice. Especially when reading source-changes@ costs too much time.

    Comments
    1. By daniel (82.131.15.100) on http://septum.org

      couldn't agree more

      feels like i'm part of "the gang" :)

      keep 'em coming!

  2. By Anonymous Coward (213.5.161.18) on

    By the way, do you think OpenCVS will be ready for inclusion by the time OpenBSD 4.0 ships? Or do you still have too much ahead?

    Comments
    1. By Xavier Santolaria (158.169.131.14) xsa@ on

      Hopefully yes. OpenRCS should be finished real soon and thus linked to the builds (after trees unlock of course).

  3. By Thorsten Glaser (213.196.250.125) on http://mirbsd.de/

    Please go on, this was already known to me,
    but we have great interest in OpenCVS as well.
    Currently we are using GNU CVS 1.12.13 which
    contains some very interesting features such
    as commitid (can be used to emulate changeset
    support), but we hope OpenCVS will be ready
    some time in the future to switch over.

    I'm especially interested in whether the stuff
    in CVSROOT/ (the scripts) will be compatible,
    because we plan to do the following:

    On each commit, the script which ordinarily
    writes the ChangeLog entry (log_accum2 in
    OpenBSD) gathers all the changed files and
    creates a CTM delta which is then pushed to
    the "primary anoncvs mirror". This avoids
    the need to rsync every so often and gives
    an additional bonus of "more" (not totally)
    atomic operations.

  4. By dingo (198.208.159.18) af.dingo@gmail.com on

    I am really looking forward to OpenCVS. I use GNU CVS every day, and we will all benefit from the quality of an OpenBSD project with a BSD license replacing yet another GNU application. I really like this trend...

    Thanks a million for the update, niallo, it has inspired an extra donation from me!

    Would this be the first modern concurrent versions system with a BSD license?

    Comments
    1. By Janne Johansson (130.237.95.193) on www.inet6.se

      If you build svn without BDB, you should be able to get free of GPL at least.
      The apache license is weird too, but in a different way.
      (for the APR stuff that is)

      Comments
      1. By SH (82.182.103.172) on

        The bdb has a BSD license with some additions; it is not GPL. There are components used by Subversion that are GPL, like the Neon library, but if you remove it then you can't access repositories hosted using the Apache httpd mod_dav_svn module.

    2. By Ray (199.67.138.42) on

      There’s also OpenCM.

      Comments
      1. By jfb (69.70.32.10) on

        Yeah, if you feel like waiting for hours just for a single update or commit. OpenCM is very nice conceptually, but in practice, it is anything but usable (at least the last time I tried it).

    3. By jfb (69.70.32.10) on

      Would this be the first modern concurrent versions system with a BSD license?

      I would say it's the first modern CVS period (or are you talking about versioning systems in general, in which case it is not, I think OpenCM is BSD-licensed). The GNU CVS code has become quite spaghettiish over the years and there are a lot of small bugs creeping in there, as well as a complete lack of standards with regards to style...

  5. By Anonymous Coward (84.188.211.96) on

    What advantages (in functionality, compared to GNU CVS) will OpenCVS have?
    Or are you just re-coding GNU CVS (there are also many reasons to do this, of course)?

    But I would like to know whether OpenCVS will have any features the original CVS does not have.

    Comments
    1. By Nate (65.94.100.49) on

      Additional functionality comes later. At first it's going to be as close to completely compatible with the GNU one as possible, and then move on to new functionality and ideas.

Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]