
Re: Arch as a replacement for CVS for OpenBSD?



I don't want to drag out the thread too much on this list, but since
it is a "misc" list, I'll answer weingart's questions for now.  You
can also ask questions on the "arch-users" or "arch-dev" list
(@regexps.com).  I've not answered some of the questions that I'm sure
are clear from the documentation.


	> [various reasons why arch is not ideal for VMS or Mac]

Right.  arch is for Posix environments.



	>   `inventory' is used to identify which files in a tree are
	>   significant, and to assign a logical identity to each file and
	>   directory.  The logical identity remains the same even if a file
	>   is renamed and is the basis on which renames are detected.
	
	I gather that this identity will exist for eternity once it is
	in existence?  If so, how do you manage these identities?

See http://www.regexps.com/src/docs.d/arch/html/inventory.html


	Is there a cost associated with looking them up or managing
	them?

Very little.  The documentation should make that clear.


	What operations give rise to a new identity?

Using mechanism (1) or (3), just creating a new file does it.  If you
use mechanism (2), then you also need to run `larch add FILENAME'.



	Can you rename a branch of a file, and have the trunk still be
	the original name?

Yes.  In fact, I do this all the time.  I have a generic package
("module" in CVS-speak) for the top-level directory of all software I
distribute.  I have branches for each distribution.  On those
branches, once, a while back, I stored revisions that modify the
generic top-level package by removing some files and rearranging other
ones.  To make a new release, I just `larch star-merge' the generic
package with the release-specific branches and all of those deletions
and rearrangements happen automatically.  For example, the generic top
level has:

	docs.d/
	  docs.d/arch/
	    docs.d/arch/html/
	    ...
	  docs.d/hackerlab/
	  docs.d/systas/

The distribution branch for arch renames "docs.d/arch" to "docs" and
deletes the other files.  I never have to do that by hand any more: it
happens automatically during merging.

You can even have a situation where two branches use different names
for the same file.  When you merge back and forth between these
branches, each branch still has its own name for the file, but the
changes are made to the right file in each case.
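
The idea of detecting renames by logical identity can be sketched in a
few lines of Python.  This is an illustration only, not arch's actual
code: the id strings and the `detect_changes' helper are made up, and
each tree is modelled as a plain mapping from logical id to relative
path.

```python
def detect_changes(orig, mod):
    """Compare two trees, each a dict mapping logical id -> relative path.

    Returns (renames, deletions, additions).  Ids, not paths, decide
    which entries count as "the same" file or directory.
    """
    renames = {orig[i]: mod[i] for i in orig if i in mod and orig[i] != mod[i]}
    deletions = sorted(orig[i] for i in orig if i not in mod)
    additions = sorted(mod[i] for i in mod if i not in orig)
    return renames, deletions, additions


# Hypothetical ids for the generic top-level package.
generic = {
    "id-docs-arch": "docs.d/arch",
    "id-docs-hackerlab": "docs.d/hackerlab",
    "id-docs-systas": "docs.d/systas",
}
# The arch distribution branch renames one directory and drops the rest.
arch_dist = {"id-docs-arch": "docs"}

renames, deletions, additions = detect_changes(generic, arch_dist)
```

Because the comparison is keyed on ids, a change made to "docs.d/arch"
on one branch can be matched to "docs" on the other.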

	
	I assume that directories are lists of such identities?  

Not quite.  A directory is just a directory.  It has both a location
name (a relative path) and a logical name (that determines its role
when making or applying patch sets).  When the mkpatch/dopatch process
tries to decide where a file or directory belongs, it looks at the
logical id of the directory that contains it.


	If so, can you have the equivalent of hard links (one file
	show up in two directory objects)?
	
Not at the moment.  Is that a feature you need for some reason?  If
so, can you describe how you use it?


    In other words, I need to decompress/detar each revision (possibly
    quite large) in order to construct a "diff" between 1.1 and 1.112
    (that would possibly be 112 decomp/detar)?

No.  Revision libraries let you construct a diff with no such
overhead in the general case.  In special cases (e.g. diffs between a
project tree and its immediate ancestor) other caching mechanisms
eliminate that overhead.



	  > * Revision Libraries
	  > 
	  >   In addition to a repository of patch sets, arch is typically
	  >   configured to maintain a "revision library".  A revision library is
	  >   a collection of revisions stored as complete copies of the source
	  >   tree, but with an important space optimization: unmodified files are
	  >   shared among these trees using hard links.

	  These are uncompressed/detarred versions of particular revisions of the
	  patch sets?  IE: they exist wholesale within the repo?  If so, how do
	  you handle moved/renamed files?  Where do they exist?  In both places?
	  Hardlink?  Also, are the patch sets based on any of these revision
	  libraries?  If so, how do you optimize speed/space wrt the number of
	  patch sets you need to apply to recover any one revision?

A revision library holds uncompressed/detarred copies of the entire
trees of revisions -- except that files that are common between
revisions (even if renamed) are shared via hard-links.
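
As a rough sketch of the hard-link sharing (in Python, and not how
arch itself is implemented), a new revision can be added to a library
by linking every file that is unchanged from the previous revision and
copying only the rest.  The `add_revision' helper and the directory
layout are assumptions for illustration.  To keep it short, the sketch
matches files by path only, so it does not share renamed files the way
a real revision library does.

```python
import filecmp
import os
import shutil


def add_revision(library, name, tree, previous=None):
    """Store `tree` as revision `name` under `library`.

    Files whose contents are identical to the file at the same path in
    the `previous` revision are hard-linked instead of copied.
    """
    dest = os.path.join(library, name)
    for dirpath, _dirnames, filenames in os.walk(tree):
        rel_dir = os.path.relpath(dirpath, tree)
        os.makedirs(os.path.normpath(os.path.join(dest, rel_dir)), exist_ok=True)
        for filename in filenames:
            src = os.path.join(dirpath, filename)
            new = os.path.normpath(os.path.join(dest, rel_dir, filename))
            old = None
            if previous is not None:
                old = os.path.normpath(
                    os.path.join(library, previous, rel_dir, filename))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)        # unchanged: share the existing copy
            else:
                shutil.copy2(src, new)   # new or modified: store a fresh copy
    return dest
```

Every revision directory then looks like a complete tree, but the disk
cost of a revision is roughly the size of the files it changed.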

Patch sets are, by default, constructed from revision libraries and
project trees.  This is about as low-overhead as you can possibly get.
As a fallback, when there is no revision library or the necessary
revisions are missing from the library, they are constructed the slow
way (by first building the ORIG or MOD tree from the patch sets).


	  This functionality seems dangerously close to having a lock-step version
	  of the repo checked out.  What I mean, is that you in some sense have a
	  global counter, which counts the "step" the repo is at.  Each operation
	  increments the counter.

Nope.  There is a counter ("patch level") for each development line
("branch" in CVS-speak, "version" in arch-speak).  However, all those
counters are independent of one another.
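
To make the contrast with a single global counter concrete, here is a
toy model in Python.  It is only a sketch: the `Repository' class and
the version names are invented for the example and say nothing about
how arch stores its data.

```python
class Repository:
    """Toy model: one independent patch-level counter per version."""

    def __init__(self):
        self._levels = {}

    def commit(self, version):
        # Only the counter of the version being committed to advances.
        self._levels[version] = self._levels.get(version, 0) + 1
        return self._levels[version]

    def patch_level(self, version):
        return self._levels.get(version, 0)


repo = Repository()
repo.commit("mainline")
repo.commit("mainline")
repo.commit("release-branch")
```

Two commits to "mainline" leave "release-branch" at whatever level its
own commits brought it to.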


	 > * Tags
	 > 
	 >   Any revision, instead of being a complete tar file of the entire
	 >   tree or a simple patch set, can be a "tag".  Conceptually, a tag is
	 >   a symbolic link to some other revision.  Tags are how branches are 
	 >   implemented (the baseline revision of a branch is a tag of the
	 >   revision being branched from).
	 
	 Does this mean tags are global to the repo?  This concept is
	 very fuzzy to me.  How do I compare this to the cvs tag/branch
	 concept?

A tag is just a record in the repository that means "to get this
revision, X (the tag revision), go and get that other revision, Y (the
tagged revision) and add the patch log entry for X."
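
In Python-flavoured pseudo-terms, resolving a tag might look like the
sketch below.  The dictionary layout, the revision names and the
`resolve' function are all assumptions made for the example; the point
is only that a tag stores a pointer plus its own log entry, never a
tree.

```python
def resolve(repo, name):
    """Follow tag revisions until a non-tag revision is reached.

    Returns (name of the revision that holds the actual tree,
             log entries contributed by each tag along the way).
    """
    logs = []
    seen = set()
    while repo[name]["kind"] == "tag":
        if name in seen:
            raise ValueError("tag cycle at " + name)
        seen.add(name)
        logs.append(repo[name]["log"])
        name = repo[name]["target"]
    return name, logs


# Hypothetical repository contents: a branch whose baseline is a tag.
repo = {
    "trunk/patch-7": {"kind": "patch"},
    "branch/base-0": {
        "kind": "tag",
        "target": "trunk/patch-7",
        "log": "branch from trunk",
    },
}

tree_revision, extra_logs = resolve(repo, "branch/base-0")
```

Here the branch's baseline resolves to the trunk revision it was
tagged from, plus one added log entry.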

In CVS, every file in a tree must be tagged independently of the
others.  In arch, you tag everything at once.

	 > * Patch Logs

	 This can be very useful.  On "commit" does this information get saved
	 within the repository?  If so, how?

Each (whole-tree) revision is stored in its own directory.  One of the
(plain-text, roughly RFC822-format) files in that directory is the log
entry for that revision.
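
Since the format is roughly RFC822 (headers, a blank line, then a
body), generic mail-parsing tools can read a log entry.  The sketch
below uses Python's standard `email' module; the header names and the
sample text are invented for illustration and are not taken from a
real arch log.

```python
from email import message_from_string

# Hypothetical log entry: header fields, blank line, free-form body.
log_text = """\
Summary: fix rename handling
Creator: A. Hacker

Longer description of the change goes here.
"""

entry = message_from_string(log_text)
summary = entry["Summary"]          # header lookup is case-insensitive
body = entry.get_payload().strip()  # everything after the blank line
```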


      >   Every repository has a globally unique name.  The name is location
      >   independent: it remains the same for all mirrors of the repository
      >   and if the repository is migrated.

      How are they located?  DNS entries?  Config file?

A config file maps the logical name of the repository to a URL or
local path.  "ftp:" method URLs are currently supported, and there are
some patches pending for an SSH-based transport.
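
The mapping itself is nothing more than a lookup table.  As a sketch
(the file syntax, the repository names and the `parse_locations'
helper below are hypothetical, not arch's real config format), it
could be read like this in Python:

```python
def parse_locations(text):
    """Parse lines of the form 'LOGICAL-NAME LOCATION' into a dict.

    Blank lines and lines starting with '#' are ignored.
    """
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, location = line.split(None, 1)
        locations[name] = location
    return locations


# Made-up names and locations, one remote and one local.
config = """\
# logical name        location
example-public-repo   ftp://ftp.example.org/pub/repo
example-private-repo  /home/user/repo
"""

locations = parse_locations(config)
```

Moving or mirroring a repository then means editing one line of this
table; the logical name used everywhere else stays the same.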

     How do you deal with people that wish to have their own repo?  IE: they
     explicitly do not wish to be part of a branch or any other connection
     with the official repo?  How do you deal with patches from such places?
     Do you have support for that, or is it handled much like cvs, except that
     you maintain your own "vendor" branch?

I'm not sure what you're asking.  There are no special problems with
that situation.

-t