DITrack Design Document (Draft)

$Id$
$HeadURL$

Copyright (c) 2006 The DITrack Project, www.ditrack.org.

1. Ontology And Terminology
===========================

1.1. Data building blocks
-------------------------

From an end-user perspective, an ISSUE is the set consisting of an original
description, all further comments and any attached files. Each issue has a
HEADER that contains meta-information about the issue. We expect an issue to
have certain headers; there may also be arbitrary headers added by a user.

On the implementation side, an issue is composed of a sequence of COMMENTS.
Each comment contains three conceptual parts: a header, a payload (text) and
a delta.

A comment header contains meta-information about the comment itself (such as
addition date, authorship, etc). As with issue headers, a number of header
fields are predefined, and there may be others added by a user (which we
treat transparently).

A comment payload is optional data entered by the comment author.

A comment delta is the difference in *issue* headers as they existed before
and after the comment was added.

Each comment is stored in a database, while issue headers are implicit. This
means that to determine the current issue headers we need to read all
comments and apply their deltas sequentially.

1.2. Naming
-----------

When using DITrack, an end user deals with two kinds of ENTITIES: ISSUES and
COMMENTS. Thus, we need some way to distinguish them.

We refer to entities by IDENTIFIERS (IDS). Entities that are already
committed and reside in a repository are assigned numeric ids, which are
referred to as NUMBERS. Issue numbers are integers starting with 1, without
gaps (normally).

Entities that are not yet committed reside in local storage, which is called
the Local Modifications Area (LMA). We refer to such entities as "local"
ones. To avoid ever confusing local entities with committed ones, local
entities are assigned character ids, which we refer to as NAMES. The name
sequence goes as A, B, ..
Z, AA, AB, .. AZ, BA, .. BZ, ...

We use the term 'entity' where the distinction between a comment and an
issue is not important. Similarly, we use the term 'id' (or 'identifier')
where the distinction between a name and a number is not important.

Comment identifiers are compound, in the form X.Y, where X is the issue id
and Y is the comment id. Comment numbers start with 0; thus the very first
comment is actually the issue description.

Below is a sample of *theoretically* possible combinations:

    1.0 - comment 0 of issue 1 (actually, the original description of
          issue 1).
    1.1 - comment 1 of issue 1.
    1.A - the first local comment to committed issue 1.
    0   - INVALID issue number.
    1   - issue 1 as a whole.
    0.1 - INVALID comment id.
    A.0 - INVALID (local issue A can't have committed comments).
    A.1 - INVALID (local issue A can't have committed comments).
    A.A - the very first comment (the description) of local issue A.

2. Basic Philosophy
===================

The DITrack clients deal with working copies of a repository. A working copy
is always a pristine snapshot of the repository (unless it's wedged, which
is an abnormal condition).

The repository/working copy data is always plain text, so that the issue
database is accessible even in the absence of a DITrack client.

The LMA is currently *not* plain text, but this may change later. The
current format is just a tradeoff between implementation speed and
transparent data representation.

3. Data Representation
======================

Comments are represented on disk as RFC2822 plain text files. The message
body (in RFC2822 terms) is the comment payload (and may be absent). The
headers of the message represent both the comment headers and the delta.

Each header of the message which starts with 'DT-' is used by DITrack for
special purposes (e.g. to encode the delta).

Deltas are encoded by using 'DT-Old-*' and 'DT-New-*' fields.
For each changed header 'X' there must be two corresponding headers,
'DT-Old-X' and 'DT-New-X', which specify the old and new values of the
header respectively. Empty values are treated as 'absent'. Thus, if the
'DT-New-X' header has a blank value, the header 'X' is considered deleted as
a result of applying the delta.
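
As an illustration only, the delta rules above could be sketched in Python
with the standard 'email' parser. This is not the DITrack implementation:
the function names, the sample field names ('From', 'Status', 'Owner') and
the decision to reject a comment whose 'DT-Old-X' value disagrees with the
current header are all assumptions made for this example.

```python
from email import message_from_string

OLD_PREFIX = "DT-Old-"
NEW_PREFIX = "DT-New-"


def parse_delta(comment_text):
    """Return {field: (old_value, new_value)} for one comment file."""
    message = message_from_string(comment_text)
    old, new = {}, {}
    for name, value in message.items():
        if name.startswith(OLD_PREFIX):
            old[name[len(OLD_PREFIX):]] = value.strip()
        elif name.startswith(NEW_PREFIX):
            new[name[len(NEW_PREFIX):]] = value.strip()
    # Every changed header must come as a DT-Old-X / DT-New-X pair.
    if old.keys() != new.keys():
        raise ValueError("each changed header needs both DT-Old-* and DT-New-*")
    return {field: (old[field], new[field]) for field in new}


def apply_delta(issue_headers, delta):
    """Return a copy of issue_headers with one comment's delta applied."""
    result = dict(issue_headers)
    for field, (old_value, new_value) in delta.items():
        # An empty value means 'absent', so a missing header matches "".
        # Rejecting a mismatch is an assumption of this sketch.
        if result.get(field, "") != old_value:
            raise ValueError("stale delta for header %r" % field)
        if new_value:
            result[field] = new_value
        else:
            result.pop(field, None)  # blank DT-New-X deletes header X
    return result


def current_headers(comment_texts):
    """Rebuild the implicit issue headers from comments 0..N in order."""
    headers = {}
    for text in comment_texts:
        headers = apply_delta(headers, parse_delta(text))
    return headers


# Hypothetical comment files; the field names are invented for the example.
COMMENT_0 = (
    "From: alice\n"
    "DT-Old-Status: \n"
    "DT-New-Status: open\n"
    "DT-Old-Owner: \n"
    "DT-New-Owner: alice\n"
    "\n"
    "Original description.\n"
)
COMMENT_1 = (
    "From: bob\n"
    "DT-Old-Owner: alice\n"
    "DT-New-Owner: \n"
    "DT-Old-Status: open\n"
    "DT-New-Status: closed\n"
    "\n"
    "Closing the issue.\n"
)

if __name__ == "__main__":
    print(current_headers([COMMENT_0, COMMENT_1]))  # {'Status': 'closed'}
```

In this sample, comment 0 creates 'Status' and 'Owner' (both old values are
empty, i.e. absent), and comment 1 deletes 'Owner' with a blank 'DT-New-Owner'
while changing 'Status'. Replaying both comments in order leaves only the
'Status' header.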