Document Management Systems

1. Focus on the customer. Identify who will use your documentation and what they believe they require. Then negotiate with them to determine the minimal subset that they actually need.

2. Keep it just simple enough, but not too simple. The best documentation is the simplest that gets the job done. When creating documentation, follow AM's principle, Use the Simplest Tools, and its practices, Create Simple Content and Depict Models Simply.

3. The customer determines sufficiency. As the documentation's author, you should ensure that it has meaning and provides value; your customer's role is to validate that you have done so.

4. Document with a purpose. Create a document only if it fulfills a clear, important and immediate goal of your overall project.

5. Prefer other forms of communication over documentation. Documentation supports knowledge transfer, but it's only one of several methods of communication. Often, alternative options may be more useful.

6. Put documentation in the most appropriate place. Where will someone want a piece of documentation? Varying projects require different types of documentation. For example, a design decision is best documented in the code when programmers are the primary audience, is best added as a note on a diagram when the primary audience is a designer, or is best placed in an external document when the audience includes management.

7. Wait for what you are documenting to stabilize. Delay the creation of all documents as long as possible, creating them just before you need them. For example, system overviews are best written as you near a product's release, because by that time, you know what you've actually built.

8. Display models publicly. When you publicly display models--on a whiteboard, corkboard or internal Web site--you're promoting transfer of information and communication. More communication leads to less detailed documentation, because people are already aware of the basic information they've gleaned from your model.

9. Document current models first. If you keep a model up-to-date, it's probably valuable--and therefore worthy of documentation.

10. Require people to justify documentation requests. Ask your customers what they intend to use the documentation for and how they will actually use it. Their answers will often reveal that some documentation functions only as a security blanket. There are much better ways to address fear than by providing superfluous documentation.

11. Write the fewest documents with the least overlap. Try to build larger documents from smaller ones. I once worked on a project in which all documentation was written as HTML pages, with each page focusing on a single topic. This process defined information in one place and one place only, so there was no opportunity for overlap.

12. Get help from an experienced writer. Technical writers know how to organize and present information effectively. Also, write documentation with a partner--just as pair programming can provide significant value, "pair documenting" can be equally effective.


Linux Gazette: A Need for Documentation

(Oct 14, 2001, 19:07 UTC) (Posted by mhall)
"In fact, the better the documentation, the simpler it is to use a program. Take the Apache web server for example, it comes with heavy documentation. As a result, anybody who can understand a little English is able to use Apache and configure it--without using a point-and-click interface. This article tries to encourage programmers to document their projects, as well as to provide ideas and tips on doing so."

QDocs Document Management System - Freeware

Columbus Free Document Management

ScrollKeeper Open Source Document Management [Nov. 28, 2001]

Operating systems are very complex these days, composed of many parts and pieces. Linux, like other Unix-like, free software operating systems, is really just a collection of autonomous and dependent software packages. On my workstation there are about 850 packages at last count. A moderately busy, production Internet server might have as many as 650 packages. And a development server, supporting diverse activities of a complex development team, might have as many as 1,000 packages.

Each of these packages, in turn, contains files numbering from a few to several hundred. My workstation's 850 packages contain about 120,000 files, which means each package contains an average of about 140 files. Some of these files, thankfully, are package documentation. If 5% of each package's files are documentation, that's about 6,000 documentation files. Together these files form the system documentation, which is one of the many virtues of a free software operating system.

Table of Contents

The Document Collection Problem


OMF: Free Software's Dublin Core Lite

Hacking ScrollKeeper

The Future of ScrollKeeper


Not only is there a lot of system documentation, but it exists in a wide variety of formats, conventions, and standards. For example, on a Linux system you can find documentation in command-line switches (-h, --help); man pages (often produced by groff or similar "first generation" Unix documentation systems); plain text README files, which may or may not follow a folkish layout or structural convention; info and texinfo files, an explicitly tree-shaped, node-based documentation system intended as a "next generation" man system; TeX and LaTeX files, from which DVI files are generated; and Adobe PDF and PostScript files, with or without the source documents from which they were derived. Then there is the angle-bracket world: DocBook (perhaps versions 2, 3, and 4, in SGML and XML instances), HTML (of various vintage and contemporary DTDs), XHTML, and ad hoc, project-specific markup languages. Finally, there are many less well-known, less widely adopted documentation systems, with various capabilities, conventions, and formats.

Each of these formats has one or several viewing contexts -- applications that are ideal or merely passable for viewing them -- and, perhaps, a compiler-like application that's used to create them.

In short, there is, in principle, a non-trivial document collection problem inside every server or workstation on your network.

The great temptation is to throw it all away in favor of the Web. And that might actually work in many cases; sometimes you'll want to look at the documentation for a new version of some package in order to decide whether you want to upgrade. It doesn't make much sense to install the package and read the new documentation, only to learn that you didn't want the package after all and then remove it. It's simpler to find the new documentation on the Web first.

But some package maintainers do not use the Web exclusively. And there are virtues, depending on the context of usage, across the variety of document flavors: in some settings, a man page is precisely what you want. Throwing it all away in favor of the Web isn't a real solution.

System administrators and users need a framework for document collection that's evolved within the ecological niche of a free software Unix-like operating system. And that's exactly what the ScrollKeeper project provides. ScrollKeeper is "a cataloging system for documentation on open systems," which "manages documentation metadata...and provides a simple API to allow help browsers to find, sort, and search the document catalog." ScrollKeeper uses the Open Source Metadata Framework (hereafter, OMF) -- a subset adaptation of Dublin Core -- to describe document metadata.

Over the course of its evolution, ScrollKeeper has been guided by the document collection needs of the GNOME project in particular, with Dan Mueth, lead of the GNOME Documentation Project, and Sun's Laszlo Kovacs contributing design ideas and code. This should come as no surprise since one aim of GNOME is to provide a consistent, unified interface for Unix-like systems, and that means not only providing consistent help and documentation tools for GNOME applications, but for the underlying system documentation as well.


The current version of ScrollKeeper (0.2) provides basic support for two different kinds of user: package maintainers who provide system documentation are encouraged to create an OMF file to describe their documentation resources; system integrators and document application developers are encouraged to use ScrollKeeper's metadata API to create a variety of "help browsers" and other collection tools, including integrating help and document functions into existing systems, like the Nautilus file browser or GNOME control panel. ScrollKeeper thus provides a kind of "middleware" between document producers and consumers.

In practice, ScrollKeeper is a tool chain which can be used to create, store, and manage trees of document metadata, especially metadata represented as OMF instances. It also serves as a concrete means to promote the use of OMF as a metadata representation. These two goals are mutually reinforcing. Without some standard metadata representation, it is extremely difficult to create a general metadata management API. Imagine, for example, writing metadata extractors for each of the document formats above, some of which don't have any way, to say nothing of a standard way, to represent metadata.

A document collection tool that's going to survive in this niche really needs a generalized metadata representation, which is what the OMF provides. But the flip side is equally true. Without some promise that metadata description efforts, however minimal, will bear fruit (by being well-integrated at the user level), it's difficult for independent (often non-commercial) package maintainers to see the point of exerting even minimal effort to describe documentation resources at all.

Since document collections can be conceptualized as trees, and since XML/SGML is very good at representing data as trees, it's unsurprising to learn that ScrollKeeper uses XML extensively. There are three central parts of ScrollKeeper currently -- a contents list, a table of contents, and an extended contents list -- which it creates at document install or uninstall time and stores as XML.

The contents list is a system-wide tree of every document known to ScrollKeeper, often sorted on the OMF subject element, which is ideally constrained by means of a controlled vocabulary of subjects (i.e., an authoritative classification of subject values, in canonical form, which is used to normalize subject data).

At this point the conceptual division of labor is clear. People who write help browsers and other user applications aren't necessarily interested in creating controlled vocabularies. Further, different communities employing OMF may well need to use different domain-specific controlled vocabularies. For example, the controlled subject vocabulary suitable for GNOME application documents wouldn't necessarily be well-suited to describe other kinds of documentation resources. The various users of a metadata representation scheme like OMF may need several controlled vocabularies, without which metadata will, over time, become fragmented, unreliable, and less useful.
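As a rough illustration of what a controlled vocabulary buys, the sketch below maps free-form subject values to canonical forms before grouping documents by subject. The vocabulary entries, the document titles, and the function names are all hypothetical; neither OMF nor ScrollKeeper prescribes them.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: variant spelling -> canonical subject.
CANONICAL_SUBJECTS = {
    "text editors": "Applications/Text Editors",
    "editors": "Applications/Text Editors",
    "text-editor": "Applications/Text Editors",
    "networking": "System/Networking",
    "network": "System/Networking",
}

UNCLASSIFIED = "Unclassified"


def normalize_subject(raw):
    """Map a free-form subject value to its canonical form."""
    key = raw.strip().lower()
    return CANONICAL_SUBJECTS.get(key, UNCLASSIFIED)


def group_by_subject(documents):
    """Group (title, raw_subject) pairs under canonical subjects."""
    groups = defaultdict(list)
    for title, raw_subject in documents:
        groups[normalize_subject(raw_subject)].append(title)
    # Sort subjects and titles so the resulting listing is stable.
    return {subject: sorted(titles) for subject, titles in sorted(groups.items())}


docs = [
    ("Emacs Manual", "Editors"),
    ("Vim Guide", "text-editor"),
    ("Routing HOWTO", " Network "),
    ("Misc Notes", "stuff"),
]
grouped = group_by_subject(docs)
```

Without the normalization step, "Editors" and "text-editor" would land in separate groups, which is the kind of fragmentation a controlled vocabulary is meant to prevent.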

The contents list is created as ScrollKeeper examines OMF instances, which are stored in a directory, $OMFDIR (say, /usr/share/omf). Thus, in order for package maintainers to register their resource metadata with ScrollKeeper, they merely have to ensure that an OMF instance is copied to $OMFDIR. There are plans for future versions of ScrollKeeper to create OMF instances on the fly by extracting metadata from document resources that store metadata in predictable, sane ways. DocBook is a good example of a format from which, in principle, metadata may be automatically extracted. In order to avoid name collisions, ScrollKeeper specifies a template for the name of a file-based OMF instance -- [document_title]-[locale].omf.
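The registration step could be sketched roughly as follows, for a package's install script. The helper names are hypothetical, and the sketch assumes the title and locale are dropped into the template unchanged. It copies into a temporary directory standing in for $OMFDIR and writes placeholder text in place of a real OMF instance.

```python
import os
import shutil
import tempfile


def omf_instance_name(document_title, locale):
    """Build a file name following the [document_title]-[locale].omf template."""
    return f"{document_title}-{locale}.omf"


def register_omf(source_path, document_title, locale, omf_dir):
    """Copy an OMF instance into omf_dir under its template name."""
    os.makedirs(omf_dir, exist_ok=True)
    target = os.path.join(omf_dir, omf_instance_name(document_title, locale))
    shutil.copyfile(source_path, target)
    return target


# Demonstration against a temporary directory standing in for $OMFDIR.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "draft.omf")
    with open(source, "w", encoding="utf-8") as handle:
        handle.write("placeholder for OMF metadata\n")  # not a real OMF instance
    installed = register_omf(source, "user-guide", "en", os.path.join(workdir, "omf"))
    installed_name = os.path.basename(installed)
    installed_exists = os.path.exists(installed)
```

Because the locale is part of the file name, translations of the same document can sit side by side in one directory without colliding.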

The table of contents is a per-document tree representing the main structural contents of a document (i.e., sections and subsections). ScrollKeeper creates the table of contents automatically for DocBook resources by extracting section and subsection elements.

The extended contents list is another system-wide tree created by merging the contents list and the table of contents for each document in the contents list. It's simple to imagine a fairly useful system-wide help browser which just gives users a way to navigate a graphical representation of the extended contents list tree.
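The merge can be pictured with a small sketch using Python's standard XML library. The element and attribute names here (contents, subject, doc, toc, section, id, title) are invented for illustration, and ScrollKeeper's actual XML layout may differ. Documents with no known table of contents are simply left without sections.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents list; the element names are assumptions, not ScrollKeeper's schema.
CONTENTS_LIST = """
<contents>
  <subject name="Editors">
    <doc id="1" title="Editor Guide"/>
  </subject>
  <subject name="Networking">
    <doc id="2" title="Routing HOWTO"/>
  </subject>
</contents>
"""

# Hypothetical per-document tables of contents, keyed by document id.
TABLES_OF_CONTENTS = {
    "1": '<toc><section title="Installing"/><section title="Editing"/></toc>',
}


def build_extended_contents(contents_xml, tocs_by_doc_id):
    """Return the contents tree with each document's TOC sections attached."""
    extended = ET.fromstring(contents_xml.strip())
    # Materialize the list first so the tree is not mutated while being walked.
    for doc in list(extended.iter("doc")):
        toc_xml = tocs_by_doc_id.get(doc.get("id"))
        if toc_xml is None:
            continue  # no table of contents known for this document
        for section in ET.fromstring(toc_xml):
            doc.append(section)
    return extended


extended = build_extended_contents(CONTENTS_LIST, TABLES_OF_CONTENTS)
outline = [
    (doc.get("title"), [s.get("title") for s in doc.findall("section")])
    for doc in extended.iter("doc")
]
# outline == [("Editor Guide", ["Installing", "Editing"]), ("Routing HOWTO", [])]
```

A help browser could walk a tree like this one to present subjects, documents, and sections as a single navigable hierarchy.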

If you're using ScrollKeeper in an application, locating the various XML representations of the contents list, extended contents list, and tables of contents is as simple as calling scrollkeeper-get-contents-list [language], which returns the file system path of the contents list XML document; scrollkeeper-get-extended-contents-list [language], which returns the file system path of the extended contents list XML document; scrollkeeper-get-toc-from-docpath [docpath], which returns the file system path of the table of contents of a document; and scrollkeeper-get-toc-from-id [doc_id], which also returns the table of contents path, given a document id.
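From a script, these commands might be wrapped along the following lines. This is a minimal sketch: it assumes the commands print the path on standard output, which the description above does not confirm. The wrapper function names and the injectable `run` parameter are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def contents_list_path(language, extended=False, run=subprocess.run):
    """Ask ScrollKeeper for the path of the contents list or its extended form.

    Assumption: the command prints the path on standard output.
    """
    command = (
        "scrollkeeper-get-extended-contents-list"
        if extended
        else "scrollkeeper-get-contents-list"
    )
    result = run([command, language], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def load_contents_list(language, extended=False, run=subprocess.run):
    """Parse the XML document whose path ScrollKeeper reports."""
    path = contents_list_path(language, extended=extended, run=run)
    return ET.parse(path).getroot()
```

On a system with ScrollKeeper installed, a help browser could call `load_contents_list` with whatever language value the tools accept and walk the returned tree. Passing a stand-in for `run` makes the wrapper testable on machines where the commands are absent.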

HomeVault Document Management System