It has become appallingly obvious that our technology has exceeded our humanity.
Calvin Northrup Mooers (1919–1994) was an American computer scientist who coined the term “Information Retrieval” in March 1950 and went on to obtain several patents in information retrieval and signaling, design a text-handling language (TRAC), author more than 200 publications, and found one of the first companies devoted solely to information.
Mooers was a native of Minneapolis, Minnesota, attended the University of Minnesota, worked at the Naval Ordnance Laboratory from 1941 to 1946, and then entered the Massachusetts Institute of Technology (M.I.T.), where he earned a master’s degree in mathematics and physics. At M.I.T. Mooers developed Zatocoding, a mechanical information-retrieval system using superimposed codes of descriptors, and founded the Zator Company in 1947 to market his idea.
In 1951 Mooers issued a report entitled Making Information Retrieval Pay as Issue 55 of the Zator technical bulletin. In section II, INFORMATION RETRIEVAL vs. INFORMATION WAREHOUSING, of this report, Mooers stated:
Information retrieval must be distinguished from another operation performed on information. This is the ‘information warehousing’ operation, which is the orderly receipt, cataloguing and storage of information. Almost every library does a highly efficient and satisfactory job of information warehousing. This is fortunate, since successful operation of information retrieval—discovery and use of information—depends upon competent information warehousing. On the other hand, merely to warehouse a large collection of information does little to aid the user to discover the information he needs. Here we have a prevalent fallacy of the libraries.
In section IX, The DOKEN, we can read:
Can the world-wide torrent of scientific information—from an estimated 30,000 periodicals containing an estimated 1,000,000 papers per annum—be met by any conceivable retrieval machine? The answer is yes, and the back-log (estimated roughly at 100,000,000 pieces) can be handled too.
No existing machine is capable of doing a reasonable job of information retrieval on such a collection. The fastest electronic tabulating machinery would seem to require about 2,600 hours, or about 3 1/2 months, to scan a collection of 100 million pieces in answer to one request for information. The Microfilm Rapid Selector, according to published speeds, would take about 170 hours of steady running time or about a week to make the same search. Both these are too slow to meet a reasonable requirement that a central agency having such a machine should be able to make a number of searches each day, and to send out the bibliographies the same day the request was received.
A machine that can do this job is actually possible—and it can be constructed within the limitations of our present technology. I will describe some of the features of such a machine in order that you will know what such a machine will be like when it is built. On the other hand, I can’t tell you the date that this machine will actually be constructed because I cannot forecast when anyone will be able to afford it. The great expense is not in the machine. The machine will cost less than one of the enormous computing machines that we have been hearing so much about, and which some organizations seem to be able to afford. The real cost is handling and analyzing the magnitude of information in setting up the system. We should figure on a cost of at least $2 per item. Thus the annual cost of processing the world’s information—$2,000,000—would be several times the cost of the machine itself. But, to get back to the details of our hypothetical machine:
We will call the machine the D O K E N, which is short for “documentary engine”. The DOKEN is capable of making a complete multi-subject search of 100 million items in about 2 minutes, and having scanned the record, it reproduces or prints a bibliography of the selected abstracts at a rate of about 10 per minute by a dry printing process. Many searches are conducted each hour, steadily, throughout the day. After the first DOKEN is operating, film records for other DOKENS can be inexpensively copied at a fraction of the original cost. A DOKEN is a most appropriate instrument for national or regional research centers. It would be the information retrieval auxiliary instrument at a large library center for the local collection plus the entire world’s literature. For instance, it could scan the Library of Congress collection (10 million catalogued items) in 10 seconds.
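The figures Mooers quotes are internally consistent, and a little arithmetic makes the gulf between the DOKEN and its contemporaries vivid. The following sketch simply recomputes the per-second search rates from the times given in sections above (it is a modern sanity check, not part of the original report):

```python
COLLECTION = 100_000_000  # Mooers' estimated world back-log of pieces

# Time to scan the full collection for one request, per the report.
tabulator_s = 2_600 * 3600   # "fastest electronic tabulating machinery": ~2,600 hours
selector_s = 170 * 3600      # Microfilm Rapid Selector, published speeds: ~170 hours
doken_s = 2 * 60             # DOKEN design target: about 2 minutes

for name, secs in [("tabulating machinery", tabulator_s),
                   ("Rapid Selector", selector_s),
                   ("DOKEN", doken_s)]:
    print(f"{name:>20}: {COLLECTION / secs:>12,.0f} items/second")

# The Library of Congress figure (10 million items in 10 seconds) implies
# the same order of magnitude: one million items per second.
print(f"{'LoC scan rate':>20}: {10_000_000 / 10:>12,.0f} items/second")
```

This works out to roughly 11 items per second for the tabulating machinery, about 160 for the Rapid Selector, and over 800,000 for the DOKEN, matching the "more than one million documents scanned each second" claimed for the code-scanning engine below to within rounding.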
The DOKEN can achieve the stated performance goal only by recourse to the most efficient techniques. That means that the job must be broken down into the different functional operations, and highly efficient specialized structures and methods are used to accomplish each. There are three separate functional organs that we must consider. They are: 1) the code storage and scanning engine, 2) the abstract record and reading engine, and 3) the abstract printing stations. These organs, unlike the corresponding elements in the Rapid Selector, are physically separate structures. We will consider them in turn.
The Code storage and scanning engine contains the coded subjects of 100 million documents. Therefore, at least from considerations of sheer bulk, the most efficient possible subject coding must be used. The choice here is Zatocoding—the method of superimposition of random codes in each subject field—since this method seems to be considerably more efficient than any other coding scheme now known. We let each document be described by as many as 25 different cross-referenced subjects. The coded record is micro-photographed on photographic film, and this film strip is helically wound on a metal drum 10 feet in diameter and 7 feet long. This drum is driven at about 300 rpm, and the scanning head, following the helically-wound film, passes from one end to the other in less than a minute. The codes for more than one million documents are scanned in each second. This is about 5,000 times Rapid Selector speed. The basic principles of such a scanning head, able to do this with standard equipment, have been worked out. Selections, when made, are temporarily recorded as document or abstract numbers in an electronic or magnetic memory. The selections are made according to any simple or complex configuration of subject ideas, which can be chosen arbitrarily to suit the needs of the request at hand.
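Zatocoding’s “superimposition of random codes” is, in modern terms, essentially a Bloom filter: each descriptor is assigned a fixed pseudo-random set of positions in a code field, a document’s field is the bitwise OR of the codes of all its descriptors, and a query descriptor matches when every one of its positions is marked, at the cost of occasional false drops (but never a missed document). A minimal sketch of the idea—the field width, the number of marks per descriptor, and the hash-based stand-in for Mooers’ random code assignment are all illustrative choices, not values from the report:

```python
import hashlib

FIELD_BITS = 40            # width of the code field (illustrative)
MARKS_PER_DESCRIPTOR = 4   # positions marked per descriptor (illustrative)

def descriptor_code(descriptor: str) -> set:
    """Fixed pseudo-random set of positions for a descriptor.
    Mooers drew random positions once per descriptor; hashing gives a
    repeatable stand-in for that assignment."""
    digest = hashlib.sha256(descriptor.encode()).digest()
    positions = set()
    for byte in digest:
        positions.add(byte % FIELD_BITS)
        if len(positions) == MARKS_PER_DESCRIPTOR:
            break
    return positions

def encode_document(descriptors) -> int:
    """Superimpose the codes of all descriptors into one field (bitwise OR)."""
    field = 0
    for d in descriptors:
        for p in descriptor_code(d):
            field |= 1 << p
    return field

def matches(field: int, descriptor: str) -> bool:
    """Select a document when every position of the query code is marked.
    Superimposition permits occasional 'false drops' but never misses
    a document that was actually indexed under the descriptor."""
    return all(field >> p & 1 for p in descriptor_code(descriptor))

doc = encode_document(["information retrieval", "microfilm", "superimposed coding"])
assert matches(doc, "microfilm")   # an indexed descriptor always matches
```

Mooers sized the field and the number of marks so that, with up to 25 superimposed descriptors, the expected false-drop rate stayed tolerably low; roughly the same trade-off governs Bloom-filter sizing today.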
The abstract storage and reading engine is the organ which stores micrographic copies of 200-word abstracts and the citations for the documents. A single, large, square, semi-transparent sheet carries from a quarter million to a million of such abstracts. These sheets are stored in a stack, and by a mechanism like that of an automatic jukebox record changer, the different sheets are pulled out of the stack to be read by an optical copying television head. This read head, using the two coordinate positions of the wanted abstract, finds the abstract, magnifies it, and electrically copies it into a wire circuit. Many such optical heads are working at the same time in the abstract storage engine. This abstract storage and reading engine fits nicely in an ordinary large-sized room, since the stack is only about 20 feet long.
The abstract printing stations are placed remote from the rest of the engine—at the request desk or in the mailing room for mail service. The process used is a fast dry-printing, employing either ultra-violet sensitive diazo paper, or an electro-sensitive facsimile paper. Photography (silver) and Xerography do not meet nearly as well the requirements for a fast, simple and cheap process for giving a single copy. Presently available equipment, about the size of a table radio and now on the market, can produce about ten 200-word abstracts per minute at each station. There are as many stations in the operation as there are reading heads in the reading engine. The abstracts produced are reasonably clear, and are full-sized and readable without any optical aid.
Such is the DOKEN. It can be built if there is a need for it. Part of the world’s intellectual output is already being abstracted. With cooperation, and less than 10% additional effort, this same information could be put into a DOKEN system. Perhaps this cooperative endeavour will take the pattern so well worked out by Chemical Abstracts with its large corps of volunteer abstractors, and smaller staff of central editors. If so, the cost of the world-wide documentary project could be whittled down to manageable proportions. Support could be on a subscription basis. Bibliographic searches to any request would be finished by return airmail, giving an overnight service to information users.
Smaller versions of the same instrument have a possible use in other situations, such as the whole chemical literature, the U.S. Patent Office, or the files of insurance companies. In such smaller collections, a much more complete subject coding is possible and would certainly be desirable in the case of patents.
With regional DOKENs available, company collections of information on punched cards can be enriched by the inclusion of specially selected items from DOKEN bibliographies. But these bibliographies of abstracts would generally have to be pruned, recoded, and ‘slanted’ into the particular company’s technical viewpoint in order to raise their utility up to the company’s retrieval system threshold value.
In 1959 Mooers coined “Mooers’ law”: An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it. Where an information retrieval system tends not to be used, a more capable information retrieval system may tend to be used even less.
In 1961 Mooers founded the Rockford Research Institute, where he developed the TRAC (Text Reckoning And Compiling) programming language, and attempted to control its distribution and development using trademark law and a unique invocation of copyright. (At the time patent law would not allow him to control what he saw as his intellectual property and profit from it.) Mooers was awarded the American Society for Information Science’s Award of Merit in 1978.