Path: cs.utk.edu!gatech!howland.reston.ans.net!swrinde!ihnp4.ucsd.edu!news.cerf.net!mdcsc!hve
From: hve@mdcsc (Henry Eggers)
Newsgroups: comp.databases.pick
Subject: Re: General PICK DB design
Date: 28 Jun 1994 21:40:48 GMT
Organization: McDonnell Information Systems (MDIS)
Lines: 132
Distribution: world
Message-ID: <2uq5d1$dh8@news.cerf.net>
References: <2ui54f$pj2@hobbes.cc.uga.edu>
Reply-To: heggers@ca.mdis.com
NNTP-Posting-Host: mdcsc.ca.mdis.com
X-Newsreader: TIN [version 1.2 PL0]

Bob Stearns (is@groucho.dev.uga.edu) wrote:
: One of the vendors touts its use of a Pick
: database over a UNIX system as a significant plus due to the capabilities of
: the Pick data model...

I find this question interesting because it partakes of the question, "How is a notion fruitful?", and I therefore essay an answer.

I also find the question interesting because the machine is 'useful', but it's not clear why it's useful. Since the machine is atheoretical, there is no proper justification for it. Since, in the end, it is made up of a fortunate set of elements, different observers see different elements as important. I would hope that views other than this one would be offered.

At the lowest level, data is represented as strings of ASCII characters. Substrings are delimited by a set of four delimiters derived from the IBM 1401. These were the segment mark, the record mark, the group mark and the word mark. They are now called the segment mark, the attribute mark, the value mark and the subvalue mark. From the standpoint of the data model, only the last three marks (attribute, value and subvalue) are used. As a result, each 'object' on 'file' is a sparse three-dimensional array of variable-length strings.

Each 'thing' on 'file' is made up of a name, called an 'item-id', and the sparse array already met. The 'item-id' is a string of ASCII characters and is unique within that 'file'. The sparse array is referred to as an 'item'.
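The item layout just described (attributes split by attribute marks, values by value marks, subvalues by subvalue marks) can be sketched in Python. This is a minimal illustration, not any vendor's implementation: the byte values for the marks (0xFE attribute, 0xFD value, 0xFC subvalue) are the ones Pick systems conventionally use, but the post does not give them, so treat them as an assumption; parse_item, extract and build_item are hypothetical names.

```python
# Assumed byte values for the marks; the post names the marks but not their codes.
AM = "\xfe"   # attribute mark
VM = "\xfd"   # value mark
SVM = "\xfc"  # subvalue mark


def parse_item(raw: str) -> list[list[list[str]]]:
    """Split a raw item string into attributes -> values -> subvalues."""
    return [
        [value.split(SVM) for value in attribute.split(VM)]
        for attribute in raw.split(AM)
    ]


def extract(item: list[list[list[str]]], a: int, v: int = 1, s: int = 1) -> str:
    """Fetch one string by 1-based (attribute, value, subvalue) position.

    Positions outside the stored data return "", which is what makes the
    structure behave like a sparse array: absent cells are simply empty.
    """
    if min(a, v, s) < 1:
        return ""
    try:
        return item[a - 1][v - 1][s - 1]
    except IndexError:
        return ""


def build_item(item: list[list[list[str]]]) -> str:
    """Inverse of parse_item: join subvalues, then values, then attributes."""
    return AM.join(VM.join(SVM.join(subs) for subs in attr) for attr in item)


if __name__ == "__main__":
    raw = "Smith" + AM + "Boston" + VM + "Denver" + AM + "555-1234"
    item = parse_item(raw)
    print(extract(item, 2, 2))  # prints Denver
    print(repr(extract(item, 5)))  # prints '' (attribute 5 does not exist)
```

Nothing is typed: every cell is a string, and a missing cell is just an empty one.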
A 'file' is a collection of 'items', for the moment. An 'item' is an object which is larger than a 'record' and smaller than a 'file'. This appears to be as important as it is unexpected.

An 'item' is (intended) to be made up of all the information about 'something' of the sort of things found in this 'file'. There is a normative prescription that all of the things in a given file should be of the 'same kind'.

An 'item' derives from 'records' about an external thing (a thing in the real world, separate from the computer, the modeling of which is being attempted). Go back to cards: each card is a record; each card is made up of a series of fields. Each field contains a particular kind of data. Within an 'item', each 'attribute' (the thing terminated by an attribute mark) contains 0, 1 or more instantiations of a particular kind of data.

Mapping a card onto an item puts the first 'field' in the first 'attribute', the second 'field' in the second 'attribute', and so forth. Now one goes back to the card reader for the next card associated with this thing (let us presume that there is one) and repeats the process, only now the contents of the first 'field' of the second card become the second value of the first 'attribute', and so on. The delimiters between the multiple values within an 'attribute' are (not surprisingly) called value marks. Therefore 'value mark count', or value number, means 'record number associated with this thing'.

If one were to define a matrix A[i,j] of rows i and columns j, such that each 'card' is a row, then an item is the transpose of that matrix, A[i,j] -> B[j,i], where each row j is the jth attribute and each column i is the ith value. Thus, turning the collection of cards into a single item amounts to transposing the cards after they have been collected.
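The card-to-item mapping is a transpose, B[j,i] = A[i,j]. Here is a minimal Python sketch under the same assumed mark values as before (0xFE attribute mark, 0xFD value mark, not stated in the post); cards_to_item is a hypothetical name, and padding short cards with empty values is a choice made here so that value number i always means card i, not something the post specifies.

```python
AM = "\xfe"  # attribute mark (assumed byte value)
VM = "\xfd"  # value mark (assumed byte value)


def cards_to_item(cards: list[list[str]]) -> str:
    """Transpose a list of cards (rows of fields) into one item string.

    cards[i][j] is field j of card i, i.e. A[i, j]. In the result, attribute
    j holds value i, i.e. B[j, i] = A[i, j]. Short cards are padded with ""
    so every attribute carries exactly one value per card.
    """
    if not cards:
        return ""
    width = max(len(card) for card in cards)
    attributes = []
    for j in range(width):
        values = [card[j] if j < len(card) else "" for card in cards]
        attributes.append(VM.join(values))
    return AM.join(attributes)


if __name__ == "__main__":
    # Two order cards for one customer: date, product, quantity.
    cards = [["1994-06-01", "WIDGET", "3"], ["1994-06-15", "GADGET", "1"]]
    for number, attribute in enumerate(cards_to_item(cards).split(AM), start=1):
        print(number, attribute.split(VM))
```

Attribute 1 then holds both dates, attribute 2 both products and attribute 3 both quantities, so reading the one item returns every card for that customer.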
In this way, the information from field 1 of all of the cards associated with this thing is accumulated in attribute 1, from field 2 in attribute 2, and so forth. The effect is that, by obtaining the 'item', one obtains 'all' of the information about this instantiation of this 'thing'.

Subvalues are the replication of this process with respect to values, and as such are harder to motivate, are used less, and will be ignored hereinafter.

Actual addressing of an item is by the tuple (file-name, item-name); files are defined as items in another file which defines the user's 'account' (another inheritance). An account is just a file which includes the definitions of all files which can be 'seen' from that location. This part appears to be as tautological as it is baffling: the file structure is _just_ that there is a file in which one finds oneself when logged on, which contains the 'list' of all accessible files. Of course, the 'list' of files is instantiated as a series of items, one item per file.

'Files' are implemented so that access to an item by name is of order 1; that is, the time taken to retrieve an item does not increase as the number of items in the 'file' increases (presuming the file is 'correctly allocated', the consideration of which is a historically popular cul-de-sac). The traditional method has been hashed, open bucket. 'Correct allocation' involves having the 'right' number of buckets. 'Open bucket' means that bucket overflow is low cost: just add another page of memory. An arbitrary number of items can be put in a single bucket, but the 'right' number is preferable. The 'right' number is about enough to 'fill up' a page of the underlying memory system.

It has been de facto convenient that, historically, the machine viewed all attached disks as a single, flat, paged virtual address space. While there are numerical 'limitations' in any real instantiation, they have not been major problems.
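A toy model of the hashed, open-bucket file, and of addressing an item by (file-name, item-id) through the account, may make the description concrete. Everything here is a hypothetical sketch: zlib.crc32 stands in for whatever hash a real system used, PAGE_CAPACITY stands in for "enough items to fill a page", and the account stores the file object itself rather than a definition item saying where the file lives.

```python
import zlib
from typing import Optional

PAGE_CAPACITY = 4  # hypothetical number of items that "fill up" one page


class HashedFile:
    """Sketch of a hashed, open-bucket file.

    `modulo` is the number of buckets; each bucket is a chain of pages, and a
    page holds up to PAGE_CAPACITY (item-id, item) pairs.
    """

    def __init__(self, modulo: int) -> None:
        self.modulo = modulo
        self.buckets = [[[]] for _ in range(modulo)]  # one empty page each

    def _bucket(self, item_id: str) -> list:
        # The item-id alone picks the bucket, so lookup cost depends on the
        # length of one bucket, not on how many items the whole file holds.
        return self.buckets[zlib.crc32(item_id.encode()) % self.modulo]

    def write(self, item_id: str, item: object) -> None:
        bucket = self._bucket(item_id)
        for page in bucket:
            for k, (key, _) in enumerate(page):
                if key == item_id:
                    page[k] = (item_id, item)  # replace the existing item
                    return
        if len(bucket[-1]) >= PAGE_CAPACITY:
            bucket.append([])  # overflow is cheap: just add another page
        bucket[-1].append((item_id, item))

    def read(self, item_id: str) -> Optional[object]:
        for page in self._bucket(item_id):
            for key, item in page:
                if key == item_id:
                    return item
        return None


def read_item(account: HashedFile, file_name: str, item_id: str) -> Optional[object]:
    """Address an item by the tuple (file-name, item-id) through the account."""
    target = account.read(file_name)  # the account is itself just a file
    if not isinstance(target, HashedFile):
        return None
    return target.read(item_id)


if __name__ == "__main__":
    account = HashedFile(modulo=3)
    customers = HashedFile(modulo=7)
    account.write("CUSTOMERS", customers)
    customers.write("C100", "Smith\xfeBoston")
    print(read_item(account, "CUSTOMERS", "C100") == "Smith\xfeBoston")  # True
```

With too small a modulo the chains grow longer and reads slow down, which is the 'correct allocation' question; the file still works, it simply stops being order 1.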
Thus far we have:

- Since all of the data is delimited ASCII strings, the data is typeless.
- Since all of the files are hashed, open bucket, the files are typeless.
- Since all of the disks are part of a single, monolithic virtual memory system, there is neither 'memory', nor are there 'disks'.

(I note in passing that there was historically a one-to-one correspondence between processes and terminals, so terminals weren't visible, either.)

The first effect of this is that most of the complexity of dealing with a 'computer' is done away with. There is little to no fiddling with the machine which can be done. At the same time, the 'data' on the machine looks a lot like the data in the real world which it represents.

Add to this a selection and report-generating language which allows fairly rapid, arbitrary and capricious data retrieval, and a procedural language with the intrinsics needed to handle the data structure embedded in an 'item' easily and naturally, and you have a machine on which people who know little to nothing about computers are able to build applications which 'mirror' their business fairly accurately and rapidly, and which they are able to modify very rapidly.

The bottom line appears to be that the data objects are more congruent with the complexity and sloppiness of the real world than those which derive either from databases optimized for the computer hardware or from theoretically 'interesting' databases.

(So, then, lads, let's have a good scrum about it. Who's for taking on the topic, "These are the first persistent objects?")

-- 
Henry V. Eggers       | Opinions are
heggers@ca.mdis.com   | solely mine.
Path: cs.utk.edu!emory!swrinde!ihnp4.ucsd.edu!news.cerf.net!mdcsc!hve
From: hve@mdcsc (Henry Eggers)
Newsgroups: comp.databases.pick
Subject: Re: Large Basic programs in A/P
Date: 28 Jun 1994 22:28:10 GMT
Organization: McDonnell Information Systems (MDIS)
Lines: 36
Message-ID: <2uq85r$et8@news.cerf.net>
References: <2t77hj$ae2@crl.crl.com> <2ul2gs$lbp@werple.apana.org.au>
Reply-To: heggers@ca.mdis.com
NNTP-Posting-Host: mdcsc.ca.mdis.com
X-Newsreader: TIN [version 1.2 PL0]

David Rose (dave@werple.apana.org.au) wrote:
: johnlom@netcom.com (John Lombardo) writes:
: >GOTO and GOSUB statements are typically implemented as an index from
: >the program counter's current location.
: correct !
: > GOTO MyLabel would become PC += (MyLabel-PC) in pseudo-object
: becomes PC += Displacement, where displacement is positive for a forward jump
: and negative for a backward jump. Displacement is a two-byte signed number,
: therefore the maximum distance you can jump is +/- 32k (sound familiar?). This
: is because the program counter is an address register and the displacement
: is added to the address register to find the next instruction.

Correct.

: This was changed by Chandru Murthi at Ultimate to change the distance to
: something like +/- 58k.

None of this was an issue until Basic programs could exceed 32K on R83, which, I note in passing, allowed 64K of object, and checked to make sure that none of the jumps exceeded 32K, thus avoiding the problem cited next.

: >There is an implicit limit on the length of a jump depending on the number of
: [... cut ...]
: >When the program blows up by starting to print BANANA over and over (probably
: >not), or blows up with a basic runtime error (much more likely), you know
: >you've found a limit.

The problem remains that, given the opportunity, a part of the population will write single programs which represent everything.
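The jump arithmetic quoted above (PC += Displacement, with a two-byte signed displacement) can be checked with a small Python sketch. It assumes a big-endian displacement measured from the current PC, as in the quoted pseudo-object; a real instruction set may measure from the next instruction or use another byte order, and encode_jump, take_jump and jump_fits are hypothetical names. Python's int.to_bytes raises OverflowError for a displacement outside -32768..32767, which is exactly the range check a compiler has to make.

```python
MAX_BACKWARD = -(2**15)   # -32768
MAX_FORWARD = 2**15 - 1   # +32767


def jump_fits(pc: int, target: int) -> bool:
    """True if target - pc fits in a two-byte signed displacement."""
    return MAX_BACKWARD <= target - pc <= MAX_FORWARD


def encode_jump(pc: int, target: int) -> bytes:
    """Encode GOTO target as a displacement from the current PC.

    Raises OverflowError when the displacement does not fit in two bytes.
    """
    displacement = target - pc
    return displacement.to_bytes(2, "big", signed=True)


def take_jump(pc: int, encoded: bytes) -> int:
    """PC += Displacement: decode the two bytes and add them to the PC."""
    return pc + int.from_bytes(encoded, "big", signed=True)


if __name__ == "__main__":
    print(take_jump(1000, encode_jump(1000, 200)))  # backward jump: prints 200
    print(jump_fits(0, 40000))  # prints False: 40000 is beyond +32767
```

A program with 64K of object can easily place a label more than 32K away from a GOTO, which is why such a jump has to be caught at compile time rather than silently wrapping around.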
The other part of the problem is that the 'environment' hasn't addressed inter-modular efficiency well enough to encourage reasonable software engineering. Please, someone tell me that I'm wrong, and that it's been attended to by someone.

-- 
Henry V. Eggers       | Opinions are
heggers@ca.mdis.com   | solely mine.