Path: cs.utk.edu!gatech!howland.reston.ans.net!swrinde!ihnp4.ucsd.edu!news.cerf.net!mdcsc!hve
From: hve@mdcsc (Henry Eggers)
Newsgroups: comp.databases.pick
Subject: Re: General PICK DB design
Date: 28 Jun 1994 21:40:48 GMT
Organization: McDonnell Information Systems (MDIS)
Lines: 132
Distribution: world
Message-ID: <2uq5d1$dh8@news.cerf.net>
References: <2ui54f$pj2@hobbes.cc.uga.edu>
Reply-To: heggers@ca.mdis.com
NNTP-Posting-Host: mdcsc.ca.mdis.com
X-Newsreader: TIN [version 1.2 PL0]

Bob Stearns (is@groucho.dev.uga.edu) wrote:

:                  One of the vendors touts its use of a Pick
:database over a UNIX system as a significant plus due to the capabilities of
:the Pick data model...

I find this question interesting, because it partakes of the question, "How
is a notion fruitful?" and therefore essay an answer.  I also find the
question interesting because the machine is 'useful', but it's not clear
why it's useful.  Since the machine is atheoretical, there is not a
proper justification for it.  Since, at the end, it is made up of a
fortunate set of elements, different observers see different elements as
important.  I would hope that there would be views alternate to this one
offered.

At the lowest level, data is represented as strings of ASCII characters.
Substrings are delimited by a set of four delimiters derived from the
IBM 1401.  These were the segment mark, the record mark, the group mark
and the word mark.  These are now called the segment mark, the attribute
mark, the value mark and the subvalue mark.  From the standpoint of
the data model, only the last three marks (attribute, value and subvalue)
are used.

As a result of this, each 'object' on 'file' is a sparse three-dimensional
array of variable-length strings.
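
A minimal sketch of that structure, in Python for want of a neutral
notation.  The delimiter values (characters 254, 253 and 252) are the
conventional ones and are an assumption here, as are the function names,
which are made up for the illustration.

```python
# Assumed delimiter values: the conventional choices, not stated above.
AM = chr(254)   # attribute mark
VM = chr(253)   # value mark
SVM = chr(252)  # subvalue mark

def parse_item(raw):
    """Split a delimited string into attributes -> values -> subvalues."""
    return [[value.split(SVM) for value in attr.split(VM)]
            for attr in raw.split(AM)]

def extract(item, a, v=1, s=1):
    """1-based lookup; anything not present reads as the empty string."""
    try:
        return item[a - 1][v - 1][s - 1]
    except IndexError:
        return ""

# Four attributes: a name, two phone numbers, an empty attribute,
# and one value holding two subvalues.
raw = "SMITH" + AM + "555-1234" + VM + "555-9876" + AM + AM + "RED" + SVM + "BLUE"
item = parse_item(raw)
```

Here extract(item, 2, 2) yields the second phone number, and a reference
past the end, such as extract(item, 9), yields an empty string rather than
an error, which is the sense in which the array is sparse.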

Each 'thing' on 'file' is made up of a name, called an 'item-id' and the
sparse array already met.  The 'item-id' is a string of ASCII characters,
and is unique within that 'file'.  The sparse array is referred to as an
'item'.  A 'file' is a collection of 'items', for the moment.

An 'item' is an object which is larger than a 'record' and smaller than
a 'file'.  This appears to be as important as it is unexpected.

An 'item' is (intended) to be made up of all the information about 'something'
of the sort of things found in this 'file'.   There is a normative prescription
that all of the things in a given file should be of the 'same kind'.

An 'item' derives from 'records' about an external thing (a thing in the 
real world, the modeling of which is being attempted, separate from the
computer).  Go back to cards:  Each card is a record;  each card is
made up of a series of fields.  Each field contains a particular kind of
data.  Within an 'item' each 'attribute' (the thing terminated by an
attribute mark) contains 0, 1 or more instantiations of a particular kind
of data.  Mapping a card onto an item puts the first 'field' in the 
first 'attribute', the second 'field' into the second 'attribute' and
so forth. 

Now one goes back to the card reader for the next card associated with this
thing -- let us presume that there is one -- and repeats the process, only
the contents of the first 'field' of the second card become the second
value of the first 'attribute', and so on.  The delimiters between the
multiple values within an 'attribute' are (not surprisingly) called
value marks.

Therefore, 'value mark count', or value number, means 'record number
associated with this thing'.   If one were to define a matrix, A[i,j]
of rows, i, and columns, j, such that each 'card' is a row, then an
item is the transpose of the matrix, A[i,j] -> B[j,i], where each row, j,
is the jth attribute, and each column, i, is the ith value.  Thus, the
transformation of the collection of cards to a single item involves
a matrix transpose on the cards after they have been collected.

In this way, the information from field 1 of all of the cards associated
with this thing is accumulated in attribute 1, from field 2, in attribute
2, and so forth.  The effect is that, by obtaining the 'item', one obtains
'all' of the information about this instantiation of this 'thing'.
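
The card-to-item mapping can be sketched as a transpose followed by two
joins.  This is an illustration only: the card contents are invented, and
the delimiter values (254 and 253) are the conventional ones, assumed
rather than taken from the text.

```python
from itertools import zip_longest

AM = chr(254)  # attribute mark (assumed value)
VM = chr(253)  # value mark (assumed value)

def cards_to_item(cards):
    """Transpose rows of fields (cards) into attributes of values."""
    # zip_longest pads short cards, so a missing field becomes an empty value.
    columns = zip_longest(*cards, fillvalue="")
    return AM.join(VM.join(column) for column in columns)

# Two hypothetical cards about the same 'thing': date, part, quantity.
cards = [
    ["1994-06-01", "WIDGET", "10"],
    ["1994-06-15", "GADGET", "3"],
]
item = cards_to_item(cards)
```

Attribute 2 of the result holds WIDGET and GADGET as values 1 and 2, so
value number 2 across all attributes reconstructs the second card.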

Subvalues are the replication of this process with respect to values, and
as such are harder to motivate, are used less, and will be ignored
hereinafter.

Actual addressing of an item is by the tuple (file-name, item-id);  files
are defined as items in another file which defines the user's 'account'
(another inheritance).  An account is just a file which includes the
definition of all files which can be 'seen' from that location.  

This part appears to be as tautological as it is baffling:  The file
structure is _just_ that there is a file in which one finds oneself when
logged on, which contains the 'list' of all accessible files.  Of course,
the 'list' of files is instantiated as a series of items, one item per
file.
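
As a rough sketch of that arrangement, with Python dictionaries standing
in for 'files' (the file names, item-ids and the read function are all
hypothetical):

```python
# Each 'file' maps item-id -> item.  The account is itself such a file,
# holding one item per file that can be 'seen' from it.
customers = {"C100": "SMITH", "C101": "JONES"}
orders = {"9001": "C100"}

account = {
    "CUSTOMERS": customers,
    "ORDERS": orders,
}

def read(account, file_name, item_id):
    """Resolve the pair (file-name, item-id) by way of the account."""
    file = account.get(file_name)
    if file is None:
        return None          # the file is not visible from this account
    return file.get(item_id)  # None if there is no such item
```

The point of the sketch is only that looking up a file and looking up an
item are the same operation applied twice.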

'Files' are implemented so that access by name to an item is of order 1,
that is, the time taken to retrieve an item does not increase as the number
of items in the 'file' increases (presuming the file is 'correctly
allocated', the consideration of which is a historically popular
cul-de-sac).  The traditional method has been hashed, open bucket.
'Correct allocation' involves having the 'right' number of buckets.
'Open bucket' means that bucket overflow is low cost:  just add another
page of memory.  An arbitrary number of items can be put in a single
bucket, but the 'right' number is preferable.  The 'right' number
is about enough to 'fill up' a page of the underlying memory system.
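
A toy version of a hashed, open-bucket file follows.  It is a sketch
under assumptions, not the actual implementation: the hash function is
arbitrary, and a Python list stands in for a bucket's chain of pages.

```python
class HashedFile:
    """Toy hashed file: a fixed number of buckets, each free to grow."""

    def __init__(self, modulo):
        self.modulo = modulo  # number of buckets, fixed at allocation
        self.buckets = [[] for _ in range(modulo)]

    def _bucket(self, item_id):
        # Any stable hash of the item-id serves for the illustration.
        h = 0
        for ch in item_id:
            h = (h * 10 + ord(ch)) % self.modulo
        return self.buckets[h]

    def write(self, item_id, item):
        bucket = self._bucket(item_id)
        for i, (key, _) in enumerate(bucket):
            if key == item_id:
                bucket[i] = (item_id, item)  # replace the existing item
                return
        bucket.append((item_id, item))  # overflow: the bucket just grows

    def read(self, item_id):
        # Only one bucket is searched, however many items the file holds.
        for key, item in self._bucket(item_id):
            if key == item_id:
                return item
        return None
```

With too few buckets every read still works, but each bucket becomes a
long list to scan, which is what 'incorrect allocation' costs.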

It has been de facto convenient that, historically, the machine viewed all
attached disks as a single, flat, paged virtual address space.  While
there are numerical 'limitations' in any real instantiation, they have
not been major problems.

Thus far we have:  Since all of the data is delimited ASCII strings,
the data is typeless;  Since all of the files are hashed, open bucket,
the files are typeless;  Since all of the disks are part of a single,
monolithic virtual memory system, there is neither 'memory', nor are there
'disks'.  (I note in passing that there was a historical 1 to 1
correspondence between processes and terminals, so terminals weren't
visible, either).

The first effect of this is that most of the complexity of dealing with
a 'computer' is done away with.   There is little to no fiddling with
the machine which can be done.  At the same time, the 'data' on the
machine looks a lot like the data in the real world which it represents.

Add to this a selection and report generating language which allows
fairly rapid, arbitrary and capricious data retrieval, and a procedural
language with the necessary data language intrinsics to handle the
data structure embedded in an 'item' easily and naturally, and you
have a machine on which people who know little to nothing about
computers are able to build applications which 'mirror' their business
fairly accurately and rapidly, and which they are able to modify
very rapidly.

The bottom line appears to be that the data objects are relatively more
congruent to the complexity and sloppiness of the real world than those
which derive from either computer hardware optimizing data bases or
from theoretically 'interesting' data bases.

(So, then, lads, let's have a good scrum about it.  Who's for taking
on the topic, "These are the first persistent objects"?)

--
Henry V. Eggers         |  Opinions are
heggers@ca.mdis.com     |  solely mine.
Path: cs.utk.edu!emory!swrinde!ihnp4.ucsd.edu!news.cerf.net!mdcsc!hve
From: hve@mdcsc (Henry Eggers)
Newsgroups: comp.databases.pick
Subject: Re: Large Basic programs in A/P
Date: 28 Jun 1994 22:28:10 GMT
Organization: McDonnell Information Systems (MDIS)
Lines: 36
Message-ID: <2uq85r$et8@news.cerf.net>
References: <2t77hj$ae2@crl.crl.com> <johnlomCrFB0E.Fwp@netcom.com> <2ul2gs$lbp@werple.apana.org.au>
Reply-To: heggers@ca.mdis.com
NNTP-Posting-Host: mdcsc.ca.mdis.com
X-Newsreader: TIN [version 1.2 PL0]

David Rose (dave@werple.apana.org.au) wrote:
: johnlom@netcom.com (John Lombardo) writes:

: >GOTO and GOSUB statements are typically implemented as an index from 
: >the program counter's current location.
: correct !

: >  GOTO MyLabel  would become PC += (MyLabel-PC) in pseudo-object
: becomes PC += Displacement, where displacement is positive for a forward jump
: and negative for a backward jump. Displacement is a two byte signed number,
: therefore the maximum distance you can jump is +/- 32k (sound familiar). This
: is because the program counter is an address register and the displacement
: is added to the address register to find the next instruction.
Correct.
: This was changed by Chandru Murthi at Ultimate to change the distance to
: something like +/- 58k.
None of this was an issue until BASIC programs could exceed 32K on R83, which,
I note in passing, allowed 64K of object, and checked to make sure that none
of the jumps exceeded 32K, thus avoiding the problem cited next.
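
A check of that kind can be sketched as follows.  This is not the R83
compiler's code; the function names are invented, and measuring the
displacement from the jump's own address is an assumption.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def displacement(pc, target):
    """Signed offset a relative jump at `pc` needs in order to reach `target`."""
    return target - pc

def check_jumps(jumps):
    """Return the (pc, target) pairs whose displacement will not fit in 16 bits."""
    return [(pc, target) for pc, target in jumps
            if not INT16_MIN <= displacement(pc, target) <= INT16_MAX]

def wrapped(disp):
    """The value a displacement becomes if stored in 16 signed bits unchecked."""
    return (disp + 0x8000) % 0x10000 - 0x8000

# Hypothetical jumps inside a 64K object: (address of jump, address of label).
jumps = [(100, 30000), (40000, 200), (60000, 59000)]
too_far = check_jumps(jumps)
```

Only the jump from 40000 back to 200 is rejected.  Stored unchecked, its
displacement of -39800 would wrap to +25736 and send the program forward
into unrelated code instead of back to the label.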

: >There is an implicit limit on the length of a jump depending on the number of
: [... cut ...]
: >When the program blows up by starting to print BANANA over and over (probably
: >not), or blows up with a BASIC runtime error (much more likely), you know
: >you've found a limit.
The problem remains that, given the opportunity, a part of the population will
write single programs which represent everything.  The other part of the
problem is that the 'environment' hasn't taken on the matter of inter-
modular efficiency well enough to encourage reasonable software engineering.

Please, someone tell me that I'm wrong, and that it's been attended to by
someone.

--
Henry V. Eggers         |  Opinions are
heggers@ca.mdis.com     |  solely mine.