Google
 

Trailing-Edge - PDP-10 Archives - decuslib10-09 - 43,50466/clustr.doc
There are 2 other files named clustr.doc in the archive. Click here to see a list.
                         WESTERN MICHIGAN UNIVERSITY
                               COMPUTER CENTER
 
LIBRARY PROGRAM #1.12.1
CALLING NAME:  CLUSTR
PROGRAMMED AND ADAPTED BY:  RUSSELL BARR III
PREPARED BY:  BILL GRANET & RUSSELL BARR III
STATISTICAL CONSULTANT:  DR. MICHAEL STOLINE
APPROVED BY:  JACK R. MEAGHER
DATE:  FEBRUARY 17, 1976
 
                          SINGLE LINK CLUSTER ANALYSIS PROGRAM
 
TABLE OF CONTENTS
 
1.0  INTRODUCTION
2.0  OPTIONS
3.0  LIMITATIONS
4.0  RESPONSES FOR QUESTIONS-INPUT? AND OUTPUT?
5.0  RESPONSES FOR THE QUESTION-FORMAT?
6.0  PROGRAM QUESTIONS AND HOW TO ANSWER THEM
7.0  SAMPLE TERMINAL AND BATCH RUNS
8.0  REFERENCES
9.0  DELTA ONE-HAT
 
1.0  INTRODUCTION
THIS PROGRAM IS A COMBINATION OF PROGRAMMING FROM COMMUNICATIONS OF THE ACM
(JUNE 1973, PAGES 355-361), COMPUTER JOURNAL (FEBRUARY 1973, PAGES 30-34), AND
SUBSTANTIAL PROGRAMMING BY MR. RUSSELL R. BARR III.
 
GIVEN M OBSERVATIONS ON N VARIABLES, OR A MATRIX WHOSE ELEMENTS ARE MEASURES
OF DEGREE OF ASSOCIATION BETWEEN EACH PAIR OF A SET OF N OBJECTS, THIS PROGRAM
GIVES INDICATIONS OF SIMILARITY WITHIN GROUPS AND DISSIMILARITY BETWEEN GROUPS.
 
THE SINGLE-LINK, OR NEAREST NEIGHBOR, METHOD OF CLUSTERING IS USED.  THE OUTPUT
OF THE CLUSTERING IS GIVEN EITHER AS A DENDROGRAM OR AS A SHADE PLOT OR BOTH AS
THE USER SPECIFIES.
 
2.0  INPUT OPTIONS AND CLUSTER LIMITS
 
2.1  RAW DATA INPUT OPTIONS
 
IF RAW DATA IS SUBMITTED, THIS PROGRAM GENERATES A MATRIX FROM THE M
OBSERVATIONS ON N VARIABLES BY ONE OF FIVE DIFFERENT METHODS SPECIFIED BY THE
USER.
 
THESE MATRICES ARE CALLED:
 
  1)  CORRELATION (PEARSON PRODUCT MOMENT)
  2)  EUCLIDEAN DISTANCE
  3)  MANHATTAN METRIC
  4)  MINKOWSKI METRIC
  5)  CHI SQUARE DISTANCE
 
THESE FORMULAS WILL BE ILLUSTRATED WITH A SMALL DATA SET.  PEARSON PRODUCT
MOMENT CALCULATIONS WILL NOT BE ILLUSTRATED AS IT IS WIDELY KNOWN.  LET THE
FOLLOWING 3 OBSERVATIONS ON 2 VARIABLES BE GIVEN:
 
  3,2
  1,1
  0,4
NOTE: (A) A**2 MEANS A*A.  (B) ABS(A) MEANS ABSOLUTE VALUE OF A.
      (C)<= MEANS LESS THAN OR EQUAL.  (D)< MEANS LESS THAN.
 
(A)  EUCLIDEAN DISTANCES MATRIX ON THE VARIABLES
 
     A(11) A(12)     0   17
                  =
     A(21) A(22)    17    0
 
     WHERE 17 = (3-2)**2 + (1-1)**2 + (0-4)**2  = 1+0+16 = A(12) = A(21)
 
     EUCLIDEAN DISTANCES MATRIX ON THE OBSERVATIONS
 
     A(11) A(12) A(13)      0  5  13
     A(21) A(22) A(23)  =   5  0  10
     A(31) A(32) A(33)     13 10   0
 
     A(12) = (3-1)**2 + (2-1)**2 = 4+1 = 5
     A(13) = (3-0)**2 + (2-4)**2 = 9+4 = 13
     A(23) = (1-0)**2 + (1-4)**2 = 1+9 = 10
 
(B)  MANHATTAN METRIC MATIRX ON THE VARIABLES
 
     A(11) A(12)     0  5
     A(21) A(22)     5  0
 
     A(21) = A(12) = 3-2 + 1-1 + 0-4 = 1+0+4 = 5
 
     MANHATTAN METRIC MATRIX ON THE OBSERVATIONS
 
     A(11) A(12) A(13)     0  3  5
     A(21) A(22) A(23)  =  3  0  4
     A(31) A(32) A(33)     5  4  0
 
     A(12) = ABS(3-1) + ABS(2-1) = (2+1) = 3
     A(13) = ABS(3-0) + ABS(2-4) = (3+2) = 5
     A(23) = ABS(1-0) + ABS(1-4) = (1+3) = 4
 
(C)  MINKOWSKI METRIC MATRIX FOR BOTH VARIABLES AND OBSERVATIONS IS A VARIATION
     OF THE MANHATTAN METRIC WHEREBY THE ABSOLUTE VALUES ARE RAISED TO A
     POWER.
 
(D)  CHI SQUARE DISTANCES MATRIX ON THE VARIABLES
 
     A(11) A(12)     0     3.65
                  =
     A(21) A(22)     3.65  0
 
              (3-E(11))**2       (1-E(12))**2    (0-E(3))**2    (2-E(21))**2
WHERE A(12) = -----------  +     ---------    +  ----------  +  -----------  +
                E(11)              E(12)           E(13)          E(21)
 
              (1-E(22))**2    (4-E(23))**2
              -----------  +  ---------
                E(22)           E(23)
 
                     (3+1+0)           5 (4)   20
WHERE E(11) = (3+2)  -------------  =  ----- = --
                     (3+1+0+2+1+4)      11     11
 
                    4    8                      4   16
      E(12) = (1+1) -- = --     E(13) = (0+4)  -- = --
                    11   11,                   11   11,
 
                     (2+1+4)       7                 7   14
      E(21) = (3+2)  ------- =  5 --  E(22) = (1+1) -- = --
                       11         11,               11   11,
 
                      7   28
      E(23) = (0+4)  -- = --
                     11   11
 
CHI SQUARE DISTANCES MATRIX ON THE OBSERVATIONS
 
     A(11) A(12) A(13)        0   .058   3.6
     A(21) A(22) A(23)  =  .058      0   2.4
     A(31) A(32) A(33)      3.6    2.4    0
 
 
             (3-E(11))**2    (2-E(12))**2    (1-E(21))**2    (1-E(22))**2
     A(12) = -----------  +  -----------  +  -----------  +  -----------
               E(11)           E(12)           E(21)           E(22)
 
                    (3+2)           5                         15
     E(11) = (3+1)  ---------  = 4  -  E(12) = (2+1) (5/7) =  --
                    (3+2+1+1)       7,                         7,
 
                    (1+1)     2                2   6
     E(21) = (3+1)  ----- = 4 -  E(22) = (2+1) - = -
                      7       7,               7   7
 
             (3-E(11))**2    (2-E(12))**2    (0-E(21))**2    (4-E(22))**2
     A(13) = -----------  +  -----------  +  -----------  +  -----------
               E(11)           E(12)           E(21)           E(22)
 
                    (3+2)          5
     E(11) = (3+0)  ---------  = 3 -  E(12) = (2+4)  (5/9) = 40/9,
                    (3+2+0+4)      9
 
                    (0+4)   12                4
     E(21) = (3+0)  ----- = --  E(22) = (2+4) - = 32/9
                      9      9,               9
 
THEREFORE    (3-15/9)**2    (2-40/9)**2    (12/9)**2    (4-32/9)**2
     A(13) = ----------  +  ----------  =  --------  +  ----------
                15/9           40/9         12/9           32/9
 
             (1-E(11))**2     (1-E(12))**2     (0-E(21))**2     (4-E(22))**2
     A(23) = ------------  +  ------------  +  ------------  +  ------------
                E(11)            E(12)            E(21)            E(22)

                    (1+1)          2                2   10
     E(11) = (1+0)  ---------  = 1 -  E(12) = (1+4) - = --
                    (1+1+0+4)      6,               6    6,
 
                    (0+4)      4                 4   20
     E(21) = (1+0)  -----  = 1 -,  E(22) = (1+4) - = --
                      6        6                 6   6
 
2.2  CLUSTER LIMITS
 
THE FOLLOWING OPTIONS DEFINE HOW THE 15 DEGREES OF SHADE ARE DEFINED FROM THE
DISTANCE MATRIX.
 
OPTION 1:  THE RANKS OF THE DISTANCES ARE SORTED AND DIVIDED INTO 15 EQUAL
           INTERVALS.
 
OPTION 2:  THE RANGE OF THE ACTUAL VALUE OF THE DISTANCES IS DIVIDED IN 15
           EQUAL INTERVALS.
 
OPTION 3:  THE USER SUPPLIES THE MINIMUM AND MAXIMUM VALUES FOR THE RANGE
           DESIRED AND THIS IS DIVIDED INTO 15 EQUAL INTERVALS.
 
OPTION 4:  THE USER ENTERS THE STARTING VALUE OF THE FIRST CLUSTER FOLLOWED
           BY THE UPPER LIMITS OF ALL 15 CLUSTERS.
 
3.0  LIMITATIONS
 
IF THE METHOD OF INPUT IS 1,2,3,4,5, (SEE SECTION 2.1) THE MAXIMUM DATA SIZE IS
 
     (A)  K*K+6K+M*N+2L <= 26,000.  K=MAX0(M,N).  L=K*(K-1)/2.  WHERE M IS THE
          NUMBER OF COLUMNS AND N IS THE NUMBER OF ROWS.
 
     (B)  IF (A) IS NOT ATTAINABLE FOR METHOD 1 BUT THE DATA NEED NOT BE
          TRANSPOSED, THE MAXIMUM IS
               M<=95(N IS NOT LIMITED).
          THIS IS USEFUL WHERE N IS LARGE WITH RESPECT TO M.
 
     (C)  IF METHOD 6 OR 7 IS USED THE MAXIMUM MATRIX IS <= 95.
 
IF THE ORDER OF THE DISTANCE MATRIX FORMED BY THE INPUT IS LARGER THAN 100,
NEITHER THE SHADE DIAGRAM NOR THE DENDROGRAM CAN BE PRINTED.  IF THE ORDER OF
THE DISTANCE MATRIX IS LARGER THAN 50, THE DENDROGRAM CANNOT BE PRINTED.
WHENEVER THE DENDROGRAM IS NOT PRINTED, THE ORDERED INPUT AND THEIR COMPUTED
DISTANCES OR HEIGHTS ARE PRINTED SO THE USER MAY CREATE HIS OWN.
 
NOTE:  THESE LIMITATIONS ARE BASED ON THE CURRENT USER CORE AREA OF 40K.
       ANY INCREASE ABOVE 40K SHOULD BE ADDED TO 26000 IN THE ABOVE FORMULA. 
4.0  RESPONSES FOR QUESTIONS-INPUT? AND OUTPUT?
 
THE RESPONSES TO THESE QUESTIONS DEFINE WHERE THE USER INTENDS TO WRITE HIS
OUTPUT FILE (OUTPUT?) AND FROM WHERE THE USER EXPECTS TO READ HIS INPUT DATA
(INPUT?).
 
THE NORMAL RESPONSE TO EACH OF THESE QUESTIONS CONSISTS OF THREE BASIC PARTS:
A DEVICE, A FILENAME, AND A PROJECT-PROGRAMMER NUMBER.
 
THE GENERAL FORMAT FOR THESE THREE PARTS IS AS FOLLOWS:
 
     DEV:FILE.EXT[PROJ,PROG]
 
1)  DEV:  ANY OF THE FOLLOWING DEVICES ARE APPROPRIATE WHERE INDICATED:
 
               DEVICE LIST            DEFINITION            STATEMENT USE
 
                  TTY:                TERMINAL              INPUT OR OUTPUT
                  DSK:                DISK                  INPUT OR OUTPUT
                  CDR:                CARD READER           INPUT ONLY
                  LPT:                LINE PRINTER          OUTPUT ONLY
                  DTA0:               DECTAPE 0             INPUT OR OUTPUT
                  DTA1:               DECTAPE 1             INPUT OR OUTPUT
                  DTA2:               DECTAPE 2             INPUT OR OUTPUT
                  DTA3:               DECTAPE 3             INPUT OR OUTPUT
                  DTA4:               DECTAPE 4             INPUT OR OUTPUT
                  DTA5:               DECTAPE 5             INPUT OR OUTPUT
                  DTA6:               DECTAPE 6             INPUT OR OUTPUT
                  DTA7:               DECTAPE 7             INPUT OR OUTPUT
                  MTA0:               MAGNETIC TAPE 0       INPUT OR OUTPUT
                  MTA1:               MAGNETIC TAPE 1       INPUT OR OUTPUT
 
DEVICES MAY BE SPECIFIED BY LOGICAL OR PHYSICAL NAMES.  THE DEVICE LIST COLUMN
HAS PHYSICAL NAMES.
 
INPUT MAY NOT BE DONE FROM THE LINE PRINTER NOR MAY OUTPUT GO TO THE CARD
READER.
 
2)  FILE.EXT IS THE NAME AND EXTENSION OF THE FILE TO BE USED.  THIS PART OF
    SPECIFICATION IS USED ONLY IF DISK OR DECTAPE IS USED.
 
3)  [PROJ,PROG]  IF A DISK IS USED AND THE USER WISHES TO READ A FILE IN
    ANOTHER PERSON'S DIRECTORY, HE MAY DO SO BY SPECIFYING THE PROJECT-
    PROGRAMMER NUMBER OF THE DIRECTORY FROM WHICH HE WISHES TO READ.  THE
    PROJECT NUMBER AND THE PROGRAMMER NUMBER MUST BE SEPARATED BY A COMMA AND
    ENCLOSED IN BRACKETS.  OUTPUT MUST GO TO YOUR OWN AREA.
 
    EXAMPLE:
              OUTPUT?  LPT:/COPIES:2
              INPUT?   DSK:DATA.DAT[71171,571026]
 
    IN THE EXAMPLE, TWO COPIES OF THE OUTPUT ARE TO BE PRINTED BY THE HIGH
    SPEED LINE PRINTER.  THE INPUT DATA IS A DISK FILE OF NAME DATA.DAT IN USER
    DIRECTORY [71171,71026].
 
    DEFAULTS:
 
    1)  IF NO DEVICE IS SPECIFIED BUT A FILENAME IS SPECIFIED, THE DEFAULT
        DEVICE WILL BE DSK:.
 
    2)  IF NO FILENAME IS SPECIFIED AND A DISK OR DECTAPE IS USED, THE DEFAULT
        ON INPUT WILL BE FROM INPUT.DAT:  ON OUTPUT IT WILL BE OUTPT.DAT.
 
    3)  IF THE PROGRAM IS RUN FROM THE TERMINAL AND NO SPECIFICATION IS GIVEN
        (JUST A CARRIAGE RETURN), BOTH INPUT AND OUTPUT DEVICES WILL BE THE
        TERMINAL.
 
    4)  IF THE PROGRAM IS RUN THROUGH BATCH AND NO SPECIFICATION IS GIVEN (A
        BLANK CARD), THE INPUT DEVICE WILL BE CDR: AND THE OUTPUT DEVICE WILL
        BE LPT:.
 
    5)  IF NO PROJECT-PROGRAMMER NUMBER IS GIVEN, THE USER'S OWN NUMBER WILL BE
        ASSUMED.
 
NOTE:  (1)  THE FOLLOWING RESPONSES TO OUTPUT? AND INPUT? ARE VALID AFTER THEIR
            FIRST OCCURRENCE.
 
            (A)  SAME
 
                 UPON RETURNING TO INPUT?, IF THE SAME DATA FILE IS TO BE USED
                 AGAIN, SIMPLY ENTER "SAME"; OTHERWISE, EITHER USE THE "FINISH"
                 OPTION OR ENTER ANOTHER FILENAME, ETC.
 
            (B)  CONTINUE
 
                 FOR MAGTAPES, THIS MEANS DO NOT REREAD THE SAME PART OF TAPE
                 AS BEFORE RATHER READ THE NEXT PART OF TAPE.  FOR DISK OR
                 DECTAPE, THE RESULT OF USING CONTINUE IS THE SAME AS USING THE
                 SAME OPTION.
 
            (C)  FINISH

                 THE USER MUST ENTER EITHER FINISH, FINI, OR  ^Z TO BRANCH OUT OF
                 THIS PROGRAM.  FAILURE TO DO SO MIGHT RESULT IN LOSING THE
                 ENTIRE OUTPUT FILE.
 
            (D)  /OUTPUT
 
                 IF ONE ENTERS THIS, THE NEXT PROMPTING WILL BE OUTPUT?  NOW
                 ONE MAY SPECIFY AN OUTPUT DEVICE AND/OR FILENAME DIFFERENT
                 FROM THE ONE USED IN RESPONSE TO THE PREVIOUS OUTPUT?.  UNLESS
                 A "CONTINUE" IS SPECIFIED FOR THE "OUTPUT?", THE PREVIOUS
                 OUTPUT FILE WILL BE CLOSED (AND PRINTED IF LPT).
 
       (2)  (FOR HELP TYPE HELP)-- WILL ALSO BE PRINTED THE FIRST TIME OUTPUT?
            AND INPUT? ARE PRINTED.  AFTER THAT, THIS MESSAGE WILL NOT BE
            PRINTED.
5.0  RESPONSES FOR THE QUESTION-FORMAT
 
THERE ARE 3 OPTIONS AVAILABLE FOR THE FORMAT; NAMELY:
 
       (A)  STANDARD FORMAT OPTION
 
            UNLESS OTHERWISE SPECIFIED, THE PROGRAM ASSUMES THE STANDARD OPTION.
            IN THIS OPTION, THE DATA IS REARRANGED IN GROUPS OF 10 PER LINE,
            TWO VALUES BEING SEPARATED BY A COMMA.
 
       (B)  OBJECT TIME FORMAT OPTION
 
            IF THE DATA IS SUCH THAT A USER'S OWN FORM IS REQUIRED, SIMPLY
            ENTER A LEFT PARENTHESIS FOLLOWED BY THE FIRST FORMAT SPECIFICATION,
            A COMMA AND THEN THE SECOND SPECIFICATION, ETC.  WHEN YOU FINISH,
            ENTER A RIGHT PARENTHESIS AND THEN A CARRIAGE RETURN.  THERE CAN BE
            A MAXIMUM OF 5 LINES FOR THE FORMAT, EACH LINE BEING 80 COLUMNS
            LONG.
 
       (C)  SAME OPTION
 
            IF THE FORMAT TO BE USED DURING THE SECOND OR SUBSEQUENT OCCURRENCE
            OF THE LINE, FORMAT (F-TYPE ONLY) IS THE SAME AS THE PREVIOUS
            FORMAT, SIMPLY ENTER "SAME<CR>".
 
6.0  PROGRAM QUESTIONS AND HOW TO ANSWER THEM
 
IN THE FOLLOWING SECTION, THE SYMBOL "<CR> MEANS PRESS THE "RETURN" BUTTON.
 
6.1  OUTPUT?
 
SEE SECTION 4.0.  THE NEXT QUESTION IS 6.2.
 
6.2  INPUT?
 
SEE SECTION 4.0.  THE NEXT QUESTION IS 6.3.
 
6.3  ENTER OUTPUT IDENTIFICATION IF DESIRED, ELSE TYPE <CR>
 
TYPE DESIRED OUTPUT IDENTIFICATION.  IF NONE IS DESIRED, SIMPLY PRESS THE
RETURN BUTTON.  THE NEXT QUESTION IS 6.4.
 
6.4  ENTER TYPE OF INPUT, TYPE <CR> FOR HELP
 
THIS QUESTION REQUESTS THE NUMBER OF THE DESIRED INPUT METHOD.  TYPING <CR>
PRINTS THE FOLLOWING EXPLANATION:
 
     TYPE:
     1  FOR RAW DATA, CORRELATION COMPUTED
     2  FOR RAW DATA, EUCLIDEAN DISTANCES COMPUTED
     3  FOR RAW DATA, MANHATTAN METRIC COMPUTED
     4  FOR RAW DATA, MINKOWSKI METRIC COMPUTED
     5  FOR RAW DATA, CHI SQUARE COMPUTED
     6  FOR LOWER TRIANGULAR MATRIX OF DISTANCES
     7  FOR LOWER TRIANGULAR MATRIX OF CORRELATION
     ENTER TYPE OF INPUT, TYPE <CR> FOR HELP
 
SEE SECTION 7.1 FOR AN EXAMPLE OF HOW TO ENTER DATA IN METHOD 6 OR 7.  WHEN
YOU HAVE RESPONDED WITH ONE OF THE METHODS ABOVE, THE NEXT QUESTION IS 6.5.
 
6.5  ENTER CLUSTER LIMITS CONTROL CODE, TYPE <CR> FOR HELP
 
THE QUESTION REQUESTS THE NUMBER OF THE METHOD YOU WOULD LIKE TO USE TO SPECIFY
THE 15 INTERVALS.  TYPING <CR> PRINTS THE FOLLOWING EXPLANATION:
 
     TYPE:
     1  TO DIVIDE THE RANKS OF DISTANCES INTO 15 EQUAL INTERVALS
     2  TO DIVIDE THE ACTUAL DISTANCES INTO 15 EQUAL INTERVALS
     3  TO SPECIFY THE INTERVAL MAX AND MIN
     4  TO ENTER THE ENDPOINTS OF THE INTERVALS
     ENTER CLUSTER LIMITS CONTROL CODE, TYPE <CR> FOR HELP
 
THESE OPTIONS ARE EXPLAINED IN SECTION 2.2.  WHEN YOU HAVE RESPONDED WITH ONE
OF THE CODES ABOVE, THE NEXT QUESTION IS 6.6.
 
6.6  ENTER METHOD(S) OF OUTPUT, TYPE <CR> FOR HELP
 
TYPING <CR> PRINTS THE FOLLOWING EXPLANATION:
 
     TYPE:
     1  FOR SHADE DIAGRAM
     2  FOR DENDROGRAM
     3  FOR BOTH
 
WHEN YOU HAVE RESPONDED WITH ONE OF THE METHODS ABOVE, THE NEXT QUESTION IS 6.7.
 
6.7  FORMAT (F-TYPE ONLY)
 
SEE SECTION 5.0.  IF YOUR RESPONSE TO QUESTION 6.4 WAS '6' OR '7', THE NEXT
QUESTION IS 6.10.  OTHERWISE, IT IS 6.8.
 
6.8  ENTER NUMBER OF ROWS AND COLUMNS (I.E., 13,6)
 
ENTER THE SIZE OF THE INPUT DATA MATRIX.  IF THE RESPONSE TO QUESTION 6.4 WAS
'4', THE NEXT QUESTION IS 6.9; OTHERWISE, GO TO LINE 6.13.
 
6.9  ENTER POWER FOR MINKOWSKI METRIC
 
SEE SECTION 2.0 FOR EXPLANATION.  IF YOU RESPONDED WITH A '3' TO QUESTION 6.5,
THE NEXT QUESTION IS 6.11.  IF YOU RESPONDED TO QUESTION 6.5 WITH A '4', THE
NEXT QUESTION IS 6.12; OTHERWISE, GO TO LINE 6.13.
 
6.10  HOW MANY VARIABLES
 
ENTER THE SIZE OR "ORDER" OF THE DISTANCE OR CORRELATION MATRIX (WHICHEVER IS
APPLICABLE).  IF YOU RESPONDED TO QUESTION 6.5 WITH A '3', THE NEXT QUESTION
IS 6.11.  IF YOU RESPONDED TO QUESTION 6.5 WITH A '4', THE NEXT QUESTION IS
6.12; OTHERWISE, GO TO LINE 6.13.
 
6.11  ENTER CLUSTER LIMITS (MIN,MAX)
 
ENTER A PAIR OF NUMBERS SEPARATED BY A COMMA REPRESENTING THE RANGE FROM THE
BOTTOM OF THE LOWEST CLUSTER TO THE TOP OF THE HIGHEST CLUSTER.  GO TO LINE
6.13.
 
6.12  ENTER END POINTS OF CLUSTERS (8 PER LINE)
 
SEE SECTION 2.2.  GO TO LINE 6.13.
 
6.13  IF "?NOT ENOUGH CORE AVAILABLE" IS PRINTED, THE JOB IS TOO LARGE AND THE
NEXT QUESTION IS 6.1.  SEE SECTION 5.0.  IF THE SINGLE MESSAGE IS "?NOT ENOUGH
CORE AVAILABLE TO KEEP DATA IN CORE-CONTINUING" IS PRINTED, THE NEXT QUESTION
IS 6.15.  IF YOU RESPONDED WITH A '6' OR '7' TO QUESTION 6.4, THE NEXT QUESTION
IS 6.15; OTHERWISE, THE NEXT QUESTION IS 6.14.
 
6.14  TYPE:
      1  FOR CLUSTERING BY VARIABLE
      2  FOR CLUSTERING BY OBSERVATION
      3  FOR BOTH
 
SEE SECTION 2.0.  IF INPUT IS FROM THE TERMINAL, THE NEXT LINE IS 6.15;
OTHERWISE, IT IS 6.16.
 
6.15  ENTER DATA
 
ENTER YOUR DATA ACCORDING TO THE FORMAT YOU SPECIFIED AT LINE 6.7.  IF YOU
SPECIFIED '2' TO QUESTION 6.6, THE NEXT QUESTION IS 6.18; OTHERWISE, IT IS 6.17.
 
6.17  DO YOU WISH TO HAVE A SHADE DIAGRAM BEFORE CLUSTERING?(YES OR NO)
 
RESPOND WITH A "YES" IF YOU WISH TO HAVE A SHADE DIAGRAM BOTH BEFORE AND AFTER
CLUSTERING.  THE NEXT QUESTION IS 6.18.
 
6.18  DO YOU WISH TO HAVE LABELS IN THE OUTPUT?(YES OR NO)
 
IF YOU WANT TO ENTER A LABEL FOR EACH COLUMN ENTER A "YES"; OTHERWISE, ENTER A
"NO".  IF YOU RESPONDED WITH A "YES" TO THIS QUESTION, THE NEXT QUESTION IS
6.19; OTHERWISE, THE NEXT QUESTION FOLLOWS THE OUTPUT AND IS 6.1.
 
6.19  ENTER THE ### LABELS, ONE PER LINE (10 CHARACTERS MAX)
 
### STANDS FOR THE NUMBER OF COLUMNS YOU ENTERED IN EITHER QUESTION 6.8 OR 6.10.
IF YOU ANSWERED "3" TO QUESTION 6.14, THIS QUESTION IS ASKED TWICE.  THE NEXT
QUESTION IS 6.1.
 
7.0  SAMPLE TERMINAL AND BATCH RUNS
 
7.1  SAMPLE TERMINAL RUN
 
IN THIS EXAMPLE INPUT IS FROM THE TERMINAL AND OUTPUT IS TO THE TERMINAL.
"<CR>" MEANS PRESS THE RETURN BUTTON.  EXCEPT FOR PROMPTINGS ALL
 INFORMATION ON THE SAME LINE WITH <CR> AND PRECEDING <CR> ARE
 ENTERED BY THE USER.  28 LINES OF DATA FOLLOWING ENTER DATA ARE
ENTERED BY USER.


.R CLUSTR<CR>

WMU SINGLE-LINK CLUSTER ANALYSIS

OUTPUT? (for help type HELP) TTY:<CR>
INPUT? (for help type HELP) TTY:<CR>

ENTER OUTPUT IDENTIFICATION IF DESIRED, ELSE TYPE <RETURN>
TERMINAL TEST<CR>

ENTER TYPE OF INPUT, TYPE <RETURN> FOR HELP
7<CR>

ENTER CLUSTER LIMITS CONTROL CODE, TYPE <RETURN> FOR HELP
3<CR>

ENTER METHOD(S) OF OUTPUT, TYPE <RETURN> FOR HELP
3<CR>

FORMAT: (F-TYPE ONLY)
(18F4.3)<CR>

HOW MANY VARIABLES? 24<CR>

ENTER CLUSTER LIMITS(MIN,MAX) 0,.75<CR>

ENTER DATA
 318
 403 317
 468 230 305
 321 285 247 227
 335 234 268 327 622
 304 157 223 335 656 722
 332 157 382 391 578 527 619
 326 195 184 325 723 714 685 532
 116 057 075 099 311 203 246 285 170
 308 150 091 110 344 353 232 300 280 484
 314 145 140 160 215 095 181 271 113 585 428
 489 239 321 327 344 309 345 395 280 408 535 512
 125 103 177 066 280 292 236 252 260 172 350 131 195
 238 131 065 127 229 251 172 175 248 154 240 173 139 370
 414 272 263 322 187 291 180 296 242 124 314 119 281 412 325
 176 005 177 187 208 273 228 255 274 289 362 278 194 341 345 324
 368 255 211 251 263 167 159 250 208 317 350 349 323 201 334 344 448
 270 112 312 137 190 251 226 274 274 190 290 110 263 206 192 258 324 358
 365 292 297 339 398 435 451 427 446 173 202 246 241 302 272 388 262 301
 167
 369 306 165 349 318 263 314 362 266 405 399 355 425 183 232 348 173 357
 331 413
 413 232 250 380 441 386 396 357 483 160 304 193 279 243 246 283 273 317
 342 463 374
 474 348 383 335 435 431 405 501 504 262 251 350 382 242 256 360 287 272
 303 509 451 503
 282 211 203 248 420 433 437 388 424 531 412 414 358 304 165 262 326 405
 374 366 448 375 434

DO YOU WISH TO HAVE A SHADE DIAGRAM
BEFORE CLUSTERING?(YES OR NO) YES<CR>

DO YOU WISH TO HAVE LABELS IN THE OUTPUT?(YES OR NO) NO<CR>
TERMINAL TEST                                                                   

SHADE DIAGRAM BEFORE CLUSTERING

		          1     H
		          2     *H
		          3     X*H
		          4     X+*H
		          5     *=++H
		          6     *+=*HH
		          7     *=+*HHH
		          8     *=XXHHHH
		          9     *==*HHHHH
		         10     ----*++==H
		         11     *---*X+==XH
		         12     *--=+-==-HXH
		         13     X+*****X=XHHH
		         14     --=-==+===*-=H
		         15     +---+===+=+=-XH
		         16     X==*====+-*-=X*H
		         17     =.==+=+===X==***H
		         18     X=+====++****+**XH
		         19     =-*-==+====-=+==*XH
		         20     X==*XXXXX=+++*=X=*=H
		         21     X*=**=*X=XXXX=+*=X*XH
		         22     X++XXXXXX=*==++==**XXH
		         23     X*X*XXXHH==*X+=X==*HXHH
		         24     =+++XXXXXHXXX*==*XXXXXXH

SYMBOLS      .   -   -   =   +   =   *   X   X   X   H   H   H   H   H   
TERMINAL TEST                                                                   

DELTA-ONE-HAT =    0.1921630E+00


SHADE DIAGRAM AFTER CLUSTERING

AT 2ND SHADE M=        24

		          2     H
		         15     -H
		         19     -=H
		          3     *-*H
		         17     .**=H
		         18     =*X+XH
		         14     -X+=*+H
		         16     =*==**XH
		         21     *+*==X=*H
		          4     +--*==-**H
		         22     ++*+=*+=XXH
		          8     ===X=+==XXXH
		          6     +========*XHH
		          7     ==+++=+=**XHHH
		          5     =+=++===*+XHHHH
		          9     =+===+=+=*XHHHHH
		         20     =====**XX*XXXXXXH
		         23     *=*X==+XX*HHXXXHHH
		          1     *+=X=X-XXXX*****XXH
		         11     -+=-X***X-*=X+*=+=*H
		         13     +-=*=*==X*=X***=+XXHH
		         10     -==-=*=-X-==++*===-XXH
		         12     -=--=*--X===-=+-+**XHHH
		         24     +=X+*X*=X+XXXXXXXX=XXHXH

SYMBOLS      .   -   -   =   +   =   *   X   X   X   H   H   H   H   H   
DENDROGRAM

CASE                                                       
NUMBER       1 1   1 1 1 1 2   2           2 2   1 1 1 1 2 
           2 5 9 3 7 8 4 6 1 4 2 8 6 7 5 9 0 3 1 1 3 0 2 4 

           I I I I I I I I I I I I I I I I I I I I I I I I
     0.277 I I I I I I I I I I I I I I -+- I I I I I I I I
           I I I I I I I I I I I I I I  I  I I I I I I I I
     0.278 I I I I I I I I I I I I -+-  I  I I I I I I I I
           I I I I I I I I I I I I  I   I  I I I I I I I I
     0.286 I I I I I I I I I I I I  --+--  I I I I I I I I
           I I I I I I I I I I I I    I    I I I I I I I I
     0.381 I I I I I I I I I I I --+---    I I I I I I I I
           I I I I I I I I I I I   I       I I I I I I I I
     0.415 I I I I I I I I I I I   I       I I I I I -+- I
           I I I I I I I I I I I   I       I I I I I  I  I
     0.465 I I I I I I I I I I I   I       I I I -+-  I  I
           I I I I I I I I I I I   I       I I I  I   I  I
     0.469 I I I I I I I I I I I   I       I I I  I   -+--
           I I I I I I I I I I I   I       I I I  I    I
     0.488 I I I I I I I I I I I   I       I I I  --+---
           I I I I I I I I I I I   I       I I I    I
     0.491 I I I I I I I I I I I   I       -+- I    I
           I I I I I I I I I I I   I        I  I    I
     0.496 I I I I I I I I I I I   ----+-----  I    I
           I I I I I I I I I I I       I       I    I
     0.497 I I I I I I I I I I ----+----       I    I
           I I I I I I I I I I     I           I    I
     0.511 I I I I I I I I I I     I           --+---
           I I I I I I I I I I     I             I
     0.526 I I I I I I I I I I     -------+-------
           I I I I I I I I I I            I
     0.532 I I I I I I I I I ------+-------
           I I I I I I I I I       I
     0.549 I I I I I I I I ----+----
           I I I I I I I I     I
     0.552 I I I I -+- I I     I
           I I I I  I  I I     I
     0.586 I I I I  I  I ---+---
           I I I I  I  I    I
     0.588 I I I I  I  --+---
           I I I I  I    I
     0.595 I I I I  --+---
           I I I I    I
     0.597 I I I --+---
           I I I   I
     0.626 I I --+--
           I I   I
     0.630 I --+--
           I   I
     0.652 --+--


INPUT? FINISH<CR>

END OF EXECUTION
CPU TIME: 2.31  ELAPSED TIME: 11.05
EXIT

 
7.2  SAMPLE BATCH SETUP
 
IN THIS EXAMPLE THE DATA IS ON DISK IN THE FILE SLINK.DAT AND THE OUTPUT IS TO
THE LINE PRINTER.  THE COMMENTS TO THE RIGHT ARE NOT PART OF THE CONTROL DECK.
 
   $JOB[###,###]                          ###,### IS REPLACED BY THE USER'S OWN
                                          PROJECT-PROGRAMMER NUMBER.
   $PASSWORD ####                         #### STANDS FOR THE USER'S PASSWORD
   .R CLUSTR                              PROGRAM NAME
   LPT:                                   OUTPUT TO LINE PRINTER
   SLINK.DAT                              INPUT FILE
   BATCH EXAMPLE                          IDENTIFICATION LINE
   7                                      INPUT IS A CORRELATION MATRIX
   1                                      INTERVAL CONTROL CODE
   3                                      TYPE OF OUTPUT
   (9F8.2)                                FORMAT
   18                                     SIZE OF MATRIX
   YES                                    SHADE DIAGRAM BEFORE CLUSTERING
   NO                                     NO LABELS
   FINISH                                 NO MORE DATA SETS
   (EOF)                                  END OF FILE CARD
 
8.0  REFERENCES
 
(A)  "MATHEMATICAL TAXONOMY" BY JARDINE AND SIBSON; PUBLISHED BY JOHN WILEY,
     1971.
 
(B)  COMMUNICATIONS OF THE ACM, JUNE 1973, PAGES 355-361.
 
(C)  THE COMPUTER JOURNAL, FEBRUARY 1973, PAGES 30-34.
 
9.0  DELTA-ONE-HAT
 
   DELTA-ONE-HAT = SUM(D(I,J)-D*(I,J))/ SUM D(I,J).  WHERE D(I,J) IS THE DISTANCE
	          (I<J)                (I<J)
BETWEEN THE ITH AND JTH OBJECT (OBSERVATION OR VARIABLE) AND D*(I,J) IS THE
SMALLEST HEIGHT SUCH THAT OBJECTS (OBSERVATIONS OR VARIABLES) I AND J ARE
INCLUDED IN A CLUSTER OF THAT HEIGHT.  DELTA-ONE-HAT SATISFIES THE RELATIONSHIP
0<= DELTA-ONE-HAT <= 1.  A SMALL   DELTA-ONE-HAT INDICATES THAT THE DATA IS AMENABLE
TO SINGLE-LINK CLASSIFICATION.  SEE THE REFERENCES (A) AND (C) FOR MORE DETAILS.