An Application of Posets in Cluster Analysis - Lecture Slides | 510 511, Study notes of World History

Material Type: Notes; Class: 510 - COLLOQ COMP HIST; Subject: HISTORY, EUROPEAN; University: Rutgers University; Term: Fall 2004;

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

Cluster Analysis: An Application of Posets?

Melvin F. Janowitz
DIMACS, Rutgers University
Piscataway, NJ 07641

Earlier versions of this talk were given at École Nationale Supérieure des Télécommunications de Bretagne on October 30, 2004, at DIMACS on March 9, 2005, at Société Francophone de Classification on May 31, 2005, and at the IFCS meeting in July, 2006.

Think about two important areas of data analysis.

Cluster analysis: Input is attributes on a finite set E (numerical, nominal, binary). Convert these to a dissimilarity coefficient (DC). A DC is a mapping d: E × E → ℝ⁺₀ (the nonnegative reals) such that

  • d(x, y) = d(y, x).
  • d(x, x) = 0 for all x.

Fact: T_d(h) = {{x, y} : d(x, y) ≤ h} is an equivalence relation for all h if and only if d is an ultrametric, in that

d(x, y) ≤ max{d(x, z), d(y, z)} for all x, y, z ∈ E
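The equivalence between the ultrametric inequality and transitivity of the threshold relations can be checked by brute force on small examples. The following is a minimal Python sketch; the three-point matrices `ultra` and `not_ultra` and all function names are hypothetical illustrations, not taken from the slides.

```python
from itertools import product

def is_ultrametric(d, points):
    """True if d(x, y) <= max(d(x, z), d(y, z)) for all x, y, z."""
    return all(d[x][y] <= max(d[x][z], d[y][z])
               for x, y, z in product(points, repeat=3))

def threshold_is_equivalence(d, points, h):
    """True if the relation 'd(x, y) <= h' is transitive.

    Reflexivity and symmetry already follow from the DC axioms for h >= 0,
    so transitivity is the only property left to test.
    """
    return all(d[x][z] <= h
               for x, y, z in product(points, repeat=3)
               if d[x][y] <= h and d[y][z] <= h)

def levels(d, points):
    """The distinct values taken by d, in increasing order."""
    return sorted({d[x][y] for x in points for y in points})

# Hypothetical toy data (not from the slides).
points = ["a", "b", "c"]
ultra = {"a": {"a": 0, "b": 1, "c": 2},
         "b": {"a": 1, "b": 0, "c": 2},
         "c": {"a": 2, "b": 2, "c": 0}}
not_ultra = {"a": {"a": 0, "b": 1, "c": 3},
             "b": {"a": 1, "b": 0, "c": 1},
             "c": {"a": 3, "b": 1, "c": 0}}

for name, d in [("ultra", ultra), ("not_ultra", not_ultra)]:
    all_equiv = all(threshold_is_equivalence(d, points, h)
                    for h in levels(d, points))
    print(name, is_ultrametric(d, points), all_equiv)
```

In `not_ultra`, a and c are each within 1 of b but are 3 apart, so the relation at h = 1 is not transitive and the ultrametric inequality fails for the same triple.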

A cluster method is often viewed as a transformation of a DC into an ultrametric. One common method is called single linkage clustering: transform T_d(h) into its transitive closure for each h.
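Taking the transitive closure at every level h amounts to replacing d(x, y) by the smallest level at which x and y can be chained together. One way to sketch this in Python is a Floyd-Warshall style pass that uses (min, max) in place of (min, +). The function name and the toy data are hypothetical, and this is an illustration rather than the method used in the talk.

```python
def single_linkage_ultrametric(d, points):
    """Return u where u[x][y] is the smallest h such that x and y lie in the
    same class of the transitive closure of T_d(h)."""
    u = {x: dict(d[x]) for x in points}
    for z in points:                      # allow chains passing through z
        for x in points:
            for y in points:
                via_z = max(u[x][z], u[z][y])
                if via_z < u[x][y]:
                    u[x][y] = via_z
    return u

# Hypothetical toy DC (not from the slides): a and c are far apart,
# but both are close to b.
points = ["a", "b", "c"]
d = {"a": {"a": 0, "b": 1, "c": 3},
     "b": {"a": 1, "b": 0, "c": 1},
     "c": {"a": 3, "b": 1, "c": 0}}

u = single_linkage_ultrametric(d, points)
print(u["a"]["c"])  # 1: a and c are chained through b at level 1
```

The result never exceeds the original d and satisfies the ultrametric inequality, which is why single linkage can be read as a transformation of a DC into an ultrametric.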

Formal concept analysis: Input is attributes on a finite set E (numerical, nominal, binary). Convert these to binary attributes. Classify E by grouping objects together on the basis of shared attributes. Idea: for each set of attributes, group together the objects having these attributes.

Totally different philosophies:

Cluster analysis: Classify on basis of a summary of attributes

Formal concept analysis: Classify directly from individual attributes

Cluster algorithms

  • E is the object set.
  • d is a DC on E taking values in ℝ⁺₀.
  • Extend d to disjoint nonempty subsets A, B.
  • The image of d is h_1 < h_2 < ... < h_k.
  • Form the transitive closure R_1 of T_d(h_1).
  • The new set E_1 consists of the classes of R_1.
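A single step of this scheme might be sketched in Python as follows. The union-find helper, the `link` parameter (maximum for complete linkage, minimum for single linkage), and the four-point data are assumptions made for illustration; the slides do not say how d is extended to subsets.

```python
def classes_at_level(d, points, h):
    """Classes of the transitive closure of T_d(h), found with union-find."""
    parent = {x: x for x in points}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for x in points:
        for y in points:
            if d[x][y] <= h:
                parent[find(x)] = find(y)   # merge the two classes

    groups = {}
    for x in points:
        groups.setdefault(find(x), []).append(x)
    return [frozenset(g) for g in groups.values()]

def extend(d, A, B, link=max):
    """Extend d to disjoint nonempty subsets A and B.
    The choice of link is an assumption: max gives complete linkage,
    min gives single linkage."""
    return link(d[a][b] for a in A for b in B)

# Hypothetical toy DC on four objects (not from the slides).
points = ["p", "q", "r", "s"]
pairs = {("p", "q"): 1, ("r", "s"): 2, ("p", "r"): 3,
         ("p", "s"): 4, ("q", "r"): 4, ("q", "s"): 5}
d = {x: {y: 0 for y in points} for x in points}
for (x, y), value in pairs.items():
    d[x][y] = d[y][x] = value

E1 = classes_at_level(d, points, 1)          # merges p and q only
print(sorted(sorted(c) for c in E1))
print(extend(d, {"p", "q"}, {"r"}))          # complete linkage: max(3, 4)
print(extend(d, {"p", "q"}, {"r"}, link=min))
```

Repeating the step on E_1 with the extended d, at the next level in the image, yields the usual agglomerative hierarchy.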

Example: Take E as the first nine integers. We wish to classify E by considering various properties that these integers might enjoy. These properties are the attributes we shall consider. Here are some we might consider:

  • o: odd
  • s: perfect square
  • p: prime
  • c: perfect cube
  • t: multiple of three

Objects: 1, 2, 3, 4, 5, 6, 7, 8, 9
Attributes: o, s, p, c, t

object  o  s  p  c  t
1       1  1  0  1  0
2       0  0  1  0  0
3       1  0  1  0  1
4       0  1  0  0  0
5       1  0  1  0  0
6       0  0  0  0  1
7       1  0  1  0  0
8       0  0  0  1  0
9       1  1  0  0  1

Attributes for the first nine integers

Simple Matching Coefficient

d_sm   1    2    3    4    5    6    7    8    9
1      0    0.8  0.8  0.4  0.6  0.8  0.6  0.4  0.4
2      0.8  0    0.4  0.4  0.2  0.4  0.2  0.4  0.8
3      0.8  0.4  0    0.8  0.2  0.4  0.2  0.8  0.4
4      0.4  0.4  0.8  0    0.6  0.4  0.6  0.4  0.4
5      0.6  0.2  0.2  0.6  0    0.6  0    0.6  0.6
6      0.8  0.4  0.4  0.4  0.6  0    0.6  0.4  0.4
7      0.6  0.2  0.2  0.6  0    0.6  0    0.6  0.6
8      0.4  0.4  0.8  0.4  0.6  0.4  0.6  0    0.8
9      0.4  0.8  0.4  0.4  0.6  0.4  0.6  0.8  0

d_J    1     2     3     4     5     6     7     8     9
1      0     1     0.8   0.67  0.75  1     0.75  0.67  0.5
2      1     0     0.67  1     0.5   1     0.5   …     1
3      0.8   0.67  0     1     0.33  0.67  0.33  1     0.5
4      0.67  1     1     0     1     1     1     1     0.67
5      0.75  0.5   0.33  1     0     1     0     1     0.75
6      1     1     0.67  1     1     0     1     1     0.67
7      0.75  0.5   0.33  1     0     1     0     1     0.75
8      0.67  …     1     1     1     1     1     0     …
9      0.5   1     0.5   0.67  0.75  0.67  0.75  …     0
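Both coefficients can be recomputed from the attribute table for the nine integers. The sketch below assumes that d_sm is the fraction of attributes on which two objects disagree and that d_J denotes the Jaccard dissimilarity (one minus shared attributes over attributes held by either object). Neither definition is spelled out in the surviving text, but both agree with the table entries. The function and variable names are hypothetical.

```python
# Attribute rows for the nine integers, columns in the order o, s, p, c, t.
attrs = {1: "11010", 2: "00100", 3: "10101", 4: "01000", 5: "10100",
         6: "00001", 7: "10100", 8: "00010", 9: "11001"}

def d_sm(x, y):
    """Simple matching: fraction of attributes on which x and y differ."""
    a, b = attrs[x], attrs[y]
    return sum(u != v for u, v in zip(a, b)) / len(a)

def d_j(x, y):
    """Jaccard (assumed meaning of d_J): 1 - |shared| / |held by either|."""
    a, b = attrs[x], attrs[y]
    union = sum(u == "1" or v == "1" for u, v in zip(a, b))
    inter = sum(u == "1" and v == "1" for u, v in zip(a, b))
    return 1 - inter / union if union else 0.0

# A few entries to compare against the tables.
for x, y in [(1, 2), (1, 4), (3, 5), (5, 7)]:
    print(x, y, round(d_sm(x, y), 2), round(d_j(x, y), 2))
```

Note that d_sm counts a shared absence of an attribute as agreement, while d_J ignores attributes that neither object has, which is why the two tables rank some pairs differently.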

Figure 1: Complete Linkage Clustering – Simple Matching

Summarize and then classify or Classify and then summarize. That is the difference between cluster analysis and formal concept analysis.

If we allow ourselves to have dissimilarities taking values in a poset with 0, we can at least postpone the process of forming a summary. For example, with binary attributes, we can define a DC taking values in the Boolean algebra formed by the attribute space. We illustrate with the nine-integer example.

Note that each of these possibilities is already an ultrametric in that d(a, b) ≤ d(a, c) ∨ d(b, c) for all a, b, c.

d_1  1      2      3      4      5      6      7      8      9
1    00000  11110  01111  10010  01110  11011  01110  11100  00011
2    11110  00000  10001  01100  10000  00101  10000  00110  11101
3    01111  10001  00000  11101  00001  10100  00001  10111  01100
4    10010  01100  11101  00000  11100  01001  11100  01010  10001
5    01110  10000  00001  11100  00000  10101  00000  10110  01101
6    11011  00101  10100  01001  10101  00000  10101  00011  11000
7    01110  10000  00001  11100  00000  10101  00000  10110  01101
8    11100  00110  10111  01010  10110  00011  10110  00000  11011
9    00011  11101  01100  10001  01101  11000  01101  11011  00000

Table 1: Illustration of the d_1 coefficient for the nine integers

d_2  1      2      3      4      5      6      7      8      9
1    00101  11111  01111  10111  01111  11111  01111  11101  00111
2    11111  11011  11011  11111  11011  11111  11011  11111  11111
3    01111  11011  01010  11111  01011  11110  01011  11111  01110
4    10111  11111  11111  10111  11111  11111  11111  11111  10111
5    01111  11011  01011  11111  01011  11111  01011  11111  01111
6    11111  11111  11110  11111  11111  11110  11111  11111  11110
7    01111  11011  01011  11111  01011  11111  01011  11111  01111
8    11101  11111  11111  11111  11111  11111  11111  11101  11111
9    00111  11111  01110  10111  01111  11110  01111  11111  …

Table 2: Illustration of the d_2 coefficient for the nine integers
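The poset-valued ultrametric inequality d(a, b) ≤ d(a, c) ∨ d(b, c) can be verified mechanically when the values are bit vectors ordered by inclusion. The sketch below assumes that d_1(a, b) marks the attributes on which a and b differ and that d_2(a, b) marks the attributes that a and b do not share. These definitions are inferred from the entries of Tables 1 and 2 and are not stated in the surviving text. All names are hypothetical.

```python
from itertools import product

# Attribute rows for the nine integers as 5-bit integers (order o, s, p, c, t).
attrs = {1: 0b11010, 2: 0b00100, 3: 0b10101, 4: 0b01000, 5: 0b10100,
         6: 0b00001, 7: 0b10100, 8: 0b00010, 9: 0b11001}
MASK = 0b11111

def d1(x, y):
    """Assumed d_1: attributes on which x and y differ (bitwise XOR)."""
    return attrs[x] ^ attrs[y]

def d2(x, y):
    """Assumed d_2: attributes not shared by x and y (complement of AND)."""
    return MASK & ~(attrs[x] & attrs[y])

def leq(u, v):
    """Order of the Boolean algebra: every bit set in u is also set in v."""
    return (u & ~v) == 0

def is_poset_ultrametric(d):
    """Check d(a, b) <= d(a, c) v d(b, c), where v is bitwise OR (the join)."""
    return all(leq(d(a, b), d(a, c) | d(b, c))
               for a, b, c in product(attrs, repeat=3))

print(format(d1(1, 2), "05b"), format(d2(1, 2), "05b"))  # 11110 11111
print(is_poset_ultrametric(d1), is_poset_ultrametric(d2))
```

Under these assumed definitions both checks succeed for any attribute table, since a XOR b is contained in (a XOR c) OR (b XOR c), and the attributes shared by a, b and c are always among those shared by a and b.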

Formal Concept Analysis thumbnail sketch

G, M are sets with ⊥ a binary relation from G to M. Members of G are objects; members of M are attributes. Idea: a ⊥ m means object a has attribute m. For A ⊆ G and B ⊆ M:

  • A^⊥ = {m ∈ M : a ⊥ m ∀a ∈ A}
  • B^⊥ = {g ∈ G : g ⊥ b ∀b ∈ B}
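The two derivation operators can be tried out on the nine-integer example. This is a minimal Python sketch in which the relation ⊥ is stored as a dictionary from each object to its attribute set; the names `has`, `up`, and `down` are hypothetical.

```python
# The relation from G to M for the nine integers: object -> attributes it has.
has = {1: {"o", "s", "c"}, 2: {"p"}, 3: {"o", "p", "t"}, 4: {"s"},
       5: {"o", "p"}, 6: {"t"}, 7: {"o", "p"}, 8: {"c"}, 9: {"o", "s", "t"}}
G = set(has)          # objects
M = set("ospct")      # attributes

def up(A):
    """A^⊥: the attributes shared by every object in A."""
    return {m for m in M if all(m in has[a] for a in A)}

def down(B):
    """B^⊥: the objects that have every attribute in B."""
    return {g for g in G if set(B) <= has[g]}

print(sorted(up({3, 5, 7})))        # ['o', 'p']
print(sorted(down({"o", "p"})))     # [3, 5, 7]
print(sorted(down(up({1, 9}))))     # [1, 9]
```

Here ({3, 5, 7}, {o, p}) is a pair in which each side determines the other, which is the kind of grouping by shared attributes that formal concept analysis is after.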