



























Material Type: Notes; Class: 510 - COLLOQ COMP HIST; Subject: HISTORY, EUROPEAN; University: Rutgers University; Term: Fall 2004;
Cluster Analysis: An Application of Posets?
Melvin F. Janowitz DIMACS, Rutgers University Piscataway, NJ 07641
Earlier versions of this talk were given at École Nationale Supérieure des Télécommunications de Bretagne on October 30, 2004, at DIMACS on March 9, 2005, at Société Francophone de Classification on May 31, 2005, and at the IFCS meeting in July, 2006.
Think about two important areas of data analysis.
Cluster Analysis: Input is attributes on a finite set E (numerical, nominal, binary). Convert to a dissimilarity coefficient (DC). A DC is a mapping d: E × E → ℝ₀⁺ (the nonnegative reals) such that d(x, x) = 0 and d(x, y) = d(y, x) for all x, y ∈ E.
Fact: T_d(h) = {{x, y} : d(x, y) ≤ h} is an equivalence relation for all h if and only if d is an ultrametric, in that
d(x, y) ≤ max{d(x, z), d(y, z)} for all x, y, z ∈ E.
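The stated equivalence can be checked by brute force on small examples. The following is a minimal sketch, not part of the talk: the helper names (`make_dc`, `threshold_relation`, and so on) are hypothetical, the relation is stored as ordered pairs rather than unordered ones, and only levels h in the image of d are tested, since the relation cannot change between them.

```python
from itertools import product

def make_dc(E, pairs):
    """Build a symmetric DC with zero diagonal from {(x, y): value}."""
    d = {x: {y: 0 for y in E} for x in E}
    for (x, y), v in pairs.items():
        d[x][y] = d[y][x] = v
    return d

def threshold_relation(d, E, h):
    """T_d(h), stored as the set of ordered pairs (x, y) with d(x, y) <= h."""
    return {(x, y) for x, y in product(E, repeat=2) if d[x][y] <= h}

def is_equivalence(R, E):
    reflexive = all((x, x) in R for x in E)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all((x, w) in R for (x, y) in R for (z, w) in R if y == z)
    return reflexive and symmetric and transitive

def is_ultrametric(d, E):
    """Check d(x, y) <= max(d(x, z), d(y, z)) for all triples."""
    return all(d[x][y] <= max(d[x][z], d[y][z])
               for x, y, z in product(E, repeat=3))

def every_level_is_equivalence(d, E):
    """Test T_d(h) at each value h that d actually takes."""
    levels = {d[x][y] for x in E for y in E}
    return all(is_equivalence(threshold_relation(d, E, h), E) for h in levels)

E = ["a", "b", "c"]
ultra = make_dc(E, {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 2})
chain = make_dc(E, {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 2})

print(is_ultrametric(ultra, E), every_level_is_equivalence(ultra, E))
print(is_ultrametric(chain, E), every_level_is_equivalence(chain, E))
```

In the `chain` example, a is within 1 of b and b is within 1 of c, but d(a, c) = 2, so the relation at h = 1 fails transitivity exactly where the ultrametric inequality fails.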
A cluster method is often viewed as a transformation of a DC to an ultrametric. One common method is called single linkage clustering: transform T_d(h) into its transitive closure for each h.
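Single linkage as described here can be sketched in a few lines. This is an illustration under assumptions, not the speaker's code: `closure_classes` and `single_linkage` are hypothetical names, the DC is a nested dictionary, and the resulting ultrametric u(x, y) is computed as the smallest h at which x and y fall in the same class of the transitive closure (a minimax-path computation).

```python
def closure_classes(d, E, h):
    """Classes of the transitive closure of T_d(h)."""
    classes = [{x} for x in E]
    for x in E:
        for y in E:
            if d[x][y] <= h:
                cx = next(c for c in classes if x in c)
                cy = next(c for c in classes if y in c)
                if cx is not cy:
                    # x and y are related, so their classes merge
                    classes.remove(cy)
                    cx |= cy
    return classes

def single_linkage(d, E):
    """u[x][y] = least h such that x, y are linked by a chain of steps <= h."""
    u = {x: dict(d[x]) for x in E}
    for z in E:              # allow z as an intermediate point
        for x in E:
            for y in E:
                u[x][y] = min(u[x][y], max(u[x][z], u[z][y]))
    return u

E = ["a", "b", "c", "d"]
d = {
    "a": {"a": 0, "b": 1, "c": 2, "d": 5},
    "b": {"a": 1, "b": 0, "c": 1, "d": 5},
    "c": {"a": 2, "b": 1, "c": 0, "d": 4},
    "d": {"a": 5, "b": 5, "c": 4, "d": 0},
}
u = single_linkage(d, E)
print(closure_classes(d, E, 1))   # classes at level h = 1
print(u["a"]["c"], u["a"]["d"])   # levels at which a joins c and a joins d
```

Here d(a, c) = 2, but a and c are chained through b at level 1, so the transformed value is 1.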
Formal concept analysis: Input is attributes on a finite set E (numerical, nominal, binary). Convert to binary attributes. Classify E by grouping objects together on the basis of shared attributes. Idea: for each set of attributes, group together the objects having these attributes.
Totally different philosophies:
Cluster analysis: Classify on basis of a summary of attributes
Formal concept analysis: Classify directly from individual attributes
Cluster algorithms
E is the object set; d is a DC on E taking values in ℝ₀⁺.
Extend d to disjoint nonempty subsets A, B.
The image of d is h_1 < h_2 < ... < h_k.
Form the transitive closure R_1 of T_d(h_1).
The new set is E_1, the classes of R_1.
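The steps above describe one stage of an agglomerative scheme. The sketch below assumes that the stage is simply repeated on the new set of classes until one class remains, and it leaves the extension of d to subsets as a parameter, since the notes do not fix it: `extend=max` corresponds to complete linkage and `extend=min` to single linkage. The function name `agglomerate` and the four-point example are hypothetical.

```python
def agglomerate(d, E, extend=max):
    """Repeatedly merge classes of the transitive closure at the lowest level."""
    def D(A, B):
        # d extended to disjoint nonempty subsets (assumption: max or min)
        return extend(d[a][b] for a in A for b in B)

    clusters = [frozenset([x]) for x in E]
    history = []
    while len(clusters) > 1:
        # lowest dissimilarity between two current clusters
        h = min(D(A, B) for i, A in enumerate(clusters) for B in clusters[i + 1:])
        # connected components of the relation D(A, B) <= h
        groups = []
        for A in clusters:
            linked, rest = [], []
            for g in groups:
                (linked if any(D(A, B) <= h for B in g) else rest).append(g)
            groups = rest + [[A] + [B for g in linked for B in g]]
        clusters = [frozenset().union(*g) for g in groups]
        history.append((h, clusters))
    return history

# Four points on a line at positions 0, 1, 3, 7.
E = ["a", "b", "c", "d"]
pos = {"a": 0, "b": 1, "c": 3, "d": 7}
d = {x: {y: abs(pos[x] - pos[y]) for y in E} for x in E}

for h, clusters in agglomerate(d, E, extend=max):
    print(h, [sorted(c) for c in clusters])
```

Swapping in `extend=min` changes the merge levels but, for this example, not the order in which the points join.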
Example: Take E as the first nine integers. We wish to classify E by considering various properties that these integers might enjoy. These properties are the attributes we shall consider. Here are some we might consider:
o odd
s perfect square
p prime
c perfect cube
t multiple of three
Objects: 1, 2, 3, 4, 5, 6, 7, 8, 9
Attributes: o, s, p, c, t
object  o  s  p  c  t
1       1  1  0  1  0
2       0  0  1  0  0
3       1  0  1  0  1
4       0  1  0  0  0
5       1  0  1  0  0
6       0  0  0  0  1
7       1  0  1  0  0
8       0  0  0  1  0
9       1  1  0  0  1
Attributes for the first nine integers
dsm  1    2    3    4    5    6    7    8    9
1    0    0.8  0.8  0.4  0.6  0.8  0.6  0.4  0.4
2    0.8  0    0.4  0.4  0.2  0.4  0.2  0.4  0.8
3    0.8  0.4  0    0.8  0.2  0.4  0.2  0.8  0.4
4    0.4  0.4  0.8  0    0.6  0.4  0.6  0.4  0.4
5    0.6  0.2  0.2  0.6  0    0.6  0    0.6  0.6
6    0.8  0.4  0.4  0.4  0.6  0    0.6  0.4  0.4
7    0.6  0.2  0.2  0.6  0    0.6  0    0.6  0.6
8    0.4  0.4  0.8  0.4  0.6  0.4  0.6  0    0.8
9    0.4  0.8  0.4  0.4  0.6  0.4  0.6  0.8  0
Figure 1: Complete Linkage Clustering – Simple Matching
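The simple matching dissimilarities can be recomputed from the attribute table. This sketch assumes that simple matching means the fraction of the five attributes on which two objects disagree; `ATTRS` and `simple_matching` are hypothetical names, with each object's row written as a string in the column order o, s, p, c, t.

```python
# Rows of the attribute table, columns in the order o, s, p, c, t.
ATTRS = {
    1: "11010", 2: "00100", 3: "10101", 4: "01000", 5: "10100",
    6: "00001", 7: "10100", 8: "00010", 9: "11001",
}

def simple_matching(a, b):
    """Fraction of attributes on which objects a and b disagree."""
    mismatches = sum(x != y for x, y in zip(ATTRS[a], ATTRS[b]))
    return mismatches / len(ATTRS[a])

# Print the full 9 x 9 dissimilarity matrix, one row per object.
for a in ATTRS:
    print(a, " ".join(f"{simple_matching(a, b):.1f}" for b in ATTRS))
```

Objects 5 and 7 share the same attribute row, so their dissimilarity is 0 and no clustering of this summary can separate them.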
Summarize and then classify or Classify and then summarize. That is the difference between cluster analysis and formal concept analysis.
If we allow ourselves to have dissimilarities taking values in a poset with 0, we can at least postpone the process of forming a summary. For example, with binary attributes, we can define a DC taking values in the Boolean algebra formed by the attribute space. We illustrate with the nine integer example.
Note that each of these possibilities is already an ultrametric in that d(a, b) ≤ d(a, c) ∨ d(b, c) for all a, b, c.
d1  1      2      3      4      5      6      7      8      9
1   00000  11110  01111  10010  01110  11011  01110  11000  00011
2   11110  00000  10001  01100  10000  00101  10000  00110  11101
3   01111  10001  00000  11101  00001  10100  00001  10111  01100
4   10010  01100  11101  00000  11100  01001  11100  01010  10001
5   01110  10000  00001  11100  00000  10101  00000  10110  01101
6   11011  00101  10100  01001  10101  00000  10101  00011  11000
7   01110  10000  00001  11100  00000  10101  00000  10110  01101
8   11000  00110  10111  01010  10110  00011  10110  00000  11011
9   00011  11101  01100  10001  01101  11000  01101  11011  00000
Table 1: Illustration of the d1 coefficient for the nine integers
d2  1      2      3      4      5      6      7      8      9
1   00101  11111  01111  10111  01111  11111  01111  11101  00111
2   11111  11011  11011  11111  11011  11111  11011  11111  11111
3   01111  11011  01010  11111  01011  11110  01011  11111  01110
4   10111  11111  11111  10111  11111  11111  11111  11111  10111
5   01111  11011  01011  11111  01011  11111  01011  11111  01111
6   11111  11111  11110  11111  11111  11110  11111  11111  11110
7   01111  11011  01011  11111  01011  11111  01011  11111  01111
8   11101  11111  11111  11111  11111  11111  11111  11101  11111
9   00111  11111  01110  10111  01111  11110  01111  11111  00110
Table 2: Illustration of the d 2 coefficient for the nine integers
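The two Boolean-valued coefficients can be sketched with Python sets. The definitions used here are assumptions read off from the table entries, not stated in the notes: d1(a, b) is taken to be the set of attributes on which a and b differ, and d2(a, b) the complement of the attributes they share. The order ≤ is set inclusion and ∨ is union; the names `SETS`, `bits`, and `is_poset_ultrametric` are hypothetical.

```python
from itertools import product

ORDER = "ospct"  # column order used for the bit strings
ATTRS = {1: "osc", 2: "p", 3: "opt", 4: "s", 5: "op",
         6: "t", 7: "op", 8: "c", 9: "ost"}
SETS = {n: frozenset(a) for n, a in ATTRS.items()}
ALL = frozenset(ORDER)

def d1(a, b):
    """Assumed definition: attributes held by exactly one of a, b."""
    return SETS[a] ^ SETS[b]

def d2(a, b):
    """Assumed definition: complement of the attributes shared by a and b."""
    return ALL - (SETS[a] & SETS[b])

def bits(s):
    """Render an attribute set as a 0/1 string in the order o, s, p, c, t."""
    return "".join("1" if m in s else "0" for m in ORDER)

def is_poset_ultrametric(d):
    """Check d(a, b) <= d(a, c) v d(b, c), with <= as inclusion and v as union."""
    return all(d(a, b) <= d(a, c) | d(b, c)
               for a, b, c in product(SETS, repeat=3))

print(bits(d1(1, 2)), bits(d2(1, 2)))
print(is_poset_ultrametric(d1), is_poset_ultrametric(d2))
```

Under these assumed definitions the inequality holds for every triple, because a difference between a and b must show up as a difference from c on at least one side, and the attributes common to a, b, and c are always among those common to a and b.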
Formal Concept Analysis thumbnail sketch
G, M are sets with ⊥ a binary relation from G to M. Members of G are objects; members of M are attributes. Idea: a ⊥ m means object a has attribute m.
For A ⊆ G and B ⊆ M:
A^⊥ = {m ∈ M : a ⊥ m ∀ a ∈ A}
B^⊥ = {g ∈ G : g ⊥ b ∀ b ∈ B}
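The two derivation operators can be tried out on the nine integer example. This is a minimal sketch with hypothetical names (`HAS`, `attrs_of`, `objs_of`, `concepts`); the last function collects pairs (A, B) with A^⊥ = B and B^⊥ = A, which is the usual notion of a formal concept and goes slightly beyond what the sketch above states.

```python
from itertools import combinations

G = list(range(1, 10))      # objects: the first nine integers
M = list("ospct")           # attributes
HAS = {1: "osc", 2: "p", 3: "opt", 4: "s", 5: "op",
       6: "t", 7: "op", 8: "c", 9: "ost"}   # g ⊥ m  iff  m in HAS[g]

def attrs_of(A):
    """A^⊥: attributes shared by every object in A."""
    return frozenset(m for m in M if all(m in HAS[g] for g in A))

def objs_of(B):
    """B^⊥: objects having every attribute in B."""
    return frozenset(g for g in G if all(m in HAS[g] for m in B))

def concepts():
    """All pairs (A, B) with A^⊥ = B and B^⊥ = A, found by closing attribute sets."""
    found = set()
    for r in range(len(M) + 1):
        for B in combinations(M, r):
            extent = objs_of(B)
            found.add((extent, attrs_of(extent)))
    return found

print(sorted(objs_of("op")))          # objects that are odd and prime
print(sorted(attrs_of({1, 9})))       # attributes common to 1 and 9
for extent, intent in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), "".join(sorted(intent)))
```

Each group is produced directly from the individual attributes, with no numerical summary formed first.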