





























Key topics in these lecture slides: graph data mining, sequences, RNA, compounds, texts, graph pattern mining, mining frequent subgraph patterns, graph indexing, graph similarity search, and graph classification.
Typology: Slides
Lecture 11: Graph Data Mining
Graph Data Mining
[Figure: DNA sequence]
Outline
Mining Frequent Subgraph Patterns
Graph Indexing
Graph Similarity Search
Graph pattern-based approach
Machine Learning approaches
Link-density-based approach
Graph Pattern Mining
Frequent subgraphs
A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold
Support of a graph g is defined as the percentage of graphs in the dataset G that have g as a subgraph
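The support definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph representation (a node-label dictionary plus a set of undirected edges) and the helper names `make_graph`, `contains`, and `support` are hypothetical and do not come from the slides, and containment is checked by brute force over node mappings, which is practical only for very small graphs.

```python
from itertools import permutations

def make_graph(labels, edges):
    """labels: {node: label}; edges: iterable of undirected (u, v) pairs."""
    return labels, {frozenset(e) for e in edges}

def contains(graph, pattern):
    """True if `pattern` occurs in `graph` as a (non-induced) labeled subgraph."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Try every injective mapping of pattern nodes onto graph nodes.
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[v] == g_labels[mapping[v]] for v in p_nodes)
        edges_ok = all(frozenset(mapping[v] for v in e) in g_edges for e in p_edges)
        if labels_ok and edges_ok:
            return True
    return False

def support(pattern, dataset):
    """Fraction of graphs in `dataset` that contain `pattern`."""
    return sum(contains(g, pattern) for g in dataset) / len(dataset)

# Toy dataset (made up for illustration): three small labeled graphs.
dataset = [
    make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)]),
    make_graph({1: "C", 2: "O"}, [(1, 2)]),
    make_graph({1: "C", 2: "N"}, [(1, 2)]),
]
pattern = make_graph({"a": "C", "b": "O"}, [("a", "b")])
print(support(pattern, dataset))  # the C-O edge occurs in two of the three graphs
```

With a minimum support threshold of 50%, the C-O pattern would count as frequent in this toy dataset.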
Applications of graph pattern mining
Mining biochemical structures
Program control flow analysis
Mining XML structures or Web communities
Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
Example
[Figure: a graph dataset and its frequent patterns (minimum support is 2)]
Graph Mining Algorithms
Apriori-based approach
Pattern-growth approach
Apriori-Based Approach
[Figure: k-edge graphs G' and G'' are joined into (k+1)-edge candidates G, G1, G2, …, Gn]
Join, prune, then check the frequency of each candidate
Subgraph isomorphism test: NP-complete
Methodology: breadth-first search, joining two graphs
AGM (Inokuchi, et al.) generates new graphs with one more node
FSG (Kuramochi and Karypis) generates new graphs with one more edge
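The level-wise join, prune, and count loop can be illustrated with a deliberately simplified sketch. It is not AGM or FSG. It assumes that every node label is unique within each graph, so a subgraph is identified by its set of labeled edges and the subgraph isomorphism test collapses to a set-inclusion test. The function names `connected` and `apriori_edge_mining` and the toy data are hypothetical.

```python
from itertools import combinations

def connected(edges):
    """True if the given edges form a single connected subgraph."""
    edges = list(edges)
    seen = set(edges[0])
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if (u in seen) != (v in seen):  # edge touches the component
                seen |= {u, v}
                changed = True
    return all(u in seen and v in seen for u, v in edges)

def apriori_edge_mining(dataset, min_count):
    """dataset: list of edge sets; each edge is a tuple of two node labels.
    ASSUMPTION: node labels are unique within a graph, so `p <= g`
    (set inclusion) replaces the subgraph isomorphism test."""
    def count(p):
        return sum(p <= g for g in dataset)

    level = {frozenset([e]) for g in dataset for e in g}
    level = {p for p in level if count(p) >= min_count}
    frequent, k = [], 1
    while level:
        frequent.extend(level)
        # Join: merge two frequent k-edge patterns that differ in one edge.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k + 1}
        # Prune: a candidate must be connected, and each connected
        # k-edge sub-pattern must itself be frequent (Apriori property).
        def survives(c):
            subs = (c - {e} for e in c)
            return connected(c) and all(s in level for s in subs if connected(s))
        # Count: keep candidates that reach the minimum support.
        level = {c for c in candidates if survives(c) and count(c) >= min_count}
        k += 1
    return frequent

graphs = [
    {("A", "B"), ("B", "C"), ("C", "D")},
    {("A", "B"), ("B", "C")},
    {("B", "C"), ("C", "D")},
]
result = apriori_edge_mining(graphs, min_count=2)
```

On real graph data, where labels repeat, the counting step needs a genuine subgraph isomorphism test and the join needs canonical forms to avoid duplicate candidates; both are omitted here.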
If a graph is frequent, all of its subgraphs are frequent (the Apriori property)
An n-edge frequent graph may have 2^n subgraphs
Among 422 chemical compounds which are confirmed to be active in an AIDS antiviral screen dataset, there are 1,000,000 frequent graph patterns if the minimum support is 5%
Closed Frequent Graphs
A frequent graph G is closed if there exists no supergraph of G that carries the same support as G
If some of G's subgraphs have the same support as G, it is unnecessary to output these subgraphs (nonclosed graphs)
Lossless compression: still ensures that the mining result is complete
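The closedness check can be sketched as a filter over already-mined patterns. As a simplifying assumption that is not in the slides, each pattern is represented by its set of labeled edges, so "supergraph" becomes "proper superset of edges". The function name `closed_patterns` and the support counts are made up for illustration.

```python
def closed_patterns(supports):
    """supports: {frozenset_of_edges: support_count} for all frequent patterns.
    A pattern is closed if no proper super-pattern has the same support.
    ASSUMPTION: patterns are edge sets, so super-pattern means proper superset."""
    return {
        p: s for p, s in supports.items()
        if not any(p < q and s == t for q, t in supports.items())
    }

# Hypothetical mining result: five frequent patterns with their support counts.
AB, BC, CD = ("A", "B"), ("B", "C"), ("C", "D")
supports = {
    frozenset([AB]): 2,
    frozenset([BC]): 3,
    frozenset([CD]): 2,
    frozenset([AB, BC]): 2,
    frozenset([BC, CD]): 2,
}
closed = closed_patterns(supports)
```

Here the single edges A-B and C-D are dropped because a larger pattern has the same support, while B-C is kept because its support is strictly higher than that of any pattern containing it. The support of every dropped pattern can be recovered as the maximum support among the closed patterns that contain it, which is why the compression is lossless.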
Scalability Issue
Naïve solution
Sequential scan (Disk I/O)
Subgraph isomorphism test (NP-complete)
Problem: Scalability is a big issue
Indexing Strategy
[Figure: a graph G, a query graph Q, and a substructure of Q]
If graph G contains query graph Q, G should contain any substructure of Q
Remarks: Index substructures of a query graph to prune graphs that do not contain these substructures
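The pruning idea in the remark can be sketched as an inverted index from substructures to graph identifiers. For brevity this sketch indexes only single labeled edges, which is an assumption made here and not the slides' feature set. The names `edge_features`, `build_index`, and `candidates` and the toy database are hypothetical.

```python
from collections import defaultdict

def edge_features(labels, edges):
    """Label pairs of all edges; these play the role of indexed substructures."""
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def build_index(database):
    """Map each feature to the ids of the graphs that contain it."""
    index = defaultdict(set)
    for gid, (labels, edges) in database.items():
        for feature in edge_features(labels, edges):
            index[feature].add(gid)
    return index

def candidates(index, database, query):
    """Graphs that contain every indexed feature of the query.
    Graphs outside this set cannot contain the query and are pruned."""
    result = set(database)
    for feature in edge_features(*query):
        result &= index.get(feature, set())
    return result

# Toy database: graph id -> (node labels, edge list).
database = {
    "g1": ({0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)]),
    "g2": ({0: "C", 1: "O"}, [(0, 1)]),
    "g3": ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
}
index = build_index(database)
query = ({0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)])
print(candidates(index, database, query))  # only g1 survives the filter
```

The surviving candidates still have to be verified with a subgraph isomorphism test, since containing every feature of Q does not guarantee containing Q itself; the index only reduces how many of those expensive tests are run.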
Why Frequent Structures?
We cannot index (or even search) all substructures
Large structures will likely be indexed well by their substructures
Size-increasing support threshold
[Figure: minimum support threshold as a function of structure size; axes: size, support]
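A size-increasing support threshold can be sketched as a non-decreasing function of structure size that decides which substructures are indexed. The slides do not give the shape of this function, so the linear form, the parameters `base` and `slope`, and the fragment names and counts below are all assumptions for illustration.

```python
def min_count(size, base=1, slope=2):
    """Hypothetical size-increasing threshold: larger structures must
    appear in more graphs before they are selected for indexing."""
    return base + slope * (size - 1)

def select_for_index(fragments):
    """fragments: {name: (size_in_edges, support_count)}.
    Keep a fragment only if its support meets the threshold for its size."""
    return sorted(
        name for name, (size, count) in fragments.items()
        if count >= min_count(size)
    )

# Made-up fragments with (size, support count).
fragments = {
    "C-C": (1, 5),
    "C-C-C": (2, 3),
    "C-C-C-N": (3, 4),
}
selected = select_for_index(fragments)
```

With these numbers the thresholds are 1, 3, and 5 for sizes 1, 2, and 3, so the two smaller fragments are indexed and the 3-edge fragment is left to be covered by its substructures.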
Structure Similarity Search
[Figure: chemical structures of (a) caffeine, (b) diurobromine, (c) sildenafil]