


































An overview of community detection in social networks, including definitions of communities and community detection methods. It covers various techniques such as network interaction, community detection algorithms, subjectivity of community definition, and community criteria. The document also discusses applications of community detection in areas like viral marketing and outbreak detection.
Typology: Slides
Lecture 8
Community: “subsets of actors among whom there are relatively strong, direct, intense, frequent or positive ties.” -- Wasserman and Faust, Social Network Analysis, Methods and Applications
A community is a set of actors who interact with each other frequently (a.k.a. group, subgroup, module, cluster)
A set of people without interaction is NOT a community, e.g., people waiting for a bus at a station who don’t talk to each other
Community Detection: “formalize the strong social groups based on the social network properties” a.k.a. grouping, clustering, finding cohesive subgroups
Given: a social network
Output: community membership of (some) actors
Some social media sites allow people to join groups
Not all sites provide a community platform, and not all people join groups
Community Detection
Network interaction provides rich information about the relationships between users
Is it necessary to extract groups based on network topology?
Groups are implicitly formed
Can complement other kinds of information
Provide basic information for other tasks
Applications:
Understanding the interactions between people
Visualizing and navigating huge networks
Forming the basis for other tasks such as data mining
Classification
User Preference or Behavior can be represented as class labels
Given: a social network and the labels of some actors in the network
Output: labels of the remaining actors in the network
Visualization after Prediction
Figure legend: Smoking, Non-Smoking, ? (Unknown)
Predictions: 6: Non-Smoking; 7: Non-Smoking; 8: Smoking; 9: Non-Smoking; 10: Smoking
Viral Marketing/Outbreak Detection
Users have different social capital (or network values) within a social network, hence, how can one make best use of this information?
Viral Marketing: find a set of users to whom coupons and promotions are offered, so that they influence other people in the network and the overall benefit is maximized
Outbreak Detection: monitor a set of nodes that can help detect outbreaks or interrupt the infection spreading (e.g., H1N1 flu)
Goal: given a limited budget, how to maximize the overall benefit?
An Example of Viral Marketing
Goal: cover the whole network with the minimum number of selected nodes
How to realize it, for example by Basic Greedy Selection: select the node that maximizes the utility, remove the node, and then repeat
Node 7 is not a node with high centrality!
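The greedy selection described above can be sketched as follows. This is only an illustration under stated assumptions: the utility of a node is taken to be the number of still-uncovered nodes among itself and its direct neighbors, and the graph `toy_graph` is a hypothetical example, not the network from the lecture figure. Instead of physically removing a selected node, the sketch marks the nodes it covers, so its utility drops to zero afterwards.

```python
def greedy_cover(adj):
    """Greedily select nodes until every node in the graph is covered.

    adj maps each node to the set of its neighbors (undirected graph).
    Assumed utility: how many still-uncovered nodes a candidate covers,
    where a node covers itself and its direct neighbors.
    """
    uncovered = set(adj)
    selected = []
    while uncovered:
        # Ties are broken by the smallest node label (sorted order).
        best = max(sorted(adj), key=lambda n: len(({n} | adj[n]) & uncovered))
        selected.append(best)
        uncovered -= {best} | adj[best]
    return selected


# Hypothetical toy network: a triangle 1-2-3 linked through 4 to a small star around 5.
toy_graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5},
    5: {4, 6, 7},
    6: {5},
    7: {5},
}

print(greedy_cover(toy_graph))
```

Greedy selection of this kind is a heuristic: it gives a reasonable cover quickly but does not guarantee the true minimum number of nodes in general.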
Node-Centric
Group-Centric
Network-Centric
Hierarchy-Centric
Node-Centric Community Detection
Complete Mutuality: cliques
Reachability of members: k-clique, k-clan, k-club
Nodal degrees: k-plex, k-core
Relative frequency of Within-Outside Ties: LS sets, Lambda sets
Commonly used in traditional social network analysis
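As an illustration of a nodal-degree criterion, the sketch below extracts a k-core, using the standard definition (not spelled out on the slide) that every node in the k-core has at least k neighbors inside it. The function name `k_core` and the graph `toy_graph` are hypothetical choices made for this example.

```python
def k_core(adj, k):
    """Return the nodes of the k-core of an undirected graph.

    adj maps each node to the set of its neighbors.
    Nodes with fewer than k remaining neighbors are dropped repeatedly
    until no such node is left.
    """
    remaining = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(adj[node] & remaining) < k:
                remaining.discard(node)
                changed = True
    return remaining


# Hypothetical toy network: a triangle 1-2-3 with a tail 3-4-5.
toy_graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5},
    5: {4},
}

print(k_core(toy_graph, 2))
```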
Reachability is calibrated by the Geodesic distance
Geodesic: a shortest path between two nodes (e.g., 12 and 6)
Two paths: 12-4-1-2-5-6 and 12-10-6; 12-10-6 is a geodesic
Geodesic distance: #hops in geodesic between two nodes e.g., d(12, 6) = 2, d(3, 11)=
Diameter: the maximal geodesic distance between any 2 nodes in a network, i.e., #hops of the longest shortest path
Diameter = 5
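Geodesic distance and diameter can be computed with breadth-first search on an unweighted network. The following is a minimal sketch that assumes a connected, undirected graph; `toy_graph` is a hypothetical example and not the network shown in the lecture figure.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Return the number of hops from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def diameter(adj):
    """Longest geodesic distance over all node pairs (graph assumed connected)."""
    return max(max(geodesic_distances(adj, s).values()) for s in adj)


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

print(geodesic_distances(toy_graph, "A")["E"])  # hops from A to E
print(diameter(toy_graph))
```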
Any node in a group should be reachable in k hops
k-clique: a maximal subgraph in which the largest geodesic distance between any two nodes is <= k
A k-clique can have a diameter larger than k within the subgraph, because geodesics are measured in the whole network and may pass through outside nodes, e.g., for the 2-clique {12, 4, 10, 1, 6}, d(1, 6) = 3 within the subgraph
k-club: a substructure of diameter <= k e.g., {1,2,5,6,8,9}, {12, 4, 10, 1} are 2-clubs
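The difference between the two distance conditions can be checked in code. The sketch below is an illustration only: it tests the distance requirement of a k-clique (distances measured in the whole network) and of a k-club (distances measured inside the group), but it does not test maximality. The helper names and the graph `toy_graph` are hypothetical.

```python
from collections import deque
from itertools import combinations


def distance(adj, u, v, allowed=None):
    """Geodesic distance from u to v; if allowed is given, paths may only use those nodes."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return dist[node]
        for nbr in adj[node]:
            if nbr not in dist and (allowed is None or nbr in allowed):
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return float("inf")  # v is not reachable from u


def meets_k_clique_distance(adj, group, k):
    """True if every pair in group is within k hops in the whole network."""
    return all(distance(adj, u, v) <= k for u, v in combinations(group, 2))


def is_k_club(adj, group, k):
    """True if every pair in group is within k hops using only nodes of the group."""
    return all(distance(adj, u, v, allowed=group) <= k for u, v in combinations(group, 2))


# Hypothetical toy network: a 5-cycle 1-2-3-4-5-1.
toy_graph = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {1, 4}}
group = {1, 2, 3, 4}

print(meets_k_clique_distance(toy_graph, group, 2))  # nodes 1 and 4 are 2 hops apart via node 5
print(is_k_club(toy_graph, group, 2))                # inside the group they are 3 hops apart
```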
LS sets: any of its proper subsets has more ties to other nodes in the group than to nodes outside the group
Too strict, not reasonable for network analysis
A relaxed definition is the k-component
Requires computing the edge-connectivity between any pair of nodes via a minimum-cut/maximum-flow algorithm
A 1-component is a connected component
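The edge-connectivity between one pair of nodes can be sketched with a simple augmenting-path maximum-flow computation. This assumes the usual reading of edge-connectivity as the number of edge-disjoint paths between the two nodes, which equals the size of a minimum edge cut. The function name and `toy_graph` are hypothetical, and the sketch handles a single pair, not the all-pairs computation the slide refers to.

```python
from collections import deque


def edge_connectivity(adj, s, t):
    """Number of edge-disjoint paths between distinct nodes s and t.

    adj maps each node to the set of its neighbors (undirected graph).
    Each undirected edge is modeled as two directed arcs of capacity 1.
    """
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical toy network: a 4-cycle 1-2-3-4-1 with a pendant node 5 attached to 4.
toy_graph = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3, 5}, 5: {4}}

print(edge_connectivity(toy_graph, 1, 3))  # two disjoint routes around the cycle
print(edge_connectivity(toy_graph, 1, 5))  # node 5 hangs on a single edge
```

Running this for every pair of nodes is what makes the approach expensive on large networks.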
Each node has to satisfy certain properties: complete mutuality, reachability, nodal degrees, within-outside ties
Limitations:
Too strict, but can be used as the core of a community
Not scalable; commonly used in network analysis of small networks
Sometimes not consistent with the properties of large-scale networks, e.g., nodal degrees for scale-free networks