








































Material Type: Notes; Professor: Kosa; Class: Internet Algorithmics; Subject: CSC Computer Science; University: Tennessee Tech University; Term: Spring 2009;
2009-4-20
Index Construction and Maintenance
Metadata Searching
Ranking
Nicholas Lester, Alistair Moffat, and Justin Zobel
CIKM ’05: Proceedings of the 14th ACM international conference on Information and knowledge management.
Efficient On-line Index Maintenance for Dynamic Text Collections
Hongbo Xu and Bin Wang
CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management.
Contiguous on-disk posting lists maximize query processing performance, but require frequent relocation of most lists in the index. Non-contiguous on-disk posting lists improve index maintenance performance, but reduce query processing performance.
This strategy maximizes query processing performance; however, every merge event must process the entire index.
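To see why processing the entire index at every merge becomes expensive, the sketch below counts the pointers written under the assumption that each bufferload of b pointers is merged with the whole existing index and the result is rewritten. `remerge_cost` is a hypothetical helper for illustration, not something defined in the source.

```python
def remerge_cost(n_bufferloads: int, b: int) -> int:
    """Total pointers written if every bufferload triggers a full re-merge.

    Merge i combines the new bufferload with the (i - 1) * b pointers already
    on disk, so it writes i * b pointers; the sum is b * n * (n + 1) / 2.
    """
    total = 0
    for i in range(1, n_bufferloads + 1):
        total += i * b  # the whole index is read and rewritten at merge i
    return total


print(remerge_cost(9, 1))  # 45
```

The cost grows quadratically in the number of bufferloads, which is what motivates bounding the work per merge.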
The question is how best to manage the sequence of merges so as to minimize the total merging cost without allowing the number of partitions to grow excessively.
If a bufferload contains b pointers, the first partial index cannot exceed $(r-1)b$ pointers, where r is a parameter. In general, the kth partial index holds no more than $(r-1)r^{k-1}b$ pointers. At level k, the partition is either empty or contains at least $r^{k-1}b$ pointers.
The merging pattern established when r = 3. After nine bufferloads have been generated by the in-memory part of the indexing process, the first index is placed into partition 3. All numbers listed represent multiples of b, the size of each bufferload.
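The merging rule behind these bounds can be simulated in a few lines. This is a minimal sketch under simplifying assumptions, not Lester, Moffat and Zobel's implementation: every bufferload holds exactly b pointers, r is an integer, a cascade of merges is performed as one multi-way merge whose cost is counted as the pointers it writes, and `add_bufferload` is a hypothetical helper. With r = 3 and b = 1 it reproduces the pattern in the caption: the ninth bufferload sends the first index into partition 3.

```python
def add_bufferload(partitions: list[int], r: int, b: int) -> tuple[int, int]:
    """Flush one bufferload of b pointers into geometrically sized partitions.

    partitions[k] holds the size of partition k + 1 (0 means empty). The new
    bufferload is merged with partitions 1, 2, ... until the combined size fits
    under partition k + 1's cap of (r - 1) * r**k * b. Returns the 1-based
    partition that received the merged index and the pointers written.
    """
    size = b
    k = 0
    while True:
        if k == len(partitions):
            partitions.append(0)          # open a new, larger partition
        size += partitions[k]             # absorb partition k + 1 into the merge
        partitions[k] = 0
        if size <= (r - 1) * r**k * b:    # fits under this partition's cap
            partitions[k] = size
            return k + 1, size            # cost: one multi-way merge writes `size`
        k += 1


parts: list[int] = []
total_cost = 0
for load in range(1, 10):
    where, cost = add_bufferload(parts, r=3, b=1)
    total_cost += cost
    print(load, where, parts)
print("total pointers written:", total_cost)
```

In this idealized setting the partition sizes behave like a base-r counter: after nine loads with r = 3 the state is [0, 0, 9], the digits of 100 in base 3. The nine flushes write 27b pointers in total, against 45b (1 + 2 + ... + 9) if every merge rewrote the whole index.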
Fix the number of partitions p , and determine r accordingly.
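One way to pick r for a fixed p, derived from the bounds above rather than quoted from the paper: with integer r, partitions 1 to p together hold at most $\sum_{k=1}^{p}(r-1)r^{k-1}b = (r^p - 1)b$ pointers (the sum telescopes), so n bufferloads fit in p partitions when $r^p - 1 \ge n$. The hypothetical helper `choose_ratio` returns the smallest integer r ≥ 2 satisfying this; because n grows with the collection, r would have to be recomputed and increases slowly over time.

```python
def choose_ratio(n_bufferloads: int, p: int) -> int:
    """Smallest integer r >= 2 such that p partitions can hold n bufferloads.

    Partitions 1..p hold at most sum_{k=1..p} (r - 1) * r**(k - 1) = r**p - 1
    bufferloads, so we need r**p - 1 >= n_bufferloads.
    """
    if p < 1:
        raise ValueError("need at least one partition")
    r = 2
    while r**p - 1 < n_bufferloads:
        r += 1
    return r


for n in (3, 8, 9, 100):
    print(n, choose_ratio(n, p=2))
```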
Querying performance vs. construction performance.
A large number of posting lists for a term have to be decompressed and
The complexity of index maintenance is greatly increased.
Each node of the DBT is a sub-index, and the nodes of the tree are divided into H layers. At layer k, the number of nodes is either zero or less than m (m ≥ 2). A node in layer k+1 is roughly c times bigger than a node in layer k, and any two nodes in the same layer do not differ significantly in size. If the DBT is balanced, a sub-index merge operation is performed on only one layer. When the tree is unbalanced, a node is pushed down to a lower layer to rebalance the tree.
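The layering rules can be illustrated with a toy model. This is not Xu and Wang's algorithm but a simplified sketch under stated assumptions: a node of s pointers lives in layer k when $c^k b \le s < c^{k+1} b$ (layer 0 for anything smaller than cb), and whenever a layer collects m nodes they are merged into one node that is re-placed according to its size, so each merge touches a single layer. The class `DBTSketch` and its methods are hypothetical.

```python
class DBTSketch:
    """Toy layered structure of sub-index sizes (in pointers), DBT-style."""

    def __init__(self, b: int, c: int, m: int) -> None:
        if c < 2 or m < 2:
            raise ValueError("c and m must both be at least 2")
        self.b, self.c, self.m = b, c, m
        self.layers: dict[int, list[int]] = {}

    def layer_of(self, size: int) -> int:
        # Assumption: layer k holds nodes of roughly c**k * b pointers.
        k = 0
        while size >= self.c ** (k + 1) * self.b:
            k += 1
        return k

    def add_subindex(self, size: int) -> None:
        self._place(size)
        self._rebalance()

    def _place(self, size: int) -> None:
        self.layers.setdefault(self.layer_of(size), []).append(size)

    def _rebalance(self) -> None:
        # Repeat until every layer has fewer than m nodes.
        while True:
            full = [k for k, nodes in self.layers.items() if len(nodes) >= self.m]
            if not full:
                return
            k = min(full)                    # fix the lowest overfull layer first
            merged = sum(self.layers[k])     # one merge, confined to layer k
            self.layers[k] = []
            self._place(merged)              # merged node moves to the layer its size implies


tree = DBTSketch(b=1, c=3, m=3)
for _ in range(9):
    tree.add_subindex(1)
print({k: v for k, v in tree.layers.items() if v})  # {2: [9]}
```

Each merge removes at least m - 1 nodes, so rebalancing always terminates, and afterwards every layer holds fewer than m nodes.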
Garbage collection: a bitmap identifies deleted documents, which are filtered out at query processing time. A threshold p is checked to determine whether garbage collection is integrated into the merging process.
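A minimal sketch of bitmap-based deletion handling, assuming postings are sorted document ids, the bitmap is a `bytearray` with one flag per document id, and the threshold p is compared against the fraction of deleted postings in the lists being merged. The source does not say what p measures, so that reading is an assumption, and the function names are hypothetical.

```python
import heapq


def filter_deleted(postings: list[int], deleted: bytearray) -> list[int]:
    """Query time: skip documents whose flag is set in the deletion bitmap."""
    return [doc for doc in postings if not deleted[doc]]


def merge_with_gc(
    lists: list[list[int]], deleted: bytearray, p: float
) -> tuple[list[int], bool]:
    """Merge sorted posting lists; purge deleted documents only if enough are stale."""
    merged = list(heapq.merge(*lists))
    if not merged:
        return merged, False
    stale = sum(1 for doc in merged if deleted[doc])
    if stale / len(merged) > p:           # threshold exceeded: collect garbage during the merge
        return filter_deleted(merged, deleted), True
    return merged, False                  # otherwise leave filtering to query time


deleted = bytearray(10)                   # one flag per document id 0..9
deleted[2] = deleted[5] = 1
print(merge_with_gc([[1, 2, 7], [3, 5]], deleted, p=0.3))  # ([1, 3, 7], True)
print(merge_with_gc([[1, 2, 7], [3, 5]], deleted, p=0.5))  # ([1, 2, 3, 5, 7], False)
```

Deferring the purge until the stale fraction is large avoids rewriting lists for a handful of deletions, at the price of filtering them on every query in the meantime.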