In: Computer Science
There are several typical cube computation methods such as Multi-Way, BUC, and Star-cubing. Briefly describe each one of these methods outlining the key points.
1).ANSWER :
GIVENTHAT :
Data Cube Computation Methods
1. Multi-Way Array Aggregation for Cube Computation (MOLAP)
2. BUC (Bottom-Up Computation)
3. Star-Cubing
4. High-Dimensional OLAP
Multi-Way Array Aggregation
1. Array-based bottom-up algorithm
2. Using multi-dimensional chunks
3. No direct tuple comparisons
4. Simultaneous aggregation on multiple dimensions
5. Intermediate aggregate values are re-used for computing ancestor cuboids
6. Cannot do Apriori pruning: No iceberg optimization
7. Partition arrays into chunks (a small subcube which inserts in memory).
8. Compressed sparse array addressing: (chunk_id, offset)
9. the simplest traversing order to try to to multi-way aggregation is that the one that minimizes the memory requirement and reduced I/Os
10. Method: the planes should be sorted and computed consistent with their size in ascending order
11. Idea: keep the littlest plane within the main memory, fetch and compute just one chunk at a time for the most important plane
12. Limitation of the method: computing well just for alittle number of dimensions
13. If there are an outsized number of dimensions, “top-down” computation and iceberg cube computation methods are often explored.
Bottom-Up Computation
1. Divides dimensions into partitions and facilitates iceberg pruning
a. If a partition doesn't satisfy min_sup, its descendants are often pruned
b. If minsup = 1 Þ compute full CUBE!
2. No simultaneous aggregation
3. Usually, entire data set can’t slot in main memory
4. Sort distinct values
5. partition into blocks that fit
6. Continue processing
7. Optimizations
8. Partitioning
9. External Sorting, Hashing, Counting Sort
10. Ordering dimensions to encourage pruning
11. Cardinality, Skew, Correlation
12. Collapsing duplicates
13. Can’t do holistic aggregates anymore!
Star-Cubing
1. Explore shared dimensions
2. Allows for shared computations
3. Aggregate during a top-down manner but with the bottom-up sub-layer underneath
4. Shared dimensions grow in bottom-up fashion
5. Anti-monotonic property of shared dimensions
6. Intuition: if we'll compute the shared dimensions before the actual cuboid, we'll use them to undertake to to Apriori pruning
7. Problem: the way to prune while still aggregate simultaneously on multiple dimensions?
8. Use a tree structure almost like H-tree to represent cuboids
9. Collapses common prefixes to save lots of memory
10. Keep count at node
11. Traverse the tree to retrieve a specific tuple
14. Solution: Replace such attributes by a *. Such attributes are star attributes, and therefore the corresponding nodes within the cell tree are star nodes
15. Suppose minsup = 2
16. Perform one-dimensional aggregation. Replace attribute values for which count < 2 with *. And collapse all *’s together
17. Resulting table has all such attributes replaced with the star-attribute
18. With regards to the iceberg computation, this new table may be a lossless compression of the first table
19. Given the new compressed table, it's possible to construct the corresponding cell tree—called star tree
20. Keep a star table at the side for straightforward lookup of star attributes
21. The star tree may be a lossless compression of the first cell tree
High-Dimensional OLAP
1. None of the previous cubing method can handle high dimensionality!
2. Challenge to current cubing methods:
a. The “curse of dimensionality’’ problem
b. Iceberg cube and compressed cubes: only delay the inevitable explosion
c. Full materialization: still significant overhead in accessing results on disk
3. High-D OLAP is required in applications
a. Science and engineering analysis
b. Bio-data analysis: thousands of genes
c. Statistical surveys: many variables
4. Observation: OLAP occurs only on alittle subset of dimensions at a time
5. Semi-Online Computational Model
a. Partition the set of dimensions into shell fragments
b. Calculate the data cubes for every shell fragment while retaining inverted indices or value-list indices
c. Given the pre-computed fragment cubes, dynamically compute cube cells of the high-dimensional data cube online