Why would log-structured merge trees be inappropriate (bad) for an application like Git that manages source code? Explain.
- In an application like Git, the storage structure is updated
frequently, and one update must not conflict with another.
- Git can also receive several updates at the same time, for
different codebases or branches.
Basics of log-structured merge trees
- In this data structure, the files in the storage system are
indexed. To refer to any data item or file, we use its index.
- The index keys are stored in separate files. Whenever we want to
look up a data item, we may have to search through these index files
one after another until we find its key, so lookups get slower as the
number of files grows. (Point B)
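The lookup problem in Point B can be sketched with a toy store. This is a minimal illustration under assumed details, not how any particular LSM engine is built: the class `TinyLSM`, its `memtable_limit` parameter, and the flush-every-N-keys policy are all hypothetical. Each flushed file is kept sorted, so searching inside one file is a binary search; what grows is the number of files a lookup may have to visit.

```python
import bisect


class TinyLSM:
    """Toy log-structured store (hypothetical, for illustration only).

    Writes go to an in-memory buffer (the "memtable"). When it holds
    `memtable_limit` keys, it is flushed as a new immutable, sorted segment.
    Segments are never edited in place; newer segments shadow older ones.
    """

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.segments = []  # list of (sorted_keys, values), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the buffered keys out as a new sorted, immutable segment.
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.segments.append((keys, values))
        self.memtable = {}

    def get(self, key):
        """Return (value, number_of_segments_checked); value is None if absent."""
        if key in self.memtable:
            return self.memtable[key], 0
        checked = 0
        # Search newest to oldest: the most recent write for a key wins.
        for keys, values in reversed(self.segments):
            checked += 1
            i = bisect.bisect_left(keys, key)  # binary search inside one segment
            if i < len(keys) and keys[i] == key:
                return values[i], checked
        return None, checked


store = TinyLSM(memtable_limit=4)
for n in range(16):
    store.put(f"file{n:02d}", f"blob{n}")

print(len(store.segments))    # 4
print(store.get("file15"))    # ('blob15', 1)  newest segment, found at once
print(store.get("file00"))    # ('blob0', 4)   oldest segment, every file checked
```

A key that was written long ago, or one that does not exist at all, forces a visit to every segment, which is why the number of files has to be kept in check.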
Why log-structured merge (LSM) trees are bad for an application like Git
In Git, the amount of stored data can become huge depending on the
project's scope, and it can keep growing quickly as the project
evolves.
To address Point B, whenever the number of files reaches a defined
limit, we pair up the files and merge each pair into a single file.
This halves the number of files (and therefore the number of indexes a
lookup has to search), which makes lookups faster. But note that while
we removed one cost, we introduced another: each merged file is now
twice as large. This is the issue. Because file sizes keep doubling
with every merge, each subsequent merge has more data to read and
rewrite, so it takes much longer. And an application like Git receives
a huge volume of new data every day.
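The merge step itself can be sketched as a standard two-way merge of sorted files. This is a minimal sketch, assuming each file is a sorted list of `(key, value)` pairs and that the newer file wins when both contain the same key; the function name `merge_segments` is hypothetical.

```python
def merge_segments(older, newer):
    """Merge two sorted lists of (key, value) pairs into one sorted list.

    If a key appears in both inputs, the entry from `newer` is kept.
    Every input entry is visited once, so the work is proportional to
    the combined size of the two files.
    """
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        old_key, new_key = older[i][0], newer[j][0]
        if old_key < new_key:
            merged.append(older[i])
            i += 1
        elif new_key < old_key:
            merged.append(newer[j])
            j += 1
        else:
            # Same key in both files: the newer write shadows the older one.
            merged.append(newer[j])
            i += 1
            j += 1
    # One input is exhausted; copy whatever remains of the other.
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged


older = [("a.txt", "v1"), ("c.txt", "v1")]
newer = [("b.txt", "v1"), ("c.txt", "v2")]
print(merge_segments(older, newer))
# [('a.txt', 'v1'), ('b.txt', 'v1'), ('c.txt', 'v2')]
```

When the two files share no keys, the output holds every entry from both inputs, which is the size doubling the paragraph above describes.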
Thus, merging two files that each hold a lot of data takes a long time
and can cause noticeable latency. So, using an LSM tree is a bad
option for an application like Git.
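The latency argument can be made concrete with a simple cost model of the doubling described above. This is a deliberately simplified sketch, not a model of any real engine's compaction policy: it assumes a power-of-two number of equal-sized files, no overlapping keys (so each merge output is exactly twice its inputs), and it measures cost as the number of entries written. The function name `pairwise_merge_rounds` is hypothetical.

```python
def pairwise_merge_rounds(num_segments, segment_size):
    """Simulate merging equal-sized files in pairs until one remains.

    Assumes num_segments is a power of two and keys never overlap, so every
    merge output is exactly twice the size of each input. Returns a list of
    (merged_file_size, entries_written_this_round) tuples, one per round.
    """
    if num_segments < 1 or num_segments & (num_segments - 1):
        raise ValueError("num_segments must be a power of two")
    rounds = []
    count, size = num_segments, segment_size
    while count > 1:
        count //= 2   # each pair of files becomes one file...
        size *= 2     # ...that is twice as large
        rounds.append((size, count * size))
    return rounds


for merged_size, written in pairwise_merge_rounds(8, 1000):
    print(f"each merge produces {merged_size} entries; round rewrites {written}")
# each merge produces 2000 entries; round rewrites 8000
# each merge produces 4000 entries; round rewrites 8000
# each merge produces 8000 entries; round rewrites 8000
```

Every round rewrites all of the data, and the last round is a single merge that touches all 8,000 entries at once. Doubling the total data doubles the size of that final merge, which is where the latency comes from.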