Question



Why is it so challenging to capture data changes as part of an extract, transform, and load (ETL) process? What are some methods that can be used to capture incremental data changes, and are there any drawbacks to these methods?


Expert Solution

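=> Capturing changes is hard because the source is a live system. New data is constantly being added and existing data is constantly changing, so each ETL run has to identify exactly which rows changed since the previous run, without missing any and without slowing down the application that is making the changes. The more data there is, the harder this becomes, because re-extracting everything on every run stops being practical.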

Methods To Capture Incremental Data Changes

There are a few different methods:

1. Triggers

2. DATE_MODIFIED

3. Log-Based Change Data Capture

Triggers

=> Database triggers can fire on every INSERT, UPDATE, and DELETE and record the change, typically by writing the changed row to a shadow table.

=> Firing the trigger and storing the row changes in a shadow table introduces overhead on every write to the source table.

=> The lower-overhead alternative, storing only the primary key of each changed row, requires a join back to the source table to retrieve the changes, which:

- increases the load of retrieving the changes

- loses intermediate changes if multiple changes took place on the same row

=> Also, if the structure of a source table changes, its triggers and shadow tables may have to be modified, recreated, and/or recompiled, which adds to the effort of managing and maintaining the database.
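A minimal sketch of both trigger variants, using Python's built-in `sqlite3` module with an in-memory SQLite database. The table and trigger names (`customers`, `customers_shadow`, `customers_changed_keys`) are hypothetical, and a real source system would use its own database's trigger syntax. The full shadow table records every change in order. The key-only table is cheaper per write, but after the join back only the row's current state is visible, which is the loss of intermediate changes described above.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical source table plus two trigger-based capture variants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );

        -- Variant 1: full-row shadow table (every change kept, more work per write).
        CREATE TABLE customers_shadow (
            seq  INTEGER PRIMARY KEY AUTOINCREMENT,
            op   TEXT NOT NULL,          -- 'I', 'U' or 'D'
            id   INTEGER NOT NULL,
            name TEXT
        );

        -- Variant 2: key-only table (less work per write, join back needed later).
        CREATE TABLE customers_changed_keys (
            id INTEGER PRIMARY KEY
        );

        CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
            INSERT INTO customers_shadow (op, id, name) VALUES ('I', NEW.id, NEW.name);
            INSERT OR IGNORE INTO customers_changed_keys (id) VALUES (NEW.id);
        END;

        CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
            INSERT INTO customers_shadow (op, id, name) VALUES ('U', NEW.id, NEW.name);
            INSERT OR IGNORE INTO customers_changed_keys (id) VALUES (NEW.id);
        END;

        CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
            INSERT INTO customers_shadow (op, id, name) VALUES ('D', OLD.id, OLD.name);
            INSERT OR IGNORE INTO customers_changed_keys (id) VALUES (OLD.id);
        END;
    """)
    return conn


def extract_from_shadow(conn: sqlite3.Connection) -> list:
    """Read every captured change, in the order the triggers recorded it."""
    return conn.execute(
        "SELECT op, id, name FROM customers_shadow ORDER BY seq"
    ).fetchall()


def extract_via_key_join(conn: sqlite3.Connection) -> list:
    """Join changed keys back to the source; only the current row state survives.

    LEFT JOIN keeps deleted keys, which come back with a NULL (None) name.
    """
    return conn.execute("""
        SELECT k.id, c.name
        FROM customers_changed_keys AS k
        LEFT JOIN customers AS c ON c.id = k.id
        ORDER BY k.id
    """).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    with conn:  # commits the transaction on success
        conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
        conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
        conn.execute("UPDATE customers SET name = 'Ada Lovelace' WHERE id = 1")
    print(extract_from_shadow(conn))   # three changes, including the intermediate 'Ada L.'
    print(extract_via_key_join(conn))  # one row, latest value only
```

The shadow-table query returns all three versions of row 1, while the key-join query returns only `(1, 'Ada Lovelace')`: the intermediate `'Ada L.'` is gone.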

DATE_MODIFIED

=> Each extraction filters on the DATE_MODIFIED column to retrieve only the rows that were modified since the most recent extraction.

=> For this method to work, DATE_MODIFIED must exist on every table and must be set reliably whenever a row changes.

=> Database triggers are one way to set the values reliably, but they may introduce overhead on the transactional application.

=> Indexing DATE_MODIFIED lowers the cost of the extraction query, at the price of storing and continuously updating the additional index.
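A minimal sketch of timestamp-based extraction with a saved high-water mark, again using Python's built-in `sqlite3` module. The `orders` table, its columns, and the ISO-8601 text timestamps are assumptions for illustration; here the application is assumed to set `date_modified` on every write (a trigger could do it instead, at the overhead cost noted above). The index on `date_modified` is the trade-off from the last point.

```python
import sqlite3


def build_orders_db() -> sqlite3.Connection:
    """Create a hypothetical source table with an indexed DATE_MODIFIED column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            id            INTEGER PRIMARY KEY,
            status        TEXT NOT NULL,
            date_modified TEXT NOT NULL  -- ISO-8601 UTC text; compares correctly as a string
        );
        -- Lets the incremental filter avoid a full table scan, at the cost of maintaining the index.
        CREATE INDEX idx_orders_date_modified ON orders (date_modified);
    """)
    return conn


def extract_changes(conn: sqlite3.Connection, last_extracted: str):
    """Return rows modified after `last_extracted` plus the new high-water mark.

    Assumes date_modified is always set reliably: a row written later with a
    timestamp at or before the saved mark would be silently skipped.
    """
    rows = conn.execute(
        "SELECT id, status, date_modified FROM orders "
        "WHERE date_modified > ? ORDER BY date_modified, id",
        (last_extracted,),
    ).fetchall()
    # Advance the mark to the newest timestamp seen; keep the old one if nothing changed.
    new_mark = rows[-1][2] if rows else last_extracted
    return rows, new_mark


if __name__ == "__main__":
    conn = build_orders_db()
    with conn:
        conn.execute("INSERT INTO orders VALUES (1, 'NEW', '2024-01-01T10:00:00Z')")
        conn.execute("INSERT INTO orders VALUES (2, 'NEW', '2024-01-01T11:00:00Z')")
    rows, mark = extract_changes(conn, "1970-01-01T00:00:00Z")  # first run: everything
    with conn:
        conn.execute("UPDATE orders SET status = 'SHIPPED', "
                     "date_modified = '2024-01-02T09:00:00Z' WHERE id = 1")
    rows, mark = extract_changes(conn, mark)  # second run: only order 1
    print(rows, mark)
```

The ETL job would persist `mark` between runs; each run then reads only the rows whose timestamp is newer than the previous run's mark.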

Log-Based Change Data Capture

=> The biggest benefit of log-based change data capture is its asynchronous nature: changes are read from the database's transaction log, independently of the source application that made them.

=> Log-based change data capture is generally considered the superior approach, because it can be applied in practically every scenario, including systems with extremely high transaction volumes.
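Real transaction-log formats are vendor-specific, so this is only a simulation of the idea, not any database's actual interface. The `TransactionLog`, `LogRecord`, and `CdcConsumer` classes and the LSN (log sequence number) field are hypothetical. The source side only appends records, while a separate consumer remembers the last position it processed and replays newer records into a target. Extraction never queries the source tables, and every intermediate change is seen in order.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    lsn: int                # position in the log (hypothetical log sequence number)
    op: str                 # 'I', 'U' or 'D'
    key: int
    value: Optional[str]    # new value; None for deletes


@dataclass
class TransactionLog:
    """Hypothetical append-only log written as a side effect of source transactions."""
    records: list[LogRecord] = field(default_factory=list)

    def append(self, op: str, key: int, value: Optional[str] = None) -> None:
        self.records.append(LogRecord(len(self.records) + 1, op, key, value))

    def read_after(self, lsn: int) -> list[LogRecord]:
        # LSNs here start at 1 and have no gaps, so index `lsn` is the first unread record.
        return self.records[lsn:]


@dataclass
class CdcConsumer:
    """Reads the log on its own schedule and replays changes into a target copy."""
    target: dict[int, str] = field(default_factory=dict)
    last_lsn: int = 0

    def poll(self, log: TransactionLog) -> int:
        new_records = log.read_after(self.last_lsn)
        for rec in new_records:
            if rec.op == "D":
                self.target.pop(rec.key, None)
            else:  # inserts and updates both carry the new value
                self.target[rec.key] = rec.value
            self.last_lsn = rec.lsn
        return len(new_records)


if __name__ == "__main__":
    log = TransactionLog()
    log.append("I", 1, "Ada")        # the source application only appends
    log.append("U", 1, "Ada L.")
    log.append("I", 2, "Grace")
    consumer = CdcConsumer()
    consumer.poll(log)               # the consumer catches up whenever it runs
    log.append("D", 2)
    consumer.poll(log)
    print(consumer.target, consumer.last_lsn)
```

Because the consumer only stores `last_lsn`, it can fall behind and catch up later without the writer ever waiting on it.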

Disadvantages:

=> Interpreting the changes in the transaction log is difficult because there is no documented standard for how the changes are stored.

=> Database vendors may not provide any interface to the transaction logs, documented or not, and even where one exists it may be relatively slow and/or resource-intensive.
