OK, I have two data sets with 30 million rows each. Each data set has five columns: four attributes and an amount. I want to confirm that the two data sets are exactly the same. Within each data set, no two of the 30 million rows are duplicates.
For my proof I will confirm that both data sets have the same number of rows. I will also do the following:
I will create four smaller data sets from each of the two large data sets. Each of the smaller data sets will drop a different one of the four attributes.
If each of those four smaller data sets matches exactly between the two large data sets, and the total row count matches, is that proof that the two large data sets are exactly the same?
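To make the check concrete, here is a minimal Python sketch of the procedure as I understand it. It rests on assumptions that are mine, not part of the setup above: each row is a tuple of four attributes followed by the amount, every smaller data set keeps the amount column, and "matches exactly" means the smaller data sets are equal as multisets (same rows with the same repeat counts, in any order). The names `projection`, `projections_match`, `even` and `odd` are hypothetical, and the toy rows are made up purely for illustration.

```python
from collections import Counter
from itertools import product

ATTRS = 4  # number of attribute columns; the amount is assumed to be the last column


def projection(rows, drop):
    """Multiset of rows with attribute column `drop` removed (amount kept)."""
    return Counter(row[:drop] + row[drop + 1:] for row in rows)


def projections_match(a, b):
    """The check described above: equal row counts and four equal smaller data sets."""
    if len(a) != len(b):
        return False
    return all(projection(a, i) == projection(b, i) for i in range(ATTRS))


# Toy input (made up): four binary attributes, amount fixed at 1.
# `even` holds the rows whose attributes sum to an even number, `odd` the rest.
cube = list(product((0, 1), repeat=ATTRS))
even = [bits + (1,) for bits in cube if sum(bits) % 2 == 0]
odd = [bits + (1,) for bits in cube if sum(bits) % 2 == 1]

print(projections_match(even, odd))  # True: counts and all four smaller sets agree
print(set(even) == set(odd))         # False: the two sets share no rows at all
```

On this toy input neither set contains duplicate rows, the row counts are equal (8 each), and all four smaller data sets agree, yet the two sets have no row in common. So, at least under the assumptions stated here, passing the check does not by itself seem to establish that the full data sets are identical.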
The reason I am doing this test is that I am not able to compare 30 million rows to 30 million rows directly, because the data sets are too large for the tools I have available.
QUESTION: If the four smaller data sets match exactly and the total row count matches exactly, have I proved that the two 30-million-row data sets are exactly the same?