In: Computer Science
Part #4: Data normalization:
Background: Data in a relational database is stored in a normalized form. Data normalization or just normalization is a strategy used to organize data into multiple related tables to reduce data redundancy while preserving data integrity.
Exercise: Normalize the student data in the University table (sample data is shown further below) into 3 tables, namely, Student, Department, and Course. You do not need to populate data. Just illustrate the schema for the 3 tables. You may make suitable assumptions to normalize the data. Ensure you introduce additional columns as needed to preserve relationships between the 3 tables.
University table |
|||||||
StudentID |
Name |
Address |
Dept_ID |
Dept |
CRN |
Course |
Room |
+0119144 |
Kodi |
3 S. East.. |
CSE |
Compu.. |
241433 |
Intro to.. |
Ben 010 |
+0235422 |
Mary |
10 Cherr.. |
CSE |
Compu.. |
241433 |
Intro to.. |
Ben 010 |
+0717113 |
Alex |
64 Rose.. |
CSE |
Compu.. |
241433 |
Intro to.. |
Ben 010 |
+0115144 |
Jones |
3 Talaw.. |
CSE |
Compu.. |
241433 |
Intro to.. |
Ben 010 |
+0334455 |
Tommy |
22 Dow.. |
CSE |
Compu.. |
241433 |
Intro to.. |
Ben 010 |
Another 95 rows of this table for CRN 241433 not shown for brevity |
Show just the schema (without any data) for the normalized form of the data with 3 tables in the space below:
Estimating benefit of normalization
The objective of normalization is to reduce redundant data in a database. Assume the University table has 100 rows -- that is 100 students in 1 course taught by 1 department. Each row has 8 columns. So the total number of columns in the database (which has just 1 University table) is 100 * 8 = 800 cells.
Now, compute similar information using the normalized version of your tables. In the normalized tables, indicate how many rows & total number of columns that would be present and fill-in the following table:
Table name |
Number of rows |
Cols per row |
Number of cells in the table |
Student |
|
|
|
Department |
|||
Course |
The total number of cells in the normalized version (with 3 tables) of the database is: |
If each cell occupies 100 bytes, what would be the percentage difference in size of the size of the two databases? Compute the answer by completing the data in the following tables:
Size of un-normalized University database: |
|
Size of normalized database: |
|
Raw difference in database size: |
|
Percentage difference in database size: |
As per data given in the university table we will distribute the data and seperate it into three tables student, course and department.
In the student table each students should have unique student id, name and address.
In case of department we have unique department id and department name.
Course table has unique CRN(course reference number),Course and room.
Now we will make relationship between these tables to preserve data integrity. Here, we will use foreign key relationship between tables.
Relationship between course and department table:
We know that a department have many courses. So, we will add one column to course table 'depID' with foreign key constraint.
So, now the columns of course table will be:
Course table has unique CRN(course reference number) ,Course,room and depID.
Relationship between student and course table:
Note: We will assume that a student can join a single course.
Then, we need to add one column 'courseID' to student table with foreign key contraints.
So, now the columns of student table will be:
Student table should have unique student id, name, address and courseID.
--------------------------------------------------------------------------------------------
Student table:
Number of rows: Assume 100 students
Number of columns: 4 (studentID, name, address, courseID)
Cols per row: 4
Number of cell in table: 400 (Number of rows multiply by cols per row)
-----------------------------------------------------------------------------------------------
Course table:
Note: As given we have only one course that is 'Intro to ...'. So, we have only one course. Thus, number of rows in course table will be 1.
Number of rows: 1
Number of columns: 4 (CRN, course, room,depID)
Cols per row: 4
Number of cell in table: 4 (Number of rows multiply by cols per row)
-------------------------------------------------------------------------------------------
Department table:
Note: As given we have only one department that is 'CSE'. So, we have only one department. Thus, number of rows in course table will be 1.
Number of rows: 1
Number of columns: 2 (depID, depName)
Cols per row: 2
Number of cell in table: 2 (Number of rows multiply by cols per row)
----------------------------------------------------------------------------------------------
The total number of cells in the normalized version (with 3 tables) of the database is: (400+4+2) = 406
-----------------------------------------------------------------------------------------------
If each cell occupies 100 bytes, what would be the percentage difference in size of the size of the two databases? Compute the answer by completing the data in the following tables:
Size of un-normalized University database: |
Number of cells X space occupy by each cell 800 X 100 = 80000 bytes |
Size of normalized database: |
Number of cells X space occupy by each cell = 406 X 100 = 40600 bytes |
Raw difference in database size: |
80000 - 40600 = 39400 bytes |
----------------------------------------------- |
|
Percentage difference in database size: Calculate percentage difference |V1−V2|[(V1+V2)/2]×100 =|80000−40600|[(80000+40600)/2]×100 =|80000−40600|[(80000+40600)/2]×100 =(39400/(120600/2))×100 =(39400/60300)×100 =0.6534×100 =65.34%difference |