Database Systems - Security and Administration
Jak University is a leading higher education institution that is well known throughout Malaysia for its academic achievements. The university has campuses in all major cities in Malaysia. Its management center is in Kuala Lumpur, and it has 100 thousand students and 10 thousand lecturers spread across several branch campuses. Teaching is conducted both face-to-face and online. Each campus can join virtual classes, so the data from the campuses must be connected to one another. Academic data from each branch campus is stored in files and then sent to the center in Kuala Lumpur, and campuses can also share files with each other. After several years of operation, however, the number of lecture transactions has increased and the volume of data stored in files keeps growing. As a result, day-to-day operations often run into data access problems, both when sending data and when accessing data from other campuses.
Question:
1. The University of Jak uses a file-based approach in managing
its academic data. Over time, however, it began to experience
problems with data accessibility.
You were appointed as a Database Consultant to change the data
management process at the University of Jak using a database
approach. You need to explain to the Management in Kuala Lumpur so
that the leaders can understand and be sure about the proposed
application of the database approach, where this approach can
support operational performance in data management. Explain what is
meant by a file-based approach and a database approach. Also
describe the advantages and disadvantages of each.
2. If the management approves the proposal, then you need to design a good database architecture. You can choose a Three-Tier Client-Server architecture.
3. The hardware running the DBMS must continue to operate even if one of the hardware components is damaged due to various factors. One solution is to implement RAID technology. Describe what RAID technology is. Also explain by providing examples of how each RAID level is implemented.
Q1. Explain what is meant by a file-based approach and a database approach. Also describe the advantages and disadvantages of each.
ANSWER:-
1. FILE BASED APPROACH--
For example:- The details of customers may be stored in one file, orders in another, etc.
For example:- all of the programs associated with processing customers' orders are referred to as the order processing application.
Disadvantages of the file-based approach
Using the file-based approach to keep organizational information has a number of disadvantages.
2. Data isolation:- Data isolation is a property that determines when and how changes made by one operation become visible to other concurrent users and systems. This issue occurs in a concurrency situation. This is a problem because:
3. Integrity problems:- Problems with data integrity are another disadvantage of using a file-based system. Data integrity refers to the maintenance and assurance that the stored data is correct and consistent. Factors to consider when addressing this issue are:
4. Security problems:- Security can be a problem with a file-based approach because:
5. Concurrency access:- Concurrency is the ability of the database to allow multiple users access to the same record without adversely affecting transaction processing. A file-based system must manage, or prevent, concurrency by the application programs. Typically, in a file-based system, when an application opens a file, that file is locked. This means that no one else has access to the file at the same time.
The following diagram shows how different applications will each have their own copy of the files they need in order to carry out the activities for which they are responsible:
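The same duplication can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory CSV "files", the student record, and the helper names (`read_records`, `write_records`, `move_campus`) are all hypothetical and stand in for the separate files that two applications would keep.

```python
import csv
import io

# Hypothetical data: two applications each keep their own copy of the same record.
REGISTRATION_FILE = "student_id,name,campus\n1001,Aminah,Kuala Lumpur\n"
LIBRARY_FILE = "student_id,name,campus\n1001,Aminah,Kuala Lumpur\n"

FIELDS = ["student_id", "name", "campus"]


def read_records(text):
    """Parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def write_records(records):
    """Serialise a list of dictionaries back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def move_campus(file_text, student_id, new_campus):
    """Update one application's private copy of the file."""
    records = read_records(file_text)
    for record in records:
        if record["student_id"] == student_id:
            record["campus"] = new_campus
    return write_records(records)


# The registration application records a transfer; the library's copy is untouched.
registration_after = move_campus(REGISTRATION_FILE, "1001", "Penang")
registration_view = read_records(registration_after)[0]["campus"]
library_view = read_records(LIBRARY_FILE)[0]["campus"]
print(registration_view, "vs", library_view)
```

After the update the two applications disagree about the student's campus, which is the kind of inconsistency that separate copies of files invite.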
Advantages of File Based System
The file Based system is not complicated and is simpler to use.
Because of this simplicity, a file-based system is quite inexpensive.
Because the file-based system is simple and cheap, it is normally suitable for home users and owners of small businesses.
Since the file-based system is used by smaller organisations or individual users, it usually stores a comparatively small amount of data. At that scale, the data can be accessed quickly and easily.
2. DATABASE APPROACH--
The database approach is an improvement on the shared file solution as the use of a database management system (DBMS) provides facilities for querying, data security and integrity, and allows simultaneous access to data by a number of different users. At this point we should explain some important terminology:
Database: A database is a collection of related data.
Database management system: The term 'database management system', often abbreviated to DBMS, refers to a software system used to create and manage databases. The software of such systems is complex, consisting of a number of different components. The term database system is usually an alternative term for database management system.
System catalogue/Data dictionary: The description of the data in the database management system.
Database application: Database application refers to a program, or related set of programs, which use the database management system to perform the computer-related tasks of a particular business function, such as order processing.
One of the benefits of the database approach is that the problem of physical data dependence is resolved; this means that the underlying structure of a data file can be changed without the application programs needing amendment. This is achieved by a hierarchy of levels of data specification. Each such specification of data in a database system is called a schema.
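A minimal sketch of this independence, using Python's built-in `sqlite3` module. The table, view, and column names are hypothetical. The application reads only through a view, so changes to the stored table (a new column, a new index) do not require any change to the application function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stored table (hypothetical structure).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, campus TEXT)")
conn.execute("INSERT INTO student VALUES (1001, 'Aminah', 'Kuala Lumpur')")

# The view is the only thing the application knows about.
conn.execute("CREATE VIEW student_directory AS SELECT id, name FROM student")


def list_directory(connection):
    """Application code: depends on the view, not on the table layout."""
    query = "SELECT id, name FROM student_directory ORDER BY id"
    return connection.execute(query).fetchall()


before = list_directory(conn)

# Change the underlying structure: add a column and an index.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
conn.execute("CREATE INDEX idx_student_campus ON student (campus)")

after = list_directory(conn)
print(before == after)
```

The application function is identical before and after the structural change, which is the point of separating schema levels.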
The Standards Planning and Requirements Committee (SPARC) of the American National Standards Institute (ANSI) encapsulated the concept of schema in its three-level database architecture model, known as the ANSI/SPARC architecture, which is shown in the diagram below:
Advantages of Database Approach
1. Reducing Data Redundancy:- File-based data management systems contain multiple files stored in many different locations in a system, or even across multiple systems. Because of this, there are often multiple copies of the same data, which leads to data redundancy. In a database, each item of data is ideally stored once and shared, which reduces this duplication.
2. Sharing of Data:- In a database, the users of the database can share the data among themselves. There are various levels of authorisation to access the data, and consequently the data can only be shared based on the correct authorisation protocols being followed.
3. Data Integrity:- Data integrity means that the data is accurate and consistent in the database. Data Integrity is very important as there are multiple databases in a DBMS. All of these databases contain data that is visible to multiple users. So it is necessary to ensure that the data is correct and consistent in all the databases and for all the users.
4. Data Security:- Data security is a vital concept in a database. Only authorised users should be allowed to access the database, and their identity should be authenticated, for example with a username and password. Unauthorised users should not be allowed to access the database under any circumstances, as this would violate its security constraints.
5. Privacy:- The privacy rule in a database means only authorised users can access a database according to its privacy constraints. There are levels of database access, and a user can only view the data they are allowed to view. For example, in social networking sites, access constraints are different for the different accounts a user may want to access.
6. Backup and Recovery:- A database management system provides facilities for backup and recovery. Once backups are configured, users do not need to back up their data by hand. The DBMS can also restore the database to a consistent earlier state after a crash or system failure.
7. Data Consistency:- Data consistency is easier to ensure in a database because data redundancy is controlled. All data appears consistently across the database, and the data is the same for all users viewing it. Changes made to the database become visible to all users once they are committed, so users do not see conflicting versions of the same data.
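The integrity advantage above can be illustrated with a short `sqlite3` sketch in which the DBMS itself, rather than each application program, rejects invalid data. The tables, columns, and sample rows are hypothetical and chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE campus (code TEXT PRIMARY KEY, city TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        campus_code TEXT NOT NULL REFERENCES campus(code),
        credits INTEGER NOT NULL CHECK (credits >= 0)
    )
""")
conn.execute("INSERT INTO campus VALUES ('KL', 'Kuala Lumpur')")
conn.execute("INSERT INTO student VALUES (1001, 'Aminah', 'KL', 12)")
conn.commit()


def try_insert(row):
    """Attempt an insert; the DBMS decides whether the row is valid."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


results = [
    try_insert((1002, "Badrul", "KL", 9)),   # valid row
    try_insert((1003, "Chong", "XX", 9)),    # campus code does not exist
    try_insert((1004, "Devi", "KL", -3)),    # negative credits
    try_insert((1001, "Farid", "KL", 3)),    # duplicate student id
]
for line in results:
    print(line)

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Only the first row is stored. The other three are refused by constraints declared once in the schema, so every application that uses the database gets the same checks.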
Disadvantages of database approach
2. Size:- The database management system consumes a substantial amount of main memory as well as a large amount of disk space in order to make it run efficiently.
3. Cost of DBMS:-A multi-user database management system may be very expensive. Even after the installation, there is a high recurrent annual maintenance cost on the software.
4. Cost of conversion:- When moving from a file-based system to a database system, the organisation incurs additional expenses for hardware acquisition and staff training.
5. Performance:- Because a DBMS is designed to serve many applications rather than one in particular, some applications may not run as fast as before.
6. Higher impact of a failure:- The database approach increases the vulnerability of the system because of its centralization. As all users and applications rely on the availability of the database, the failure of any component can bring operations to a halt and seriously affect services to customers.
In database systems, concurrency is managed thus allowing multiple users access to the same record. This is an important difference between database and file-based systems.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Q 2. If the management approves the proposal, then you need to design a good database architecture. You can choose a Three-Tier Client-Server architecture.
ANSWER:-
The design of a DBMS depends on its architecture. It can be centralized or decentralized or hierarchical. The architecture of a DBMS can be seen as either single tier or multi-tier. An n-tier architecture divides the whole system into related but independent n modules, which can be independently modified, altered, changed, or replaced.
In 1-tier architecture, the DBMS is the only entity: the user works directly on the DBMS, and any changes made are applied directly to the DBMS itself. It does not provide handy tools for end-users. Database designers and programmers normally prefer to use single-tier architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS can be accessed. Programmers use 2-tier architecture where they access the DBMS by means of an application. Here the application tier is entirely independent of the database in terms of operation, design, and programming.
3-tier Architecture:- A 3-tier architecture separates its tiers from each other based on the complexity of the users and how they use the data present in the database. It is the most widely used architecture to design a DBMS.
Database (Data) Tier − At this tier, the database resides along with its query processing languages. We also have the relations that define the data and their constraints at this level.
Application (Middle) Tier − At this tier reside the application server and the programs that access the database. For a user, this application tier presents an abstracted view of the database. End-users are unaware of any existence of the database beyond the application. At the other end, the database tier is not aware of any other user beyond the application tier. Hence, the application layer sits in the middle and acts as a mediator between the end-user and the database.
User (Presentation) Tier − End-users operate on this tier and they know nothing about any existence of the database beyond this layer. At this layer, multiple views of the database can be provided by the application. All views are generated by applications that reside in the application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are independent and can be changed independently.
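The three tiers can be sketched in one short Python program. This is a simplified, single-process illustration, not a deployment design: in a real system each tier would typically run on its own machine. The table, the `AcademicService` class, and the `render_average` function are hypothetical names.

```python
import sqlite3


# Database (data) tier: holds the relations and their data.
def create_data_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grade (student_id INTEGER, course TEXT, mark INTEGER)")
    conn.executemany(
        "INSERT INTO grade VALUES (?, ?, ?)",
        [(1001, "DB101", 82), (1001, "NET201", 74), (1002, "DB101", 65)],
    )
    conn.commit()
    return conn


# Application (middle) tier: the only code that talks to the database.
class AcademicService:
    def __init__(self, conn):
        self._conn = conn

    def average_mark(self, student_id):
        """Return the student's average mark, or None if no marks exist."""
        row = self._conn.execute(
            "SELECT AVG(mark) FROM grade WHERE student_id = ?", (student_id,)
        ).fetchone()
        return row[0]


# User (presentation) tier: formats results and never sees SQL or tables.
def render_average(service, student_id):
    average = service.average_mark(student_id)
    if average is None:
        return f"Student {student_id}: no marks recorded"
    return f"Student {student_id}: average mark {average:.1f}"


service = AcademicService(create_data_tier())
report = render_average(service, 1001)
print(report)
```

Because the presentation code calls only `AcademicService`, the database could be replaced or restructured without touching it, which is the modifiability the multi-tier design aims for.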
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Q3. The hardware running the DBMS must continue to operate even if one of the hardware components is damaged due to various factors. One solution is to implement RAID technology. Describe what RAID technology is. Also explain by providing examples of how each RAID level is implemented.
ANSWER:-
RAID (redundant array of independent disks) is a way of storing the same data in different places on multiple hard disks or solid-state drives to protect data in the case of a drive failure. There are different RAID levels, however, and not all have the goal of providing redundancy.
RAID works by placing data on multiple disks and allowing input/output (I/O) operations to overlap in a balanced way, improving performance. Because an array of several disks is more likely to suffer a drive failure than a single disk (its combined mean time between failures, MTBF, is lower), storing data redundantly is what gives the array its fault tolerance.
RAID arrays appear to the operating system (OS) as a single logical drive. RAID employs the techniques of disk mirroring or disk striping. Mirroring copies identical data onto more than one drive. Striping partitions the data and spreads it over multiple disk drives. Each drive's storage space is divided into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order.
Disk mirroring and disk striping can also be combined in a RAID array.
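The two techniques can be sketched in Python by treating each disk as a list of byte blocks. This is a toy model with hypothetical function names; a real controller works on fixed-size sectors and handles this below the file system.

```python
def stripe(data, disk_count, unit_size):
    """Split data into units and deal them round-robin across the disks."""
    disks = [[] for _ in range(disk_count)]
    for offset in range(0, len(data), unit_size):
        unit_number = offset // unit_size
        disks[unit_number % disk_count].append(data[offset:offset + unit_size])
    return disks


def read_striped(disks):
    """Reassemble the data by reading the interleaved units in order."""
    units = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                units.append(disk[row])
    return b"".join(units)


def mirror(data, copies=2):
    """Write an identical copy of the data to every disk in the mirror."""
    return [bytes(data) for _ in range(copies)]


data = b"ABCDEFGHIJ"
striped = stripe(data, disk_count=3, unit_size=2)
mirrored = mirror(data)

print(striped)
print(read_striped(striped) == data)
print(mirrored[0] == mirrored[1] == data)
```

With three disks and 2-byte units, the units `AB`, `CD`, `EF`, `GH`, `IJ` land on disks 0, 1, 2, 0, 1 in turn. Striping alone keeps one copy of each unit, so losing a disk loses data; mirroring keeps a full copy on each disk.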
A RAID controller is a device used to manage hard disk drives in a storage array. It can be used as a level of abstraction between the OS and the physical disks, presenting groups of disks as logical units. Using a RAID controller can improve performance and help protect data in case of a crash.
A RAID controller may be hardware- or software-based. In a hardware-based RAID product, a physical controller manages the array. The controller can also be designed to support drive formats such as SATA and SCSI. A physical RAID controller can also be built into a server's motherboard.
With software-based RAID, the controller uses the resources of the hardware system, such as the central processor and memory. While it performs the same functions as a hardware-based RAID controller, software-based RAID controllers may not enable as much of a performance boost and can affect the performance of other applications on the server.
If a software-based RAID implementation isn't compatible with a system's boot-up process, and hardware-based RAID controllers are too costly, firmware or driver-based RAID is another potential option.
Firmware-based RAID controller chips are located on the motherboard, and all operations are performed by the CPU, similar to software-based RAID. However, with firmware, the RAID system is only implemented at the beginning of the boot process. Once the OS has loaded, the controller driver takes over RAID functionality. A firmware RAID controller isn't as pricy as a hardware option, but it puts more strain on the computer's CPU. Firmware-based RAID is also called hardware-assisted software RAID, hybrid model RAID and fake RAID.
RAID levels:-
RAID devices make use of different versions, called levels. The original paper that coined the term and developed the RAID concept defined five levels of RAID, 1 through 5; RAID 0 (striping without redundancy) was added to the numbering later. This numbered system enabled those in IT to differentiate RAID versions. The number of levels has since expanded and has been broken into three categories: standard, nested and nonstandard RAID levels.
Standard RAID levels:-
RAID 0. This configuration has striping, but no redundancy of data. It offers the best performance, but it does not provide fault tolerance.
RAID 1. Also known as disk mirroring, this configuration consists of at least two drives that duplicate the storage of data. There is no striping. Read performance is improved since either disk can be read at the same time. Write performance is the same as for single disk storage.
RAID 2. This configuration uses striping across disks, with some disks storing error checking and correcting (ECC) information. RAID 2 also uses a dedicated Hamming code parity; a linear form of error correction code. RAID 2 has no advantage over RAID 3 and is no longer used.
RAID 3. This technique uses striping and dedicates one drive to storing parity information. The embedded ECC information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an I/O operation addresses all the drives at the same time, RAID 3 cannot overlap I/O. For this reason, RAID 3 is best for single-user systems with long record applications.
RAID 4. This level uses large stripes, which means a user can read records from any single drive. Overlapped I/O can then be used for read operations. Since all write operations are required to update the parity drive, no I/O overlapping is possible for writes.
RAID 5. This level is based on parity block-level striping. The parity information is striped across each drive, enabling the array to function even if one drive were to fail. The array's architecture allows read and write operations to span multiple drives -- resulting in performance better than that of a single drive, but not as high as that of a RAID 0 array. RAID 5 requires at least three disks, but it is often recommended to use at least five disks for performance reasons.
RAID 5 arrays are generally considered to be a poor choice for use on write-intensive systems because of the performance impact associated with writing parity data. When a disk fails, it can take a long time to rebuild a RAID 5 array.
RAID 6. This technique is similar to RAID 5, but it includes a second parity scheme distributed across the drives in the array. The use of additional parity enables the array to continue to function even if two disks fail simultaneously. However, this extra protection comes at a cost. RAID 6 arrays often have slower write performance than RAID 5 arrays.
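The parity-based levels above all rest on the same XOR idea. The following sketch models a RAID 5 style layout with rotating parity and rebuilds the contents of a failed disk from the survivors. It is a toy model under stated assumptions: blocks are equal-sized byte strings, the rotation order is one of several used in practice, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_raid5(data_blocks, disk_count):
    """Return a list of stripes; each stripe holds one block per disk.

    Each stripe carries disk_count - 1 data blocks plus one parity block,
    and the parity position rotates from stripe to stripe.
    """
    per_stripe = disk_count - 1
    if len(data_blocks) % per_stripe:
        raise ValueError("number of data blocks must fill whole stripes")
    stripes = []
    for stripe_number, start in enumerate(range(0, len(data_blocks), per_stripe)):
        row = list(data_blocks[start:start + per_stripe])
        parity_disk = (disk_count - 1 - stripe_number) % disk_count
        row.insert(parity_disk, xor_blocks(row))  # parity computed from data only
        stripes.append(row)
    return stripes


def rebuild_disk(stripes, failed_disk):
    """Recompute every block of the failed disk from the surviving disks."""
    return [
        xor_blocks([block for disk, block in enumerate(row) if disk != failed_disk])
        for row in stripes
    ]


blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stripes = build_raid5(blocks, disk_count=3)

failed = 1
original = [row[failed] for row in stripes]
rebuilt = rebuild_disk(stripes, failed)
print(rebuilt == original)
```

Because XOR is its own inverse, the missing block in each stripe, whether data or parity, equals the XOR of the remaining blocks. RAID 6 adds a second, independently computed parity so that two missing blocks per stripe can be recovered; that second calculation is not shown here.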
Nested RAID levels
Some RAID levels are referred to as nested RAID because they are based on a combination of RAID levels. Here are some examples of nested RAID levels.
RAID 10 (RAID 1+0). Combining RAID 1 and RAID 0, this level is often referred to as RAID 10, which offers higher performance than RAID 1, but at a much higher cost. In RAID 1+0, the data is mirrored and the mirrors are striped.
RAID 01 (RAID 0+1). RAID 0+1 is similar to RAID 1+0, except the data organization method is slightly different. Rather than creating a mirror and then striping the mirror, RAID 0+1 creates a stripe set and then mirrors the stripe set.
RAID 03 (RAID 0+3, also known as RAID 53 or RAID 5+3). This level uses striping (in RAID 0 style) for RAID 3's virtual disk blocks. This offers higher performance than RAID 3, but at a higher cost.
RAID 50 (RAID 5+0). This configuration combines RAID 5 distributed parity with RAID 0 striping to improve RAID 5 performance without reducing data protection.
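The cost side of these trade-offs can be made concrete by comparing usable capacity. The sketch below covers only RAID 0, 1, 5, 6 and 10 and assumes identical disks, RAID 1 mirroring every disk in the set, and RAID 10 built from two-way mirrors. The function name and the 8-disk, 4 TB example are hypothetical.

```python
def usable_capacity(level, disk_count, disk_size_tb):
    """Usable capacity in TB for an array of identical disks."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disk_count < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")

    if level == "0":   # striping only, nothing spent on redundancy
        return disk_count * disk_size_tb
    if level == "1":   # every disk holds the same data
        return disk_size_tb
    if level == "5":   # one disk's worth of parity
        return (disk_count - 1) * disk_size_tb
    if level == "6":   # two disks' worth of parity
        return (disk_count - 2) * disk_size_tb
    # RAID 10: half the disks hold mirror copies
    if disk_count % 2:
        raise ValueError("RAID 10 with two-way mirrors needs an even disk count")
    return (disk_count // 2) * disk_size_tb


capacities = {level: usable_capacity(level, 8, 4) for level in ["0", "1", "5", "6", "10"]}
print(capacities)
```

For eight 4 TB disks this gives 32 TB for RAID 0, 4 TB for RAID 1, 28 TB for RAID 5, 24 TB for RAID 6 and 16 TB for RAID 10, which shows how much raw capacity each level spends on protection.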
Nonstandard RAID levels
Nonstandard RAID levels vary from standard RAID levels and are usually developed by companies or organizations for mainly proprietary use. Here are some examples.
RAID 7. A nonstandard RAID level based on RAID 3 and RAID 4 that adds caching. It includes a real-time embedded OS as a controller, caching via a high-speed bus and other characteristics of a stand-alone computer.
Adaptive RAID. This level enables the RAID controller to decide how to store the parity on disks. It will choose between RAID 3 and RAID 5, depending on which RAID set type will perform better with the type of data being written to the disks.
Linux MD RAID 10. This level, provided by the Linux kernel, supports the creation of nested and nonstandard RAID arrays. Linux software RAID can also support the creation of standard RAID 0, RAID 1, RAID 4, RAID 5 and RAID 6 configurations.
Advantages of RAID
Downsides of using RAID