Fourth Normal Form (4NF)

Fourth Normal Form comes into picture when Multi-valued Dependency occur in any relation. In this tutorial we will learn about Multi-valued Dependency, how to remove it and how to make any table satisfy the fourth normal form.

Follow the video above for complete explanation of 4th Normal Form. Or, if you want, you can even skip the video and jump to the section below for the complete tutorial.

In our last tutorial, we learned about the boyce-codd normal form, we suggest you to follow the last tutorial before this one.


Rules for 4th Normal Form

For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:

  1. It should be in the Boyce-Codd Normal Form.
  2. And, the table should not have any Multi-valued Dependency.

Let’s try to understand what multi-valued dependency is in the next section.


What is Multi-valued Dependency?

A table is said to have multi-valued dependency, if the following conditions are true,

  1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may have multi-valued dependency.
  2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
  3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be independent of each other.

If all these conditions are true for any relation(table), it is said to have multi-valued dependency.


Time for an Example

Below we have a college enrolment table with columns s_idcourse and hobby.

s_idcoursehobby
1ScienceCricket
1MathsHockey
2C#Cricket
2PhpHockey

As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.

You must be thinking what problem this can lead to, right?

Well the two records for student with s_id 1, will give rise to two more records, as shown below, because for one student, two hobbies exists, hence along with both the courses, these hobbies should be specified.

s_idcoursehobby
1ScienceCricket
1MathsHockey
1ScienceHockey
1MathsCricket

And, in the table above, there is no relationship between the columns course and hobby. They are independent of each other.

So there is multi-value dependency, which leads to un-necessary repetition of data and other anomalies as well.


How to satisfy 4th Normal Form?

To make the above relation satify the 4th normal form, we can decompose the table into 2 tables.

CourseOpted Table

s_idcourse
1Science
1Maths
2C#
2Php

And, Hobbies Table,

s_idhobby
1Cricket
1Hockey
2Cricket
2Hockey

Now this relation satisfies the fourth normal form.

A table can also have functional dependency along with multi-valued dependency. In that case, the functionally dependent columns are moved in a separate table and the multi-valued dependent columns are moved to separate tables.

If you design your database carefully, you can easily avoid these issues.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also known as 3.5 Normal Form.

Follow the video above for complete explanation of BCNF. Or, if you want, you can even skip the video and jump to the section below for the complete tutorial.

In our last tutorial, we learned about the third normal form and we also learned how to remove transitive dependency from a table, we suggest you to follow the last tutorial before this one.


Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:

  1. It should be in the Third Normal Form.
  2. And, for any dependency A → B, A should be a super key.

The second point sounds a bit tricky, right? In simple words, it means, that for a dependency A → B, A cannot be a non-prime attribute, if B is a prime attribute.


Time for an Example

Below we have a college enrolment table with columns student_idsubject and professor.

student_idsubjectprofessor
101JavaP.Java
101C++P.Cpp
102JavaP.Java2
103C#P.Chash
104JavaP.Java

As you can see, we have also added some sample data to the table.

In the table above:

  • One student can enrol for multiple subjects. For example, student with student_id 101, has opted for subjects – Java & C++
  • For each subject, a professor is assigned to the student.
  • And, there can be multiple professors teaching one subject like we have for Java.

What do you think should be the Primary Key?

Well, in the table above student_id, subject together form the primary key, because using student_id and subject, we can find all the columns of the table.

One more important point to note here is, one professor teaches only one subject, but one subject may have two different professors.

Hence, there is a dependency between subject and professor here, where subject depends on the professor name.

This table satisfies the 1st Normal form because all the values are atomic, column names are unique and all the values stored in a particular column are of same domain.

This table also satisfies the 2nd Normal Form as their is no Partial Dependency.

And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.

But this table is not in Boyce-Codd Normal Form.


Why this table is not in BCNF?

In the table above, student_id, subject form primary key, which means subject column is a prime attribute.

But, there is one more dependency, professor → subject.

And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.


To make this relation(table) satisfy BCNF, we will decompose this table into two tables, student table and professor table.

Below we have the structure for both the tables.

Student Table

student_idp_id
1011
1012
and so on…

And, Professor Table

p_idprofessorsubject
1P.JavaJava
2P.CppC++
and so on…

And now, this relation satisfy Boyce-Codd Normal Form. In the next tutorial we will learn about the Fourth Normal Form.


A more Generic Explanation

In the picture below, we have tried to explain BCNF in terms of relations.

BCNF Normal Form

Third Normal Form (3NF)

Third Normal Form is an upgrade to Second Normal Form. When a table is in the Second Normal Form and has no transitive dependency, then it is in the Third Normal Form.

The video below covers the concept of Third Normal Form in details.

In our last tutorial, we learned about the second normal form and even normalized our Score table into the 2nd Normal Form.

So let’s use the same example, where we have 3 tables, StudentSubject and Score.

Student Table

student_idnamereg_nobranchaddress
10Akon07-WYCSEKerala
11Akon08-WYITGujarat
12Bkon09-WYITRajasthan

Subject Table

subject_idsubject_nameteacher
1JavaJava Teacher
2C++C++ Teacher
3PhpPhp Teacher

Score Table

score_idstudent_idsubject_idmarks
110170
210275
311180

In the Score table, we need to store some more information, which is the exam name and total marks, so let’s add 2 more columns to the Score table.

score_idstudent_idsubject_idmarksexam_nametotal_marks

Requirements for Third Normal Form

For a table to be in the third normal form,

  1. It should be in the Second Normal form.
  2. And it should not have Transitive Dependency.

What is Transitive Dependency?

With exam_name and total_marks added to our Score table, it saves more data now. Primary key for our Score table is a composite key, which means it’s made up of two attributes or columns → student_id + subject_id.

Our new column exam_name depends on both student and subject. For example, a mechanical engineering student will have Workshop exam but a computer science student won’t. And for some subjects you have Prctical exams and for some you don’t. So we can say that exam_name is dependent on both student_id and subject_id.

And what about our second new column total_marks? Does it depend on our Score table’s primary key?

Well, the column total_marks depends on exam_name as with exam type the total score changes. For example, practicals are of less marks while theory exams are of more marks.

But, exam_name is just another column in the score table. It is not a primary key or even a part of the primary key, and total_marks depends on it.

This is Transitive Dependency. When a non-prime attribute depends on other non-prime attributes rather than depending upon the prime attributes or primary key.


How to remove Transitive Dependency?

Again the solution is very simple. Take out the columns exam_name and total_marks from Score table and put them in an Exam table and use the exam_id wherever required.

Score Table: In 3rd Normal Form

score_idstudent_idsubject_idmarksexam_id

The new Exam table

exam_idexam_nametotal_marks
1Workshop200
2Mains70
3Practicals30

Advantage of removing Transitive Dependency

The advantage of removing transitive dependency is,

  • Amount of data duplication is reduced.
  • Data integrity achieved.

What is Second Normal Form?

For a table to be in the Second Normal Form, it must satisfy two conditions:

  1. The table should be in the First Normal Form.
  2. There should be no Partial Dependency.

If you want you can skip the video, as the concept is covered in detail below the video.

What is Partial Dependency? Do not worry about it. First let’s understand what is Dependency in a table?


What is Dependency?

Let’s take an example of a Student table with columns student_idnamereg_no(registration number), branch and address(student’s home address).

student_idnamereg_nobranchaddress

In this table, student_id is the primary key and will be unique for every row, hence we can use student_id to fetch any row of data from this table

Even for a case, where student names are same, if we know the student_id we can easily fetch the correct record.

student_idnamereg_nobranchaddress
10Akon07-WYCSEKerala
11Akon08-WYITGujarat

Hence we can say a Primary Key for a table is the column or a group of columns(composite key) which can uniquely identify each record in the table.

I can ask from branch name of student with student_id 10, and I can get it. Similarly, if I ask for name of student with student_id 10 or 11, I will get it. So all I need is student_id and every other column depends on it, or can be fetched using it.

This is Dependency and we also call it Functional Dependency.


What is Partial Dependency?

Now that we know what dependency is, we are in a better state to understand what partial dependency is.

For a simple table like Student, a single column like student_id can uniquely identfy all the records in a table.

But this is not true all the time. So now let’s extend our example to see if more than 1 column together can act as a primary key.

Let’s create another table for Subject, which will have subject_id and subject_name fields and subject_id will be the primary key.

subject_idsubject_name
1Java
2C++
3Php

Now we have a Student table with student information and another table Subject for storing subject information.

Let’s create another table Score, to store the marks obtained by students in the respective subjects. We will also be saving name of the teacher who teaches that subject along with marks.

score_idstudent_idsubject_idmarksteacher
110170Java Teacher
210275C++ Teacher
311180Java Teacher

In the score table we are saving the student_id to know which student’s marks are these and subject_id to know for which subject the marks are for.

Together, student_id + subject_id forms a Candidate Key(learn about Database Keys) for this table, which can be the Primary key.

Confused, How this combination can be a primary key?

See, if I ask you to get me marks of student with student_id 10, can you get it from this table? No, because you don’t know for which subject. And if I give you subject_id, you would not know for which student. Hence we need student_id + subject_id to uniquely identify any row.

But where is Partial Dependency?

Now if you look at the Score table, we have a column names teacher which is only dependent on the subject, for Java it’s Java Teacher and for C++ it’s C++ Teacher & so on.

Now as we just discussed that the primary key for this table is a composition of two columns which is student_id & subject_id but the teacher’s name only depends on subject, hence the subject_id, and has nothing to do with student_id.

This is Partial Dependency, where an attribute in a table depends on only a part of the primary key and not on the whole key.


How to remove Partial Dependency?

There can be many different solutions for this, but out objective is to remove teacher’s name from Score table.

The simplest solution is to remove columns teacher from Score table and add it to the Subject table. Hence, the Subject table will become:

subject_idsubject_nameteacher
1JavaJava Teacher
2C++C++ Teacher
3PhpPhp Teacher

And our Score table is now in the second normal form, with no partial dependency.

score_idstudent_idsubject_idmarks
110170
210275
311180

Quick Recap

  1. For a table to be in the Second Normal form, it should be in the First Normal form and it should not have Partial Dependency.
  2. Partial Dependency exists, when for a composite primary key, any attribute in the table depends only on a part of the primary key and not on the complete primary key.
  3. To remove Partial dependency, we can divide the table, remove the attribute which is causing partial dependency, and move it to some other table where it fits in well.