Imagine a professor whose office is written into two different rows of a table. The day that office changes, someone updates one row and forgets the other — and now your database cheerfully insists the same professor is in Room 301 and Room 502 at once. That contradiction is the everyday problem normalization exists to prevent.
Database normalization can be summed up in one sentence: "Store each fact in exactly one place." The systematic way to do that is 1NF → 2NF → 3NF. If the same information is scattered across multiple rows, the moment you update one and forget the others, your data becomes inconsistent — an update anomaly. Try it yourself below.
In a denormalized table you must manually update both rows when the office changes; miss one and you get the contradiction above. In a 3NF design, you update exactly 1 row. That is the whole point — the rest of normalization is just rules for deciding which column belongs where so each fact is stored in only one place.
Functional Dependency — The Criterion for Normalization
Normalization asks, "What does this column depend on?" That is a functional dependency. For example, student_id → student_name means "if you know the student ID, the name is determined."
In the diagram below, pay attention to the arrow colors:
- Green = depends on the whole key (good)
- Yellow = depends on only part of the key (partial dependency → violates 2NF)
- Red dashed = depends through a non-key column (transitive dependency → violates 3NF)
Removing the yellow and red arrows = splitting the table = normalization.
1NF → 2NF → 3NF at a Glance
Click through the steps below to see how the table gets split at each stage. Red cells mark columns that should not be in that table.
1NF: One Value per Cell
If you put multiple values like MATH101, CS102 into a single cell, you can’t search properly and you can’t join properly either. Rule: every cell must be atomic, meaning it contains only one value. Columns like phone1, phone2, phone3 have the same problem — they just hide repeated values inside the column names.
2NF: Remove Partial Dependencies
In a table with a composite key (student_id, course_id), student_name depends only on student_id. It does not depend on the whole key, only part of it — a partial functional dependency. Rule: every non-key column must depend on the entire key. If there is a partial dependency, move that column into a separate table.
What if the key is a single column? Then partial dependency is impossible, so if the table is in 1NF, it is automatically in 2NF.
3NF: Remove Transitive Dependencies
In a Courses table, instructor_office does not depend directly on course_id (the key). It depends through the path course_id → instructor_id → instructor_office — a non-key column depending on another non-key column. That is a transitive dependency. Rule: non-key columns must depend only on the key, not through another non-key column. Solution: move instructor_office into an Instructors table.
One Sentence Instead of Memorizing Rules
"Every non-key attribute must depend on the key, the whole key, and nothing but the key." That one sentence is the whole story:
- the key = 1NF (a fact must be identifiable by a key)
- the whole key = 2NF (it must depend on the entire key, not just part of it)
- nothing but the key = 3NF (it must depend directly on the key, not through another non-key column)
On an exam, ask just this: "What does this column depend on? The whole key, part of the key, or a non-key column?"
Concepts Students Often Confuse
3NF is the finish line. It is a great starting point, not the end. Sometimes you need the stricter BCNF, and other times you intentionally denormalize for performance.
More tables is always better. The goal is not to increase table count — it is to make each table responsible for only one kind of fact. Splitting without a dependency-based reason just makes JOINs more complicated.
NoSQL means normalization does not matter. Normalization is a relational term, but the problem of keeping duplicated data consistent exists in any database. Even when you denormalize, know exactly what tradeoff you’re making.
Try It on a Table You Own
Pick one table from your own project and work the same three questions that drive normalization:
- Write down what the key is.
- Check whether each non-key column depends on the whole key or only part of it. → Only part means a 2NF violation.
- Check whether any non-key column depends on another non-key column. → If so, that is a 3NF violation.
Those 3 questions catch most normalization problems — and each one is just another way of asking where a single fact really lives.
Frequently Asked Questions
- What is database normalization in one sentence?
- Database normalization means storing each fact in exactly one place, and the systematic way to do that is moving through first, second, and third normal form. The rest of normalization is a set of rules for deciding which column belongs in which table so that no fact is duplicated across rows.
- What is an update anomaly?
- An update anomaly happens when the same information is scattered across multiple rows: the moment you update one copy and forget the others, the data becomes inconsistent. For example, if a professor's office is stored in two rows and only one is changed, the database now claims two different offices at once. Normalized designs avoid this by storing the fact once.
- What is the difference between 2NF and 3NF?
- Second normal form removes partial dependencies: every non-key column must depend on the entire key, not just part of a composite key. Third normal form removes transitive dependencies: non-key columns must depend only on the key, not through another non-key column. In both cases the fix is moving the offending column into a separate table.
- What does first normal form require?
- First normal form requires every cell to be atomic, containing only one value. Putting multiple values in a single cell breaks searching and joining. Columns like phone1, phone2, and phone3 have the same problem, because they hide repeated values inside column names instead of modeling them properly.
- What is a functional dependency?
- A functional dependency says that knowing one column determines another. For example, if you know a student ID, the student name is determined. Normalization works by asking what each column depends on: columns that depend on only part of the key violate second normal form, and columns that depend through a non-key column violate third normal form.
Need help with a problem?
Upload your question and get a verified, step-by-step solution in seconds.
Open GPAI Solver →