Pros and cons of normalizing data in a 3NF form

The main purpose of normalization is to reduce data redundancy and avoid inconsistent data. Normalization leads to the separation of unrelated entities into separate entities. In effect, normalization leads to a clean design of the database. Since we do not store redundant data, we save storage space and resources to maintain (update, delete) redundant data.

But, there are cases where we don’t need to fully normalize the data. The provided example in question is a great example to explain why we allow denormalized data. For customer address data, it is desirable to design the city, state, country, and zip codes as separate entities, since they could be represented by their own unique identifiers (state_id, country_id, zip_id) and that various Clients can belong to the same country. state, city and zip code. Suppose we design them as separate entities and try to retrieve data for the following problem:

“Generate report of all customers belonging to ‘US’ and residing in the city ‘TAMPA’ of ‘FLORIDA’ in zip code ‘33601’.”

The query would be something like…

“SELECT c.customer_first_name, c.customer_middle_name, c.customer_last_name FROM customer c, customer_address ca, address_city act, address_zip az, address_state as, address_country ac WHERE c.customer_id=ca.customer_id and c.city_id=act.city_id AND c. zip_id=az.zip_id AND c.state_id = as.state_id and c.country_id = ac.country_id AND ac.country_name = ‘US’ AND as.state_name = ‘FLORIDA’ AND act.city_name = ‘TAMPA’ AND az.zip_code = ‘33601’” Note the joins (c.customer_id=ca.customer_id and c.city_id=act.city_id, etc.) required in the SQL to retrieve the required information. SQL joins are considered very expensive when there is a large amount of data, say we have a terabyte of data inside the client table. The additional four junctions will be very expensive and will result in unacceptable system response time.

If we denormalize the data and allow the country, state, city, and zip code data to reside in the customer_address table, then we could rewrite the above query as —

“SELECT c.customer_first_name, c.customer_middle_name, c.customer_last_name FROM customer c, customer_address ca WHERE c.customer_id=ca.customer_id AND ca.country_name = ‘US’ AND ca.state_name = ‘FLORIDA’ AND ca.city_name = ‘TAMPA ‘ And ca.zip_code = ‘33601’”

After denormalization, the query would run much faster. So, in effect, data normalization in 3NF form is not always practical.

Leave a Reply

Your email address will not be published. Required fields are marked *