Category Archives: Database Design

>SQL Server: Why We Should Prefer Database to Implement Data Integrity

>

Recently, I had explored a client database, in which not a single domain or referential constraint was used to implement data integrity. And such integrities are implemented through code on application side.
At database side following domain and referential constraints can be used to implement data integrity.
  • FOREIGN KEY
  • CHECK
  • DEFAULT
  • NOT NULL
Out of these four constraints, DEFAULT has less importance as compare to other three. Developers mostly like to implement domain integrity constraints through code on application side, and like to avoid implementing these through CHECK constraints, on database side. But one should keep in mind that implementing such business rules in the database using CHECK constraints are always helpful for optimizer to generate efficient execution plans.
For remaining two, one should totally depend on database and these (FOREIGN KEY and NOT NULL) constraints should be implemented on database side. Because, through database you are not only using domain and referential constraints to implement data integrity but also TO FACILITATE THE OPTIMIZER TO GENERATE EFFICIENT QUERY PLAN FOR YOU.
Advertisements

Be Careful When Creating a Table

Creating a table is not a big task, but a well designed table is basic necessity of every application, as whole application performance in future will be based on these tables. Before creating table, go through the process of normalization and keep in mind that your first goal is PERFORMANCE by using fewer resources. Normalize your tables not beyond third normal form, and de-normalize it when its necessary, as de-normalization of data is as important as normalization is.

Table Naming
  • Pascal notation, end your table name with an ‘s’ (like, Orders, ErrorMessages, Products)
  • Do Not use spaces in name
  • Do not use SQL keywords as the name (like, user is a keyword so avoid using keywords)
  • More natural way of naming tables is a prefix tbl before its name (like tblCustomers).

Columns
  • Like table, no spaces, no keyword should be used
  • If column is primary key column, name should be like TABLENAME+’ID’ (like CustomerID)
  • Be careful when selecting data type
    • Don’t use the NVARCHAR or NCHAR data types unless you need to store 16-bit character Unicode data as it takes double space as VARCHAR or CHAR data types and implicit conversion from NVARCHAR or NCHAR to VARCHAR or CHAR adversely reduce performance.
    • Be sure about your need when it comes to size to data type. Always select the smallest data type which meets your need. For example, if all you are going to store 1 to 10 use TINYINT. When INT (which takes 4 bytes) can covers your future data don’t select BIGINT (which takes 8 bytes)
    • Sorting an INTEGER data type is faster then VARHCAR or CHAR, so use INTEGER data types when data is numeric.
NOT NULL property
  • Avoid keeping a column default to NULL. A column should always contain something as indexes don’t cover the NULL values.
SELECT * FROM table WHERE column IS NULL
(above query will use full table scan since index doesn’t cover the values you need)
SELECT column FROM table ORDER BY column
(even for ORDER BY clause full table scan will be used)
Primary Key
  • Table must not be a HEAP table
  • Each table must have a primary key, and for best performance it should be created on column which has INT data type.
  • Never use VARCHAR, FLOAT, REAL, DATE and GUID data type columns as Primary Key
  • Add a surrogate key, when composite key is selected as data identity.
Referential Integrity Cascade DELETE/UPDATE
  • While creating a Foreign Key constraint, enforce cascade delete or cascade update when you want to delete a child when ever you delete its parent. If the child rows are “part” of the parent, then use cascade. Like if you have an employee and a child names table and you never want to warn to delete child record while deleting employee data.
  • Using cascade delete is good to enforce as compared to create a TRIGGER for this purpose, as triggers never fire when you use TRUNCATE TABLE, but in cascade case TRUNCATE will return an error.
  • When you are not sure for above mentioned points, never ever use cascade delete/update as it creates more problems then it facilitates you.

Computed Columns
  • Computed columns don’t follow the normalization rules but if a column beneficial when some sort of data is computed by a query and this query is being run again and again.

Default Value
  • Always provide a default value when column is NOT NULL

Description of column
  • Keep you table we documented by providing each column description through design view or using following t-sql.
EXECUTE sp_addextendedproperty N‘MS_Description’, ‘your column description here’, N‘SCHEMA’, ‘schemaname’, N‘TABLE’, ‘tablename’, N‘COLUMN’, ‘columnname’

Uniqueidentifier Column as Primary Key, a worst choice

 GUID or int as Primary key ???


Though it is not necessary that your Primary Key column is always a cluster index too. By default, Sql Server creates cluster index on column or group of columns which you have declared as your table PK, and most of DBAs don’t like to go against this default behavior of Sql Server.

But problem arises when for uniqueness a column with uniqueidentifier data type is added, for surrogate key to make it Primary Key, finally for your table.

The GUID is a wide column (16 bytes to be specific) and contains a unique combination of 33 uppercase and numeric characters. This column because it is the primary key is going to be stored in, of course, the clustered index .

Also, if a GUID is used instead of an integer identity column then the 33 characters need to be matched for each row that is returned using that column in the where clause.


If a high volume of inserts are done on these tables then GUID’s being large will contribute to page splits, as will the fact that NEWID() generates a random value, which could place a new record on any of the data pages will cause performance problems.


Recommendations

  1. INT must be used as Primary Key instead of GUID because:
  2. INT takes only 4 bytes, saving your physical and memoray storage.
  3. INT as primary key (identity) creates incremental values resulting less then 1% of indexes fregmention during heavy insert.
  4. There are T-SQL operators available for INT like >,= and <