Relational Database Best Practices

Atomize Data
Use the smallest data points that you have and run calcuations upon them. Don’t ever use calculated data in fields
Name Tables in Plurals
Be sure to name your tables with plural names as they will be holding collections of data

 Database Normalization

First Normal Form

To get a table into first normal form, you will need the following features:

  • Primary Key – can never be null
    • Natural Key – Field inside table that can be marked as unique
    • Surrogate Key – Dummy piece of data that allows for unique keys
      • this will become an identity, which is a MS term for an auto-incrementing field
  • No multiple pieces of data in any one table cell or column

Never leave a database in first normal form, the only purpose for 1st normal is to move to 2nd normal

Second Normal Form

2nd Normal takes any repeating column fields and abstracts them to their own table.  These tables are then joined by foreign keys.

The problem with second normal form is that there are often transitive dependencies in the data.  This occurs because all pieces of data depend on the primary key, but sometimes pieces of data depend on eachother.  If they do:

a = b, b = c, so a = c

This means that your dependencies depend on eachother AND the primary key, creating a headache when changing in the future.

Third Normal Form

3rd normal form requires a process known as Decomposition.   You will identify the transitive dependencies, and abstract them by moving the pieces of data to a new table.  You will then create an associative ID (primary key on associative table) and use that as the foreign key for your first table.

Foreign Keys

Foreign keys are used to join tables together to enforce data integrity.  We are telling the tables which pieces of data correspond to the primary keys of other tables.  Additionally, with foreign keys installed, it becomes more difficult to enter incorrect records into your database, as all data must pass integrity rules before being recorded.

Relationships

One To Many

With databases, you are able to define the nature of the relationship.  One to many defines a relationship between table A and B, which I will substitute with real data for example purposes.  Imagine a table of all basketball players and another table of the 5 positions on the court at any one time.

  • At any one time, a player can have only one position
  • However, during that game alone, a position would have more than one player (2, in this case)

Talk it Out!  A lot of programming should make intuitive sense, so talk out the relationship and see if it makes sense.  In the above example, in a Cavs vs. Knicks game, Lebron James only has one position (record) on the court at a time (unless you count point-forward as a hybrid, which would break my database 🙂 while the small forward position has 2 players (records) for Lebron James and Carmelo Anthony.  So a player only has one position, while a position has many players, creating a one-to-many relationship.

Said differently: if we had a table of all NBA players and we wanted to abstract their positions to a different table, each position would be applied to many different players.  This is a one-to-many relationship.

Many To Many Relationships

In real life, Lebron can play multiple positions at any given time.  He is routinely used as the PF is small ball and as the point guard during their Death Lineup, so a player can have many positions and a position can have many players.  This is a many-to-many relationship.

It sounds complicated, but many to many relationships are greatly simplified by the inclusion of a junction table.  This table holds keys for the relationships.  So we would have one record for the small forward position.  But we would have another record for the small forward who plays point forward.  And another one for the SF that moves up for small ball.

When you create the new table, use proper naming conventions.  For above example, the table would be PlayersPositions because we are defining many-to-many relationships between players and positions.  The table would have two columns, one for playerID and one for positionID.  We would then make primary keys out of both columns, to give us better structural integrity.  This also reduces the chance for duplicates.

Finally, create relationships so that the junction table is between the two tables you are joining.  In our case, the Players table and the Positions table.  The PlayersPositions table would have a one-to many relationship between the Players table and the Players position table and the Positions table would also have a one to many relationship with PlayerPositions.  Its the PlayersPositions table that is defining any many-to-many relationships.