Database and Table Design
Instructor: Charles Ott
Text: Chapter 1, pp 30 to 53. Much of this material will be gone over in more detail in weeks to come.
- Planning and designing tables: see below.
- Primary keys
- Data Types
- Creating Simple Tables
- Table Design View
You Will Need to Design Databases
Here is a general rule to which there are very few exceptions: all real databases are custom-designed. You might think that you could make, for example, a generic database design for a video-rental store, and it would be usable by most other video-rental stores, since they all work pretty much the same way. If not a video-rental store, then perhaps you could design a reusable system for liquor stores, or paint manufacturers, or book stores, or some other kind of business that seems standardized.
As it works out, that almost never happens. Every business in the world works a little bit differently, and the database design for that business must reflect the way the business actually works.
How do you find out how a business works? Typically, you get a clipboard and start interviewing employees, you talk to the boss, you look at their current inventory and accounting systems, and you try out the old computer system if they have one. But all database designs are, essentially, hand craftsmanship.
This week, we will be exploring how to hand-craft tables.
Thinking About A Table
Suppose we want a table of our company's products. What information should be in it? As a general rule, each table should contain (1) all of the information about that subject, and (2) no information that is not about that subject. This rule is called normalization, and although we won't go deeply into it in this class, it can be defined rigorously.
- Product Name: such as "Model 26A" or "Lil' Beaut". This seems reasonable.
- Stock Number: (often called an "SKU" for Stock Keeping Unit) such as "K2156". This is the number that is used only within the company to refer to this product. Most businesses have some sort of internal numbering like this.
- Characteristics: such as size, weight, color, and a rating in watts, horsepower, or number of feathers (a special rating for widgets). Whatever it is that people want to know about widgets should be here. All of these items will be separate columns in our table.
- Supplier Name(?): such as "Thompson Widget Manufacturing". At first glance,
this seems reasonable, but hold on a minute. Do we really want to write this here? What happens if we write "Thompson Widget Manufacturing" in the records for fifty kinds of widgets, and then the company changes its name to
"Thompson Industries"? We would have to change all fifty records. Suppose that Thompson is bought out
by "Yokohama Light Widget Works, Pty"? We'd have to change all the records again.
Even more likely, the person who enters these records is apt to make spelling errors, such as "Thompsen" or "Thomson". This means that if we try to make a list of all the products supplied by Thompson, we will miss some.
On the other hand, we do want to know who the supplier is, and that information ought to be in the database.
The problem here is that the name of the supplier is not really part of the information about a particular type of widget. It's information about a company. We will solve this problem by using a foreign key, as explained later in this course.
- Supplier Address (?): This is even worse. The supplier could move every couple of months, and for that matter, might have more than one address. Again, this isn't information about a widget, it's information about a company, and it doesn't belong in the Product table. It does belong, as we will see, in the Supplier table.
- Price: such as "$4.95". We certainly want this, since our company sets the selling price.
- Cost (?): such as "$0.05". (We make a terrific profit on widgets!) This may or may not make sense in our database. It's true that this is information about the widget, but it's not true that it comes from our company. It comes from the supplier. If it doesn't change very often, it might be useful to keep it in our Product table. But if the price changes all the time, we would be better off finding some way to get it from our purchasing records so that we always have the latest price. This is a judgment call, one of many you'll have to make!
- On Hand (?): such as "500", the number of items of this kind on-hand in the
warehouse. This is iffy — whether this number should be in this table depends on whether
these widgets are the kind we would want to keep track of individually, such as automobiles,
or the kind we want to track only in bulk, such as boxes of crackers. For crackers, this
makes sense. We certainly don't want to track each individual box of crackers, so we can
just keep a total of the number of boxes. For automobiles, we're going to want an
individual table record for each unit, to keep track of such things as serial number
and color. In that case, we do not to keep an on-hand total here. Instead,
we'll get the on-hand total by adding up all of the individual records for cars.
This is another judgment call, one that you may have change with experience.
An inflexible rule, from which there are no deviations ever:
Every record in a table must be different.
This means that no two rows of a table can ever have exactly the same information in the same fields. If we have two records about widgets, they must be different models, or different colors, or have some other difference in at least one field.
Every table must have a primary key, which is a column or set of columns that is guaranteed to be different for every record.
Sometimes you will have a data item that will always be different. In the Product table, the "Stock Number" may be different for each kind of widget … or it might not. In some companies, every different item in inventory is required to have a different SKU. In others, the same SKU might be used for, say, similar items with different colors. This is a matter of company policy, which you (as the database designer) are not allowed to change.
If there is a unique data item, you can use that as the primary key. If a combination of fields is unique (for example, the Product Name and Stock Number together), you can create a composite key using those fields.
Otherwise, Access allows you to create a "dataless" key, called an AutoNumber field. This is a number is incremented each time you add a new record, which means that it is guaranteed to be different each time.
In this writer's humble opinion, you should always use a dataless key except with the very largest databases. The problem with using real data for a key is that people make mistakes. If you create a Product table like the one we have discussed, you will almost certainly find that someone, somewhere, gave the same SKU to two products ... and they will insist that it's too complicated to change the SKU, and your database must admit both products with the same number. If you use people's names as a primary key, you will find that lots of people have the same name. Even Social Security Numbers are not entirely reliable: every year a few duplicate numbers turn up. A dataless key takes up one extra column in the table, but is always reliable.
Note: Unlike most databases, Access actually will allow you to create a table without specifying a key, although it will complain. The reason this is allowed in Access is that Access is often called on to import tables created in Excel or Word which may have duplicate records. (In reality, Access can't have duplicate records any more than any other database. When it imports tables, it uses the row number as a primary key.)
It is useful to be able to read these imported tables long enough to find and eliminate the duplicate records. But in any other circumstances, never try to use a table without a primary key.
Here's a partial example of a Product table with a key called "ProductNo":
|ProductNo||Product Name||Stock Number|
ProductNo is a dataless key. That is, it is a number that is added by Access each time a new Product record is created, and incremented so that it is always unique. The important thing about ProductNo is that if you know it, you can easily find the corresponding Product record.
Field Data Types
Each field (or column) that you create must be assigned a data type. This specifies what kind of information will be stored there, such as words, numbers or dates. Your textbook has a complete list. Please note:
- You should almost always use an autonumber field for your primary key. Otherwise, this type is not of much use.
- A number field holds numbers that you might want to do arithmetic on. For example, you would use a number for "OnHand" (because you might want to add or subtract units) or "Weight" (because you might want to add up the weights of Products to get a shipping weight).
- Currency is the same as number except that a dollar sign (or other currency symbol) is added, and the numbers are displayed with two decimal places (that is, dollars and cents).
- A text field can contain any words, and can also hold numbers, but only those numbers that would never be used in arithmetic. For example, telephone numbers and ZIP codes should always be stored in a text field, because it makes no sense to add or subtract these numbers. A text field should always be limited to the largest number of letters you expect to need there.
- A memo field is the same as text, except that it holds multiple lines up to about 65,000 characters.
- A yes/no field is used for anything that has only two values, even though you might called them "checked/unchecked", "on/off" or "true/false". In general, fields like this are shown as check boxes. You can't enter numbers in them. (Note: in programming languages, data like this is called "Boolean".)
- Always use a date/time field for dates. You can write a date in a text field, such as "October 12, 2000", but the date/time field allows you to calculations such as figuring "30 days from now".
Week 2 Homework
- Create an Access database file called something like "YourName_CI214.mdb" and save it on own USB drive. We will use this one file throughout the course.
- Create the four tables that you did on 3x5 cards last week: Customer, Product, Supplier and Employee. Important: you must create these tables using correct data types and good design as we have discussed in this class. If you made mistakes in your first design (which is understandable), you'll need to fix them when you create your Access tables.
- Print out your Customer, Product, Employee and Supplier tables and turn in four pages. Write your name on the first page and staple them together.