Entity-relationship modelling


In this chapter I introduce the concepts of entity-relationship (ER) modelling. At the end of this chapter you should be able to explain and apply these concepts.

1 Databases, this class, and you

Data bases, and the data base management systems that lord over them, are the core information systems technology. They are used---and will be used---to store corporate data, web pages, on-line movies, work flow information, document databases---absolutely everything that is of interest to business. After taking this class you will clearly understand and be able to explain why this is a good thing. This understanding will allow you to see opportunities for exploiting this technology in innovative ways.

Both of the above will be useful for a student (and eventual businessperson) whether they be interested in accounting, marketing, human resources, or finance. Business process reengineering (BPR) (by whatever name) virtually demands that data bases take a more central role in a corporation’s life. By its very definition BPR demands that people from throughout an organisation apply information technology solutions to broad problems. Data bases are one of the more frequently applied solutions. Thus, I propose that a great percentage of students interested in business should be knowledgeable about data bases.

2 A very simple modelling exercise

The whole purpose of ER modelling is to create an accurate reflection of the real world in a database. The ER model doesn’t actually give us a database description. It gives us an intermediate step from which it is easy to define a database. Let’s look at an example. (You will see much more detail on these concepts in the rest of this chapter. For now just try to understand the overall process.)

Suppose you are presented with the following situation and are told to create a database for it:

Every department within our company is in only one division. Each division has more than one department in it. We don’t have an upper limit on the number of departments that a division can have. For example, the New Business Development---the one managed by Mackenzie---and Higher Education departments are both in the Marketing division.

This is a fairly clear description of a situation. Many things are left unsaid that we understand about the situation. For example: each division has a name, and that name is unique within the company. For now, though, let’s focus on the description as it is given.

The first step is to figure out the items of interest in this situation. (In this document you will come across Problems. You should attempt to perform these before continuing the reading. Simply reading the problem and then reading the answer is not sufficient---you should attempt the problem yourself before you continue reading. Understanding these problems are integral to understanding the text. The answer to the problem appears in the text immediately after the problem.)


Problem 1: What are the items of interest here?

It seems here that the situation is concerned with divisions, departments, and employees or managers. It gives some details about which contains which, how they are related to each other, and provides some examples of each, but basically the situation is concerned with these three entity types. Here’s a formal, if somewhat ambiguous, definition.

An entity type is a collection of entities that share a common definition. An entity is a person, place, concept, or thing about which the business needs data.

So, Department is the name of one entity type. One instance of this entity type is the New Business Development department. The Marketing division is an instance of the Division entity type. Mackenzie is one instance of the Employee entity type. Instances of entity types are referred to as entities. Put more simply: You can touch an entity but an entity type is simply an idea. Person is an idea (entity type) while Scott, Nancy, Lindsey, and Mackenzie are touchable (entities). Entity types provide us with a means for making generalisations about entities. For example, instead of saying “Every department within our company is in only one division,” we could have gone down the list of all departments (that is, all entities with entity type Department) and asserted that each one is, indeed, in one division:

“The New Business Development department is in one division. The Higher Education department is in one division. ...” And so on until we’ve noted that each is on only one division.

But we know more than the facts about each individual department being in one division. We know that all new departments will also be in just one division. And if there is a new division, it, too, will have departments that are unique to the division. So, instead of providing information in the form of statements about specific entities, we use a more powerful and concise format and provide information in the form of statements about relationships among entity types.

Thus, in ER modelling we look for relationships among entity types because it is easier and more concise to speak of relationships among general entity types rather than the touchable entities themselves.


Problem 2: Each of the following is either an entity type and/or an entity. If it’s an entity type, then provide examples of entities; if it’s an entity, define a possible entity type.

  1. A municipal bond from Detroit
  2. Ford
  3. Clothes
  4. Employee

The municipal bond is an entity; bond is a possible entity type. Ford is an entity; manufacturer is a possible entity type. Clothes could be either: 1) a type if the entities are pants, shirts, etc.; 2) an entity if the type is product. Employee is an entity type; Angela and Natalie are example entities.

Back to our example: we have identified three entity types and four entities. From the description we can assume that there are more entities for each entity type. Go back and read the situation description if you do not think this is immediately obvious.

From the description there is some sort of relationship between Department and Division and another sort of relationship between Department and Employee. The first relationship is one of containment: each division has one or more departments, but any one department can only be in one division. (Think about an NCAA conference such as the Big 10 (the bucket) having many teams (a bunch of balls). On the other hand, each team (one ball) can only be in one bucket (a conference). In this instance the bucket is the division and the balls are the departments.) The second relationship tells us that an employee has a certain relationship relative to a certain Department, namely, that the employee manages the department. Determining the relationships among entity types is another important step in the process of ER modelling.

A relationship is an association between entity types.


Problem 3: What would you name these two relationships?

The defining characteristic of a relationship is that several entity types are involved. So something like a name or birth date would not be a relationship since only one entity is involved.

Now we have identified three entity types (Employee, Department, Division) and two relationships among these entity types (manages, contains). Now we can begin to represent the problem in the language of ER modelling.

ER models are usually represented graphically. The language we are going to use represents entity types as rectangles and relationships as diamonds. Below is the representation of the situation we are working with.

Notice that the contains relationship is drawn between the two entities that it is associated with. Similarly for the manages relationship. This (simplified) ER model tells us that:

Certainly we know more about the problem than this. Consider the relationship between divisions and departments. We know that divisions have multiple departments and departments can only be contained within one division. Or, for every one division there can be many departments. In the language of ER modelling this is called a 1:M (read: “one to many”) relationship.


Problem 4: What is the relationship between departments and managers? Fill in the blanks with either a one or a many:

  1. For each department there can be, at most, _____ managing employee(s).
  2. For each managing employee there can be, at most, _____ department(s).

The relationship between department and a managing employee is different. It doesn’t say so but we can assume that a department has only one manager. (Certainly you can imagine an instance in which a department has co-managers. That possibility is just as viable as the possibility I have assumed. This is part of the attraction of this type of work. The database professional has to read descriptions for what is said and then imagine what isn’t said. If you were actually creating a database in this example, you would have to ask someone what the situation actually is. But since you are just given this description, you have to come up with some assumption. For this situation we’ll make the above assumption.) We’ll also assume that an employee can also be the manager of, at most, one department. In other words, for every one department there can be, at most, one managing employee. In the language of ER modelling this is called a 1:1 (read: “one to one”) relationship. This information can also be represented in the ER diagram:

As you might have determined, the M part of a relationship is represented by putting an M next to the appropriate entity type in the relationship while the 1 part is represented by a 1. The ER diagram now represents much more information than it did above:

If you are a bit confused about all this 1:M and 1:1 stuff, never fear. You’ll see a lot more clarifying detail later.

Several other questions remain about this situation that are not addressed in the description:

These questions would have to be answered before we complete the ER model. And we will answer these questions later. For now we are going to stop this part of the analysis since the purpose of this example is to demonstrate what ER modelling is all about.

The ER modelling process is not something for which a set of steps can be given and then performed. The process contains almost as much art as science. Some steps are performed many times and many decisions are re-visited and revised. Given these conditions, a broad outline can be given:

  1. Determine what entity types are involved.
  2. Determine which entity types are related.
  3. Refine the definition of the relationships.

Understand now that there are several methods for representing ER models graphically. Some don’t use the diamonds for the relationship---they might just put the word above the line. It’s not really important how the entity types and relationships are represented; it’s just important that they are represented.

Notice what has happened with this situation. Initially we had a text description of the problem. After analysing it and making some necessary assumptions, we created an ER diagram that reflects the situation accurately and makes explicit the relationship among the entity types. This is why we perform ER modelling. We don’t know any more than we used to about the problem---we just have made explicit what we do know. It is quite a straight-forward step to go from this ER model to an implemented database. Remember why we are doing all this: We are finding out all we need to know to create a database that will hold our data. And a well-defined database can be a very useful tool for solving business problems---and it is also in high demand by recruiters. You will learn how to perform the steps necessary to create such a database in later chapters.

3 More detail

In this previous section I used an example to present an overview of how and why ER modelling is performed. In this section I present more detail on some of the basic concepts.

3.1 Relationships

Relationships define which entity types are directly associated with which other entity types. In the example in an earlier section, we saw that divisions are directly associated with departments and departments are directly associated with employees. No direct association between division and employee was given. This does not mean that there is no relationship between division and employee. In fact, the ER diagram tells us that there is a relationship between the two:

Given any one division, there can be many employees managing departments within that division.

Certainly, this is not earth shattering news. But it is in the ER diagram. The above fact is not represented as a separate relationship between division and employee because it can be inferred from existing relationships. An ER diagram should contain the minimum number of relationships necessary to reflect the situation.

3.1.1 Cardinality

Once a relationship between entity types has been established, the analyst should determine its cardinality.

A relationship’s cardinality defines the maximum number of entities of one type that can be associated with an entity of another type.

For relationships between two entity types, there are three basic cardinalities. Each of the following descriptions are given in terms of a relationship between entity type X and entity type Y.

1:1
--- one-to- one --- One entity of type X can be associated with, at most, one entity of type Y. One entity of type Y can be associated with, at most, one entity of type X.

An example: the relationship between car and steering wheel. A car has only one steering wheel and a steering wheel can only be installed in one car.

1:M
--- one-to-many --- One entity of type X can be associated with many entities of type Y. One entity of type Y can be associated with, at most, one entity of type X.

An example: the relationship between building and rooms. A building can have many rooms but a room can be in, at most, one building.

M:M
--- many-to-many --- One entity of type X can be associated with many entities of type Y. One entity of type Y can be associated with many entities of type X. An example: the relationship between a car and its options (such as air conditioning, ABS brakes). A car can have many options and an option can be installed on many cars.


Problem 5: Determine the cardinality of the relationships between the following four pairs of entity types. For each relationship you have to answer two questions:

  1. For each entity of type Y there can be, at most, _____ entity or entities of type X.
  2. For each entity of type X there can be, at most, _____ entity or entities of type Y.

Answering these two questions gives you the answer to the following questions. For example, if you answered M to the first question and 1 for the second question, then this relationship between entity types X and Y is of cardinality M:1. This is read as “for every X there can be only one Y and for every Y there can be many X.” Realize that you will have to make assumptions about the situations below to clarify some of these relationships.

  1. Patient under care of primary care physician
  2. Physician performs operation
  3. Doctors have speciality in disease
  4. Needle injected into patient

It would seem that at any particular time a patient can only have one primary care physician and that any physician can have many patients (M:1). One physician can perform many operations and one operation can be performed by many physicians (M:M). One doctor can have specialities in many diseases and one disease can be the speciality of many doctors (M:M). One needle can be injected into one patient and one patient can have many needles injected into him/her (M:1).

3.1.2 Existence

In the previous section we were concerned with the maximum number of entities of one type that can be associated with an entity of another type. In this section we examine the minimum number of entities in a relationship.

A relationship’s existence defines what we know about the existence of any entity on the other side of a relationship from a given entity. Existence is given as optional, mandatory, or unknown.

This is best clarified with an example. Consider again the example discussed in Section 2. Specifically, focus on the manage relationship between department and employee. We know the cardinality is 1:1. This tells us that at most one department is managed by an employee and an employee can manage, at most, one department. (Be sure you understand the distinction between these two phrases.) The existence of this relationship tells us the fewest number of departments that can be managed by an employee and the fewest number of employees that can manage a department. Only one of the following can be true:

optional
A department need not have any manager.
mandatory
A department must have at least one manager.
unknown
It is unknown whether or not a department has to have a manager.

Similarly, only one of the following may be true:

optional
An employee need not manage any department.
mandatory
An employee must manage at least one department.
unknown
It is unknown whether or not an employee must manage a department.


Problem 6: For each set of three above, which ones would you choose?

It is not entirely clear from the situation description which of the above are true. I make the relatively standard assumptions that a department must have at least one manager and that an employee need not be the manager of any department. Thus, the existence of this relationship is mandatory in one direction and optional in the other.

Going back to the definition of existence, we can also look at this situation in this way:

I assume that the contains relationship is mandatory in both directions. Given this information, the ER diagram is modified in the following manner:

This diagram is beginning to look a little complicated but remember the following pieces of information and it gets a little easier:

Let’s practice this. Look at the manages relationship again.


Problem 7: For each of the relationships listed in Problem 5:


3.2 Entity types

Entity types are things for which it is important that your company capture data. If it is not important, it should not be in the database. In an accounting database you would expect to find entity types for expenses, assets, liabilities, expenditures, deposits, etc. You would not expect to find entity types for colour of check, quality of dollar bills received, etc. The database is supposed to reflect reality---but only the part of reality that is important to the company.

3.2.1 Entity subtypes

Entity types are entities that share a common definition. This allows us to make generalisations about that type. This is a powerful capability; however, sometimes we want to make a generalisation only about a certain subset of those entities and another generalisation about the rest of the entities. Consider a simple example. Suppose you have an accounting database which keeps track of accounts receivable and accounts payable. Of course the database keeps track of the companies to which you owe money and the companies that owe you money. For all these companies, you keep track of their mailing address and a contact person. For the companies that owe you money you keep track of how much they owe you. For the companies that you owe money you keep track of how much you owe them. What to do? Should we have three entity types: one for the whole set and one for each subset? That would be a mess. That is why the concept of entity subtypes was created.

An entity subtype is a collection of entities of the same type to which a narrower definition and additional attributes and/or relationships apply.


Problem 8: In this company example, what is the entity type? What are the subtypes?

In this database you should define a company entity type with two subtypes: AR_co and AP_co. The company entity type stores all facts that are common attributes---in this case, the address and contact person. The AR_co entity subtype tracks the balance owed from this company while the AP_co entity subtype tracks the balance owed to this company.

There are many situations in which subtypes can be created but should not be. Only create subtypes

If one of these two requirements is not met, then do not create the subtype.


Problem 9: What is the relationship among the following? List a few facts common to all items for each question. List a few facts about each subtype that is not common to the other subtype. Also draw the ER diagram for each.

  1. book, publication, magazine
  2. individual, employee, contact person
  3. laptop, computer, desktop

On the other hand, there are some situations that are not so clear cut. Consider the following figure.

Many students would first suggest the diagram on the right---divide customers into investors and attendees and show that investors buy stocks and attendees register for seminars. I suggest that the figure on the left is better. Here’s my thought process:

What is it that makes an investor an investor? She buys stocks. And what is it that makes an attendee an attendee? She registers for seminars. Is there anything about an investor that keeps her from being an attendee? No. Vice versa? No. Do you want to prevent investors from being classified as attendees or vice versa? No and no. So, define relationships buy and register for the customer entity. Investors can be listed by choosing only those customers that are in the buy relationship. Attendees can be listed by choosing only those customers that are in the register relationship.

Thus, if a relationship defines the members of a proposed subtype, then use the relationship instead of the subtype.

3.2.2 Type/instance distinction

For some people this can one of the more difficult concepts to understand, so read carefully. What we are trying to discern here is the difference between a type of a thing and an actual thing. This is a pretty easy concept when comparing people and Joe. People is the type and Joe is the instance. However, modellers generally don’t make the type/instance distinction between an entity type and an entity---they generally make it between two entities. For example, think about “CT481” and “section 2 of CT481 in Winter 1962.” The second is an instance of the first. The section is an actual class that meets at an actual time with an actual teacher and actual students. CT481 is a type of thing that is an idea that only becomes real when you come into contact with one of its instances (e.g., section 2 of CT481 in Winter 1962).

Realize that this is a different distinction than that between entity types and entities. In this example, CT481 is one specific instance of the entity type Course and section 2 of CT481 in Winter 1962 is one specific instance of the entity type Section. Thus, both are entities and neither one is an entity type.


Problem 10: Analogously to the CT481 story above, fill out the story for a Boeing 747 relative to the type/instance distinction and the distinction between entity types and entities.

Boeing 747 is a type of plane and a specific Boeing 747 that flies through the air with passengers in it is an instance of this type. Boeing 747 is a specific instance of the entity type plane type and a flying Boeing 747 with passengers is a specific instance of an entity type plane.

3.3 Attributes

3.3.1 Basics

To this point we have focused on entity types and relationships among them. We have mentioned, in passing, “facts” about entity types and “attributes” of entity types. In this section I hope to make these ideas a little more clear.

Attributes are the characteristics of an entity type that we are interested in.

An attribute is a descriptor whose values are associated with individual entities of a specific type.

The attribute value for any single entity can have only one value at a given time. This value can change over time. An attribute of an employee might be salary. At any one time if you asked for the salary level of a certain employee, then you should get one answer. And if someone else asked the same question about that employee at the exact same time, they would expect to get the same answer. Of course, if you asked this question at a later time you might expect to get a different answer.

Think back to the example in Section 2. Few attributes are mentioned in the description but a few can be inferred. The department entity type has a name attribute, as do the division and employee entity types. Possible attributes for the employee entity type that aren’t mentioned include date of hire, home mailing address, work phone, and work address.

3.3.2 Identifier

Every entity type has an identifier. This identifier uniquely identifies a single (at least one, and no more than one) entity. If you know the value of the identifier, then you know exactly which entity you are dealing with. Further, the identifier’s value will never change over time. Thus, if you know the identifier now, then you can be confident that at any time in the future the identifier for that entity will not have changed.


Problem 11: Social security number is a possible identifier for a person. What is a possible problem with using it as an identifier? Why shouldn’t we use people’s names as an identifier?


Problem 12: For each of the following entity types, come up with a possible identifier:

  1. Book
  2. Car
  3. UM student
  4. UM building
  5. Colour


Problem 13: Suppose that we have a database that has the table car, and this table has an identifier vin (vehicle id number) and another field called model (e.g., “Ford Mustang GT”, “Ford Mustang SHO”, or “Accord Integra GS-R”). Suppose that we want to add information about the type of the model (e.g., “sporty”, “sedan”, “SUV”) to this database. How would you do it? You have two choices---either add model field to the car table or create another table to hold this information.

Consider the two choices:

Adding a field to an existing table
The new relation will have the attributes [vin, model, type]. Before we added the type attribute to this relation, there seemed to be only one entity type (vehicle). Now that we’ve added the type attribute it seems that we have another entity type that we’re collecting information about (model). Remember that the identifier (in this case, vin) uniquely identifies one and only one entity (again, in this case, car). But type is a piece of data about the model, not about the car itself. Every time we added an entity (a vehicle) to this relation, we would have to repeat information about the model that we might have entered before. For example, if there were 5000 Ford Mustang GTs in the relation, then 5000 times we would have to indicate that a Ford Mustang GT is sporty. All in all, this doesn’t seem like a good solution.
Creating a new table
Creating a new table is required since we now have two entity types---vehicles and models. In this first entity type we would have the attributes vin and model while in the second we would have model and type. In the vehicle entity, the model field tells us what model that particular vehicle is. In the model entity, the model field identifies which model we’re talking about while the type field tells us what type of vehicle that instances of this model are.

4 Advanced concepts

The concepts in the previous two sections of this chapter will allow you to model many business situations. The following concepts are needed to round out your repertoire so that you will be ready for almost any situation that comes your way.

4.1 Degree of a relationship

Relationships can be classified by the number of entity types involved. This is referred to as the degree of a relationship. To this point we have concerned ourselves with relationships between two entity types. This is, by far, the most common type of relationships. The most common degrees of relationships are as follows:

binary
This is a relationship between two entity types.
ternary
This is a relationship between three entity types.
recursive
This is a relationship involving only one entity type.

I will not spend any time on binary relationships now because we have discussed them at length already.

4.1.1 Ternary

In the real world there are relationships other than those involving two things. For example, suppose that we want to capture which employees use which skills on which project. We might try to represent this data in a database as three binary relationships between skills and project, project and employee, and employee and skill.

The applies relationship indicates which employee applies which skill. The used on relationship indicates which skill is used on which project. The works on relationship indicates which employee works on which project. But this is not enough to specify which employee uses which skill on which project. Suppose you know the following:

works-on
Lindsey and Mackenzie have worked on projects A and B.
applies
Lindsey has used skills interface design and database design while Mackenzie only used her database design skill.
used on
Both skills have been used on both projects.

Given this information, it is impossible to figure out on which projects Lindsey used which skills. She could have used interface design on project B and database design on project A---or the other way around. Or she might have used both skills on both projects. The database simply does not give us enough information.

In order to capture the necessary information the database needs a ternary relationship. In this case the database needs a relationship, called used- on, among employee, skill, and project.

The used on relationship captures information three pieces at a time. It stores facts such as:

  1. Lindsey used interface design skill on project A.
  2. Mackenzie used database design skill on project A.
  3. Lindsey used interface design skill on project B.
  4. Lindsey used database design skill on project B.
  5. Mackenzie used database design skill on project B.

Notice that this ternary relationship captures the information represented in the three binary relationships:

Implementing ternary relationships does not mean that you have to get rid of the binary relationships. You only get rid of the binary relationships if they capture a subset of the information captured by the ternary relationship. If a binary relationship captures information that differs from the ternary relationship, then the binary relationship should be retained if the information is important to your company. For example, consider the following:

The used on relationship stays the same as in the previous ER diagram. The binary relationships are different.

have skill
An employee has a certain skill. This is different than used on because there are some skills that an employee has that an he or she may not have used on a particular project.
needed
A project needs a particular skill. This is different than used on because there may be some skills for which employees have not been assigned to the project yet.
manages
An employee manages a project. This is a completely different dimension than skill so it could not be captured by used on.


Problem 14: We have three entities: car, customer, and salesperson. Consider the following situations.

#1
There is a ternary relationship called sell relating all three entities. Sells: “Customer cust bought car car from salesperson SP.

Sells
Car Cust SP
Cobra Sam Don
Mustang Sam Don
Mustang Sam Sharon
Cobra Jenn Sharon
Mustang Jenn Sharon

Question: Using this data, who sold Sam the Cobra?

#2
We have three binary relations as shown below:
Buys
Cust Car
SamMustang
SamCobra
JennMustang
JennCobra
Buys from
Cust SP
Sam Don
Sam Sharon
Jenn Sharon
Sells
SP Car
Don Cobra
Don Mustang
Sharon Cobra
Sharon Mustang
buys
Customer cust buys car car.
buys from
Customer cust buys from salesperson SP.
sells
Salesperson SP has sold car car.

Question: Using this data, who sold Sam the Cobra?


In situation #1 you can see from the first line that Don sold Sam the Cobra. In situation #2, looking at buys you can see that Sam did actually buy a Cobra (second line). Looking at buys from you can see that both Don and Sharon sold Sam cars. Looking at sells you can see that both Don and Sharon have sold Cobras. So the answer is either Don or Sharon sold Sam the Cobra. This is not good enough. This demonstrates that having three binary relationships does not capture the same information that one ternary relationship does.

It might be asserted (and has been by a former student) that the ambiguity in the problem is a result of the data base keeping information about car types (Cobra, Mustang, etc.)\ instead of actual cars (Cobra VIN=32, Cobra VIN=33, etc.). This is the case and I’d like to demonstrate why here.

Suppose that the four cars in this data base are numbered 1, 2, 3, 4. We’re going to try to answer the same question, “Who sold Sam the Cobra?”, using just the binary relationships but with information about numbered cars rather than the car types that is used above. The Buys relationship shows that Sam bought cars 1 and 2. The Sells relationship will show which sales person sold car #2 (the Cobra). It does not say to whom, but we already know that Sam bought car #2. So without even consulting the Buys from relationship, we know who sold Sam the Cobra.

This is a good observation but does not change the essential point. Breaking down a ternary relationship into its component binary relationships will sometimes result in a loss of information. It will always result in a loss of data if at least one of the entity types is a type of thing (e.g., a car or skill) as opposed to a specific thing (e.g., an actual car).

The question also remains: Why break up a ternary relationship into its component binary relationships if the ternary relationship captures what’s really going on in the world. A customer does buy a car from a salesperson. That’s really how we think about it and how it really occurs. It’s not: a customer buys a car, a car is sold by a sales person, and a customer buys from a sales person. The real world event involves three entities. Why not construct the data base to reflect this reality?

Thus far in this section the ER diagrams have not represented the cardinality of ternary relationships. There is a different method for determining cardinalities of higher order relationships:


Problem 15: For the used on relationship, what is the cardinality for each entity type?

To determine the cardinality for this relationship, I had to make several assumptions. Other assumptions are possible but I thought these seemed reasonable.

Given these assumptions, the cardinality for each entity type is M. So this is a M:M:M relationship.

4.1.2 Recursive

The final, and possibly the most difficult, relationship is the recursive relationship. This is a relationship that an entity has with itself. But it really doesn’t have to be difficult if you think about it as you would any ordinary binary relationship. Let’s look at an example.

Think of an employee who is the manager of other employees.

A manager manages many employees and an employee has exactly one direct manager. This is pretty straightforward. But, now, realize that a manager is really just another name for an employee. So, replace managers with employees in this diagram.

Now this diagram has the entity type employees represented twice. To remedy this situation, “pull” the relationship diamond down and slide the two employee rectangles so that they are lying on top of each other. Now the diagram looks like the following:

This diagram represents what we want:

Not everyone in the company has a manager. The president will not have a direct manager. This is handled in the data in the table by indicating that the president’s manager is the president. A little trick.

4.2 Attributes of a relationship

When we examined attributes earlier, the attributes were exclusively attached to entity types. However, it is also possible for a relationship to have attributes. Consider the is member relationship below.

A person can be a member of many clubs and a club can have many members. A natural piece of information to want to store is the date the person joined the club. If the attribute is of the person entity, then this would indicate when the person joined a club but we would not know which club. If the attribute is of the club entity, then this would indicate (possibly) when the club was founded or (possibly) when the most recent member joined the club but we would not know the dates on which each person joined. The solution is to make join date an attribute of the is member relationship.

4.3 Entity subtype partitioning

This section describes two different ways in which subtypes of an entity can be related to one another and to the super-type.

4.3.1 Optional versus mandatory

Assume there is an entity type called person, and entity subtypes called customer and employee. When a person is created, the designer of the database has two options:

mandatory
He/she can demand that the person be classified as one of the subtypes.
optional
He/she can allow a person to be created without classifying the person as any subtype.

Neither one is preferable to the other. The proper one to choose depends on the business situation.

Mandatory sub-typing is represented by creating a double line from the super-type (person in the following ER diagram) to the circle. Optional sub-typing is represented by leaving a single line from the super-type to the circle.

So, what does this figure tell you? Since it is a mandatory subtype partitioning (you know this from the double line), whenever data for a new person is entered into the database, it must be classified as either a customer or an employee. The database user cannot simply add information about a generic person---she must know whether this person is a customer or an employee. If this had been an optional subtype partitioning, then when that user was entering data about an employee, she had the option of classifying the person as an employee or as a customer---but did not have to classify the person as either.

4.3.2 Disjoint versus overlapping

Consider now the company super-type and the subtypes AR_co and AP_co. As a designer you can specify whether or not an entity of subtype AR_co can also be an entity of type AP_co. Certainly it is not abnormal to think that you can do business with companies that do business with you. Think of being a consultant for Ameritech or IBM.

The following are the two possibilities:

disjoint
If entities are allowed to be no more than one subtype, then the subtypes are said to be disjoint.
overlapping
If entities can be classified as several subtypes, then the subtypes are said to be overlapping.

Disjoint subtypes are represented by putting a “d” in the circle. Overlapping subtypes are represented by putting a “o” in the circle.

The above figure tells us that this is a disjoint entity sub-typing. This means that whenever data for a new company is entered into the database, the company can be classified as either AR_co or AP_co but not both. If this had been an overlapping entity sub-typing, then when that user was entering data about a company, she would have had the option of classifying the company as both AR_co and AP_co.

 


Problem 16: Suppose there is a student entity type with subtypes CIS majors, finance majors, and accounting majors. Is this a mandatory or optional sub-typing? Are these subtypes disjoint or overlapping?

Suppose I didn’t tell you that this should be an entity subtype problem. Would you represent it this way? What else would you do?


I would think that these would be optional, overlapping entity subtypes. But if I were not going to represent it this way, I may consider having a M:M relationship between student and major.

Certainly, entity subtypes should be classified along both dimensions---that is, you should identify whether the subtype is mandatory or optional and whether it is disjoint or overlapping. All four combinations are possible and each is appropriate at different times.

4.4 Aggregation of entity types

Subtypes are generally thought of in terms of X is a Y (which is why these are commonly referred to as is-a relationships). Another type of relationship that needs to be represented in a database is the part of relationship, more formally called aggregation. When an entity is made up of several different types of other entities, an aggregation relationship may be called for.

Consider the relationship between a car and its engine and body. The engine and body are both part of the car. The relationship is represented as follows in an ER diagram.

4.5 Parallel relationships

Two entities can have more than one type of relationship. This is not surprising; further, it is not difficult to represent in a database or in an ER diagram. Consider the entity types person and insurance policy and the relationships between them of pays for and is insured under.

Look at these relationships one at a time.

These are two distinct relationships. They mean two different things---that is why they are represented as two separate relationships in the ER diagram.

4.6 Weak entities

Weak entities are entities, but with a difference---weak entities only exist because some other entity exists. For example, if you were to define two entities employee and salary-history, then the second would be a weak entity because the record of an employee’s salary history could only exist if a record of an employee also exists. Joe Smith’s salary history wouldn’t make much sense if Joe Smith doesn’t exist in the data base. A weak entity is represented by a double border as shown below.

4.7 Types of attributes

Sometimes it is instructive to classify an attribute by the means in which the value is determined. Here are the three possibilities.

basic
These are values provided to the business. These are the types of attributes that we have been discussing so far. Think of name, address, etc. These values cannot be deduced from the values of other attributes.
designed
This is invented and exists only in the database. An example might be a unique identifier for a department. This value is not changed once it is set.
derived
This is a value that can be calculated from the value of other attributes in the database. An example might be the age of an employee when the birth date is in the database. These attributes should, generally, not be stored in the database but should be calculated when needed.

4.8 Attribute optionality

Not all entities have a value for every attribute; however, some attributes must have a value for all entities.

optional
An entity need not have a value associated with an optional attribute.
mandatory
An entity must have a value associated with a mandatory attribute.

For example, assume the employee entity type has attributes hire date and termination date. Hire date would certainly be classified as a mandatory attribute; if the employee didn’t have a hire date, then the person couldn’t very well be an employee.

Termination date is an optional attribute. You would expect that many people in the database would not have a termination date while others, who are obviously ex-employees, do have a value associated with the termination date attribute.

The optionality of an attribute depends highly on the business situation, how the information is gathered, and how the business updates its database. One company might classify an attribute of an entity type as optional while another company might classify the same attribute of the same type as mandatory. Consider the following example:

Consider the attribute sale price of the catalog item entity type for a computer mail order company. Company A has a policy that they do not put an item into the catalog until it has a price; thus, they do not create a catalog item entity until they can assign a value to the attribute sale price. For this company the sale price attribute is mandatory.

On the other hand, Company B has a policy that they put an item into their catalog as soon as they decide to stock it. This way they can make their product line look as broad as possible. They put Call us for latest quote in the catalog instead of a price. Thus, they do create catalog items even before they have assigned a value to the attribute sale price. For this company the sale price attribute is optional.

Again, in order to classify an attribute as optional or mandatory, you must understand the business situation and practices.


Problem 17: For each of the following attributes, determine whether it is optional or mandatory:

  1. Delivery date of a customer order
  2. Order dateof a customer order
  3. Title of a book
  4. Serial number of a television

4.9 Other information about attributes

The database designer should also determine miscellaneous other information about each attribute:

default
This is the value that an attribute should take if it is not assigned a value. For example, the state field of an employee table might have the default value of MI.
permitted range
These are the values that an attribute is allowed to take. This ensures that a value that gets put into the database is valid. For example, the sale_price field of the inventory table might have a permitted range of sale_price > 0.
composite
A composite attribute is an attribute made up of many other attributes. The reason for creating a composite attribute is that the attribute itself is referred to as a whole. Think of an employee’s address that is made up of the house number, street, city, state, and zip. One way of storing this attribute is to create an attribute address that would contain the whole address. For example, a value of address might be

address = 202 Crest Avenue, Ann Arbor, MI 48103

An alternative to this would be to store these pieces of data in separate fields in the table. For example, the above information might be stored as

street = 202 Crest Avenue
city = Ann Arbor
state = MI
zip = 48103

Separating the attributes in this way allows database users to refer to each field independently. For example, under the second scheme a user could easily and quickly determine the employees who live in Michigan. Under the first scheme this would not be nearly as fast.

5 Interpreting ER diagrams

ER diagram for interpretation exercise

The point of this section is to give you some examples of how ER diagrams are interpreted. I try to give you some of the variations but I certainly do not give you all of them. If your “reading” of a relation is not below, then it is not necessarily wrong. Try to determine if they mean the same thing. If they don’t and you cannot figure out the problem, then come by and talk with me during office hours.

needed
manages
has-skill
used-on
The technique for an n-ary (in this case 3-ary, or ternary) relation is different than for binary relations, but still straight-forward. Hold your hand on n-1 entity types (in this case 2) and determine whether a 1 or an m goes on the remaining arm of the relation. Below, in order, are the project, employee, and skill arms.

Notice that the other two entity types are held constant; that is, for the project arm (the first one) you are determining how many projects can be associated with any single pairing of employees and skills. You can think of it the following way: “I have an employee named Fred. He is skilled in woodworking. How many projects can Fred be a woodworker on?” If it’s “many”, then put an m on the project arm; if it’s “one”, then put a 1 on it.

6 ER exercises

Question 1

What is the cardinality and existence of each of the following relationships in just the direction given? State any assumptions you have to make.

  1. Husband to wife
  2. Student to degree
  3. Child to parent
  4. Player to team
  5. Student to course

Question 2

For each of the following pairs of rules, identify two entity types and one relationship. State the cardinality and existence of the relationship in each case. If you don’t think enough information is available to define either of these, then state an assumption that makes it clear. Draw the ER diagram.

  1. A department employs many persons. A person is employed by, at most, one department.
  2. A manager manages, at most, one department. A department is managed by, at most, one manager.
  3. An author may write many books. A book may be written by many authors.
  4. A team consists of many players. A player plays for only one team.
  5. A lecturer teaches, at most, one course. A course is taught by exactly one lecturer.
  6. A flight-leg connects two airports. An airport is used by many flight-legs.
  7. A purchase order may be for many products. A product may appear on many purchase orders.
  8. A customer may submit many orders. An order is for exactly one customer.

Question 3

Draw an ER diagram for the following. Be sure to indicate the existence and cardinality for each relationship.

  1. A college runs many classes. Each class may be taught by several teachers, and a teacher may teach several classes. A particular class always uses the same room. Because classes may meet at different times or on different evenings, it is possible for different classes to use the same room.
  2. Each employee in an engineering company has at most one recognised skill, but a given skill may be possessed by several employees. An employee is able to operate a given machine-type (e.g., lathe, grinder) if he has one of several skills, but each skill is associated with the operation of only one machine type. Possession of a given skill (e.g., mechanic, electrician) allows an employee to maintain several machine-types, although maintenance of any given machine-type requires a specific skill (e.g., a lathe must be maintained by a mechanic).

Question 4

Draw an ER diagram for each of the following situations. On the diagram be sure to identify the cardinality, existence, and optionality (for subtypes) of each relationship.

  1. A company has a number of employees. Each employee may be assigned to one or more projects, or may not be assigned to a project. A project must have at least one employee assigned, and may have several employees assigned.
  2. A university has a large number of courses in its catalog. Each course may have one or more other courses as prerequisites, or may have no prerequisites.
  3. A college course may have one or more scheduled sections, or may not have a scheduled section.
  4. A hospital patient has a patient history. Each patient has one or more history records (we assume that the initial patient visit is always recorded as an instance of the history). Each patient history record belongs to exactly one patient.
  5. A video store may stock more than one copy of a given movie. It is also true that the store may not have a single copy of a particular movie.

Question 5

Draw an ER diagram to represent the following entity types and the natural relationships among them: Vehicle, Land-vehicle, Air-vehicle, Water-vehicle, Ocean-vessel, River-raft, Helicopter, Rail-vehicle, Road-vehicle, Car, Airplane, Bicycle.

Question 6

Draw an ER diagram that best represents the following situation. There are three types of accounts in a bank, with these attributes:

  1. Checking: Acct-no, Date-opened, Balance, Service-charge

  2. Savings: Acct-no, Date-opened, Balance, Interest-rate

  3. Loan: Acct-no, Date-opened, Balance, Acct-limit

Question 7

Consider the following diagram:

  1. What are the minimum and maximum number of Cs that can be associated with any one A? This is the same as asking “What are the existence and cardinality for the C to A relationship?”

  2. Substitute A:person and B:manage and C:project in the above ER diagram. Suppose that someone told you that the following three facts are true. Are these consistent with the diagram? Answer yes or no for each of the three facts. If your answer is no, then explain why it is inconsistent.
    1. Mackenzie manages the White project.
    2. Lindsey manages the Brown and Black projects.
    3. Joe doesn’t manage any projects.

Question 8

The following facts make up all of the leads relation between person and project:

You do not know whether or not there are any other people or any other projects. Which diagram(s) that is (are) consistent with this set of three facts. For example, you might answer “1, 4” if both 1 and 4 are consistent with the above facts.

Question 9

For each of the following sets of sentences, draw the corresponding ER diagram.

  1. An account can be charged against many projects, though it may not be charged against any. A project must have at least one, though it may have many, accounts charged against it.
  2. Projects must be classified as either top secret or civilian (but not both). There is information specific to top secret projects and specific to civilian projects that we want to record.

  3. An employee must manage exactly one department. A department may or may not have one employee manage it.
  4. Men are only allowed to supervise men. Women are only allowed to supervise women. We do not want to allow the database to hold data representing a man supervising a woman. An employee, regardless of sex, is assigned to exactly one office, with each office having exactly one employee in it. (Be sure to include the office entity in this diagram.)

Question 10

The following descriptions all have to do with a holding company for food service companies. You should answer each one separately from the others.

  1. Each chain consists of 50 to 300 stores that are owned by ACTME, the holding company for several restaurant chains and two caterers.

  2. Menu items are wide ranging, and can be classified by the section of the menu (appetiser, dessert, etc.) in which each is presented and by ethnic group (Italian, Hungarian, etc.) to which it belongs.

  3. The menu of each restaurant changes every couple months; management likes to keep track of past menus to track which ones have been successful.

  4. Each menu item is made of several ingredients (eggs, bacon, etc.) that are used in a certain quantity.

  5. Ingredients can be acquired from several suppliers. Ingredients are acquired by sending orders for several goods using the vendor’s item numbers. The item number for each of these ingredients varies across suppliers, so if you are going to order eggs from supplier # 1, then you might order item 52 while from supplier # 2 you might order item J216. The company keeps price and item numbers for all the ingredients for all its menu items---and even for some ingredients which it is not currently using in any menu items. The quantity needed for each ingredient is also kept. The current price of these goods is maintained though the historical price is not. Because of special deals and volume discounts, the price at which a good is acquired is often different from its list price. Thus, the database must retain the price at which a good is actually acquired.

Scott A. Moore, all rights reserved.