Identity Data Architectures
As part of the State of Utah's enterprise architecture efforts, we completed an inventory of data repositories in the state in early 2002. There were over 250 databases that contained information about individuals and over 175 that contained records about businesses, and this didn't include spreadsheets, Access databases, and other minor repositories. The problem was that we couldn't really tell which records in one database were related to the records in another.
As you think about the data in your organization, you'll probably find that this story resonates with your own experience. Most organizations have volumes of data that have simply grown and expanded over time in a largely unguided fashion. While we like to think of databases as large repositories of multipurpose information, most databases are simply persistent data storage for a single application.
Data is important in an identity management architecture (IMA) because identities are usually stored on computers as digital records of some kind. This chapter is about building data architectures for identity data. A data architecture links data to specific business goals and processes, categorizes it, identifies its metadata, and defines important details about how it is represented.
Build a Data Architecture
Building a data architecture for identities requires that we consider three different concepts: categorizing identity data, exchanging identity data, and structuring identity data. Figure 16-1 shows these three components along with some of the issues we'll address for each.
Figure 16-1. Components of a data architecture for identity
A digital identity is a record that contains one or more names as well as attributes, preferences, and traits of some person or thing. For our purposes in this chapter, we'll restrict that definition to records that contain some unique identifier. We'll also refer to the preferences, traits, and attributes collectively as "properties."
These identities might refer to people, applications, manufactured goods, or other things that the organization cares about. If I asked you to list the identities in your organization, you might include only records that identify people, such as employee and customer records. You might miss billing records and might not even think of manufacturing data as a kind of identity data.
We commonly think of "identity management" as being about authenticating people and authorizing them to take certain actions, but for the purposes of this chapter, we need to expand that definition.
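To make the working definition concrete, here is a minimal Python sketch of an identity record. It assumes that identifiers and properties are simple key/value maps. The class name `DigitalIdentity`, its fields, and the sample values are hypothetical and illustrative, not a schema from any particular product. The sketch shows that a person and a manufactured good fit the same shape. It also rejects a record with no unique identifier, matching the restriction above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DigitalIdentity:
    """A record about some person or thing (hypothetical shape for illustration).

    identifiers: uniquely identifying values keyed by type, e.g. SSN or employee ID.
    properties: every other attribute, preference, or trait, lumped together.
    """

    identifiers: dict[str, str]
    properties: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Only records carrying at least one unique identifier count as identities here.
        if not self.identifiers:
            raise ValueError("a digital identity needs at least one unique identifier")


# Hypothetical sample records: a person and a manufactured good are both identities.
employee = DigitalIdentity(
    identifiers={"ssn": "000-00-0000", "employee_id": "E1001"},
    properties={"name": "Pat Smith", "department": "Finance"},
)
laptop = DigitalIdentity(
    identifiers={"serial_number": "SN-88231", "mac_address": "00:1A:2B:3C:4D:5E"},
    properties={"model": "laptop", "assigned_to": "E1001"},
)
```

The `assigned_to` property hints at how one identity record can point at another, which becomes important once we look at processes.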
Processes Trump Data
We frequently hear the comment that an organization's data is one of its most valuable assets. I think that's probably true, but the fact is that data projects never go anywhere. There's never money for a project to clean up the data and create enterprise data repositories. What do businesses care about? Processes, because processes achieve business results.
When we build data architectures, we're doing so in support of business processes.
Processes Link Identities
The first step in creating data architectures is to gather baseline information about identities in your organization. The baseline inventory identifies high-level data sources and documents pertinent information about them. To do that, we'll start with the processes that we identified in the process inventory and find the identity records that are critical to performing those processes. Let's run through an example to see how that might work.
Suppose you've identified "employee provisioning" as one of the processes important to your organization. That process starts when the decision is made to hire an applicant and includes steps such as the following:
1. Enter applicant data into the HR system.
2. Identify the hiring manager (i.e., the person who will be the new employee's boss).
3. Set up a 401(k) account, payroll, health insurance, and other benefits.
4. Assign the employee an office.
5. Order, install, and set up a computer, including application software.
6. Set up an email account and access.
7. Set up network access.
8. Order and install a phone.
9. Update the right directory or directories with the new telephone number, email address, and office location.
10. Set up a voicemail account.
11. Establish access controls for all of the enterprise applications that the employee will need to do the job.
12. Issue a credit card for travel expenses.
Each of these steps creates or relies on one or more identity records:
- Employee record in the HR system. The employee has an SSN that serves as a unique identifier, and most large organizations also assign a unique employee ID.
- Record of offices with their location, size, and other properties. Offices might be assigned to certain groups or departments and would need to be associated with their occupants. The identifier is proprietary.
- Record of the computer and any installed software. These all have serial numbers. In addition, each network adapter has a MAC address that uniquely identifies it.
- Email and network access record in the proper directory or directories. There would be an email address and network identifier assigned as part of this record.
- The phone system has records that identify phone lines. The telephone equipment has a serial number, and the phone number itself represents an endpoint on the telephone network. The phone number would need to be mapped to the office where it's installed.
- The voicemail account would have an identifier and be tied to the phone number.
- Each enterprise application that the employee needs access to (such as the CRM system) would have some way of identifying the employee.
- The credit card represents a separate identity document that the company or its payment partner may track. The credit card number is the unique identifier in this case, and your financial services provider will likely assign other identifiers for online access to the account information.
As we consider a single business process, it's amazing to see how it can translate into multiple identity records. Process links these records even if the infrastructure does not.
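The mapping from one process to many identity records can be sketched as follows. This is a minimal illustration that assumes each step can be described as a list of (data store, identifier type) pairs. The store names, identifier names, the `identity_records` function, and the sample values are all hypothetical. Grouping by store gives a first cut at the identity records the process depends on. The per-hire `hire_case` dictionary shows the point made above: nothing joins these identifiers except the process itself.

```python
from collections import defaultdict

# Hypothetical mapping from some provisioning steps to the identity records each one
# creates or uses, as (data store, identifier type) pairs. Names are illustrative only.
PROVISIONING_STEPS = {
    "Enter applicant data into the HR system": [("HR system", "ssn"), ("HR system", "employee_id")],
    "Set up the computer": [("Asset database", "serial_number"), ("Asset database", "mac_address")],
    "Set up email and network access": [("Directory", "email_address"), ("Directory", "network_id")],
    "Order and install a phone": [("Phone system", "phone_number")],
    "Set up a voicemail account": [("Voicemail system", "mailbox_id")],
    "Issue a travel credit card": [("Card issuer", "card_number")],
}


def identity_records(steps: dict[str, list[tuple[str, str]]]) -> dict[str, set[str]]:
    """Collect, per data store, the identifier types that one process depends on."""
    stores: dict[str, set[str]] = defaultdict(set)
    for records in steps.values():
        for store, identifier in records:
            stores[store].add(identifier)
    return dict(stores)


# One run of the process for a single hire. The infrastructure does not join these
# identifiers; this provisioning case is the only thing that ties them together.
hire_case = {
    "HR system": "E1001",
    "Asset database": "SN-88231",
    "Directory": "psmith@example.com",
    "Phone system": "555-0100",
}

records = identity_records(PROVISIONING_STEPS)
missing = sorted(set(records) - set(hire_case))  # stores this case has not reached yet
```

With the sample data, `records` covers six data stores, and `missing` lists the card issuer and voicemail system as stores this hire has not yet been entered into.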
The Identity Data Inventory
Going through the process inventory to find identity records creates an initial identity data inventory. The data inventory lists each identity data source, its contents, and other important metadata. The following attributes should be recorded for each source as part of creating the inventory:
- Name of the data store: This could be a name created at the time of the inventory, or one that the owner or custodian already uses to identify the data record.
- Version: Your organization might not think of data stores as being under version control, but it's a useful idea, and the baseline inventory is a natural place to start.
- Definition: Define the data record and its purpose in one or two sentences.
- Process: What process (or processes) does the data record support? In other words, who consumes or uses this data?
- Identifiers: Which fields in the data record are uniquely identifying? There can be more than one identifier. For example, in an HR data record, the SSN and employee number are both identifiers. Many database records also have a unique record number that has no identity meaning beyond the record itself; avoid listing it unless it's meaningful to the business.
- Properties: All of the other attributes, traits, preferences, and characteristics that are included in the record and used as part of the identity.
- Owner: Who owns the information?
- Custodian: Who is the custodian of the information?
- Notes: Any other relevant information about the identity record.
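The attributes above map naturally onto a record type. The following is a sketch only, assuming one Python dataclass per inventory entry. The class name `InventoryEntry`, the field names, the `gaps` check, and the sample HR entry are hypothetical. A simple completeness check like `gaps` is one way to catch inconsistent entries before they are aggregated.

```python
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    """One entry in the identity data inventory (hypothetical field names)."""

    name: str
    version: str
    definition: str         # one or two sentences
    processes: list[str]    # processes (consumers) the data record supports
    identifiers: list[str]  # uniquely identifying fields; skip meaningless record numbers
    properties: list[str]   # every other attribute, trait, or preference
    owner: str
    custodian: str
    notes: str = ""

    def gaps(self) -> list[str]:
        """List missing attributes so incomplete entries can be sent back for rework."""
        problems = []
        if not self.identifiers:
            problems.append("no identifying fields")
        if not self.processes:
            problems.append("no supported process")
        if not self.owner:
            problems.append("no owner")
        if not self.custodian:
            problems.append("no custodian")
        return problems


# Hypothetical sample entry for the HR record from the provisioning example.
hr_records = InventoryEntry(
    name="HR employee records",
    version="1.0",
    definition="Master record for each employee, created when an applicant is hired.",
    processes=["employee provisioning"],
    identifiers=["ssn", "employee_id"],
    properties=["name", "hire_date", "department", "manager"],
    owner="Human Resources",
    custodian="IT Operations",
)
```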
One approach to creating the data inventory is to have business units create inventories for identity data they own and then to aggregate the results. Be careful, however, to train the people performing the inventory so that you get consistent results. You will also have to take care to ensure you don't miss data sources that are jointly owned or owned at the enterprise level. As we've seen, business function and, hence, process do not always fall within neat organizational boundaries. You may find that it's more convenient to create the baseline data inventory in conjunction with the process evaluation that we discussed in the last chapter, instead of doing it as a separate step.
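Aggregating business-unit inventories and spotting shared data sources could look something like the following. This minimal sketch assumes each unit submits entries as dictionaries with at least `name` and `owner` keys. The `aggregate` function and the sample data are hypothetical. The function normalizes names and flags any data store that more than one unit reported, or whose owners disagree. That is one way to surface the jointly owned sources the paragraph above warns about.

```python
from collections import defaultdict


def aggregate(inventories: dict[str, list[dict]]) -> tuple[dict[str, dict], list[str]]:
    """Merge per-unit inventories keyed by normalized data store name.

    Returns the merged inventory and a sorted list of store names needing review
    because more than one unit reported them or the reported owners disagree.
    """
    reported_by: dict[str, set[str]] = defaultdict(set)
    owners: dict[str, set[str]] = defaultdict(set)
    merged: dict[str, dict] = {}
    for unit, entries in inventories.items():
        for entry in entries:
            key = entry["name"].strip().lower()  # so "Employee Records" matches "employee records"
            reported_by[key].add(unit)
            owners[key].add(entry["owner"])
            merged.setdefault(key, entry)  # keep the first entry seen for each store
    review = sorted(k for k in merged if len(reported_by[k]) > 1 or len(owners[k]) > 1)
    return merged, review


# Hypothetical submissions from three business units.
inventories = {
    "HR": [{"name": "Employee Records", "owner": "HR"}],
    "IT": [{"name": "Directory", "owner": "IT"}, {"name": "employee records", "owner": "HR"}],
    "Finance": [{"name": "Corporate Cards", "owner": "Finance"}],
}
merged, review = aggregate(inventories)
```

The merge keeps the first entry it sees for each store. In practice, the flagged stores would probably go to a person for reconciliation rather than being resolved automatically.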