Introducing
Microsoft System Center Operations Manager 2007
Modern organizations of every size have one thing in
common: their computing environment isn’t simple. Everybody’s world contains
hardware and software of different vintages from different vendors glued
together in different ways. Managing this world—keeping these diverse,
multifaceted systems running—is unavoidably complex.
Despite this complexity, two fundamental management requirements
are clear. The first is the need to monitor the hardware and software in the
environment, keeping track of everything from application availability to disk
utilization. The operations staff that keeps the environment running must have
a clear and complete picture of the current state of their world. The second
requirement is the ability to respond intelligently to the information this
monitoring produces. Whenever possible, this response should avoid incidents by
addressing underlying problems before something fails. In any case, the
operations staff must have effective tools for fixing failed systems and
applications.
The goal of Microsoft’s System Center Operations Manager
2007 is to meet both of these requirements. The successor to Microsoft Operations
Manager (MOM) 2005, the product is focused on managing Windows environments,
including desktops, servers, and the software that runs on them. It can also be
used to manage non-Windows systems and software, devices such as routers and
switches, and more. Released in early 2007, Operations Manager aims at
providing a solution for monitoring and managing a Windows-oriented computing
world.
The Challenges of Monitoring and Management
Think about what’s
required for effective monitoring and management in an enterprise computing
environment. Keeping track of what’s happening on the myriad of machines in an
organization means dealing with diverse software, including desktop and server
operating systems, databases, web servers, and applications, together with all
sorts of hardware, such as processors, disk drives, routers, and much, much
more. All of these components must inform the people managing this world of
their status.
This is bound to
generate lots of information. The operations staff will certainly need some
kind of dedicated interface that organizes this plethora of data into
understandable graphics and numbers. They’d also probably like a Web version of
this interface, an option that would increase their ability to manage this
world remotely. And for some scenarios, such as creating scripts, a command
line interface is the best choice. Yet while a variety of user interfaces are
required for working with management data as it’s generated, the ability to
generate reports on historical data is also essential. How can the people
responsible for maintaining this environment know how they’re doing without
some way to track their history? Real-time interfaces are certainly important,
but so is an intelligent way to examine long-term trends.
Here’s another challenge: No single organization—vendor or
end user—can afford to have people on staff who are expert in managing each
part of a complex IT world. Instead, a management and monitoring tool must
provide a way to apply packaged expertise via software. The product must also
be capable of providing a platform for third parties to create this kind of
package.
And there’s more. The
IT Infrastructure Library (ITIL) and the Microsoft Operations Framework (MOF) both
promote an IT Service Management (ITSM) approach. Rather than focusing solely
on the details of managing individual technologies, ITSM emphasizes the
services that IT provides to the business it’s part of. Given that the business
people who pay for it see IT entirely in terms of these services, this approach
makes perfect sense. Yet doing it successfully requires explicit support for
defining and monitoring distributed applications as a whole. An organization’s
email service is made up of many parts, for example, including email server
software, database software, and the machines this software runs on. Providing
a way to monitor and manage this combination as a single service improves IT’s
ability to offer the performance and reliability that the
business expects.
Effectively
monitoring and managing a modern computing environment requires addressing all
of these problems. The next section gives an overview of how Operations Manager
does this.
Addressing the Challenges: What System Center Operations Manager 2007 Provides
While Operations Manager provides a variety of functions,
three of them stand out as most important: providing a common foundation for
monitoring and managing desktops, servers, and more; taking a customizable,
model-based approach to management; and supporting service monitoring of
complete distributed applications. What follows describes each of these three.
A Common Foundation for Monitoring and Managing Desktops, Servers, and More
Despite the diversity of enterprise computing, a single
product can provide a broad foundation for a significant part of an
organization’s management challenges. Understanding how Operations Manager does
this requires a basic grasp of the product’s architecture. The figure below
shows its major components.
As the diagram shows, the software that comprises
Operations Manager is divided into servers and agents. The
servers, which run on Windows Server 2003 and the forthcoming Windows Server
codename “Longhorn”, divide into two categories:
• The Operations Manager management server. This server relies on an
operational database, and it’s the primary locus for handling real-time
information received from agents. As the diagram shows, it also provides an
access point for the product’s various user interfaces.
• The Operations Manager reporting server. This server relies on a data
warehouse, a database capable of storing large amounts of information received
from agents for long periods. The reporting server can run predefined and
custom reports against this historical data.
Unlike management and reporting servers, the Operations Manager
agent runs on both client and server
machines. This agent runs on Windows 2000, Windows XP, and Windows Vista
clients, as well as Windows 2000 Server, Windows Server 2003, and Windows
Server codename “Longhorn”. To manage non-Windows devices, such as routers and
switches, Operations Manager management servers and agents can connect to them
using SNMP or the newer WS-Management protocol. There’s also an option that allows
retrieving basic management information from Windows systems that aren’t
running agents.
Agents send four primary kinds of information to
management servers:
• Events, indicating that something interesting has occurred on the managed
system. An agent might send an event indicating that a login attempt has
failed, for instance, or that a failed hardware component has been brought back
to life.
• Alerts, indicating that something has happened that requires an operator’s
attention. For example, an agent might send an event for every failed login,
but send an alert if four failed logins occur within three minutes on the same
account.
• Performance data, regularly sent updates on various aspects of the managed
component’s performance.
• Discovery data, information about discovered objects. Rather than requiring
an operator to explicitly identify the objects to be managed, each agent can
discover them itself, a process that’s described later in this paper.
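To make the alert example above concrete, the rule "send an alert if four failed logins occur within three minutes on the same account" amounts to a sliding-window count. The sketch below is purely illustrative; the class name and thresholds are assumptions, not anything defined by Operations Manager itself:

```python
from collections import deque

class FailedLoginAlertRule:
    """Hypothetical sketch of an alert rule: raise an alert when
    `threshold` failed logins occur within `window_seconds`."""
    def __init__(self, threshold=4, window_seconds=180):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.failures = deque()  # timestamps of recent failed logins

    def on_failed_login(self, timestamp):
        self.failures.append(timestamp)
        # Drop failures that fall outside the sliding window.
        while timestamp - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        # Enough recent failures means this event warrants an alert.
        return len(self.failures) >= self.threshold

rule = FailedLoginAlertRule()
# Four failures 30 seconds apart: only the fourth triggers an alert.
results = [rule.on_failed_login(t) for t in [0, 30, 60, 90]]
```

An agent would send an event for each failure regardless; the rule only decides when the separate, operator-facing alert is warranted.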
All of this information is sent to the operational
database and/or the data warehouse, and all of it can be accessed through Operations
Manager’s user interfaces. Operations staff will most often rely on the
Operations Manager console, a Windows application that can display
events, show alerts, graph performance over time, and more. A large subset of
the console’s functions can also be performed through the Operations Manager Web
console, providing browser-based access to this information. And for
creating scripts or for people who just prefer a command line interface, the
product also allows access via the Operations Manager command shell.
This broad foundation is essential for modern monitoring
and management. It’s not enough, though—more is required. How, for instance,
can a single product address the diversity of managed components in a typical
enterprise? How Operations Manager addresses this is described next.
Customizable, Model-Based Management
Any attempt to address the broad problem of monitoring and
management in a single product faces an unavoidable limitation: No one vendor
can have all of the specialized knowledge required to handle the wide range of
software and hardware that its customers use. Instead, what’s needed is a
generalized framework for packaging specialized management knowledge and
behavior, packages that can then be plugged into a common management foundation.
This is exactly what’s provided by Operations Manager’s management
packs (MPs). Each MP packages together the knowledge and behavior required
to manage a particular component, such as an operating system, a database
management system, a server machine, or something else. These MPs are then
installed into Operations Manager, as the figure below shows.
Since creating an MP requires specialized knowledge about
managing the component this MP targets, each one is typically created by the
organization that knows the most about that component. As the figure above
suggests, for example, Microsoft has created MPs for client and server versions
of Windows as well as for Exchange Server, SQL Server, and other Microsoft
products. Other vendors have created MPs for non-Microsoft software and
hardware about which they have specialized knowledge. Hewlett-Packard provides
an MP for its ProLiant server machines, for example, while Dell offers MPs for
its servers.
As the figure shows, each MP can contain several things,
including the following:
• Monitors, letting an agent track the state of various parts of a managed
component.
• Rules, instructing an agent to collect performance and discovery data, send
alerts and events, and more.
• Tasks, defining activities that can be executed by either the agent or the
console.
• Knowledge, providing textual advice to help operators diagnose and fix
problems.
• Views, offering customized user interfaces for monitoring and managing this
component.
• Reports, defining specialized ways to report on information about this
managed component.
When an MP is installed, its various parts wind up in
different places. The monitors and rules, for instance, are downloaded to the
agents on the appropriate machines, while the knowledge and reports remain on
the management and reporting servers. Wherever its contents are used, the goal
is always the same: providing the specialized knowledge and behavior required
to monitor and manage a particular component.
To get a sense of how the various components of an MP
might work together, imagine that an application running on some managed system
notices that it lacks sufficient disk space to function. This application
writes an event into the system’s event log indicating this, then shuts itself
down. The Operations Manager agent on this system continually monitors the
event log, and so it quickly notices this event. If the application’s MP
contains an appropriate rule, the agent will send a specific alert to the
management server when this event occurs. The operator sees the alert in the
Operations Manager console, and he also sees the MP-provided knowledge
associated with this alert. Reading this knowledge, he learns that he should
direct the agent to run a task that deletes the temp directory on the
application’s machine, then restart the application. This entire process, from
detection of the problem to its ultimate resolution, depends on the information
contained in the MP.
Of course, it would be better to avoid this problem in the
first place. One way to do this is to keep an eye on things such as free disk
space on the machine hosting this application, then inform an operator when a
problem looms. Doing this requires creating a model of what a healthy managed
component looks like, then indicating any deviation from its normal state. In
Operations Manager, this is exactly what monitors are for. Each MP defines a
set of objects that can be managed, then specifies a group of monitors
for those objects. These monitors keep track of the state of each object,
making it easier to avoid problems like application crashes before they occur.
In the language of Operations Manager, the set of monitors for a managed object
comprise a health model for that object. By tying together the health
models for the various objects on a system, an overall health model can be
created that reflects the state of the system as a whole.
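One way to picture how monitor states combine into a health model is a worst-state rollup: a system's overall health is the worst health reported by its constituent monitors. The state names and rollup policy below are illustrative assumptions, not the product's actual algorithm:

```python
# Illustrative health states, ordered by severity.
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}

def roll_up(child_states):
    """Return the worst state among child monitors (worst-of rollup)."""
    if not child_states:
        return "healthy"
    return max(child_states, key=lambda s: SEVERITY[s])

# A hypothetical system health model tying together per-object monitors.
system = {
    "disk_space": "warning",
    "database_engine": "healthy",
    "network": "healthy",
}
overall = roll_up(list(system.values()))
# A warning on disk space surfaces as the system's overall state,
# letting an operator act before the application actually fails.
```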
Allowing
each MP to define its own set of managed objects makes sense. Yet the best an
MP’s creators can do is define generic objects; they can’t know exactly what’s
on any given system. For example, the SQL Server MP defines an object
representing a database. When this MP is installed on a real system, that
machine might have one, two, or more actual databases. How are these concrete
instances of the MP’s generic object type found? One approach would be to
require an operator to identify each instance manually, a task that nobody
would enjoy. Instead, Operations Manager allows each MP to include specific discovery
rules (also called just discoveries) that let the agent locate these
instances. The goal is to make finding the things that need to be managed as
straightforward as possible.
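Conceptually, a discovery rule turns a generic object type defined in an MP into the concrete instances present on a given machine. The sketch below stands in for that idea with a hardcoded enumeration; the function names and the "SQLServer.Database" type string are hypothetical, not Operations Manager APIs:

```python
def run_discovery(object_type, enumerate_instances):
    """Hypothetical discovery rule: enumerate concrete instances of a
    generic MP-defined object type and build discovery data records."""
    return [
        {"type": object_type, "instance": name}
        for name in enumerate_instances()
    ]

# Stand-in for querying the local SQL Server installation; a real
# discovery would ask the database engine what databases exist.
def enumerate_databases():
    return ["master", "Orders", "Inventory"]

discovery_data = run_discovery("SQLServer.Database", enumerate_databases)
# Each record would be sent to the management server as discovery data,
# so no operator has to identify the three databases by hand.
```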
Providing this generalized approach to defining management
knowledge and behavior requires a common technology to create these
definitions. Like other products in the System Center family, Operations
Manager relies on an XML-based language called the System Definition Model
(SDM) to do this. All MPs are expressed in SDM, providing a standard
format for their creators to use. Defining MPs with SDM also implies a more
general, less Windows-specific infrastructure for management. Although
Operations Manager remains a Windows-oriented product, it’s significantly less
wedded to the Microsoft world than was its predecessor.
In
fact, SDM is the basis of an in-progress standard known as the Service
Modeling Language (SML). Embraced by Microsoft, BEA, BMC, CA, Cisco, Dell,
HP, IBM, and Sun, SML will provide a vendor-neutral foundation for describing
managed systems. The value of a model-based approach to this problem is clear,
and it’s a fundamental aspect of Operations Manager.
Service Monitoring for Distributed Applications
The goal of virtually every IT department is to provide
services to the organization it’s part of. Monitoring and managing the various
parts of the IT environment is an essential aspect of doing this. Yet business
people don’t really care about the state of individual components. Instead,
their concern is for the services they see. Can they send and receive email?
Are the applications they need right now running effectively? This
service-based concern makes sense, since it reflects what’s most important to
the organization as a whole. Yet each service is likely made up of a number of
underlying components, including both software and hardware. Looking at each of
an application’s components separately isn’t enough.
What’s needed is a way to monitor and manage a distributed
application—the combination of components that underlie a particular service—as
a whole. Operations Manager provides this through service monitoring.
The diagram below illustrates this idea.
Think, for example, about a custom ASP.NET application. As
the figure suggests, this application’s main components might include IIS, the
application itself, SQL Server, a specific database, the network used to
connect these components, and more. From a technology point of view, all of
these are distinct pieces, and without some way to group them together, an
operator would be hard pressed to know that they actually comprise a single
distributed application. From the perspective of a business user, however, all
that matters is the service this entire application provides. If any part of it
is malfunctioning, the entire service is likely to be unavailable. Letting the
operator know that, say, a failed disk drive is actually part of this
business-critical application can help that operator understand the importance
of getting it back on line as quickly as possible. Rather than viewing their
world as discrete components, operations staff can instead have a perspective
that’s closer to what their customers see: a service-based view.
Getting a grip on Operations Manager requires
understanding a number of different concepts. None of these ideas are more
fundamental than management servers and agents, and so the place to begin is by
looking more deeply at these bedrock parts of the product.
Understanding Management Servers
Management servers are at the center of Operations
Manager. Agents send them alerts, events, performance data, and more, and
they’re also the access point for the product’s user interfaces. While the
basic architecture is straightforward, as shown earlier, understanding Operations
Manager requires knowing a bit more about management servers. This section
takes a slightly more detailed look at this important part of the product.
Root Management Servers
Every agent communicates with exactly one management
server. While many organizations could potentially meet their needs with a
single management server, it’s common to install two or more management
servers, then divide agents among these servers. In this case, the first server
that’s installed becomes the root management server. This root server
communicates with any other management servers that have been installed, and it
also communicates with its own agents.
The root management server performs several unique
functions. All of Operations Manager’s user interfaces connect only to the root
management server, for example, as shown earlier. Given this central role, it’s
common to cluster a root management server. (All of these connected management
servers rely on a single operational database, so it’s also a good idea to
cluster it.) If a root management server fails, an administrator can promote
another management server, allowing it to become the new root.
Management Groups
A collection of Operations Manager servers and agents is
known as a management group. Each management group contains one root
management server, zero or more other management servers, an operational
database, and zero or more agents. The figure below illustrates how the various
parts of a management group fit together.
Operations Manager can support many agents in a single
management group (the exact number depends on a variety of factors), but
organizations that need more than this can install multiple management groups.
And although it’s neither required nor shown in the diagram, a management group
can also contain a reporting server.
As
mentioned earlier, each agent is assigned to one primary management server. If
its assigned server becomes unavailable, an agent will automatically begin
communicating with another management server in its management group. When its
primary management server reappears, the agent will switch back to it. In both
cases, no administrative intervention is necessary. While an administrator can
explicitly control which management server an agent should communicate with if
its primary server fails, this isn’t required.
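The failover behavior just described can be sketched as a simple preference order: use the primary management server while it's reachable, otherwise fall back to another server in the management group, and switch back when the primary returns. The server names and reachability callback here are illustrative assumptions:

```python
def choose_server(primary, failovers, is_reachable):
    """Sketch of agent failover: prefer the primary management server,
    fall back to the first reachable failover server, and return to
    the primary automatically once it reappears."""
    if is_reachable(primary):
        return primary
    for server in failovers:
        if is_reachable(server):
            return server
    return None  # nothing reachable: queue data locally for now

# The primary server goes down, then comes back.
up = {"MS1": False, "MS2": True}
during_outage = choose_server("MS1", ["MS2"], lambda s: up[s])
up["MS1"] = True
after_recovery = choose_server("MS1", ["MS2"], lambda s: up[s])
```

Because the agent re-evaluates this choice on its own, no administrative intervention is needed in either direction.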
Another option is to create tiered management
groups. With this approach, the root management server in a local
management group is associated with the root management server in a connected
management group. Once this is done, it’s possible to monitor and manage the
connected group from the console of the local group. There are some
limitations—the console doesn’t support performing all administrative actions
in the connected group—but this option can make sense in some situations. For
example, creating tiered management groups can be a useful way to connect
groups within any large Operations Manager deployment. It can also make sense
when connecting a management group at headquarters with a subordinate
management group in a branch office, particularly if the branch is accessed via
a lower-speed connection.
Especially in large enterprises, installing more than one
systems management product is common. Making a multi-vendor management
environment work well can require connecting these products together. To allow
this, Operations Manager includes the Operations Manager Connector Framework
(MCF). MCF allows other management products to exchange alerts and other
information with an Operations Manager management server, making it easier to
use multiple tools in a single organization.
Some organizations, such as government agencies, have a
legal requirement to track failed logins, multiple login attempts, and other
security-related aspects of their IT environments. To provide direct support
for this, Operations Manager includes the Audit Collection Service (ACS). This
service relies on its own database maintained by a management server. If ACS is
used, relevant security information is sent to the ACS database rather than to
the standard operational database, making it simpler for organizations to
comply with their legal mandate.
Understanding Agents
Management servers are an important part of Operations
Manager, but they’d be useless without agents. This section takes a closer look
at what agents do and how they do it.
Installing Agents
Before an agent can do anything useful, it must be
installed on a target computer. One option is to install agents manually on
individual machines. Yet especially in a large organization, installing agents
individually on all of the managed systems can be an onerous task. To make
installation easier, Operations Manager provides a wizard that lets an operator
query Active Directory for machines matching some criteria, then install agents
on all of them. And for organizations that use Microsoft Systems Management
Server (SMS) or its successor, System Center Configuration Manager 2007, either
of these products can also be used to install agents.
However it’s installed, a new agent needs to determine
which management server it should communicate with. For installations that use
Active Directory, the wizard allows specifying the management server each agent
should talk to. For manually installed agents, the person performing the
installation can explicitly specify a management server. If no server is
specified, an agent installed manually or by a tool such as Configuration
Manager will contact Active Directory when it first starts running to learn
which management server it should communicate with.
How Agents Gather and Send Information
Once it’s installed, an agent’s behavior is defined
entirely by the management packs that are downloaded to that machine. The
monitors, rules, and other information in each MP tell the agent what objects
it should monitor and determine what information it sends to a management
server. To acquire this information, agents can do several things, including:
• Watching the event log on this machine. An agent reads everything that’s
placed in this log.
• Accessing Windows Management Instrumentation (WMI). WMI is an interface to
the Windows operating system that allows access to a variety of information
about the hardware and software on a machine. This interface is Microsoft’s
implementation of the Web-Based Enterprise Management (WBEM) standard created
by the Distributed Management Task Force (DMTF).
• Running scripts provided by a management pack to collect specific
information.
• Accessing performance counters.
• Executing synthetic transactions. A synthetic transaction accesses an
application as if it were a user of that application, such as by attempting to
log in or requesting a web page. In some situations, such as with applications
that generate little useful management information, synthetic transactions are
the best way to learn about an application’s state. They can also be used to
determine current characteristics, such as whether the login process is taking
an abnormal amount of time.
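A synthetic transaction can be as simple as timing a scripted request against the application and judging health from both the outcome and the latency. The sketch below uses a pluggable `perform_request` callable rather than any real protocol, since the details vary per application; the threshold and state names are assumptions:

```python
import time

def run_synthetic_transaction(perform_request, max_seconds=2.0):
    """Act as a pretend user: issue the request, then judge health
    from whether it succeeded and how long it took."""
    start = time.monotonic()
    try:
        ok = perform_request()
    except Exception:
        ok = False
    elapsed = time.monotonic() - start
    if not ok:
        return "critical"   # the application failed the request
    if elapsed > max_seconds:
        return "warning"    # it works, but abnormally slowly
    return "healthy"

# Example: a stand-in login attempt that succeeds quickly.
state = run_synthetic_transaction(lambda: True)
```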
Based on the rules and monitors in the management packs installed
on its system, an agent sends events, alerts, and performance data to its
associated management server. The management server writes all of this
information to both the operational database and the data warehouse, making it
available for immediate use and for creating reports. Management servers can
also communicate with agents, instructing them to do things such as change a
rule in a management pack, install a new management pack, or run a task.
With MOM 2005, 90% of the traffic between an agent and a
management server was commonly performance data sent by rules. To minimize this
traffic (and the storage requirements it implies), Operations Manager allows
setting the relevant rules in a management pack so that no new performance
information is transmitted unless a value has changed by, say, at least 5%. The
management server can then infer that nothing has changed significantly if it
receives no new information. For example, think about an agent that’s
monitoring free disk space on a server. This number is likely to be the same
over many hours, and so it makes sense to send performance information only
when a significant change occurs.
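The suppression scheme described above can be sketched as a send-on-change filter: a sample is transmitted only when it differs from the last transmitted value by at least the configured fraction. The class name and exact policy below are illustrative, not the product's implementation:

```python
class ChangeThresholdReporter:
    """Sketch of the bandwidth optimization described above: transmit
    a performance sample only when it differs from the last value sent
    by at least `threshold` (0.05 here, i.e. 5%)."""
    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.last_sent = None

    def should_send(self, value):
        if self.last_sent is None:
            send = True  # always send the first sample
        elif self.last_sent == 0:
            send = value != 0  # avoid dividing by zero
        else:
            change = abs(value - self.last_sent) / abs(self.last_sent)
            send = change >= self.threshold
        if send:
            self.last_sent = value
        return send

reporter = ChangeThresholdReporter()
# Free disk space in GB sampled over several hours: only the first
# sample and the significant drop at the end are transmitted.
sent = [reporter.should_send(v) for v in [120.0, 119.5, 118.0, 110.0]]
```

The management server's side of the bargain is implicit: silence means "no significant change since the last value."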
It’s worth pointing out that not every shutdown of an
application or machine indicates a problem; scheduled maintenance can require
scheduled downtime. To prevent an agent from sending needless alerts in this
case, an operator can put some or all of the objects on a system into maintenance
mode. If just one database needs fixing, for example, that single object
could be placed into maintenance mode. Similarly, an entire application or an
entire machine can be placed in maintenance mode if necessary. The goal is to
avoid distracting operators with meaningless alerts during scheduled shutdowns.
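Maintenance mode is essentially a per-object suppression set consulted before an alert is delivered. The sketch below is a minimal illustration of that idea; the object names and functions are hypothetical:

```python
maintenance = set()  # objects currently in maintenance mode

def enter_maintenance(obj):
    maintenance.add(obj)

def exit_maintenance(obj):
    maintenance.discard(obj)

def deliver_alert(obj, alert, outbox):
    """Suppress alerts for objects under scheduled maintenance;
    deliver everything else normally."""
    if obj in maintenance:
        return False  # dropped: this shutdown is expected
    outbox.append((obj, alert))
    return True

outbox = []
enter_maintenance("OrdersDB")  # one database is being fixed
deliver_alert("OrdersDB", "database offline", outbox)   # suppressed
deliver_alert("WebServer", "high CPU", outbox)          # delivered
```

Because suppression is per object, the rest of the machine, and the rest of the environment, keeps alerting normally.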
Like any other software, agents execute under some
identity. A simple approach would be to have the entire agent run under a
single identity using a single account. While this works, it’s not an optimal
solution (although it is what’s done in MOM 2005). The problem with this is that
the agent’s identity needs to have the union of all permissions required by the
management packs installed on that system. The result is that agents tend to
have identities with lots of privileges, something that doesn’t make most IT
operations people especially happy. To avoid this problem, Operations Manager
introduces the idea of run-as execution. Rather than assign a single
identity to an agent, an administrator can instead define individual
identities—separate accounts—for different things this agent does. If
necessary, individual parts of a management pack, such as a monitor or a rule,
can be assigned specific identities, then run as that identity. Rather than
assigning agents a single account with many privileges, each function the agent
performs can instead have only the permissions it needs.
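Run-as execution can be pictured as a lookup from management pack parts to accounts, with a default identity as the fallback. The account and workflow names below are hypothetical, chosen only to illustrate the least-privilege idea:

```python
# Hypothetical mapping of management pack parts to run-as accounts,
# instead of one highly privileged identity for the whole agent.
run_as = {
    "default": "LowPrivilegeAgentAccount",    # fallback identity
    "SQLServer.DatabaseMonitor": "svc-sql-monitor",
    "IIS.RestartTask": "svc-iis-admin",
}

def account_for(workflow):
    """Return the account assigned to this MP part, so each monitor,
    rule, or task runs with only the permissions it needs."""
    return run_as.get(workflow, run_as["default"])

acct = account_for("SQLServer.DatabaseMonitor")
```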
Agents communicate with management servers using a
Microsoft-defined protocol. Each agent maintains a queue of information to be
sent, which allows prioritizing traffic—alerts always go to the front of the
queue. If connectivity is lost, the queue stores information until the
management server is reachable again. The communication protocol between agents
and management servers also provides compression to reduce the bandwidth
required, along with Kerberos-based security. Both the management server and
the agent must prove their identities using Kerberos mutual authentication, and
information sent between the two is encrypted using the standard Kerberos
approach.
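The queueing behavior just described, alerts jumping to the front and everything held until connectivity returns, can be sketched with a double-ended queue. This is an illustrative model only, not the actual wire protocol:

```python
from collections import deque

class AgentQueue:
    """Sketch of the agent-side queue described above: alerts go to
    the front; all data is held until the server is reachable."""
    def __init__(self):
        self.queue = deque()

    def enqueue(self, kind, payload):
        if kind == "alert":
            self.queue.appendleft((kind, payload))  # alerts first
        else:
            self.queue.append((kind, payload))

    def flush(self, server_reachable):
        """Send (here: return) queued items once connectivity exists;
        otherwise keep everything queued."""
        sent = []
        while server_reachable and self.queue:
            sent.append(self.queue.popleft())
        return sent

q = AgentQueue()
q.enqueue("performance", {"counter": "disk_free", "value": 118.0})
q.enqueue("event", {"id": 1000})
q.enqueue("alert", {"severity": "critical"})
delivered = q.flush(server_reachable=True)
# The alert is delivered first even though it was queued last.
```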
For agents that aren’t part of a Windows domain, such as
those running on web servers in a DMZ, security between agents and management servers can
use certificates installed on both sides rather than Kerberos. A management
server can also use certificate-based security to communicate with another
management server. Referred to as a gateway server, this second
management server might be in an untrusted Active Directory forest, for
example. This option is also useful for a service provider that wishes to
manage other organizations by connecting to their management servers across the
Internet.
Working with Many Agents: Managing Clients
Managing desktops is different from managing servers in a
number of ways. One of the most important—and most obvious—is that there are a
lot more desktop machines than there are servers. To allow operators to work
effectively with large numbers of clients, Operations Manager provides a couple
of options.
One approach, called aggregate client monitoring,
exploits the fact that operators typically don’t need to monitor the exact
state of every client machine. Instead, an operator can define client groups,
then keep track of the state of the entire group. She can still run tasks and
do other things to individual machines, but having a single state available for
all of the machines in the group makes monitoring easier. This option also
allows running reports on client groups showing things such as the percentage
of machines that have acceptable performance each month or the number of
systems that experienced downtime when upgraded to Office 2007. In fact, it’s
likely that most organizations will find that the ability to create these broad
reports is the most valuable aspect of aggregate client monitoring.
A second option for working with clients effectively is mission-critical
client monitoring. Here, an operator chooses specific desktop machines to
monitor directly. Operations staff in a financial services firm might choose to
include each trader’s desktop machine, for example, while those in a retail
environment might specify all of the critical point-of-sale systems. This
approach lets the most important clients be monitored directly without
requiring that every desktop machine get this level of attention.
Combining the two approaches is also possible. IT
operations staff might use aggregate monitoring to manage most desktops in
groups, for example, while still choosing specific clients for mission-critical
monitoring. And like most things in Operations Manager, how a client is
monitored is determined by the management pack that’s installed on that system.
Working with No Agents: Agentless Management
Not everything that needs to be managed is capable of
running an Operations Manager agent. Because of this, it’s also possible to
manage systems without agents. The absence of an agent limits what can be done,
but this approach can still be useful in some situations.
One option, agentless exception monitoring (AEM),
relies on the Windows Error Reporting client, a standard part of Windows,
rather than an Operations Manager agent. This client notices application and
operating system crashes, then asks the user for permission to send this
information to Microsoft (a prompt with which most people are familiar). With
AEM, this information can be sent to Operations Manager servers, letting
operations staff examine it, create reports based on it, and decide whether it
should be forwarded to Microsoft’s main database. While AEM provides only
limited information about Windows machines, it does offer a way to track some
important aspects of a machine and the applications running on it.
Another option, one that targets non-Windows devices, is
the ability to monitor other systems using SNMP or WS-Management. Using this
approach, Operations Manager can work with routers, switches, printers, and
anything else that supports either of these standard protocols. The diagram
below shows a simple illustration of how this might look.
As the diagram shows, both management servers and agents
are capable of monitoring devices (although agents are the most common choice).
While Operations Manager provides built-in support for SNMP and WS-Management,
this kind of monitoring also requires installing a management pack that knows
how to work with the device being monitored. Management packs that allow this
are available for equipment from Cisco, HP, and other vendors.
It’s fair to say that the focus of Operations Manager is
monitoring and managing Windows desktops and Windows servers. Still, the
ability to work with other kinds of devices can also be important. The
product’s support for SNMP and WS-Management, together with the appropriate
management packs, makes this possible.
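The pattern just described can be sketched in miniature. The following Python fragment is a hypothetical illustration of protocol-based device polling, not the product's actual implementation: the probe functions stand in for real SNMP GET or WS-Management requests, and all names and fields are invented for the example.

```python
# Hypothetical sketch of polling a non-Windows device. The probe
# functions stand in for real SNMP or WS-Management requests.

def poll_device(device, probes):
    """Run each protocol probe against a device and collect its status."""
    status = {}
    for name, probe in probes.items():
        try:
            status[name] = probe(device)
        except Exception:
            status[name] = "unreachable"   # a failed probe is itself a signal
    return status

# Stand-ins for an SNMP GET on sysUpTime and a WS-Management query.
def snmp_uptime(device):
    return device["uptime_seconds"]

def wsman_firmware(device):
    return device["firmware"]

router = {"uptime_seconds": 86400, "firmware": "12.4(T)"}
print(poll_device(router, {"uptime": snmp_uptime, "firmware": wsman_firmware}))
```

In the real product, the logic for interpreting what each probe returns comes from the device's management pack rather than from code like this.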
Agents generate a large amount of information. To let
operations staff use this information effectively, Operations Manager provides
two options: immediate access via interactive user interfaces and the ability
to run historical reports. This section looks at both.
User Interfaces
As shown earlier, Operations Manager includes three
distinct user interfaces: the console, the Web console, and the command shell.
All three are useful, and understanding Operations Manager requires knowing
something about each one.
The Console
The Operations Manager console is the primary interface
for most users of the product. Information sent by agents, including events,
alerts, and performance data, is written into the operational database, and so the
console presents a current view of what’s happening in the environment.
Operations staff can also run reports from the console, providing them with a
historical perspective.
Presenting the vast amount of available information in a
coherent way is challenging. Operations Manager addresses this challenge by
displaying monitoring information in a number of different views, each
accessible through the console. The best way to get a sense of what this looks
like is to see some of the most important of these views.
The screen shot below shows the console’s State view. This
example shows the state of SQL Server’s database engine, but State views are
available for many objects in the environment. As this screen shows, this view
provides a quick way to see what parts of the object are healthy (those with
green labels) and which are not (those with red labels). The content of this
view is derived from information sent by the monitors defined in the management
pack for this component.
Having a summary
picture of an object’s state is useful. It’s also important to know when
something has happened to an object that requires attention. The console’s
Alerts view provides this. As the screen shot below illustrates, this view
shows the active alerts in this managed environment. In this example, two
databases are offline, and so two alerts are displayed. Details for one of these
alerts are shown in the lower pane, including knowledge supplied by the
management pack for this component. As described earlier, the goal of this
knowledge is to help the operator resolve whatever is causing this alert.
Both monitors and
rules can send an alert, and either one can also send an event. Operations
Manager provides an Events view to display these events, although that view
isn’t shown here. Performance data, however, is sent solely by rules. To
display this information, the console provides a Performance view, an example
of which is shown below. This example graphs the performance of a server
machine’s processor, but management packs can include rules that send a variety
of other performance data.
Having different
views is nice—in fact, it’s essential—but being able to see several things at
once is also useful. To allow this, the Operations Manager console lets its
users create Dashboard views. Each dashboard shows a customized combination of
other views. For example, the screen shot below shows a dashboard created to
monitor SQL Server, and it includes state information, performance data (this
time showing free disk space), and more. In a typical organization, dashboards
are likely to be a common approach to monitoring the computing environment.
The Web Console
Most often, an operator will use the Operations Manager
console, which typically runs on a machine that’s inside an organization’s
firewall. Yet what happens when the operator is at home or in a hotel room, but
still needs access to Operations Manager? The Web console was created for
situations like these. Using this tool, an operator can perform a (large)
subset of the functions possible with the main Operations Manager console.
Like the main console, the Web console provides a variety
of views, including a State view, an Alerts view, a Performance view, a
Dashboard view, and more. In general, these views look much like their counterparts
in the main console. Here’s the Web console version of the Alerts view, for
instance:
As in the main console Alerts view, this one shows active
alerts, then allows a closer look at an alert’s details. This once again
includes any knowledge supplied by the management pack about how to resolve
this alert. Other Web console views provide similar information with similar
layouts to the corresponding main console view.
The Web console isn’t intended to replace Operations
Manager’s main console. Instead, providing a Web-based option lets some of the
most important management functions be performed from any place with an
Internet connection. The goal is to make life better for the people who keep
distributed environments running.
The Command Shell
Graphical interfaces are usually the right choice for
monitoring an environment. How else could a range of diverse information be
displayed in an intelligible way? Given this reality, it’s fair to say that the
Operations Manager console and the Web console will be the most popular
interfaces to this product. Yet there are cases where a standard graphical
interface isn’t the best option. While it’s great for displaying information, a
point-and-click approach can be inefficient and slow for running commands or creating
scripts. In situations like this, a traditional command line interface can be a
better choice.
To allow this, Operations Manager provides the command
shell. Built using Microsoft PowerShell, it gives users a command line
interface based on commands called cmdlets.
Operations Manager provides a set of built-in cmdlets, such as get-Alert to
access alerts on a particular managed component, get-ManagementPack to learn
about an installed management pack, and others. Its users can also create their
own cmdlets. For example, suppose an operator wishes to disable all rules in
all management packs that target databases. Doing this manually via the console
would be possible, but it would also be painful. Writing a script for this,
perhaps relying on one or more built-in cmdlets, would probably be easier.
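In the product itself, a script like this would be written in PowerShell against cmdlets such as get-ManagementPack. Purely as an illustration of the bulk operation just described, here is the same logic sketched in Python over a hypothetical data model (the structures shown are not the Operations Manager SDK):

```python
# Illustrative model of the bulk operation described above: disable every
# rule that targets databases, across all installed management packs.
# The data structures are hypothetical, not the product's actual schema.

management_packs = [
    {"name": "SQL Server MP",
     "rules": [{"id": "r1", "target": "Database", "enabled": True},
               {"id": "r2", "target": "DatabaseEngine", "enabled": True}]},
    {"name": "Windows Server MP",
     "rules": [{"id": "r3", "target": "OperatingSystem", "enabled": True}]},
]

def disable_rules(packs, target_class):
    """Disable every rule whose target matches the given class."""
    changed = []
    for pack in packs:
        for rule in pack["rules"]:
            if rule["target"] == target_class and rule["enabled"]:
                rule["enabled"] = False
                changed.append(rule["id"])
    return changed

print(disable_rules(management_packs, "Database"))   # ['r1']
```

The point is the shape of the operation: a few lines of script replace dozens of manual console clicks.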
Other Options
With three user interface options—the console, the Web
console, and the command shell—it might seem like Operations Manager covers all
of the possible bases. But what if a third party, such as another software firm
or in-house developers at a large organization, wishes to create a custom
interface to the product? To allow this, Operations Manager provides a software
development kit (SDK). This set of programmable interfaces makes available all
of the functionality provided by the console, and so third parties can create
software that does anything the console allows. While this approach probably
won’t be a mainstream choice, it’s an important option to have in some cases.
Interactive user interfaces are certainly
important—they’re essential—but nobody spends all of their time in front of a
screen. Yet problems can arise that require attention even when no one sees an
on-screen alert. To handle cases like this, Operations Manager allows operators
to determine which alerts should send notifications. For example, an operator
might wish to receive a notification for any alert generated by a Windows
server system in her firm’s main office that has remained unresolved for more
than 60 minutes. This notification might be sent as an email, an instant
message, an SMS text message, or something else, and it provides a way to reach
an operator who’s not currently sitting at the console. The goal is to allow
people to learn about problems that need their attention no matter where they
might be.
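The subscription criteria in the example above can be sketched as a simple filter. This is a conceptual illustration only; the field names are invented, and the product expresses such criteria through its notification configuration rather than code:

```python
from datetime import datetime, timedelta

# Sketch of the notification criteria described above: notify only for
# alerts from main-office Windows servers that have stayed unresolved
# for more than 60 minutes. All field names are hypothetical.

def should_notify(alert, now, max_age=timedelta(minutes=60)):
    return (alert["source_type"] == "WindowsServer"
            and alert["site"] == "MainOffice"
            and not alert["resolved"]
            and now - alert["raised_at"] > max_age)

now = datetime(2007, 6, 1, 12, 0)
stale = {"source_type": "WindowsServer", "site": "MainOffice",
         "resolved": False, "raised_at": datetime(2007, 6, 1, 10, 30)}
fresh = dict(stale, raised_at=datetime(2007, 6, 1, 11, 30))
print(should_notify(stale, now), should_notify(fresh, now))   # True False
```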
Reporting
Interactive access to management information is
fundamental to effective management. Yet seeing trends and understanding
long-term behavior of the managed components requires more than an interactive
interface. The ability to generate reports is also essential.
As shown earlier, reporting in Operations Manager depends
on a reporting server. This server is built on SQL Server Reporting Services,
although it makes some additions to this base technology. Relying on the
Operations Manager data warehouse, a reporting server can be installed on the
same machine as a management server or on its own machine. Management servers
send data directly to the data warehouse—there’s no need to move it manually
from the operational database before running reports.
Operations Manager provides a number of built-in reports.
Among others, these generic reports include:
- Performance reports, which can display the performance of various things over a specified period of time.
- Alert reports, providing a view into the alert histories of managed components.
- Event reports, allowing long-term tracking of events sent by a component.
- Availability reports, showing the history of availability for managed components.
For example, the performance report below shows CPU
utilization on a Windows Server machine over a ten-day period. As in SQL Server
Reporting Services, reports can be created as PDF files, as shown here, or in
other formats.
For all of the built-in reports, IT operations staff can
determine the components that should be included in the report, set the time
span covered (including defining relative dates, e.g., “two days ago”), and
control other options. Once they’re defined, reports can be run on demand or
scheduled to run regularly. A performance report might be set to run at 10 pm
each Sunday night, for example, while availability reports might run daily or
at other intervals.
Operations Manager reports can also be used in other ways.
A report can be interactive, for instance, so someone looking at an
availability report might click on a particular machine in that report, then be
presented with the console’s current State view for this machine. It’s also
possible to see other views, execute tasks, and run other reports directly from
within the current report.
Although reports must be run from the console—the Web
console can’t be used—their output can be accessed directly from the Web. A
report can be sent to a document library in Windows SharePoint Services, for example,
making it easier for IT managers and others who don’t commonly have direct
console access to view them. In fact, some of Operations Manager’s built-in
reports are specifically targeted at IT managers rather than more technically
oriented operations people.
Reporting is a core part of most management technologies,
and Operations Manager is no exception. By providing its own data warehouse,
the product allows an organization to maintain large amounts of historical data
about the computing environment. By providing a range of built-in reports, it
lets operations staff and others access this information in useful ways.
Controlling Access: Role-Based Security
Operations Manager potentially allows access to anything
in the managed environment. Yet letting every user of this tool have full
access to everything isn’t what most organizations want. There must be some way
to control who can access information, run tasks, and do other management work.
The approach Operations Manager uses to do this is called role-based
security.
A role is defined as the combination of a profile
and a scope. A profile defines what operations someone can do, while a
scope specifies the objects on which she’s allowed to perform these operations.
The intersection of the two yields a limited set of objects on which an
operator is allowed to perform only a defined set of operations.
For example, a profile might allow someone to view alerts
and run tasks, but not allow him to change any rules in management packs.
Operations Manager provides a group of built-in profiles, including the
following:
- Administrator: A user with this profile can do anything: all operations are allowed on any objects (in fact, scopes don’t apply to administrators). This profile will typically be limited to a small number of people in an organization, since most operations staff won’t need this level of power.
- Author: As the name suggests, a user with this profile can make changes to the environment. This includes things such as creating and modifying rules and monitors in installed management packs. An author can also monitor the environment, viewing events, alerts, and other information.
- Operator: Users with this profile are expected to focus primarily on monitoring the environment. Accordingly, they can view events, alerts, and other information, but they’re not allowed to create new rules, define new management packs, or make other changes.
Whatever profile a role is based on, a person in that role
can only perform operations on the objects specified by its scope (with the
exception of roles where scopes don’t apply, such as administrator). For
example, someone whose job was solely focused on keeping the email system
running might be given a role with an operator profile and a scope containing
only Exchange Server objects, views, and tasks. A co-worker whose
responsibilities were focused on managing an SAP application might be given a
role with an author profile and a scope containing only SAP-related objects,
views, and tasks. Standard roles are also defined, such as a Report Operator
role that controls the ability to run reports of various kinds. Used
intelligently, roles can give an organization fine-grained control over what
their operations staff is allowed to do.
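The rule that a role is the intersection of a profile and a scope can be captured in a few lines. The sketch below is a minimal model of that idea, with invented names; it is not how the product stores or evaluates roles:

```python
# Minimal model of role-based security as the text defines it: a role
# combines a profile (allowed operations) with a scope (allowed objects),
# and an action succeeds only if both permit it. Names are illustrative.

def make_role(profile_ops, scope_objects):
    return {"ops": set(profile_ops), "scope": set(scope_objects)}

def is_allowed(role, operation, obj):
    """Check both the profile and the scope before allowing an action."""
    return operation in role["ops"] and obj in role["scope"]

# The email-focused co-worker from the example above: an operator
# profile scoped to Exchange Server objects only.
exchange_operator = make_role(
    profile_ops={"view_alerts", "run_tasks"},
    scope_objects={"ExchangeServer1", "ExchangeServer2"},
)

print(is_allowed(exchange_operator, "view_alerts", "ExchangeServer1"))  # True
print(is_allowed(exchange_operator, "edit_rules", "ExchangeServer1"))   # False
print(is_allowed(exchange_operator, "view_alerts", "SQLServer1"))       # False
```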
Operations Manager provides a general foundation for
monitoring and managing systems and software. This foundation knows nothing
about how to do these things for specific components, however. Providing this
specialized knowledge is the responsibility of management packs.
As mentioned earlier, each MP is described using the
XML-based SDM. The Operations Manager console provides an authoring view that
can be used to create MPs, and Microsoft has announced plans to provide a
standalone MP authoring tool. And since they’re just XML, a determined author
could theoretically create an MP using Notepad, although this isn’t likely to
be a very productive approach. The important thing to understand is that an MP
primarily contains configuration information, not executable software.
Each MP is installed into the operational database, with
different parts of its contents then downloaded and used by different parts of
Operations Manager. It’s worth noting that the MP format used by Operations
Manager isn’t the same as that used by its predecessor, MOM 2005. Because of
this, MPs created for this earlier technology must be converted using a
Microsoft-supplied tool, then re-installed into the operational database.
MPs are the brains of Operations Manager, determining
exactly how monitoring and management happen for each managed component.
Understanding them requires knowing something about what they contain and how
they function. The rest of this section digs a bit deeper into the structure
and contents of this important aspect of Operations Manager.
What Management Packs Do: Some Examples
To get a sense of what management packs do and of how
diverse their functions can be, it’s useful to look briefly at a few examples.
In all of these cases, the MPs provide information about the availability of
this component—is it running?—and basics such as how much space is left on the
disk it relies on. Each one also provides performance information about the
component it targets. Beyond these basics, however, different MPs provide quite
different things.
For example, MPs that target Windows server operating
systems let an operator determine things such as which Windows services are
running, which applications (if any) are crashing repeatedly, and whether IP
address conflicts are occurring. Operations Manager also provides MPs for
Windows client systems that let an operator learn whether this machine can
access the Internet, read from and write to file servers, and perform other
functions.
Just as Operations Manager supports both server and client
operating systems, it also supports MPs for applications running in both
places. Once again, the basics are the same, with each MP able to report
whether an application is running, monitor its performance, and provide other
standard information. Each one also provides application-specific
information. For example, the MP for Exchange Server lets an operator see detailed
information about mailboxes and messages, while the SQL Server MP provides data
about the number of deadlocks that occur, the execution of stored procedures,
and more.
Microsoft also provides an MP for Microsoft Office that
lets an operator see whether Office applications are crashing or hanging,
measure their resource consumption, and determine how responsive they are. He
can also determine whether they’re working normally, such as checking whether
Outlook can send and receive mail. All of these things are vitally important to
users, and so they’re also important to operations staff.
Describing What’s Managed: Objects
Every management pack defines a model, described in SDM,
of the component that it’s managing. This model is expressed as one or more classes,
each representing something that can be monitored and managed. A class also
defines attributes, values that can describe an object of that class.
When an MP’s information is sent down to an agent, the agent relies on specific
discovery rules in the MP to find the actual instances of the classes
this pack defines. To discover these instances, an agent might look for a
specific registry key, query WMI, or perform some other action. However it's
done, the result is a hierarchy of objects, each of some class and with a
specific set of attributes, representing the things this MP targets.
A note on terminology: When talking to management pack
authors, Microsoft uses the terms “class” and “instance”, concepts that are
familiar to developers. In the Operations Manager console, however, the terms
“target” and “object” are used instead. This paper uses “class” rather than
“target”, but the terms “instance” and “object” are used interchangeably
throughout.
When an agent is first deployed, it knows that it’s
running on a Windows computer. Accordingly, it creates an instance of a Windows
computer object, then asks its management server for all rules (including
discovery rules), monitors, and other relevant aspects of the Windows MP. Once
they’re downloaded, the discovery rules can find other objects on this machine,
such as SQL Server or a DNS server. The agent then requests that the rules,
monitors, discoveries, and so on for these classes also be downloaded from the
various MPs in which they’re contained. This process of progressive discovery
continues until all managed objects on the system have been found.
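The progressive discovery loop just described amounts to expanding a frontier of classes until nothing new turns up. Here is a conceptual sketch of that loop in Python; the discovery table is a stand-in for the registry and WMI probes a real MP's discovery rules would run:

```python
# Sketch of progressive discovery: start from the Windows computer
# object, run each newly downloaded class's discovery rules, and repeat
# until no new objects appear. The DISCOVERIES table is hypothetical.

DISCOVERIES = {
    "WindowsComputer": ["SQLServer", "DNSServer"],
    "SQLServer": ["SQLDatabase"],
}

def discover_all(root_class):
    found, frontier = {root_class}, [root_class]
    while frontier:
        cls = frontier.pop()
        for child in DISCOVERIES.get(cls, []):
            if child not in found:        # fetch that class's MP content once
                found.add(child)
                frontier.append(child)
    return found

print(sorted(discover_all("WindowsComputer")))
# ['DNSServer', 'SQLDatabase', 'SQLServer', 'WindowsComputer']
```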
Using this approach, an agent constructs a hierarchy of
managed objects from multiple MPs. The figure below shows a simple picture of
how this might look.
The MP for the Windows Server operating system defines a
class for the computer it runs on, while the SQL Server MP defines classes representing
a SQL Server database and an instance of SQL Server itself. In this simplified
example, the computer object appears directly above two instances of SQL
Server. One of these instances has two SQL Server database objects below it,
while the other has only a single database object. All of this, along with
populating the attributes associated with each object, is created automatically
by the agent. And to catch any changes that occur, the discovery process is
regularly re-run, including each time the machine reboots or the agent on that
machine is restarted.
One more aspect of the figure above requires explanation:
the green checks in each object. This symbol represents the object’s state, as
illustrated in the console’s State view earlier. Each object’s state provides a
quick summary of its condition. One object’s state can affect another, however,
allowing a more intelligent perspective on what’s really happening. All of this
depends on monitors, and how it works is described next.
Tracking Object State: Monitors
A primary goal of management is keeping software and the
hardware it depends on running well. One way to do this is to wait until
something fails, then fix it. This approach can work, but it’s usually not the
best solution. Just as with our personal health, avoiding problems before they
happen is much better. Rather than just react to potentially catastrophic
failures, we can keep track of the health of the objects we’re managing to
prevent serious problems whenever possible. In other words, we can create and
monitor a health model for a component.
Operations Manager relies on monitors to do this. Each
monitor reflects the state of some aspect of an object, changing as that state
changes. For example, a monitor tracking disk utilization might be in one of
three states: green if the disk is less than 75% full, yellow if it’s between
75% and 90% full, and red if the disk is more than 90% utilized. A monitor
tracking a particular application’s availability might have only two states:
green if the application is running and red if it’s not.
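The disk-utilization monitor described above reduces to a small threshold function. The following sketch uses exactly the thresholds from the text; in the product, these thresholds would live in the management pack rather than in code:

```python
# The three-state disk monitor from the text: green below 75% full,
# yellow between 75% and 90%, red above 90%.

def disk_monitor_state(percent_full):
    if percent_full < 75:
        return "green"
    if percent_full <= 90:
        return "yellow"
    return "red"

print(disk_monitor_state(50), disk_monitor_state(80), disk_monitor_state(95))
# green yellow red
```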
The author of each management pack defines what monitors
it contains, how many states each monitor has, and what aspect of the managed
object this monitor tracks. A monitor can determine what its state should be in
several different ways. It might examine particular performance counters every
90 seconds, for example, or regularly issue a particular WMI query. A monitor
might also watch the event log for events that affect its state. Think, for
example, about a monitor representing whether an application can communicate
with a particular database. The application might write an event to the event
log when this communication fails, causing the monitor to change its state to
red. When communication is restored, the application might write a new event
indicating this, causing the monitor to change its state back to green. This
example illustrates an important fact about monitors (and about application
manageability in general): Applications should be written in certain ways to
make themselves manageable—they should be instrumented—and the creators
of management packs must know how to take advantage of this instrumentation.
Whenever a monitor changes its state, this change is sent
to both the operational database and the data warehouse. This information
allows the operator to see the current state of an object or group of objects.
The console State view shown earlier is just a window into the set of monitors
that represent the state of one or more managed objects.
All of the monitors for a particular managed object are
organized into a hierarchy. Every monitor hierarchy has four standard monitors
that live just below its root: performance, security, configuration, and
availability. Each monitor defined by a management pack appears below one of
these four, with the choice made by the management pack’s author. Any change in
a monitor’s state can cause a change in the state of the monitor above it. This
allows problems to bubble up through the hierarchy of monitors and the objects
whose states they represent.
The figure above, which uses the same simple set of
objects shown earlier, illustrates how monitor states can percolate through a
hierarchy of managed objects. In this example Database 1 has a problem: perhaps
a disk drive has failed. The monitor that watches this aspect of the database
notices this and sets its state to red. This state change causes the standard
availability monitor for this object to also set its state to red, which in turn
sets the monitor for the object’s overall state to red.
The monitor for the database’s overall state also appears
in the monitor hierarchy for its parent object, SQL Server 1. This object has
two databases, however, only one of which has failed. Accordingly, this
object’s availability monitor is set to yellow rather than red, a decision that
was made by the author of the SQL Server management pack. This is once again
reflected in the overall state for this object, the state of which is also set
to yellow.
Just as the overall state of the database appeared in the
SQL Server 1 monitor hierarchy, the overall state of this SQL Server instance
appears in the Computer object. (Recall that Computer is defined in a different
MP from the other objects shown here—MP boundaries don’t limit this kind of
monitor interaction.) The Computer object also sets its availability monitor
and its overall state to yellow, indicating that while the computer is
functioning, there is a problem that needs attention.
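The rollup just walked through can be sketched as follows. This is a simplified model, not the product's actual rollup engine: the key point it illustrates is that the rollup policy (worst-of-children, or "yellow when only some children are red") is a choice made by the management pack author.

```python
# Sketch of monitor state rollup. Each object's state derives from its
# children under a policy chosen by the management pack author.

STATES = {"green": 0, "yellow": 1, "red": 2}

def roll_up(child_states, policy="worst"):
    if policy == "worst":
        return max(child_states, key=STATES.get)
    if policy == "partial":          # red only if *every* child is red
        if all(s == "red" for s in child_states):
            return "red"
        if any(s == "red" for s in child_states):
            return "yellow"
        return max(child_states, key=STATES.get)

# SQL Server 1: one database red, one green -> yellow under "partial",
# matching the decision described for the SQL Server management pack.
sql_server_1 = roll_up(["red", "green"], policy="partial")
sql_server_2 = roll_up(["green"], policy="partial")
computer = roll_up([sql_server_1, sql_server_2], policy="worst")
print(sql_server_1, sql_server_2, computer)   # yellow green yellow
```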
Along with state changes, monitors can also send alerts,
events, and even performance data. Their primary role, however, is to provide
an accurate representation of state. By allowing the state of an object to
depend on the state of objects below it, monitors provide an intelligent way to
model the health of an entire system. Yet doing this requires creating an
effective set of monitors and relationships between those monitors. Put another
way, the people who create each management pack must be able to define an
appropriate health model for the component this pack targets. To do this for
its own products, Microsoft relies on input from the teams that create them,
from its customers, and from its own services group.
Other Elements of Management Packs
Monitors are essential to every management pack. As
mentioned earlier, however, MPs can also contain a number of other things. This
section gives brief descriptions of these, including rules, tasks, knowledge,
views, reports, and synthetic transactions.
Rules
Monitors and the health models they enable are fundamental
to how Operations Manager does its work. There are cases, however, where
monitors aren’t appropriate. Suppose a system needs to collect data regularly
from several performance counters, for instance, then send this information to
the management server and data warehouse. Because they’re designed to model
states, monitors aren’t capable of doing this.
To address this kind of problem, MPs include rules. A
simple way to think about rules is as an if/then statement. For example, an MP
for an application might contain rules such as the following:
- If a message indicating that the application is shutting down appears in the event log, send an alert.
- If a logon attempt fails, send an event indicating this failure.
- If five minutes have elapsed since the last update was sent and the new value is more than 2% different from the previous one, send the value of the machine’s free disk space performance counter.
As these examples show, rules can send alerts, events, or
performance data. Rules can also run scripts, allowing a rule to attempt to
restart a failed application. Even the discovery process described earlier
depends on specialized sets of discovery rules.
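The third rule in the list above has the most interesting condition, so it makes a good sketch. The code below models that single if/then check in Python; in the product, the condition would be declared in the management pack, not written as code:

```python
# Sketch of the free-disk-space rule described above: send a sample only
# when at least five minutes have passed since the last one sent and the
# value has moved more than 2% from the previously sent value.

def should_send(last_sent_value, last_sent_minute, new_value, now_minute):
    elapsed = now_minute - last_sent_minute
    change_pct = abs(new_value - last_sent_value) / last_sent_value * 100
    return elapsed >= 5 and change_pct > 2

print(should_send(100.0, 0, 97.0, 6))    # True: 6 min elapsed, 3% change
print(should_send(100.0, 0, 99.0, 6))    # False: only 1% change
print(should_send(100.0, 0, 90.0, 3))    # False: only 3 min elapsed
```

Notice that unlike a monitor, this rule keeps no state of its own beyond the last value sent; it simply decides whether to forward data.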
The distinction between monitors and rules can seem
subtle, but it’s not: monitors maintain state, rules don’t. Unlike monitors,
rules are just expressions of things an agent should do. If something changes
the state of an object, it should be modeled using a monitor. If it doesn’t
change an object’s state, an MP is likely to use a rule instead.
Tasks
A task is a script or other executable code that runs
either on the management server or on the server, client, or other device
that’s being managed. Tasks can potentially perform any kind of activity,
including restarting a failed application, deleting files, and more, subject to
the limitations of the identity they’re running under. Like other aspects of an
MP, each task is associated with a particular managed object. Running chkdsk
only makes sense on a disk drive, for example, while a task that restarts
Exchange Server is only meaningful on a system that’s running Exchange. If
necessary, an operator can also run the same task simultaneously on multiple
managed systems.
Monitors can have two special kinds of tasks associated
with them: diagnostic tasks that try to discover the cause of a problem, and
recovery tasks that try to fix the problem. These tasks can be run
automatically when the monitor enters an error state, providing an automated
way to solve problems. They can also be run manually, since automated recovery
isn’t always the preferred approach.
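The diagnostic/recovery pattern described above can be sketched as a small control flow. Everything here is hypothetical: the task bodies stand in for real scripts a management pack would supply.

```python
# Sketch of diagnostic and recovery tasks: when a monitor enters an
# error state, run a diagnostic task and, if it identifies a cause and
# automated recovery is enabled, run the matching recovery task.

def on_monitor_error(diagnose, recover, auto_recover=True):
    cause = diagnose()
    if cause and auto_recover:
        return recover(cause)
    return "awaiting operator"         # manual recovery was chosen

def diagnose_service():
    return "service stopped"           # pretend the diagnostic found this

def restart_service(cause):
    return f"recovered from: {cause}"

print(on_monitor_error(diagnose_service, restart_service))
# recovered from: service stopped
print(on_monitor_error(diagnose_service, restart_service, auto_recover=False))
# awaiting operator
```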
Knowledge
While tasks can help diagnose and fix problems, they
aren’t much good to an operator unless she knows which ones to use in a
particular situation. And like it or not, the skill level and experience of
operations staff isn’t always what their managers would like it to be. By
providing pre-packaged knowledge, a management pack can help less capable staff
find and fix problems more effectively.
As shown in an earlier screenshot, knowledge appears as
human-readable text in the console, and its goal is to help an operator
diagnose and fix problems. Embedded in this text can be links to tasks,
allowing the author of this knowledge to walk an operator through the recovery
process. For example, the operator might first be instructed to run task A,
then based on the result of this task, run either task B or task C. Knowledge
can also contain embedded links to performance views and to reports, giving the
operator direct access to information needed to solve a problem. And as with
every aspect of a management pack, an MP’s knowledge must be created by people
who deeply understand the component this pack addresses. If this isn’t the
case, the information it contains isn’t likely to be of much use to the
operators who depend on it.
Views
The Operations Manager console provides standard views for
State, Alerts, Performance, and more, as shown earlier. Yet a particular MP
might find it useful to include specialized views of its own. Since each pack
defines its own unique set of objects, its creators might also choose to
provide customized views that show only these objects, or only alerts on these
objects, or some other more specialized perspective. MPs can contain custom
views to address cases like these, and the people who create those MPs
frequently take advantage of this ability: custom views are common.
Reports
Just as a management pack can contain views customized for
the objects that MP targets, it can also contain custom reports. For example, a
management pack might include a customized definition of one of Operations
Manager’s built-in reports, specifying the exact objects that the report should
target. The creator of a management pack can also build custom reports from
scratch with the Report Definition Language (RDL) used with SQL Server
Reporting Services. More complex reports can also have stored procedures, use
indexes, and more, allowing reports that access lots of data to offer better
performance.
Modifying an Installed Management Pack
No matter how good the creators of a particular management
pack might be, there’s no way that they’ll set everything perfectly for all of
the environments that MP will be used in. Maybe a particular rule doesn’t make
sense in an organization, or perhaps an unnecessary alert is being sent. If an
MP allows it, operators with the right security permissions are able to change
some or all of what this MP defines to match their requirements. An MP can also
be sealed, however, which means that it can’t be directly modified. All
Microsoft-provided MPs are sealed, for example, as are some provided by third
parties.
Whether or not an MP is sealed, an operator can create overrides.
Rather than permanently changing the MP, an override makes a change without directly
modifying the underlying monitor, rule, or other element. This lets the
operator revert to the MP's original settings if necessary, an option that's
often useful to have.
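The behavior of an override can be sketched as a layer kept outside the sealed MP: the pack's default stays untouched, the override supplies the effective value, and removing it reverts cleanly. The class and rule names below are illustrative, not product APIs.

```python
# Sketch: overrides as a separate layer over a sealed MP's settings.
# All names are hypothetical; real overrides are stored in an unsealed MP.

class SealedRule:
    """A rule shipped in a sealed MP; its default can't be edited directly."""
    def __init__(self, name, threshold):
        self.name = name
        self._default_threshold = threshold

    @property
    def default_threshold(self):
        return self._default_threshold

class OverrideLayer:
    """Overrides kept outside the sealed MP, so the MP itself never changes."""
    def __init__(self):
        self._overrides = {}

    def set_override(self, rule, value):
        self._overrides[rule.name] = value

    def remove_override(self, rule):
        self._overrides.pop(rule.name, None)  # revert to the MP's default

    def effective_threshold(self, rule):
        return self._overrides.get(rule.name, rule.default_threshold)

rule = SealedRule("Disk % Free Space", threshold=10)
layer = OverrideLayer()
print(layer.effective_threshold(rule))  # the sealed default: 10
layer.set_override(rule, 5)             # tighten the threshold locally
print(layer.effective_threshold(rule))  # now 5, but the MP is unchanged
layer.remove_override(rule)             # revert; nothing to undo in the MP
print(layer.effective_threshold(rule))  # back to 10
```

Keeping the customization in its own layer is what makes reverting safe: deleting the override restores the original behavior without touching the sealed pack.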
Monitoring Services
In the beginning, systems management focused on managing
servers. Today, this focus has expanded to include clients, applications, and
more. Yet in most cases, the real goal is to manage the services that people
actually use: email, line-of-business applications, and others. All of these
services are provided by combinations of hardware and software, and so managing
them as a whole requires some way to group together the relevant components
into a single manageable entity. This is exactly what Operations Manager’s
service monitoring allows.
Using the Distributed Application Designer, a tool
accessible via the authoring section of the Operations Manager console, an
administrator can define the various components that make up a service. This
designer provides standard templates for defining common application types,
such as messaging and line-of-business applications. These templates can be
customized as needed to reflect the details of a particular service. Once the
definition is complete, the tool generates a management pack for this service,
complete with a monitor-based health model. This MP can then be installed on
the relevant agents just like any other MP.
The screen shot below shows an example of how a
distributed application defined in this way might look. Using the console’s Diagram
view (which can also be applied to other aspects of the managed environment),
it shows the various components that make up this particular service: a web
application, the database it depends on, and more. As with any other health
model, this one shows a hierarchy of objects, each with a state. In this
example, one of the databases is in a red state, a problem that bubbles up to
the overall state for this service.
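The bubble-up behavior just described can be sketched as a simple rule: each component has its own state, and a parent's rolled-up state is the worst state found anywhere beneath it. The component names below are illustrative, not taken from the product.

```python
# Sketch: health-model rollup, where a child's failure bubbles up to the
# service's overall state. Names are hypothetical examples.

GREEN, YELLOW, RED = 0, 1, 2  # healthy, degraded, failed

class Component:
    def __init__(self, name, state=GREEN, children=None):
        self.name = name
        self.state = state
        self.children = children or []

    def rolled_up_state(self):
        """Worst state of this component and everything under it."""
        return max([self.state] +
                   [c.rolled_up_state() for c in self.children])

service = Component("Order Entry Service", children=[
    Component("Web Application", children=[Component("IIS Web Site")]),
    Component("Database", state=RED),   # the failed database
])
print(service.rolled_up_state())  # RED: the failure bubbles up to the service
```

Real health models can use subtler rollup policies (worst-of, best-of, percentage-based), but worst-of, shown here, matches the scenario in the example above.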
Monitoring and managing individual components of a service
is clearly useful. Yet adding the ability to manage all of these components as
a unified group with a single health model can provide significantly more
business value. As systems management continues to move toward ITIL-based IT
service management, tools that directly support this service-oriented view
become more important.
Conclusion
The computing environment of most organizations gets more
complex every day. The tools IT operations staffs use to monitor and manage
this environment must keep pace, adding the capabilities people need to do
their jobs. This evolution has been reflected in Microsoft’s offerings. From
its beginning as Microsoft Operations Manager 2000, a product focused on
managing servers, System Center Operations Manager 2007 now supports client
management, the ability to view distributed applications as unified services,
state-based management with health models, and more.
It’s a safe bet that information technology will continue
to change. Even when those changes are improvements, as they usually are, they
still increase the management complexity of the environment. Given this, no one
should expect System Center Operations Manager 2007 to be the last word in
systems management. Yet for organizations with a significant investment in Microsoft
software, this tool can play a useful role in monitoring and managing their
world.