Channel: Jimmy Harper's Operations Manager Blog

ACS report: All events for specified user (even more than the built-in version)

A customer recently showed me that the built-in ACS report "Forensic_-_All_Events_For_Specified_User" seemed to be missing some events. This report queries the AdtServer.dvAll5 view in the OperationsManagerAC database and looks for values in the PrimaryDomain\PrimaryUser fields that match the user name entered for the report.

The problem my customer was seeing was that "Group Change" events (Event IDs 632, 633, 636, 637, 650, 651, 655, 656, 660, 661, 665, 666) are stored a little differently. The Primary Domain/User fields contain the name of the user that was added to or removed from the group, and the Client Domain/User fields contain the name of the user that made the change. So, when you enter the domain\username and run the report, the group change events returned are those where that user was added to (or removed from) a group, not those where that user added other users to a group. So which one do you want? Maybe both, since both are technically related to that user. Every event is defined in the EventSchema.xml file, so are there any others that put the username in something other than the Primary User field? (The other username fields are ClientUser, TargetUser, and HeaderUser.)

It would make sense to use the Header Domain/User fields, since that is where we usually store the name of the user that 'caused' the event. However, another issue I've seen is that (for reasons I do not know) security events are sometimes logged without the domain name, sometimes with the NetBIOS domain name, and sometimes with the fully-qualified domain name. In these cases, some events would not be shown, since you have to enter "Domain\Username" as a parameter.

So, I put together a custom ACS report that does the following:

  1. Separates the Domain and Username fields in the report parameters, so they can be entered separately
  2. Queries the HeaderDomain, PrimaryDomain, ClientDomain, and TargetDomain fields to get a list of domain names
  3. Includes an <ALL> option in the Domain parameter, which allows you to include events where the domain name is empty
  4. Queries the Primary User/Domain, Client User/Domain, Target User/Domain and Header User/Domain fields for the Domain/Username

image

So, the end result is a report that will show ALL ACS events that include the specified user name.  This will include events where that user "did something" and events where "something was done" to that user account.
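
The matching logic described above can be sketched in a few lines. This is only an illustration, not the query from the actual report: it uses an in-memory SQLite table as a hypothetical stand-in for the ACS view, the table name `events` and the sample rows are made up, and only the eight user/domain columns named in this post are modelled.

```python
import sqlite3

# The four user/domain column pairs named in the post.
FIELD_PAIRS = [
    ("HeaderUser", "HeaderDomain"),
    ("PrimaryUser", "PrimaryDomain"),
    ("ClientUser", "ClientDomain"),
    ("TargetUser", "TargetDomain"),
]

def build_query(domain):
    """Build a query whose WHERE clause checks every user/domain pair."""
    if domain == "<ALL>":
        # <ALL>: match on user name only, so events logged with an empty,
        # NetBIOS, or fully-qualified domain name are all included.
        clauses = [f"{user} = :user" for user, _ in FIELD_PAIRS]
    else:
        clauses = [f"({user} = :user AND {dom} = :domain)"
                   for user, dom in FIELD_PAIRS]
    return ("SELECT EventId FROM events WHERE "
            + " OR ".join(clauses) + " ORDER BY EventId")

def events_for_user(conn, user, domain="<ALL>"):
    """Return the IDs of all events that mention the user in any field pair."""
    params = {"user": user}
    if domain != "<ALL>":
        params["domain"] = domain
    return [row[0] for row in conn.execute(build_query(domain), params)]

def make_demo_db():
    """Hypothetical in-memory table standing in for the ACS event view."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{u} TEXT, {d} TEXT" for u, d in FIELD_PAIRS)
    conn.execute(f"CREATE TABLE events (EventId INTEGER, {columns})")
    rows = [
        # EventId, Header user/domain, Primary, Client, Target
        (632, None, None, "alice", "CONTOSO", "bob", "CONTOSO", None, None),
        (644, None, None, None, None, None, None, "alice", ""),
        (528, "bob", "contoso.com", None, None, None, None, None, None),
    ]
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn
```

With the sample rows, searching for "bob" with `<ALL>` returns both the event he caused (528) and the group change he made (632), while restricting the domain to "CONTOSO" drops the event that was logged with the fully-qualified domain name.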

 

NOTE:
The queried domain list may take some time to populate, which makes the report take a while to open.  The code can easily be changed to include a static domain list, or to only contain "<ALL>".


Some Custom ACS Reports

Here are some ACS reports that I’ve written for various customers recently.  If you have ACS installed in the same Reporting Services instance as OpsMgr Reporting, then you can just import the attached Management Pack (CustomACSReports.xml).  Otherwise, you’ll need to import each .rdl file separately.

Here is a description of each report, along with some screenshots.

Event Search
This report allows the user to search for specific security events (selected from a pre-defined list). The user can choose a specific server or search events from all servers. The user can also specify search strings for the UserName or Description in the event. The report returns the top 100 events from the specified date range.

Authentication Failure Summary
This report queries the ACS database for Authentication Failure errors logged during a user-specified time range (default is 1 week). The Event IDs queried are 675 (Windows Server 2003) and 4771 (Windows Server 2008). The events are grouped by error code, and the error message and count for each error code are listed in a table. When the user clicks on one of the errors, the Authentication Failure Detail report is run for that error message.

Authentication Failure Detail
This report queries the ACS database for Authentication Failure errors with a specific error code logged during a user-specified time range (default is 1 week). The Event IDs queried are 675 (Windows Server 2003) and 4771 (Windows Server 2008). The events are grouped by IP Address and User Name, and the count for each is displayed in a table.

AD Object Changes
This report will show details of events related to changes in Active Directory. The report will query the ACS database for Event ID 566 / 5136 and show the Event Time, UserName, Domain Controller, Object Type, Object Name, accessed Properties, and the New Value of the property (Win2k8 only). The report also includes options to search for a specific string in the Object Name and/or Property Name.

Exchange AD Object Activity
This report shows events related to changes to Exchange Objects in Active Directory. The report will query the ACS database for Event ID 566 and 5136 within the specified time range, where the object name contains the string "CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=". The report groups the events by UserName, and shows the Event Time, Domain Controller, Object Type, Object Name, and accessed Properties. The report also includes an option to exclude changes made by computer accounts.

Account Lockout and Authentication Failure by User
This report accepts a date range, username, and domain and will list all occurrences of the following events for the specified user within the specified date range: Event 644 / 4740 (Account Lockout), Event 529 / 4625 (Unknown Username or Bad Password), Event 675 / 4771 (Kerberos Pre-Authentication Failure), and Event 680 / 4776 (NTLM Authentication Failure).

Account Lockout by User
This report accepts a date range, username, and domain and will list the time and computer name for all account lockout events (Event ID 644 / 4740) for the specified user within the specified date range.

Account Lockout Trends
This report accepts a date range and Domain name and will query for all Account Lockout events (Event ID 644 / 4740) within the specified date range and domain. The report contains charts which show average number of account lockouts for each hour of the day and each day of the week, and a trending chart which will show the number of account lockouts over the specified time range. The report also lists all of the lockouts in a table, grouped by Domain, User, Workstation, and Time.

Top 10 Accounts Failing Authentication
This report will query the ACS database for Authentication Failure events (Event ID 680 and 4776) within the specified time range. The report contains a table which will show the 10 user accounts with the most failures, grouped by Workstation and Error Code.

User Account Management Activity
This report will show the number of various account management events within a specified time range, grouped by domain. The events displayed are Accounts Changed (642, 4738), Accounts Created (624, 4720), Accounts Enabled (626, 4722), Accounts Disabled (629, 4725), Accounts Deleted (630, 4726), Names Changed (685, 4781), Password Resets (628, 4724), and Accounts Unlocked (671, 4767). Clicking on any of the numbers on the report will launch the "Automated Account Change Trends" report for more details.

ACS Events for Specified User
This report accepts a Username, Domain, and date range and will display all events where the specified User/Domain is in the TargetUser/TargetDomain, PrimaryUser/PrimaryDomain, ClientUser/ClientDomain, or HeaderUser/HeaderDomain fields. The domain list is pre-populated.

Event_Report_Basic
This report displays the Computer Name and Date/Time for a specific Event ID within a specified date range.

[Screenshots of the reports described above]

ACS: EventSchema.xml changes for Server 2008 Account Lockout Events

Just realized that I haven’t blogged on this yet. By default, the “Calling Machine” property of Account Lockout events from Windows Server 2008 servers is not entered in the ACS database. This affects some of the Account Lockout reports that I have previously posted. Below are the details and the fix:

 

For Windows 2000/2003 Account Lockout events (Event ID 644), we store the Target Account Name in the String01 column and the Caller Machine Name in the String02 column (Target Account Name is also stored in the TargetUser column).

For Windows Server 2008 Account Lockout events (Event ID 4740), we do not store anything in String01 or String02.  This doesn't really affect the Target Account Name property, since it is already stored as TargetUser, but we are no longer collecting the Calling Machine Name in the database.

To maintain parity with Server 2000/2003 Account Lockout events, we need to make the following changes to the EventSchema.xml (on the ACS Collector Server) to store the Target Account Name and Calling Machine Name in String01/String02:

 

NOTE:

  • The EventSchema.xml file is located in the C:\Windows\System32\Security\AdtServer folder on the ACS Collector server
  • Be sure to back up the existing EventSchema.xml file before making any changes
  • After making the change, restart the ACS Collector service on the Collector Server
  • This change will NOT affect any existing events in the database; it will only affect events that are collected AFTER making the change

 

Before:
        <Event SourceId="4740" SourceName="SE_AUDITID_ETW_ACCOUNT_AUTO_LOCKED">
          <Call Name="AppendString" Param1="1" Param2="0" />
          <Call Name="AppendString" Param1="2" Param2="0" />
          <Call Name="AppendString" Param1="3" Param2="0" />
          <Call Name="AppendString" Param1="4" Param2="0" />
          <Call Name="AppendString" Param1="5" Param2="0" />
          <Call Name="AppendString" Param1="6" Param2="0" />
          <Call Name="AppendString" Param1="7" Param2="0" />
          <Param TypeName="typeTargetUser" />
          <Param TypeName="typeTargetDomain" />
          <Param TypeName="typeTargetSid" />
          <Param TypeName="typePrimarySid" />
          <Param TypeName="typePrimaryUser" />
          <Param TypeName="typePrimaryDomain" />
          <Param TypeName="typePrimaryLogonId" />
        </Event>


After:
        <Event SourceId="4740" SourceName="SE_AUDITID_ETW_ACCOUNT_AUTO_LOCKED">
          <Call Name="AppendString" Param1="1" Param2="0" />
          <Call Name="AppendString" Param1="2" Param2="0" />
          <Call Name="AppendString" Param1="3" Param2="0" />
          <Call Name="AppendString" Param1="4" Param2="0" />
          <Call Name="AppendString" Param1="5" Param2="0" />
          <Call Name="AppendString" Param1="6" Param2="0" />
          <Call Name="AppendString" Param1="7" Param2="0" />
          <Call Name="AppendString" Param1="1" Param2="0" />
          <Call Name="AppendString" Param1="2" Param2="0" />

          <Param TypeName="typeTargetUser" />
          <Param TypeName="typeTargetDomain" />
          <Param TypeName="typeTargetSid" />
          <Param TypeName="typePrimarySid" />
          <Param TypeName="typePrimaryUser" />
          <Param TypeName="typePrimaryDomain" />
          <Param TypeName="typePrimaryLogonId" />
          <Param TypeName="typeString" />
          <Param TypeName="typeString" />

        </Event>
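
As a rough illustration of the edit, here is a small sketch that applies the same change to an in-memory copy of the "Before" fragment. It is not a tool for patching the real EventSchema.xml (make the change by hand, after backing up the file); the function name `add_string_columns` is made up for this example, and it simply mirrors the two added Call lines and two added Param lines shown above.

```python
import xml.etree.ElementTree as ET

# The "Before" definition from the post, as an in-memory fragment.
BEFORE = """
<Event SourceId="4740" SourceName="SE_AUDITID_ETW_ACCOUNT_AUTO_LOCKED">
  <Call Name="AppendString" Param1="1" Param2="0" />
  <Call Name="AppendString" Param1="2" Param2="0" />
  <Call Name="AppendString" Param1="3" Param2="0" />
  <Call Name="AppendString" Param1="4" Param2="0" />
  <Call Name="AppendString" Param1="5" Param2="0" />
  <Call Name="AppendString" Param1="6" Param2="0" />
  <Call Name="AppendString" Param1="7" Param2="0" />
  <Param TypeName="typeTargetUser" />
  <Param TypeName="typeTargetDomain" />
  <Param TypeName="typeTargetSid" />
  <Param TypeName="typePrimarySid" />
  <Param TypeName="typePrimaryUser" />
  <Param TypeName="typePrimaryDomain" />
  <Param TypeName="typePrimaryLogonId" />
</Event>
"""

def add_string_columns(event, param_indexes=("1", "2")):
    """Add the extra Call/Param entries shown in the "After" definition."""
    children = list(event)
    # Position of the last existing <Call>, so new calls go right after it.
    last_call = max(i for i, child in enumerate(children) if child.tag == "Call")
    for offset, index in enumerate(param_indexes, start=1):
        call = ET.Element(
            "Call", {"Name": "AppendString", "Param1": index, "Param2": "0"})
        event.insert(last_call + offset, call)
    # One typeString <Param> per added call, appended after the existing params.
    for _ in param_indexes:
        ET.SubElement(event, "Param", {"TypeName": "typeString"})
    return event

if __name__ == "__main__":
    updated = add_string_columns(ET.fromstring(BEFORE.strip()))
    for child in updated:
        print(child.tag, child.attrib)
```

Running it prints nine Call entries followed by nine Param entries, in the same order as the "After" definition.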

ACS Database Dashboard Report

Here’s a little report that I put together to get a look at what is going on in the ACS Database.  The report will show:

  • Number of Events in the DB
  • Number of days represented in the DB
  • Max Events Per Day
  • Average Events Per Day
  • Partition Count
  • DB Size
  • DB Free Space
  • Top 10 Events
  • Event Count By Day
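
To make the event-count figures concrete, here is a minimal sketch of how they relate to each other. It is not the report's query (the report runs against the ACS database); the function name `dashboard_summary` and the input shape, a list of (event ID, date) pairs, are assumptions for illustration, and the partition and database size figures are left out because they come from database metadata.

```python
from collections import Counter
from datetime import date

def dashboard_summary(events, top_n=10):
    """Summarise (event_id, day) pairs the way the dashboard lists them."""
    events = list(events)
    per_day = Counter(day for _, day in events)
    per_event = Counter(event_id for event_id, _ in events)
    days = len(per_day)
    return {
        "event_count": len(events),
        "days": days,
        "max_per_day": max(per_day.values(), default=0),
        "avg_per_day": len(events) / days if days else 0.0,
        "top_events": per_event.most_common(top_n),
        "count_by_day": sorted(per_day.items()),
    }

if __name__ == "__main__":
    sample = [
        (4624, date(2013, 1, 1)),
        (4624, date(2013, 1, 1)),
        (4740, date(2013, 1, 1)),
        (4624, date(2013, 1, 2)),
    ]
    for name, value in dashboard_summary(sample).items():
        print(name, value)
```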

 

Here’s a screenshot of the report. There are no parameters, so just run it (it may take a while to run on a large ACS database):

image

 

I’ve also included another report that will show the general ACS configuration settings, details of each partition, and a list of forwarders. It is not super useful, but it could help when troubleshooting or verifying database grooming.

image

ACS “Event Count” reports

Here are a couple of ACS reports to show the event count by User, Forwarder, and Event ID.  Just import the .rdl files into a folder in SQL Server Reporting Services.

 

The first report “Event Count Overview” shows the top 10 Events, Forwarders, and User Names for the specified date range:

image

 

Click on any of the bars in the chart or the chart titles to launch the “Event Count Details” report.  This report shows more detail about the item that you clicked on.  For example, if you click on the bar for a specific Event ID in the “Top 10 Events” section, it will show the top 10 User Names and Forwarders for this event:

image

 

If you click on the bar for a specific Forwarder in the “Top 10 Forwarders” section, it will show the top 10 User Names and Event IDs for this forwarder:

 

image

 

If you click on one of the chart titles it will show details for all events in the chart.  For example, clicking on “Top 10 Events” shows the top 10 User Names and Forwarders for each of the top 10 events:

 

 image

 

You can also run the “Event Count Details” report on its own to show the Top N User Names, Events, or Forwarders based on a User, Event, or Computer Name filter:

image


Several Management Packs Updated, fix for MaxConcurrentAPI Monitor included

We recently released updated versions of several Management Packs, including the Windows Server OS MP. Below are links to the updated MPs and the fixes that I am aware of:

 

Windows Server Operating System – Version 6.0.7230.0

  • Bug fixed: Microsoft.Windows.Server.LogicalDiskDiscovery.Module.Type.vbs script does not discover logical disks with large disk size
  • Update to support two configurable threshold values (waiters and timeouts) for triggering alert ‘MAX concurrent API Reached’

One change that may require some attention is the update to the “Max Concurrent API Monitor”.

In the previous version of the MP (6.0.7061.0), the Max Concurrent API monitor ran a script that looked at three Netlogon semaphore counters (Waiters, Holders, and Timeouts). If any of them was greater than 0 and less than 4 GB, an alert was generated. Some customers were seeing false alerts from the monitor because some of the counters (especially Semaphore Holders) were greater than zero during non-problematic conditions.

In the new version of the Management Pack, the Max Concurrent API Monitor has the following changes:

  • The Semaphore Holders counter has been removed from the criteria that generate the alert
  • The Alert will only be generated if Semaphore Waiters or Timeouts exceed a defined threshold (instead of 0).  These thresholds can be configured as needed via Overrides.  The default thresholds are:
      • Semaphore Waiters = 50
      • Semaphore Timeouts = 2000
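
The difference between the two versions can be sketched as a pair of small functions. This is only the decision logic as described in this post, not the script that ships in the Management Pack; the function names are made up, "4 GB" is assumed to mean a counter value of 4 * 1024^3, and "exceed" is assumed to mean strictly greater than the threshold.

```python
# Assumption: "less than 4 GB" is read as a raw counter value below 4 * 1024**3.
FOUR_GB = 4 * 1024 ** 3

def old_monitor_alerts(waiters, holders, timeouts):
    """Previous MP version: any counter above 0 (and below 4 GB) alerts."""
    return any(0 < value < FOUR_GB for value in (waiters, holders, timeouts))

def new_monitor_alerts(waiters, timeouts,
                       waiters_threshold=50, timeouts_threshold=2000):
    """New MP version: Holders is ignored; Waiters or Timeouts must exceed
    a threshold (defaults from the post, configurable via Overrides)."""
    return waiters > waiters_threshold or timeouts > timeouts_threshold

if __name__ == "__main__":
    # A few Semaphore Holders during normal operation: the old logic alerts,
    # the new logic does not.
    print(old_monitor_alerts(waiters=0, holders=3, timeouts=0))
    print(new_monitor_alerts(waiters=0, timeouts=0))
```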

More information on the MaxConcurrentAPI setting can be found here.

 

Active Directory – Version 6.0.8293.0

  • Issue fixed: AD-Trust Monitor does not come back to healthy state
  • Issue fixed: AD_Database_and_Log.vbs does not support using ‘.’ as decimal sign for non-English account.

 

Cluster – Version 6.0.7230.0

  • Issue fixed: Cluster 2008 MP does not collect certain performance metrics.

 

DNS – Version 7.1.10259.0

  • Issue fixed: DNSMetrics2012R2Probe script can cause high CPU in MonitoringHost.exe

 

DHCP – Version 6.0.7230.0

  • Various issues fixed.

 

MSMQ 5.0 – Version 6.0.6709.88
MSMQ 6.0 – Version 7.0.8569.0
MSMQ 6.3 – Version 7.1.10109.0

  • Update to support monitoring workgroup machines

CPU/Memory Monitors – Include Top Processes in Alert Description

Here is a Management Pack that I wrote for a customer a while back.  The requirement was to take the alerting for CPU Utilization and Available Memory from the Windows Server Management Pack and add the top 5 processes consuming CPU/Memory to the Alert Description.

The MP that I wrote for this is attached.

  • The MP contains replicas of the “Available Megabytes of Memory” and “Total CPU Utilization Percentage” Monitors from the Windows Server MPs
  • The only change is a modification to the VBScript to get the Top 5 processes and include them in the Property Bag and Alert.
  • The memory monitor will show the top 5 instances of Process\Private Bytes
  • The CPU monitor will show the top 5 instances of Process\% Processor Time
  • The MP has separate Monitors for Server 2003, 2008, 2008 R2 and 2012

 

Screenshots of the alerts are below:

 

image

 

image

SAMPLE.Windows.Server.CPU_Memory.Monitoring.zip

Alerting on Deadlocks with the SQL Server Management Pack

Today a customer asked me how to configure SCOM to generate Alerts for SQL Deadlocks.  Looking in the SQL Server Management Pack, I found that we have event log Rules for deadlocks for SQL 2005, 2008, and 2012:

 

image

 

 

The Rules are targeted at DB Engine and alert on Event ID 1205 in the Application Event Log:

 

image

image

 

 

However, my customer generated a deadlock and no SCOM Alert was generated.  Looking in the Application Event Log on the SQL Server, we saw that the 1205 event was not logged.

 

After doing some digging, I found that SQL Server does not log this event by default. This was confirmed by running Select * from sys.messages where message_id=1205 on the master database; the results showed is_event_logged=0:

 

image

 

 

To change this, we ran Exec sp_altermessage 1205, 'WITH_LOG', 'true' and verified the change (is_event_logged=1):

 

image

 

 

Now I generate a deadlock and get the 1205 event in the Application Event Log:

 

image

 

 

And I get an Alert from the SQL Server Management Pack:

 

image

 

To generate a deadlock, I used the steps documented here.

Pre-reqs for SCOM 2012 Consoles

Whenever I’m installing SCOM 2012, it takes me a little extra time to get the pre-reqs installed for the Operations Console and Web Console, so I’m adding this to my blog for an easy place to find the download links and steps.

 

Operations Console

Before installing the SCOM 2012 Operations Console, you will need to install SQL CLR Types and Report Viewer 2012 Runtime:

 

· SQL CLR Types

X86 Package (SQLSysClrTypes.msi)
X64 Package (SQLSysClrTypes.msi)

· Report Viewer 2012 Runtime – Download here

 

 

Web Console

Before installing the SCOM 2012 Web Console role, you will need to install the IIS Role and a number of services.  Tim McFadden has documented the steps for doing this with PowerShell in Windows Server 2008 R2 and Windows Server 2012.  Links to Tim’s blog and an overview of the steps are below.

 

Windows Server 2008 R2

  1. Run the following PowerShell commands to install the prerequisites:

    Import-Module ServerManager

    Add-WindowsFeature NET-Framework-Core,Web-Static-Content,Web-Default-Doc,Web-Dir-Browsing,Web-Http-Errors,Web-Http-Logging,Web-Request-Monitor,Web-Filtering,Web-Stat-Compression,AS-Web-Support,Web-Metabase,Web-Asp-Net,Web-Windows-Auth -Restart

  2. Run the following command from an elevated command prompt to register ASP.NET 4 with IIS:

    c:\windows\Microsoft.NET\Framework64\v4.0.30319\aspnet_regiis.exe -r

  3. Run the following command to enable IIS to work with .NET 4.0:

    c:\windows\system32\inetsrv\appcmd set config /section:isapiCgiRestriction /[path=`'C:\Windows\Microsoft.NET\Framework64\v4.0.30319\aspnet_isapi.dll`'].allowed:True

  4. Restart the server

 

Windows Server 2012

    Run the following PowerShell commands to install the prerequisites:

    Import-Module ServerManager

    Add-WindowsFeature NET-Framework-Core,AS-HTTP-Activation,Web-Static-Content,Web-Default-Doc,Web-Dir-Browsing,Web-Http-Errors,Web-Http-Logging,Web-Request-Monitor,Web-Filtering,Web-Stat-Compression,AS-Web-Support,Web-Metabase,Web-Asp-Net,Web-Windows-Auth -Restart
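
After the restart, it can be worth confirming that every feature actually installed before starting SCOM setup. A minimal sketch, assuming the Get-WindowsFeature cmdlet from the ServerManager module; the helper name Get-MissingFeatureName is mine:

```powershell
# Sketch only: the function name is an illustrative assumption.

function Get-MissingFeatureName {
    param([object[]]$Features)

    # Get-WindowsFeature returns objects with Name and Installed properties.
    @($Features | Where-Object { -not $_.Installed } | ForEach-Object { $_.Name })
}

# Usage on the server (commented out; requires the ServerManager module):
# Import-Module ServerManager
# Get-MissingFeatureName -Features (Get-WindowsFeature -Name Web-Static-Content, Web-Asp-Net, Web-Windows-Auth)
```

An empty result means all of the features passed in are installed.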

Alerting when performance counters cannot be resolved by SCOM

A customer of mine is seeing some random issues with SCOM not being able to resolve Exchange performance counters and would like to get an alert when this happens so we can troubleshoot right away.

When SCOM cannot resolve a performance counter, it will log one of two events:

  • Event ID 10102 (could not resolve counter)
  • Event ID 10103 (could not resolve counter instance)

 

I created a simple Management Pack with an Alert Rule to alert on these events. The attached MP contains one Rule with the following criteria:

Name: Alert on Missing Performance Counters
Target: Health Service
Criteria:
    Event log: Operations Manager
    Event Source: Health Service Modules
    Event ID: 10102 / 10103

The Rule will generate an alert when event ID 10102 (could not resolve counter) or 10103 (could not resolve counter instance) is logged.  Alert suppression is enabled for the Object, Counter, and Instance parameters, so it will generate a separate alert for each counter that cannot be resolved on a monitored computer.

The Rule is disabled by default…I would generally recommend only enabling it on computers that you are having problems with at first, then on others if needed.
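
Before enabling the Rule on a computer, you can check whether it has logged either event recently. This is a minimal sketch using Get-WinEvent with the same log, source, and event IDs as the Rule; the function names and the 24-hour window are my own assumptions:

```powershell
# Sketch only: the function names and the 24-hour window are illustrative assumptions.

function New-MissingCounterEventFilter {
    param([datetime]$StartTime = (Get-Date).AddHours(-24))

    # Same criteria as the Rule: log, source, and the two event IDs.
    return @{
        LogName      = 'Operations Manager'
        ProviderName = 'Health Service Modules'
        Id           = 10102, 10103
        StartTime    = $StartTime
    }
}

function Get-MissingCounterEvent {
    param([string]$ComputerName = $env:COMPUTERNAME)

    # -ErrorAction SilentlyContinue keeps "no events found" from surfacing as an error.
    Get-WinEvent -ComputerName $ComputerName -FilterHashtable (New-MissingCounterEventFilter) -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, Message
}
```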

Here is a screen shot of the Alert:

image

MissingPerfCounterAlerting.xml


Agent Version and Health report

A customer asked me to put together a SCOM data warehouse report that would list Agents from all Management Groups, and some general Agent version and health information. The report I put together is attached.  It’s a pretty simple report that can be used to get a list of agents that are offline, agents with a specific patch installed, or agents with a specific Agent version.  You can also get one big list of all agents, export to Excel and filter/sort as needed. It also works well for environments with multiple Management Groups sharing one Data Warehouse.

The report has the following fields

  • Agent Name
  • Agent Version
  • Agent Version Number
  • Patch List
  • Management Group
  • HS Availability
  • HSOutage Last Modified
  • Health Service Watcher Previous Day Healthy
  • Health Service Watcher Previous Day Unhealthy
  • Health Service Watcher Previous Day Maintenance
  • Heartbeat Previous Day Healthy
  • Heartbeat Previous Day Unhealthy
  • Heartbeat Previous Day Maintenance
  • OSVersion

 

The HS Availability field indicates whether the agent is currently connecting to SCOM. This is based on data from the vHealthServiceOutage view in the data warehouse, which tracks when Health Service outages begin and end for an Agent. The “HSOutage Last Modified” field will tell you when the agent availability last changed.

The “Previous Day Healthy/Unhealthy/Maintenance” fields show the percentage of the previous day that the Health Service Watcher and Heartbeat monitor for the agent were in a Healthy/Unhealthy/Maintenance state. I do see occasional issues with these numbers being inaccurate, but these fields plus the HSOutage fields should give a pretty good indication of whether or not an agent is online.

 

The following parameters are available for filtering:

Agent Name

Include only agents that contain the text entered

Leave this blank to include all agents

 

Patch List

Include only Agents whose PatchList property contains the text entered

Leave this blank to include all agents

 

Health Service Availability

Select to include only Available agents, Unavailable agents, or both

 

Agent Version

Select to include/exclude SCOM 2007, SCOM 2012, or SCOM 2012 R2 Agents (all are selected by default)

 

Management Group

Select to include/exclude agents from specific Management Groups (all are included by default)

image

  image

Agent Version and Health.rdl

Exchange 2010 Correlation Engine Not Generating Alerts

While helping a customer migrate from SCOM 2007 to SCOM 2012 R2, we noticed a large reduction in Alerts for Exchange 2010.  After investigating further, we noticed that we were not getting any Alerts that come from the Exchange 2010 Correlation Engine.

After verifying with my customer that the Exchange 2010 Monitors were generating state changes, but no alerts were coming in, we reviewed the event log for the Correlation Engine (it logs to the Application Event log on the server it is installed on) and saw event 717 being logged every 5 minutes (more details below).

After doing some research, I found the following information which resolved the problem (provided by Microsoft Escalation Engineer Manoj Parvathaneni).

Quick Summary:
We had to install the Correlation Engine on a server without any SCOM 2012 server components installed, and include some SCOM 2007 dlls in the Correlation Engine installation directory.

Symptoms

When the Exchange Correlation Engine is installed on a System Center 2012 Operations Manager Management Server, the Automatic Alert Resolution feature in the Exchange Correlation Engine ceases to function and the following event is logged in the Application event log on the Management Server:

Log Name: Application
Source: MSExchangeMonitoringCorrelation
Event ID: 717
Task Category: General
Level: Warning
Keywords: Classic
User: N/A
Description:
Connection with the Operations Manager Root Management Server failed.

Error: System.InvalidCastException: [A]System.Collections.Generic.List`1[Microsoft.EnterpriseManagement.Monitoring.MonitoringAlertUpdateStatusIndigo] cannot be cast to [B]System.Collections.Generic.List`1[Microsoft.EnterpriseManagement.Monitoring.MonitoringAlertUpdateStatusIndigo]. Type A originates from 'mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' in the context 'LoadNeither' at location 'C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll'. Type B originates from 'mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' in the context 'LoadNeither' at location 'C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll'.
at Microsoft.EnterpriseManagement.Utilities.GetDeserializeObject[T](Byte[] serializedObject)
at Microsoft.EnterpriseManagement.DataAbstractionLayer.InstanceSpaceOperations.UpdateAlerts(Byte[] alerts, String comments, Nullable`1 modifyingConnectorId)
at Microsoft.EnterpriseManagement.Monitoring.MonitoringAlert.UpdateAlertsInternal[T](IList`1 alerts, String comments, Nullable`1 modifyingConnectorId, ManagementGroup managementGroup)
at Microsoft.EnterpriseManagement.Monitoring.MonitoringAlert.UpdateInternal(String comments, Nullable`1 modifyingConnectorId)
at Microsoft.EnterpriseManagement.Monitoring.MonitoringAlert.Update(String comments)
at Microsoft.Exchange.Monitoring.CorrelationEngine.MomSdkProxy.ResolveEntityAlert(Entity entity, MonitoringAlert monAlert)
at Microsoft.Exchange.Monitoring.CorrelationEngine.MomSdkProxy.ResolveEntityAlerts(Entity entity, String alertCriteriaString)
at Microsoft.Exchange.Monitoring.CorrelationEngine.CorrelationEngine.CorrelateWithinEntity(Node entity)
at Microsoft.Exchange.Monitoring.CorrelationEngine.CorrelationEngine.CorrelateBatchTask(Object batchData)

Number of occurrence: 3

You'll also notice the following error in the Exchange Correlation Engine logs:

2013-11-18T08:16:15.673Z,4,Information,"CorrelateBatchTask: caught exception [System.InvalidCastException: [A]System.Collections.Generic.List`1[Microsoft.EnterpriseManagement.Monitoring.MonitoringAlertUpdateStatusIndigo] cannot be cast to [B]System.Collections.Generic.List`1[Microsoft.EnterpriseManagement.Monitoring.MonitoringAlertUpdateStatusIndigo]. Type A originates from 'mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' in the context 'LoadNeither' at location 'C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll'. Type B originates from 'mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' in the context 'LoadNeither' at location 'C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll'.    at Microsoft.EnterpriseManagement.Utilities.GetDeserializeObject[T](Byte[] serializedObject)    at Microsoft.EnterpriseManagement.DataAbstractionLayer.InstanceSpaceOperations.UpdateAlerts(Byte[] alerts, String comments, Nullable`1 modifyingConnectorId)    at Microsoft.EnterpriseManagement.Monitoring.MonitoringAlert.UpdateAlertsInternal[T](IList`1 alerts, String comments, Nullable`1 modifyingConnectorId, ManagementGroup managementGroup)    at Microsoft.EnterpriseManagement.Monitoring.MonitoringAlert.UpdateInternal(String comments, Nullable`1 modifyingConnectorId)    at Microsoft.EnterpriseManagement.Monitoring.MonitoringAlert.Update(String comments)    at Microsoft.Exchange.Monitoring.CorrelationEngine.MomSdkProxy.ResolveEntityAlert(Entity entity, MonitoringAlert monAlert)    at Microsoft.Exchange.Monitoring.CorrelationEngine.MomSdkProxy.ResolveEntityAlerts(Entity entity, String alertCriteriaString)    at Microsoft.Exchange.Monitoring.CorrelationEngine.CorrelationEngine.CorrelateWithinEntity(Node entity)    at Microsoft.Exchange.Monitoring.CorrelationEngine.CorrelationEngine.CorrelateBatchTask(Object batchData)].

In addition, the Exchange Correlation Engine will either not report any new Alerts to SCOM 2012 or will report them inconsistently.

Resolution

The exception above can be resolved by using one of the following methods:

Method 1: Install Exchange Correlation Engine on a dedicated system without the SCOM 2012 Management Server component

  1. Install Exchange Correlation engine on this machine.
  2. Stop the Microsoft Exchange Monitoring Correlation service.
  3. Copy SCOM 2007 R2 SDK binaries into the CE install directory (Program Files\Microsoft\Exchange Server\v14\Bin).
    • Microsoft.EnterpriseManagement.OperationsManager.Common.dll
    • Microsoft.EnterpriseManagement.OperationsManager.dll
    • MomBidLdr.dll

**UPDATE: Files attached to this blog

Where can you find the above binary files?

  • The SCOM 2007 R2 SDK binaries Microsoft.EnterpriseManagement.OperationsManager.Common.dll and Microsoft.EnterpriseManagement.OperationsManager.dll can be found on a system that has the SCOM 2007 R2 Console installed, in the directory Program Files\System Center Operations Manager 2007\SDK Binaries.
  • MomBidLdr.dll can be found on a system that has the SCOM 2007 R2 Console installed, in the directory Program Files\System Center Operations Manager 2007.

Method 2: Disable the Automatic Alert Resolution Feature in the Correlation Engine

By following this approach the customer will lose the ability to resolve the alerts automatically via the Exchange Correlation Engine.

To disable Automatic Alert Resolution, perform the following steps:

  1. Log on to the server that's hosting the Microsoft Exchange Monitoring Correlation Engine service.
  2. Locate the Correlation Engine configuration file named Microsoft.Exchange.Monitoring.CorrelationEngine.exe.config. By default, the file is located in C:\Program Files\Microsoft\Exchange Server\V14\Bin\, where C:\ is the Exchange installation directory.
  3. Open Microsoft.Exchange.Monitoring.CorrelationEngine.exe.config in a text editor such as Notepad
  4. Locate the following line in the configuration file:
    <add key="AutoResolveAlerts" value="true" />
  5. Change <add key="AutoResolveAlerts" value="true" /> to <add key="AutoResolveAlerts" value="false" />
  6. Restart the Microsoft Exchange Monitoring Correlation Engine service.
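
If you need to make this change on more than one server, steps 2 through 5 can be scripted. This is a minimal sketch that edits the configuration file as XML; the function name is mine, and the service restart in step 6 is left as a manual step:

```powershell
# Sketch only: the function name is an illustrative assumption.

function Set-AutoResolveAlerts {
    param(
        [Parameter(Mandatory)] [string]$ConfigPath,
        [bool]$Enabled = $false
    )

    # Resolve to a full path because .NET does not share PowerShell's current directory.
    $fullPath = (Resolve-Path -LiteralPath $ConfigPath).Path

    $xml = New-Object System.Xml.XmlDocument
    $xml.PreserveWhitespace = $true
    $xml.Load($fullPath)

    # Find the <add key="AutoResolveAlerts" ... /> element wherever it sits in the file.
    $node = $xml.SelectSingleNode("//add[@key='AutoResolveAlerts']")
    if ($null -eq $node) {
        throw "AutoResolveAlerts key not found in $fullPath"
    }

    # Keep a backup before changing the file.
    Copy-Item -LiteralPath $fullPath -Destination "$fullPath.bak" -Force

    $node.SetAttribute('value', $Enabled.ToString().ToLower())
    $xml.Save($fullPath)
}
```

For example, `Set-AutoResolveAlerts -ConfigPath 'C:\Program Files\Microsoft\Exchange Server\V14\Bin\Microsoft.Exchange.Monitoring.CorrelationEngine.exe.config'` sets the value to false and leaves a .bak copy of the original file next to it.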

More Information

Exchange Correlation Engine:
The Correlation Engine (CE) is a stand-alone Windows service that uses the Operations Manager SDK interface to first retrieve the health model (or instance space) and then process state change events. By maintaining the health model in memory, and processing state change events, the Correlation Engine is able to determine when to raise an alert based on the state of the system. The Correlation Engine solely relies on the Operations Manager SDK. It does not use the agent to perform any of the correlation and other aspects of its operations. For further details, you can also reference the Exchange 2010 Management Pack guide.

Automatic Alert Resolution feature in Exchange Correlation Engine:
The Automatic Alert Resolution feature automatically closes related alerts when the Exchange 2010 Management Pack determines that the underlying issue is no longer a problem. This feature is provided by the Correlation Engine, and is enabled by default. Using Automatic Alert Resolution can cause multiple alerts to be logged if the same alert is logged again for another instance of the problem before the associated ticket has been resolved by support teams.

You may also want to disable this feature under the following or other conditions:

  • If you're using ticketing or another support system that wouldn't work correctly if alerts are automatically resolved.
  • If you're using a connector with Operations Manager 2007. A connector is a custom service or program that allows Operations Manager to communicate with external systems. For example, you may want to disable this feature if you're using a connector that allows an external application to track Exchange 2010 Management Pack alerts.

SCOM2007FilesForEXCHCorrelationEngine.zip

Management Pack to Monitor and Reduce Health Service Store Size

A customer of mine monitors some agents that only have 1 GB of storage, so disk space must be conserved as much as possible. The Health Service store file (a JET database named HealthServiceStore.edb) on these Agents is around 100 MB to 200 MB, which is pretty normal for a SCOM Agent.

I researched this a bit and found that unused space in the Health Service store is not cleaned up automatically. To clean it up and reduce the size of the file, you can do an offline defragmentation of the file by doing the following:

  1. Stop the Microsoft Monitoring Agent service (HealthService)
  2. Open an admin command prompt and run the following command (modify the file path as needed)

    esentutl /d "C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Health Service Store\HealthServiceStore.edb"

  3. Restart the Microsoft Monitoring Agent service

Doing this in my test environment will generally reduce the file size from ~110 MB to ~25 MB.
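
The three steps above can be wrapped in a short script that also reports the file size before and after. This is a minimal sketch, not the script used in the Management Pack; the function names are mine, and the default path is the one from step 2:

```powershell
# Sketch only: function names are illustrative assumptions; adjust the path for your agent.

function Get-FileSizeMB {
    param([Parameter(Mandatory)] [string]$Path)

    # File length in megabytes, rounded to one decimal place.
    return [math]::Round((Get-Item -LiteralPath $Path).Length / 1MB, 1)
}

function Invoke-HealthServiceStoreDefrag {
    param(
        [string]$StorePath = 'C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\Health Service Store\HealthServiceStore.edb'
    )

    $before = Get-FileSizeMB -Path $StorePath

    Stop-Service -Name HealthService -Force
    try {
        # Offline defragmentation; the database must not be in use.
        & esentutl.exe /d $StorePath | Out-Host
    }
    finally {
        # Always try to bring the agent back, even if esentutl fails.
        Start-Service -Name HealthService
    }

    $after = Get-FileSizeMB -Path $StorePath
    [pscustomobject]@{ BeforeMB = $before; AfterMB = $after }
}
```

Run `Invoke-HealthServiceStoreDefrag` from an elevated PowerShell session on the agent.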

To help my customer detect and automate this, I’ve created the attached Management Pack to monitor, collect, and reduce the size of the Health Service store file.

 The Management Pack contains:

  • A Monitor to check the size of the Health Service database file
    • Monitor name is “Health Service Store File Size Check”
    • Targeted at the “Agent” class
    • Disabled by default
    • Warning Alert/State Change is generated if the file size is greater than the defined threshold
    • File size threshold is 100mb by default and can be configured via override
  • A Rule to collect the size of the Health Service database file
    • Rule name is “Collection: Health Service Database File Size”
    • Targeted at the “Agent” class
    • Collected performance counter is
      • Object: Health Service Database
      • Counter: File Size
      • Instance: <Path to health service store file>
  • A Task to manually do an offline defrag of the Health Service database file
    • Task name is “Health Service Database Offline Defrag”
    • Targeted at the “Agent” class
  • A Recovery to automatically do an offline defrag of the Health Service database file when the monitor detects the size is over the defined threshold
    • Disabled by default

Details of the workflows

  • The Monitor runs a PowerShell script that gets the Health Service State directory from the registry, then checks the size of the HealthServiceStore.edb file and compares against the defined threshold. The script returns a Property Bag to SCOM with the file path, size, threshold, and status (above/below threshold).
  • The Collection Rule uses the same script as the Monitor, and just uses the file size property and maps it to performance counter data.
  • The offline defrag Task and Recovery use a VBScript based on Matt Taylor’s task to restart SCOM Health Service script…all I did is modify it to run the offline defrag before restarting the service.
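
To illustrate the size check, here is a minimal sketch of the comparison the Monitor script performs. It is not the script from the Management Pack: the function names, property names, and status strings are my own, and the file path is passed in as a parameter rather than read from the registry. The MOM.ScriptAPI COM object used to build the Property Bag only exists on a machine with the SCOM agent installed.

```powershell
# Sketch only: function names, property names, and status strings are illustrative assumptions.

function Test-HealthServiceStoreSize {
    param(
        [Parameter(Mandatory)] [string]$StorePath,
        [double]$ThresholdMB = 100
    )

    $sizeMB = [math]::Round((Get-Item -LiteralPath $StorePath).Length / 1MB, 2)
    $status = if ($sizeMB -gt $ThresholdMB) { 'OverThreshold' } else { 'UnderThreshold' }

    # The same four values the post says are returned in the Property Bag.
    return [ordered]@{
        FilePath    = $StorePath
        FileSizeMB  = $sizeMB
        ThresholdMB = $ThresholdMB
        Status      = $status
    }
}

function Send-PropertyBag {
    param([Parameter(Mandatory)] [System.Collections.IDictionary]$Values)

    # MOM.ScriptAPI is only available where the SCOM agent is installed.
    $api = New-Object -ComObject 'MOM.ScriptAPI'
    $bag = $api.CreatePropertyBag()
    foreach ($key in $Values.Keys) {
        $bag.AddValue($key, $Values[$key])
    }

    # In a PowerShell script module, emitting the bag to the pipeline returns it to SCOM.
    $bag
}
```

Keeping the comparison in its own function means the Monitor and the Collection Rule can share it, with the Rule reading only the FileSizeMB value.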

 

Screen shots from the Management Pack

Monitor

monitor

 

Rule

rule

 

Task

task

task

 

Health Explorer

explorer

 

explorer

 

Alert

alert

 

Performance Collection

perf

HealthServiceStore.Monitoring.zip

PowerShell Command to Approve SCOM Agents Listed in a Text File

 

My customer has a lot of manually installed SCOM Agents that have not yet been approved in the console, and wanted a quick way to approve only the ones that are listed in a text file.

The following PowerShell command can take care of this pretty easily (modify the C:\Servers.txt path as needed). Also, be sure that the servers are in FQDN format in the text file (Server.Domain.com).

Foreach ($Computer in (Get-Content C:\Servers.txt)) {Get-SCOMPendingManagement | where {$_.AgentPendingActionType -eq "ManualApproval" -and $_.AgentName -eq $Computer}| Approve-SCOMPendingManagement}
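
The one-liner above calls Get-SCOMPendingManagement once per server in the file. For a long list, a variant that fetches the pending list once and filters it may be quicker. This is a minimal sketch; the helper name Select-PendingAgent is mine, and it also trims whitespace and skips blank lines in the text file:

```powershell
# Sketch only: the function name is an illustrative assumption.

function Select-PendingAgent {
    param(
        [object[]]$Pending,
        [string[]]$Names
    )

    # Normalize the names from the file: trim whitespace and skip blank lines.
    $wanted = @($Names | ForEach-Object { $_.Trim() } | Where-Object { $_ })

    # -contains is case-insensitive for strings by default.
    $Pending | Where-Object {
        $_.AgentPendingActionType -eq 'ManualApproval' -and $wanted -contains $_.AgentName
    }
}

# Usage in the Operations Manager Shell (commented out so the function can be loaded anywhere):
# $pending = Get-SCOMPendingManagement
# Select-PendingAgent -Pending $pending -Names (Get-Content C:\Servers.txt) | Approve-SCOMPendingManagement
```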

SCOM Task to Add or Remove Management Groups on Agents

A customer of mine found that they had numerous SCOM Agents which were multi-homed to defunct Management Groups, and they were looking for an easy way to remove these old Management Groups from the Agents. They were also building out a new pre-production environment and would need  to multi-home some of the production agents to it.

I put together a Management Pack that does the following

  • Creates a new class named “Agent Management Groups”
  • Runs a discovery on all Agents to discover each Management Group they are configured to connect to
  • Provides tasks to Add and Remove Management Groups on the Agents

 

To use this Management Pack

  1. Import the Management Pack
  2. Wait a little while for the Discovery to happen
  3. Go to the Agent Management Groups\Agents view in the SCOM Console to see the list of Agents and the Management Groups they are configured to communicate with.
  4. Use the Agent Management Group Tasks in the Tasks pane to Add or Remove Management Groups.

NOTE: Running these tasks will restart the Microsoft Monitoring Agent (Health Service) on the Agent. I have seen occasional cases where the service stops and does not start back up. In these cases, you will need to remotely connect to the Agent (or log on to it) and start the service.

Agents

 

Add a Management Group to Agents

Select the Agents to modify, then select the “Add a Management Group” task

AddMG-1

 

In the “Run Task” window, select “Arguments” and click on Override.

AddMG-2

 

Enter the Management Group name and Management Server name in the Arguments override

AddMG-3

 

Run the task

AddMG-4

 

When the task completes successfully, you should see the new Management Groups on the Agents within 5 minutes (when the discovery runs again).

 

Remove a Management Group from Agents

Let’s say you have decommissioned a Management Group (MG1), but several agents are still configured to use it.

Simply enter MG1 in the search bar to find all Agents that are configured to connect to that Management Group, select all of them (CTRL+A), and select the “Remove a Management Group” task.

RemoveMG-1

 

Override the Arguments parameter and enter the name of the Management Group to remove, then run the task

RemoveMG-2

 

Within a few minutes, you should see the Management Group removed from the view.

A third Task is also included – “Remove all but one Management Group”. In this task, you override the Arguments parameter with the name of the Management Group that you want to KEEP, and all others will be removed.
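
For reference, the same "keep one, remove the rest" logic can be sketched directly on an agent. This is not the script shipped in the Management Pack. It assumes the agent's AgentConfigManager.MgmtSvcCfg COM object with its GetManagementGroups and RemoveManagementGroup methods and the managementGroupName property; the function name is mine.

```powershell
# Sketch only: the function name is an illustrative assumption, and this is not the
# script shipped in the Management Pack.

function Remove-AllButOneManagementGroup {
    param(
        [Parameter(Mandatory)] $Config,       # an AgentConfigManager.MgmtSvcCfg object
        [Parameter(Mandatory)] [string]$Keep
    )

    # Copy the names first so the collection is not modified while it is enumerated.
    $names = @($Config.GetManagementGroups() | ForEach-Object { $_.managementGroupName })

    # Refuse to continue if the group to keep is not configured, so the agent is never left with none.
    if ($names -notcontains $Keep) {
        throw "Management Group '$Keep' is not configured on this agent; nothing removed."
    }

    $removed = @()
    foreach ($name in $names) {
        if ($name -ne $Keep) {
            $null = $Config.RemoveManagementGroup($name)
            $removed += $name
        }
    }
    return $removed
}

# Usage on an agent (commented out; requires the agent's COM object and a service restart):
# $cfg = New-Object -ComObject 'AgentConfigManager.MgmtSvcCfg'
# Remove-AllButOneManagementGroup -Config $cfg -Keep 'MG1'
# Restart-Service -Name HealthService
```

The check before the loop is a deliberate safety choice: a typo in the name to keep stops the script instead of removing every Management Group from the agent.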

Agent.Management.Groups
