Mister IT incident management process
Modified on: Wed, Apr 2 2025 1:57 PM
PURPOSE
The primary purpose is to restore normal service operation as quickly as possible and minimize the adverse impact on customers and business operations, ensuring that the best possible levels of service quality and availability are maintained.
SCOPE
Applies to all software and information technology services in use at Mister Car Wash.
PROCESS
The steps below cover the process from reporting an incident through closing it.
- Incident reported or detected - Anyone who notices or hears of an incident is responsible for working through this process. Team members must monitor and triage tickets throughout the day.
- Incident identification - Failures or potential failures need to be detected early so that the incident management process can be started quickly.
- Incident logging - All incidents must be fully logged in the Service Desk, even those that seem like duplicates, so that anyone assisting with the resolution has immediate access to all information and a full historical record is maintained. Recording each reported issue also shows how widespread the issue is.
- Incident categorization - Assess both Urgency and Impact. If the issue has already been reported, link the related tickets together.
- Incident prioritization - Once urgency and impact are assessed, use the Incident Priority Matrix below to identify the priority.
- Incident management - If this is a Priority 0 (Critical) or Priority 1 (High) incident, declare the Incident Owner and the Communication Owner. If possible, assign these roles to two different people so that time spent communicating with users does not interfere with resolving the issue.
- For Priority 0 or 1 incidents: Contact the ticket owner and confirm they are aware of the assigned Critical or High priority ticket.
- Incident notification - The Communication Owner uses the Resolution and Notification table to notify the appropriate people based on priority.
- Initial diagnosis - IT uses the information collected about the symptoms to search the Knowledge Base for an appropriate solution. If the solution works, IT resolves and closes the incident.
- Incident escalation - If the necessary information to resolve the incident is not in the Knowledge Base, the appropriate support group (including vendor support) must be consulted for further diagnostics and attempted resolution. Using timelines in the Resolution and Notification table, escalate and provide status updates.
- Incident resolution - Continue diagnosis and remediation until the requester verifies that the resolution is satisfactory. Resolving an incident does not require correcting its underlying cause; the resolution only needs to let the requester continue their work.
- Incident closure - Once the requester verifies the issue is resolved, close the ticket.
- Confirm that the incident categorization is correct, and update it if needed.
- Update the ticket with resolution details.
- Update the Knowledge Article with all troubleshooting and remediation steps.
- Determine whether this incident could recur and, if so, decide on preventive action.
Use the Incident Priority Matrix below when an incident is reported or detected.
Incident Priority Matrix |
URGENCY \ IMPACT | High: Service or major portion of a service is unavailable | Medium: Issue prevents personnel from performing business-critical, time-sensitive functions | Low: Issue prevents personnel from performing a portion of their duties |
High: Significant damage is occurring or will occur rapidly (one or more stores, or all of HQ affected) | Urgent | High | Medium |
Medium: Damage increases considerably over time (part of a store, or HQ departments affected) | High | Medium | Low |
Low: Damage increases marginally over time (one or two personnel affected) | Medium | Low | Low |
Examples:
Priority | Example |
Low | Issues that do not significantly affect the site's ability to process cars. |
Medium | Issues that limit the site's ability to process cars. |
High | |
Urgent | |
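Because the matrix is a fixed two-way lookup, it can be expressed as a small table in code, for example to pre-fill or double-check the priority field on a ticket. The sketch below is a minimal illustration, assuming urgency and impact are recorded as the strings "High", "Medium" and "Low"; `incident_priority` is a hypothetical helper, not a Service Desk feature.

```python
# Incident Priority Matrix: PRIORITY[urgency][impact] gives the priority.
# Values are transcribed from the matrix above; the helper name is hypothetical.
PRIORITY = {
    "High":   {"High": "Urgent", "Medium": "High",   "Low": "Medium"},
    "Medium": {"High": "High",   "Medium": "Medium", "Low": "Low"},
    "Low":    {"High": "Medium", "Medium": "Low",    "Low": "Low"},
}


def incident_priority(urgency: str, impact: str) -> str:
    """Look up the priority for an assessed urgency and impact."""
    try:
        return PRIORITY[urgency][impact]
    except KeyError:
        raise ValueError(
            f"urgency and impact must be 'High', 'Medium' or 'Low', "
            f"got {urgency!r} and {impact!r}"
        ) from None


print(incident_priority("Medium", "High"))  # High
```

Keeping urgency as the outer key mirrors the matrix layout (urgency rows, impact columns), so the code can be compared against the table cell by cell.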
Categorization & Prioritization
The goals of proper categorization are to:
- Identify the Impact and Urgency to determine the Priority
- Indicate what support groups need to be involved
- Capture meaningful metrics on system reliability
All incidents are important to the user, but incidents that affect large groups or mission-critical functions need to be addressed before those affecting only one or two people.
Resolution and Notification
Every issue will be addressed as soon as possible; however, most issues must be triaged and worked in priority order. For each priority, the Resolution and Notification Table below defines how to report an issue, the expected response time, the target resolution time, when to notify a manager, the response method, and how often status updates are provided.
For example, a requester reporting a Critical (Urgent) incident should call, rather than use email or the portal, and should expect a response within 15 minutes. Resources will be pulled from lower-priority issues to try to resolve the incident within one hour. Both the requester and IT must notify their managers immediately and provide hourly updates.
Priority | How to Report | Expected Response Within | Target Resolution Time | When to Notify Manager | How to Respond | Response Update Frequency |
Urgent | Phone | 15 Minutes | 1 Hour | Immediately | Phone, Email Follow-Up | Every hour until resolved |
High | Phone | 15 Minutes | 1 Day | Immediately | Phone, Email Follow-Up | Every day until resolved |
Medium | Portal, Email, Phone | 4 Hours | 2 Days | 1 day, if not resolved | | Every 2 days until resolved |
Low | Portal, Email | 24 Hours | 5 Days | 2 days, if not resolved | | Every 5 days until resolved |
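The response and resolution targets in this table translate directly into deadlines measured from the time an incident is reported. The sketch below is a minimal illustration; `sla_deadlines` is a hypothetical helper, and it assumes the times are elapsed calendar time, since the table does not say whether "Days" means business days.

```python
from datetime import datetime, timedelta

# (Expected Response Within, Target Resolution Time) per priority,
# transcribed from the Resolution and Notification table.
# Assumption: calendar time, not business hours or business days.
SLA = {
    "Urgent": (timedelta(minutes=15), timedelta(hours=1)),
    "High":   (timedelta(minutes=15), timedelta(days=1)),
    "Medium": (timedelta(hours=4),    timedelta(days=2)),
    "Low":    (timedelta(hours=24),   timedelta(days=5)),
}


def sla_deadlines(priority: str, reported_at: datetime) -> tuple[datetime, datetime]:
    """Return (respond_by, resolve_by) for an incident reported at reported_at."""
    response, resolution = SLA[priority]
    return reported_at + response, reported_at + resolution


respond_by, resolve_by = sla_deadlines("Medium", datetime(2025, 4, 2, 9, 0))
print(respond_by, resolve_by)  # 2025-04-02 13:00:00 2025-04-04 09:00:00
```

If business days are intended, only the deadline arithmetic would change; the table lookup stays the same.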
Issue Escalation
Requesters, IT, and management all play important roles in incident management. If the requester has not been contacted within the Expected Response time, or the issue is not resolved by the Target Resolution Time, the requester must escalate the issue to their manager. If IT has not resolved the issue by the Target Resolution Time, the assigned IT team member must escalate it to their manager. Escalate to Mister IT using the tiers defined below.
Escalation | Contact |
Tier 1 | Support Center Supervisors |
Tier 2 | Director of IT Support |
Tier 3 | Director of IT |
Tier 4 | Director of IT Operations |
Tier 5 | Chief Technology Officer |
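The escalation trigger above (a missed response or resolution target) and the tier list can be sketched together. This is a minimal illustration under stated assumptions: `escalation_due` and `next_contact` are hypothetical names, the deadlines are assumed to come from the Resolution and Notification table, and escalation is assumed to move up one tier at a time, which this article does not state explicitly.

```python
from datetime import datetime
from typing import Optional

# Escalation contacts from the Issue Escalation table, Tier 1 first.
ESCALATION_TIERS = [
    "Support Center Supervisors",  # Tier 1
    "Director of IT Support",      # Tier 2
    "Director of IT",              # Tier 3
    "Director of IT Operations",   # Tier 4
    "Chief Technology Officer",    # Tier 5
]


def escalation_due(now: datetime, respond_by: datetime, resolve_by: datetime,
                   responded_at: Optional[datetime], resolved: bool) -> bool:
    """True if the expected response time or the target resolution time was missed."""
    missed_response = now > respond_by and (responded_at is None or responded_at > respond_by)
    missed_resolution = not resolved and now > resolve_by
    return missed_response or missed_resolution


def next_contact(current_tier: int) -> Optional[str]:
    """Contact at the next tier up; current_tier is 0 before any escalation.

    Assumption: escalation moves up one tier at a time.
    Returns None once Tier 5 has already been reached.
    """
    if current_tier < 0:
        raise ValueError("current_tier must be 0 or greater")
    if current_tier >= len(ESCALATION_TIERS):
        return None
    return ESCALATION_TIERS[current_tier]


# Hypothetical Urgent incident reported at 9:00: respond by 9:15, resolve by 10:00.
# At 9:20 nobody has responded, so escalation is due.
if escalation_due(datetime(2025, 4, 2, 9, 20), datetime(2025, 4, 2, 9, 15),
                  datetime(2025, 4, 2, 10, 0), responded_at=None, resolved=False):
    print("Escalate to", next_contact(0))  # Escalate to Support Center Supervisors
```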
Responsibility Matrix (RACI)
Obligation | Role Description |
Responsible | Responsible to perform the assigned task |
Accountable (only 1 person) | Accountable to make certain work is assigned and performed |
Consulted | Consulted about how to perform the task appropriately |
Informed | Informed about key events regarding the task |
This RACI applies to incidents prioritized as Critical or High.
Activity | Requestor | IT Staff | Service Desk Manager | Communication Owner | Incident Owner | Director/ IT Leadership | Regional Manager | Technical Expert |
Record Incident in Service Desk | A, R | |||||||
Incident Assignment | R | A, R | ||||||
Incident Categorization | A, R | |||||||
Incident Prioritization | A, R | |||||||
Declare incident Owner & Communication Owner | I | R | A | I | I | |||
Incident Notification | I | A | R | C | I | I | ||
Incident Diagnosis | C | I | C | I | A, R | C | ||
Incident Resolution | C | I | I | A, R | I | I | I | |
Incident Escalation | R | I | C | I | A, R | I | R | C |
Incident Closure | C | I | I | I | A, R | I | I |
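Because the Accountable role is limited to one person per activity, that rule can be checked mechanically whenever the RACI table is edited. The sketch below is a minimal illustration; the function names are hypothetical, and only the Incident Escalation row is transcribed, as an example of the expected data shape.

```python
# One row of the RACI table: role name -> cell text, e.g. {"Incident Owner": "A, R"}.
RaciRow = dict[str, str]


def codes(cell: str) -> set[str]:
    """Split a RACI cell such as "A, R" into {"A", "R"}; a blank cell gives an empty set."""
    return {c.strip() for c in cell.split(",") if c.strip()}


def activities_without_single_accountable(raci: dict[str, RaciRow]) -> list[str]:
    """Return the activities that do not have exactly one role marked A."""
    return [
        activity
        for activity, row in raci.items()
        if sum("A" in codes(cell) for cell in row.values()) != 1
    ]


# The Incident Escalation row, transcribed from the table above.
RACI = {
    "Incident Escalation": {
        "Requestor": "R",
        "IT Staff": "I",
        "Service Desk Manager": "C",
        "Communication Owner": "I",
        "Incident Owner": "A, R",
        "Director/ IT Leadership": "I",
        "Regional Manager": "R",
        "Technical Expert": "C",
    },
}

print(activities_without_single_accountable(RACI))  # []
```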
Key Performance Indicators (KPIs)
- Average Cost per Incident: Total fixed and variable costs divided by the total number of incidents
- Average Initial Response Time: Total time from when each incident is reported until IT first responds, divided by the total number of incidents
- Average Resolution Time: Total time from when each incident is reported until it is resolved, divided by the number of resolved incidents
- Percentage of Incidents by Priority: Proportion of total incidents broken down by priority
- Total Incidents by Priority: Number of incidents at each priority within a defined timeframe
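The KPI definitions above are simple ratios and counts over the incidents in a reporting period. The sketch below is a minimal illustration; the `Incident` record and its field names are hypothetical (not Service Desk fields), total cost is assumed to be supplied by the caller, and average resolution time is assumed to run from report to resolution over resolved incidents only.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    """Hypothetical ticket record; field names are illustrative."""
    priority: str
    reported_at: datetime
    first_response_at: datetime
    resolved_at: Optional[datetime] = None


def incident_kpis(incidents: list[Incident], total_cost: float) -> dict:
    """Compute the KPIs for the incidents in one reporting period."""
    if not incidents:
        raise ValueError("no incidents in the reporting period")
    n = len(incidents)
    resolved = [i for i in incidents if i.resolved_at is not None]
    by_priority = Counter(i.priority for i in incidents)
    response_total = sum((i.first_response_at - i.reported_at for i in incidents), timedelta())
    # Assumption: only resolved incidents count toward average resolution time.
    resolution_total = sum((i.resolved_at - i.reported_at for i in resolved), timedelta())
    return {
        "average_cost_per_incident": total_cost / n,
        "average_initial_response_time": response_total / n,
        "average_resolution_time": resolution_total / len(resolved) if resolved else None,
        "total_incidents_by_priority": dict(by_priority),
        "percentage_of_incidents_by_priority": {p: 100 * c / n for p, c in by_priority.items()},
    }


# Hypothetical sample data for illustration only.
t0 = datetime(2025, 4, 1, 9, 0)
sample = [
    Incident("High", t0, t0 + timedelta(minutes=10), t0 + timedelta(hours=4)),
    Incident("Low", t0, t0 + timedelta(minutes=30)),
]
print(incident_kpis(sample, total_cost=300.0)["average_initial_response_time"])  # 0:20:00
```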
Definition and Terms
- Communication Owner: Person responsible for communicating status of the incident and gathering any additional information about the incident for the incident owner.
- Impact: A measure of the effect of an incident, problem, or change, determined by how many personnel or functions are affected.
- Incident: An unplanned interruption or reduction in quality of an IT Service.
- Incident Manager: Person responsible for driving and continually improving the incident management process.
- Incident Owner: Person responsible for ensuring the incident is resolved.
- Priority: A category used to identify the relative importance of an incident.
- Urgency: A measure of how long it will be until the issue has a significant impact on the business.
Related Procedures
- IT Essentials Guide for General Managers
- How to Contact Mister IT Support
- Mister IT Support Escalation Contacts
Revision History
Revised Date | Revised By | Revisions |
08/15/2019 | Lauren Babson | Document created |
08/03/2021 | Tam Rininger | Added Supervisors to the escalation plan. |
08/19/2021 | Tam Rininger | Removed Director of IT Operations from the escalation plan. |
10/14/2024 | Andrew Poskey | Moved to Fresh |
01/20/2025 | Yolanda Terrazas-Franco | Updated "VP of IT" to Chief Technology Officer in the Escalation Plan. |
01/27/2025 | Yolanda Terrazas-Franco | Added Director of IT and Director of IT Operations to the Issue Escalation Contacts. |
03/13/2025 | Andrew Poskey | Updated title and content formatting. |
04/02/2025 | Andrew Poskey | Updated changes from Word Doc, added related article section. |