How to troubleshoot AWA (Agent Chat) assignment issues?


Description

The "Advanced Work Assignment" (AWA) application, which Agent Chat also uses, sometimes assigns work to an agent incorrectly, either as over-assignment or as under-assignment. Over-assignment: an agent has a max capacity of 5, so AWA should stop offering work items once the agent reaches 5, yet AWA keeps assigning more. Under-assignment: the agent still has bandwidth, yet AWA does not assign items up to the agent's max capacity.

This article explains how to troubleshoot these kinds of assignment issues.

 

Instructions

Follow these steps to troubleshoot an over-assignment or under-assignment issue.

Step1: Identify the max capacity of the agent

The agent's max capacity can be found as follows:

1. Connect to the affected instance as maint/admin/agent

2. Navigate to awa_agent_capacity.list

3. Filter for the agent and the Service Channel

4. The "Capacity" column shows the agent's max capacity per Service Channel

5. The "Capacity" column is populated only when a capacity override is defined for the agent in the Service Channel

6. If no capacity override is defined for the agent, the "Capacity" column is empty. In that case, open the Service Channel record to verify the default max capacity defined at the channel level.
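The lookup order described in this step (agent-level override first, channel default otherwise) can be sketched in Python. This is only an illustration of the rule, not AWA's implementation: the function name and its parameters are hypothetical, and the two input values are the ones you read manually from awa_agent_capacity and from the Service Channel record.

```python
from typing import Optional


def effective_max_capacity(override: Optional[int], channel_default: int) -> int:
    """Return the agent's max capacity for one Service Channel.

    override        -- value of the "Capacity" column (None when the column is empty)
    channel_default -- default max capacity defined on the Service Channel
    """
    # Assumption: an empty "Capacity" column is represented as None.
    if override is not None:
        return override
    return channel_default


# No override defined: the channel default applies.
print(effective_max_capacity(None, 4))
# Override defined: it takes precedence over the channel default.
print(effective_max_capacity(5, 4))
```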

 

Step2: Find out the current capacity of the agent

The agent's current "capacity in use" is calculated from the number of work items currently offered to the agent plus the number of documents currently assigned to the agent that are in a work-in-progress state.

You can find the agent's current capacity as follows:

1. Navigate to awa_work_item.list and filter for the agent with state="Pending Accept", per Queue

2. Go to the document table and filter for the agent and active=true. For Agent Chat this is the interaction table, for Case Management the case table, and for Incident Management the incident table

3. AWA sums these two counts to calculate the agent's current capacity in use
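The calculation in this step can be expressed as a short Python sketch over rows exported from the two tables. The record classes and field names below are hypothetical simplifications (a real check would also filter per Queue or Service Channel); the sketch only shows the sum of the two counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkItem:          # simplified row of awa_work_item
    agent: str
    state: str


@dataclass
class Document:          # simplified row of interaction / case / incident
    assigned_to: str
    active: bool


def capacity_in_use(agent: str, work_items: List[WorkItem], documents: List[Document]) -> int:
    # Count 1: work items offered to the agent and not yet accepted.
    pending = sum(1 for w in work_items if w.agent == agent and w.state == "Pending Accept")
    # Count 2: active documents currently assigned to the agent.
    active = sum(1 for d in documents if d.assigned_to == agent and d.active)
    return pending + active


work_items = [WorkItem("agent_a", "Pending Accept"), WorkItem("agent_a", "Accepted")]
documents = [Document("agent_a", True), Document("agent_a", True), Document("agent_a", False)]
print(capacity_in_use("agent_a", work_items, documents))  # 1 pending + 2 active = 3
```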

At this stage, we know the agent's max capacity and the agent's current capacity in use. Next, verify whether any documents were assigned manually and whether any skill restrictions apply.

Step3: Verify whether any documents were assigned manually

Even when AWA is active, users can still assign documents manually from a list or form, for example by opening an interaction, case, or incident and changing the "Assigned to" field to any agent.

The key point is that AWA does not restrict manual assignment to the agent's max capacity. Manual assignment can therefore lead to over-assignment, that is, a capacity in use above the agent's max capacity.

However, once an agent has been manually assigned beyond capacity, AWA stops assigning work items to that agent because no bandwidth is left. AWA considers the agent for assignment again only when the agent's "capacity in use" drops below the max capacity.

For example, an agent has a max capacity of 4, is assigned 4 items by AWA, and is then assigned 4 more manually by a manager. The agent's "capacity in use" is now 8, which is over-assignment. After the agent closes 5 items, the "capacity in use" becomes 3, and AWA can consider the agent for assignment again.
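The arithmetic of this example can be checked with a small Python sketch. Both helper functions are hypothetical and encode only the rule stated above: AWA considers an agent when the capacity in use is strictly below the max capacity.

```python
def awa_can_assign(capacity_in_use: int, max_capacity: int) -> bool:
    # AWA considers the agent only while capacity in use is below max capacity.
    return capacity_in_use < max_capacity


def closures_needed(capacity_in_use: int, max_capacity: int) -> int:
    # Number of items the agent must close before AWA considers them again.
    return max(0, capacity_in_use - max_capacity + 1)


max_capacity = 4
in_use = 4 + 4                                   # 4 assigned by AWA + 4 assigned manually
print(awa_can_assign(in_use, max_capacity))      # False: the agent is over-assigned
print(closures_needed(in_use, max_capacity))     # 5
print(awa_can_assign(in_use - 5, max_capacity))  # True: 3 is below 4
```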

The steps so far cover over-assignment. The next step mainly addresses under-assignment, where the agent has enough capacity but AWA does not consider the agent for assignment.

Step4: Verify if there are any skill restrictions

When AWA does not assign work to an agent who has enough bandwidth, a likely cause is that mandatory skills are enforced for the assignment and the agent does not have the required skills.

Follow these steps to verify the skill restriction:

1. Identify the document record behind the work item. The document record can be an incident, a case, or an interaction (for chat).

2. Copy the document record number of the affected work item, for example CS36559811, then go to the task_m2m_skill table and filter for it.

3. The result lists the skills an agent needs in order to pick up the task.

4. Now that the required skills are known, check whether the available agent has them.

5. To do so, go to the sys_user_has_skill table and filter for the agent.

6. If the agent has the required skills, the system should consider the agent for assignment.
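The comparison in steps 3 to 6 is a set difference: the skills listed in task_m2m_skill for the document minus the skills listed in sys_user_has_skill for the agent. A minimal Python sketch follows; the function name and the skill names are made up for illustration.

```python
from typing import Iterable, Set


def missing_skills(required: Iterable[str], agent_skills: Iterable[str]) -> Set[str]:
    # Skills required by the document that the agent does not have.
    return set(required) - set(agent_skills)


required = {"Billing", "Spanish"}          # hypothetical rows from task_m2m_skill
agent_skills = {"Billing", "English"}      # hypothetical rows from sys_user_has_skill

gap = missing_skills(required, agent_skills)
if gap:
    print("Agent is missing mandatory skills:", sorted(gap))
else:
    print("Agent has all required skills")
```

An empty result means the skills do not explain the under-assignment, so the cause has to be looked for elsewhere.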


More details about this process are available in KB0951909, "Section C - Mandatory Skills".

 

Additional Information

When troubleshooting assignment issues, also keep the following in mind:

1. The capacity calculation is a costly, resource-intensive process, so it runs at a certain frequency and the result is stored in the cached table "awa_agent_capacity". Incorrect values in this table can therefore also lead to assignment issues, so verify the values in this table as described in the steps above.

2. Known PRBs:

"PRB1417652 - Concurrent reads and updates to the awa_agent_capacity table causes incorrect workload values"

"PRB1376053 - Set Work Flow to false in all AWA GlideRecord queries by system property"

PRB1417652 is fixed from Paris Patch 7 onwards. Instances running a release earlier than Paris Patch 7 might have a workaround applied in the form of a scheduled fix job that runs every 5 minutes and corrects any incorrect capacity. If the AWA assigner thread runs within that 5-minute window, before the job has corrected the capacity, over-assignment or under-assignment can still occur. In that case, reduce the job interval from 5 minutes to 2 minutes to minimize the impact of this PRB.

PRB1376053 is fixed from the Paris release onwards. When an AWA record watcher responder (Eligibility/Workload) invokes a Java class that queries the document table, the before-query business rules (BRs) on that table fire, just as they would anywhere else on the platform. If those BRs are slow or cause recursive calls, they delay the record watcher responder. This delays the capacity update and can therefore cause assignment issues.

For example:

- Agent A has a max capacity of 5

- AWA offered the agent 5 work items and the agent accepted them, so the capacity in use is 5

- The agent closes 3 documents, so the capacity in use is now 2 (5-3=2)

- The manager manually assigns 5 more documents, so the capacity in use becomes 2+5=7 (above the max capacity of 5, because the assignment was manual)

- The manager then un-assigns 4 of those manually assigned documents, which causes the AWA record watcher responder to decrease the agent's capacity in use by 4

- This record watcher update should fire instantly, but a slow query BR on the document table can delay it

- During that interval, AWA might work with an incorrect capacity for the agent

- Additionally, the fix job can run in between and correct the capacity to the right value

- The delayed record watcher update then arrives and decreases the capacity further, which can cause over-assignment

This is a very rare edge case.
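The sequence above can be replayed as a small Python simulation to see why the late update causes over-assignment. This is a sketch under stated assumptions, not AWA's code: it assumes the fix job recomputes the cached value from the tables, that the delayed record watcher update is applied as a plain "-4" delta, and that the cached value is not clamped at zero.

```python
def simulate_delayed_update():
    max_capacity = 5
    actual = 5               # 5 work items offered by AWA and accepted
    actual -= 3              # agent closes 3 documents            -> 2
    actual += 5              # manager assigns 5 manually          -> 7
    cached = actual          # cached capacity in use is in sync   -> 7

    actual -= 4              # manager un-assigns 4 documents      -> 3
    pending_delta = -4       # record watcher update queued, delayed by a slow query BR

    cached = actual          # fix job corrects the cached value   -> 3
    cached += pending_delta  # delayed update lands afterwards     -> -1

    return max_capacity, actual, cached


max_capacity, actual, cached = simulate_delayed_update()
print("actual capacity in use:", actual)
print("cached capacity in use:", cached)
print("free slots AWA believes exist:", max_capacity - cached)
print("free slots that really exist:", max_capacity - actual)
```

In this run the cached value ends 4 below the real one, so AWA would see more free slots than the agent actually has.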

To avoid it, make sure that query BRs on the document table are not slow, so that record watcher responder update events are processed quickly.

Alternatively, prevent the record watcher responder thread from executing the query BRs by setting the system property "glide.awa.query_br_disable=false" (available only from the Paris release).

3. The following log patterns are useful when troubleshooting assignment issues.

The scheduled fix job prints the following log marker:

"CSM Number of agents with incorrect capacity"

The glide.record_watcher.evaluator worker thread prints the following log marker while updating capacity:

Update workload. Agent: <sys_id_of_agent>, channel: <sys_id_of_channel>

The record watcher thread also prints the following log after processing a record watcher event with a delay:

glide.record_watcher.evaluator.<thread_number> SYSTEM Processed record
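When working with an exported node log, the three markers above can be picked out with a short Python script. The regular expressions are derived only from the marker texts in this article. The rest of each log line (timestamps, prefixes), the assumption that the thread number is numeric, and the sample lines are all hypothetical.

```python
import re

MARKERS = {
    # Printed by the scheduled fix job.
    "fix_job": re.compile(r"CSM Number of agents with incorrect capacity"),
    # Printed by the record watcher evaluator thread while updating capacity.
    "workload_update": re.compile(
        r"Update workload\. Agent: (?P<agent>\w+), channel: (?P<channel>\w+)"
    ),
    # Printed after a record watcher event was processed with a delay.
    "delayed_record": re.compile(
        r"glide\.record_watcher\.evaluator\.(?P<thread>\d+) SYSTEM Processed record"
    ),
}


def classify(line):
    """Return (marker_name, captured_fields) for a log line, or None if no marker matches."""
    for name, pattern in MARKERS.items():
        match = pattern.search(line)
        if match:
            return name, match.groupdict()
    return None


# Hypothetical sample lines; real log lines will carry different prefixes.
sample = [
    "10:00:01 CSM Number of agents with incorrect capacity: 2",
    "10:00:02 Update workload. Agent: a1b2c3, channel: d4e5f6",
    "10:00:09 glide.record_watcher.evaluator.3 SYSTEM Processed record",
    "10:00:10 unrelated line",
]
for line in sample:
    print(classify(line))
```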