At Bazaarvoice we ended January with a two day global internal company hackathon. Myself, Angela Lappin, Sorcha McNamee & Lars Wiedenhoefer teamed up to become the ultimate hackathon team, known as “The Comprehenders”!
How we arrived at our project idea and the deeper meaning behind it is a story that’s both exhilarating and loaded with insights and learning progress. Seriously. There was a lot of technical and team fun involved. Even while social distancing across the streets of Belfast and internationally across the ocean all the way to Austin, Texas.
A hackathon, as you very well know, provides time to step back, take inventory, and work on something inspiring and cool. We got together and asked ourselves what the support trends of other application areas are. We know ours, but how could we quickly know what’s in a support queue of other teams. What are our customers saying and thinking? As an engineer with product delivery time pressures, how can we still be customer oriented and empathetic to their needs?
We attempted reading through support tickets and NPS comments to see if we could find trending key words and the overall sentiment. Quickly, of course, we discovered that we could only read through one or two cases before other tasks became very urgently important and demanded our attention. The inspiration came from our daily work with Amazon Web Services. We noticed a feature called AWS Comprehend. Amazon describes Comprehend as “a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. No machine learning experience required.”
This sounded like the perfect service for us to try and quickly get up and running over the duration of the two day hackathon.
The overall goal for this project was for us to see if we could utilize AWS Comprehend to analyse Salesforce issues or NPS comments in a way that the outcome indicates the customer sentiment and briefly explains what all the issues for a Bazaarvoice project are about.
Before we dove into building we gathered together to create the following architecture diagram to understand how this project would be structured and what services we would need to use:
Building the solution
First steps: Preparing/cleaning the data
Like in a lot of projects, real world data is rarely in a perfect condition and ready to be analysed, this was particularly clear to us when we received the data from Salesforce and NPS. The exported data was very messy, had chunks of repeatable unnecessary information including email headers/footers. We knew this was the first task we needed to address.
We came together and created a Node.js script that could read in these CSV files, identify the useful columns, strip out repeatable/unnecessary information and format the data back into a new clean CSV file.
With our data cleaned and ready to go, we had one final step to do to allow Comprehend to access it — upload the data to an AWS S3 Bucket.
Within the S3 bucket for the project we defined a file structure to make it easier to handle the different types of inputs and outputs this project would require.
Step 2: Time to comprehend
Now with the data in place it was time to kick off some comprehend jobs! AWS Comprehend has a number of different type of analysis jobs that you can run on your datasets — as a team we decided that the jobs that would have most relevance to what we are trying to achieve are:
- Sentiment Analysis — Determining how the user felt based on their comments (Positive/ Negative)
- Key Phrase — Pulling out the most commonly used phrases from the reviews.
Running the comprehend job also involved creating a role to give comprehend permission to the data in S3. With the job type selected, role set up and name assigned, all that was left was to hit the run button and wait for some results to populate!
The data outputted from comprehend is in a zip folder, so there was one small step we had to take to download and unzip the folder to get the output file. We then uploaded this output file to our output folder in the s3 bucket!
Step 3: Preparing the data for visualisation with Glue & Athena
In order to give the data that was output from Comprehend more structure we needed to use a Glue Crawler. The Glue Crawler scans our file and determines the schema of the data, then it can automatically explore and catalog this data for us. Once the Glue Catalog has been created from the Crawler, we are able to access it within the Athena console.
When viewing the Catalog data in Athena, we noticed that certain columns contained nested data, which would be very hard for us to visualise properly. However, to un-nest the data we were able to run some SQL commands within the Athena Query Editor, to abstract values and move them into columns of their own!
With the data abstracted and saved — there was only one more thing left to do…VISUALISE!
Final Step: Visualising the data
For data visualisation we decided to use AWS Quicksight as it could connect directly to our outputs from Athena.
In QuickSight, when the dataset was connected we could create all different types of graphs from it. These graphs could be shared across the team and merged together into a dashboard so that we could see the overall picture of everything we achieved!
Lets see some results!
The graph screenshots included in these results where generated from a small subsection of data and are purely for demonstration purposes only. These are therefore not a true reflection of the Bazaarvoice customer support queues.
Customer sentiment results
From the visualisations created in QuickSight we could start to make comparisons between the Salesforce and NPS data. Firstly, we can see that in the NPS data there is an even amount of negative, neutral and positive comments. This is because the NPS feedback is asked for when the customer is using the application. However, our Salesforce dataset is from customers who are not happy with the product/service therefore there is very few positive sentiment received — we did expect that this would be the case, but we were pleasantly surprised that 11.39% of the support cases actually had a positive sentiment.
We coloured the positive sentiment green and kept the neutral, negative, and mixed sentiment red, just to illustrate the answer to the question: How positive is your customer support queue?
Key phrases analysis results
Salesforce key phrases
NPS key phrases
Once we started viewing the data output from the Key Phrases analysis jobs through visualisations we realised that the results weren’t quite what we expected. We started to see that the words it was picking up as “key phrases” where works like “The”, “This”, “Your” etc. that overall skewed the results.
This is something that we assumed that Comprehend would have been able to handle for us out of the box and in the future we would look into removing the common words or looking into a tool that could handle this for us.
Challenges we faced
Throughout our journey with this project, we ran into some challenges, the following are some of the main issues we faced and how we overcame them.
Cleaning data sources
A lot of Salesforce case descriptions and comments are derived from emails; therefore there was blocks of text that would cause AWS Comprehend to derive insights that we aren’t interested in.
We created a Node.js script to strip these common pieces of email data out of the dataset. We noticed text patterns that could be used as a marker for splitting a string, so we could pick out relevant text for analysing.
We also were able to parse the Salesforce description and case comments columns into the format that is best consumable for AWS Comprehend.
Nested data in Athena
Once we viewed the Comprehend output data in Athena, we noticed that certain columns contained nested data, which would make it difficult for us to visualise insights effectively.
To un-nest the data, we were able to run SQL commands within the Athena Query Editor, to create a new table with these values abstracted so that they could all have columns of their own.
However, with key phrases analysis, the un-nested data still had an other layer of nested data. To overcome this we decided to create an interim table to unnest the data once, over which we queried the interim table again, to created a final table of un-nested data.
Working on this project was so much fun and has opened our eyes on how we could enhance this in the future and gain more meaningful insights. Below are some of the ideas of things we would like to look into the future to improve on our current solution:
Key phrases analysis
We would definitely like to look into how to get better results form our key phrases analysis as we weren’t very happy with how the output we received was so skewed by common words. For this we would need to spend time investigating what is the best way to handle this, is there a further cleaning step we could do, or is there potentially another tool that we could use that could handle this for us.
From this hackathon we can already see so many possibilities of other areas we could use this in. We feel like it would be great to try and analyse the customer support tickets that get raised through Jira. In particular, we could generate a regular sentiment analysis report that could help us benchmark the overall success of our customer support handling process.
Use a PII scrubber to take out any personal data before running it through AWS Comprehend.
Automate data retrieval
In the hackathon we manually exported and downloaded the required data files from Salesforce and NPS. However, this wouldn’t be feasible if this project continued to grow across more teams and other areas of the business. It would be good if we could interact with the API’s for these services so that we could fully automate this data retrieval and then pass it directly into our cleaning service.
This was a great project to skim the surface of the number of possibilities that we could pursue with continuing to investigate in this area of analysing our customer support queues. It is clear to see that The Comprehenders journey will not end here, so stay tuned to see what else we can achieve together!