Text Analysis: Keyword Attribution Model

Oct 3, 2021

This is a guide to identifying keyword groups and audiences that align more closely with content and user intent. The methodology keeps the focus on attainable keywords and helps you discover keyword clusters you would otherwise miss. Because the approach is logical, you can explain your keyword process to both clients and stakeholders with more clarity.

To download the files used in the BrightonSEO talk, please use the links below. If you face issues, right-click the link and select “Save as.”

Keyword group database: click here. The batch of keywords used in the talk: click here. RStudio and R (the language): click here and here.

R and RStudio run on both Mac and Windows and are free to download from the links above. You will find the full script and tutorial at the bottom of this page.

The descriptions of the functions in R are in the BrightonSEO video. If you haven’t watched it or are still stuck, I would recommend breaking down the packages one by one and searching online for definitions. Equally, if you scroll beyond this intro section, I detail the steps and explain most of what the script does.

If you didn’t attend BrightonSEO, you can purchase the full video via the HeySummit video library.

Below you will find the steps accompanying the conference talk.

Introduction to Keyword Attribution

Generally speaking, most keyword research is done in PPC form, with cost per click and similar metrics, inside a tool built for paid search rather than organic search. Keyword Planner is a good example. The service, which has seen upgrades now and then, is the tool most folks in Search use for keyword analysis. Relying on it in this way is unwise. It can give a misleading picture of search volume and, on the whole, it can lead to short-term thinking in the hope of a quick win.

With attribution, you can identify audiences and keyword groups that better align with both content and user intent. It puts a real focus on attainable keywords and draws your attention to keyword clusters you would otherwise have missed. It is also a logical approach that you can easily show to both clients and stakeholders to explain your reasoning.

In line with the video tutorial, I will be assuming that you already have R installed and are ready to start working on the script to manipulate the downloadable batch of keywords (as CSV, above). The keyword database that will direct the groups used for filtering is in the same section.

The premise for the analysis is that you work with a client, a clothing retailer, who wants to find more relevant keywords that they may not have considered targeting.

Requirements

To manipulate data with the functions used in this script, you will need to install the following packages and load them into R’s memory with ‘library’.

# Install Packages
install.packages("dplyr")
install.packages("data.table")
install.packages("readr")
# Library Packages
library(dplyr)
library(data.table)
library(readr)

Import Keyword Dataset (CSV)

As noted above, you can download the document used during the BrightonSEO talk. If you haven’t watched the video, the document is a raw sample containing clothing keywords, exported from Keyword Planner.

# Import CSV and save as 'keyword dataset' dataframe
keyword_dataset <- read.csv("keyword_dataset.csv", stringsAsFactors = FALSE)
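Before moving on, it can help to confirm what R actually imported. The sketch below uses a tiny stand-in file (the rows, the column headers and the temporary file path are made up for illustration and are not taken from the talk's dataset). It shows that read.csv replaces spaces in column headers with dots, which is why names such as Previous.position appear later in the script.

```r
# Toy stand-in for an exported keyword file (made-up rows, not the talk's data)
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Keyword,Search Volume,Previous position",
  "Running Shoes,1000,4",
  "how to wash jeans,500,12"
), toy_path)

toy_dataset <- read.csv(toy_path, stringsAsFactors = FALSE)

# read.csv makes headers syntactically valid: spaces become dots
names(toy_dataset)

# Check the column types before filtering
str(toy_dataset)
```

Running names() on your own import is a quick way to find the exact column names to pass to ‘subset’ in the next step.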

Data Preprocessing, Subsetting

Essentially, the process used to extract specific keywords, known as ‘text analysis’, requires an exact match of characters. In other words, the terms you search with must be the same, in spelling and in format (including letter case), as they appear in the dataset of keywords you are searching in.

# Data Preprocessing
keyword_dataset$Keyword <- tolower(keyword_dataset$Keyword) # Housekeeping - make all lowercase so formatting is consistent (avoids missing keywords)
# Remove unnecessary Keyword Planner metrics
keyword_dataset <- subset(keyword_dataset, select = -c(CPC, Previous.position, Number.of.Results, Trends, SERP.Features.by.Keyword, URL))

It is always best to ensure there are no discrepancies between a) the list of keywords and b) the exported dataset. To resolve any potential formatting errors, I use the ‘tolower’ function, which makes the keywords in the dataset lowercase. It does mean that the list of keywords you search with when filtering later must also be in lowercase. However, it is an easy solution and means you do not have to double-check for individual capital letters in what may be a list of thousands of keywords.

The columns will vary depending on where you generated your raw file of keywords, but Keyword Planner will often provide metrics you don’t need. Using the ‘subset’ function, you can remove them.
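As a minimal sketch of the same housekeeping on a toy data frame: the values below are made up, the trimws step is an extra that the original script does not include, and the list of unwanted columns is only an example. Dropping columns by name with setdiff avoids an error when an export happens not to contain one of them.

```r
# Toy data frame standing in for the export (made-up values)
toy_dataset <- data.frame(
  Keyword = c("  Running Shoes", "PARTY dress ", "mens jeans"),
  Search.Volume = c(1000, 800, 600),
  CPC = c(0.5, 0.7, 0.4),
  stringsAsFactors = FALSE
)

# Lowercase and trim stray spaces so later matches are not missed
toy_dataset$Keyword <- trimws(tolower(toy_dataset$Keyword))

# Drop unwanted metrics, skipping any that this export does not contain
unwanted <- c("CPC", "Trends", "URL")
toy_dataset <- toy_dataset[, setdiff(names(toy_dataset), unwanted), drop = FALSE]

toy_dataset
```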

Example of Filtering Keywords

Below is an example of how to filter for keywords. Essentially, the filter works by first selecting the dataframe you wish to query [keyword_dataset], followed by the column name with a dollar sign ($) in front of it [$Keyword], followed by the ‘keyword’ to target, which you indicate via the %like% operator.

I have included two examples, which show what you will eventually pull off at a larger scale (with more keywords).

# filter for a value (keyword) - within the KEYWORD column($) - like (%like%) 'SHOES'
shoes_keyword_dataset <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'shoes')

# filter for a value (keyword) - within the KEYWORD column($) - like (%like%) 'CLOTHING'
clothing_keyword_dataset <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'clothing')
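One thing worth checking before scaling this up: %like% from data.table is a wrapper around base R's grepl, so it matches the term anywhere inside the keyword, and it is case-sensitive. The sketch below uses grepl directly on a made-up list of keywords to show how a short term such as 'tie' picks up unrelated words, and how a word-boundary pattern (an optional refinement, not part of the original script) narrows the match.

```r
# Made-up keywords for illustration
keywords <- c("silk tie", "tie dye shirt", "necktie",
              "bow ties", "tiered dress", "cutie socks")

# Partial match: TRUE wherever the letters "tie" appear in the keyword
partial_hits <- grepl("tie", keywords)
keywords[partial_hits]

# Word boundaries (\\b) restrict the match to the whole word "tie" or "ties"
whole_word_hits <- grepl("\\bties?\\b", keywords)
keywords[whole_word_hits]
```

The partial match keeps all six keywords, while the whole-word pattern keeps only "silk tie", "tie dye shirt" and "bow ties". The same pattern can be used on the right-hand side of %like%.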

OR Logical Operator

Based on the logic above, you should now understand how to filter the raw file with a single keyword. To filter for multiple keywords, add the ‘OR’ operator [|] between the conditions. Each condition returns TRUE or FALSE for every row, and the operator combines them so that a row is kept when at least one condition is TRUE; the ‘hits’ are collated into the new dataframe. Place the operator at the end of each line: a line that ends with | tells R the expression continues, whereas without it R treats the line as finished and fails on the next one. If you have ever used JavaScript, you will be familiar with the functionality, which is similar to its || operator, except that | in R works on a whole column at once.
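To see what the operator does row by row, here is a small sketch on a made-up vector of keywords. It uses base R's grepl in place of %like% so that it runs without any packages; the variable names are only illustrative.

```r
# Made-up keywords for illustration
keywords <- c("how to wash jeans", "black jeans", "what is cashmere", "red dress")

is_how  <- grepl("how", keywords)   # one TRUE/FALSE per keyword
is_what <- grepl("what", keywords)

# | compares the two vectors element by element:
# a position is TRUE when either condition is TRUE
is_question <- is_how | is_what

keywords[is_question]
```

Only "how to wash jeans" and "what is cashmere" are kept, because each of them satisfies at least one of the two conditions.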

Note on Groups, First Filter

It is important to note that while most groups in the database will be relevant to you, some may not be. In that case, please omit those sections from your script and only include the relevant ones. As directed by the keyword database (download above), the first group is keywords relating to questions.

# Question-based Keywords
# Keep rows whose keyword contains any of the question terms below
question_keywords <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'how to' |
   keyword_dataset$Keyword %like% 'what' |
   keyword_dataset$Keyword %like% 'how' |
   keyword_dataset$Keyword %like% 'when' |
   keyword_dataset$Keyword %like% 'who' |
   keyword_dataset$Keyword %like% 'where' |
   keyword_dataset$Keyword %like% 'why' |
   keyword_dataset$Keyword %like% 'how many' |
   keyword_dataset$Keyword %like% 'near me')

Remaining Groups to Filter

The remaining twelve groups are built in the same way. The snippet below shows the General Clothing group. Note: it does not include all filters for product categories.

# General / Clothing
# All terms are lowercase because the Keyword column was lowercased earlier
general_clothing <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'outfits' |
             keyword_dataset$Keyword %like% 'overshirt' |
             keyword_dataset$Keyword %like% 'jacket' |
             keyword_dataset$Keyword %like% 'party' |
             keyword_dataset$Keyword %like% 'sleeve' |
             keyword_dataset$Keyword %like% 'socks' |
             keyword_dataset$Keyword %like% 'cashmere' |
             keyword_dataset$Keyword %like% 'cardigan' |
             keyword_dataset$Keyword %like% 'sweatshirt' |
             keyword_dataset$Keyword %like% 'sweater' |
             keyword_dataset$Keyword %like% 'jumper' |
             keyword_dataset$Keyword %like% 'shirt' |
             keyword_dataset$Keyword %like% 'dress' |
             keyword_dataset$Keyword %like% 'flannel' |
             keyword_dataset$Keyword %like% 'crocs' |
             keyword_dataset$Keyword %like% 'suits' |
             keyword_dataset$Keyword %like% 'tee' |
             keyword_dataset$Keyword %like% 'leggings' |
             keyword_dataset$Keyword %like% 'palazzo pants' |
             keyword_dataset$Keyword %like% 'jeans' |
             keyword_dataset$Keyword %like% 'polo' |
             keyword_dataset$Keyword %like% 'tracksuit' |
             keyword_dataset$Keyword %like% 'jumpers' |
             keyword_dataset$Keyword %like% 'tie' |
             keyword_dataset$Keyword %like% 'trainers' |
             keyword_dataset$Keyword %like% 'shorts' |
             keyword_dataset$Keyword %like% 'yoga' |
             keyword_dataset$Keyword %like% 'gear' |
             keyword_dataset$Keyword %like% 'cleat' |
             keyword_dataset$Keyword %like% 'jersey' |
             keyword_dataset$Keyword %like% 'merch' |
             keyword_dataset$Keyword %like% 't shirts' |
             keyword_dataset$Keyword %like% 'workwear' |
             keyword_dataset$Keyword %like% 'tank top' |
             keyword_dataset$Keyword %like% 'top' |
             keyword_dataset$Keyword %like% 'pants' |
             keyword_dataset$Keyword %like% 'vans' |
             keyword_dataset$Keyword %like% 'sportswear' |
             keyword_dataset$Keyword %like% 'fleece' |
             keyword_dataset$Keyword %like% 'undergarments' |
             keyword_dataset$Keyword %like% 'shirts' |
             keyword_dataset$Keyword %like% 'coat' |
             keyword_dataset$Keyword %like% 'wenven' |
             keyword_dataset$Keyword %like% 'knitwear')

general_clothing$Group <- "General Clothing"
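Long chains of | are easy to get wrong (a repeated term, or a capital letter that can never match a lowercased column). One possible alternative, sketched below, keeps the terms in a vector and joins them into a single pattern. The helper name tag_group, its arguments and the toy data are hypothetical and are not part of the original script; it uses base R's grepl, which is what %like% wraps.

```r
# Hypothetical helper: returns the rows of `data` whose Keyword matches any term,
# with a Group column added (assumes a character column named Keyword)
tag_group <- function(data, terms, group_name) {
  # Lowercase and de-duplicate the terms, then join them as "a|b|c"
  pattern <- paste(unique(tolower(terms)), collapse = "|")
  hits <- data[grepl(pattern, data$Keyword), , drop = FALSE]
  hits$Group <- rep(group_name, nrow(hits))  # also safe when there are no hits
  hits
}

# Made-up keywords for illustration
toy_dataset <- data.frame(
  Keyword = c("party dress", "crocs sale", "how to wash jeans", "garden hose"),
  stringsAsFactors = FALSE
)

general_terms <- c("Party", "Crocs", "jeans", "dress", "dress")
toy_general <- tag_group(toy_dataset, general_terms, "General Clothing")
toy_general
```

Because the terms are joined into a regular expression, any term containing special characters (such as a full stop or a plus sign) would need escaping first.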

Rbind, Export Final Dataset

The final step is to combine all the new dataframes you have created. You do this using the ‘rbind’ function. All you need to do is add the names of the dataframes (separated by commas); each one must have the same columns, including the Group column. Then, once combined, export the dataframe as a CSV.

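# Combine the group dataframes (each listed once, all with the same columns)
combined_dataset <- rbind(female_clothing, general_clothing, general_brand, general_transaction,
                     female_footware, female_nightwear, female_lingerie, female_maternity,
                     male_footwear, male_nightwear, male_clothing)
# Export as CSV (the file name is an example; choose your own)
write.csv(combined_dataset, "combined_dataset.csv", row.names = FALSE)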
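Because the filters are partial matches, one keyword can land in more than one group, so it may be worth inspecting the combined result. The sketch below uses two made-up group dataframes (not the ones from the talk) to check that the columns line up before rbind, count the hits per group and list keywords that appear under several groups.

```r
# Toy group dataframes (made-up rows)
toy_general <- data.frame(Keyword = c("party dress", "mens jeans"),
                          Group = "General Clothing", stringsAsFactors = FALSE)
toy_male <- data.frame(Keyword = c("mens jeans", "mens trainers"),
                       Group = "Male Clothing", stringsAsFactors = FALSE)

# rbind on dataframes needs matching column names
stopifnot(identical(names(toy_general), names(toy_male)))

toy_combined <- rbind(toy_general, toy_male)

# Number of hits per group
group_counts <- table(toy_combined$Group)
group_counts

# Keywords that were attributed to more than one group
multi_group <- unique(toy_combined$Keyword[duplicated(toy_combined$Keyword)])
multi_group
```

Whether a keyword should stay in several groups or be assigned to one is a judgement call that depends on how you plan to report the results.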

Conclusion: Text Analysis R

As you can see, once you have decided upon the keyword categories and built out a list of terms to query, the process of filtering and grouping is relatively simple. I would recommend downloading the files attached if this is your first time using this concept. Beyond that, once you feel comfortable, test and see what works (i.e. generates more ‘hits’ with the corresponding database). I would be remiss not to mention Merge Words, which I would recommend using when you start building out your own version. It is a free tool that makes concatenating strings of code a much simpler affair.
