
Keyword Attribution Model, Text Analysis

This is a guide to identifying keyword groups and audiences that align better with content and user intent. Equally, this methodology keeps the focus on attainable keywords and helps in discovering keyword clusters you would otherwise miss. With a logical approach, you can explain the keyword process you take to both clients and stakeholders with more clarity.


by Harry Austen

Published Dec. 07, 2020 GMT

Updated Feb. 26, 2021 GMT

To quickly download the files used in the BrightonSEO talk you have watched (or are currently watching), please use the links below. If you face issues, right-click the link and select “Save as.”

Keyword group database: click here.
The batch of keywords used in the talk: click here.
R-Studio and R(language): click here and here.

R and RStudio run on both Mac and Windows, and both are free to download from the links above. You will find the full script and tutorial at the bottom of this page.

The descriptions of the functions in R are in the BrightonSEO video. If you haven’t watched or are still stuck, I would recommend breaking down the packages one-by-one and searching online for definitions. Equally, if you scroll beyond this intro section, I detail the steps and explain most of the script’s function.

If you didn’t attend BrightonSEO, you can purchase the full video via the HeySummit video library.

Below you will find the steps accompanying the conference talk.


Introduction to Keyword Attribution

Generally speaking, most keyword research is done in PPC form: cost per click and metrics of that nature, presented in a tool built for paid search rather than organic search. Keyword Planner is the obvious example. The service, which has seen upgrades now and then, is the tool most folks in Search use for keyword analysis. Using it for this purpose is unwise. It can give a misleading picture of search volume and, on the whole, it can lead to short-term thinking in the hope of a quick win.

With attribution, you can identify audiences and keyword groups that better align with both content and user intent. It keeps a real focus on attainable keywords and draws your attention to keyword clusters you would otherwise have missed. It is also a logical approach that you can easily show to both clients and stakeholders to explain your reasoning.

In line with the video tutorial, I will be assuming that you already have R installed and are ready to start working on the script to manipulate the downloadable batch of keywords (as CSV, above). The keyword database that will direct the groups used for filtering is in the same section.

The premise for the analysis is that you work with a client, a clothing retailer, who wants to find more relevant keywords that they may not have considered targeting.

Install & Library Packages

To manipulate data with the functions used in this script, you will need to install the following packages and then load them into your R session with ‘library’.

# Install Packages
install.packages("dplyr")
install.packages("data.table")
install.packages("readr")
# Library Packages
library(dplyr)
library(data.table)
library(readr) # reading in the csv with search volume needing to be turned into characters
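If you re-run the script often, re-installing every package each time is slow. The following is a minimal sketch, assuming you only want to know which of the three packages are missing; the variable names `required_packages` and `missing_packages` are illustrative and not part of the original script.

```r
# Packages the script relies on
required_packages <- c("dplyr", "data.table", "readr")

# installed.packages() returns a matrix whose row names are package names
missing_packages <- setdiff(required_packages, rownames(installed.packages()))

if (length(missing_packages) > 0) {
  message("Not yet installed: ", paste(missing_packages, collapse = ", "))
  # install.packages(missing_packages)  # uncomment to install them
} else {
  message("All required packages are already installed.")
}
```

The install line is left commented out so that running the check never triggers a download by accident.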

Import Keyword Dataset (CSV)

As noted above, you can download the document used during the BrightonSEO talk. If you haven’t watched the video, the document is a raw sample containing clothing keywords. The file is a Keyword Planner export.

# Import CSV and save as 'keyword dataset' dataframe
keyword_dataset <- read.csv("keyword_dataset.csv", stringsAsFactors = FALSE)
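After importing, it can help to check what R actually read in before going any further. The sketch below writes a tiny stand-in CSV to a temporary file so that it runs on its own; the column names and values are assumptions for illustration and will differ from the real export.

```r
# Stand-in for keyword_dataset.csv so the example runs on its own
# (the column names and values here are illustrative assumptions)
sample_path <- tempfile(fileext = ".csv")
writeLines(c("Keyword,Search Volume,CPC",
             "Mens Trainers,1000,0.45",
             "womens boots,2400,0.60"),
           sample_path)

sample_dataset <- read.csv(sample_path, stringsAsFactors = FALSE)

# read.csv replaces spaces in headers with dots: "Search Volume" becomes "Search.Volume"
names(sample_dataset)

# str() shows the type of each column, nrow() the number of keywords
str(sample_dataset)
nrow(sample_dataset)
```

Knowing the exact column names at this point matters, because the later steps refer to columns by name.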

Data Preprocessing, Subsetting

Essentially, the process used to extract specific keywords, known as ‘text analysis’, relies on matching text character for character. In other words, the keywords you search with must be the same, both in spelling and format, as they appear in the dataset you are searching in.

# Data Preprocessing
keyword_dataset$Keyword <- tolower(keyword_dataset$Keyword) # Housekeeping - make all lowercase so formatting is consistent (avoid missing keywords)
# Remove unnecessary Keyword Planner metrics
keyword_dataset <- subset(keyword_dataset, select = -c(CPC, Previous.position, Number.of.Results, Trends, SERP.Features.by.Keyword, URL))

It is always best to ensure there are no discrepancies between a) the list of keywords and b) the exported dataset. To resolve any potential formatting errors, I use the ‘tolower’ function. The function makes the keywords in the dataset lowercase. It does mean that the list of keywords you search with when filtering later must also be in lowercase. However, it is an easy solution and means you do not have to worry about double-checking for individual capital letters in what may be a list of thousands of keywords.

The exact columns vary depending on where you generated your raw file of keywords, but the export will often include metrics you don’t need. Using the ‘subset’ function, you can remove them.
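The two housekeeping steps can be sketched on a small made-up dataframe. This version also trims stray spaces and only drops columns that are actually present, which is an assumption about what you may want when an export does not contain every column listed; the data and the `unwanted_columns` name are hypothetical.

```r
# Made-up sample; the real export will have different columns and values
sample_dataset <- data.frame(
  Keyword = c("  Mens Trainers", "Womens Boots ", "yoga gear"),
  Search.Volume = c(1000, 2400, 320),
  CPC = c(0.45, 0.60, 0.20),
  stringsAsFactors = FALSE
)

# Lowercase and strip leading/trailing spaces so later matches are consistent
sample_dataset$Keyword <- trimws(tolower(sample_dataset$Keyword))

# Keep every column except the unwanted ones; names missing from the data are ignored
unwanted_columns <- c("CPC", "Trends", "URL")
kept_columns <- setdiff(names(sample_dataset), unwanted_columns)
sample_dataset <- sample_dataset[, kept_columns, drop = FALSE]

print(sample_dataset)
```

Selecting the columns to keep, rather than naming the ones to remove, avoids an error when one of the unwanted columns is not in the file.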

Example of Filtering Keywords

Below is an example of how to filter for keywords. The function works by first selecting the dataframe you wish to query [keyword_dataset], followed by a dollar sign ($) and the column name [$Keyword], followed by the ‘keyword’ to target, which you indicate via the ‘%like%’ operator from data.table. Note that ‘%like%’ matches any keyword that contains the text, not only keywords identical to it.

I have included three examples, which show what you will eventually pull off at a larger scale (with more keywords).

# filter for a value (keyword) - within the KEYWORD column($) - like (%like%) 'SHOES' (double quotes work too)
shoes_keyword_dataset <- keyword_dataset %>% 
   filter(keyword_dataset$Keyword %like% "shoes")

# filter for a value (keyword) - within the KEYWORD column($) - like (%like%) 'SHOES'
shoes_keyword_dataset <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'shoes')

# filter for a value (keyword) - within the KEYWORD column($) - like (%like%) 'CLOTHING'
clothing_keyword_dataset <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'clothing')
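It is worth being clear about what counts as a match. data.table’s `%like%` is a wrapper around base R’s `grepl`, so it finds the text anywhere inside a keyword. The sketch below uses base R only and made-up keywords to show the difference between that behaviour and a strict equality test.

```r
# Made-up keywords for illustration
sample_dataset <- data.frame(
  Keyword = c("running shoes", "shoes", "horseshoes for sale", "summer dress"),
  stringsAsFactors = FALSE
)

# Pattern match: any keyword containing "shoes" (the behaviour %like% gives)
contains_shoes <- sample_dataset[grepl("shoes", sample_dataset$Keyword), , drop = FALSE]

# Strict match: only keywords that are exactly "shoes"
exactly_shoes <- sample_dataset[sample_dataset$Keyword == "shoes", , drop = FALSE]

contains_shoes$Keyword  # "running shoes" "shoes" "horseshoes for sale"
exactly_shoes$Keyword   # "shoes"
```

The pattern match is what makes grouping possible, but it also lets in unrelated terms such as "horseshoes", so it pays to skim the hits.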

OR Logical Operator

Based on the logic above, you should now understand how to filter the raw file with single keywords. When it comes to filtering for multiple keywords, you need to add the ‘OR’ operator [|] between the conditions. R checks every condition against each row and keeps the row when at least one of them is true, so all the ‘hits’ end up in the same dataframe. When you spread the conditions over several lines, place the operator at the end of each line so that R knows the expression continues; otherwise, R treats the condition as finished and stops with an error at the next line. If you have ever used JavaScript, you will be familiar with the idea from its || operator, although in R it is the single | that compares whole columns.
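To make the behaviour of the operator concrete, here is a small base R sketch with made-up keywords. It also shows an alternative that is not used in the script: collapsing the target terms into a single pattern, which works because "|" means OR inside a regular expression as well.

```r
# Made-up keywords for illustration
keywords <- c("buy trainers", "nike air max", "clearance sale", "yoga mat")

# Element-wise OR: one TRUE/FALSE per keyword
either_match <- grepl("buy", keywords) | grepl("sale", keywords)

# Same result with a single pattern built from a vector of terms
targets <- c("buy", "sale")
combined_pattern <- paste(targets, collapse = "|")  # "buy|sale"
single_match <- grepl(combined_pattern, keywords)

either_match            # TRUE FALSE TRUE FALSE
keywords[single_match]  # "buy trainers" "clearance sale"
```

The collapsed form is shorter for long lists, but terms containing regular expression characters such as "." or "(" would need escaping first.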

Note on Groups, First Filter

It is important to note that while most groups in the database will be relevant, you may find that some are not. In which case, please omit those groups from your script and only include the relevant ones. As directed by the keyword database (download above), the first group is keywords relating to questions.

# Question-based Keywords
question_keywords <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'how to' |
   keyword_dataset$Keyword %like% 'what' |
   keyword_dataset$Keyword %like% 'how' |
   keyword_dataset$Keyword %like% 'when' |
   keyword_dataset$Keyword %like% 'who' |
   keyword_dataset$Keyword %like% 'where' |
   keyword_dataset$Keyword %like% 'why ' |
   keyword_dataset$Keyword %like% 'how many' |
   keyword_dataset$Keyword %like% 'near me')
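Short terms match inside longer words: "who" is found in "wholesale", "how" in "shower", and "mens" in "womens". If that matters for your groups, one option is to add word boundaries to the pattern. This is a sketch in base R with made-up keywords; because `%like%` passes its pattern on to `grepl`, the same patterns should work there too, though that is worth testing on your own data.

```r
# Made-up keywords for illustration
keywords <- c("who makes vans", "wholesale socks", "how to tie a tie",
              "shower shoes", "mens trainers", "womens trainers")

# Plain patterns match anywhere, including inside other words
loose_who  <- grepl("who", keywords)
loose_how  <- grepl("how", keywords)
loose_mens <- grepl("mens", keywords)

# \\b marks a word boundary, so only the whole word matches
strict_who  <- grepl("\\bwho\\b", keywords)
strict_how  <- grepl("\\bhow\\b", keywords)
strict_mens <- grepl("\\bmens\\b", keywords)

keywords[loose_mens]   # "mens trainers" "womens trainers"
keywords[strict_mens]  # "mens trainers"
```

The backslash is doubled because R strings treat a single backslash as an escape character.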

Remaining Groups to Filter

The following passage contains the eleven remaining groups.


# General / Clothing
general_clothing <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'outfits' |
             keyword_dataset$Keyword %like% 'overshirt' |
             keyword_dataset$Keyword %like% 'jacket' |
             keyword_dataset$Keyword %like% 'party' |
             keyword_dataset$Keyword %like% 'sleeve' |
             keyword_dataset$Keyword %like% 'socks' |
             keyword_dataset$Keyword %like% 'cashmere' |
             keyword_dataset$Keyword %like% 'cardigan' |
             keyword_dataset$Keyword %like% 'sweatshirt' |
             keyword_dataset$Keyword %like% 'sweater' |
             keyword_dataset$Keyword %like% 'jumper' |
             keyword_dataset$Keyword %like% 'shirt' |
             keyword_dataset$Keyword %like% 'dress' |
             keyword_dataset$Keyword %like% 'flannel' |
             keyword_dataset$Keyword %like% 'crocs' |
             keyword_dataset$Keyword %like% 'suits' |
             keyword_dataset$Keyword %like% 'tee' |
             keyword_dataset$Keyword %like% 'leggings' |
             keyword_dataset$Keyword %like% 'palazzo pants' |
             keyword_dataset$Keyword %like% 'jeans' |
             keyword_dataset$Keyword %like% 'polo' |
             keyword_dataset$Keyword %like% 'tracksuit' |
             keyword_dataset$Keyword %like% 'jumpers' |
             keyword_dataset$Keyword %like% 'tie' |
             keyword_dataset$Keyword %like% 'trainers' |
             keyword_dataset$Keyword %like% 'shorts' |
             keyword_dataset$Keyword %like% 'yoga' |
             keyword_dataset$Keyword %like% 'gear' |
             keyword_dataset$Keyword %like% 'cleat' |
             keyword_dataset$Keyword %like% 'jersey' |
             keyword_dataset$Keyword %like% 'merch' |
             keyword_dataset$Keyword %like% 't shirts' |
             keyword_dataset$Keyword %like% 'workwear' |
             keyword_dataset$Keyword %like% 'tank top' |
             keyword_dataset$Keyword %like% 'top' |
             keyword_dataset$Keyword %like% 'pants' |
             keyword_dataset$Keyword %like% 'vans' |
             keyword_dataset$Keyword %like% 'sportswear' |
             keyword_dataset$Keyword %like% 'fleece' |
             keyword_dataset$Keyword %like% 'undergarments' |
             keyword_dataset$Keyword %like% 'shirts' |
             keyword_dataset$Keyword %like% 'coat' |
             keyword_dataset$Keyword %like% 'wenven' |
             keyword_dataset$Keyword %like% 'knitwear')

general_clothing$Group <- "General Clothing"

# Female / Clothing
female_clothing <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'womens trousers' |
             keyword_dataset$Keyword %like% 'swimwear' |
             keyword_dataset$Keyword %like% 'womens swimwear' |
             keyword_dataset$Keyword %like% 'hoodies' |
             keyword_dataset$Keyword %like% 'shorts & skirts' |
             keyword_dataset$Keyword %like% 'womens outfits' |
             keyword_dataset$Keyword %like% 'womens multipacks' |
             keyword_dataset$Keyword %like% 'womens hoodies' |
             keyword_dataset$Keyword %like% 'womens shorts & skirts' |
             keyword_dataset$Keyword %like% 'womens pyjamas' |
             keyword_dataset$Keyword %like% 'womens pyjama tops' |
             keyword_dataset$Keyword %like% 'womens pyjama bottoms' |
             keyword_dataset$Keyword %like% 'womens pyjama sets' |
             keyword_dataset$Keyword %like% 'womens dressing gowns' |
             keyword_dataset$Keyword %like% 'womens nightdresses' |
             keyword_dataset$Keyword %like% 'womens slippers' |
             keyword_dataset$Keyword %like% 'womens character' |
             keyword_dataset$Keyword %like% 'womens short pyjamas' |
             keyword_dataset$Keyword %like% 'womens bride & hen' |
             keyword_dataset$Keyword %like% 'womens loungewear' |
             keyword_dataset$Keyword %like% 'womens joggers' |
             keyword_dataset$Keyword %like% 'womens overshirt' |
             keyword_dataset$Keyword %like% 'womens jacket' |
             keyword_dataset$Keyword %like% 'womens workwear' |
             keyword_dataset$Keyword %like% 'womens fleece' |
             keyword_dataset$Keyword %like% 'womens undergarments')

female_clothing$Group <- "Female Clothing"



# General / Brand
general_brand <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'nike' |
             keyword_dataset$Keyword %like% 'air force one' |
             keyword_dataset$Keyword %like% 'air max' |
             keyword_dataset$Keyword %like% 'infinity' |
             keyword_dataset$Keyword %like% 'react ' |
             keyword_dataset$Keyword %like% 'mercurial' |
             keyword_dataset$Keyword %like% 'revolution' |
             keyword_dataset$Keyword %like% 'max' |
             keyword_dataset$Keyword %like% 'todos' |
             keyword_dataset$Keyword %like% 'lebron' |
             keyword_dataset$Keyword %like% 'pegasus' |
             keyword_dataset$Keyword %like% 'air zoom' |
             keyword_dataset$Keyword %like% 'blazer' |
             keyword_dataset$Keyword %like% 'vista lite' |
             keyword_dataset$Keyword %like% 'max bella' |
             keyword_dataset$Keyword %like% 'adidas' |
             keyword_dataset$Keyword %like% 'jordan' |
             keyword_dataset$Keyword %like% 'converse' |
             keyword_dataset$Keyword %like% 'orolay' |
             keyword_dataset$Keyword %like% 'yuedge' |
             keyword_dataset$Keyword %like% 'yeezy' |
             keyword_dataset$Keyword %like% 'boost 350 ' |
             keyword_dataset$Keyword %like% 'wave runner' |
             keyword_dataset$Keyword %like% 'woodstock' |
             keyword_dataset$Keyword %like% 'vans' |
             keyword_dataset$Keyword %like% 'arcteryx ' |
             keyword_dataset$Keyword %like% 'big and tall' |
             keyword_dataset$Keyword %like% 'bjj gi' |
             keyword_dataset$Keyword %like% 'black and white')


general_brand$Group <- "General Brand"


# General / Transaction
general_transaction <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'buy' |
             keyword_dataset$Keyword %like% 'purchase' |
             keyword_dataset$Keyword %like% 'sell' |
             keyword_dataset$Keyword %like% 'transaction' |
             keyword_dataset$Keyword %like% 'merchant' |
             keyword_dataset$Keyword %like% 'shop' |
             keyword_dataset$Keyword %like% 'sale' |
             keyword_dataset$Keyword %like% 'promo' |
             keyword_dataset$Keyword %like% 'clearance')

general_transaction$Group <- "General Transaction"



# Female / Footwear
female_footware <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'womens boots' |
             keyword_dataset$Keyword %like% 'ankle boots' |
             keyword_dataset$Keyword %like% 'womens trainers' |
             keyword_dataset$Keyword %like% 'womens flats' |
             keyword_dataset$Keyword %like% 'heels' |
             keyword_dataset$Keyword %like% 'ballet shoes' |
             keyword_dataset$Keyword %like% 'leather' |
             keyword_dataset$Keyword %like% 'slippers' |
             keyword_dataset$Keyword %like% 'wellies' |
             keyword_dataset$Keyword %like% 'flip flops' |
             keyword_dataset$Keyword %like% 'pumps' |
             keyword_dataset$Keyword %like% 'suede' |
             keyword_dataset$Keyword %like% 'womens ankle boots' |
             keyword_dataset$Keyword %like% 'womens heels & wedges' |
             keyword_dataset$Keyword %like% 'womens sandals' |
             keyword_dataset$Keyword %like% 'womens ballet shoes' |
             keyword_dataset$Keyword %like% 'womens leather' |
             keyword_dataset$Keyword %like% 'womens slippers' |
             keyword_dataset$Keyword %like% 'womens wellies' |
             keyword_dataset$Keyword %like% 'womens flip flops' |
             keyword_dataset$Keyword %like% 'womens pumps' |
             keyword_dataset$Keyword %like% 'womens suede' |
             keyword_dataset$Keyword %like% 'womens shoes' |
             keyword_dataset$Keyword %like% 'womens cleats')


female_footware$Group <- "Female Footwear"

# Female / Nightwear
female_nightwear <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'womens joggers' |
             keyword_dataset$Keyword %like% 'womens pyjamas' |
             keyword_dataset$Keyword %like% 'womens pyjama tops' |
             keyword_dataset$Keyword %like% 'womens pyjama bottoms' |
             keyword_dataset$Keyword %like% 'womens pyjama sets' |
             keyword_dataset$Keyword %like% 'womens dressing gowns' |
             keyword_dataset$Keyword %like% 'womens nightdresses' |
             keyword_dataset$Keyword %like% 'womens slippers' |
             keyword_dataset$Keyword %like% 'womens character' |
             keyword_dataset$Keyword %like% 'womens short pyjamas' |
             keyword_dataset$Keyword %like% 'womens bride & hen' |
             keyword_dataset$Keyword %like% 'womens loungewear')

   
female_nightwear$Group <- "Female Nightwear"



# Female / Lingerie
female_lingerie <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'bras' |
             keyword_dataset$Keyword %like% 'knickers' |
             keyword_dataset$Keyword %like% 'lingerie sets' |
             keyword_dataset$Keyword %like% 'maternity lingerie' |
             keyword_dataset$Keyword %like% 'entice' |
             keyword_dataset$Keyword %like% 'nude lingerie' |
             keyword_dataset$Keyword %like% 'shapewear' |
             keyword_dataset$Keyword %like% 'bodysuits' |
             keyword_dataset$Keyword %like% 'womens bras' |
             keyword_dataset$Keyword %like% 'womens knickers' |
             keyword_dataset$Keyword %like% 'womens lingerie sets' |
             keyword_dataset$Keyword %like% 'womens maternity lingerie' |
             keyword_dataset$Keyword %like% 'womens entice' |
             keyword_dataset$Keyword %like% 'womens nude lingerie' |
             keyword_dataset$Keyword %like% 'womens shapewear' |
             keyword_dataset$Keyword %like% 'womens bodysuits')
   
female_lingerie$Group <- "Female Lingerie"


# Female / Maternity
female_maternity <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'maternity bottoms' |
             keyword_dataset$Keyword %like% 'maternity coats' |
             keyword_dataset$Keyword %like% 'maternity dresses' |
             keyword_dataset$Keyword %like% 'maternity jeans' |
             keyword_dataset$Keyword %like% 'maternity leggings' |
             keyword_dataset$Keyword %like% 'maternity lingerie' |
             keyword_dataset$Keyword %like% 'maternity multipacks' |
             keyword_dataset$Keyword %like% 'maternity nightwear' |
             keyword_dataset$Keyword %like% 'maternity swimwear' |
             keyword_dataset$Keyword %like% 'maternity tops' |
             keyword_dataset$Keyword %like% 'nursing clothes' |
             keyword_dataset$Keyword %like% 'mamalicious clothes' |
             keyword_dataset$Keyword %like% 'womens shorts' |
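             keyword_dataset$Keyword %like% 'womens skirts' |
             keyword_dataset$Keyword %like% 'womens multipacks' |
             keyword_dataset$Keyword %like% 'womenswear' |
             keyword_dataset$Keyword %like% 'menswear')

# Label the group so the columns match the other dataframes when combining
female_maternity$Group <- "Female Maternity"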

# Male / Footwear
male_footwear <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'mens trainers' |
             keyword_dataset$Keyword %like% 'mens boots' |
             keyword_dataset$Keyword %like% 'mens formal shoes' |
             keyword_dataset$Keyword %like% 'mens leather & suede' |
             keyword_dataset$Keyword %like% 'mens slippers' |
             keyword_dataset$Keyword %like% 'mens sneaker' |
             keyword_dataset$Keyword %like% 'sneaker' |
             keyword_dataset$Keyword %like% 'mens shoes' |
             keyword_dataset$Keyword %like% 'pumps' |
             keyword_dataset$Keyword %like% 'wellies' |
             keyword_dataset$Keyword %like% 'mens cleats')

      
male_footwear$Group <- "Male Footwear"


# Male / Nightwear
male_nightwear <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 'mens joggers' |
             keyword_dataset$Keyword %like% 'mens pyjamas' |
             keyword_dataset$Keyword %like% 'mens pyjama tops' |
             keyword_dataset$Keyword %like% 'mens pyjama bottoms' |
             keyword_dataset$Keyword %like% 'mens pyjama sets' |
             keyword_dataset$Keyword %like% 'mens dressing gowns' |
             keyword_dataset$Keyword %like% 'mens nightdresses' |
             keyword_dataset$Keyword %like% 'mens slippers' |
             keyword_dataset$Keyword %like% 'mens character' |
             keyword_dataset$Keyword %like% 'mens short pyjamas' |
             keyword_dataset$Keyword %like% 'mens bride & hen' |
             keyword_dataset$Keyword %like% 'mens loungewear')
   

male_nightwear$Group <- "Male Nightwear"



# Male / Clothing
male_clothing <- keyword_dataset %>%
   filter(keyword_dataset$Keyword %like% 't-shirts' |
             keyword_dataset$Keyword %like% 'long sleeve' |
             keyword_dataset$Keyword %like% 'polo shirts' |
             keyword_dataset$Keyword %like% 'mens loungewear' |
             keyword_dataset$Keyword %like% 'mens t-shirts' |
             keyword_dataset$Keyword %like% 'mens shirts' |
             keyword_dataset$Keyword %like% 'underwear' |
             keyword_dataset$Keyword %like% 'socks' |
             keyword_dataset$Keyword %like% 'mens joggers' |
             keyword_dataset$Keyword %like% 'mens jeans' |
             keyword_dataset$Keyword %like% 'mens trousers' |
             keyword_dataset$Keyword %like% 'mens hoodies' |
             keyword_dataset$Keyword %like% 'mens sweatshirts' |
             keyword_dataset$Keyword %like% 'mens outfits' |
             keyword_dataset$Keyword %like% 'mens jumpers' |
             keyword_dataset$Keyword %like% 'mens long sleeve' |
             keyword_dataset$Keyword %like% 'mens polo shirts' |
             keyword_dataset$Keyword %like% 'mens underwear' |
             keyword_dataset$Keyword %like% 'mens socks' |
             keyword_dataset$Keyword %like% 'mens workwear' |
             keyword_dataset$Keyword %like% 'mens sportswear' |
             keyword_dataset$Keyword %like% 'mens coat')


male_clothing$Group <- "Male Clothing"
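The group blocks above all follow the same pattern: filter, then label. As an alternative, here is a minimal sketch that keeps the terms in a named list and loops over it. It uses base R only, and the sample keywords, the shortened term lists and the names `keyword_groups` and `label_group` are all hypothetical; it is not the method used in the talk.

```r
# Made-up keywords for illustration
sample_dataset <- data.frame(
  Keyword = c("buy nike trainers", "womens boots", "summer dress", "garden shed"),
  stringsAsFactors = FALSE
)

# Hypothetical, shortened group definitions: group name -> terms to look for
keyword_groups <- list(
  "General Brand"       = c("nike", "adidas"),
  "General Transaction" = c("buy", "sale"),
  "Female Footwear"     = c("womens boots", "heels")
)

# Filter the dataset for one group and label the hits
label_group <- function(group_name) {
  pattern <- paste(keyword_groups[[group_name]], collapse = "|")
  hits <- sample_dataset[grepl(pattern, sample_dataset$Keyword), , drop = FALSE]
  hits$Group <- rep(group_name, nrow(hits))  # also safe when there are no hits
  hits
}

# Run every group and stack the results
grouped <- do.call(rbind, lapply(names(keyword_groups), label_group))
rownames(grouped) <- NULL
print(grouped)
```

With this layout, adding a group means adding one entry to the list, and every dataframe is guaranteed to have the same columns when they are stacked.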

Rbind, Export Final Dataset

The final step is to combine all the new dataframes you have created. You do this using the ‘rbind’ function. All you need to do is add the names of dataframes (separated by comma). Then, once combined, export the dataframe as a CSV.

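combined_dataset <- rbind(female_clothing, general_clothing, general_brand, general_transaction, 
                     female_footware, female_nightwear, female_lingerie, female_maternity, 
                     male_footwear, male_nightwear, male_clothing)

# Export the combined dataframe as a CSV (the file name is only an example)
write.csv(combined_dataset, "combined_dataset.csv", row.names = FALSE)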
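Because each group is filtered independently, one keyword can land in several groups and will then appear once per group in the combined dataframe. The sketch below, on made-up stand-ins for two of the grouped dataframes, shows one way to spot those keywords; whether you keep or remove the repeats depends on how you plan to report the groups.

```r
# Toy stand-ins for two grouped dataframes (values are illustrative)
general_brand_sample <- data.frame(
  Keyword = c("nike trainers sale", "adidas socks"),
  Group = "General Brand", stringsAsFactors = FALSE
)
general_transaction_sample <- data.frame(
  Keyword = c("nike trainers sale", "buy jeans"),
  Group = "General Transaction", stringsAsFactors = FALSE
)

# rbind only works when every dataframe has the same column names
combined_sample <- rbind(general_brand_sample, general_transaction_sample)

# Count how many groups each keyword landed in
groups_per_keyword <- table(combined_sample$Keyword)
multi_group_keywords <- names(groups_per_keyword)[groups_per_keyword > 1]

multi_group_keywords  # "nike trainers sale"
```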

Conclusion

As you can see, once you have decided upon the keyword categories and built a bulked-out list to query, the process of filtering and grouping is relatively simple. I would recommend downloading the files attached if this is your first time using this concept. Beyond that, once you feel comfortable, test and see what works (i.e. generates more ‘hits’ with the corresponding database). It would be unjust not to mention Merge Words, which I would recommend using when you start building out your version. It is a tool that makes concatenating strings of code a much simpler affair and is free.
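For testing which terms generate ‘hits’, a rough sketch in base R is shown here. It assumes the goal is to combine a prefix with a list of product terms and count the matches for each; the keywords, the `products` list and the variable names are made up for illustration.

```r
# Made-up keywords for illustration
keywords <- c("womens boots", "womens ankle boots", "womens trainers", "mens boots")

# Combine a prefix with each product term
products <- c("boots", "trainers", "sandals")
patterns <- paste("womens", products)

# Count how many keywords each pattern hits (fixed = TRUE treats the pattern as plain text)
hits_per_pattern <- sapply(patterns, function(p) sum(grepl(p, keywords, fixed = TRUE)))

hits_per_pattern
```

Patterns with zero hits are candidates to drop or reword, and a surprisingly low count (here "womens boots" misses "womens ankle boots") suggests a shorter term may work better.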

This article is a member of The Engine Room. This section of the blog comprises programming and other technically challenging criteria in one place.

Harry Austen is a Data & Search Analyst. He has worked with the likes of Disney, The Olympics and Zoopla. @austenharry  
