Automated Keyword Research (AKWR)

What is Keyword Research?

Keyword research is the process of finding and analysing the search terms that people enter into search engines, with the aim of using that data for a specific purpose, most often SEO or general marketing. It has many benefits, such as insight into marketing trends, traffic growth, and customer acquisition. Manual keyword research is a laborious and painstaking process, and the results can be unique to each webpage of a given domain. This is where AKWR steps in to relieve some of that manual labour.

What is automated in AKWR?

Automated keyword research is a data pipeline that extracts keywords from a variety of sources, such as SEMRush, Google Search Console, and client webpages. The keywords go through a processing pipeline that removes redundant and low-quality keywords, and are then inserted into their respective `kwr_` tables in the database. It is then up to the client to review the keywords and activate them. The pipeline is extensive and in some cases complex, so this document aims to provide a high-level overview of the pipeline and its components.
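
As a rough illustration of the flow described above (extract, remove redundant and low-quality keywords, store as inactive), here is a minimal Python sketch. It is not the actual pipeline code: the `Keyword` class, the `process_keywords` function, the source labels, and the `min_length` quality rule are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    text: str
    source: str            # hypothetical labels, e.g. "semrush", "gsc", "webpage"
    active: bool = False   # keywords stay inactive until the client reviews them


def normalise(text: str) -> str:
    """Lowercase the keyword and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def process_keywords(raw, min_length=3):
    """Filter (text, source) pairs into a list of inactive Keyword objects.

    Assumption: "redundant" means a duplicate after normalisation, and
    "low quality" means shorter than `min_length` characters. The real
    pipeline applies its own rules.
    """
    seen = set()
    kept = []
    for text, source in raw:
        cleaned = normalise(text)
        if len(cleaned) < min_length:
            continue  # assumed low-quality rule
        if cleaned in seen:
            continue  # redundant keyword already collected
        seen.add(cleaned)
        kept.append(Keyword(cleaned, source))
    return kept


raw = [
    ("Football Boots", "semrush"),
    ("football  boots", "gsc"),      # duplicate once normalised
    ("a", "webpage"),                # too short under the assumed rule
    ("tennis rackets", "webpage"),
]
result = process_keywords(raw)
```

In this sketch the first source to supply a keyword wins, and every surviving keyword is created with `active=False`, mirroring the review-then-activate step left to the client.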

Data Pipeline

Within Cubed, two commands are responsible for running the AKWR command hierarchy: `update_kwr_process` and `update_kwr_process_topic_specific`. The former extracts keywords with no topic constraints, while the latter only generates keywords under a specific topic (e.g. football-based keywords). The following diagram illustrates the command hierarchy. Subcommands coloured in orange are run on the rsrv; they can also be identified by `_fabric_` in the command name.

For a given sub command, any keyword data it generates is inserted into both `seogd_keyword` and `seogd_market_keyword` as inactive. Additionally, `kwr_keyword` and `seogd_market_keyword_source` are populated where appropriate. The following diagram illustrates the relationship between these tables.

The table below offers a short summary for each sub command in the AKWR pipeline. For more information on a given sub command, please click on the sub command name.

| Sub Command | Description |
| --- | --- |
| update_fabric_automated_kwr | Extracts keywords from a client's website using the data in the `attrib_page_content` and `attrib_page_element` tables. The script uses BERT embeddings to extract keywords from the content and element data. |
| update_fabric_kwr_keyword_relevance | Determines the relevance of a keyword to a webpage using term frequency-inverse document frequency (TF-IDF). |
| update_kwr_semrush | Uses the SEMRush API to get keyword data such as search volumes, cost per click, competition level, and trends. The script then inserts the data into `kwr_semrush_volume` and `kwr_semrush_trends`. |
| update_related_and_rising_keywords_gtrends | Scrapes Google Trends data to get related and rising keywords. The script then inserts the data into `seogd_market_keyword_source`, `seogd_gtrends_related_rising`, and `seogd_gtrends_related_top`. |
| update_kwr_filter | A stored procedure which applies some final filters to the keywords found in `kwr_semrush_volume`. The filtered data is then inserted into `kwr_keyword`. |
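
To make the TF-IDF relevance idea behind `update_fabric_kwr_keyword_relevance` more concrete, here is a minimal, dependency-free Python sketch. It is an illustration of the general technique only, not the script's implementation: the `tfidf_relevance` function, the tokeniser, the smoothed IDF variant, and the choice to sum token weights for multi-word keywords are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_relevance(keyword, page_index, pages):
    """Score `keyword` against pages[page_index].

    The score is the sum of TF-IDF weights of the keyword's tokens, where
    TF is the token's share of the target page and IDF is a smoothed
    inverse document frequency over all pages (an assumed variant).
    """
    docs = [tokenize(page) for page in pages]
    target = docs[page_index]
    if not target:
        return 0.0
    counts = Counter(target)
    n_docs = len(docs)
    score = 0.0
    for term in tokenize(keyword):
        tf = counts[term] / len(target)
        df = sum(1 for doc in docs if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        score += tf * idf
    return score


# Hypothetical page contents, for illustration only.
pages = [
    "football boots for kids",
    "running shoes for adults",
    "football shirts and football kits",
]
scores = [tfidf_relevance("football", i, pages) for i in range(len(pages))]
```

Here the third page scores highest for "football" because the term makes up a larger share of its tokens, and the second page scores zero because the term never appears on it.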

There are other commands which are not part of the AKWR pipeline but are still related to keyword research. These commands are listed below. For more information on a given command, please click on the command name.

| Command | Description |
| --- | --- |
| update_kwr_mapped_market | Responsible for mapping each path inside `attrib_path` to a market. Either we or the client create regular expressions which match a given path to a market, and the script then inserts the resulting mapping into `kwr_mapped_market`. |
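
The regular-expression mapping performed by `update_kwr_mapped_market` can be pictured with the short Python sketch below. The patterns, market names, and the `map_paths_to_markets` function are hypothetical, and the "first matching pattern wins" rule is an assumption rather than documented behaviour.

```python
import re

# Hypothetical (pattern, market) pairs; real ones are created by us or the client.
MARKET_PATTERNS = [
    (r"^/football(/|$)", "Football"),
    (r"^/tennis(/|$)", "Tennis"),
]


def map_paths_to_markets(paths, patterns):
    """Return {path: market} for every path matched by a pattern.

    Assumption: patterns are tried in order and the first match wins.
    Paths that match no pattern are left unmapped.
    """
    compiled = [(re.compile(pattern), market) for pattern, market in patterns]
    mapping = {}
    for path in paths:
        for regex, market in compiled:
            if regex.search(path):
                mapping[path] = market
                break
    return mapping


paths = ["/football/boots", "/tennis", "/footballers", "/about"]
mapping = map_paths_to_markets(paths, MARKET_PATTERNS)
```

Anchoring each pattern with `(/|$)` keeps `/footballers` from being mapped to the football market by accident, which is the kind of detail worth checking when writing the real expressions.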