The latest Google update is here, and I wanted to share a few ideas to help you take advantage of it. BERT was pre-trained on English Wikipedia (2,500M words), among other sources, and it's now helping Google deliver better 'question answering' in the results.
After a long hiatus from writing anything SEO related, I'm back, as this topic has really got me interested again in what Google is up to.
I'll cover what we, as SEOs, can do about this new BERT update and what we should be doing for our clients to future-proof their rankings.
Just as a note, I've based these thoughts largely on Google's paper and GitHub page, sprinkled with what I've seen 'in the wild' over the last few years. References are at the end of the post.
What is BERT according to Google’s own page?
BERT is a method of pre-training language representations, meaning that we train a general-purpose “language understanding” model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP.
What is the BERT Google update for us, SEOs?
The BERT Google update is a strong move towards more Natural Language Processing from Google, showing us SEOs that we can't ignore the intent, context and semantics of any written content on our clients' websites.
A few terms to understand first
Pre-training is the stage where BERT was initially trained: Google taught BERT a general model of language by having it predict masked words and next sentences across millions of sentences (see the quick sketch after these terms).
NLP = Natural Language Processing
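To make 'pre-training' a little more concrete, here's a minimal Python sketch – my own illustration using the open-source Hugging Face transformers library, not anything taken from Google's paper – that asks a pre-trained BERT model to fill in a masked word, which is the 'masked language modelling' task BERT learns during pre-training:

```python
# Minimal sketch: ask a pre-trained BERT model to predict a masked word.
# Assumes the open-source Hugging Face `transformers` library is installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the hidden [MASK] token from its surrounding context.
for prediction in fill_mask("Google rolled out the BERT [MASK] to improve search results."):
    print(f"{prediction['token_str']:>12}  (score: {prediction['score']:.3f})")
```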
OK, enough with the theory, what are the 5 actionables?
1 Google BERT is all about 'question answering', so give Google questions and answers.
Add FAQs to your key pages, list questions and answers – mark them up using FAQ schema.
Here's why Google cares: voice search and the ongoing training of BERT both lean heavily on 'question answering' – the task Google cares about most.
(Source: https://github.com/google-research/bert)
Here's an example of a MoneySuperMarket page that uses FAQs:
https://www.moneysupermarket.com/breakdown-cover/
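And here's roughly what the FAQ schema markup behind a page like that can look like. This is a Python sketch of my own that builds schema.org FAQPage JSON-LD with made-up questions and answers – not MoneySuperMarket's actual markup – and the output would be embedded in the page inside a script tag of type application/ld+json:

```python
# Sketch: build schema.org FAQPage JSON-LD from a list of (question, answer)
# pairs. The Q&A content below is a made-up placeholder.
import json

faqs = [
    ("What is breakdown cover?",
     "Breakdown cover is a service that sends help if your car stops working."),
    ("Does breakdown cover include home start?",
     "Some policies include home start; check the policy details before you buy."),
]

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for question, answer in faqs
    ],
}

# Paste the printed JSON into <script type="application/ld+json"> on the page.
print(json.dumps(faq_schema, indent=2))
```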
2 Optimising for entities and tokens: is your content 'deep' enough or just shallow?
Google is really going after sentence prediction and named entity recognition. The more relevant entities you mention, the 'deeper' the support for your content will be.
(Devlin et al., 2018)
Here's an example of the depth of content your best-ranking competitors may have in their copy:
Let's take 'blue widgets' as an example. Competitors mention buying, installing, maintaining and cleaning a 'blue widget' three times per week with a special blue widget oil, diluted to 5% in water for that extra blue widget shine, and they note that the key part of the 'blue widget' – the blue widgetator – should be tuned to 10 degrees south when the blue widget is operating.
If your text just says 'we sell the best blue widgets and the oil to clean them', and then you talk for 300 words about how your blue widgets are cleaner and shinier than theirs, it's not going to work. Google will look at their copy, analyse it based on the entities, concepts and context it contains, understand that their copy goes way DEEPER into the topic and answers the most common questions that come up during the 'blue widget' buying journey, and rank theirs instead of yours.
So the whole notion that 'you can't improve your SEO to work with Google's algorithms' is not true. We can improve it; it just takes a whole different approach to presenting data and helping users through your content.
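If you want a rough, do-it-yourself check of how entity-rich your copy is compared with a competitor's, here's a small Python sketch. It uses the open-source spaCy library purely as an illustration – Google obviously doesn't expose how BERT analyses your page – to pull out the distinct named entities and noun phrases mentioned in a block of copy:

```python
# Rough illustration (not Google's tooling): compare how many distinct
# entities and noun phrases two blocks of copy mention, using spaCy.
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def content_depth(copy_text):
    """Return the distinct named entities and noun phrases found in the copy."""
    doc = nlp(copy_text)
    entities = {ent.text.lower() for ent in doc.ents}
    noun_phrases = {chunk.text.lower() for chunk in doc.noun_chunks}
    return entities, noun_phrases

your_copy = "We sell the best blue widgets and the oil to clean them."
their_copy = ("They cover buying, installing, maintaining and cleaning a blue widget "
              "three times per week with blue widget oil diluted to 5% in water, and "
              "tuning the blue widgetator to 10 degrees south during operation.")

for label, text in (("your copy", your_copy), ("their copy", their_copy)):
    entities, phrases = content_depth(text)
    print(f"{label}: {len(entities)} entities, {len(phrases)} noun phrases")
```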
3 Ensure that all your targeting is in the main copy
Not just in headings, tables or lists – it MAY BE that Google pays closer attention to the main text (sentences and paragraphs) than to list items, tables or headings.
(Devlin et al., 2018)
BERT was 'pre-trained' largely on body copy – the paper notes that lists, tables and headers were ignored when extracting the Wikipedia text. My inclination is that Googlebot may treat your website similarly. Although Direct Answers very often pull from tables and lists, I think those elements may not carry as much weight as a ranking factor in themselves.
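If you want a quick way to audit where your targeting actually sits on an existing page, here's a rough Python sketch of my own using the BeautifulSoup library on a made-up HTML snippet. It simply counts how often a target phrase appears in paragraphs versus headings, list items and table cells:

```python
# Rough audit sketch: count where a target phrase appears on a page
# (paragraphs vs headings, list items and table cells). The HTML is a placeholder.
from bs4 import BeautifulSoup

html = """
<h2>Blue widget cleaning</h2>
<ul><li>Blue widget oil</li><li>Soft cloth</li></ul>
<p>Clean your blue widget three times per week with diluted blue widget oil.</p>
"""

target = "blue widget"
soup = BeautifulSoup(html, "html.parser")

for element in ("p", "h2", "li", "td"):
    texts = [tag.get_text(" ", strip=True).lower() for tag in soup.find_all(element)]
    mentions = sum(text.count(target) for text in texts)
    print(f"<{element}> mentions of '{target}': {mentions}")
```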
4 Write some of your content Wikipedia-style
As per Google's paper, BERT was taught largely using Wikipedia. Wikipedia is full of:
- Internal links
- References
- Fact checking
- Deep factual presentation
- Headings
- Meaningful paragraphs delivering a point in each (not just writing for the sake of it).
These are the kinds of aspects that I can already see playing a large part in helping pages rank, and that we can take advantage of.
I'm not saying that every single page has to look like a Wikipedia page, but looking at some of the best-ranking sites across multiple industries, I can see how 'heavy' and meaningful their copy is, and how many facts and terms they present with good interlinking. This is not 'brand new information'; it's just another move from Google towards rewarding that approach.
5 Sentences, paragraphs and next steps (next sentences)
In addition to optimising for keywords and intents, we should optimise the copy for sentences, context and the 'next step' a user may want to take.
Google has mentioned multiple times that BERT is all about understanding sentences better.
If your copy only talks in short bullets and doesn't include 'a flow of thought', it may perform worse.
The analysis also talks a lot about ‘next sentence prediction’.
(Devlin et al., 2018)
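To show what 'next sentence prediction' actually means, here's a small Python sketch – my own illustration using the Hugging Face transformers library, not an SEO tool Google provides – where BERT's next-sentence-prediction head scores how likely it is that one sentence follows another. Copy with a logical flow of thought is exactly what this part of the model was built around:

```python
# Sketch: score whether sentence B plausibly follows sentence A using BERT's
# next-sentence-prediction head (Hugging Face `transformers` + PyTorch).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "Blue widgets should be cleaned three times per week."
good_follow_up = "Use a 5% solution of blue widget oil in water for the best shine."
random_follow_up = "The weather in London was surprisingly warm last Tuesday."

for sentence_b in (good_follow_up, random_follow_up):
    encoding = tokenizer(sentence_a, sentence_b, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits
    # Index 0 = "B is a continuation of A", index 1 = "B is a random sentence".
    prob_is_next = torch.softmax(logits, dim=1)[0, 0].item()
    print(f"P(is next) = {prob_is_next:.2f}  ->  {sentence_b}")
```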
I hope you enjoyed this read and that you can and will apply some of these ideas.
If you have any need for an SEO consultant (in London or anywhere else), drop me a message and let's arrange a time to speak.
References:
Google’s paper: https://arxiv.org/pdf/1810.04805.pdf
Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
GitHub page for the project: https://github.com/google-research/bert
A great Medium post on BERT: https://medium.com/the-artificial-impostor/news-topic-similarity-measure-using-pretrained-bert-model-1dbfe6a66f1d
And part 2 from 2018 dissecting BERT: https://medium.com/dissecting-bert/dissecting-bert-part2-335ff2ed9c73