Wednesday, April 25, 2012

Co-occurrence, Predictive Phrases and LSI

Co-occurrence is the percentage of web pages containing the main theme keyword (or keyphrase) that also contain a secondary keyword (synonym).

Does this sound a little complicated?
Well, it can be if you want to dig through mountains of technical jargon, Google patent applications and geeky mathematical calculations.
Here is a simple explanation:
Theme of Page - The main keyword or theme of any given web page in your website
Keyword (synonym to the theme of page keyword) - Using the LSI concept, you would not only use the main theme keyword in your pages but also include synonyms of that keyword in your content.
Co-occurrence is the percentage (%) of web pages containing the Theme of the page (keyword) that ALSO contain the keyword (synonym).
For example...
Let's say you have a page whose main keyword (theme) is "cats". Also you have included a synonym for this keyword in your content to establish "theme density" to take advantage of the LSI theory. The synonym you choose is "kittens".
Here is a snippet of what the content of your page may look like:
Cats
Cats are wonderful pets that have been co-existing with humans for thousands of years. Cats were first domesticated in ancient Egypt where they were raised from kittens to co-exist with their owners...
As you can see in the sample above, the main theme of the page is "cats" and in the body of the content a synonym "kittens" is added to give more theme density to the article.

Co-occurrence Explained

Watch this video: co-occurrence
In order to "roughly" figure out the co-occurrence you can use Google.
  1. Find the number of competing pages for the word "cats"
  2. Find the number of competing pages for the word "kittens"
  3. Find the number of competing pages that contain both of the words "cats" and "kittens"
  4. Divide the number of pages that contain both "cats" and "kittens" by the number of pages that come up for the word "cats"
Pages with "cats" & "kittens" DIVIDED BY pages with "cats" = co-occurrence
At the time of this writing I did a search on Google to reveal:
"cats" has 102,000,000 pages
"kittens" has 12,200,000 pages
"cats" and "kittens" has 1,860,000 pages
I then divide 1,860,000 (number of pages that contain cats and kittens in the content) by 102,000,000 (number of pages that have the word cats). This gives me a co-occurrence percentage of about 1.8%.
This means that, of all the pages indexed by Google that are themed for the word "cats", about 1.8% also have the word "kittens" within their content.
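If you would rather let a script do the division, here is a minimal sketch in Python. The function name `co_occurrence` is made up for this example, and the counts are simply the Google numbers quoted above typed in by hand; nothing here queries Google.

```python
def co_occurrence(pages_with_both: int, pages_with_theme: int) -> float:
    """Return the share of theme pages that also contain the synonym (0.0 to 1.0)."""
    if pages_with_theme <= 0:
        raise ValueError("pages_with_theme must be a positive count")
    return pages_with_both / pages_with_theme


# Result counts quoted in the post (entered by hand, not fetched).
cats = 102_000_000      # pages for "cats"
both = 1_860_000        # pages containing "cats" and "kittens"

ratio = co_occurrence(both, cats)
print(f"Co-occurrence of 'kittens' on 'cats' pages: {ratio:.2%}")
```

Search engine result counts are estimates and change over time, so treat the figure as a rough guide only.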
The higher the co-occurrence, the better and more relevant your secondary keywords (synonyms) will be.
Watch this video: Figuring Co-occurrence
To dig a bit deeper into the theory of co-occurrence and the Google patent applications behind it, please visit: Phrase Based Information Retrieval

Carefully choose the theme synonyms for your content

Be sure that the co-occurrence is as high as possible, and avoid using the same core keyword over and over again, as this may trip Google's spam flag.
Remember... This is all theory, but the theory is based on research of Google's patent applications. So in order to protect yourself against future or present algorithm changes, it is best to adhere to this theory.
This theory makes your content more relevant, which has been proven to produce higher rankings, so what do you have to lose?
The only reason we call all of this "theory" is because Google has not publicly stated that these algorithms are in effect. However, careful testing suggests that they are in action, and the theory stands up to real-world results.

Predictive Phrases

A phrase (or keyword) that co-occurs with other words (synonyms) may be a "predictive phrase". A predictive phrase is a phrase (or keyword) that "predicts" the occurrence of other words or phrases.
As in the "cats" example above, the word "cats" may predict that the word "kittens" will also appear on about 1.8% (1,860,000 divided by 102,000,000) of the pages that are themed for the word "cats".
William Slawski from SEO By The Sea says:
 
An example of the predictive ability of good phrases: The phrase “President of the United States” predicts other phrases such as “George Bush” and “Bill Clinton.”
Other phrases may not be predictive, such as “fell down the stairs” or “top of the morning,” “out of the blue.” Idioms and colloquialisms like these are widely used, and often appear with many other different and unrelated phrases. Looking at how frequently phrases co-occur on individual pages, within the whole collection of indexed pages, can tell us whether or not the appearance of one phrase might be used to predict the appearance of another.
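
To make the idea in the quote concrete, here is a rough sketch in Python. It compares how often two phrases actually appear together with how often they would appear together if they were unrelated. This ratio (often called "lift") is my own assumption for illustration, not a formula taken from the quote or confirmed by Google. The function name `lift` and the six toy "pages" are invented for the example.

```python
def lift(pages: list[str], phrase_a: str, phrase_b: str) -> float:
    """Observed co-occurrence divided by the co-occurrence expected by chance.

    Values well above 1.0 suggest phrase_a helps predict phrase_b.
    Uses simple substring matching, which is crude but enough for a sketch.
    """
    n = len(pages)
    has_a = sum(phrase_a in p for p in pages)
    has_b = sum(phrase_b in p for p in pages)
    has_both = sum(phrase_a in p and phrase_b in p for p in pages)
    if has_a == 0 or has_b == 0:
        return 0.0
    # (has_both / n) / ((has_a / n) * (has_b / n)), simplified
    return (has_both * n) / (has_a * has_b)


# A tiny, made-up collection of lowercase pages.
pages = [
    "cats and kittens make good pets",
    "kittens grow into cats",
    "cats sleep most of the day",
    "out of the blue it started to rain",
    "the stock price fell out of the blue",
    "dogs need daily walks",
]

print(lift(pages, "cats", "kittens"))          # related phrases
print(lift(pages, "out of the blue", "cats"))  # idiom, unrelated here
```

On a real index you would need page counts for the whole collection, which a public search engine does not reliably give you, so this only shows the shape of the calculation.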
 
I will not go into any more discussion about this publicly. We will be teaching this within a structured environment at University 20/20.

Theme Density

As you may have already learned in The Master Plan, the game of high rankings is no longer about Keyword Density (the number of times a specific keyword appears on a page to make the page "relevant" to that keyword). It IS about Theme Density, which is the number of synonyms related to the "core theme" that appear on a page.
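
Here is one rough way to put a number on theme density, sketched in Python. The exact formula is an assumption on my part: it counts the words on a page that come from a list of theme synonyms and divides by the total word count. The function name `theme_density` and the synonym list are invented for illustration.

```python
import re


def theme_density(text: str, synonyms: set[str]) -> float:
    """Fraction of the words in text that appear in the synonym set."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in synonyms)
    return hits / len(words)


# Hypothetical synonym list for a page themed on "cats".
cat_synonyms = {"kittens", "kitten", "feline", "felines"}

sample = "Cats love kittens and cats"
print(f"Theme density: {theme_density(sample, cat_synonyms):.0%}")
```

The main keyword itself is left out of the synonym list on purpose, so the number reflects the supporting words and not plain keyword density.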
