How to Improve Your Google Ranking: Myths and Reality

Ao-Jan Su

Y. Charlie Hu

Aleksandar Kuzmanovic

Cheng-Kok Koh

Northwestern University

Purdue University

Motivation

● Internet search engines (e.g., Google) drive users to highly ranked pages
● Search engines' ranking results greatly influence how people acquire knowledge from the Internet [Pan ’07]
● It is desirable to understand how a search engine ranks web pages
● Search engines' ranking algorithms are proprietary
  ■ Publicly available information is very limited and outdated

Ao-Jan Su How to Improve Your Google Ranking: Myths and Reality

Current Approaches

● Guesswork by webmasters
  ■ Trial and error
  ■ Inefficient
● Based on the experience of search engine optimization (SEO) experts

Lack of systematic studies leads to folklore

Various Ranking Feature Opinions

[Figure: ranking-feature opinions from SEO experts, a survey of marketing experts, and Internet users]

Goals & Challenges

● Goals
  ■ Systematically approximate a search engine's ranking results
  ■ Identify the importance of ranking factors
● Reverse-engineering a search engine's ranking algorithm can be very complicated
  ■ Numerous ranking factors: Google claims to have over 200 ranking factors
  ■ Sophisticated ranking functions

Our Approach

● Build our own ranking system to approximate search engines' ranking results
  • Learning models: linear programming, SVM
  • Recursive partitioning algorithm: capture the non-equational behavior of ranking functions
  • New ranking system: generate our own ranking results and compare them with Google's

System Architecture

● Components of our ranking system
  ■ Crawler
  ■ Ranking engine
Can we approximate Google's ranking results (top 10 pages) by using our own ranking system?

Ranking Features

Learning Models

● Linear programming model
  ■ Minimize the distance between our ranking system and Google's
  ■ Minimize the objective function
    $$\min_{W} \sum_{i=1}^{n} \sum_{j=i+1}^{n} c_{ij}\, D(i,j)$$
  ■ D(i, j): ranking difference between the 2 pages; out of order => penalty
  ■ c_{ij}: weight emphasizing highly ranked pages
● SVM model
  ■ General technique for learning-to-rank problems
  ■ Supports linear and polynomial kernels
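
As a rough illustration of the objective above, the sketch below evaluates the weighted pairwise penalty for one candidate weight vector. It assumes a linear score (the dot product of weights and page features), takes D(i, j) to be a hinge-style penalty max(0, s_j - s_i) for pages listed in Google's order, and uses a hypothetical c_ij = 1/(i+1) to emphasize highly ranked pages; none of these exact choices come from the slide, and the function and feature names are illustrative only.

```python
from itertools import combinations


def score(weights, features):
    """Linear ranking score: dot product of feature weights and page features."""
    return sum(w * f for w, f in zip(weights, features))


def pair_weight(i, j):
    """Hypothetical c_ij: pairs involving a higher-ranked page i count more.

    Ranks are 0-based, so i = 0 is Google's top result; j is unused here.
    """
    return 1.0 / (i + 1)


def ranking_objective(weights, pages):
    """Sum of c_ij * D(i, j) over all pairs i < j.

    `pages` holds feature vectors already sorted by Google's rank. D(i, j) is
    zero when our scores keep page i at or above page j, and otherwise the
    amount by which page j outscores page i (the out-of-order penalty).
    """
    total = 0.0
    for i, j in combinations(range(len(pages)), 2):
        s_i = score(weights, pages[i])
        s_j = score(weights, pages[j])
        d_ij = max(0.0, s_j - s_i)
        total += pair_weight(i, j) * d_ij
    return total


# Hypothetical features per page: [pagerank, keyword_in_title, keyword_in_hostname]
pages = [[0.9, 1, 1], [0.6, 1, 0], [0.3, 0, 0]]
print(ranking_objective([1.0, 1.0, 1.0], pages))   # every pair stays in Google's order
print(ranking_objective([-1.0, 0.0, 0.0], pages))  # reversed order is penalized
```

Turning this into a linear program would replace each max(0, ·) with a slack variable ξ_ij ≥ 0 constrained by ξ_ij ≥ W·(x_j - x_i), plus some normalization on W so that W = 0 is not a trivial solution; an off-the-shelf LP solver such as scipy.optimize.linprog could then minimize Σ c_ij ξ_ij.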

Recursive Partitioning Algorithm

● Multiple layers of indices
● Non-equational ranking algorithm
  ■ Train or apply ranking models and continue the recursion
  ■ The algorithm ends when we find the top X pages
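
One way to read the recursion described above is as a cascade: each round ranks the surviving candidates with its own linear model and keeps only the best ones, and the last round returns the top X pages. The sketch below assumes that interpretation; the per-round weight vectors and the cutoff schedule (how many pages survive each round) are hypothetical inputs rather than values from the talk, and in practice each round's weights would be trained on the pages that reached it.

```python
def linear_score(weights, features):
    """Dot product of a round's feature weights with one page's features."""
    return sum(w * f for w, f in zip(weights, features))


def recursive_partition(pages, round_weights, cutoffs):
    """Rank pages in rounds, keeping the top cutoffs[r] pages after round r.

    pages: dict mapping page id -> feature vector
    round_weights: one weight vector per round (hypothetical, one model per round)
    cutoffs: number of pages that survive each round; the last value is X
    Returns the ids of the final top-X pages, best first.
    """
    if len(round_weights) != len(cutoffs) or not cutoffs:
        raise ValueError("need one cutoff per round and at least one round")
    weights, keep = round_weights[0], cutoffs[0]
    ranked = sorted(pages, key=lambda pid: linear_score(weights, pages[pid]), reverse=True)
    survivors = ranked[:keep]
    if len(cutoffs) == 1:
        return survivors  # the recursion ends once the top X pages are found
    subset = {pid: pages[pid] for pid in survivors}
    return recursive_partition(subset, round_weights[1:], cutoffs[1:])


pages = {
    "a": [0.9, 0, 1],
    "b": [0.2, 1, 1],
    "c": [0.5, 1, 0],
    "d": [0.1, 0, 0],
}
# Round 1 ranks by the first feature only; round 2 uses a different weighting.
print(recursive_partition(pages, [[1, 0, 0], [0, 1, 0.5]], [3, 2]))  # ['b', 'c']
```

Because a different weight vector applies in each round, the overall ordering need not be expressible as a single linear function, which is one way to capture non-equational ranking behavior.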

Experimental Evaluation

● Evaluate different ranking models
  ■ Which model has better prediction accuracy?

● Evaluate the effectiveness of the recursive partitioning algorithm
  ■ Can the recursive partitioning algorithm improve prediction accuracy?

● Evaluate the relative weights of ranking features
  ■ Which ranking features are most important?

Experimental Setup

● Crawl the top 100 pages for each of 60 random keywords
● Randomly select 15 keywords as the training set and use the remaining 45 keywords as the test set
● Evaluate the accuracy of our ranking system by predicting Google's top 10 pages for each keyword in the test set
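
The setup above can be sketched as a keyword split plus a per-keyword accuracy score. The slides do not define the accuracy metric here, so the sketch assumes it is the fraction of Google's top 10 pages that also appear in our system's top 10; fraction_meeting then reports the share of test keywords that reach a given accuracy, the form in which results such as "80% for 92% of evaluated keywords" are stated. Function names and the fixed random seed are illustrative assumptions.

```python
import random


def split_keywords(keywords, n_train=15, seed=0):
    """Randomly pick n_train keywords for training; the rest form the test set."""
    shuffled = list(keywords)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def top_k_accuracy(our_ranking, google_ranking, k=10):
    """Assumed metric: share of Google's top-k pages that also appear in our top-k."""
    google_top = google_ranking[:k]
    if not google_top:
        return 0.0
    our_top = set(our_ranking[:k])
    return sum(1 for page in google_top if page in our_top) / len(google_top)


def fraction_meeting(accuracies, threshold):
    """Share of keywords whose accuracy is at least `threshold`."""
    return sum(1 for a in accuracies if a >= threshold) / len(accuracies)


keywords = [f"keyword{i}" for i in range(60)]
train, test = split_keywords(keywords)
print(len(train), len(test))  # 15 45
print(top_k_accuracy(list("abcdefghij"), list("abcdefgxyz")))  # 0.7
```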

Comparisons of Ranking Models

Our customized linear learning model performs better than the SVM-linear model.
The polynomial model performs better than both linear models.

At the cost of:

The Power of Recursive Partitioning

The recursive partitioning algorithm improves the accuracy of the ranking system in every round.
Three rounds of recursive partitioning successfully “smooth out” the non-linearity of Google's ranking algorithm and achieve high prediction accuracy.

Weights in Different Rounds in a Linear Model

PageRank score, keyword in title, and keyword in hostname are the top 3 ranking features.
A keyword in the meta-description tag matters, but a keyword in the meta-keywords tag does not.
In different rounds, the learning model produces a different set of weights.
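
To make the top features above concrete, the sketch below derives keyword-match features from a crawled page using Python's standard html.parser and urllib.parse modules. It is an assumed construction, not the authors' crawler: matching is a case-insensitive substring test, the hostname test ignores dots and hyphens, and the PageRank score is passed in because it cannot be computed from a single page. The feature and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HeadParser(HTMLParser):
    """Collects the <title> text and the description/keywords meta tag contents."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def compact(text):
    """Lowercase and keep only letters and digits (assumed hostname matching rule)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def extract_features(url, html, keyword, pagerank):
    """Hypothetical feature vector for one (page, keyword) pair."""
    parser = HeadParser()
    parser.feed(html)
    parser.close()
    kw = keyword.lower()
    host = urlparse(url).hostname or ""
    return {
        "pagerank": pagerank,  # supplied externally
        "keyword_in_title": int(kw in parser.title.lower()),
        "keyword_in_hostname": int(compact(kw) in compact(host)),
        "keyword_in_meta_description": int(kw in parser.meta.get("description", "").lower()),
        "keyword_in_meta_keywords": int(kw in parser.meta.get("keywords", "").lower()),
    }


html = (
    "<html><head><title>Cheap Flights to Paris</title>"
    '<meta name="description" content="Find cheap flights fast.">'
    '<meta name="keywords" content="travel, airfare">'
    "</head><body></body></html>"
)
feats = extract_features("https://www.cheap-flights.example/deals", html, "cheap flights", 0.42)
print(feats)
```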

Case Studies

● Can we improve our ranking system's accuracy by isolating a subset of ranking features?
  ■ Example: remove the age factor by focusing on “young” pages
● Can we use our ranking system to detect biases in search engines' ranking algorithms?
  ■ Example: blogs
● Can we validate or disprove new ranking features?
  ■ Example: HTML syntax errors

Isolating Subsets of Ranking Features

We crawl web pages that are at most 24 hours old to remove the age factor. With this subset of ranking features isolated, our ranking system performs better: its accuracy improves to 80% for 92% of the evaluated keywords.
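
Restricting the crawl to young pages amounts to a simple age filter. The sketch below assumes each crawled page comes with a timezone-aware "first seen" timestamp (a hypothetical field; the slides do not say how page age was determined) and keeps only pages that are at most 24 hours old.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)


def young_pages(pages, now):
    """Return URLs of pages first seen no more than MAX_AGE before `now`.

    pages: iterable of (url, first_seen) pairs with timezone-aware datetimes
    (hypothetical input format). Timestamps later than `now` are skipped.
    """
    return [url for url, first_seen in pages if timedelta(0) <= now - first_seen <= MAX_AGE]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
crawled = [
    ("https://example.com/a", now - timedelta(hours=3)),
    ("https://example.com/b", now - timedelta(hours=24)),
    ("https://example.com/c", now - timedelta(hours=30)),
]
print(young_pages(crawled, now))  # ['https://example.com/a', 'https://example.com/b']
```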

Negative Bias Toward Blogs

We categorized web pages into different categories (e.g., blogs,

HTML Syntax Errors do not Matter

We add HTML syntax errors as a new ranking feature (hypothesis); the ranking feature does not make an impact.

Conclusions

● In this work, we show that it is possible to systematically approximate Google's ranking results with high accuracy
  ■ By a linear learning model combined with a recursive partitioning scheme
● We reveal the relative importance of ranking features in Google's ranking function
● We illustrate that our system can validate or disprove ranking features and detect ranking bias

Thank you!

Backup Slides

Linear Programming Model

Query Keywords
