For session 2 I had selected Advanced Search Engine Optimization. It sounded right up my alley. My hopes were high.
Search Engine – Crawling, Ranking, Finding
Search Engines do three basic things – crawl, rank, and find.
- Crawling – search engines start with the sitemap.xml and robots.txt files and follow links from there (see the robots.txt sketch after this list)
- Ranking – each page is ranked according to certain criteria: inbound links (basically endorsements from other sites – high quality, low quality, or carrying a penalty) and outbound links; note that subdomains are treated very differently from subdirectories
- Finding – a simple process: check spelling, determine intent, fulfill the request with results, determine result order
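Since the crawl starts with robots.txt, a minimal sketch of one is worth having on hand – the paths and sitemap URL here are invented for illustration:

```
# robots.txt at the site root – among the first things a crawler fetches
User-agent: *                                 # rules below apply to all crawlers
Disallow: /admin/                             # hypothetical area to keep out of the index
Sitemap: http://www.example.com/sitemap.xml   # hand the crawler the full URL list
```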
Building Pages
- Use HTML semantically!! (see the sketch after this list)
- H1 (SEO good) vs spans & styles (SEO bad)
- A – used for ranking; the text in the link is important, so use something descriptive
- H1 – only 1 per page, most important page topic
- Title – critical for determining keywords, relevance, and page content
- Meta tags – the description is what the user will see in the search results… without one, the crawler has to guess at the page’s description based on its content
- JavaScript & CSS – don’t use JavaScript navigation, and host CSS externally
Rich Internet Applications (RIAs)
- These tend to look like black boxes to the search engines
- Noscript tags are your friend!! (example after this list)
- Validate your HTML – it makes the job of the crawler easier
- 3 types of pages, based on your SEO goals
- Monolithic – like mail.live.com – you don’t want it to be searched
- Linkable – bmw.com – each car is a separate page, with a rich experience on that page
- Crawlable – lots of content, all HTML based
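For the noscript tip, here is a minimal sketch of what that fallback can look like – the script-driven experience stays put, while crawlers (and users without JavaScript) get plain HTML links to follow. The script name and store pages are invented:

```html
<div id="store-map"></div>
<script src="/scripts/rich-map.js"></script>
<noscript>
  <!-- Crawlable fallback for the script-only experience above -->
  <h2>Store locations</h2>
  <a href="/stores/seattle">Seattle store</a>
  <a href="/stores/portland">Portland store</a>
</noscript>
```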
HTTP Status Codes
- 200 OK – Page returned just fine without any errors
- 404 Not Found – a friendly not-found page is good for customers, but bad for search engines: unless a real 404 status comes back, the page is never removed from their indexes
- 301 Moved Permanently – instead of throwing the 404, use this for a moved domain, etc. (sketches after this list)
- 302 Moved Temporarily – confusing to users and to search engines; don’t use this
- 304 Not Modified – response to a conditional GET; sent only if the search engine already has the latest version of the page
- 503 Service Unavailable – great to use if your server will be down temporarily for maintenance; the crawler knows to come back
- For more, see the W3C standard for HTTP status codes
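As a sketch of how a couple of these look from ASP.NET – one way to do it, not necessarily what the session showed, and the handler names and URL are invented:

```csharp
using System.Web;

// A page that moved for good: answer with a real 301 and the new location.
public class MovedPageHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.StatusCode = 301;
        context.Response.AddHeader("Location", "http://www.example.com/new-page");
    }
}

// Maintenance window: a 503 plus Retry-After tells the crawler to come back.
public class MaintenanceHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.StatusCode = 503;
        context.Response.AddHeader("Retry-After", "3600"); // seconds until we expect to be back
    }
}
```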
Site Evaluation – Mix Site
- Do a search for mix08 – the visitmix.com site is not the top result
- Across the search engines, the result shows no title and no description
- Blogs are beating out the MIX site!
- The site has a JavaScript redirect – BAD!
- 16,000 inbound links to the index.html page, which has no server-side redirect
- 5,700 inbound links to the default.aspx page – far fewer
- http://www.visitmix.com vs. visitmix.com vs. visitmix.com/default.aspx – all three look different to the crawlers; choose one, stick with it, and 301-redirect the other two (see the sketch below)
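One way to enforce that server-side in ASP.NET is a 301 out of Global.asax – a sketch, assuming www.visitmix.com is the chosen canonical form:

```csharp
using System;
using System.Web;

// Global.asax code-behind: funnel every request to the one canonical host.
public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        Uri url = Request.Url;
        if (!url.Host.Equals("www.visitmix.com", StringComparison.OrdinalIgnoreCase))
        {
            // Permanent redirect so crawlers consolidate links onto one URL
            Response.StatusCode = 301;
            Response.AddHeader("Location", "http://www.visitmix.com" + url.PathAndQuery);
            Response.End();
        }
    }
}
```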
Other Notes
- URL rewriting in ASP.NET – there is a whitepaper by Scott Guthrie
- Soft 404s (a not-found page that returns real content or a redirect to the homepage) are great for users but bad for search engines. There is a workaround for soft 404s (see the sketch after this list)
- Case matters in URLs, particularly on Apache, Mono, etc.
- There is another whitepaper called How to optimize Silverlight for search – read it!!
- A good way to make XAML understandable to crawlers is to create an XSLT transform that reflects the XAML
- There was a Tools Review slide – we are already using most of those
- Gatineau – Microsoft adCenter Analytics – a competitor to Google Analytics (maybe this plays nicer with .NET and Silverlight?)
- Canonicalization of URLs – decide whether to include www or not, and force a server-side redirect to stay consistent
- Cloaking is bad
- Subdomains do not carry PageRank juice the way a directory or subpage will
- Underscores in URLs are bad (search engines run the words together as one, which hurts usability); dashes are better
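On the soft-404 workaround: the usual fix is to keep the friendly content but report a real 404 status, so users still get help while the engines drop the dead URL. A sketch as ASP.NET code-behind – the page name is hypothetical:

```csharp
using System;
using System.Web.UI;

// Code-behind for a hypothetical friendly error page, e.g. /notfound.aspx
public partial class NotFound : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // The helpful markup still renders, but crawlers see a true 404
        Response.StatusCode = 404;
        // Keeps IIS7's custom-errors feature from swapping in its own page
        Response.TrySkipIisCustomErrors = true;
    }
}
```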
Conclusion
I had high expectations for this session, since its title promised advanced topics in SEO, but most of it was review. The more interesting things I walked away with were the brief tips on Silverlight and the importance of canonicalizing your site’s URLs.