While stumbling around the InterWeb, I came across this really neat page. The page, by Jonas John, describes htmlSQL, a PHP class for querying the web with an SQL-like language. I know, I know, this is PHP, not C#. But the idea really interested me. I have spent a lot of time focusing on WatiN, which does a similar thing: it lets you take control of objects on the page to fill in forms, click buttons, etc. Getting an element in WatiN requires you to search the page using the Find methods. One of the troubles I have had is making sure that a find returns one, and only one, page element. Being able to query the page with htmlSQL would be a great way to verify that. A class like this would make it very easy to leverage the page DOM on the fly.
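To make the "one, and only one" check concrete, here is a rough sketch of the idea. It is not htmlSQL and not WatiN; it is plain Python using only the standard library's html.parser, and the names ElementCounter, count_matches, is_unique, and the sample PAGE markup are all made up for illustration. It simply counts the elements that match a tag name and a set of attribute values, so you can tell whether a find condition is ambiguous before you rely on it.

```python
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Collects every start tag that matches a tag name and attribute filter."""

    def __init__(self, tag, wanted):
        super().__init__()
        self.tag = tag          # tag name to look for, e.g. "input"
        self.wanted = wanted    # dict of attribute name -> required value
        self.matches = []       # attribute dicts of the matching elements

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)
        if tag == self.tag and all(
            found.get(name) == value for name, value in self.wanted.items()
        ):
            self.matches.append(found)


def count_matches(html, tag, wanted):
    """Return how many elements in the markup satisfy the condition."""
    parser = ElementCounter(tag, wanted)
    parser.feed(html)
    parser.close()
    return len(parser.matches)


def is_unique(html, tag, wanted):
    """True only when exactly one element satisfies the condition."""
    return count_matches(html, tag, wanted) == 1


# Hypothetical page: two submit buttons share the same name.
PAGE = """
<form>
  <input type="text" name="q" id="search">
  <input type="submit" name="go" id="submit">
  <input type="submit" name="go" id="submit-again">
</form>
"""

if __name__ == "__main__":
    print(is_unique(PAGE, "input", {"id": "search"}))   # condition is specific enough
    print(is_unique(PAGE, "input", {"name": "go"}))     # condition matches two elements
```

In the sample page, searching by id="search" pins down a single element, while searching by name="go" hits both submit buttons, which is exactly the kind of ambiguous find I would want to catch.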
Do you see any other uses for a class like this? Are there other products like this? What are your thoughts about htmlSQL?
Similar software products exist, but not yet as a language library, as far as I know.
See this link for "Web QL" by Caesius software: http://www.reviewsonline.com/articles/976222123.htm
More on the web data extraction process: http://zillman.blogspot.com/2004/09/web-data-extractors.html