Definition:
Screen scraping is a technique in which a computer program extracts data from the HTML output of another program.
Further details:
Screen scraping is fairly easy in ASP.NET. The only thing you need to know is how to retrieve HTML from web pages dynamically, and the .NET library makes it easy to acquire the HTML from a site.
Steps to acquire the HTML:
1. Create a WebRequest for the URL, get its WebResponse, and feed the response stream into an instance of StreamReader.
2. Read the stream line by line, skip the empty lines, and append the remaining lines to a StringBuilder using the StringBuilder.Append method.
3. Finally, convert the StringBuilder to a string to get the entire HTML.
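Steps 2 and 3 can be tried without a network connection. The following is a minimal sketch, assuming a StringReader stands in for the StreamReader that wraps the response stream; the names HtmlCollapser and CollapseLines are hypothetical and chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlCollapser
{
    // Reads the input a line at a time, skips empty lines, and joins
    // the remaining lines into a single string (steps 2 and 3).
    public static string CollapseLines(TextReader reader)
    {
        var builder = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0)
                builder.Append(line);
        }
        return builder.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: a made-up in-memory page replaces the real response stream.
        using (var reader = new StringReader("<html>\n\n<body>Hi</body>\n</html>"))
        {
            // Prints: <html><body>Hi</body></html>
            Console.WriteLine(HtmlCollapser.CollapseLines(reader));
        }
    }
}
```

Because the helper accepts any TextReader, the same method would also work with the StreamReader created in step 1.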
The question is: is there any use for this HTML output? Of course, yes. On certain occasions you need to build a web site that displays some information contained in the web page being scraped.
For example, suppose you want to display on your site the stock price for Microsoft, which is shown at the URL "http://finance.yahoo.com/q?s=msft".
The following code snippets implement this example:
ASPX Code:
Last Trade: <asp:Label ID="lblPrice" runat="server"></asp:Label>
A function to acquire the HTML:

    // Namespaces needed at the top of the code-behind file
    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    private string AquireHTML()
    {
        string strLine;
        string strHTML = string.Empty;
        string strURL = "http://finance.yahoo.com/q?s=msft";
        StreamReader oStreamReader = null;
        StringBuilder oStringBuilder = new StringBuilder();

        try
        {
            // Open the requested URL
            WebRequest oWebRequest = WebRequest.Create(strURL);

            // Get the stream from the returned web response
            oStreamReader = new StreamReader(oWebRequest.GetResponse().GetResponseStream());

            // Read the stream a line at a time and place each one into the StringBuilder
            while ((strLine = oStreamReader.ReadLine()) != null)
            {
                // Ignore blank lines
                if (strLine.Length > 0)
                    oStringBuilder.Append(strLine);
            }

            // Cache the streamed site now so it can be used without reconnecting later
            strHTML = oStringBuilder.ToString();
        }
        catch (WebException)
        {
            // The page could not be retrieved; fall through and return an empty string
        }
        catch (IOException)
        {
            // The stream failed while reading; fall through and return an empty string
        }
        finally
        {
            // Finished with the stream, so close it now
            if (oStreamReader != null)
                oStreamReader.Close();
        }

        return strHTML;
    }

At Code-Behind page:

    protected void Page_Load(object sender, EventArgs e)
    {
        // Assumption: the two marker strings were lost from the original listing.
        // "<b>" and "</b>" are placeholders for the tags that surround the price.
        const string strOpenTag = "<b>";
        const string strCloseTag = "</b>";

        int intPos1, intPos2, intPos3;
        string strHTML = AquireHTML();

        if (strHTML != string.Empty)
        {
            // Locate the label, then the markers that follow it
            intPos1 = strHTML.IndexOf("Last Trade:", 0);
            if (intPos1 < 0) return;
            intPos2 = strHTML.IndexOf(strOpenTag, intPos1);
            if (intPos2 < 0) return;
            intPos3 = strHTML.IndexOf(strCloseTag, intPos2);
            if (intPos3 < 0) return;

            // Take only the text between the opening and closing markers
            lblPrice.Text = strHTML.Substring(intPos2 + strOpenTag.Length,
                                              intPos3 - intPos2 - strOpenTag.Length);
        }
    }
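The index arithmetic in Page_Load is easy to get wrong, so it may help to isolate it in a small helper that can be checked against a known string. This is only a sketch: the names MarkerExtractor and ExtractAfter are hypothetical, and the sample markup and the "<b>"/"</b>" markers are made up for illustration, not taken from the real page.

```csharp
using System;

public static class MarkerExtractor
{
    // Returns the text between openTag and closeTag that follows label,
    // or null when any of the three markers cannot be found.
    public static string ExtractAfter(string html, string label, string openTag, string closeTag)
    {
        int labelPos = html.IndexOf(label, StringComparison.Ordinal);
        if (labelPos < 0) return null;

        int openPos = html.IndexOf(openTag, labelPos + label.Length, StringComparison.Ordinal);
        if (openPos < 0) return null;

        // The value starts right after the opening marker
        int valueStart = openPos + openTag.Length;
        int closePos = html.IndexOf(closeTag, valueStart, StringComparison.Ordinal);
        if (closePos < 0) return null;

        return html.Substring(valueStart, closePos - valueStart);
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: made-up sample markup; the real page layout is not known here.
        string sample = "<td>Last Trade:</td><td><b>25.10</b></td>";

        // Prints: 25.10
        Console.WriteLine(MarkerExtractor.ExtractAfter(sample, "Last Trade:", "<b>", "</b>"));
    }
}
```

Returning null when a marker is missing lets the caller leave the label untouched if the scraped page changes its layout, instead of failing with an ArgumentOutOfRangeException from Substring.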
Reference:
http://www.codeproject.com/KB/aspnet/weather.aspx