Screen Scraping in ASP.NET

Definition:
Screen scraping is a technique in which a computer program extracts data from the HTML output of another program.

Further details:
Screen scraping is fairly easy in ASP.NET. The only thing you need to know is how to retrieve HTML from web pages dynamically, and the .NET class library makes it easy to acquire the HTML from a site.

Steps to acquire the HTML:
1. Create a WebRequest for the URL, get its WebResponse, and feed the response stream into an instance of StreamReader.
2. Read the stream line by line, skip the empty lines, and add the rest to a StringBuilder using the StringBuilder.Append method.
3. Finally, convert the StringBuilder to a string to get the entire HTML.

But the question is: “Is there any use for this HTML output?” Answer: Of course, yes. It is useful when you build a web site that needs to display some information contained in the web page being scraped.

For example, suppose you want to display on your site the stock price for Microsoft, which is available at the URL “http://finance.yahoo.com/q?s=msft”.

The code for this example follows:
ASPX Code:

Last Trade: <asp:Label ID="lblPrice" runat="server"></asp:Label>

A Function to acquire HTML:

// Requires: using System.IO; using System.Net; using System.Text;
private string AquireHTML()
{
    string strURL = "http://finance.yahoo.com/q?s=msft";
    string strLine;
    StringBuilder oStringBuilder = new StringBuilder();

    try
    {
        // Open the requested URL
        WebRequest oWebRequest = WebRequest.Create(strURL);

        // Get the stream from the returned web response.
        // The using blocks close the response and the reader even if reading fails.
        using (WebResponse oWebResponse = oWebRequest.GetResponse())
        using (StreamReader oStreamReader = new StreamReader(oWebResponse.GetResponseStream()))
        {
            // Read the stream a line at a time and place each one into the StringBuilder
            while ((strLine = oStreamReader.ReadLine()) != null)
            {
                // Ignore blank lines
                if (strLine.Length > 0)
                    oStringBuilder.Append(strLine);
            }
        }
    }
    catch (WebException)
    {
        // The request failed (for example, the site is unreachable): report "no HTML"
        return string.Empty;
    }
    catch (IOException)
    {
        // The connection dropped while reading: report "no HTML"
        return string.Empty;
    }

    // Return the whole page as one string so the caller can search it
    return oStringBuilder.ToString();
}
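
The line-filtering loop in the function above can also be tried without a network connection by reading from any TextReader. The following is a minimal sketch under that assumption; the class name HtmlReaderSketch and the method ReadNonBlankLines are hypothetical names used only for illustration and do not appear in the original code.

```csharp
using System.IO;
using System.Text;

public static class HtmlReaderSketch
{
    // Reads every line from the reader, skips empty ones, and joins the rest
    // into one string with no separator (the same behaviour as the loop in AquireHTML).
    public static string ReadNonBlankLines(TextReader reader)
    {
        StringBuilder builder = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0)
                builder.Append(line);
        }
        return builder.ToString();
    }
}
```

For instance, HtmlReaderSketch.ReadNonBlankLines(new StringReader("<p>\n\nhi</p>\n")) returns "<p>hi</p>". Because the lines are joined without a separator, text that was split across two lines in the source page ends up run together, which is worth keeping in mind when choosing the strings to search for.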
Code-behind page:

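protected void Page_Load(object sender, EventArgs e)
{
    // NOTE: the two delimiter strings were lost from the original listing.
    // "<b>" and "</b>" are assumed here, based on the original offset of 3.
    const string strOpenTag = "<b>";
    const string strCloseTag = "</b>";

    string strHTML = AquireHTML();
    if (strHTML.Length == 0)
        return;

    // Find the "Last Trade:" label, then the tags that surround the price after it
    int intPos1 = strHTML.IndexOf("Last Trade:");
    int intPos2 = intPos1 < 0 ? -1 : strHTML.IndexOf(strOpenTag, intPos1);
    int intPos3 = intPos2 < 0 ? -1 : strHTML.IndexOf(strCloseTag, intPos2);

    if (intPos3 >= 0)
    {
        // Take only the text between the opening and closing tags
        int intStart = intPos2 + strOpenTag.Length;
        lblPrice.Text = strHTML.Substring(intStart, intPos3 - intStart);
    }
}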
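
The marker-based extraction in Page_Load can be pulled into a small helper so that it can be tested against a saved HTML fragment instead of the live page. This is only a sketch: the class name ScrapeHelper, the method ExtractBetween, and the sample markup in the usage note are hypothetical, and the tag strings passed in are an assumption about how the price is wrapped in the page.

```csharp
using System;

public static class ScrapeHelper
{
    // Returns the text between openTag and closeTag, searching only after the
    // first occurrence of anchor. Returns null if any of the markers is missing.
    public static string ExtractBetween(string html, string anchor, string openTag, string closeTag)
    {
        if (string.IsNullOrEmpty(html))
            return null;

        int anchorPos = html.IndexOf(anchor, StringComparison.Ordinal);
        if (anchorPos < 0)
            return null;

        int openPos = html.IndexOf(openTag, anchorPos + anchor.Length, StringComparison.Ordinal);
        if (openPos < 0)
            return null;

        // The wanted text starts right after the opening tag
        int start = openPos + openTag.Length;
        int closePos = html.IndexOf(closeTag, start, StringComparison.Ordinal);
        if (closePos < 0)
            return null;

        return html.Substring(start, closePos - start);
    }
}
```

With the made-up fragment "<td>Last Trade:</td><td><b>25.30</b></td>", calling ScrapeHelper.ExtractBetween(fragment, "Last Trade:", "<b>", "</b>") returns "25.30". Returning null when a marker is missing lets the page leave the label unchanged if the scraped site changes its layout, rather than throwing from Substring.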

Reference:
http://www.codeproject.com/KB/aspnet/weather.aspx
