ASP.NET - C#: Crawler project


Could I get some easy-to-follow code examples on the following:

  1. Use a browser control to launch a request to a target website.
  2. Capture the response from the target website.
  3. Convert the response into a DOM object.
  4. Iterate through the DOM object and capture things like "firstname", "lastname", etc. if they are part of the response.

Thanks.

Here is code that uses a WebRequest object to retrieve data and captures the response as a stream.

    using System.IO;
    using System.Net;
    using System.Net.Security;
    using System.Security.Cryptography.X509Certificates;

    // Place this method inside a class of your own.
    public static Stream GetExternalData( string url, string postData, int timeout )
    {
        // Note: += adds one more handler on every call; in production, register this once at startup.
        ServicePointManager.ServerCertificateValidationCallback += delegate( object sender,
                                                                             X509Certificate certificate,
                                                                             X509Chain chain,
                                                                             SslPolicyErrors sslPolicyErrors )
        {
            // If you trust the callee implicitly, return true...otherwise, perform validation logic.
            // Placeholder choice: accept only certificates that validated cleanly.
            return sslPolicyErrors == SslPolicyErrors.None;
        };

        WebRequest request = null;
        HttpWebResponse response = null;

        try
        {
            request = WebRequest.Create( url );
            request.Timeout = timeout; // force a quick timeout

            if( postData != null )
            {
                request.Method = "POST";
                request.ContentType = "application/x-www-form-urlencoded";
                request.ContentLength = postData.Length; // one byte per character under ASCII

                using( StreamWriter requestStream = new StreamWriter( request.GetRequestStream(), System.Text.Encoding.ASCII ) )
                {
                    requestStream.Write( postData );
                }
            }

            response = (HttpWebResponse)request.GetResponse();
        }
        catch( WebException ex )
        {
            Log.LogException( ex ); // Log is the author's own logging class; substitute your logger
        }
        finally
        {
            request = null;
        }

        if( response == null || response.StatusCode != HttpStatusCode.OK )
        {
            if( response != null )
            {
                response.Close();
                response = null;
            }

            return null;
        }

        // The caller is responsible for reading and disposing the returned stream.
        return response.GetResponseStream();
    }
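
The stream returned by the method above still has to be read and closed by the caller. A minimal sketch of that step follows. `ResponseReader` and `ReadAll` are hypothetical names that are not part of the original answer, and the sketch assumes the response body is UTF-8 text:

```csharp
using System.IO;
using System.Text;

public static class ResponseReader
{
    // Hypothetical helper: drains a response stream into a string and disposes it.
    // Assumes UTF-8 content; a real crawler would honor the charset declared
    // in the response's Content-Type header.
    public static string ReadAll( Stream responseStream )
    {
        if( responseStream == null )
        {
            return null; // the fetching method returns null on failure
        }

        using( StreamReader reader = new StreamReader( responseStream, Encoding.UTF8 ) )
        {
            return reader.ReadToEnd();
        }
    }
}
```

Disposing the reader also disposes the underlying stream, which releases the HTTP connection.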

For managing the response, I have a custom XHTML parser that I use, but it is thousands of lines of code. There are several publicly available parsers (see Darin's comment).
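
For pages that happen to be well-formed XHTML, the DOM and iteration steps can be sketched with the framework's own `System.Xml.Linq` instead of a custom parser. This is only an illustration under stated assumptions: the markup parses as XML (no unclosed tags, no HTML-only entities such as `&nbsp;`), and the wanted data sits in `input` elements with `name` and `value` attributes. `FormFieldExtractor` and `Extract` are hypothetical names. Real-world HTML usually needs a tolerant parser instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FormFieldExtractor
{
    // Hypothetical helper: returns name/value pairs for <input> elements whose
    // name attribute matches one of the wanted field names (case-insensitive).
    public static Dictionary<string, string> Extract( string xhtml, params string[] wantedNames )
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument document = XDocument.Parse( xhtml );

        var wanted = new HashSet<string>( wantedNames, StringComparer.OrdinalIgnoreCase );
        var found = new Dictionary<string, string>( StringComparer.OrdinalIgnoreCase );

        // Match on LocalName so this works with or without the XHTML namespace.
        foreach( XElement input in document.Descendants().Where( e => e.Name.LocalName == "input" ) )
        {
            string name = (string)input.Attribute( "name" ); // null when the attribute is absent
            if( name != null && wanted.Contains( name ) )
            {
                found[name] = (string)input.Attribute( "value" ) ?? string.Empty;
            }
        }

        return found;
    }
}
```

Calling `FormFieldExtractor.Extract( markup, "firstname", "lastname" )` returns only the fields that are actually present in the response, so a missing "lastname" simply does not appear in the result.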

Edit: per the OP's question, headers can be added to the request to emulate a user agent. For example:

    // request must be declared as HttpWebRequest here: Accept and UserAgent are not members of WebRequest.
    request = (HttpWebRequest)WebRequest.Create( url );
    request.Accept = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, */*";
    request.Timeout = timeout;
    request.Headers.Add( "Cookie", cookies ); // cookies: cookie string supplied by the caller

    //
    // Manifest as a standard user agent
    request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US)";
