e-ISSN:0976-5166
p-ISSN:2231-3850


INDIAN JOURNAL OF COMPUTER SCIENCE AND ENGINEERING

Call for Papers 2024

Feb 2024 - Volume 15, Issue 1
Deadline: 15 Jan 2024
Publication: 20 Feb 2024

Apr 2024 - Volume 15, Issue 2
Deadline: 15 Mar 2024
Publication: 20 Apr 2024

More

 

ABSTRACT

Title : A Novel Architecture for Domain Specific Parallel Crawler
Authors : Nidhi Tyagi, Deepti Gupta
Keywords : WWW, URLs, crawling process, parallel crawlers.
Issue Date : June 2010
Abstract :
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Due to the growing and dynamic nature of the web, it has become a challenge to traverse all URLs in the web documents and handle these URLs, so it has become imperative to parallelize a crawling process. The crawler process is further being parallelized in the form ecology of crawler workers that parallely download information from the web. This paper proposes a novel architecture of parallel crawler, which is based on domain specific crawling, makes crawling task more effective, scalable and load-sharing among the different crawlers which parallel download web pages related to different domains specific URLs.
Page(s) : 44-53
ISSN : 0976-5166
Source : Vol. 1, No.1