e-ISSN:0976-5166
p-ISSN:2231-3850


INDIAN JOURNAL OF COMPUTER SCIENCE AND ENGINEERING

ABSTRACT

Title : A Novel Architecture for Domain Specific Parallel Crawler
Authors : Nidhi Tyagi, Deepti Gupta
Keywords : WWW, URLs, crawling process, parallel crawlers.
Issue Date : June 2010
Abstract :
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Because the web is growing and dynamic, traversing and handling all the URLs in web documents has become a challenge, so it has become imperative to parallelize the crawling process. The crawling process is parallelized in the form of an ecology of crawler workers that download information from the web in parallel. This paper proposes a novel parallel crawler architecture based on domain-specific crawling. It makes the crawling task more effective and scalable, and shares the load among the different crawlers, which download in parallel the web pages related to URLs of different specific domains.
Page(s) : 44-53
ISSN : 0976-5166
Source : Vol. 1, No.1
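
Illustrative sketch : The abstract's idea of splitting the crawl by domain and running one crawler worker per domain can be pictured roughly as below. This is not the architecture from the paper, only a minimal sketch under stated assumptions: "domain" is taken to mean the last label of the host name (for example "edu" or "com"), and the names domain_of, partition_by_domain, crawl_domain, parallel_crawl and the caller-supplied fetch function are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def domain_of(url):
    """Return the last label of the host name, e.g. 'edu' (an assumed notion of 'domain')."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if host else "unknown"


def partition_by_domain(urls):
    """Group URLs so that each domain gets its own work queue."""
    queues = defaultdict(list)
    for url in urls:
        queues[domain_of(url)].append(url)
    return dict(queues)


def crawl_domain(domain, urls, fetch):
    """One crawler worker: download every URL assigned to its domain, in order."""
    return domain, [fetch(url) for url in urls]


def parallel_crawl(urls, fetch, max_workers=4):
    """Submit one task per domain to a thread pool and collect the pages by domain."""
    queues = partition_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_domain, d, q, fetch) for d, q in queues.items()]
        return dict(f.result() for f in futures)
```

Here fetch is any function that takes a URL and returns the downloaded page, so the sketch can be tried with a stub before a real downloader is plugged in. Load sharing in this sketch is only as even as the distribution of URLs across domains; a real system would likely need to rebalance large domains across several workers.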