e-ISSN:0976-5166
p-ISSN:2231-3850


INDIAN JOURNAL OF COMPUTER SCIENCE AND ENGINEERING

Indexed in: Scopus

ABSTRACT

Title : A Novel Architecture for Domain Specific Parallel Crawler
Authors : Nidhi Tyagi, Deepti Gupta
Keywords : WWW, URLs, crawling process, parallel crawlers.
Issue Date : June 2010
Abstract :
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Because the web is growing and dynamic, traversing and handling all the URLs in web documents has become a challenge, so it has become imperative to parallelize the crawling process. The crawling process is parallelized in the form of an ecology of crawler workers that download information from the web in parallel. This paper proposes a novel parallel crawler architecture based on domain-specific crawling, which makes the crawling task more effective and scalable and shares the load among the different crawlers, each of which downloads, in parallel with the others, the web pages related to URLs of a specific domain.
Page(s) : 44-53
ISSN : 0976-5166
Source : Vol. 1, No. 1
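
The abstract's idea of splitting URLs by domain and handing each group to its own crawler worker can be illustrated with a minimal sketch. This is not the authors' architecture, only one possible reading of it. It assumes that "domain" means the last label of the host name (for example `edu` or `com`), whereas the paper may define it differently. All function names (`domain_of`, `partition_by_domain`, `crawl_domain`, `parallel_crawl`, `fake_fetch`) are hypothetical, and `fake_fetch` replaces a real HTTP download so the example runs offline.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def domain_of(url):
    """Return the last label of the host name (e.g. 'edu'); an assumed notion of 'domain'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if host else "unknown"


def partition_by_domain(urls):
    """Group URLs into one queue per domain, dropping duplicates and keeping order."""
    queues = defaultdict(list)
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            queues[domain_of(url)].append(url)
    return dict(queues)


def crawl_domain(domain, urls, fetch):
    """One crawler worker: download every URL assigned to its domain."""
    return domain, [(url, fetch(url)) for url in urls]


def parallel_crawl(urls, fetch, max_workers=4):
    """Submit one task per domain queue to a thread pool and collect results by domain."""
    queues = partition_by_domain(urls)
    if not queues:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_domain, d, q, fetch) for d, q in queues.items()]
        return dict(f.result() for f in futures)


def fake_fetch(url):
    """Stand-in for an HTTP download (assumption: no network access in this sketch)."""
    return f"<html>{url}</html>"
```

Calling `parallel_crawl(urls, fake_fetch)` returns a dictionary keyed by domain, each value listing the `(url, page)` pairs fetched by that domain's worker. Because each worker only sees the URLs of its own domain, no two workers download the same page, which is one simple way to obtain the load sharing the abstract describes.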