ABSTRACT
Title: A Novel Architecture for Domain Specific Parallel Crawler
Authors: Nidhi Tyagi, Deepti Gupta
Keywords: WWW, URLs, crawling process, parallel crawlers.
Issue Date: June 2010
Abstract:
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Because the web is growing and dynamic, traversing and handling all the URLs in web documents has become a challenge, so it has become imperative to parallelize the crawling process. The crawling process is parallelized as an ecology of crawler workers that download information from the web in parallel. This paper proposes a novel parallel crawler architecture based on domain-specific crawling, which makes the crawling task more effective and scalable and shares the load among the different crawlers, each downloading in parallel the web pages related to URLs of a different domain.
Page(s): 44-53
ISSN: 0976-5166
Source: Vol. 1, No. 1
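
The general idea in the abstract, splitting URLs by domain and letting separate crawler workers download each group in parallel, can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not detail. It assumes that "domain" means the last label of the host name (for example "edu" or "com"), and every name in it (domain_of, partition_by_domain, crawl_queue, parallel_crawl, fake_fetch) is hypothetical. The fetch step is a stand-in that performs no network access.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def domain_of(url):
    """Return the last label of the URL's host name (e.g. 'edu'), or 'unknown'.

    Assumption: this is only one possible reading of "domain".
    """
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else "unknown"


def partition_by_domain(urls):
    """Group URLs into one queue per domain, dropping duplicates, keeping order."""
    queues = defaultdict(list)
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            queues[domain_of(url)].append(url)
    return dict(queues)


def crawl_queue(domain, urls, fetch):
    """Worker for a single domain: download every URL in its queue."""
    return domain, [(url, fetch(url)) for url in urls]


def parallel_crawl(urls, fetch, max_workers=4):
    """Submit one worker task per domain queue and collect results by domain."""
    queues = partition_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_queue, d, q, fetch) for d, q in queues.items()]
        return dict(f.result() for f in futures)


def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP download so the sketch runs offline."""
    return f"<html>{url}</html>"


if __name__ == "__main__":
    seeds = ["http://a.edu/x", "http://b.com/y", "http://c.edu/z"]
    for domain, pages in parallel_crawl(seeds, fake_fetch).items():
        print(domain, [url for url, _ in pages])
```

Because each worker owns the queue for one domain, no two workers download the same URL, and the load is divided along domain boundaries. A real crawler would also need link extraction, politeness delays, and robots.txt handling, none of which are shown here.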