ABSTRACT
| Title |
: |
A Novel Architecture for Domain Specific Parallel Crawler |
| Authors |
: |
Nidhi Tyagi, Deepti Gupta |
| Keywords |
: |
WWW, URLs, crawling process, parallel crawlers. |
| Issue Date |
: |
June 2010 |
| Abstract |
: |
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Because the web is growing and dynamic, traversing and handling all the URLs in web documents has become a challenge, so it has become imperative to parallelize the crawling process. The crawling process is parallelized in the form of an ecology of crawler workers that download information from the web in parallel. This paper proposes a novel parallel crawler architecture based on domain-specific crawling, which makes the crawling task more effective and scalable and shares the load among the different crawlers, each of which downloads, in parallel, web pages related to the URLs of a different specific domain. |
| Page(s) |
: |
44-53 |
| ISSN |
: |
0976-5166 |
| Source |
: |
Vol. 1, No. 1 |
|
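The abstract describes crawler workers that each download pages for the URLs of one domain. The sketch below is only an illustration of that general idea and is not the architecture from the paper. It assumes that "domain" means the last label of the host name (for example `edu` or `com`), and every name in it (`domain_of`, `partition_by_domain`, `crawl_domain`, `parallel_crawl`, and the caller-supplied `fetch` function) is hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def domain_of(url):
    """Return the last label of the host name (e.g. 'edu'); assumed domain key."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if host else "unknown"


def partition_by_domain(urls):
    """Group URLs into one queue per domain so each worker gets its own share."""
    queues = defaultdict(list)
    for url in urls:
        queues[domain_of(url)].append(url)
    return dict(queues)


def crawl_domain(domain, urls, fetch):
    """One crawler worker: download every URL assigned to its domain."""
    return domain, [fetch(url) for url in urls]


def parallel_crawl(urls, fetch, max_workers=4):
    """Run one worker task per domain queue and collect results keyed by domain."""
    queues = partition_by_domain(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_domain, d, q, fetch) for d, q in queues.items()]
        return dict(f.result() for f in futures)
```

Here `fetch` stands in for whatever download routine a real crawler would use. Passing a stub such as `lambda url: url.upper()` lets the partitioning and the parallel dispatch be exercised without any network access.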