e-ISSN:0976-5166
p-ISSN:2231-3850


INDIAN JOURNAL OF COMPUTER SCIENCE AND ENGINEERING


ABSTRACT

Title : ENRICHED BIG DATA PRE-PROCESSING MODEL WITH MACHINE LEARNING APPROACH TO INVESTIGATE WEB USER USAGE BEHAVIOUR
Authors : N. Silpa, Dr. V V R Maheswara Rao
Keywords : Web Analytics, Weblog Pre-processing, Machine Learning, Search Engine Access, Apache Spark, Big Data.
Issue Date : Sep-Oct 2021
Abstract :
At present, the web has become an environment in which users with high aspirations live, learn, entertain themselves, and socialize, individually or in groups, through digital platforms. As a result, investigating web user behaviour remains an active research area and demands renewed innovation in analytics to provide reliable, high-quality customized solutions. The weblog is the primary source for this work, and it poses tremendous challenges for web researchers because of its complex sequence of processing steps and its abundance of information. Further, limited distributed storage models, partially parallel computing techniques, and the difficulty of identifying appropriate attributes in weblog analysis demand highly competitive performance models for effective characterization of web users. Pre-processing is critical to the entire process of weblog analysis; although it is popular among researchers, studies of it remain limited. In addition, existing pre-processing studies address the elicitation, reduction, and transformation of web user usage data individually rather than comprehensively.
Towards this, the present paper proposes an Enriched Pre-processing Model (EPPM) that concentrates comprehensively on all stages of weblog pre-processing within the Apache Spark framework. EPPM can process real-time streaming data along with batch data, because sustaining the validity of web user behaviour extracted from historical data also requires a strategy for processing real-time streaming data. In addition to all pre-processing steps, EPPM integrates a machine learning approach to discard search-engine-accessed logs from the weblog, as they are superfluous for observing web user behaviour. The performance of EPPM is validated by a series of experiments on server-side weblog data in a standard execution environment. The experimental results are also included.
Page(s) : 1248-1256
e-ISSN : 0976-5166
Source : Vol. 12, No. 5
DOI : 10.21817/indjcse/2021/v12i5/211205050