Abstract—In today’s era of fast-developing Internet technologies, e-commerce is taking an ever larger market share from physical stores. Users are overwhelmed by the huge number of choices presented when they search for an item, and the information overload problem has reached its peak. This is where a Recommendation System helps. It produces individualized recommendations by searching through a large volume of dynamically generated information and offering users personalized options based on certain reference characteristics. However, the data has become too big to be handled and processed effectively by traditional approaches. To attenuate the impact of such dense data, this paper proposes a clustering-based collaborative filtering algorithm that uses user interest information. The method recruits similar data into the same cluster in order to recommend items and personalized services to users collaboratively.

Keywords—E-commerce, Recommendation system, user interest, clustering, collaborative filtering

I. Introduction

Big data has emerged as a widely recognized trend, attracting attention from governments, multinational companies, the banking sector, academia, etc. Generally speaking, big data refers to collecting and storing large-volume, complex, and rapidly growing information from multiple autonomous sources, at a global scale that is almost inconceivable, for eventual analysis. Where does this big data come from? It is generated by applications such as social networking sites, e-commerce sites, telecom companies, health and life sciences, weather stations, the share market, etc.
Across all these applications, data collection has grown tremendously and now exceeds the ability of traditional data processing software to capture, store, analyze, manage, and process the data within a “tolerable elapsed time”. With the development of web technology and advances in e-commerce, most people now shop online rather than in retail stores. As competition between businesses has become increasingly intense, customers face an information overload problem: they are offered a huge number of product options, most of which may not be relevant to what they are searching for, and this leaves them overwhelmed and indecisive.

Recommendation Systems (RSs) address this problem. They are intelligent applications that try to predict which items, out of a large pool, a user may be interested in and recommend the best options to the target user. To do so, they explore large volumes of data and mine useful information or knowledge according to the user’s interests, preferences, and behavior, as captured from his/her previous purchase history; this information is stored and used for personalized recommendations in the future.

Recommendation techniques are applied on almost all large e-commerce platforms, such as Amazon and eBay. They personalize services for customers by reducing the transaction costs of searching for and choosing items of interest in an online shopping environment. They also benefit service providers, who gain increased revenue because advertisements are an effective means of selling more products when they directly reach interested customers. The various methods applied in RSs take one of two basic approaches: Collaborative Filtering (CF) or Content-Based Filtering.
Collaborative Filtering is the most dominant technique used in RSs, and it does not need any external information about either the user or the item. CF basically assumes that if two users have behaved similarly, i.e. watched or bought the same items or rated n items similarly, they will also act on or rate other items similarly. In other words, CF bases its recommendations on a model of users’ prior behavior. The model can be built from a single user’s behavior or from the behavior of other users with similar traits. When other users’ behavior is taken into account, CF uses group knowledge to form a recommendation based on similar users.

CF algorithms are expected to deal with highly sparse data, to scale with an increasing number of items and users, to cope with problems such as similar items having different names, noisy data, privacy protection, and shilling attacks, and to make adequate recommendations within a short time. Failing at any of these may reduce recommendation accuracy and leave it far behind the expectations of customers and businesses. To reduce the impact of these problems and improve scalability and accuracy to some extent, a number of CF algorithms have been proposed, such as user-based, item-based, content-based, and model-based algorithms. The item-based algorithm recommends to a user items that are similar to what she/he has preferred before. The user-based algorithm, by contrast, assumes that people who agreed in the past will tend to agree again in the future.

II. LITERATURE SURVEY

Nowadays, service-relevant data has grown tremendously and become complex, and it is beyond the ability of traditional approaches to effectively capture, manage, and process such big data. One solution to this challenge is Clustering-Based Collaborative Filtering (CLUBCF) clubcf….
CLUBCF recruits similar services into the same cluster to recommend services collaboratively, and it contains two modules. The first module is clustering: the description similarity and the functionality similarity between services are computed using the Jaccard similarity coefficient (JSC), the characteristic similarity between two services is computed as a weighted sum of the description similarity and the functionality similarity, and an Agglomerative Hierarchical Clustering (AHC) algorithm then clusters the services. The second module is collaborative filtering: the rating similarity between services is first computed using the Pearson correlation coefficient (PCC), and predicted ratings are then given to the services within each cluster. Finally, all recommended services are ranked in descending order of their predicted ratings.

Collaborative filtering methods such as user-based and item-based CF are popular techniques for retrieving services from an overwhelming number of services, but they consume a lot of time. Clustering reduces the size of the service data that must be processed. A bottom-up hierarchical clustering-based collaborative filtering approach is used in An innovative…. In this approach, similar services are grouped into clusters by their functionalities and classifications, recommendations are made based on similar services in the same cluster, and the K-means clustering algorithm is used for recommendation. K-means, a partition-based clustering algorithm, was applied to partition services based on users’ preferences; however, its clustering process requires additional information from users.

Collaborative filtering is widely used in many domains, with numerous algorithms for personalized recommendation. Despite its many advantages, CF suffers from issues such as data sparsity, scalability, the cold-start problem, shilling attacks, and so on.
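The two CLUBCF modules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original CLUBCF implementation: services are represented by hypothetical keyword sets for description and functionality, and the weight `alpha`, the AHC stopping `threshold`, the average-linkage rule, and the prediction formula (the service’s mean rating plus PCC-weighted rating deviations of similar services in the same cluster) are our own illustrative choices. All service and user names are made-up toy data.

```python
import math
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B| of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def characteristic_sim(s1, s2, alpha=0.5):
    """Weighted sum of description and functionality similarity (alpha is an assumed weight)."""
    return (alpha * jaccard(s1["description"], s2["description"])
            + (1 - alpha) * jaccard(s1["functionality"], s2["functionality"]))

def ahc(services, threshold):
    """Bottom-up clustering with average linkage; stops when no pair of clusters reaches threshold."""
    sim = {}
    for a, b in combinations(services, 2):
        sim[(a, b)] = sim[(b, a)] = characteristic_sim(services[a], services[b])
    clusters = [[name] for name in services]

    def linkage(c1, c2):
        return sum(sim[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge the most similar pair
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters

def pcc(ratings, s, t):
    """Pearson correlation of services s and t over the users who rated both."""
    common = [u for u in ratings if s in ratings[u] and t in ratings[u]]
    if len(common) < 2:
        return 0.0
    ms = sum(ratings[u][s] for u in common) / len(common)
    mt = sum(ratings[u][t] for u in common) / len(common)
    num = sum((ratings[u][s] - ms) * (ratings[u][t] - mt) for u in common)
    den = math.sqrt(sum((ratings[u][s] - ms) ** 2 for u in common)
                    * sum((ratings[u][t] - mt) ** 2 for u in common))
    return num / den if den else 0.0

def mean_rating(ratings, s):
    vals = [r[s] for r in ratings.values() if s in r]
    return sum(vals) / len(vals)

def recommend(ratings, clusters, user, min_sim=0.0):
    """Predict the user's rating of each unrated service from services in the same cluster
    whose PCC with it exceeds min_sim, then rank predictions in descending order."""
    preds = {}
    for cluster in clusters:
        for t in cluster:
            if t in ratings[user]:
                continue
            nbrs = [(s, pcc(ratings, s, t)) for s in cluster if s != t and s in ratings[user]]
            nbrs = [(s, w) for s, w in nbrs if w > min_sim]
            if not nbrs:
                continue  # no sufficiently similar rated service in this cluster
            dev = sum(w * (ratings[user][s] - mean_rating(ratings, s)) for s, w in nbrs)
            preds[t] = mean_rating(ratings, t) + dev / sum(abs(w) for _, w in nbrs)
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (hypothetical): keyword sets per service and a sparse user -> service rating map.
services = {
    "s1": {"description": {"weather", "forecast", "api"}, "functionality": {"get", "forecast"}},
    "s2": {"description": {"weather", "forecast", "service"}, "functionality": {"get", "forecast"}},
    "s3": {"description": {"payment", "gateway", "api"}, "functionality": {"charge", "refund"}},
}
ratings = {
    "u1": {"s1": 5, "s2": 4, "s3": 1},
    "u2": {"s1": 3, "s2": 2, "s3": 4},
    "u3": {"s1": 4, "s2": 3},
    "u4": {"s1": 2},
}
clusters = ahc(services, threshold=0.5)
print(clusters)                            # [['s1', 's2'], ['s3']]
print(recommend(ratings, clusters, "u4"))  # [('s2', 1.5)]
```

Here s1 and s2 share most keywords (characteristic similarity 0.75) and are merged, while s3 stays alone; the prediction for u4 is therefore computed only from s1, the one rated service in s2’s cluster.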
To improve accuracy, many researchers have proposed new similarity measures. For example, H. J. Ahn (2008) proposed a new CF similarity measure called Proximity-Impact-Popularity (PIP), which combined item content information and popularity with user-behavior data to obtain good results. Researchers have also put forward a number of methods to reduce the impact of data sparsity in CF. Existing research on the data sparsity problem can be broadly divided into the following streams. i) Model-based CF and hybrid recommender algorithms. ii) Improved user-similarity calculation methods in memory-based CF, which includes item-based and user-based CF algorithms. Sarwar B. et al. (2001) proposed an item-based CF algorithm that overcomes the shortcomings of traditional similarity measures when user ratings are extremely sparse. Ma et al. (2007) proposed a CF algorithm based on the nearest-neighbor sets of both target users and target items, with adjustment parameters that control the weights of the two parts when generating recommendation results. iii) In-depth analysis of the characteristics of users and items to improve the performance of the CF algorithm. Researchers have used extra information such as user location, user activity, and user interest [14] to enhance CF performance. Liu Q. et al. (2012) expand user interest based on the relationships between items to calculate user similarity [16]. Tsang-Hsiang Cheng et al. (2011) integrated user interest with time to calculate user similarity [17]. Yehuda Koren (2009) proposed a CF algorithm based on a dynamic nearest-neighbor set, in which the weights of users in the neighbor set are adjusted dynamically for different target items according to user activity [15].
These studies improve scalability and accuracy to some extent by reducing the impact of data sparsity on the CF algorithm. An improved user-based clustering CF algorithm combined with user interest information is proposed in an improved……; it improves CF in two basic ways: by improving the user similarity calculation method and by extending the user-item rating matrix. To solve the cold-start problem in CF algorithms [19-20], some researchers extend the user-item rating matrix with user attributes and item content information when calculating user similarity. These methods improve the way similarity is calculated in the algorithm. Ahn [14] analyzed the disadvantages of the Pearson correlation coefficient [10] and cosine similarity [11] and proposed a measure that considers three aspects of user ratings: proximity, impact, and popularity. However, this similarity considers only the local information of the ratings and not the global preference of user ratings. A weighted Pearson correlation coefficient has been proposed [16] to solve the problem that the traditional Pearson correlation coefficient does not consider the size of the set of common users. Bobadilla et al. [18] proposed a new metric that combines the mean squared difference [6] and Jaccard [17] measures, assuming that the two measures complement each other.

a) Existing System

Amazon uses an Item Clustering Collaborative Filtering technique for recommending books and all other products. LinkedIn, Facebook, and MySpace use collaborative filtering to suggest friends, groups, and other social connections by observing the network of connections between a user and the people in their connections. The recommender system in Twitter suggests whom to follow using several signals and in-memory calculations. YouTube also uses an Item Clustering CF technique for recommendation.
Netflix and YouTube are hybrid systems: they make recommendations by comparing the searching and watching habits of similar users (collaborative filtering) combined with content-based filtering, i.e. by offering movies that share characteristics with movies the user has rated highly. Last.fm and Reddit use a user-based collaborative filtering technique, since suggestions are made by considering user choices. Last.fm creates a station in which songs are suggested based on what other people did with a song, which people you follow, who is similar to you, and what music they listen to, rate, or like. However, Last.fm needs a large amount of data about the user to make reliable suggestions, which makes it suffer from the cold-start problem. Pandora uses a content-based approach: it uses the qualities and features of a song or artist to tune into a station playing music with similar features. Feedback given by users, such as liking or disliking a particular song, is used to tune the station. Pandora can get started with little information but has limited scope.

b) Methodology

i) User/Item Based Collaborative Filtering

Collaborative filtering (CF), also referred to as social filtering, is a technique used by recommender systems that filters information using the recommendations of people with similar tastes. The concept of CF arose from the problem of effectively extracting useful information from the vast amount of available data. User-based and item-based CF are the most commonly implemented techniques in RSs. The basic idea behind user-based CF is that people who agreed in the past are likely to agree in the future in their evaluation of certain items.
Suppose we want to predict how user U will rate item I. We can check how other users who are similar to U (i.e. like-minded users) have rated that item, because the user is more likely to rate items similarly to users with similar tastes than to a user chosen at random from the crowd. For example, a CF recommender system for music could predict which music a user should like, given a partial list of that user’s likes and dislikes. The item-based CF algorithm, in contrast, recommends to a user items similar to those she/he has preferred before. Suppose we want to predict how user U will buy or rate item I. We can check how U has bought or rated other items that are similar to I, since the user is likely to buy or rate similar items in a similar way. For example, a user who bought item X also tends to buy item Y.

Although many e-commerce RSs have successfully implemented traditional CF techniques, they face challenges with big data: i) to give an ideal recommendation from a large number of items, and ii) to make that decision in acceptable time. In the traditional CF algorithm, calculating the similarity between every pair of users or items may take too much time and often exceeds the processing capability of current recommendation systems. Therefore, recommendations based on user similarity or item similarity may not be completed at all, or not in time. Moreover, traditional CF considers all items when computing rating similarity, even though most of them are dissimilar to the target item, and these dissimilar items degrade the accuracy of the predicted ratings. Clustering can be used to decrease the number of items that need to be processed in real time. Clustering reduces the data size by grouping like-minded users or similar items, so that the number of items/users in a cluster is much smaller than the total number of items/users.
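The user-based prediction described above, restricted to the target user’s cluster, can be sketched as follows. This is a minimal illustration under assumptions of our own: ratings are held in a dictionary of dictionaries, user similarity is cosine similarity over co-rated items (PCC could be substituted), the prediction is the common mean-centred weighted average of neighbours’ ratings, and the user-to-cluster assignment `cluster_of` is hypothetical and assumed to be precomputed by some clustering step. The function also returns how many neighbours were compared, to show that users outside the cluster are never touched.

```python
import math

def cosine(ra, rb):
    """Cosine similarity of two users' rating dicts over the items both have rated."""
    common = ra.keys() & rb.keys()
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    norm = math.sqrt(sum(ra[i] ** 2 for i in common)) * math.sqrt(sum(rb[i] ** 2 for i in common))
    return dot / norm if norm else 0.0

def predict_user_based(ratings, cluster_of, user, item):
    """Predict `user`'s rating of `item` from like-minded users in the same cluster.

    Returns (prediction, number_of_neighbours_compared)."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    compared = 0
    for v, rv in ratings.items():
        # Only users in the target user's cluster who rated the item are considered.
        if v == user or cluster_of[v] != cluster_of[user] or item not in rv:
            continue
        compared += 1
        w = cosine(ratings[user], rv)
        num += w * (rv[item] - sum(rv.values()) / len(rv))  # neighbour's mean-centred rating
        den += abs(w)
    return (mean_u + num / den if den else mean_u), compared

# Hypothetical toy data: dave is in a different cluster, so he is never compared.
ratings = {
    "alice": {"i1": 5, "i2": 4},
    "bob":   {"i1": 4, "i2": 5, "i3": 4},
    "carol": {"i1": 5, "i2": 5, "i3": 5},
    "dave":  {"i1": 1, "i2": 2, "i3": 1},
}
cluster_of = {"alice": 0, "bob": 0, "carol": 0, "dave": 1}
prediction, compared = predict_user_based(ratings, cluster_of, "alice", "i3")
print(f"predicted rating {prediction:.2f} using {compared} neighbours")
```

With millions of users, the loop cost is governed by the cluster size rather than the full user base; in a real system one would index users by cluster instead of scanning and filtering as this toy loop does.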
Thus, the computation time of the CF algorithm can be reduced and its accuracy may be enhanced. The Agglomerative Hierarchical Clustering (AHC) algorithm is applied for clustering, and the cuckoo search optimization algorithm is implemented for optimization. Cuckoo search has shown the best performance when compared with other algorithms such as particle swarm optimization and genetic algorithms. A novel recommender system is proposed for recommending the most suitable item using the AHC algorithm and cuckoo search optimization. For stemming, i.e. reducing variant word forms to a common root form, various stemming algorithms have been proposed, such as the Lovins stemmer, Dawson stemmer, Paice/Husk stemmer, and Porter stemmer [13]. The Porter stemmer is one of the most widely used among them. It does not require a lexicon [14] and applies cascaded rewrite rules that can be run very quickly.

iii) MS Chat BOT for User Response

A chatbot is a computer program that carries on a conversation through textual or auditory methods. It can range from basic pattern matching with canned responses to highly developed Artificial Intelligence techniques with complex conversational state tracking and integration with existing business services. Today, chatbots are used by many websites, applications, and platforms. In this study, we use the Microsoft Bot Framework, which enables us to build bots that support different types of interaction with users. Conversations in a bot can be designed to be freeform, and they can use simple text strings or richer cards that contain images, text, and action buttons. Users can also interact with our bot through natural language.

iv) Natural Language Understanding Response

While interacting with computers, humans face a problem regarding the ability of the computer to understand what a person wants.
Developers therefore build smart applications with the help of Microsoft’s Language Understanding Intelligent Service (LUIS), which enables applications to understand human language and react accordingly.
Any client application that converses with a user, such as a chatbot or another dialog system, can pass the user input to a LUIS app and receive a result, such as the detected intent and entities, that provides natural language understanding. LUIS uses machine learning to solve the difficult problem of extracting meaning from natural language input.
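As a plain illustration of the “basic pattern matching with a response” end of the chatbot spectrum, and of how stemming normalizes word variants before matching, the following sketch implements only step 1a of the Porter stemmer (the plural-suffix rules SSES→SS, IES→I, SS→SS, S→removed) together with a hypothetical keyword-to-reply table. It does not use the Microsoft Bot Framework or the LUIS API; in the system described above, LUIS would take the place of the hand-written keyword table with a trained language-understanding model.

```python
import re

def porter_step_1a(word):
    """Porter stemmer step 1a only: SSES->SS, IES->I, SS->SS, S->(dropped)."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]          # caresses -> caress, ponies -> poni
    if word.endswith("ss"):
        return word               # caress -> caress
    if word.endswith("s"):
        return word[:-1]          # cats -> cat
    return word

# Hypothetical rules: a set of keyword stems and the canned reply they trigger.
RULES = [
    ({"recommend", "suggestion"}, "Let me find items similar to your past purchases."),
    ({"order", "delivery"}, "Please share your order number so I can check its status."),
]
FALLBACK = "Sorry, I did not understand that. Could you rephrase?"

def respond(text):
    """Stem each word of the input, then return the reply of the first matching rule."""
    stems = {porter_step_1a(tok) for tok in re.findall(r"[a-z]+", text.lower())}
    for keywords, reply in RULES:
        if stems & keywords:
            return reply
    return FALLBACK

print(respond("Any suggestions for new shoes?"))
print(respond("Where are my orders?"))
```

Because only step 1a is implemented, “deliveries” becomes “deliveri” and no longer matches the stem “delivery”; the full Porter algorithm’s step 1c, which turns a terminal y into i, maps both forms to the same stem.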