Long-term Product Rating Prediction Based on Users' Short-term Multiple Ratings

Ratings and product reviews are among the main indicators of product quality in online stores, especially when deciding whether to add a product to a store's inventory. Online vendors often pay close attention to product reviews and ratings. However, when the average rating of a product is based on only a small number of user ratings, the decision maker cannot be confident in choosing that product, even if its rating is fairly high. Long-term rating predictions would help online vendors identify promising products and promote their websites by selecting them. In this paper, a machine learning approach based on a linear regression model is used to predict the long-term product rating. The model is evaluated on a dataset from the Amazon online store covering 1996 to 2014.


1-Introduction
Previous rating prediction methods did not take a comprehensive approach: each was sufficient only to show that a certain algorithm outperformed the rest on a particular prediction dataset, and they could neither analyze the results nor provide a general solution for the same initial data. Hence, this paper proposes a comprehensive framework that accepts diverse data and different products as its input dataset. Moreover, the framework selects an appropriate method for the test dataset and then uses a machine learning model for the long-term prediction process.
Our proposed model uses various feature extraction methods and machine learning algorithms to analyze multiple rating inputs for each type of product. It is worth noting that every machine learning algorithm has its own strengths and weaknesses, which depend on the initial data. Predicting long-term product ratings serves different goals for users and for online store owners. For users, the main goal is to remove the doubts of potential customers and support the purchasing decision. For owners, there are other purposes, such as increased sales and commercial profit, a higher ranking of the online store in search engines, improved SEO, increased satisfaction and repeat visits of users, compatible user services, user-based marketing, and a targeted internal social network that engages users and encourages them to submit more ratings.
Toghillin et al. (2005) tried to use voting systems to improve prediction. They also described the limitations of current recommendation methods and discussed possible extensions that can improve recommendation capabilities and make recommender systems applicable to a wider range of applications. These extensions include an improved understanding of users and items, the integration of textual information into the recommendation process, support for multimodal ratings, and the provision of flexible recommendations.

2-Literature Review
Nietin Cheindal et al. (2006) improved the user rating system using data mining. They examined a text mining problem and presented a method for extracting comparative sentences. A comparative sentence expresses a relation between two sets of entities with respect to some common features. For example, the comparative sentence "Canon's optics is better than those of Sony and Nikon" expresses the comparative relation (better, optics, Canon, Sony, Nikon). Given a set of evaluative texts on the web, for example reviews, forum posts, and news articles, the task consists of (1) identifying the comparative sentences in the texts and (2) extracting comparative relations from the identified sentences. Many applications follow from this; for example, a product vendor may want to know how customers view its products compared with those of its competitors.
Torsov (2013) improved the users' voting system by considering the importance of social networks. Although the role of social networks and consumer interactions in the diffusion of new products is widely accepted, such networks and interactions are often not observable by researchers. Instead, what may be observable are the aggregate diffusion patterns of earlier products adopted within a specific social network. He presented an approach to identify the systematic conditions that persist across the diffusion of new products within the network, and suggested that incorporating these conditions improves predictions.
Chen et al. (2015) surveyed recommender systems based on user reviews. In recent years, a variety of review-based recommender systems have been developed, aiming to incorporate the valuable information in user-generated texts into user modeling and recommendation. They provided an overview of how review elements can be used to improve standard content-based recommendation, collaborative filtering, and preference-based product ranking techniques. The survey is organized around two main branches: building user profiles from reviews and building product profiles from reviews. In the user profile branch, reviews are used not only to create term-based profiles, but also to infer or enhance ratings. The multifaceted nature of reviews can additionally be exploited to derive the weight or value preferences that users place on specific features. In the other branch, the product profile can be enriched with feature opinions or comparative opinions to reflect its quality.
Yu Zhang et al. (2017) proposed a new task that combines recommendation with the prediction of a specific review, together with the rating score that a particular user would give to a specific product or service. They used a single neural network for modeling users and products, defining their correlation, and customizing products. The results indicated that their prediction method produces ratings that are very close to real user ratings and better than those of other algorithms.
Samizadeh and Mahmoudi (2018) classified the opinions and texts published by users in cyberspace into classes of positive or negative sentiment. The purpose of that article is to apply and compare machine learning methods for categorizing Persian texts based on the emotions of active users in cyberspace. Before the algorithms are applied, the texts are preprocessed using character conversion, expression deletion and multi-layered analysis. In another study, Kipour, Barry and Shirazi (2014) presented a new method for predicting links between vertices in social networks and concluded that a local approach could be a good choice for predicting edges because of locality.

4-Research Goal
Long-term product rating prediction based on users' short-term ratings: prediction tasks are important for early strategic decisions about different products. Therefore, estimating the long-term product rating from only the initial short-term ratings (assumed here to be the first 50 votes) is expected to result in better organizational performance.

5-Research Methodology
Long-term product rating prediction is based on the following assumptions:
1) The score that a user gives to a product is influenced by the former average product rating.
2) The score that a user gives to a product is influenced by the number of voters.
3) The score that a user gives to a product is influenced by the user's profile type.
4) The score that a user gives to a product is influenced by the actual product quality.
5) The score that a user gives to a product is influenced by the standard deviation of the product ratings at the time of the evaluation.

5-1-Effective-parameters selection in long-term product rating
The main idea of this paper is to design an effective predictive tool for long-term product rating. Linear regression is a common prediction method in many scientific domains and can be a suitable alternative to more complex computational models, such as Bayesian networks. The linear regression model is fitted on a training dataset and uses continuous parameters instead of categorical values. The parameters are:
1) Former average of product rating (x1): x1 is a continuous variable.
2) Number of voters (x2): x2 takes positive integer values and is not categorized.
3) Relative benchmark of user profile (x3): x3 assigns a quantitative score to each category of users: very easy going (-2), easy going (-1), accurate (0), strict (1), very strict (2). Its value is the average category score of all users who have voted for that particular product at the time a new vote is cast. Consider a product rated by three users whose categories are accurate, strict and very easy going. When the first user rates the product, x3 is 0; when the second user rates it, x3 is 0.5 (the average of 0 and 1). Similarly, when the third user rates the product, x3 is -0.33 (the average of 0, 1, and -2). This negative benchmark indicates that the product has been rated leniently. The effect of this score on the long-term average rating is expressed by the corresponding coefficient of the regression model estimated from the training dataset.
4) Standard deviation of the product ratings (x4): x4 is a continuous variable.

In addition to linear terms, second-order terms are also considered in the linear regression model:

$$ y = \beta_0 + \sum_{i=1}^{4} \beta_i x_i + \sum_{i=1}^{4} \beta_{ii} x_i^2 $$

In addition to linear and second-order terms, interaction effects are also considered in the linear regression model:

$$ y = \beta_0 + \sum_{i=1}^{4} \beta_i x_i + \sum_{i=1}^{4} \beta_{ii} x_i^2 + \sum_{1 \le i < j \le 4} \beta_{ij} x_i x_j $$

Here y is the long-term average rating and the \beta terms are the regression coefficients. It should be noted that product quality, which definitely affects user ratings, is not included as a parameter because it cannot be measured at this stage. The parameters of the regression model are estimated from a similar training dataset. For the selected test data, the model is then used to predict the long-term average rating; hence, the long-term average is the dependent variable. In this way, the input to the linear regression model contains 14 parameters: 4 linear terms, 4 second-order terms, and 6 pairwise interaction terms. As a multi-dimensional linear regression problem, the output presents the effect of these 14 main parameters on the final rating of the products.
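The running computation of x1 to x4 can be illustrated with a minimal sketch. It assumes that each feature includes the vote just cast (which matches the x3 example above) and that x4 is a population standard deviation; the function name `running_features` and the `PROFILE_SCORE` mapping are hypothetical names introduced only for illustration.

```python
from statistics import mean, pstdev

# Category scores as listed for x3; the dictionary name is hypothetical.
PROFILE_SCORE = {
    "very easy going": -2,
    "easy going": -1,
    "accurate": 0,
    "strict": 1,
    "very strict": 2,
}


def running_features(votes):
    """Return one (x1, x2, x3, x4) tuple per vote, in chronological order.

    votes: list of (rating, user_category) pairs.
    Assumption: every feature includes the vote that was just cast.
    """
    ratings, scores, rows = [], [], []
    for rating, category in votes:
        ratings.append(rating)
        scores.append(PROFILE_SCORE[category])
        x1 = mean(ratings)     # average rating so far
        x2 = len(ratings)      # number of voters so far
        x3 = mean(scores)      # average user-profile benchmark so far
        x4 = pstdev(ratings)   # population std. dev. of ratings (assumption)
        rows.append((x1, x2, x3, x4))
    return rows


# The three-user example: accurate, strict, very easy going.
# x3 takes the values 0, 0.5 and about -0.33; the ratings are made up.
example = running_features([(4, "accurate"), (5, "strict"), (3, "very easy going")])
for row in example:
    print(row)
```

Whether the current vote should be included in x1 and x4, or only earlier votes, is a design choice that this section does not settle; excluding it would only require computing the features before appending.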

5-3-Coefficient Specification for Each Defined Parameter
Initially, as time progresses, we compute the 14 main parameters described above before each vote. From these parameters we build the training and test datasets and calculate the individual effect of each parameter on the final product rating. The objective of machine learning is therefore to obtain the coefficients from the training dataset. In other words, the coefficients of the 14 parameters express their degree of influence.
As a result, the effect of these 14 main parameters on the final product rating is determined and the prediction process becomes time independent. The coefficients are calculated from the training dataset using formula 2. It should be noted that the coefficients are presented as integers and can be aggregated into a single coefficient A (formula 3).
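As a rough illustration of coefficient estimation, the sketch below expands the four base variables into the 14 terms (4 linear, 4 squared, 6 pairwise interactions) and fits them by ordinary least squares. Using ordinary least squares is an assumption, since the estimator is not spelled out here, and `expand_terms` and `fit_coefficients` are hypothetical helper names.

```python
from itertools import combinations

import numpy as np


def expand_terms(X):
    """Map an (n, 4) array of x1..x4 to the 14 regression terms."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    linear = [X[:, i] for i in range(k)]
    squares = [X[:, i] ** 2 for i in range(k)]
    pairs = [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(linear + squares + pairs)


def fit_coefficients(X, y):
    """Fit by ordinary least squares (assumed estimator).

    Returns (intercept, coefficients), with one coefficient per term
    in the order produced by expand_terms.
    """
    terms = expand_terms(X)
    design = np.column_stack([np.ones(len(terms)), terms])
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta[0], beta[1:]
```

With fewer training rows than terms the least-squares solution is not unique, so in practice the training set should contain many more than 15 observations.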

5-4-Coefficient Application and Long-term Product Rating Prediction
For a test dataset of the same product, long-term product rating prediction uses A, the aggregated coefficient for the long-term rating obtained from formula 3.
The main tasks of the long-term prediction process are:
1) Compute the coefficients of the main parameters using the training dataset (formula 2), then place A (formula 3) or the individual coefficients in formula 4.
2) Calculate the main parameters of the test dataset and place them in formula 4.
3) Repeat this process until the MSE of the regression model (formula 5) reaches its minimum.
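The application and scoring steps above can be sketched as follows, assuming the prediction is an intercept plus a weighted sum of the regression terms and that the error measure is the usual mean squared error. The function names and all numbers in the example are hypothetical and do not come from the paper's data.

```python
def predict(terms, intercept, coefs):
    """Apply fitted coefficients to one row of regression terms."""
    return intercept + sum(c * t for c, t in zip(coefs, terms))


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted ratings."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy example with two terms instead of 14, to keep it readable.
intercept, coefs = 0.5, [2.0, -1.0]
test_terms = [[1.0, 2.0], [2.0, 1.0]]
actual = [1.0, 3.0]

predicted = [predict(row, intercept, coefs) for row in test_terms]
print(predicted, mean_squared_error(actual, predicted))
```

Comparing this error across candidate models or feature sets is one way to read the "repeat until the MSE reaches the minimum" step.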

6-3-Coefficient Calculation
For long-term prediction, the example given in Table 6-3 shows the linear regression model fitted on our limited test data to determine the coefficient, or influence rate, of each parameter on the long-term product rating.
Table 6-3: Coefficient calculation of long-term product rating using linear regression
As Table 6-3 shows, in addition to the coefficient of x2 * x3 (the number of voters and the former average rating), the remaining coefficients also have a relatively large effect on the long-term product rating. Finally, the figure below shows a sample prediction view for www.amazon.com.

7-Conclusion and Future Work
This paper presented long-term product rating prediction based on a linear regression model in which the coefficients of the effective parameters are estimated using training and test datasets. As stated, long-term rating prediction is affected by several factors, such as the former average rating, the number of voters, user profiles, user perception of the actual quality, and the standard deviation of ratings. Since we used a linear regression model as the prediction method, we must consider whether the input information is sufficient. In other words, the more input we give to this model, the more precise we expect it to be. In general, different machine learning algorithms have their own strengths and weaknesses. When linear regression analysis is used for prediction, it has the following main limitations:
1) Prediction errors are not merely random; they may also arise because the model is inadequate or inappropriate.
2) Predictions outside the range of the independent variables are not allowed.
3) Duplicate data measurements are not allowed.
4) The existence of a regression relationship does not guarantee a causal relationship.
5) Restrictions related to the coefficients can occur.
Applying machine learning approaches to resolve these restrictions is a direction for future research, as is the use of other regression models for predicting long-term product ratings, such as ordinal regression, polynomial regression, and even hybrid methods such as combining Bayesian networks with linear regression.
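The second limitation, on predictions outside the range of the independent variables, can be checked before a prediction is trusted. The sketch below is one possible guard and is not part of the proposed method; `out_of_range_features` is a hypothetical helper and the numbers are illustrative.

```python
def out_of_range_features(train_rows, test_row):
    """Return the indices of features in test_row that fall outside
    the minimum-maximum range observed in the training rows."""
    flagged = []
    for i, value in enumerate(test_row):
        column = [row[i] for row in train_rows]
        if not (min(column) <= value <= max(column)):
            flagged.append(i)
    return flagged


# Rows are (x1, x2, x3, x4); the second feature is the number of voters.
train = [(3.0, 10, 0.0, 0.5), (4.5, 50, 1.0, 1.2)]
print(out_of_range_features(train, (4.0, 200, 0.5, 1.0)))  # prints [1]
```

A flagged index means the model would be extrapolating on that variable, so the corresponding prediction should be treated with caution.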