A Model for Movie Recommender Systems Based on Collaborative Filtering

Recommender systems are used in many fields, such as movies, music, and social networks. Their aim is to give users attractive suggestions based on their activity in the system. Content-based methods and collaborative filtering are the most popular recommender approaches. The collaborative method has two main problems: the cold start of new users and trust in loyal users. In this article, a recommender system is designed that combines the content-based and collaborative filtering methods as a boosting system. It also addresses the cold-start problem, trust, and attention to loyal users. The proposed method consists of three steps: 1) the initial classification of all users and the assignment of a suitable class to a new user; 2) the determination of an appropriate weight for each feature of the similar users and close neighbors of a new user; 3) the construction of a score adjacency matrix of the close neighbors for the available movies and the calculation of the new user's score for each movie. Using the demographic information of users, the presented method exploits the capability of content-based systems for the initial classification. The results show that the mean absolute error and the root mean square error of the proposed method improve by about 8.4% compared with Naïve Bayes, two C4.5 variants and RCA.


Introduction
Recommender systems are used in settings such as movie stores, libraries, restaurants, and tourism systems to offer users interesting choices and items [1]. These systems play an important and vital role in e-commerce, particularly on online movie websites [2]. Movie recommender systems are among the most widely studied recommender systems to date. Given the huge amount of available information, a key issue in this field is presenting the most attractive items (movies) to users at the right time. A movie recommender system provides users with movies described by features such as title, director, author, and release date. In general, recommender systems are divided into two major categories [3]:
- Content-based filtering systems.
- Collaborative filtering systems.
In content-based filtering systems, suggestions are based on the ratings that users assign to content such as news text and links; accordingly, the top-rated content is recommended [4]. In collaborative filtering systems, suggestions are based on the selections of similar users and the ratings they gave to movies. The most important challenge for recommender systems based on collaborative filtering is the cold-start problem, which various researchers have studied in recent years. In social networks, the cold-start problem concerns new users whose profiles are empty (in a movie setting, users who have not rated any movies) or who have recorded only a few ratings in the system. For cold-start users with no recorded activity, content-based filtering on their profile is used; for users with existing records in the system, collaborative filtering is used. The rest of this article is organized as follows. Section 2 reviews previous work. Section 3 presents the recommender model and describes its architecture. Sections 4 and 5 present the results, and Section 6 gives the conclusions and suggestions for future work.
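The switching rule above (content-based filtering for users with no recorded activity, collaborative filtering otherwise) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `recommend` function, the `min_ratings` threshold, and the two stand-in recommenders are hypothetical.

```python
from typing import Callable, Dict, List

# A recommender maps a user id to a ranked list of movie ids (hypothetical interface).
Recommender = Callable[[str], List[str]]


def recommend(user_id: str,
              ratings: Dict[str, Dict[str, float]],
              content_based: Recommender,
              collaborative: Recommender,
              min_ratings: int = 1) -> List[str]:
    """Switching hybrid: cold-start users go to the content-based recommender."""
    n_rated = len(ratings.get(user_id, {}))
    if n_rated < min_ratings:
        # Cold start: no recorded ratings, so fall back to the user's profile.
        return content_based(user_id)
    # The user has a rating history, so use collaborative filtering.
    return collaborative(user_id)


def by_profile(user_id: str) -> List[str]:
    return ["profile-pick"]  # stand-in for a content-based recommender


def by_neighbors(user_id: str) -> List[str]:
    return ["neighbor-pick"]  # stand-in for a collaborative recommender


ratings = {"alice": {"m1": 4.0, "m2": 5.0}}
print(recommend("alice", ratings, by_profile, by_neighbors))  # ['neighbor-pick']
print(recommend("bob", ratings, by_profile, by_neighbors))    # ['profile-pick']
```

Raising `min_ratings` would also route users with only a few ratings to the content-based path, which matches the broader cold-start definition above.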

Related Works
Several studies have addressed the cold-start problem in movie recommendation; some of them are presented here. Hung et al. (2011) addressed the cold-start problem for both movies and users. They introduced an important traditional collaborative filtering system that uses two similarity matrices: one represents the similarity between users and movies, and the other represents the similarity between users. Based on this prediction mechanism, they make suggestions to users. One weakness of this study is its high memory usage, caused by the number of users and movies and by the construction of several similarity matrices [5]. Bobadilla et al. (2012) used a neural network as a collaborative filtering recommender system to reduce the cold-start problem for new users. They evaluated it on the MovieLens and Netflix datasets and, because they used non-numeric data, adopted the Jaccard similarity metric [6]. Henc (2013) recommended movies to users by clustering the movies with the k-means algorithm, based on the ratings users gave to the movies. Henc used the well-known MovieLens dataset, with a collection of 10109 movies rated by 2113 users [7]. Kamvatsus et al. (2014) introduced a model that uses classification algorithms such as Naive Bayes, decision trees, and random classification, together with a similarity metric, to recommend movies to users; they also evaluated it on the MovieLens dataset [8]. Luize et al. (2015) proposed combining collaborative filtering with demographic information to improve system performance and solve the cold-start problem. They combined a co-clustering algorithm with machine learning to solve the cold-start problem and evaluated the approach on the MovieLens, Jester and Netflix datasets [9]. Challenges such as scalability, sparsity, and user trust are considered less critical than the cold-start problem for movies, which has been the focus of the studies so far; in this work, these challenges are also handled through preprocessing, clustering, and classification.
http://www.ispacs.com/journals/cacsa/2017/cacsa-00073/ International Scientific Publications and Consulting Services
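Jaccard similarity, which [6] uses, compares two sets by the size of their intersection relative to their union. A minimal sketch, assuming each user is represented by the set of movie ids they have rated (an illustrative choice; the exact way [6] applies the measure is not described here):

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


alice = {"m1", "m2", "m3"}
bob = {"m2", "m3", "m4"}
print(jaccard(alice, bob))  # 0.5
```

Because it only looks at set membership, the measure ignores rating values, which is why it suits non-numeric data.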

The proposed method
Figure 1 shows the architecture of the proposed model. The tasks of its components are as follows:
1) Preprocessing: users who lack profile information or have not rated any movie are ignored, to speed up data analysis and processing.
2) Clustering the users on their demographic information with the k-means algorithm.
3) Assigning a suitable cluster to each new user facing the cold start, by combining the clustering with boosting techniques.
4) Finding similar users with a combinational similarity metric based on similarity in age, gender, and education.
5) Building a user-item proximity matrix whose rows are the neighbors and whose columns are the movies, holding the ratings the neighbors gave.
6) Calculating the new user's rating for each movie based on the loyal users.
7) Presenting a list of movies to the user with a prediction mechanism over the proximity matrix.
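Step 1 (preprocessing) amounts to filtering out incomplete users. The sketch below assumes a hypothetical profile schema with `age`, `gender` and `occupation` fields and a dictionary of per-user ratings; the field names and data layout are illustrative, not taken from the paper.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("age", "gender", "occupation")  # hypothetical profile schema


def preprocess(profiles: Dict[str, Dict[str, Any]],
               ratings: Dict[str, Dict[str, float]]) -> List[str]:
    """Keep only users that have a complete profile and at least one rating."""
    kept = []
    for user_id, profile in profiles.items():
        has_profile = all(profile.get(f) is not None for f in REQUIRED_FIELDS)
        has_ratings = bool(ratings.get(user_id))
        if has_profile and has_ratings:
            kept.append(user_id)
    return kept


profiles = {
    "u1": {"age": 25, "gender": "F", "occupation": "student"},
    "u2": {"age": None, "gender": "M", "occupation": "writer"},  # incomplete profile
    "u3": {"age": 40, "gender": "M", "occupation": "doctor"},    # no ratings below
}
ratings = {"u1": {"m1": 4.0}, "u2": {"m1": 3.0}}
print(preprocess(profiles, ratings))  # ['u1']
```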

Classify User Using Boosting
Class Of New User

Rating Movie inconsideration of legal users Movie Lists
Producing of user-movie adjacency matrix

Recommender system
Classify New User

User
As Figure 1 shows, the data must first be normalized before the users are clustered on their demographic information. To select the optimal number of clusters, the data were clustered with different numbers of clusters (k) in the data mining software Weka, and the acceptable k was determined by calculating the sum of squared errors of each clustering; the optimal number of clusters is 100. Once the most appropriate clusters have been found, the new user's demographic information and the clusters determined in the previous step are used to find the most suitable cluster for the new user.
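The normalization and cluster-selection steps can be sketched with a plain k-means implementation that reports the sum of squared errors (SSE) for each k. This is a stdlib-only sketch on toy data, not the Weka procedure the authors used; the [age, gender] encoding is hypothetical, and assigning the new user to the nearest centroid is a simplification of the boosting-based assignment described next.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def min_max_fit(rows: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Per-feature minimum and maximum, learned from the training users."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_apply(row: Sequence[float], lo: Vector, hi: Vector) -> Vector:
    """Scale each feature to [0, 1] with the training range (constant features map to 0)."""
    return [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centroids: List[Vector]) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means; returns (centroids, labels, SSE)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Mean of the cluster's members; an empty cluster keeps its old centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Toy demographics: [age, gender encoded as 0/1] (hypothetical encoding).
users = [[18, 0], [20, 0], [22, 0], [45, 1], [50, 1], [52, 1]]
lo, hi = min_max_fit(users)
X = [min_max_apply(u, lo, hi) for u in users]

# Compare SSE across k (the paper evaluates SSE per k in Weka and settles on k = 100).
for k in (1, 2, 3):
    print(k, round(kmeans(X, k)[2], 4))

centroids, labels, _ = kmeans(X, 2)
new_user = min_max_apply([21, 0], lo, hi)  # reuse the training range for the new user
print("new user's cluster:", nearest(new_user, centroids))
```

One common way to read such an SSE table is the "elbow" heuristic: pick the k after which adding clusters stops reducing SSE much.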
In the next step, the output of the clustering stage, the clustered training data, is given to the boosting system, which produces the classification model. The new user then enters the system as test data and its cluster is determined. After the cluster (class) has been assigned to the new user, its neighbors, that is, the existing users in that cluster, are extracted, and their opinions are taken into account when recommending movies. Suppose the users of the system are defined as a collection U. In the rating equation (3.4), $p$ is the number of neighbor users, $u_j$ is the jth neighbor user, $r^{b}_{u_j,i}$ is the rating that neighbor $u_j$ gave to the ith movie, and $R^{b}_{n,i}$ is the new user's rating for the ith movie.
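The text of equation (3.4) is not preserved here, so the sketch below assumes a common form for it: a similarity-weighted average of the neighbors' ratings, $R_{n,i} = \sum_j sim(n,u_j)\, r_{u_j,i} / \sum_j sim(n,u_j)$, taken over the neighbors who rated movie $i$. Function names and data are hypothetical; movies are then ranked by predicted rating to form the recommendation list.

```python
from typing import Dict, List

# Neighbor x movie "adjacency matrix": matrix[neighbor][movie] = rating (sparse rows).
RatingMatrix = Dict[str, Dict[str, float]]


def predict_ratings(sims: Dict[str, float], matrix: RatingMatrix) -> Dict[str, float]:
    """Assumed Eq. (3.4): similarity-weighted mean of neighbor ratings per movie."""
    num: Dict[str, float] = {}
    den: Dict[str, float] = {}
    for neighbor, row in matrix.items():
        s = sims.get(neighbor, 0.0)
        for movie, rating in row.items():
            num[movie] = num.get(movie, 0.0) + s * rating
            den[movie] = den.get(movie, 0.0) + s
    # Skip movies whose raters all have zero similarity (undefined average).
    return {m: num[m] / den[m] for m in num if den[m] > 0}


def top_n(predicted: Dict[str, float], n: int) -> List[str]:
    """Movies with the highest predicted ratings first (ties broken by id)."""
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:n]


sims = {"u1": 0.9, "u2": 0.5}
matrix = {"u1": {"m1": 5, "m2": 2}, "u2": {"m1": 3, "m3": 4}}
pred = predict_ratings(sims, matrix)
print(top_n(pred, 2))  # ['m1', 'm3']
```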

Experiment results
In this paper, the simulations required to check and evaluate the results were conducted on the MovieLens dataset. The data used can be obtained from [10]. In the evaluation metrics, $P_{u,i}$ is the predicted rating of user $u$ for movie $i$, and $r_{u,i}$ is the actual rating of user $u$ for movie $i$. The simulations are defined as scenarios, which are shown in Table 1. As Table 2 shows, with 100 users in the online network, the proposed method improves noticeably on the other methods in terms of MAE and RMSE. As Table 3 shows, with Scenarios 1, 2 and 3 and 100 users, the proposed method improves MAE and RMSE compared with the other methods by about 2.55% and 1.87% (roughly 2.5% and 1.8%), respectively. Tables 5, 6 and 7 report the MAE and RMSE of the proposed method with Scenarios 1, 2 and 3 and 500 users. On average, compared with other methods such as C2 4.5, CM 4.5, Naïve Bayes and RCA, the proposed method improves MAE and RMSE by about 2.92% and 0.96%, respectively. Tables 8, 9 and 10 report the MAE and RMSE of the proposed method with Scenarios 1, 2 and 3 and 900 users. On average, compared with C2 4.5, CM 4.5, Naïve Bayes and RCA, the proposed method improves MAE and RMSE by about 0.86% and 0.93%, respectively. These results show that the proposed method makes recommendations with lower error than the other methods.
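MAE, RMSE and the reported percentage improvements can be computed as below. The paper does not state how its percentages were derived; `improvement_pct` assumes they are relative error reductions with respect to a baseline method, which is only one plausible reading.

```python
import math
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error: average |P_ui - r_ui| over all predictions."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error: square root of the average (P_ui - r_ui)^2."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(predicted))


def improvement_pct(baseline_error: float, proposed_error: float) -> float:
    """Relative error reduction in percent (assumed definition of 'improvement')."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


predicted = [4.0, 3.0, 5.0]
actual = [5.0, 3.0, 3.0]
print(mae(predicted, actual))   # 1.0
print(rmse(predicted, actual))  # sqrt(5/3), roughly 1.291
```

RMSE squares each error before averaging, so it penalizes a few large mistakes more heavily than MAE does; reporting both shows whether an improvement comes from fewer large errors or a uniform shift.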

Conclusions
The main purpose of this paper is to resolve the cold-start problem in online channels and to recommend appropriate videos. This aim is achieved with acceptable accuracy by combining content-based and collaborative filtering with data mining techniques. Using clustering techniques and combinational similarity metrics, the proposed method recommends videos to new users more accurately than the methods developed so far. When a new user who has not rated any video enters the system and thus faces the cold start, the proposed method uses a clustering algorithm and boosting to exploit the new user's demographic information and suggest videos. Finally, MAE and RMSE were used to evaluate the prediction error of the proposed method against similar methods such as C2 4.5, CM 4.5, Naïve Bayes and RCA, with 100, 500 and 900 users. Generally,

Figure 1: Flowchart and architecture of the proposed method.

To obtain the data, select the desired dataset from the versions provided and download it. Mean absolute error (MAE) and root mean square error (RMSE) are used as evaluation metrics:

$$\mathrm{MAE} = \frac{1}{N}\sum_{u,i}\left|P_{u,i} - r_{u,i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{u,i}\left(P_{u,i} - r_{u,i}\right)^{2}} \qquad (3.6)$$

where $N$ is the number of predicted ratings.

The users of the system form the collection $U = \{u_1, u_2, u_3, \dots, u_m\}$, with demographic features $D = \{d_1, d_2, d_3, \dots, d_l\}$, and the movies form the collection $I = \{i_1, i_2, i_3, \dots, i_k\}$. Assume that $W = \{w_1, w_2, w_3, \dots, w_l\}$ are the weights of the users' demographic features; the weight of each feature is initialized empirically with a value in the interval [0, 1]. Then, the similarity between a new user $n$ and each neighbor $u_p$ is calculated by Equation 1, in which $SF_j$ is the similarity value of the jth feature and $w_j$ is the weight of that feature. $s(d_j)$ is a function with values in [0, 1] that calculates the degree of similarity between the jth features of two users. Depending on the nature of the users' features, $s(d_j)$ can be defined in two general groups.
Here, $d_{j,n}$ is the value of the jth feature for user $n$, $d_{j,u_p}$ is the value of the jth feature for user $u_p$, $Diff$ is their difference, and $Diff_{max}$ is the maximum difference of the jth feature between the new user and any of the neighbors. $\beta$ is a parameter that controls the effect of the difference in a single feature. Applying this combinational similarity metric yields the similarity between the new user and each neighbor. The neighbors' ratings of the movies are then arranged into an adjacency matrix, and the new user's rating for each movie is calculated with equation (3.4). The movies with the highest scores are recommended first.
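The combinational similarity can be sketched under stated assumptions, since Equation 1 and the two forms of $s(d_j)$ are not reproduced in the text. Here Equation 1 is assumed to be a weighted average $\sum_j w_j SF_j / \sum_j w_j$, categorical features (such as gender or occupation) use an exact-match $s(d_j)$, and numeric features (such as age) use the assumed form $1 - (Diff/Diff_{max})^{\beta}$. The weights below are hypothetical, not the values from Table 1.

```python
from typing import Any, Dict, Set

Profile = Dict[str, Any]


def categorical_sim(a: Any, b: Any) -> float:
    """Exact-match similarity for categorical features."""
    return 1.0 if a == b else 0.0


def numeric_sim(a: float, b: float, diff_max: float, beta: float) -> float:
    """Assumed form: 1 - (Diff / Diff_max) ** beta, which lies in [0, 1]."""
    if diff_max == 0:
        return 1.0  # every neighbor has the same value as the new user
    return 1.0 - (abs(a - b) / diff_max) ** beta


def combined_similarity(new_user: Profile,
                        neighbors: Dict[str, Profile],
                        weights: Dict[str, float],
                        numeric: Set[str],
                        beta: float = 1.0) -> Dict[str, float]:
    """Weighted average of per-feature similarities SF_j (assumed form of Equation 1)."""
    # Diff_max: largest difference of each numeric feature between the new user and any neighbor.
    diff_max = {f: max(abs(new_user[f] - nb[f]) for nb in neighbors.values())
                for f in numeric}
    total_w = sum(weights.values())
    result = {}
    for uid, nb in neighbors.items():
        score = 0.0
        for f, w in weights.items():
            if f in numeric:
                sf = numeric_sim(new_user[f], nb[f], diff_max[f], beta)
            else:
                sf = categorical_sim(new_user[f], nb[f])
            score += w * sf  # w_j * SF_j
        result[uid] = score / total_w
    return result


new_user = {"age": 25, "gender": "F", "occupation": "student"}
neighbors = {
    "u1": {"age": 27, "gender": "F", "occupation": "student"},
    "u2": {"age": 45, "gender": "M", "occupation": "student"},
}
weights = {"age": 0.5, "gender": 0.2, "occupation": 0.3}  # hypothetical scenario weights
sims = combined_similarity(new_user, neighbors, weights, numeric={"age"})
print(sims)
```

Different weight dictionaries play the role of the simulation scenarios: raising a feature's weight makes agreement on that feature count more toward the final similarity.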

Table 1 :
Simulation Scenarios
As Table 1 shows, different weights were assigned to features such as age, gender, and occupation, and different results were obtained. Each scenario is defined by a different set of weights; features with larger weights are more important and have more effect on the similarity. Tables 2, 3 and 4 show the mean absolute error and the root mean square error of the proposed method with Scenarios 1, 2 and 3 and 100 users.

Table 2 :
Mean absolute error and root mean square error of the proposed method with 100 users.

Table 3 :
Mean absolute error and root mean square error of the proposed method with 100 users.

Table 4 :
Mean absolute error and root mean square error of the proposed method with 100 users.

Table 5 :
Mean absolute error and root mean square error of the proposed method with 500 users.

Table 6 :
Mean absolute error and root mean square error of the proposed method with 500 users.

Table 7 :
Mean absolute error and root mean square error of the proposed method with 500 users.

Table 8 :
Mean absolute error and root mean square error of the proposed method with 900 users.

Table 9 :
Mean absolute error and root mean square error of the proposed method with 900 users.

Table 10 :
Mean absolute error and root mean square error of the proposed method with 900 users.