
ESLI: Enhancing slope one recommendation through local information embedding


Authors: Heng-Ru Zhang; Yuan-Yuan Ma; Xin-Chao Yu; Fan Min
Authors' place of work: School of Computer Science, Southwest Petroleum University, Chengdu, China
Published in the journal: PLoS ONE 14(10)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0222702

Summary

Slope one is a popular recommendation algorithm due to its simplicity and high efficiency on sparse data. However, it often suffers from under-fitting because the global information of all relevant users/items is considered. In this paper, we propose a new scheme called enhanced slope one recommendation through local information embedding. First, we employ clustering algorithms to obtain user clusters as well as item clusters to represent local information. Second, we predict ratings using the local information of users and items in the same cluster. The local information can detect strong localized associations shared within clusters. Third, we design different fusion approaches based on the local information embedding. In this way, both the under-fitting and over-fitting problems are alleviated. Experimental results on real datasets show that our approaches outperform slope one in terms of both mean absolute error and root mean square error.

Keywords:

Learning – Mathematical functions – Habits – Neural networks – Experimental design – Clustering algorithms – k-means clustering

Introduction

Collaborative filtering (CF) [1–3] is one of the most widely used techniques in recommender systems [4, 5]. CF does not rely on the content descriptions of items, but purely depends on preferences expressed by a set of users. Memory-based and model-based CF are the two main approaches [3, 6]. The former uses the entire user-item database to make a prediction [7], as in slope one [8], k-nearest neighbor [9], and matrix factorization [10]. The latter first learns a descriptive model of user preferences and then uses it for predicting ratings [11], as in neural network classifiers [12], Bayesian network classifiers [13], and linear classifiers [14].

Data sparsity [15] is one of the main factors affecting the prediction accuracy of CF. Slope one uses a linear regression model to handle data sparsity. By determining the quantitative relationship between two or more items, efficient recommendations can be generated in real time. However, slope one often suffers from under-fitting because the global information of all users/items is considered.

In this paper, we propose a new approach called enhanced slope one recommendation through local information embedding (ESLI). On the one hand, we try to alleviate the under-fitting caused by slope one with global information. This is achieved by using the local information of users/items to measure the similarities between two users' preferences more accurately. On the other hand, we try to alleviate the over-fitting caused by local information. This is achieved through appropriate granular selection [16] and approach fusion.

First, we employ clustering algorithms to extract local information. Users with similar rating habits will be clustered into one category. The user clusters represent local user information (LU). Correspondingly, items of similar popularity will be clustered into one category. The item clusters represent local item information (LI).

Second, we predict ratings using the local information of users and items in the same cluster. We design three enhanced slope-one approaches embedding local information. The local-user-global-item approach (LUGI, also called A1) only embeds user local information. The global-user-local-item approach (GULI, also called A2) only embeds item local information. The local-user-local-item approach (LULI, also called A3) embeds both the user and the item local information.

Third, we design four fusion approaches (A4, A5, A6, and A7) based on the above three basic approaches to make the best prediction. Each fusion approach takes the average of two or three of LUGI, GULI, and LULI as the final prediction. In this way, both under-fitting and over-fitting are alleviated.

To examine the performance of the proposed method, we conducted experiments on the well-known MovieLens and DouBan datasets with a Java implementation. Experimental results show that (1) ESLI decreases both the mean absolute error (MAE) and root mean square error (RMSE) evaluation indicators; and (2) the advantage of ESLI over slope one is more prominent on large datasets.

The rest of this paper is organized as follows. First, we present the related work, including the rating system, the slope one algorithm and clustering algorithms. Second, we discuss how to extract local information and embed it into the slope one algorithm. Subsequently, we present our experimental results on four datasets. Finally, we present the conclusion and further work. All code files and datasets are available from the GitHub repository (https://github.com/FanSmale/ESLI.git) or the Supporting Information (see S1 and S2 Files).

Table 1 defines notations used throughout the paper.

Tab. 1. Notations.

Related work

The ESLI scheme uses the rating system and the local user/item information as input. The clustering algorithm is employed to obtain the local user/item information using the rating system.

Rating system

Let U = {u0, u1, …, um−1} be the set of users and T = {t0, t1, …, tn−1} be the set of items. The users' ratings of the items form a rating matrix. The rating function is given by [17]

$$R: U \times T \rightarrow V,$$

where V is the rating scale. For convenience, we denote the rating system as an m × n rating matrix R = (ri,j)m×n, where ri,j = R(ui, tj), 0 ≤ i ≤ m − 1, and 0 ≤ j ≤ n − 1.

Table 2 depicts an example of a rating system, where m = 5, n = 5 and V = {1, 2, …, 10}. "–" indicates that the user has not rated the item.

Tab. 2. The rating matrix (R).

Slope one

The underlying principle of the slope one algorithm [8] is based on linear regression to determine the extent by which users prefer one item to another. It uses a simple formula f(x) = x + b, where the parameter b represents the average deviation of the ratings of two users or items [8]. Then, given a user’s ratings of certain items, we can predict the user’s ratings of other items based on the average deviation.
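Written out (with $S_{j,k}$ and $T_i$ used here as illustrative notation rather than symbols from Table 1), the average deviation of item $t_j$ with respect to item $t_k$ and the resulting slope one prediction are

$$\operatorname{dev}_{j,k} = \frac{1}{|S_{j,k}|} \sum_{u_h \in S_{j,k}} (r_{h,j} - r_{h,k}), \qquad p_{i,j} = \frac{1}{|T_i|} \sum_{t_k \in T_i} \left( \operatorname{dev}_{j,k} + r_{i,k} \right),$$

where $S_{j,k}$ is the set of users who have rated both $t_j$ and $t_k$, and $T_i$ is the set of items other than $t_j$ rated by $u_i$.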

Slope one [8] is adaptive to data sparsity, and it is easy to implement and extend. Because it can generate effective recommendations in real time, it is used in many online recommender systems for movies, music and books. However, because the average deviation is calculated with global information, slope one can suffer from the under-fitting problem.
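The following is a minimal sketch of the basic (global) slope one prediction in Java. The class and method names are illustrative and not taken from the paper's implementation; a missing rating is assumed to be stored as 0.

public class SlopeOne {
    // Predict the rating of user i on item j from the rating matrix r,
    // where r[i][j] == 0 denotes a missing rating.
    public static double predict(double[][] r, int i, int j) {
        double sum = 0;
        int count = 0;
        for (int k = 0; k < r[i].length; k++) {
            if (k == j || r[i][k] == 0) {
                continue; // use only the other items rated by user i
            }
            double dev = 0; // sum of deviations between items j and k
            int co = 0;     // number of users who rated both j and k
            for (int h = 0; h < r.length; h++) {
                if (h != i && r[h][j] > 0 && r[h][k] > 0) {
                    dev += r[h][j] - r[h][k];
                    co++;
                }
            }
            if (co == 0) {
                continue;
            }
            sum += dev / co + r[i][k]; // f(x) = x + b with b = dev / co
            count++;
        }
        return count == 0 ? 0 : sum / count;
    }
}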

Global and local rating information fusion

CF uses rating information to predict users' preferences for items [18–21]. Rating information can be collected by implicit means, explicit means or both. Implicit ratings are inferred from a user's behaviors. In the explicit collection of ratings, the user is asked to provide an opinion about the item on a rating scale. Explicit ratings provide a more accurate description of a user's preference for an item than implicit ratings. We only take explicit ratings as input in this paper.

CF algorithms typically use global or local information about user preferences to help people make choices. Some studies take global information as input, such as slope one [8], matrix factorization [22], which leads to under-fitting problems. Some studies take local information as input, such as MG-LCR [23], UPUC-CF [24], which leads to over-fitting problems. In order to avoid the above two problems, some studies combine local and global information to learn models, such as MPMA [25], GLOMA [26]. However, to the best of our knowledge, the fusion of global and local information has not been used for slope one algorithm.

Clustering algorithms

Clustering is used to reveal the intrinsic properties and laws of data [27, 28]. It attempts to divide the data samples into several subsets that usually do not intersect [28]. In collaborative filtering, users and items can be grouped into different clusters. User-based clustering [29] divides users with similar rating habits into the same cluster. Item-based clustering [3] divides items into different clusters based on the similarity of attributes such as item popularity.

There are many clustering algorithms, such as k-means [30] and M-distance [31]. k-means [30] randomly selects k samples as the center points and obtains clusters through multiple iterations. It is easy to implement, but it converges slowly and its clustering results are non-deterministic. M-distance [31] defines the relationship between users or items using the average rating. Compared with k-means clustering, it converges quickly and its clustering results are deterministic.
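As a minimal sketch, average-rating-based user clustering can be realized by grouping users into C equal-width intervals of their average ratings; this is one simple realization for illustration, and the exact M-distance clustering rule may differ.

// Group users into C clusters by their average ratings, using equal-width
// intervals over the observed range; 0 denotes a missing rating.
public static int[] clusterUsersByAverageRating(double[][] r, int C) {
    int m = r.length;
    double[] avg = new double[m];
    double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
    for (int i = 0; i < m; i++) {
        double sum = 0;
        int cnt = 0;
        for (double v : r[i]) {
            if (v > 0) { sum += v; cnt++; }
        }
        avg[i] = (cnt == 0) ? 0 : sum / cnt;
        min = Math.min(min, avg[i]);
        max = Math.max(max, avg[i]);
    }
    double width = (max - min) / C;
    int[] label = new int[m];
    for (int i = 0; i < m; i++) {
        int g = (width == 0) ? 0 : (int) ((avg[i] - min) / width);
        label[i] = Math.min(g, C - 1); // the maximum average falls into the last cluster
    }
    return label;
}

Item clusters can be obtained in the same way on the columns of the rating matrix.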

ESLI scheme

In this section, we describe our proposed scheme. Firstly, we describe the extraction of local information. Then, we describe the ESLI scheme, which includes three basic approaches and four fusion approaches.

Local information extraction

Local information extraction is intended to capture the rating habits of similar users or the popularity of similar items. Naturally, a clustering algorithm is employed to obtain it. LU and LI are used to represent the local user and item information, respectively. S1 Fig depicts the schematic diagram of local information extraction.

S1A Fig depicts an example of LU. Users are classified into different clusters based on their rating habits. The first user cluster is composed of u0 and u4. Their ratings are no more than 5 points for all items. They are stricter users who tend to provide lower ratings. The second user cluster is composed of u1, u2 and u3. Their ratings are no less than 6 points for all items. They are more tolerant users who tend to provide higher ratings.

S1B Fig depicts an example of LI. Items are classified into different clusters based on item popularity. The first item cluster is composed of t0 and t4. They receive many low ratings of 1-2 points. The low ratings indicate that they are less popular. The second item cluster is composed of t1, t2 and t3. They receive many high ratings of 8-9 points. The high ratings indicate that they are popular items.

S1C Fig depicts an example of LULI. Each cluster contains a subset of users and a subset of items. The first cluster is composed of the user group {u0, u4} and the item group {t0, t4}; its ratings are the lowest, 1-2 points. The second cluster is composed of the user group {u0, u4} and the item group {t1, t2, t3}; its ratings are lower, 3-5 points. The third cluster is composed of the user group {u1, u2, u3} and the item group {t0, t4}; its ratings are higher, 6-7 points. The fourth cluster is composed of the user group {u1, u2, u3} and the item group {t1, t2, t3}; its ratings are the highest, 8-9 points. Within each LULI cluster, the rating distribution is more balanced and the rating similarity is higher than in the LUGI and GULI clusters.

Enhanced slope one algorithms

S2 Fig lists eight slope one approaches. S2A Fig depicts the global-user-global-item approach (GUGI) [8]. S2B–S2D Fig depict the three basic approaches, namely LUGI, GULI and LULI. S2E–S2H Fig depict the four fusion approaches.

Approach A1 uses the sub-matrix $R^{g,\cdot}$ as input, and computes the predicted rating $p_{i,j}^{g,\cdot}$ of ui for tj as

$$p_{i,j}^{g,\cdot} = \frac{1}{|T_i|} \sum_{t_k \in T_i} \left( \frac{1}{|S_{j,k}^{g}|} \sum_{u_h \in S_{j,k}^{g}} (r_{h,j} - r_{h,k}) + r_{i,k} \right),$$

where $T_i$ is the set of items other than $t_j$ rated by $u_i$, and $S_{j,k}^{g}$ is the set of users in the g-th user cluster who have rated both $t_j$ and $t_k$.
Based on S2B Fig, we have

Example 1. $p_{3,1}^{1,\cdot} = \frac{(9-7+6) + \left(\frac{(8-9)+(9-8)}{2}+9\right) + \left(\frac{(8-8)+(9-9)}{2}+8\right) + \left(\frac{(8-6)+(9-6)}{2}+7\right)}{4} \approx 8.6.$

Approach A2 uses the sub-matrix $R^{\cdot,q}$ as input, and computes the predicted rating $p_{i,j}^{\cdot,q}$ of ui for tj as

$$p_{i,j}^{\cdot,q} = \frac{1}{|T_i^{q}|} \sum_{t_k \in T_i^{q}} \left( \frac{1}{|S_{j,k}|} \sum_{u_h \in S_{j,k}} (r_{h,j} - r_{h,k}) + r_{i,k} \right),$$

where $T_i^{q}$ is the set of items in the q-th item cluster, other than $t_j$, rated by $u_i$, and $S_{j,k}$ is the set of users who have rated both $t_j$ and $t_k$.
Based on S2C Fig, we have

Example 2. $p_{3,1}^{\cdot,1} = \frac{\left(\frac{(4-5)+(9-9)+(8-8)+(3-4)}{4}+8\right) + \left(\frac{(9-8)+(8-9)+(3-5)}{3}+9\right)}{2} \approx 7.9.$

Approach A3 uses the sub-matrix $R^{g,q}$ as input, and computes the predicted rating $p_{i,j}^{g,q}$ of ui for tj as

$$p_{i,j}^{g,q} = \frac{1}{|T_i^{q}|} \sum_{t_k \in T_i^{q}} \left( \frac{1}{|S_{j,k}^{g}|} \sum_{u_h \in S_{j,k}^{g}} (r_{h,j} - r_{h,k}) + r_{i,k} \right),$$

where both the deviations and the candidate items are restricted to the g-th user cluster and the q-th item cluster, respectively.
Based on S2D Fig, we have

Example 3. $p_{3,1}^{1,1} = \frac{\left(\frac{(8-8)+(9-9)}{2}+8\right) + \left(\frac{(8-9)+(9-8)}{2}+9\right)}{2} = 8.5.$
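A minimal Java sketch of the three basic approaches is given below. Deviations are computed only over users in userGroup, and only items in itemGroup rated by user i contribute; choosing a user cluster and all items gives A1 (LUGI), all users and an item cluster gives A2 (GULI), and both clusters give A3 (LULI). The method name and parameters are our own illustration, not taken from the paper's implementation.

// Local-information-embedded slope one prediction for user i on item j.
public static double predictLocal(double[][] r, int i, int j,
        java.util.Set<Integer> userGroup, java.util.Set<Integer> itemGroup) {
    double sum = 0;
    int count = 0;
    for (int k = 0; k < r[i].length; k++) {
        if (k == j || r[i][k] == 0 || !itemGroup.contains(k)) {
            continue; // only items in the group that user i has rated
        }
        double dev = 0;
        int co = 0;
        for (int h : userGroup) {
            if (h != i && r[h][j] > 0 && r[h][k] > 0) {
                dev += r[h][j] - r[h][k]; // deviation within the user group
                co++;
            }
        }
        if (co == 0) {
            continue;
        }
        sum += dev / co + r[i][k];
        count++;
    }
    return count == 0 ? 0 : sum / count;
}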

Approach A4 takes the average of the approaches A1 and A2 as the final predicted rating

$$f_{i,j}^{1} = \frac{p_{i,j}^{g,\cdot} + p_{i,j}^{\cdot,q}}{2}.$$

Based on Examples 1 and 2, we have

Example 4. $f_{3,1}^{1} = \frac{p_{3,1}^{1,\cdot} + p_{3,1}^{\cdot,1}}{2} = \frac{8.6 + 7.9}{2} \approx 8.3.$

Approach A5 takes the average of the approaches A1 and A3 as the final predicted rating

$$f_{i,j}^{2} = \frac{p_{i,j}^{g,\cdot} + p_{i,j}^{g,q}}{2}.$$

Based on Examples 1 and 3, we have

Example 5. $f_{3,1}^{2} = \frac{p_{3,1}^{1,\cdot} + p_{3,1}^{1,1}}{2} = \frac{8.6 + 8.5}{2} \approx 8.6.$

Approach A6 takes the average of the approaches A2 and A3 as the final predicted rating

$$f_{i,j}^{3} = \frac{p_{i,j}^{\cdot,q} + p_{i,j}^{g,q}}{2}.$$

Based on Examples 2 and 3, we have

Example 6. $f_{3,1}^{3} = \frac{p_{3,1}^{\cdot,1} + p_{3,1}^{1,1}}{2} = \frac{7.9 + 8.5}{2} = 8.2.$

Approach A7 takes the average of the approaches A1, A2 and A3 as the final predicted rating

$$f_{i,j}^{4} = \frac{p_{i,j}^{g,\cdot} + p_{i,j}^{\cdot,q} + p_{i,j}^{g,q}}{3}.$$

Based on Examples 1, 2 and 3, we have

Example 7. $f_{3,1}^{4} = \frac{p_{3,1}^{1,\cdot} + p_{3,1}^{\cdot,1} + p_{3,1}^{1,1}}{3} = \frac{8.6 + 7.9 + 8.5}{3} \approx 8.3.$
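The fusion approaches are straightforward to realize on top of the basic predictions. The fragment below reuses the hypothetical predictLocal method from the earlier sketch; userCluster and itemCluster are assumed to hold the indices of the clusters containing u_i and t_j, and allUsers and allItems hold all user and item indices.

double p1 = predictLocal(r, i, j, userCluster, allItems);    // A1 (LUGI)
double p2 = predictLocal(r, i, j, allUsers, itemCluster);    // A2 (GULI)
double p3 = predictLocal(r, i, j, userCluster, itemCluster); // A3 (LULI)
double f1 = (p1 + p2) / 2;      // A4
double f2 = (p1 + p3) / 2;      // A5
double f3 = (p2 + p3) / 2;      // A6
double f4 = (p1 + p2 + p3) / 3; // A7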

Time complexity analysis

Let the number of users and items be m and n, respectively. The complexity analysis includes the off-line and on-line phases. Local information can be extracted in the off-line phase by a clustering algorithm. For the M-distance clustering algorithm [31], the time complexity is O(mn).

In the on-line prediction stage, we discuss the time complexity of predicting a rating. For the global-user-global-item approach (GUGI), the time complexity is O(mn). Let the number of user groups and item groups be C and E, respectively. The time complexity of the local-user-global-item approach (LUGI) is O(mn/C). The time complexity of the global-user-local-item approach (GULI) is O(mn/E). The time complexity of the local-user-local-item approach (LULI) is O(mn/(CE)).

Experiments

In this section, we report extensive computational tests designed to address the following questions:

  • Does the ESLI model perform better than existing slope one [8] in terms of MAE and RMSE?

  • Does the ESLI model have a more prominent advantage than the existing slope one [8] when there are more users or items?

Question 1 compares the MAE and RMSE of our proposed scheme with those of the existing slope one. This question is the core issue of this paper.

Question 2 compares the MAE and RMSE of our proposed scheme with those of the existing slope one under different scales of users or items.

Datasets

Table 3 lists the basic information of the Movielens 100K (ML100K), Movielens 1M (ML1M), Movielens 10M (ML10M) and DouBan [32] (DB, https://www.cse.cuhk.edu.hk/irwin.king.new/pub/data/douban) datasets. The number of users ranges from 943 to 71,567. The number of items ranges from 1,682 to 39,695. The number of ratings ranges from 100,000 to 10,000,054, while the rating density ranges from 0.78% to 6.30%. The average rating ranges from 3.51 to 3.75.

Tab. 3. The basic information of four datasets.

The rating distributions of the four datasets have similar, approximately normal characteristics. The rating scale ranges from 0.5 to 5 with a step length of 0.5. The frequency is highest when the rating is 4, and second highest when the rating is 3 or 5. For the ML100K dataset, the maximum numbers of ratings per user/movie are 737/583, respectively, with minimums of 168/1, respectively. For the ML1M dataset, the maximum numbers of ratings per user/movie are 2,314/3,428, respectively, with minimums of 341/388, respectively. For the ML10M dataset, the maximum numbers of ratings per user/movie are 7,359/34,864, respectively, with minimums of 20/1, respectively. For the DB dataset, the maximum numbers of ratings per user/movie are 10,157/1,274, respectively, with minimums of 166/1, respectively.

Evaluation metrics

We employ MAE [33, 34] and RMSE [34, 35] as evaluation metrics. The lower the values of MAE and RMSE, the better the performance of the recommender system [36].

Given a rating system, the MAE is calculated by

$$\mathrm{MAE} = \frac{1}{|P|} \sum_{(i,j) \in P} |p_{i,j} - r_{i,j}|,$$

and the RMSE is computed by

$$\mathrm{RMSE} = \sqrt{\frac{1}{|P|} \sum_{(i,j) \in P} (p_{i,j} - r_{i,j})^2},$$

where P is the set of user-item pairs in the testing set and pi,j is the predicted rating of ui for tj.
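Both metrics can be computed in a few lines; the following sketch assumes the predicted and actual ratings of the testing set are stored in two parallel arrays.

// Compute MAE and RMSE on the testing set; returns {MAE, RMSE}.
public static double[] maeAndRmse(double[] predicted, double[] actual) {
    double absSum = 0;
    double sqSum = 0;
    for (int n = 0; n < predicted.length; n++) {
        double e = predicted[n] - actual[n];
        absSum += Math.abs(e);
        sqSum += e * e;
    }
    double mae = absSum / predicted.length;
    double rmse = Math.sqrt(sqSum / predicted.length);
    return new double[] { mae, rmse };
}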

Experimental design

We design two sets of experiments to answer the questions raised at the beginning of this section.

Exp1. We first determine the parameters C and E, and then obtain the optimal MAE and RMSE. We employ k-means and M-distance clustering algorithms to extract user and item local information. To determine the parameters, we change C ∈ [2, 10] and E ∈ [2, 10] and obtain the minimum MAE and RMSE.

Exp2. Our aim is to analyze the impact of the numbers of users and items on the ESLI scheme. First, we gradually increase the number of users under the condition that all items are involved. Second, we gradually increase the number of items under the condition that all users are involved.

We randomly divide the entire dataset into a training set and a testing set: 80% of the data are specified as the training set and the remaining 20% as the testing set.
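A minimal sketch of this random 80%/20% split is given below: all rated (user, item) pairs are collected, shuffled, and cut at 80%. The fixed seed is only for illustration and is not part of the paper's protocol.

java.util.List<int[]> pairs = new java.util.ArrayList<>();
for (int i = 0; i < r.length; i++) {
    for (int j = 0; j < r[i].length; j++) {
        if (r[i][j] > 0) {
            pairs.add(new int[] { i, j }); // a known rating
        }
    }
}
java.util.Collections.shuffle(pairs, new java.util.Random(42));
int cut = (int) (0.8 * pairs.size());
java.util.List<int[]> trainingPairs = pairs.subList(0, cut);
java.util.List<int[]> testingPairs = pairs.subList(cut, pairs.size());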

Sensitivity to parameters

Granular selection is one of the important factors affecting the performance of the ESLI scheme [37–39].

Because approach A7 is a fusion of all three basic approaches, we find the optimal C and E by computing the MAE of approach A7. S3 and S4 Figs show the MAE of approach A7 when C ∈ [2, 10] and E ∈ [2, 10] for the ML1M dataset.
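The parameter search amounts to a simple grid sweep, sketched below. evaluateA7(...) is a hypothetical helper (not the paper's API) that clusters with the given C and E, predicts the testing ratings with A7, and returns the MAE.

double bestMae = Double.MAX_VALUE;
int bestC = 2, bestE = 2;
for (int C = 2; C <= 10; C++) {
    for (int E = 2; E <= 10; E++) {
        double mae = evaluateA7(trainingSet, testingSet, C, E);
        if (mae < bestMae) { // keep the (C, E) pair with the smallest MAE
            bestMae = mae;
            bestC = C;
            bestE = E;
        }
    }
}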

In S3 Fig, the number of item clusters is set to 3. When the number of user clusters C ∈ [2, 4], the MAE decreases. When C ∈ [4, 10], the MAE increases. We get the minimum MAE when C = 4. In S4 Fig, the number of user clusters is set to 4. When the number of item clusters E ∈ [2, 3], E ∈ [4, 5] or E ∈ [8, 10], the MAE decreases. When E ∈ [3, 4] or E ∈ [5, 8], the MAE increases. We get the minimum MAE when E = 3.

We analyze the performance of ESLI by changing the number of users/items. S5 and S6 Figs show the MAE comparison between A7 and GUGI. In S5 Fig, we fix the number of items and gradually increase the number of users. As the number of users increases, the advantage of the ESLI scheme becomes more apparent. In S6 Fig, we fix the number of users and gradually increase the number of items. As the number of items increases, the advantage of the ESLI scheme becomes more apparent.

Runtime comparison

The time complexities of GUGI, LUGI (A1), GULI (A2), and LULI (A3) are O(mn), O(mn/C), O(mn/E), and O(mn/(CE)), respectively. Therefore, we expect the runtimes of LUGI, GULI, and LULI to be 1/C, 1/E, and 1/(CE) of that of GUGI. When C = 4 and E = 3, they should be 1/4, 1/3, and 1/12, respectively.

The runtimes of all algorithms under M-distance clustering are compared in Table 4. Note that the runtime is the total execution time, which includes the file input and output overhead. The computations were performed on a Windows 10 64-bit operating system with 8 GB RAM and an Intel Core i5 CPU @ 3.4 GHz, using a Java implementation.

Tab. 4. Runtime comparison under M-distance clustering (unit: ms).

For the ML100K dataset, the experimental ratios are 41/63, 49/63, and 36/63, respectively. For the ML1M dataset, the experimental ratios are 53/92, 56/92, and 40/92, respectively. For the ML10M dataset, the experimental ratios are 53/93, 55/93, and 41/93, respectively. For the DB dataset, the experimental ratios are 53/78, 56/78, and 43/78, respectively. They generally comply with the expected values. A4, A5, A6 and A7 are fusion algorithms, and therefore they require more runtime than GUGI.

Comparison of MAE and RMSE

We compare the performance of the ESLI scheme with that of the traditional slope one in terms of MAE and RMSE.

Table 5 shows MAE comparison under M-distance clustering.

Tab. 5. MAE comparison under M-distance clustering.

For the ML100K dataset, approach A2 obtains the lowest MAE, which is 0.21% lower than that of the traditional GUGI approach. For the ML1M dataset, approach A4 obtains the lowest MAE, which is 3.11% lower than that of the traditional GUGI approach. For the ML10M dataset, approach A1 obtains the lowest MAE, which is 4.60% lower than that of the traditional GUGI approach. For the DB dataset, approach A3 obtains the lowest MAE, which is 1.66% lower than that of the traditional GUGI approach.

Table 6 shows RMSE comparison under M-distance clustering.

Tab. 6. RMSE comparison under M-distance clustering.

For the ML100K dataset, none of the ESLI approaches obtains a lower RMSE than the traditional GUGI approach. For the ML1M dataset, approach A1 obtains the lowest RMSE, which is 2.55% lower than that of the traditional GUGI approach. For the ML10M dataset, approach A1 obtains the lowest RMSE, which is 4.23% lower than that of the traditional GUGI approach. For the DB dataset, approach A6 obtains the lowest RMSE, which is 1.00% lower than that of the traditional GUGI approach.

Table 7 shows MAE comparison under k-means clustering.

Tab. 7. MAE comparison under k-means clustering.

For the ML100K and DB datasets, none of the ESLI approaches obtains a lower MAE than the traditional GUGI approach. For the ML1M dataset, approach A5 obtains the lowest MAE, which is 0.21% lower than that of the traditional GUGI approach. For the ML10M dataset, approach A2 obtains the lowest MAE, which is 0.66% lower than that of the traditional GUGI approach.

Table 8 shows RMSE comparison under k-means clustering.

Tab. 8. RMSE comparison under k-means clustering.

For the ML100K and DB datasets, none of the ESLI approaches obtains a lower RMSE than the traditional GUGI approach. For the ML1M dataset, approach A5 obtains the lowest RMSE, which is 0.03% lower than that of the traditional GUGI approach. For the ML10M dataset, approach A1 obtains the lowest RMSE, which is 0.37% lower than that of the traditional GUGI approach.

In general, the M-distance-based ESLI is superior to the k-means-based ESLI. The k-means clustering is non-deterministic and depends on the initial centers and the distance function. The M-distance clustering is deterministic and depends only on the average rating of the user/item. The user's average rating indicates her/his rating preference, and the item's average rating indicates its popularity. Compared with the k-means clustering method, the M-distance clustering method can better reflect the difference in ratings between different clusters.

Conclusion and further work

In this paper, we propose the ESLI scheme, which extracts local information through clustering. In the ESLI scheme, we design seven different local-information-embedding approaches. The experimental results show that our scheme is better than slope one in terms of both MAE and RMSE.

In the future, we will apply the concept of local information embedding to other collaborative filtering algorithms. For model-based recommendation algorithms, the local demographic and occupation information will be considered.

Supporting information

S1 File [zip]
ML100K.

S2 File [jar]
StableMA-master.

S1 Fig [tif]

S2 Fig [tif]

S3 Fig [tif]

S4 Fig [tif]

S5 Fig [tif]

S6 Fig [tif]


References

1. Cheng WJ, Yin GS, Dong YX, Dong HB, Zhang WS. Collaborative Filtering Recommendation on Users’ Interest Sequences. PLOS ONE. 2016;11(5):1–17. doi: 10.1371/journal.pone.0155739

2. Feng JM, Fengs XY, Zhang N, Peng JY. An improved collaborative filtering method based on similarity. PLOS ONE. 2018;13(9). doi: 10.1371/journal.pone.0204003

3. Sarwar B, Karypis G, Konstan J, Riedl J. Item-based Collaborative Filtering Recommendation Algorithms. In: Proceedings of the 10th International Conference on World Wide Web; 2001. p. 285–295.

4. Sun SB, Zhang ZH, Dong XL, Zhang HR, Li TJ, Zhang L, et al. Integrating Triangle and Jaccard similarities for recommendation. PLOS ONE. 2017;12(8):1–16. doi: 10.1371/journal.pone.0183570

5. Zhou YB, Lü LY, Liu WP, Zhang JL. The Power of Ground User in Recommender Systems. PLOS ONE. 2013;8(8):1–11. doi: 10.1371/journal.pone.0070094

6. Zhao ZD, Shang MS. User-based Collaborative-Filtering Recommendation Algorithms on Hadoop. In: Proceedings of 3th International Conference on Knowledge Discovery and Data Mining; 2010. p. 478–481.

7. Linden G, Smith B, York J. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing. 2003;7(1):76–80. doi: 10.1109/MIC.2003.1167344

8. Lemire D, Maclachlan A. Slope One Predictors for Online Rating-Based Collaborative Filtering. In: Proceedings of the 2005 SIAM International Conference on Data Mining. SIAM; 2005. p. 471–475.

9. Keller JM, Gray MR, Givens JA. A Fuzzy K-Nearest Neighbor Algorithm. IEEE Transactions on Systems, Man, and Cybernetics. 1985;SMC-15(4):580–585. doi: 10.1109/TSMC.1985.6313426

10. Koren Y, Bell R, Volinsky C. Matrix Factorization Techniques for Recommender Systems. Computer. 2009;42(8):42–49. doi: 10.1109/MC.2009.263

11. Yu K, Schwaighofer A, Tresp V, Xu XW, Kriegel HP. Probabilistic Memory-Based Collaborative Filtering. IEEE Transactions on Knowledge and Data Engineering. 2004;16(1):56–69. doi: 10.1109/TKDE.2004.1264822

12. Demuth HB, Beale MH, De Jess O, Hagan MT. Neural network design. Hagan Martin; 2014.

13. Friedman N, Geiger D, Goldszmidt M. Bayesian Network Classifiers. Machine Learning. 1997;29(2-3):131–163. doi: 10.1023/A:1007465528199

14. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research. 2008;9(Aug):1871–1874.

15. Guo GB, Zhang J, Thalmann D. Merging trust in collaborative filtering to alleviate data sparsity and cold start. Knowledge-Based Systems. 2014;57:57–68. doi: 10.1016/j.knosys.2013.12.007

16. Shepitsen A, Gemmell J, Mobasher B, Burke R. Personalized Recommendation in Social Tagging Systems Using Hierarchical Clustering. In: Proceedings of the 2008 ACM Conference on Recommender systems; 2008. p. 259–266.

17. Zhang HR, Min F, Shi B. Regression-based three-way recommendation. Information Sciences. 2017;378:444–461. doi: 10.1016/j.ins.2016.03.019

18. Luo X, Wang D, Zhou M, Yuan H. Latent factor-based recommenders relying on extended stochastic gradient descent algorithms. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2019. doi: 10.1109/TSMC.2018.2884191

19. Luo X, Zhou M, Li S, Wu D, Liu Z, Shang M. Algorithms of Unconstrained Non-negative Latent Factor Analysis for Recommender Systems. IEEE Transactions on Big Data. 2019. doi: 10.1109/TBDATA.2019.2916868

20. Herlocker JL, Konstan JA, Riedl J. Explaining Collaborative Filtering Recommendations. Proc of Cscw. 2000;22(1):5–53.

21. Zhang HR, Min F, Zhang ZH, Wang S. Efficient collaborative filtering recommendations with multi-channel feature vectors. International Journal of Machine Learning and Cybernetics. 2019;10(5):1165–1172. doi: 10.1007/s13042-018-0795-8

22. Kannan R, Ishteva M, Park H. Bounded matrix factorization for recommender system. Knowledge & Information Systems. 2014;39(3):491–511. doi: 10.1007/s10115-013-0710-2

23. Liu W, Lai HJ, Wang J, Ke GY, Yang WW, Yin J. Mix geographical information into local collaborative ranking for POI recommendation. World Wide Web. 2019; p. 1–22.

24. Zhang J, Lin YJ, Lin ML, Liu JH. An effective collaborative filtering algorithm based on user preference clustering. Applied Intelligence. 2016;45(2):230–240. doi: 10.1007/s10489-015-0756-9

25. Chen C, Li DS, Lv Q, Yan JC, Chu SM, Shang L. MPMA: Mixture Probabilistic Matrix Approximation for Collaborative Filtering. In: IJCAI; 2016. p. 1382–1388.

26. Chen C, Li DS, Lv Q, Yan JC, Shang L, Chu SM. GLOMA: Embedding global information in local matrix approximation models for collaborative filtering. In: Thirty-First AAAI Conference on Artificial Intelligence; 2017.

27. Hartigan JA. Clustering Algorithms. Applied Statistics. 1975;25(1).

28. Tellaroli P, Bazzi M, Donato M, Brazzale AR, Drăghici S. Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters. PLOS ONE. 2016;11(3). doi: 10.1371/journal.pone.0152333 27015427

29. Gordon MD. User-based document clustering by redescribing subject descriptions with a genetic algorithm. Journal of the American Society for Information Science. 1991;42(5):311–322. doi: 10.1002/(SICI)1097-4571(199106)42:5%3C311::AID-ASI1%3E3.0.CO;2-J

30. Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society Series C (Applied Statistics). 1979;28(1):100–108.

31. Zheng M, Min F, Zhang HR, Chen WB. Fast Recommendations With the M-Distance. IEEE Access. 2016;4:1464–1468. doi: 10.1109/ACCESS.2016.2549182

32. Ma H, Zhou D, Liu C, Lyu MR, King I. Recommender systems with social regularization. In: Proceedings of the fourth ACM international conference on Web search and data mining. WSDM’11. Hong Kong, China; 2011. p. 287–296.

33. Konno H, Yamazaki H. Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Management Science. 1991;37(5):519–531. doi: 10.1287/mnsc.37.5.519

34. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research. 2005;30(1):79–82. doi: 10.3354/cr030079

35. Levinson N. The Wiener (Root Mean Square) Error Criterion in Filter Design and Prediction. Journal of Mathematics and Physics. 1946;25(1-4):261–278. doi: 10.1002/sapm1946251261

36. Zhang HR, Min F, Wu YX, Fu ZL, Gao L. Magic barrier estimation models for recommended systems under normal distribution. Appl Intell. 2018;48(12):4678–4693. doi: 10.1007/s10489-018-1237-8

37. Xu WH, Li WT, Zhang XT. Generalized multigranulation rough sets and optimal granularity selection. Granular Computing. 2017;2(4):271–288. doi: 10.1007/s41066-017-0042-9

38. Liu Y, Liao SZ. Granularity selection for cross-validation of SVM. Information Sciences. 2017;378:475–483. doi: 10.1016/j.ins.2016.06.051

39. Zhu PF, Hu QH. Adaptive neighborhood granularity selection and combination based on margin distribution optimization. Information Sciences. 2013;249:1–12. doi: 10.1016/j.ins.2013.06.012

