协同过滤算法结合相似度评估与交替最小二乘优化技术

作者：峰吹云飞_974 | 来源：互联网 | 2024-10-29 11:59

本文探讨了协同过滤算法在推荐系统中的应用，重点介绍了基于用户和基于物品的两种协同过滤方法。通过引入相似度评估技术和交替最小二乘优化技术，显著提升了推荐系统的准确性和鲁棒性。实验结果表明，该方法在处理大规模数据集时表现出色，能够有效提高用户满意度和系统性能。

一.协同过滤算法（Collaborative Filtering）

1.简介

协同过滤算法：是一种基于群体用户或者物品的经典推荐算法。分两种：

(1).通过考察具有相同爱好的用户对相同物品的评分标准进行计算。

(2).考察具有相同特质的物品从而推荐给选择了某件物品的用户。

A和B是“志同道合”的基友（相似度很高），将A喜欢的物品推荐给B是合理的

在无先验知识的前提下，根据A所喜欢物品的相似性，将相似物品推荐给A，是合理的

2.问题

基于用户，数据量大，对于通用物品往往优先推荐，对于热点物品不够准

基于物品，数据量相对小，而推荐同类物品存在用户已持有不再需要的问题

二.相似度度量

1.基于欧几里得距离的相似度计算

(1)欧几里得距离(Euclidean distance)：表示三维空间中两个点的真实距离。是一种基于用户之间直线距离的计算方式。

不同的物品或用户对应不同的坐标，特定目标定位为坐标原点。

欧几里得距离公式：

相似度：1/(d+1)

2.基于余弦角度的相似度计算

特定目标作为坐标上的点，但不是原点。夹角的大小来反映目标之间的相似度。

3.欧几里得相似度与余弦相似度的比较

distance and similarity

欧几里得相似度注重目标之间的差异。

余弦相似度更多是对目标从方向趋势上区分。

4.余弦相似度实战

/**
  * Created by Administrator on 2017/4/17.
  */

import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.Map
object testVector {
  val conf = new SparkConf()
    .setMaster("local")
    .setAppName("testVector");
  val sc = new SparkContext(conf);
  //将内存数据读入Spark系统中
  val users = sc.parallelize(Array("aaa","bbb","ccc","ddd","eee"));
  val films = sc.parallelize(Array("smzdm","ylxb","znh","nhsc","fcwr"));
  //使用一个source嵌套map作为姓名，电影名和分值的评价
  var source = Map[String,Map[String,Int]]();
  //设置一个用于存放电影分的map
  val filmSource = Map[String,Int]();
  //设置电影评分
  def getSource(): Map[String,Map[String,Int]] = {
    val user1FilmSource = Map("smzdm" -> 2,"ylxb" -> 3,"znh" -> 1,"nhsc" -> 0,"fcwr" -> 1);
    val user2FilmSource = Map("smzdm" -> 1,"ylxb" -> 2,"znh" -> 2,"nhsc" -> 1,"fcwr" -> 4);
    val user3FilmSource = Map("smzdm" -> 2,"ylxb" -> 1,"znh" -> 0,"nhsc" -> 1,"fcwr" -> 4);
    val user4FilmSource = Map("smzdm" -> 3,"ylxb" -> 2,"znh" -> 0,"nhsc" -> 5,"fcwr" -> 3);
    val user5FilmSource = Map("smzdm" -> 5,"ylxb" -> 3,"znh" -> 1,"nhsc" -> 1,"fcwr" -> 2);
    source += ("aaa" -> user1FilmSource);
    source += ("bbb" -> user2FilmSource);
    source += ("ccc" -> user3FilmSource);
    source += ("ddd" -> user4FilmSource);
    source += ("eee" -> user5FilmSource);
    return source;
  }
  //两两计算分值，采用余弦相似性
  def getCollaborateSource(user1:String,user2:String):Double = {
    val user1FilmSource = source.get(user1).get.values.toVector;//获得第一个用户的评分
    val user2FilmSource = source.get(user2).get.values.toVector;//获得第二个用户的评分
    val member = user1FilmSource.zip(user2FilmSource).map(d => d._1 * d._2).reduce(_ + _).toDouble;//对分子部分进行计算
    //求出分母第一个变量值
    val temp1 = math.sqrt(user1FilmSource.map(num => {
      math.pow(num,2);
    }).reduce(_ + _));//进行叠加
    //求出分母第二个变量值
    val temp2 = math.sqrt(user2FilmSource.map(num => {
      math.pow(num,2);
    }).reduce(_ + _));//进行叠加
    val denominator = temp1 * temp2;
    return member / denominator;
  }
  def main(args: Array[String]): Unit = {
    getSource();//初始化分数
    val name = "bbb";
    users.foreach(user =>{
      println(name + "相对于" + user + "的相似性分数是：" + getCollaborateSource(name,user));
    })
  }
}

bbb相对于aaa的相似性分数是：0.7089175569585667
bbb相对于bbb的相似性分数是：1.0000000000000002
bbb相对于ccc的相似性分数是：0.8780541105074453
bbb相对于ddd的相似性分数是：0.6865554812287477
bbb相对于eee的相似性分数是：0.6821910402406466

三.交替最小二乘法（ALS）--最常用的逼近计算的一种算法。

1.最小二乘法

(1)概念：最小二乘法多项式曲线拟合，根据给定的m个点,并不要求这条曲线精确地经过这些点，而是曲线y=f(x)的近似曲线y= φ(x)。

(2)常见的曲线拟合方法：

1.使偏差绝对值之和最小

2.使偏差绝对值最大的最小

3.使偏差平方和最小

按偏差平方和最小的原则选取拟合曲线，并且采取二项式方程为拟合曲线的方法,称为最小二乘法。

(3)推导过程

1. 设拟合多项式为：

2. 各点到这条曲线的距离之和，即偏差平方和如下：

3. 为了求得符合条件的a值，对等式右边求ai偏导数，因而我们得到了：

.......

4. 将等式左边进行一下化简，然后应该可以得到下面的等式：

.......

5. 把这些等式表示成矩阵的形式，就可以得到下面的矩阵：

6. 将这个范德蒙得矩阵化简后可得到:

7. 也就是说X*B=Y，那么

，便得到了系数矩阵B ，同时，我们也就得到了拟合曲线。

(4)java实现

    public class TheleastSquareMethod {
   //定义系数
    private double[] x;
    private double[] y;
    private double[] weight;
    private int n;
    private double[] coefficient;

    /**
     * Constructor method.
     *
     * @param x
     *            Array of x
     * @param y
     *            Array of y
     * @param n
     *            The order of polynomial
     */
    public TheleastSquareMethod(double[] x, double[] y, int n) {
        if (x == null || y == null || x.length <2 || x.length != y.length
                || n <2) {
            throw new IllegalArgumentException(
                    "IllegalArgumentException occurred.");
        }
        this.x = x;
        this.y = y;
        this.n = n;
        weight = new double[x.length];
        for (int i = 0; i             weight[i] = 1;
        }
        compute();
    }

    /**
     * Constructor method.
     *
     * @param x
     *            Array of x
     * @param y
     *            Array of y
     * @param weight
     *            Array of weight
     * @param n
     *            The order of polynomial
     */
    public TheleastSquareMethod(double[] x, double[] y, double[] weight, int n) {
        if (x == null || y == null || weight == null || x.length <2
                || x.length != y.length || x.length != weight.length || n <2) {
            throw new IllegalArgumentException(
                    "IllegalArgumentException occurred.");
        }
        this.x = x;
        this.y = y;
        this.n = n;
        this.weight = weight;
        compute();
    }

    /**
     * Get coefficient of polynomial.
     *
     * @return coefficient of polynomial
     */
    public double[] getCoefficient() {
        return coefficient;
    }

    /**
     * Used to calculate value by given x.
     *
     * @param x
     *            x
     * @return y
     */
    public double fit(double x) {
        if (coefficient == null) {
            return 0;
        }
        double sum = 0;
        for (int i = 0; i             sum += Math.pow(x, i) * coefficient[i];
        }
        return sum;
    }

    /**
     * Use Newton's method to solve equation.
     *
     * @param y
     *            y
     * @return x
     */
    public double solve(double y) {
        return solve(y, 1.0d);
    }

    /**
     * Use Newton's method to solve equation.
     *
     * @param y
     *            y
     * @param startX
     *            The start point of x
     * @return x
     */
    public double solve(double y, double startX) {
        final double EPS = 0.0000001d;
        if (coefficient == null) {
            return 0;
        }
        double x1 = 0.0d;
        double x2 = startX;
        do {
            x1 = x2;
            x2 = x1 - (fit(x1) - y) / calcReciprocal(x1);
        } while (Math.abs((x1 - x2)) > EPS);
        return x2;
    }

    /*
     * Calculate the reciprocal of x.
     *
     * @param x x
     *
     * @return the reciprocal of x
     */
    private double calcReciprocal(double x) {
        if (coefficient == null) {
            return 0;
        }
        double sum = 0;
        for (int i = 1; i             sum += i * Math.pow(x, i - 1) * coefficient[i];
        }
        return sum;
    }

    /*
     * This method is used to calculate each elements of augmented matrix.
     */
    private void compute() {
        if (x == null || y == null || x.length <= 1 || x.length != y.length
                || x.length             return;
        }
        double[] s = new double[(n - 1) * 2 + 1];
        for (int i = 0; i             for (int j = 0; j                 s[i] += Math.pow(x[j], i) * weight[j];
            }
        }
        double[] b = new double[n];
        for (int i = 0; i             for (int j = 0; j                 b[i] += Math.pow(x[j], i) * y[j] * weight[j];
            }
        }
        double[][] a = new double[n][n];
        for (int i = 0; i             for (int j = 0; j                 a[i][j] = s[i + j];
            }
        }

        // Now we need to calculate each coefficients of augmented matrix
        coefficient = calcLinearEquation(a, b);
    }

    /*
     * Calculate linear equation.
     *
     * The matrix equation is like this: Ax=B
     *
     * @param a two-dimensional array
     *
     * @param b one-dimensional array
     *
     * @return x, one-dimensional array
     */
    private double[] calcLinearEquation(double[][] a, double[] b) {
        if (a == null || b == null || a.length == 0 || a.length != b.length) {
            return null;
        }
        for (double[] x : a) {
            if (x == null || x.length != a.length)
                return null;
        }

        int len = a.length - 1;
        double[] result = new double[a.length];

        if (len == 0) {
            result[0] = b[0] / a[0][0];
            return result;
        }

        double[][] aa = new double[len][len];
        double[] bb = new double[len];
        int posx = -1, posy = -1;
        for (int i = 0; i <= len; i++) {
            for (int j = 0; j <= len; j++)
                if (a[i][j] != 0.0d) {
                    posy = j;
                    break;
                }
            if (posy != -1) {
                posx = i;
                break;
            }
        }
        if (posx == -1) {
            return null;
        }

        int count = 0;
        for (int i = 0; i <= len; i++) {
            if (i == posx) {
                continue;
            }
            bb[count] = b[i] * a[posx][posy] - b[posx] * a[i][posy];
            int count2 = 0;
            for (int j = 0; j <= len; j++) {
                if (j == posy) {
                    continue;
                }
                aa[count][count2] = a[i][j] * a[posx][posy] - a[posx][j]
                        * a[i][posy];
                count2++;
            }
            count++;
        }

        // Calculate sub linear equation
        double[] result2 = calcLinearEquation(aa, bb);

        // After sub linear calculation, calculate the current coefficient
        double sum = b[posx];
        count = 0;
        for (int i = 0; i <= len; i++) {
            if (i == posy) {
                continue;
            }
            sum -= a[posx][i] * result2[count];
            result[i] = result2[count];
            count++;
        }
        result[posy] = sum / a[posx][posy];
        return result;
    }

    public static void main(String[] args) {
       TheleastSquareMethod eastSquareMethod = new TheleastSquareMethod(
                new double[] { 0.5, 1.0, 1.5, 2.0, 2.5, 3.0 }, new double[] {
                        1.75, 2.45, 3.81, 4.8, 7.0, 8.6 }, 3);
        /*double[] coefficients = eastSquareMethod.getCoefficient();
        for (double c : coefficients) {
            System.out.println(c);
        }*/

        System.out.println(eastSquareMethod.fit(4));

        TheleastSquareMethod eastSquareMethod2 = new TheleastSquareMethod(
                new double[] { 0.5, 1.0, 1.5, 2.0, 2.5, 3.0 }, new double[] {
                        1.75, 2.45, 3.81, 4.8, 7.0, 8.6 }, 2);
        System.out.println(eastSquareMethod2.solve(100));

    }
}
三.基于ALS算法的协同过滤

(1)数据集

(2)建立ALS数据模型

MLlib中ALS算法固定格式：case class Rating(user: Int,product: Int,rating: Double)

ALS.tran码源：

def train(
           ratings: RDD[Rating];
           rank: Int;
           iterations: Int;
           lambda: Double;
           blocks: Int;
           seed: Long;
         ): MatrixFactorizatiOnModel= {
  new ALS(blocks,rank,iterations,lambda,false,1.0,seed).run(ratings)

(3)基于ALS算法的协同过滤推荐

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.{SparkConf, SparkContext}
object testVector {
  def main(args: Array[String]): Unit = {
    var cOnf= new SparkConf().setMaster("local").setAppName("testVector");//设置环境变量
    val sc = new SparkContext(conf);//实例化环境
    val data = sc.textFile("kimi.txt");//设置数据集
    val ratings = data.map(_.split(' ')match{//处理数据
      case Array(user,item,rate) => //将数据集转化
        Rating(user.toInt,item.toInt,rate.toDouble);//将数据集转化成专用rating
    })
    val rank = 2;//设置隐藏因子
    val numIteratiOns= 2;//设置迭代次数
    val model = ALS.train(ratings,rank,numIterations,0.01);//进行模型训练
    var rs = model.recommendProducts(2,1);//为用户2推荐一个商品
    rs.foreach(println);//打印结果
  }
}

/Rating(2,15,3.9713808775549495)，为用户 2 推荐一个编号 15 的商品，预测评分 3.97 与实际的 4 接近．

推荐阅读

case
Scala 实现 UTF-8 编码属性文件读取与克隆

本文介绍如何使用 Scala 以 UTF-8 编码方式读取属性文件，并实现属性文件的克隆功能。通过这种方式，可以确保配置文件在多线程环境下的一致性和高效性。 ... [详细]

蜡笔小新 2024-12-26 08:25:19
cmd
编写有趣的VBScript恶作剧脚本

本文将介绍如何编写一些有趣的VBScript脚本，这些脚本可以在朋友之间进行无害的恶作剧。通过简单的代码示例，帮助您了解VBScript的基本语法和功能。 ... [详细]

蜡笔小新 2024-12-28 09:46:23
char
使用Objective-C和dispatch库实现并发素数计算

本文介绍如何使用Objective-C结合dispatch库进行并发编程，以提高素数计数任务的效率。通过对比纯C代码与引入并发机制后的代码，展示dispatch库的强大功能。 ... [详细]

蜡笔小新 2024-12-28 08:44:35
bash
网络链路质量监控：Smokeping部署与配置

本文详细介绍了如何在Linux系统上安装和配置Smokeping，以实现对网络链路质量的实时监控。通过详细的步骤和必要的依赖包安装，确保用户能够顺利完成部署并优化其网络性能监控。 ... [详细]

蜡笔小新 2024-12-27 19:31:05
list
Java并发编程：LinkedBlockingQueue的实际应用

本文介绍了Java并发库中的阻塞队列（BlockingQueue）及其典型应用场景。通过具体实例，展示了如何利用LinkedBlockingQueue实现线程间高效、安全的数据传递，并结合线程池和原子类优化性能。 ... [详细]

蜡笔小新 2024-12-27 18:51:49
list
Python 的 10 个开发技巧！太实用了

1.如何在运行状态查看源代码？查看函数的源代码，我们通常会使用IDE来完成。比如在PyCharm中，你可以Ctrl+鼠标点击进入函数的源代码。那如果没有IDE呢？当我们想使用一个函 ... [详细]

蜡笔小新 2024-12-27 18:36:54
list
java编写的简易计算器

主要用了2个类来实现的，话不多说，直接看运行结果，然后在奉上源代码1.Index.javaimportjava.awt.Color;im ... [详细]

蜡笔小新 2024-12-27 18:18:10
list
使用 Azure Service Principal 和 Microsoft Graph API 获取 AAD 用户列表

本文介绍了一段通用代码示例，该代码不仅能够操作 Azure Active Directory (AAD)，还可以通过 Azure Service Principal 的授权访问和管理 Azure 订阅资源。Azure 的架构可以分为两个层级：AAD 和 Subscription。 ... [详细]

蜡笔小新 2024-12-27 16:07:12
controller
深入解析 MVC 源码：ParameterDescriptor 与 Action 方法参数绑定

在前两篇文章中，我们探讨了 ControllerDescriptor 和 ActionDescriptor 这两个描述对象，分别对应控制器和操作方法。本文将基于 MVC3 源码进一步分析 ParameterDescriptor，即用于描述 Action 方法参数的对象，并详细介绍其工作原理。 ... [详细]

蜡笔小新 2024-12-27 15:26:10
list
Akka BackoffSupervisor的深入解析与实践

本文详细介绍了Akka中的BackoffSupervisor机制，探讨其在处理持久化失败和Actor重启时的应用。通过具体示例，展示了如何配置和使用BackoffSupervisor以实现更细粒度的异常处理。 ... [详细]

蜡笔小新 2024-12-27 15:04:09
list
Unity 客户端框架设计：UI管理系统的构建

本文详细介绍了如何构建一个高效的UI管理系统，集中处理UI页面的打开、关闭、层级管理和页面跳转等问题。通过UIManager统一管理外部切换逻辑，实现功能逻辑分散化和代码复用，支持多人协作开发。 ... [详细]

蜡笔小新 2024-12-27 10:28:40
list
寻找满足特定条件的整数N的最大和(a+b)

本文探讨了如何在给定整数N的情况下，找到两个不同的整数a和b，使得它们的和最大，并且满足特定的数学条件。 ... [详细]

蜡笔小新 2024-12-26 19:26:18
list
Deepin系统下MySQL 5.7安装指南

本文详细记录了在基于Debian的Deepin 20操作系统上安装MySQL 5.7的具体步骤，包括软件包的选择、依赖项的处理及远程访问权限的配置。 ... [详细]

蜡笔小新 2024-12-28 10:48:41
list
Handling Null Object Encoding in OAuth 1.0a API Implementation

Explore a common issue encountered when implementing an OAuth 1.0a API, specifically the inability to encode null objects and how to resolve it. ... [详细]

蜡笔小新 2024-12-28 08:54:34
input
深度学习理论解析与理解

梯度方向指示函数值增加的方向，由各轴方向的偏导数综合而成，其模长表示函数值变化的速率。本文详细探讨了导数、偏导数、梯度等概念，并结合Softmax函数、卷积神经网络（CNN）中的卷积计算、权值共享及池化操作进行了深入分析。 ... [详细]

蜡笔小新 2024-12-26 18:23:11

峰吹云飞_974

这个家伙很懒，什么也没留下！

Tags | 热门标签

RankList | 热门文章