The Universal Recommender

Author: 摇荡的风车 | Source: Internet | 2023-08-01 11:36

The Universal Recommender

The Universal Recommender (UR) is a new type of collaborative filtering recommender based on an algorithm that can use data from a wide variety of user taste indicators; it is called the Correlated Cross-Occurrence (CCO) algorithm. Unlike the matrix factorization embodied in things like MLlib's ALS, the UR's CCO algorithm is able to ingest any number of user actions, events, profile data, and contextual information. It then serves results in a fast and scalable way. It also supports item properties for filtering and boosting recommendations and can therefore be considered a hybrid collaborative filtering and content-based recommender.

The use of multiple types of data fundamentally changes the way a recommender is used and, when employed correctly, will provide a significant increase in quality of recommendations vs. using only one user event. Most recommenders, for instance, can only use "purchase" events. Using all we know about a user and their context allows us to predict their preferences much better.

Example: a single type of user action

User	Action	Item
u1	view	t1
u1	view	t2
u1	view	t3
u1	view	t5
u2	view	t1
u2	view	t3
u2	view	t4
u2	view	t5
u3	view	t2
u3	view	t3
u3	view	t5

Grouping the events by user gives the following relations:
u1 => [ t1, t2, t3, t5 ]
u2 => [ t1, t3, t4, t5 ]
u3 => [ t2, t3, t5 ]
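
The grouping step above can be sketched in a few lines of Python. This is only an illustration: the function names `group_by_user` and `to_matrix` are made up for this example and are not part of the UR.

```python
from collections import defaultdict

# (user, action, item) events copied from the table above
events = [
    ("u1", "view", "t1"), ("u1", "view", "t2"), ("u1", "view", "t3"), ("u1", "view", "t5"),
    ("u2", "view", "t1"), ("u2", "view", "t3"), ("u2", "view", "t4"), ("u2", "view", "t5"),
    ("u3", "view", "t2"), ("u3", "view", "t3"), ("u3", "view", "t5"),
]

def group_by_user(events, action="view"):
    """Collect, per user, the distinct items touched by the given action."""
    history = defaultdict(list)
    for user, act, item in events:
        if act == action and item not in history[user]:
            history[user].append(item)
    return dict(history)

def to_matrix(history, users, items):
    """Build a binary user-by-item matrix: 1 if the user acted on the item."""
    return [[1 if item in history.get(user, []) else 0 for item in items]
            for user in users]

history = group_by_user(events)
P = to_matrix(history, ["u1", "u2", "u3"], ["t1", "t2", "t3", "t4", "t5"])

for user, items in history.items():
    print(user, "=>", items)
for row in P:
    print(row)
```

Each row of `P` is one user's history vector, which is the form the model in the next section works with.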

The general model for personalized recommendation

$r=(P^{T}P)h_{p}$

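$r$ = recommendations
$h_{p}$ = one user's history of some action (for example, purchases)
- $h_{u1}=\begin{bmatrix}1 & 1 & 1 & 0 & 1\end{bmatrix}$
- $h_{u2}=\begin{bmatrix}1 & 0 & 1 & 1 & 1\end{bmatrix}$
- …
- An action on a given item may be repeated in the history. How should that be expressed?
  $h_{u1}=\begin{bmatrix}1 & 2 & 1 & 0 & 1\end{bmatrix}$, where 2 means item 2 was purchased twice.
  With this representation a new problem appears: recent actions and long-past actions do not mean the same thing. Someone who gets injured once and buys a crutch should not be recommended crutches on the strength of that single action. Can LLR reduce this kind of effect?
$P$ = the matrix formed from the primary action (primary event) history of all users
- primary action: the notion only matters in the CCO model; with single-indicator recommendation there is no primary action to single out.
- rows are users, columns are items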
  $P=\begin{bmatrix}1 & 1 & 1& 0 & 1\\ 1 & 0 & 1 & 1 &1 \\ 0& 1& 1 & 0 & 1\end{bmatrix}$
$(P^{T}P)$ = compares column to column using log-likelihood based correlation test
- $P=\begin{bmatrix}1 & 1 & 1 & 0 & 1\\ 1 & 0 & 1 & 1 & 1\\ 0 & 1 & 1 & 0 & 1\end{bmatrix}$ $P^{T}=\begin{bmatrix}1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 1 & 1\\ 0 & 1 & 0\\ 1 & 1 & 1\end{bmatrix}$ $P^{T}\cdot P=\begin{bmatrix}- & 1 & 2 & 1 & 2\\ 1 & - & 2 & 0 & 2\\ 2 & 2 & - & 1 & 3\\ 1 & 0 & 1 & - & 1\\ 2 & 2 & 3 & 1 & -\end{bmatrix}$, where $P^{T}$ is the transpose of $P$ and the diagonal (an item paired with itself) is omitted.
- The element $C_{3,5}=3$ of $P^{T}\cdot P$ means that three of the users who viewed $t_{3}$ also viewed $t_{5}$.
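
As a quick check of the counts above, here is a minimal pure-Python sketch of the product $P^{T}P$. The helper name `cooccurrence` is made up for this example. The diagonal is computed too (it holds each item's own view count) even though the matrix above leaves it out.

```python
# Rows are users u1..u3, columns are items t1..t5 (same P as above)
P = [
    [1, 1, 1, 0, 1],  # u1
    [1, 0, 1, 1, 1],  # u2
    [0, 1, 1, 0, 1],  # u3
]

def cooccurrence(A, B):
    """Return A^T B as nested lists.

    Entry [i][j] counts the users (rows) that have both column i of A
    and column j of B set.
    """
    n_a, n_b = len(A[0]), len(B[0])
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(n_b)]
            for i in range(n_a)]

C = cooccurrence(P, P)
for row in C:
    print(row)

# Indices are 0-based, so C[2][4] is the t3/t5 entry, C_{3,5} in the text
print("t3 and t5 co-viewed by", C[2][4], "users")
```

Passing two different matrices to `cooccurrence` gives a cross-occurrence matrix such as $P^{T}V$, which is the building block of the CCO model described later.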

COOCCURRENCE WITH LLR

Let's call ($P^{T}P$) an indicator matrix for some primary action like purchase
- Rows = items, columns = items, element = similarity/correlation score
The score is row compared to column using a "similarity" or "correlation" metric
Log-likelihood ratio (LLR) finds important/correlating cooccurrences and filters out the rest: a major improvement in quality over simple cooccurrences or other similarity metrics.
The LLR value is computed from the co-occurrence counts of two events and measures how strongly the two events are associated (the counts below differ from the $P^{T}P$ computed earlier and appear to come from a separate example):
$P^{T}\cdot P=\begin{bmatrix}- & 1 & 2 & 1 & 2\\ 1 & - & 1 & 1 & 1\\ 2 & 1 & - & 1 & 2\\ 1 & 1 & 1 & - & 1\\ 2 & 1 & 2 & 1 & -\end{bmatrix}\overset{LLR}{\rightarrow}\begin{bmatrix}- & 1.05 & 3.82 & 1.05 & 3.82\\ 1.05 & - & 1.05 & 1.05 & 1.05\\ 3.82 & 1.05 & - & 1.05 & 3.82\\ 1.05 & 1.05 & 1.05 & - & 1.05\\ 3.82 & 1.05 & 3.82 & 1.05 & -\end{bmatrix}$
Note: every user clicked ad a4, yet the LLR value of a4 is 0, which means a4 is not associated with any post. This looks strange, but it is a property of LLR: it heavily penalizes popular events. Put simply, LLR concludes that viewing t1 and clicking ad a4 occur together not because the two are related, but only because clicking a4 is itself a high-frequency event.
Experiments on real-world data show LLR is significantly better than other similarity metrics
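
To make the LLR score concrete, here is a minimal sketch of the entropy-based log-likelihood ratio on a 2x2 contingency table, the formulation commonly used for co-occurrence analysis. The function names are chosen for this example, and the small matrix `P` is the single-action example from earlier; the sketch does not claim to reproduce the 1.05 / 3.82 values shown above, which come from different data.

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: both events, k12: only A, k21: only B, k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))

def llr_for_items(P, i, j):
    """Build the 2x2 table for item columns i and j of a user-by-item matrix."""
    k11 = sum(1 for u in P if u[i] and u[j])
    k12 = sum(1 for u in P if u[i] and not u[j])
    k21 = sum(1 for u in P if not u[i] and u[j])
    k22 = sum(1 for u in P if not u[i] and not u[j])
    return llr(k11, k12, k21, k22)

P = [
    [1, 1, 1, 0, 1],  # u1
    [1, 0, 1, 1, 1],  # u2
    [0, 1, 1, 0, 1],  # u3
]

# t5 was viewed by every user, so it carries no information about t1
print("t1 vs t5:", llr_for_items(P, 0, 4))
# Two events that always occur together and never apart score high
print("perfectly correlated:", llr(10, 0, 0, 10))
```

The first call illustrates the popularity penalty described in the note above: an event that every user performs gets an LLR of zero against everything else.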

LLR AND SIMILARITY METRICS PRECISION (MAP@K)

[Figure: MAP@K precision of LLR compared with other similarity metrics]

FROM COOCCURRENCE TO RECOMMENDATION

$r=(P^{T}P)h_{p}$

This actually means to take the user's history $h_{p}$ and compare it to rows of the cooccurrence matrix $(P^{T}P)$
$h_{p}$ = the user's history of action P
TF-IDF weighting of cooccurrence would be nice to mitigate the undue influence of popular items
Find items nearest to the user's history
Sort these by similarity strength and keep only the highest: you have recommendations
Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF?
That's exactly what a search engine does!
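
The steps above can be sketched directly as a matrix-vector product followed by a sort. This is a simplified illustration: it uses the raw co-occurrence counts from the earlier example rather than LLR scores, and it leaves out the TF-IDF weighting and cosine normalization that a search engine would apply. The `recommend` function and the sample history are made up for this example.

```python
# Co-occurrence counts for items t1..t5 from the earlier example,
# with the diagonal set to 0 so an item does not recommend itself
C = [
    [0, 1, 2, 1, 2],
    [1, 0, 2, 0, 2],
    [2, 2, 0, 1, 3],
    [1, 0, 1, 0, 1],
    [2, 2, 3, 1, 0],
]
items = ["t1", "t2", "t3", "t4", "t5"]

def recommend(C, h, k=2):
    """Score every item as r = C h, drop items already in the history,
    and return the indices of the k highest-scoring items."""
    scores = [sum(c * x for c, x in zip(row, h)) for row in C]
    candidates = [i for i in range(len(h)) if h[i] == 0]
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical user who has only viewed t1
h_p = [1, 0, 0, 0, 0]
top = recommend(C, h_p, k=2)
print([items[i] for i in top])
```

Swapping the raw counts for LLR-filtered scores changes only the contents of `C`; the query step stays the same.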

USER HISTORY + COOCCURRENCES + SEARCH = RECOMMENDATIONS

$r=(P^{T}P)h_{p}$

The final calculation uses $h_{p}$ as the query on the cooccurrence matrix $(P^{T}P)$ and returns a ranked set of items
Query is a "similarity" query, not a relational or key-based fetch
Uses the search engine as a cosine-based K-Nearest Neighbor (KNN) engine with norms and TF-IDF weighting
Highly optimized for serving these queries in realtime
Several (Solr, Elasticsearch) have high availability and massively scalable, clustered, auto-sharding features like the best of NoSQL DBs

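The breakthrough idea behind UR

Almost all collaborative filtering recommenders are computed from only a single preference indicator:
$r=(P^{T}P)h_{p}$
CCO-based collaborative filtering can be expressed as:
$r=(P^{T}P)h_{p}+(P^{T}V)h_{v}+(P^{T}C)h_{c}+\dots$
- $(P^{T}P)$ = correlation matrix of P with P; $(P^{T}V)$ = correlation matrix of P with V
- $(P^{T}V)h_{v}+(P^{T}C)h_{c}$ are the cross-occurrence terms
- $h_{p}$ = the user's history of action P; $h_{v}$ = the user's history of action V
With CCO-based recommendation, any user indicator we can think of can improve the recommendations: purchases, views, category preference, location preference, device preference, user gender, ...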
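
A minimal sketch of the two-indicator case, $r=(P^{T}P)h_{p}+(P^{T}V)h_{v}$, is shown below. The purchase matrix `P`, the view matrix `V`, and the user histories are invented for illustration, and raw counts stand in for the LLR-filtered scores a real CCO model would use.

```python
def cross_occurrence(A, B):
    """Return A^T B: entry [i][j] counts users with A column i and B column j set."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]

def mat_vec(M, h):
    return [sum(m * x for m, x in zip(row, h)) for row in M]

# Hypothetical data: 3 users, 3 purchasable items, 4 viewable items
P = [[1, 0, 1],      # purchases (primary action)
     [0, 1, 0],
     [1, 1, 0]]
V = [[1, 1, 0, 1],   # views (secondary action)
     [0, 1, 1, 0],
     [1, 1, 0, 0]]

PtP = cross_occurrence(P, P)  # purchase/purchase cooccurrence
PtV = cross_occurrence(P, V)  # purchase/view cross-occurrence

# A user with no purchases yet who has viewed the third viewable item
h_p = [0, 0, 0]
h_v = [0, 0, 1, 0]

r = [a + b for a, b in zip(mat_vec(PtP, h_p), mat_vec(PtV, h_v))]
print(r)
```

With an empty purchase history the first term contributes nothing, yet the view history still produces a score through $P^{T}V$. This is how extra indicators let the model recommend to users that a purchase-only recommender could not serve.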

CORRELATED CROSS-OCCURRENCE: SO WHAT?

Comparing the history of the primary action to other actions finds actions that lead to the one you want to recommend
Given strong data about user preferences on a general population we can also use
- items clicked
- terms searched
- categories viewed
- items shared
- people followed
- items disliked (yes dislikes may predict likes)
- location
- device preference
- gender
- age bracket (for example, people in the 10~20 age bracket)
Virtually anything we know about the population can be tested for correlation and used to predict a particular user's preferences

CORRELATED CROSS-OCCURRENCE: ADDING CONTENT MODELS

Collaborative Topic Filtering
- Use Latent Dirichlet Allocation (LDA) to model topics directly from the textual content
- Calculate based on Word2Vec type word vectors instead of bag-of-words analysis to boost quality
- Create cross-occurrence indicators from topics the user has preferred
- Repeat periodically
Entity Preferences:
- Use a Named Entity Recognition (NER) system to find entities in textual content
- Create cross-occurrence indicators for these entities
Entities and topics are long-lived and richly describe user interests, so they are very good for use in the Universal Recommender

THE UNIVERSAL RECOMMENDER ADDING CONTENT-BASED RECS

Indicators can also be based on content similarity

$r=(TT^{T})h_{t}+I\cdot L$

$(TT^{T})$ is a calculation that compares every pair of documents and finds the most similar, based upon content alone
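
As a rough illustration of the document-to-document comparison, the sketch below builds bag-of-words vectors for three invented item descriptions and compares every pair with cosine similarity. The documents, the helper names, and the choice of plain term counts are all assumptions for this example; a production setup would more likely rely on the search engine's own text analysis.

```python
import math

def bag_of_words(text, vocab):
    words = text.lower().split()
    return [words.count(term) for term in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)

# Hypothetical item descriptions
docs = {
    "d1": "red running shoes",
    "d2": "blue running shoes",
    "d3": "cast iron pan",
}

vocab = sorted({w for text in docs.values() for w in text.lower().split()})
T = {name: bag_of_words(text, vocab) for name, text in docs.items()}

# Compare every pair of documents (the role played by T T^T in the formula)
similarity = {(a, b): cosine(T[a], T[b]) for a in T for b in T if a != b}

most_similar_to_d1 = max((b for b in T if b != "d1"),
                         key=lambda b: similarity[("d1", b)])
print(most_similar_to_d1, similarity[("d1", most_similar_to_d1)])
```

The resulting similarity scores play the same role as the cooccurrence scores: they become one more indicator that can be queried with the user's history.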

INDICATOR TYPES

Cooccurrences
- Find the best indicator of a user preference for the item type to be recommended: examples are "buy", "read", "video_watch", "share", "follow", "like"
Cross-occurrence
- Item metadata as "user" preference, for example: treat item category as a user category preference
- Calculated from user actions on any data that may give information about the user: category preferences, search terms, gender, location
- Create with Mahout-Samsara SimilarityAnalysis.cooccurrences
Content or metadata
- Content text, tags, categories, description text, anything describing an item
- Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
Intrinsic
- Popularity rank, geo-location, anything describing an item
- Some may be derived from usage data, like popularity rank or hotness
- Is a known or specially calculated property of the item

THE UNIVERSAL RECOMMENDER AKA THE WHOLE ENCHILADA

"Universal" means one query on all indicators at once

$r=(P^{T}P)h_{p}+(P^{T}V)h_{v}+(P^{T}C)h_{c}+\dots+(TT^{T})h_{t}+I\cdot L$

Unified query:

purchase-correlator: users-history-of-purchase
view-correlator: user-history-of-views
category-correlator: user-history-of-categories-viewed
tags-correlator: user-history-of-purchases
geo-location-correlator: user-location
...

Once indicators are indexed as search fields this entire equation is a single query

Fast!
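
To give a feel for what "a single query" might look like, here is a sketch that assembles an Elasticsearch-style `bool` query with one `terms` clause per indicator field. The field names, the `build_ur_query` helper, and the boost values are assumptions made for this example; the query the UR actually sends may be structured differently.

```python
import json

def build_ur_query(history_by_field, boosts=None, size=10):
    """Build one search request body from a user's history per indicator field.

    Each non-empty history becomes a `terms` clause; the engine scores a
    document by how many clauses (and terms) it matches.
    """
    boosts = boosts or {}
    should = [
        {"terms": {field: list(item_ids), "boost": boosts.get(field, 1.0)}}
        for field, item_ids in history_by_field.items()
        if item_ids
    ]
    return {"size": size, "query": {"bool": {"should": should}}}

# Hypothetical user history, one entry per indicator field
user_history = {
    "purchase": ["t1", "t3"],
    "view": ["t1", "t2", "t5"],
    "category": ["shoes"],
    "location": [],  # nothing known yet, so no clause is emitted
}

query = build_ur_query(user_history, boosts={"purchase": 2.0})
print(json.dumps(query, indent=2))
```

Because every indicator is just another field, adding a new indicator means adding one more entry to the history dictionary rather than changing the query logic.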

THE UNIVERSAL RECOMMENDER: BETTER USER COVERAGE

Any number of user actions: entire user clickstream
Metadata: from user profile or items
Context: on-site, time, location
Content: unstructured text or semi-structured categorical
Mixes any number of "indicators" to increase quality or tune to specific context
Solution to the "cold-start" problem: items with too short a lifespan or new users with no history
How is it solved?
Can recommend to new users using realtime history
Can use new interaction data from any user in realtime
95% implemented in Universal Recommender
v0.3.0—most current release

POLISH THE APPLE

Dithering for auto-optimize via explore-exploit:
Randomize some returned recs; if they are acted upon, they become part of the new training data and are more likely to be recommended in the future
Visibility control:
- Don't show dups, blacklist items already shown
- Filter items the user has already seen
Zero-downtime deployment: deploy the prediction server once, then hot-swap the new index when ready
Generate some intrinsic indicators like hot, popular: helps solve the "cold-start" problem
Asymmetric train vs query—query with most recent user data, train on all historical data
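
The dithering idea mentioned above can be sketched as a noisy re-ranking: each item's position is perturbed by adding Gaussian noise to the logarithm of its rank, so top results mostly stay near the top while lower-ranked items occasionally surface. This particular formulation and the `noise_std` parameter are assumptions for illustration, not the UR's documented behaviour.

```python
import math
import random

def dither(ranked_items, noise_std=0.5, seed=None):
    """Re-rank items by log(rank) plus Gaussian noise.

    noise_std = 0 keeps the original order; larger values shuffle more.
    The log scale means low ranks are disturbed less than the long tail.
    """
    rng = random.Random(seed)
    noisy = [
        (math.log(rank) + rng.gauss(0, noise_std), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]

recs = ["t3", "t5", "t2", "t4", "t1"]
print(dither(recs, noise_std=0.0))          # unchanged order
print(dither(recs, noise_std=0.7, seed=42)) # same items, possibly reordered
```

A fixed seed per user and time window is one way to keep the order stable across page reloads while still varying it over time.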

UR recommendation architecture based on PredictionIO

[Figure: UR recommendation architecture on PredictionIO]

References

http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
http://actionml.com/docs/ur
http://hejunhao.me/archives/1083
