Michael Zhang 2021-09-06 00:34:15 -05:00
parent e228e16f22
commit ac61899f6e
Signed by: michael
GPG key ID: BDA47A31A3C8EE6B
27 changed files with 363140 additions and 0 deletions

4
README.md Normal file

@ -0,0 +1,4 @@
recommender systems
===
homework assignments from the coursera course on recommender systems

1
nonpers-assignment/.gitignore vendored Normal file

@ -0,0 +1 @@
.gradle


@ -0,0 +1,216 @@
# Non-Personalized Recommender Assignment
In this assignment, you will implement some non-personalized recommenders. In particular, you will
implement raw and damped item mean recommenders and simple and advanced association rule
recommenders.
You will implement these recommenders in the LensKit toolkit.
## Downloads and Resources
- Project template (from Coursera)
- [LensKit for Teaching website](http://mooc.lenskit.org) (links to relevant documentation)
- [JavaDoc for included code](http://mooc.lenskit.org/assignments/nonpers/javadoc/)
- [Fastutil API docs](http://fastutil.di.unimi.it/docs/) documents the Fastutil optimized data
structure classes that are used in portions of LensKit.
The project template contains support code, the build file, and the input data that you will use.
## Input Data
The input data contains the following files:
- `ratings.csv` contains user ratings of movies
- `movies.csv` contains movie titles
- `movielens.yml` is a LensKit data manifest that describes the other input files
## Getting Started
To get started with this assignment, unpack the template and import it into your IDE as a Gradle
project. The assignment video demonstrates how to do this in IntelliJ IDEA.
## Mean-Based Recommendation
The first two recommenders you will implement will recommend items with the highest average rating.
With LensKit's scorer-model-builder architecture, you will just need to write the recommendation
logic once, and you will implement two different mechanisms for computing item mean ratings.
You will work with the following classes:
- `MeanItemBasedItemRecommender` (the *item recommender*) computes top-*N* recommendations based
on mean ratings. You will implement the logic to compute such recommendation lists.
- `ItemMeanModel` is a *model class* that stores precomputed item means. You will not need to
modify this class, but you will write code to construct instances of it and use it in your
item recommender implementation.
- `ItemMeanModelProvider` computes item mean ratings from rating data and constructs the model.
It computes raw means with no damping.
- `DampedItemMeanModelProvider` is an alternate builder for item mean models that computes
damped means instead of raw means. It takes the damping term as a parameter. The configuration
file we provide you uses a damping term of 5.
There are `// TODO` comments in all places where you need to write new code.
### Computing Item Means
Modify the `ItemMeanModelProvider` class to compute the mean rating for each item.
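As a rough illustration of the computation (not the LensKit-based solution itself), the raw mean for each item is the sum of its ratings divided by the number of ratings. The sketch below uses plain Java collections and assumes Java 8 or later; the class name `RawItemMeanSketch`, the method `itemMeans`, and the parallel-array input format are hypothetical and are not part of the project template.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the assignment template.
final class RawItemMeanSketch {
    /**
     * Compute the mean rating of each item.
     * itemIds[k] and values[k] together describe the k-th rating.
     */
    static Map<Long, Double> itemMeans(long[] itemIds, double[] values) {
        Map<Long, Double> sums = new HashMap<>();
        Map<Long, Integer> counts = new HashMap<>();
        // Accumulate the rating sum and rating count per item.
        for (int k = 0; k < itemIds.length; k++) {
            sums.merge(itemIds[k], values[k], Double::sum);
            counts.merge(itemIds[k], 1, Integer::sum);
        }
        // Divide each sum by its count to get the mean.
        Map<Long, Double> means = new HashMap<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            means.put(e.getKey(), e.getValue() / counts.get(e.getKey()));
        }
        return means;
    }

    public static void main(String[] args) {
        // Made-up ratings: item 1 rated 4.0 and 5.0, item 2 rated 3.0.
        long[] items = {1L, 1L, 2L};
        double[] values = {4.0, 5.0, 3.0};
        System.out.println(itemMeans(items, values));
    }
}
```

In the actual provider, the ratings would come from the DAO rather than from arrays.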
### Recommending Items
Modify the `MeanItemBasedItemRecommender` class to compute recommendations based on item mean
ratings. For this, you need to:
1. Obtain the mean rating for each item
2. Order the items by decreasing mean rating
3. Return the *N* highest-rated items
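The three steps above can be sketched with plain Java collections. This is only an illustration of the sort-and-truncate logic, assuming Java 8 or later; `TopNSketch` and `topN` are hypothetical names, and the real implementation would read means from `ItemMeanModel` and return LensKit result objects. Item 99 and its score are made up for the example; the other two scores are the raw means from the example table in this README.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the assignment template.
final class TopNSketch {
    /** Return the IDs of the n highest-scoring items, best first. */
    static List<Long> topN(Map<Long, Double> scores, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(scores.entrySet());
        // Sort by score in decreasing order.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        // Keep at most n entries (fewer if there are not enough items).
        int limit = Math.max(0, Math.min(n, entries.size()));
        List<Long> top = new ArrayList<>();
        for (Map.Entry<Long, Double> e : entries.subList(0, limit)) {
            top.add(e.getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<Long, Double> means = new HashMap<>();
        means.put(2959L, 4.259); // Fight Club
        means.put(1203L, 4.246); // 12 Angry Men
        means.put(99L, 3.5);     // made-up item for illustration
        System.out.println(topN(means, 2)); // prints [2959, 1203]
    }
}
```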
### Computing Damped Item Means
Modify the `DampedItemMeanModelProvider` class to compute the damped mean rating for each item.
This formula uses a damping factor $\alpha$, which is the number of 'fake' ratings at the global
mean to assume for each item. In the Java code, this is available as the field `damping`.
The damped mean formula, as you may recall, is:
$$s(i) = \frac{\sum_{u \in U_i} r_{ui} + \alpha\mu}{|U_i| + \alpha}$$
where $\mu$ is the *global* mean rating.
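To make the formula concrete, here is a minimal sketch of the arithmetic for a single item. The class and method names are hypothetical; in the assignment the damping term comes from the `damping` field, and the rating sum, rating count, and global mean must be computed from the rating data.

```java
// Hypothetical helper, not part of the assignment template.
final class DampedMeanSketch {
    /**
     * Damped mean: (sum of the item's ratings + alpha * mu) / (number of ratings + alpha),
     * where mu is the global mean rating and alpha is the damping term.
     */
    static double dampedMean(double ratingSum, int ratingCount, double globalMean, double alpha) {
        return (ratingSum + alpha * globalMean) / (ratingCount + alpha);
    }

    public static void main(String[] args) {
        // Made-up numbers: an item with two ratings of 5.0, a global mean of 3.5,
        // and a damping term of 5. The result is (10 + 17.5) / 7.
        System.out.println(dampedMean(10.0, 2, 3.5, 5.0));
    }
}
```

With only two ratings, the damped mean is pulled strongly toward the global mean; as the number of ratings grows, the item's own ratings dominate. Setting the damping term to 0 recovers the raw mean.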
### Example Outputs
To help you see if your output is correct, we have provided the following example correct values:
| ID | Title | Mean | Damped Mean |
| :-: | :---- | :--: | :---------: |
| 2959 | *Fight Club* | 4.259 | 4.252 |
| 1203 | *12 Angry Men* | 4.246 | 4.227 |
## Association Rules
In the second part of the assignment, you will implement two versions of an association rule
recommender.
The association rule implementation consists of the following code:
- `AssociationItemBasedItemRecommender` recommends items using association rules. Unlike the mean
recommenders, this recommender uses a *reference item* to compute the recommendations.
- `AssociationModel` stores the association rule scores between pairs of items. You will not need
to modify this class.
- `BasicAssociationModelProvider` computes an association rule model using the basic association
rule formula ($P(X \wedge Y) / P(X)$).
- `LiftAssociationModelProvider` computes an association rule model using the lift formula ($P(X \wedge Y) / (P(X) P(Y))$).
### Computing Association Scores
As with the mean-based recommender, we pre-compute product association scores and store them in
a model before recommendation. We compute the scores between *all pairs* of items, so that the
model can be used to score any item. When computing a single recommendation from the command line,
this does not provide much benefit, but is useful in the general case so that the model can be used
to very quickly compute many recommendations.
The `BasicAssociationModelProvider` class computes the association rule scores using the following
formula:
$$P(i|j) = \frac{P(i \wedge j)}{P(j)} = \frac{|U_i \cap U_j|/|U|}{|U_j|/|U|}$$
In this case, $j$ is the *reference* item and $i$ is the item to be scored.
We estimate probabilities by counting: $P(i)$ is the fraction of users in the system
who rated item $i$; $P(i \wedge j)$ is the fraction that rated both $i$ and $j$.
**Implement the association rule computation in this class.**
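Because the $|U|$ terms cancel, the score reduces to $|U_i \cap U_j| / |U_j|$. The following sketch shows that counting step with `java.util` sets rather than the Fastutil sets used in the template; `BasicAssociationSketch` and `conditional` are hypothetical names, and returning 0 for an empty reference set is an assumption made only to keep the example safe.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the assignment template.
final class BasicAssociationSketch {
    /** Estimate P(i | j) = |U_i intersect U_j| / |U_j| from the two user sets. */
    static double conditional(Set<Long> usersI, Set<Long> usersJ) {
        if (usersJ.isEmpty()) {
            return 0.0; // assumption: no evidence about j means a score of 0
        }
        // Copy U_i so the caller's set is not modified, then intersect with U_j.
        Set<Long> both = new HashSet<>(usersI);
        both.retainAll(usersJ);
        return (double) both.size() / usersJ.size();
    }

    public static void main(String[] args) {
        // Made-up user IDs: two of the four users who rated j also rated i.
        Set<Long> usersI = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> usersJ = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L));
        System.out.println(conditional(usersI, usersJ)); // prints 0.5
    }
}
```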
### Computing Recommendations
Implement the recommendation logic in `AssociationItemBasedItemRecommender` to recommend items
related to a given reference item. As with the mean recommender, it should compute the top *N*
recommendations and return them.
### Computing Advanced Association Rules
The `LiftAssociationModelProvider` class uses the *lift* metric, which measures how
much more likely someone is to rate a movie $i$ when they have rated $j$ than when we know nothing about whether they have rated $j$:
$$s(i|j) = \frac{P(j \wedge i)}{P(i) P(j)}$$
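Unlike the basic rule, the lift score does not cancel the total user count, so $|U|$ is needed. A minimal sketch of the arithmetic, again with `java.util` sets and hypothetical names (`LiftSketch`, `lift`), and assuming both user sets are non-empty:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the assignment template.
final class LiftSketch {
    /**
     * Estimate lift = P(i and j) / (P(i) * P(j)), with each probability
     * estimated as a fraction of the totalUsers users in the system.
     * Assumes usersI and usersJ are non-empty and totalUsers > 0.
     */
    static double lift(Set<Long> usersI, Set<Long> usersJ, int totalUsers) {
        Set<Long> both = new HashSet<>(usersI);
        both.retainAll(usersJ);
        double pBoth = (double) both.size() / totalUsers;
        double pI = (double) usersI.size() / totalUsers;
        double pJ = (double) usersJ.size() / totalUsers;
        return pBoth / (pI * pJ);
    }

    public static void main(String[] args) {
        // Made-up data: 10 users in total, 3 rated i, 4 rated j, 2 rated both.
        Set<Long> usersI = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> usersJ = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L));
        // (2/10) / ((3/10) * (4/10)) = 20/12, roughly 1.67
        System.out.println(lift(usersI, usersJ, 10));
    }
}
```

A lift above 1 means the two items are rated together more often than independence would predict; a lift below 1 means less often.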
### Example Outputs
The following is the correct output for the basic association rules with reference item 260 (*Star Wars*), as generated with `./gradlew runBasicAssoc -PreferenceItem=260`:
2571 (Matrix, The (1999)): 0.916
1196 (Star Wars: Episode V - The Empire Strikes Back (1980)): 0.899
4993 (Lord of the Rings: The Fellowship of the Ring, The (2001)): 0.892
1210 (Star Wars: Episode VI - Return of the Jedi (1983)): 0.847
356 (Forrest Gump (1994)): 0.843
5952 (Lord of the Rings: The Two Towers, The (2002)): 0.841
7153 (Lord of the Rings: The Return of the King, The (2003)): 0.830
296 (Pulp Fiction (1994)): 0.828
1198 (Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)): 0.791
480 (Jurassic Park (1993)): 0.789
And lift-based association rules for item 2761 (*The Iron Giant*):
631 (All Dogs Go to Heaven 2 (1996)): 4.898
2532 (Conquest of the Planet of the Apes (1972)): 4.810
3615 (Dinosaur (2000)): 4.546
1649 (Fast, Cheap & Out of Control (1997)): 4.490
340 (War, The (1994)): 4.490
1016 (Shaggy Dog, The (1959)): 4.490
2439 (Affliction (1997)): 4.490
332 (Village of the Damned (1995)): 4.377
2736 (Brighton Beach Memoirs (1986)): 4.329
3213 (Batman: Mask of the Phantasm (1993)): 4.317
## Running your code
The Gradle build file we have provided is set up to automatically run all four of your recommenders.
The following Gradle targets will do this:
- `runMean` runs the raw mean recommender
- `runDampedMean` runs the damped mean recommender
- `runBasicAssoc` runs the basic association rule recommender
- `runLiftAssoc` runs the advanced (lift-based) association rule recommender
You can run these using the IntelliJ Gradle runner (open the Gradle panel, browse the tree to find
a task, and double-click it), or from the command line:
./gradlew runMean
The association rule recommenders can also take the reference item ID on the command line as a
`referenceItem` parameter. For example:
./gradlew runLiftAssoc -PreferenceItem=1
The IntelliJ Run Configuration dialog will allow you to specify additional script parameters to
the Gradle invocation.
### Debugging
If you run the Gradle tasks using IntelliJ's Gradle runner, you can run them under the debugger to debug your code.
The Gradle file also configures LensKit to write log output to log files under the `build`
directory. If you use the SLF4J logger (the `logger` field on the classes we provide) to emit debug
messages, you can find them there when you run one of the recommender tasks such as `runDampedMean`.
## Submitting
You will submit a compiled `jar` file containing your solution. To prepare your project for
submission, run the Gradle `prepareSubmission` task:
./gradlew prepareSubmission
This will create the file `nonpers-submission.jar` under `build/distributions` that contains your final
solution code in a format the grader will understand. Upload this `jar` file to the Coursera
assignment grader.
## Grading
Your grade for each part will be based on two components:
- Outputting items in the correct order: 75%
- Computing correct scores for items (within an error tolerance): 25%
The parts themselves are weighted equally.


@ -0,0 +1,87 @@
apply plugin: 'java'
ext.lenskitVersion = '3.0-M1'
if (!hasProperty('dataDir')) {
ext.dataDir = 'data'
}
sourceCompatibility = 1.7
apply from: "$rootDir/gradle/repositories.gradle"
dependencies {
compile "org.lenskit:lenskit-core:$lenskitVersion"
runtime "org.lenskit:lenskit-cli:$lenskitVersion"
}
dependencies {
testCompile group: 'junit', name: 'junit', version: '4.11'
}
task runMean(type: JavaExec, group: 'run') {
description "Run the simple mean recommender."
classpath sourceSets.main.runtimeClasspath
main 'org.lenskit.cli.Main'
args '--log-file', file("$buildDir/recommend-mean.log"), '--log-file-level', 'DEBUG'
args 'global-recommend'
args '--data-source', "$dataDir/movielens.yml"
args '-c', file('etc/mean.groovy')
args '-n', 10
if (project.hasProperty('lenskit.maxMemory')) {
maxHeapSize project.getProperty('lenskit.maxMemory')
}
}
task runDampedMean(type: JavaExec, group: 'run') {
description "Run the damped mean recommender."
mustRunAfter runMean
classpath sourceSets.main.runtimeClasspath
main 'org.lenskit.cli.Main'
args '--log-file', file("$buildDir/recommend-damped-mean.log"), '--log-file-level', 'DEBUG'
args 'global-recommend'
args '--data-source', "$dataDir/movielens.yml"
args '-c', file('etc/damped-mean.groovy')
if (project.hasProperty('lenskit.maxMemory')) {
maxHeapSize project.getProperty('lenskit.maxMemory')
}
}
task runBasicAssoc(type: JavaExec, group: 'run') {
description "Run the basic association rule recommender."
mustRunAfter runDampedMean
classpath sourceSets.main.runtimeClasspath
main 'org.lenskit.cli.Main'
args '--log-file', file("$buildDir/recommend-basic-assoc.log"), '--log-file-level', 'DEBUG'
args 'global-recommend'
args '--data-source', "$dataDir/movielens.yml"
args '-c', file('etc/simple-assoc.groovy')
args findProperty('referenceItem') ?: 260
if (project.hasProperty('lenskit.maxMemory')) {
maxHeapSize project.getProperty('lenskit.maxMemory')
}
}
task runLiftAssoc(type: JavaExec, group: 'run') {
description "Run the lift-based association rule recommender."
classpath sourceSets.main.runtimeClasspath
mustRunAfter runBasicAssoc
main 'org.lenskit.cli.Main'
args '--log-file', file("$buildDir/recommend-lift-assoc.log"), '--log-file-level', 'DEBUG'
args 'global-recommend'
args '--data-source', "$dataDir/movielens.yml"
args '-c', file('etc/lift-assoc.groovy')
args findProperty('referenceItem') ?: 2761
if (project.hasProperty('lenskit.maxMemory')) {
maxHeapSize project.getProperty('lenskit.maxMemory')
}
}
task runAll(group: 'run') {
dependsOn runMean, runDampedMean
dependsOn runBasicAssoc, runLiftAssoc
}
task prepareSubmission(type: Copy) {
from jar
into distsDir
rename(/-assignment/, '-submission')
}


@ -0,0 +1,28 @@
ratings:
  type: textfile
  file: ratings.csv
  format: csv
  entity_type: rating
  header: true
movies:
  type: textfile
  file: movies.csv
  format: csv
  entity_type: item
  header: true
  columns: [id, name]
tags:
  type: textfile
  file: tags.csv
  format: csv
  entity_type: item-tag
  header: true
  columns:
    - name: item
      type: long
    - name: user
      type: long
    - name: tag
      type: string
    - name: timestamp
      type: long

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff


@ -0,0 +1,14 @@
import org.lenskit.api.ItemBasedItemRecommender
import org.lenskit.baseline.MeanDamping
import org.lenskit.mooc.nonpers.mean.DampedItemMeanModelProvider
import org.lenskit.mooc.nonpers.mean.ItemMeanModel
import org.lenskit.mooc.nonpers.mean.MeanItemBasedItemRecommender
// set up the recommender
bind ItemBasedItemRecommender to MeanItemBasedItemRecommender
// this time, we will use the damped mean model
bind ItemMeanModel toProvider DampedItemMeanModelProvider
// use a mean damping of 5
set MeanDamping to 5


@ -0,0 +1,7 @@
import org.lenskit.api.ItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.LiftAssociationModelProvider
import org.lenskit.mooc.nonpers.assoc.AssociationItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.AssociationModel
bind ItemBasedItemRecommender to AssociationItemBasedItemRecommender
bind AssociationModel toProvider LiftAssociationModelProvider


@ -0,0 +1,4 @@
import org.lenskit.mooc.nonpers.mean.MeanItemBasedItemRecommender
import org.lenskit.api.ItemBasedItemRecommender
bind ItemBasedItemRecommender to MeanItemBasedItemRecommender


@ -0,0 +1,7 @@
import org.lenskit.api.ItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.AssociationItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.AssociationModel
import org.lenskit.mooc.nonpers.assoc.BasicAssociationModelProvider
bind ItemBasedItemRecommender to AssociationItemBasedItemRecommender
bind AssociationModel toProvider BasicAssociationModelProvider


@ -0,0 +1,6 @@
repositories {
mavenCentral()
maven {
url 'https://oss.sonatype.org/content/repositories/snapshots/'
}
}

Binary file not shown.


@ -0,0 +1,6 @@
#Fri Mar 25 17:48:43 CDT 2016
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-2.14-bin.zip

160
nonpers-assignment/gradlew vendored Executable file

@ -0,0 +1,160 @@
#!/usr/bin/env bash
##############################################################################
##
## Gradle start up script for UN*X
##
##############################################################################
# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS=""
APP_NAME="Gradle"
APP_BASE_NAME=`basename "$0"`
# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"
warn ( ) {
echo "$*"
}
die ( ) {
echo
echo "$*"
echo
exit 1
}
# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
case "`uname`" in
CYGWIN* )
cygwin=true
;;
Darwin* )
darwin=true
;;
MINGW* )
msys=true
;;
esac
# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
ls=`ls -ld "$PRG"`
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
PRG=`dirname "$PRG"`"/$link"
fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/" >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null
CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar
# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD="$JAVA_HOME/jre/sh/java"
else
JAVACMD="$JAVA_HOME/bin/java"
fi
if [ ! -x "$JAVACMD" ] ; then
die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
else
JAVACMD="java"
which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" ] ; then
MAX_FD_LIMIT=`ulimit -H -n`
if [ $? -eq 0 ] ; then
if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
MAX_FD="$MAX_FD_LIMIT"
fi
ulimit -n $MAX_FD
if [ $? -ne 0 ] ; then
warn "Could not set maximum file descriptor limit: $MAX_FD"
fi
else
warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
fi
fi
# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi
# For Cygwin, switch paths to Windows format before running java
if $cygwin ; then
APP_HOME=`cygpath --path --mixed "$APP_HOME"`
CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
JAVACMD=`cygpath --unix "$JAVACMD"`
# We build the pattern for arguments to be converted via cygpath
ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
SEP=""
for dir in $ROOTDIRSRAW ; do
ROOTDIRS="$ROOTDIRS$SEP$dir"
SEP="|"
done
OURCYGPATTERN="(^($ROOTDIRS))"
# Add a user-defined pattern to the cygpath arguments
if [ "$GRADLE_CYGPATTERN" != "" ] ; then
OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
fi
# Now convert the arguments - kludge to limit ourselves to /bin/sh
i=0
for arg in "$@" ; do
CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option
if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
else
eval `echo args$i`="\"$arg\""
fi
i=$((i+1))
done
case $i in
(0) set -- ;;
(1) set -- "$args0" ;;
(2) set -- "$args0" "$args1" ;;
(3) set -- "$args0" "$args1" "$args2" ;;
(4) set -- "$args0" "$args1" "$args2" "$args3" ;;
(5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
(6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
(7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
(8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
(9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
esac
fi
# Split up the JVM_OPTS And GRADLE_OPTS values into an array, following the shell quoting and substitution rules
function splitJvmOpts() {
JVM_OPTS=("$@")
}
eval splitJvmOpts $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS
JVM_OPTS[${#JVM_OPTS[*]}]="-Dorg.gradle.appname=$APP_BASE_NAME"
exec "$JAVACMD" "${JVM_OPTS[@]}" -classpath "$CLASSPATH" org.gradle.wrapper.GradleWrapperMain "$@"

90
nonpers-assignment/gradlew.bat vendored Normal file

@ -0,0 +1,90 @@
@if "%DEBUG%" == "" @echo off
@rem ##########################################################################
@rem
@rem Gradle startup script for Windows
@rem
@rem ##########################################################################
@rem Set local scope for the variables with windows NT shell
if "%OS%"=="Windows_NT" setlocal
@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
set DEFAULT_JVM_OPTS=
set DIRNAME=%~dp0
if "%DIRNAME%" == "" set DIRNAME=.
set APP_BASE_NAME=%~n0
set APP_HOME=%DIRNAME%
@rem Find java.exe
if defined JAVA_HOME goto findJavaFromJavaHome
set JAVA_EXE=java.exe
%JAVA_EXE% -version >NUL 2>&1
if "%ERRORLEVEL%" == "0" goto init
echo.
echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.
goto fail
:findJavaFromJavaHome
set JAVA_HOME=%JAVA_HOME:"=%
set JAVA_EXE=%JAVA_HOME%/bin/java.exe
if exist "%JAVA_EXE%" goto init
echo.
echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.
goto fail
:init
@rem Get command-line arguments, handling Windowz variants
if not "%OS%" == "Windows_NT" goto win9xME_args
if "%@eval[2+2]" == "4" goto 4NT_args
:win9xME_args
@rem Slurp the command line arguments.
set CMD_LINE_ARGS=
set _SKIP=2
:win9xME_args_slurp
if "x%~1" == "x" goto execute
set CMD_LINE_ARGS=%*
goto execute
:4NT_args
@rem Get arguments from the 4NT Shell from JP Software
set CMD_LINE_ARGS=%$
:execute
@rem Setup the command line
set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
@rem Execute Gradle
"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS%
:end
@rem End local scope for the variables with windows NT shell
if "%ERRORLEVEL%"=="0" goto mainEnd
:fail
rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
rem the _cmd.exe /c_ return code!
if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
exit /b 1
:mainEnd
if "%OS%"=="Windows_NT" endlocal
:omega

Binary file not shown.


@ -0,0 +1,2 @@
rootProject.name = "nonpers-assignment"


@ -0,0 +1,73 @@
package org.lenskit.mooc.nonpers.assoc;
import it.unimi.dsi.fastutil.longs.LongSet;
import org.lenskit.api.Result;
import org.lenskit.api.ResultList;
import org.lenskit.basic.AbstractItemBasedItemRecommender;
import org.lenskit.results.Results;
import org.lenskit.util.collections.LongUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.annotation.Nullable;
import javax.inject.Inject;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
/**
* An item-based item recommender that uses association rules.
*/
public class AssociationItemBasedItemRecommender extends AbstractItemBasedItemRecommender {
private static final Logger logger = LoggerFactory.getLogger(AssociationItemBasedItemRecommender.class);
private final AssociationModel model;
/**
* Construct the item recommender.
*
* @param m The association rule model.
*/
@Inject
public AssociationItemBasedItemRecommender(AssociationModel m) {
model = m;
}
@Override
public ResultList recommendRelatedItemsWithDetails(Set<Long> basket, int n, @Nullable Set<Long> candidates, @Nullable Set<Long> exclude) {
LongSet items;
if (candidates == null) {
items = model.getKnownItems();
} else {
items = LongUtils.asLongSet(candidates);
}
if (exclude != null) {
items = LongUtils.setDifference(items, LongUtils.asLongSet(exclude));
}
if (basket.isEmpty()) {
return Results.newResultList();
} else if (basket.size() > 1) {
logger.warn("Reference set has more than 1 item, picking arbitrarily.");
}
long refItem = basket.iterator().next();
return recommendItems(n, refItem, items);
}
/**
* Recommend items with an association rule.
* @param n The number of recommendations to produce.
* @param refItem The reference item.
* @param candidates The candidate items (set of items that can possibly be recommended).
* @return The list of results.
*/
private ResultList recommendItems(int n, long refItem, LongSet candidates) {
List<Result> results = new ArrayList<>();
// TODO Compute the n highest-scoring items from candidates
return Results.newResultList(results);
}
}


@ -0,0 +1,90 @@
package org.lenskit.mooc.nonpers.assoc;
import com.google.common.base.Preconditions;
import it.unimi.dsi.fastutil.longs.LongSet;
import org.lenskit.inject.Shareable;
import org.lenskit.util.keys.SortedKeyIndex;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.Serializable;
import java.util.Map;
/**
* An association rule model, storing item-item association scores.
*
* <p>You <strong>should not</strong> need to change this class. It has some internal optimizations to reduce
* the memory requirements after the model is built.</p>
*/
@Shareable
public class AssociationModel implements Serializable {
private static final Logger logger = LoggerFactory.getLogger(AssociationModel.class);
private static final long serialVersionUID = 1L;
private final SortedKeyIndex index;
private final double[][] scores;
/**
* Construct a new association model.
* @param assocScores The association scores. The outer map's keys are the X items, and the inner map's keys are
* the Y items. So {@code assocScores.get(x).get(y)} should return the score for {@code y}
* with respect to {@code x}.
*/
public AssociationModel(Map<Long, ? extends Map<Long,Double>> assocScores) {
index = SortedKeyIndex.fromCollection(assocScores.keySet());
int n = index.size();
logger.debug("transforming input map for {} items into log data", n);
scores = new double[n][n];
for (int i = 0; i < n; i++) {
long itemX = index.getKey(i);
for (int j = 0; j < n; j++) {
if (i == j) {
continue; // skip self-similarities
}
long itemY = index.getKey(j);
Double score = assocScores.get(itemX).get(itemY);
if (score == null) {
logger.error("no score found for items {} and {}", itemX, itemY);
String msg = String.format("no score found for x=%d, y=%d", itemX, itemY);
throw new IllegalArgumentException(msg);
}
scores[i][j] = score;
}
}
}
/**
* Get the set of known items.
* @return The set of known item IDs.
*/
public LongSet getKnownItems() {
return index.keySet();
}
/**
* Query whether the model knows about an item.
* @param item The item ID.
* @return {@code true} if the model knows about the item {@code item}, {@code false} otherwise.
*/
public boolean hasItem(long item) {
return index.containsKey(item);
}
/**
* Get the association between two items.
* @param ref The reference item (X).
* @param item The item to score (Y).
* @return The score between X and Y.
* @throws IllegalArgumentException if either item is invalid.
*/
public double getItemAssociation(long ref, long item) {
// look up item positions
int refIndex = index.tryGetIndex(ref);
// Guava's Preconditions templates only substitute %s placeholders, not %d
Preconditions.checkArgument(refIndex >= 0, "unknown reference item %s", ref);
int itemIndex = index.tryGetIndex(item);
Preconditions.checkArgument(itemIndex >= 0, "unknown target item %s", item);
return scores[refIndex][itemIndex];
}
}


@ -0,0 +1,82 @@
package org.lenskit.mooc.nonpers.assoc;
import it.unimi.dsi.fastutil.longs.*;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.entities.CommonAttributes;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.IdBox;
import org.lenskit.util.collections.LongUtils;
import org.lenskit.util.io.ObjectStream;
import javax.inject.Inject;
import javax.inject.Provider;
import java.util.List;
/**
* Build a model for basic association rules. This class computes the association for all pairs of items.
*/
public class BasicAssociationModelProvider implements Provider<AssociationModel> {
private final DataAccessObject dao;
@Inject
public BasicAssociationModelProvider(@Transient DataAccessObject dao) {
this.dao = dao;
}
@Override
public AssociationModel get() {
// First step: map each item to the set of users who have rated it.
// This map will map each item ID to the set of users who have rated it.
Long2ObjectMap<LongSortedSet> itemUsers = new Long2ObjectOpenHashMap<>();
LongSet allUsers = new LongOpenHashSet();
// Open a stream, grouping ratings by item ID
try (ObjectStream<IdBox<List<Rating>>> ratingStream = dao.query(Rating.class)
.groupBy(CommonAttributes.ITEM_ID)
.stream()) {
// Process each item's ratings
for (IdBox<List<Rating>> item: ratingStream) {
// Build a set of users. We build an array first, then convert to a set.
LongList users = new LongArrayList();
// Add each rating's user ID to the user sets
for (Rating r: item.getValue()) {
long user = r.getUserId();
users.add(user);
allUsers.add(user);
}
// put this item's user set into the item user map
// a frozen set will be very efficient later
itemUsers.put(item.getId(), LongUtils.frozenSet(users));
}
}
// Second step: compute all association rules
// We need a map to store them
Long2ObjectMap<Long2DoubleMap> assocMatrix = new Long2ObjectOpenHashMap<>();
// then loop over 'x' items
for (Long2ObjectMap.Entry<LongSortedSet> xEntry: itemUsers.long2ObjectEntrySet()) {
long xId = xEntry.getLongKey();
LongSortedSet xUsers = xEntry.getValue();
// set up a map to hold the scores for each 'y' item for this 'x'
Long2DoubleMap itemScores = new Long2DoubleOpenHashMap();
// loop over the 'y' items
for (Long2ObjectMap.Entry<LongSortedSet> yEntry: itemUsers.long2ObjectEntrySet()) {
long yId = yEntry.getLongKey();
LongSortedSet yUsers = yEntry.getValue();
// TODO Compute P(Y & X) / P(X) and store in itemScores
}
// save the score map to the main map
assocMatrix.put(xId, itemScores);
}
return new AssociationModel(assocMatrix);
}
}


@ -0,0 +1,83 @@
package org.lenskit.mooc.nonpers.assoc;
import it.unimi.dsi.fastutil.longs.*;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.entities.CommonAttributes;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.IdBox;
import org.lenskit.util.collections.LongUtils;
import org.lenskit.util.io.ObjectStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.inject.Inject;
import javax.inject.Provider;
import java.util.List;
/**
* Build an association rule model using a lift metric.
*/
public class LiftAssociationModelProvider implements Provider<AssociationModel> {
private static final Logger logger = LoggerFactory.getLogger(LiftAssociationModelProvider.class);
private final DataAccessObject dao;
@Inject
public LiftAssociationModelProvider(@Transient DataAccessObject dao) {
this.dao = dao;
}
@Override
public AssociationModel get() {
// First step: map each item to the set of users who have rated it.
// While we're at it, compute the set of all users.
// This set contains all users.
LongSet allUsers = new LongOpenHashSet();
// This map will map each item ID to the set of users who have rated it.
Long2ObjectMap<LongSortedSet> itemUsers = new Long2ObjectOpenHashMap<>();
// Open a stream, grouping ratings by item ID
try (ObjectStream<IdBox<List<Rating>>> ratingStream = dao.query(Rating.class)
.groupBy(CommonAttributes.ITEM_ID)
.stream()) {
// Process each item's ratings
for (IdBox<List<Rating>> item: ratingStream) {
// Build a set of users. We build an array first, then convert to a set.
LongList users = new LongArrayList();
// Add each rating's user ID to the user sets
for (Rating r: item.getValue()) {
long user = r.getUserId();
users.add(user);
allUsers.add(user);
}
// put this item's user set into the item user map
// a frozen set will be very efficient later
itemUsers.put(item.getId(), LongUtils.frozenSet(users));
}
}
// Second step: compute all association rules
// We need a map to store them
Long2ObjectMap<Long2DoubleMap> assocMatrix = new Long2ObjectOpenHashMap<>();
// then loop over 'x' items
for (Long2ObjectMap.Entry<LongSortedSet> xEntry: itemUsers.long2ObjectEntrySet()) {
long xId = xEntry.getLongKey();
LongSortedSet xUsers = xEntry.getValue();
// set up a map to hold the scores for each 'y' item
Long2DoubleMap itemScores = new Long2DoubleOpenHashMap();
// TODO Compute lift association formulas for all other 'Y' items with respect to this 'X'
// save the score map to the main map
assocMatrix.put(xId, itemScores);
}
return new AssociationModel(assocMatrix);
}
}


@ -0,0 +1,68 @@
package org.lenskit.mooc.nonpers.mean;

import it.unimi.dsi.fastutil.longs.Long2DoubleOpenHashMap;
import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap;
import org.lenskit.baseline.MeanDamping;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.io.ObjectStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.inject.Inject;
import javax.inject.Provider;

/**
 * Provider class that builds the item mean model, computing damped item means from the
 * ratings in the DAO.
 */
public class DampedItemMeanModelProvider implements Provider<ItemMeanModel> {
    /**
     * A logger that you can use to emit debug messages.
     */
    private static final Logger logger = LoggerFactory.getLogger(DampedItemMeanModelProvider.class);

    /**
     * The data access object, to be used when computing the mean ratings.
     */
    private final DataAccessObject dao;
    /**
     * The damping factor.
     */
    private final double damping;

    /**
     * Constructor for the damped item mean model provider.
     *
     * <p>The {@code @Inject} annotation tells LensKit to use this constructor.</p>
     *
     * @param dao The data access object (DAO), where the builder will get ratings. The {@code @Transient}
     *            annotation on this parameter means that the DAO will be used to build the model, but the
     *            model will <strong>not</strong> retain a reference to the DAO. This is standard procedure
     *            for LensKit models.
     * @param damping The damping factor for Bayesian damping. This is the number of fake global-mean
     *                ratings to assume for each item. It is provided as a parameter so that it can be
     *                reconfigured. See the file {@code damped-mean.groovy} for how it is used.
     */
    @Inject
    public DampedItemMeanModelProvider(@Transient DataAccessObject dao,
                                       @MeanDamping double damping) {
        this.dao = dao;
        this.damping = damping;
    }

    /**
     * Construct an item mean model.
     *
     * <p>The {@link Provider#get()} method constructs whatever object the provider class is intended to build.</p>
     *
     * @return The item mean model with mean ratings for all items.
     */
    @Override
    public ItemMeanModel get() {
        // TODO Compute damped means

        // TODO Remove the line below when you have finished
        throw new UnsupportedOperationException("damped mean not implemented");
    }
}
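
The damped mean TODO above can be illustrated outside LensKit. The sketch below is an assumption-laden illustration rather than the expected solution: it takes the damping factor to mean, as the Javadoc puts it, a number of fake global-mean ratings per item, giving (sum + damping * globalMean) / (count + damping). The class `DampedMeanSketch` and the parallel-array input are hypothetical stand-ins for the DAO's rating stream.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of LensKit or the assignment template.
final class DampedMeanSketch {
    private DampedMeanSketch() {}

    /**
     * Computes damped item means. itemIds[i] and values[i] describe the i-th rating.
     * Assumed formula: (sum_i + damping * globalMean) / (count_i + damping).
     */
    static Map<Long, Double> dampedMeans(long[] itemIds, double[] values, double damping) {
        Map<Long, Double> sums = new HashMap<>();
        Map<Long, Integer> counts = new HashMap<>();
        double total = 0.0;
        // Single pass: accumulate per-item sums and counts plus the global total.
        for (int i = 0; i < itemIds.length; i++) {
            sums.merge(itemIds[i], values[i], Double::sum);
            counts.merge(itemIds[i], 1, Integer::sum);
            total += values[i];
        }
        double globalMean = itemIds.length == 0 ? 0.0 : total / itemIds.length;

        // Finalize: pull each item's mean toward the global mean.
        Map<Long, Double> means = new HashMap<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            int n = counts.get(e.getKey());
            means.put(e.getKey(), (e.getValue() + damping * globalMean) / (n + damping));
        }
        return means;
    }
}
```

With a damping factor of 0 this reduces to the plain item mean; larger values pull items with few ratings closer to the global mean.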


@ -0,0 +1,68 @@
package org.lenskit.mooc.nonpers.mean;

import com.google.common.base.Preconditions;
import it.unimi.dsi.fastutil.longs.Long2DoubleMap;
import it.unimi.dsi.fastutil.longs.LongSet;
import org.grouplens.grapht.annotation.DefaultProvider;
import org.lenskit.inject.Shareable;
import org.lenskit.util.collections.LongUtils;

import javax.annotation.concurrent.Immutable;
import java.io.Serializable;
import java.util.Map;

/**
 * A <em>model</em> class that stores item mean ratings.
 *
 * <p>The {@link Shareable} annotation is common for model objects, and tells LensKit that the class can be shared
 * between multiple recommender instances.</p>
 *
 * <p>The {@link DefaultProvider} annotation tells LensKit to use a <em>provider class</em>,
 * {@link ItemMeanModelProvider}, to create instances of this class.</p>
 *
 * <p>You <strong>should not</strong> need to make any changes to this class.</p>
 */
@Shareable
@Immutable
@DefaultProvider(ItemMeanModelProvider.class)
public class ItemMeanModel implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Long2DoubleMap itemMeans;

    /**
     * Construct a new item mean model.
     * @param means A map of item IDs to their mean ratings.
     */
    public ItemMeanModel(Map<Long, Double> means) {
        itemMeans = LongUtils.frozenMap(means);
    }

    /**
     * Get the set of items known by the model.
     * @return The set of items known by the model.
     */
    public LongSet getKnownItems() {
        return itemMeans.keySet();
    }

    /**
     * Query whether this model knows about an item.
     * @param item The item ID.
     * @return {@code true} if the item is known by the model, {@code false} otherwise.
     */
    public boolean hasItem(long item) {
        return itemMeans.containsKey(item);
    }

    /**
     * Get the mean rating for an item.
     * @param item The item ID.
     * @return The mean rating.
     * @throws IllegalArgumentException if the item is not a known item.
     */
    public double getMeanRating(long item) {
        Preconditions.checkArgument(hasItem(item), "unknown item " + item);
        return itemMeans.get(item);
    }
}


@ -0,0 +1,69 @@
package org.lenskit.mooc.nonpers.mean;

import it.unimi.dsi.fastutil.longs.Long2DoubleOpenHashMap;
import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.io.ObjectStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.inject.Inject;
import javax.inject.Provider;

/**
 * Provider class that builds the item mean model, computing item means from the
 * ratings in the DAO.
 */
public class ItemMeanModelProvider implements Provider<ItemMeanModel> {
    /**
     * A logger that you can use to emit debug messages.
     */
    private static final Logger logger = LoggerFactory.getLogger(ItemMeanModelProvider.class);

    /**
     * The data access object, to be used when computing the mean ratings.
     */
    private final DataAccessObject dao;

    /**
     * Constructor for the item mean model provider.
     *
     * <p>The {@code @Inject} annotation tells LensKit to use this constructor.</p>
     *
     * @param dao The data access object (DAO), where the builder will get ratings. The {@code @Transient}
     *            annotation on this parameter means that the DAO will be used to build the model, but the
     *            model will <strong>not</strong> retain a reference to the DAO. This is standard procedure
     *            for LensKit models.
     */
    @Inject
    public ItemMeanModelProvider(@Transient DataAccessObject dao) {
        this.dao = dao;
    }

    /**
     * Construct an item mean model.
     *
     * <p>The {@link Provider#get()} method constructs whatever object the provider class is intended to build.</p>
     *
     * @return The item mean model with mean ratings for all items.
     */
    @Override
    public ItemMeanModel get() {
        // TODO Set up data structures for computing means

        try (ObjectStream<Rating> ratings = dao.query(Rating.class).stream()) {
            for (Rating r: ratings) {
                // this loop will run once for each rating in the data set
                // TODO process this rating
            }
        }

        Long2DoubleOpenHashMap means = new Long2DoubleOpenHashMap();
        // TODO Finalize means to store them in the mean model

        logger.info("computed mean ratings for {} items", means.size());
        return new ItemMeanModel(means);
    }
}
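
The three TODOs above (set up accumulators, process each rating, finalize the means) follow a common one-pass pattern. A minimal sketch in plain Java, assuming parallel arrays stand in for the rating stream; the class `ItemMeanSketch` is hypothetical and uses `java.util` maps rather than the fastutil maps imported by the template.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of LensKit or the assignment template.
final class ItemMeanSketch {
    private ItemMeanSketch() {}

    /** Computes raw item means. itemIds[i] and values[i] describe the i-th rating. */
    static Map<Long, Double> itemMeans(long[] itemIds, double[] values) {
        // Set up: one accumulator per item, where [0] is the rating sum and [1] the rating count.
        Map<Long, double[]> acc = new HashMap<>();

        // Process: runs once per rating, like the loop over the rating stream.
        for (int i = 0; i < itemIds.length; i++) {
            double[] a = acc.computeIfAbsent(itemIds[i], k -> new double[2]);
            a[0] += values[i];
            a[1] += 1;
        }

        // Finalize: divide each sum by its count.
        Map<Long, Double> means = new HashMap<>();
        for (Map.Entry<Long, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }
}
```

Every item in the result has at least one rating, so the division never sees a zero count.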


@ -0,0 +1,92 @@
package org.lenskit.mooc.nonpers.mean;

import it.unimi.dsi.fastutil.longs.LongSet;
import org.lenskit.api.Result;
import org.lenskit.api.ResultList;
import org.lenskit.api.ResultMap;
import org.lenskit.basic.AbstractItemBasedItemRecommender;
import org.lenskit.results.Results;
import org.lenskit.util.collections.LongUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.annotation.Nullable;
import javax.inject.Inject;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * An item recommender that recommends the items with the highest mean ratings.
 */
public class MeanItemBasedItemRecommender extends AbstractItemBasedItemRecommender {
    private static final Logger logger = LoggerFactory.getLogger(MeanItemBasedItemRecommender.class);

    private final ItemMeanModel model;

    /**
     * Construct a mean-based item recommender.
     *
     * <p>The {@code @Inject} annotation tells LensKit to use this constructor.</p>
     *
     * @param m The model containing item mean ratings. LensKit will automatically build an {@link ItemMeanModel}
     *          object. Its use as a parameter type in this constructor declares it as a <em>dependency</em> of the
     *          mean-based item recommender.
     */
    @Inject
    public MeanItemBasedItemRecommender(ItemMeanModel m) {
        model = m;
    }

    /**
     * {@inheritDoc}
     *
     * This is the LensKit recommend method. It takes several parameters; we implement it for you in terms of a
     * simpler method ({@link #recommendItems(int, LongSet)}).
     */
    @Override
    public ResultList recommendRelatedItemsWithDetails(Set<Long> basket, int n, @Nullable Set<Long> candidates, @Nullable Set<Long> exclude) {
        LongSet items;
        if (candidates == null) {
            items = model.getKnownItems();
        } else {
            items = LongUtils.asLongSet(candidates);
        }

        if (exclude != null) {
            items = LongUtils.setDifference(items, LongUtils.asLongSet(exclude));
        }

        logger.info("computing {} recommendations from {} items", n, items.size());
        return recommendItems(n, items);
    }

    /**
     * Recommend some items from a set of candidate items.
     *
     * <p>Your code needs to obtain the mean rating, if one is available, for each item, and return a list of the
     * {@code n} highest-rated items, in decreasing order of score.</p>
     *
     * <p>To create the {@link ResultList} data structure, do the following:</p>
     *
     * <ol>
     * <li>Create a {@link List} to hold {@link Result} objects.</li>
     * <li>Create a result object for each item that can be scored. Use {@link Results#create(long, double)} to
     * create the result object. If an item cannot be scored (because there is no mean available), ignore it and
     * do not add a result to the list.</li>
     * <li>Sort the results in decreasing order of score and keep only the first {@code n}.</li>
     * <li>Convert the list of results to a {@link ResultList} using {@link Results#newResultList(List)}.</li>
     * </ol>
     *
     * @param n The number of items to recommend. If this is negative, then recommend all possible items.
     * @param items The items to score.
     * @return A {@link ResultList} containing the recommended items and their scores.
     */
    private ResultList recommendItems(int n, LongSet items) {
        List<Result> results = new ArrayList<>();

        // TODO Find the top N items by mean rating

        return Results.newResultList(results);
    }
}
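
The top-N selection described in the `recommendItems` Javadoc can be sketched without LensKit types. This is an illustrative sketch, not the expected solution: the class `TopNSketch` is hypothetical, `Map.Entry` stands in for `Result`, and breaking ties by ascending item ID is an assumption added only to make the output deterministic.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of LensKit or the assignment template.
final class TopNSketch {
    private TopNSketch() {}

    /**
     * Returns up to n (item, mean) pairs in decreasing order of mean.
     * A negative n means "return every candidate that has a mean".
     */
    static List<Map.Entry<Long, Double>> topN(Map<Long, Double> means, Set<Long> candidates, int n) {
        List<Map.Entry<Long, Double>> scored = new ArrayList<>();
        for (Long item : candidates) {
            Double mean = means.get(item);
            if (mean != null) { // skip items with no mean available
                scored.add(new AbstractMap.SimpleImmutableEntry<>(item, mean));
            }
        }
        // Highest score first; ties broken by item ID (an assumption of this sketch).
        scored.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : Long.compare(a.getKey(), b.getKey());
        });
        if (n < 0 || n >= scored.size()) {
            return scored;
        }
        return new ArrayList<>(scored.subList(0, n));
    }
}
```

Sorting the whole candidate list costs O(m log m) for m scored items; a bounded priority queue would be an alternative when n is much smaller than m.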