initial

parent e228e16f22
commit ac61899f6e

27 changed files with 363140 additions and 0 deletions
README.md (Normal file, +4)
@@ -0,0 +1,4 @@
recommender systems
===

Homework assignments from the Coursera course on recommender systems.

nonpers-assignment/.gitignore (vendored, Normal file, +1)
@@ -0,0 +1 @@
.gradle

nonpers-assignment/README.md (Normal file, +216)
@@ -0,0 +1,216 @@
# Non-Personalized Recommender Assignment

In this assignment, you will implement some non-personalized recommenders. In particular, you will
implement raw and damped item mean recommenders and simple and advanced association rule
recommenders.

You will implement these recommenders in the LensKit toolkit.

## Downloads and Resources

- Project template (from Coursera)
- [LensKit for Teaching website](http://mooc.lenskit.org) (links to relevant documentation)
- [JavaDoc for included code](http://mooc.lenskit.org/assignments/nonpers/javadoc/)
- [Fastutil API docs](http://fastutil.di.unimi.it/docs/) document the Fastutil optimized data
  structure classes used in parts of LensKit.

The project template contains support code, the build file, and the input data that you will use.

## Input Data

The input data (in the `data` directory) contains the following files:

- `ratings.csv` contains user ratings of movies
- `movies.csv` contains movie titles
- `tags.csv` contains user-applied movie tags
- `movielens.yml` is a LensKit data manifest that describes the other input files

## Getting Started

To get started with this assignment, unpack the template and import it into your IDE as a Gradle
project. The assignment video demonstrates how to do this in IntelliJ IDEA.

## Mean-Based Recommendation

The first two recommenders you will implement recommend the items with the highest average rating.

With LensKit's scorer-model-builder architecture, you only need to write the recommendation
logic once; you will then implement two different mechanisms for computing item mean ratings.

You will work with the following classes:

- `MeanItemBasedItemRecommender` (the *item recommender*) computes top-*N* recommendations based
  on mean ratings. You will implement the logic to compute such recommendation lists.

- `ItemMeanModel` is a *model class* that stores precomputed item means. You will not need to
  modify this class, but you will write code to construct instances of it and use it in your
  item recommender implementation.

- `ItemMeanModelProvider` computes item mean ratings from rating data and constructs the model.
  It computes raw means with no damping.

- `DampedItemMeanModelProvider` is an alternate builder for item mean models that computes
  damped means instead of raw means. It takes the damping term as a parameter. The configuration
  file we provide (`etc/damped-mean.groovy`) uses a damping term of 5.

There are `// TODO` comments in all places where you need to write new code.

### Computing Item Means

Modify the `ItemMeanModelProvider` class to compute the mean rating for each item.

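Each item's mean is its rating sum divided by its rating count. A single pass over the ratings can accumulate both. The sketch below shows that computation using plain Java collections. `SimpleRating` and `RawItemMeans` are hypothetical names, not LensKit classes. The real `ItemMeanModelProvider` reads ratings through LensKit's data access layer and wraps the result in an `ItemMeanModel`; neither of those steps is shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical rating triple; a stand-in for the rating data LensKit supplies. */
final class SimpleRating {
    final long user;
    final long item;
    final double value;

    SimpleRating(long user, long item, double value) {
        this.user = user;
        this.item = item;
        this.value = value;
    }
}

/** Hypothetical helper: raw (undamped) mean rating per item. */
final class RawItemMeans {
    static Map<Long, Double> compute(List<SimpleRating> ratings) {
        // Accumulate each item's rating sum and rating count in one pass.
        Map<Long, Double> sums = new HashMap<>();
        Map<Long, Integer> counts = new HashMap<>();
        for (SimpleRating r : ratings) {
            Double sum = sums.get(r.item);
            sums.put(r.item, (sum == null ? 0.0 : sum) + r.value);
            Integer count = counts.get(r.item);
            counts.put(r.item, (count == null ? 0 : count) + 1);
        }
        // Mean = sum / count; every item in sums has a count of at least 1.
        Map<Long, Double> means = new HashMap<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            means.put(e.getKey(), e.getValue() / counts.get(e.getKey()));
        }
        return means;
    }
}
```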
### Recommending Items

Modify the `MeanItemBasedItemRecommender` class to compute recommendations based on item mean
ratings. For this, you need to:

1. Obtain the mean rating for each item
2. Sort the items in decreasing order of mean rating
3. Return the *N* highest-rated items

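Together, the three steps above sort the item scores and keep the first *N*. Below is a minimal sketch that works on a plain `Map<Long, Double>` of item means. `TopNByScore` is a hypothetical helper. The real recommender must return LensKit `Result` objects instead, and building those is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: the n item IDs with the highest scores, best first. */
final class TopNByScore {
    static List<Long> topN(Map<Long, Double> scores, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(scores.entrySet());
        // Step 2: sort in decreasing order of score (here, the mean rating).
        Collections.sort(entries, new Comparator<Map.Entry<Long, Double>>() {
            @Override
            public int compare(Map.Entry<Long, Double> a, Map.Entry<Long, Double> b) {
                return Double.compare(b.getValue(), a.getValue());
            }
        });
        // Step 3: keep at most the first n item IDs.
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Items with tied scores keep whatever order the map iterates in. If you need deterministic output, add a secondary comparison such as item ID.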
### Computing Damped Item Means

Modify the `DampedItemMeanModelProvider` class to compute the damped mean rating for each item.
The damped mean uses a damping factor $\alpha$, which is the number of 'fake' ratings at the global
mean to assume for each item. In the Java code, this is available as the field `damping`.

The damped mean formula, as you may recall, is:

$$s(i) = \frac{\sum_{u \in U_i} r_{ui} + \alpha\mu}{|U_i| + \alpha}$$

where $U_i$ is the set of users who rated item $i$, $r_{ui}$ is user $u$'s rating of item $i$, and
$\mu$ is the *global* mean rating (the mean over all ratings in the data).

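The damped mean formula needs one extra quantity, the global mean $\mu$. Compute it over all individual ratings, not by averaging the item means. The sketch below assumes per-item rating sums and counts have already been accumulated, as in a raw-mean pass. `DampedMeans` is a hypothetical helper, and its `alpha` parameter plays the role of the `damping` field.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper implementing s(i) = (sum_i + alpha * mu) / (n_i + alpha). */
final class DampedMeans {
    /** Damped mean for a single item. */
    static double dampedMean(double ratingSum, int ratingCount, double globalMean, double alpha) {
        return (ratingSum + alpha * globalMean) / (ratingCount + alpha);
    }

    /**
     * Damped means for all items, given each item's rating sum and count.
     * mu is the mean over all individual ratings, not the mean of the item means.
     */
    static Map<Long, Double> compute(Map<Long, Double> sums, Map<Long, Integer> counts, double alpha) {
        double total = 0.0;
        long n = 0;
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            total += e.getValue();
            n += counts.get(e.getKey());
        }
        double mu = total / n;

        Map<Long, Double> result = new HashMap<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            result.put(e.getKey(), dampedMean(e.getValue(), counts.get(e.getKey()), mu, alpha));
        }
        return result;
    }
}
```

Consider $\alpha = 5$ and a global mean of 3.5. An item with two 5-star ratings then scores $(10 + 17.5)/7 \approx 3.93$ instead of its raw mean of 5.0. With $\alpha = 0$, the formula reduces to the raw mean.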
### Example Outputs

To help you check whether your output is correct, we have provided the following example values:

| ID   | Title          | Mean  | Damped Mean |
| :-:  | :----          | :--:  | :---------: |
| 2959 | *Fight Club*   | 4.259 | 4.252       |
| 1203 | *12 Angry Men* | 4.246 | 4.227       |

## Association Rules

In the second part of the assignment, you will implement two versions of an association rule
recommender.

The association rule implementation consists of the following code:

- `AssociationItemBasedItemRecommender` recommends items using association rules. Unlike the mean
  recommenders, this recommender uses a *reference item* to compute the recommendations.
- `AssociationModel` stores the association rule scores between pairs of items. You will not need
  to modify this class.
- `BasicAssociationModelProvider` computes an association rule model using the basic association
  rule formula ($P(X \wedge Y) / P(X)$).
- `LiftAssociationModelProvider` computes an association rule model using the lift formula
  ($P(X \wedge Y) / (P(X) P(Y))$).

### Computing Association Scores

As with the mean-based recommender, we precompute product association scores and store them in
a model before recommendation. We compute the scores between *all pairs* of items, so that the
model can be used to score any item. When computing a single recommendation from the command line,
this provides little benefit, but in the general case it lets the model be used to compute many
recommendations very quickly.

The `BasicAssociationModelProvider` class computes the association rule scores using the following
formula:

$$P(i|j) = \frac{P(i \wedge j)}{P(j)} = \frac{|U_i \cap U_j|/|U|}{|U_j|/|U|} = \frac{|U_i \cap U_j|}{|U_j|}$$

In this case, $j$ is the *reference* item and $i$ is the item to be scored. $U$ is the set of all
users and $U_i$ is the set of users who rated item $i$.

We estimate probabilities by counting: $P(i)$ is the fraction of users in the system
who purchased (here, rated) item $i$; $P(i \wedge j)$ is the fraction that purchased both $i$ and $j$.

**Implement the association rule computation in this class.**

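Because the $|U|$ terms cancel, the basic score only needs the set of users who rated each item. The sketch below computes $P(i|j)$ from such user sets using plain Java collections. `BasicAssociation` and its methods are hypothetical. The sketch also skips the reference item itself, whose self-score would trivially be 1.0; that exclusion is an assumption, not something this README states.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: basic association rule scores P(i | j). */
final class BasicAssociation {
    /** P(i | j) = |U_i intersect U_j| / |U_j|; the |U| terms in the formula cancel. */
    static double score(Set<Long> usersOfI, Set<Long> usersOfJ) {
        if (usersOfJ.isEmpty()) {
            return 0.0; // nobody rated the reference item; avoid dividing by zero
        }
        Set<Long> both = new HashSet<>(usersOfI);
        both.retainAll(usersOfJ); // users who rated both i and j
        return (double) both.size() / usersOfJ.size();
    }

    /** Scores every other item i against reference item j, given each item's set of raters. */
    static Map<Long, Double> scoresFor(long refItem, Map<Long, Set<Long>> itemUsers) {
        Map<Long, Double> scores = new HashMap<>();
        Set<Long> refUsers = itemUsers.get(refItem);
        if (refUsers == null) {
            return scores;
        }
        for (Map.Entry<Long, Set<Long>> e : itemUsers.entrySet()) {
            if (e.getKey() != refItem) { // assumption: never recommend the reference item itself
                scores.put(e.getKey(), score(e.getValue(), refUsers));
            }
        }
        return scores;
    }
}
```

The score is asymmetric. With $U_i = \{1,2,3\}$ and $U_j = \{2,3,4,5\}$, $P(i|j) = 2/4 = 0.5$ but $P(j|i) = 2/3$.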
### Computing Recommendations

Implement the recommendation logic in `AssociationItemBasedItemRecommender` to recommend items
related to a given reference item. As with the mean recommender, it should compute the top *N*
recommendations and return them.

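Selecting the top *N* does not require sorting every candidate. A bounded min-heap keeps only the *N* best scores seen so far. Below is a minimal sketch over a map of association scores. `TopNHeap` is a hypothetical helper. Excluding the reference item is an assumption, consistent with item 260 not appearing in its own example output. The real method works over a `LongSet` of candidates and returns a LensKit `ResultList`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper: top-n items by score using a bounded min-heap. */
final class TopNHeap {
    static List<Long> topN(Map<Long, Double> scores, long refItem, int n) {
        List<Long> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap ordered by score: the head is the weakest of the current top n.
        Comparator<Map.Entry<Long, Double>> byScore = new Comparator<Map.Entry<Long, Double>>() {
            @Override
            public int compare(Map.Entry<Long, Double> a, Map.Entry<Long, Double> b) {
                return Double.compare(a.getValue(), b.getValue());
            }
        };
        PriorityQueue<Map.Entry<Long, Double>> heap = new PriorityQueue<>(n, byScore);
        for (Map.Entry<Long, Double> e : scores.entrySet()) {
            if (e.getKey() == refItem) {
                continue; // assumption: skip the reference item
            }
            heap.add(e);
            if (heap.size() > n) {
                heap.poll(); // drop the current lowest score
            }
        }
        // Polling yields ascending scores; reverse so the best item comes first.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

For $m$ candidates this runs in $O(m \log N)$, versus $O(m \log m)$ for a full sort.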
### Computing Advanced Association Rules

The `LiftAssociationModelProvider` uses the *lift* metric, which measures how much more likely
someone is to rate a movie $i$ when they have rated $j$ than they would be if we knew nothing
about whether they have rated $j$:

$$s(i|j) = \frac{P(j \wedge i)}{P(i) P(j)}$$

Equivalently, $s(i|j) = P(i|j) / P(i)$: a lift above 1 means that having rated $j$ makes rating $i$
more likely than it is in general.

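Lift differs from the basic score because the $|U|$ factors no longer cancel, so the total number of users is needed as well. The sketch below assumes the same per-item user sets as for the basic rule. `Lift` is a hypothetical helper, and returning 0.0 for items nobody rated is an arbitrary choice made here.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: lift(i | j) = P(i and j) / (P(i) P(j)). */
final class Lift {
    static double lift(Set<Long> usersOfI, Set<Long> usersOfJ, int totalUsers) {
        if (usersOfI.isEmpty() || usersOfJ.isEmpty() || totalUsers <= 0) {
            return 0.0; // lift is undefined here; treat it as no association
        }
        Set<Long> both = new HashSet<>(usersOfI);
        both.retainAll(usersOfJ); // users who rated both i and j
        // Each probability is a fraction of all users; unlike P(i | j), |U| does not cancel.
        double pBoth = (double) both.size() / totalUsers;
        double pI = (double) usersOfI.size() / totalUsers;
        double pJ = (double) usersOfJ.size() / totalUsers;
        return pBoth / (pI * pJ);
    }
}
```

Unlike $P(i|j)$, lift is symmetric in $i$ and $j$. It equals 1 when the two items are rated independently.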
### Example Outputs

Following is the correct output for the basic association rules with reference item 260 (*Star Wars*), as generated with `./gradlew runBasicAssoc -PreferenceItem=260`:

    2571 (Matrix, The (1999)): 0.916
    1196 (Star Wars: Episode V - The Empire Strikes Back (1980)): 0.899
    4993 (Lord of the Rings: The Fellowship of the Ring, The (2001)): 0.892
    1210 (Star Wars: Episode VI - Return of the Jedi (1983)): 0.847
    356 (Forrest Gump (1994)): 0.843
    5952 (Lord of the Rings: The Two Towers, The (2002)): 0.841
    7153 (Lord of the Rings: The Return of the King, The (2003)): 0.830
    296 (Pulp Fiction (1994)): 0.828
    1198 (Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)): 0.791
    480 (Jurassic Park (1993)): 0.789

And here is the output of the lift-based association rules for item 2761 (*The Iron Giant*), as generated with `./gradlew runLiftAssoc`:

    631 (All Dogs Go to Heaven 2 (1996)): 4.898
    2532 (Conquest of the Planet of the Apes (1972)): 4.810
    3615 (Dinosaur (2000)): 4.546
    1649 (Fast, Cheap & Out of Control (1997)): 4.490
    340 (War, The (1994)): 4.490
    1016 (Shaggy Dog, The (1959)): 4.490
    2439 (Affliction (1997)): 4.490
    332 (Village of the Damned (1995)): 4.377
    2736 (Brighton Beach Memoirs (1986)): 4.329
    3213 (Batman: Mask of the Phantasm (1993)): 4.317

## Running your code

The Gradle build file we have provided is set up to run each of your four recommenders.
The following Gradle tasks do this:

- `runMean` runs the raw mean recommender
- `runDampedMean` runs the damped mean recommender
- `runBasicAssoc` runs the basic association rule recommender
- `runLiftAssoc` runs the advanced (lift-based) association rule recommender
- `runAll` runs all four of the above

You can run these using the IntelliJ Gradle runner (open the Gradle panel, browse the tree to find
a task, and double-click it), or from the command line:

    ./gradlew runMean

The association rule recommenders also accept the reference item ID on the command line as a
`referenceItem` project property. For example:

    ./gradlew runLiftAssoc -PreferenceItem=1

If you omit it, `runBasicAssoc` uses item 260 and `runLiftAssoc` uses item 2761.

The IntelliJ ‘Run Configuration’ dialog lets you pass additional ‘script parameters’ to
the Gradle invocation.

### Debugging

If you run the Gradle tasks using IntelliJ's Gradle runner, you can run them under the debugger to debug your code.

The Gradle file also configures LensKit to write log output, at `DEBUG` level, to log files under
the `build` directory. If you use the SLF4J logger (the `logger` field on the classes we provide) to
emit debug messages, you can find them there when you run one of the recommender tasks; for example,
`runDampedMean` writes to `build/recommend-damped-mean.log`.

## Submitting

You will submit a compiled `jar` file containing your solution. To prepare your project for
submission, run the Gradle `prepareSubmission` task:

    ./gradlew prepareSubmission

This creates the file `nonpers-submission.jar` under `build/distributions`, which contains your final
solution code in a format the grader will understand. Upload this `jar` file to the Coursera
assignment grader.

## Grading

Your grade for each part will be based on two components:

- Outputting items in the correct order: 75%
- Computing correct scores for items (within an error tolerance): 25%

The parts themselves are weighted equally.

nonpers-assignment/build.gradle (Normal file, +87)
@@ -0,0 +1,87 @@
apply plugin: 'java'

ext.lenskitVersion = '3.0-M1'
if (!hasProperty('dataDir')) {
    ext.dataDir = 'data'
}

sourceCompatibility = 1.7

apply from: "$rootDir/gradle/repositories.gradle"

dependencies {
    compile "org.lenskit:lenskit-core:$lenskitVersion"
    runtime "org.lenskit:lenskit-cli:$lenskitVersion"
}
dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.11'
}

task runMean(type: JavaExec, group: 'run') {
    description "Run the simple mean recommender."
    classpath sourceSets.main.runtimeClasspath
    main 'org.lenskit.cli.Main'
    args '--log-file', file("$buildDir/recommend-mean.log"), '--log-file-level', 'DEBUG'
    args 'global-recommend'
    args '--data-source', "$dataDir/movielens.yml"
    args '-c', file('etc/mean.groovy')
    args '-n', 10
    if (project.hasProperty('lenskit.maxMemory')) {
        maxHeapSize project.getProperty('lenskit.maxMemory')
    }
}

task runDampedMean(type: JavaExec, group: 'run') {
    description "Run the damped mean recommender."
    mustRunAfter runMean
    classpath sourceSets.main.runtimeClasspath
    main 'org.lenskit.cli.Main'
    args '--log-file', file("$buildDir/recommend-damped-mean.log"), '--log-file-level', 'DEBUG'
    args 'global-recommend'
    args '--data-source', "$dataDir/movielens.yml"
    args '-c', file('etc/damped-mean.groovy')
    if (project.hasProperty('lenskit.maxMemory')) {
        maxHeapSize project.getProperty('lenskit.maxMemory')
    }
}

task runBasicAssoc(type: JavaExec, group: 'run') {
    description "Run the basic association rule recommender."
    mustRunAfter runDampedMean
    classpath sourceSets.main.runtimeClasspath
    main 'org.lenskit.cli.Main'
    args '--log-file', file("$buildDir/recommend-basic-assoc.log"), '--log-file-level', 'DEBUG'
    args 'global-recommend'
    args '--data-source', "$dataDir/movielens.yml"
    args '-c', file('etc/simple-assoc.groovy')
    args findProperty('referenceItem') ?: 260
    if (project.hasProperty('lenskit.maxMemory')) {
        maxHeapSize project.getProperty('lenskit.maxMemory')
    }
}

task runLiftAssoc(type: JavaExec, group: 'run') {
    description "Run the lift-based association rule recommender."
    classpath sourceSets.main.runtimeClasspath
    mustRunAfter runBasicAssoc
    main 'org.lenskit.cli.Main'
    args '--log-file', file("$buildDir/recommend-lift-assoc.log"), '--log-file-level', 'DEBUG'
    args 'global-recommend'
    args '--data-source', "$dataDir/movielens.yml"
    args '-c', file('etc/lift-assoc.groovy')
    args findProperty('referenceItem') ?: 2761
    if (project.hasProperty('lenskit.maxMemory')) {
        maxHeapSize project.getProperty('lenskit.maxMemory')
    }
}

task runAll(group: 'run') {
    dependsOn runMean, runDampedMean
    dependsOn runBasicAssoc, runLiftAssoc
}

task prepareSubmission(type: Copy) {
    from jar
    into distsDir
    rename(/-assignment/, '-submission')
}

nonpers-assignment/data/movielens.yml (Normal file, +28)
@@ -0,0 +1,28 @@
ratings:
  type: textfile
  file: ratings.csv
  format: csv
  entity_type: rating
  header: true
movies:
  type: textfile
  file: movies.csv
  format: csv
  entity_type: item
  header: true
  columns: [id, name]
tags:
  type: textfile
  file: tags.csv
  format: csv
  entity_type: item-tag
  header: true
  columns:
    - name: item
      type: long
    - name: user
      type: long
    - name: tag
      type: string
    - name: timestamp
      type: long

nonpers-assignment/data/movies.csv (Normal file, +2501)
File diff suppressed because it is too large

nonpers-assignment/data/ratings.csv (Normal file, +264506)
File diff suppressed because it is too large

nonpers-assignment/data/tags.csv (Normal file, +94876)
File diff suppressed because it is too large

nonpers-assignment/etc/damped-mean.groovy (Normal file, +14)
@@ -0,0 +1,14 @@
import org.lenskit.api.ItemBasedItemRecommender
import org.lenskit.baseline.MeanDamping
import org.lenskit.mooc.nonpers.mean.DampedItemMeanModelProvider
import org.lenskit.mooc.nonpers.mean.ItemMeanModel
import org.lenskit.mooc.nonpers.mean.MeanItemBasedItemRecommender

// set up the recommender
bind ItemBasedItemRecommender to MeanItemBasedItemRecommender

// this time, we will use the damped mean model
bind ItemMeanModel toProvider DampedItemMeanModelProvider

// use a mean damping of 5
set MeanDamping to 5

nonpers-assignment/etc/lift-assoc.groovy (Normal file, +7)
@@ -0,0 +1,7 @@
import org.lenskit.api.ItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.LiftAssociationModelProvider
import org.lenskit.mooc.nonpers.assoc.AssociationItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.AssociationModel

bind ItemBasedItemRecommender to AssociationItemBasedItemRecommender
bind AssociationModel toProvider LiftAssociationModelProvider

nonpers-assignment/etc/mean.groovy (Normal file, +4)
@@ -0,0 +1,4 @@
import org.lenskit.mooc.nonpers.mean.MeanItemBasedItemRecommender
import org.lenskit.api.ItemBasedItemRecommender

bind ItemBasedItemRecommender to MeanItemBasedItemRecommender

nonpers-assignment/etc/simple-assoc.groovy (Normal file, +7)
@@ -0,0 +1,7 @@
import org.lenskit.api.ItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.AssociationItemBasedItemRecommender
import org.lenskit.mooc.nonpers.assoc.AssociationModel
import org.lenskit.mooc.nonpers.assoc.BasicAssociationModelProvider

bind ItemBasedItemRecommender to AssociationItemBasedItemRecommender
bind AssociationModel toProvider BasicAssociationModelProvider

nonpers-assignment/gradle/repositories.gradle (Normal file, +6)
@@ -0,0 +1,6 @@
repositories {
    mavenCentral()
    maven {
        url 'https://oss.sonatype.org/content/repositories/snapshots/'
    }
}

nonpers-assignment/gradle/wrapper/gradle-wrapper.jar (vendored, Normal file, binary)
Binary file not shown.

nonpers-assignment/gradle/wrapper/gradle-wrapper.properties (vendored, Normal file, +6)
@@ -0,0 +1,6 @@
#Fri Mar 25 17:48:43 CDT 2016
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-2.14-bin.zip

nonpers-assignment/gradlew (vendored, Executable file, +160)
@@ -0,0 +1,160 @@
#!/usr/bin/env bash

##############################################################################
##
##  Gradle start up script for UN*X
##
##############################################################################

# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS=""

APP_NAME="Gradle"
APP_BASE_NAME=`basename "$0"`

# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"

warn ( ) {
    echo "$*"
}

die ( ) {
    echo
    echo "$*"
    echo
    exit 1
}

# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
case "`uname`" in
  CYGWIN* )
    cygwin=true
    ;;
  Darwin* )
    darwin=true
    ;;
  MINGW* )
    msys=true
    ;;
esac

# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
    ls=`ls -ld "$PRG"`
    link=`expr "$ls" : '.*-> \(.*\)$'`
    if expr "$link" : '/.*' > /dev/null; then
        PRG="$link"
    else
        PRG=`dirname "$PRG"`"/$link"
    fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/" >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null

CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar

# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
    if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
        # IBM's JDK on AIX uses strange locations for the executables
        JAVACMD="$JAVA_HOME/jre/sh/java"
    else
        JAVACMD="$JAVA_HOME/bin/java"
    fi
    if [ ! -x "$JAVACMD" ] ; then
        die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
    fi
else
    JAVACMD="java"
    which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi

# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" ] ; then
    MAX_FD_LIMIT=`ulimit -H -n`
    if [ $? -eq 0 ] ; then
        if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
            MAX_FD="$MAX_FD_LIMIT"
        fi
        ulimit -n $MAX_FD
        if [ $? -ne 0 ] ; then
            warn "Could not set maximum file descriptor limit: $MAX_FD"
        fi
    else
        warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
    fi
fi

# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
    GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi

# For Cygwin, switch paths to Windows format before running java
if $cygwin ; then
    APP_HOME=`cygpath --path --mixed "$APP_HOME"`
    CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
    JAVACMD=`cygpath --unix "$JAVACMD"`

    # We build the pattern for arguments to be converted via cygpath
    ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
    SEP=""
    for dir in $ROOTDIRSRAW ; do
        ROOTDIRS="$ROOTDIRS$SEP$dir"
        SEP="|"
    done
    OURCYGPATTERN="(^($ROOTDIRS))"
    # Add a user-defined pattern to the cygpath arguments
    if [ "$GRADLE_CYGPATTERN" != "" ] ; then
        OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
    fi
    # Now convert the arguments - kludge to limit ourselves to /bin/sh
    i=0
    for arg in "$@" ; do
        CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
        CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option

        if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
            eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
        else
            eval `echo args$i`="\"$arg\""
        fi
        i=$((i+1))
    done
    case $i in
        (0) set -- ;;
        (1) set -- "$args0" ;;
        (2) set -- "$args0" "$args1" ;;
        (3) set -- "$args0" "$args1" "$args2" ;;
        (4) set -- "$args0" "$args1" "$args2" "$args3" ;;
        (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
        (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
        (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
        (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
        (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
    esac
fi

# Split up the JVM_OPTS And GRADLE_OPTS values into an array, following the shell quoting and substitution rules
function splitJvmOpts() {
    JVM_OPTS=("$@")
}
eval splitJvmOpts $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS
JVM_OPTS[${#JVM_OPTS[*]}]="-Dorg.gradle.appname=$APP_BASE_NAME"

exec "$JAVACMD" "${JVM_OPTS[@]}" -classpath "$CLASSPATH" org.gradle.wrapper.GradleWrapperMain "$@"

nonpers-assignment/gradlew.bat (vendored, Normal file, +90)
@@ -0,0 +1,90 @@
@if "%DEBUG%" == "" @echo off
@rem ##########################################################################
@rem
@rem Gradle startup script for Windows
@rem
@rem ##########################################################################

@rem Set local scope for the variables with windows NT shell
if "%OS%"=="Windows_NT" setlocal

@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
set DEFAULT_JVM_OPTS=

set DIRNAME=%~dp0
if "%DIRNAME%" == "" set DIRNAME=.
set APP_BASE_NAME=%~n0
set APP_HOME=%DIRNAME%

@rem Find java.exe
if defined JAVA_HOME goto findJavaFromJavaHome

set JAVA_EXE=java.exe
%JAVA_EXE% -version >NUL 2>&1
if "%ERRORLEVEL%" == "0" goto init

echo.
echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.

goto fail

:findJavaFromJavaHome
set JAVA_HOME=%JAVA_HOME:"=%
set JAVA_EXE=%JAVA_HOME%/bin/java.exe

if exist "%JAVA_EXE%" goto init

echo.
echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
echo.
echo Please set the JAVA_HOME variable in your environment to match the
echo location of your Java installation.

goto fail

:init
@rem Get command-line arguments, handling Windowz variants

if not "%OS%" == "Windows_NT" goto win9xME_args
if "%@eval[2+2]" == "4" goto 4NT_args

:win9xME_args
@rem Slurp the command line arguments.
set CMD_LINE_ARGS=
set _SKIP=2

:win9xME_args_slurp
if "x%~1" == "x" goto execute

set CMD_LINE_ARGS=%*
goto execute

:4NT_args
@rem Get arguments from the 4NT Shell from JP Software
set CMD_LINE_ARGS=%$

:execute
@rem Setup the command line

set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar

@rem Execute Gradle
"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS%

:end
@rem End local scope for the variables with windows NT shell
if "%ERRORLEVEL%"=="0" goto mainEnd

:fail
rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
rem the _cmd.exe /c_ return code!
if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
exit /b 1

:mainEnd
if "%OS%"=="Windows_NT" endlocal

:omega

nonpers-assignment/nonpers-description.pdf (Normal file, binary)
Binary file not shown.

nonpers-assignment/settings.gradle (Normal file, +2)
@@ -0,0 +1,2 @@

rootProject.name = "nonpers-assignment"
|
@ -0,0 +1,73 @@
package org.lenskit.mooc.nonpers.assoc;

import it.unimi.dsi.fastutil.longs.LongSet;
import org.lenskit.api.Result;
import org.lenskit.api.ResultList;
import org.lenskit.basic.AbstractItemBasedItemRecommender;
import org.lenskit.results.Results;
import org.lenskit.util.collections.LongUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.annotation.Nullable;
import javax.inject.Inject;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * An item-based item recommender that uses association rules.
 */
public class AssociationItemBasedItemRecommender extends AbstractItemBasedItemRecommender {
    private static final Logger logger = LoggerFactory.getLogger(AssociationItemBasedItemRecommender.class);
    private final AssociationModel model;

    /**
     * Construct the recommender.
     *
     * @param m The association rule model.
     */
    @Inject
    public AssociationItemBasedItemRecommender(AssociationModel m) {
        model = m;
    }

    @Override
    public ResultList recommendRelatedItemsWithDetails(Set<Long> basket, int n, @Nullable Set<Long> candidates, @Nullable Set<Long> exclude) {
        LongSet items;
        if (candidates == null) {
            items = model.getKnownItems();
        } else {
            items = LongUtils.asLongSet(candidates);
        }

        if (exclude != null) {
            items = LongUtils.setDifference(items, LongUtils.asLongSet(exclude));
        }

        if (basket.isEmpty()) {
            return Results.newResultList();
        } else if (basket.size() > 1) {
            logger.warn("Reference set has more than 1 item, picking arbitrarily.");
        }

        long refItem = basket.iterator().next();

        return recommendItems(n, refItem, items);
    }

    /**
     * Recommend items with an association rule.
     * @param n The number of recommendations to produce.
     * @param refItem The reference item.
     * @param candidates The candidate items (set of items that can possibly be recommended).
     * @return The list of results.
     */
    private ResultList recommendItems(int n, long refItem, LongSet candidates) {
        List<Result> results = new ArrayList<>();

        // TODO Compute the n highest-scoring items from candidates

        return Results.newResultList(results);
    }
}
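The `recommendItems` TODO above asks for the `n` highest-scoring candidates with respect to `refItem`. Below is a minimal sketch of that selection step, written against plain Java collections so that it runs on its own. It makes several substitutions. The nested `Map` stands in for `AssociationModel`; in the template you would call `model.getItemAssociation(refItem, item)` instead. `Map.Entry` pairs stand in for LensKit `Result` objects, which `Results.create(item, score)` would build. The class name `AssocTopN` is hypothetical. Skipping the reference item itself and treating a negative `n` as "keep everything" are choices made for this sketch, not requirements stated in the template.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for recommendItems(n, refItem, candidates); not LensKit API. */
final class AssocTopN {
    static List<Map.Entry<Long, Double>> recommend(Map<Long, Map<Long, Double>> assoc,
                                                   int n, long refItem, Set<Long> candidates) {
        List<Map.Entry<Long, Double>> results = new ArrayList<>();
        Map<Long, Double> row = assoc.get(refItem); // scores of every Y with respect to X = refItem
        if (row == null) {
            return results; // unknown reference item: nothing to score against
        }
        for (Long item : candidates) {
            // never recommend the reference item itself; skip items with no stored score
            if (item == refItem || !row.containsKey(item)) {
                continue;
            }
            results.add(new AbstractMap.SimpleImmutableEntry<>(item, row.get(item)));
        }
        // sort by association score, highest first
        results.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        // keep the first n results (a negative n keeps them all)
        if (n >= 0 && results.size() > n) {
            return new ArrayList<>(results.subList(0, n));
        }
        return results;
    }
}
```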
|
@ -0,0 +1,90 @@
package org.lenskit.mooc.nonpers.assoc;

import com.google.common.base.Preconditions;
import it.unimi.dsi.fastutil.longs.LongSet;
import org.lenskit.inject.Shareable;
import org.lenskit.util.keys.SortedKeyIndex;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.Serializable;
import java.util.Map;

/**
 * An association rule model, storing item-item association scores.
 *
 * <p>You <strong>should not</strong> need to change this class. It has some internal optimizations to reduce
 * the memory requirements after the model is built.</p>
 */
@Shareable
public class AssociationModel implements Serializable {
    private static final Logger logger = LoggerFactory.getLogger(AssociationModel.class);
    private static final long serialVersionUID = 1L;

    private final SortedKeyIndex index;
    private final double[][] scores;

    /**
     * Construct a new association model.
     * @param assocScores The association scores. The outer map's keys are the X items, and the inner map's keys are
     *                    the Y items. So {@code assocScores.get(x).get(y)} should return the score for {@code y}
     *                    with respect to {@code x}.
     */
    public AssociationModel(Map<Long, ? extends Map<Long,Double>> assocScores) {
        index = SortedKeyIndex.fromCollection(assocScores.keySet());
        int n = index.size();
        logger.debug("transforming input map for {} items into log data", n);
        scores = new double[n][n];
        for (int i = 0; i < n; i++) {
            long itemX = index.getKey(i);
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    continue; // skip self-similarities
                }

                long itemY = index.getKey(j);
                Double score = assocScores.get(itemX).get(itemY);
                if (score == null) {
                    logger.error("no score found for items {} and {}", itemX, itemY);
                    String msg = String.format("no score found for x=%d, y=%d", itemX, itemY);
                    throw new IllegalArgumentException(msg);
                }
                scores[i][j] = score;
            }
        }
    }

    /**
     * Get the set of known items.
     * @return The set of known item IDs.
     */
    public LongSet getKnownItems() {
        return index.keySet();
    }

    /**
     * Query whether the model knows about an item.
     * @param item The item ID.
     * @return {@code true} if the model knows about the item {@code item}, {@code false} otherwise.
     */
    public boolean hasItem(long item) {
        return index.containsKey(item);
    }

    /**
     * Get the association between two items.
     * @param ref The reference item (X).
     * @param item The item to score (Y).
     * @return The score between X and Y.
     * @throws IllegalArgumentException if either item is invalid.
     */
    public double getItemAssociation(long ref, long item) {
        // look up item positions (Guava message templates only substitute %s placeholders)
        int refIndex = index.tryGetIndex(ref);
        Preconditions.checkArgument(refIndex >= 0, "unknown reference item %s", ref);
        int itemIndex = index.tryGetIndex(item);
        Preconditions.checkArgument(itemIndex >= 0, "unknown target item %s", item);

        return scores[refIndex][itemIndex];
    }
}
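`AssociationModel` copies the nested score map into a dense `double[n][n]` array addressed through a `SortedKeyIndex`. Each lookup then costs two index lookups and one array read. The sketch below illustrates that layout with plain Java. A sorted `long[]` of item IDs plays the role of the key index, and `Arrays.binarySearch` maps an ID to its row or column. `DenseAssocMatrix` is a hypothetical name. This illustrates the idea; it is not LensKit's actual `SortedKeyIndex` code. As in the original constructor, every non-self pair must have a score, or construction fails. The array grows quadratically with the number of items.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical, simplified analogue of AssociationModel's storage layout. */
final class DenseAssocMatrix {
    private final long[] keys;       // sorted item IDs; an ID's position is its matrix index
    private final double[][] scores; // scores[i][j] = score of keys[j] with respect to keys[i]

    DenseAssocMatrix(Map<Long, ? extends Map<Long, Double>> assoc) {
        keys = assoc.keySet().stream().mapToLong(Long::longValue).sorted().toArray();
        int n = keys.length;
        scores = new double[n][n];
        for (int i = 0; i < n; i++) {
            Map<Long, Double> row = assoc.get(keys[i]);
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    continue; // self-associations are never read, so they stay 0.0
                }
                Double s = row.get(keys[j]);
                if (s == null) {
                    throw new IllegalArgumentException("no score for x=" + keys[i] + ", y=" + keys[j]);
                }
                scores[i][j] = s;
            }
        }
    }

    /** Matrix index of an item, or -1 if the item is unknown. */
    int tryGetIndex(long item) {
        int idx = Arrays.binarySearch(keys, item);
        return idx >= 0 ? idx : -1;
    }

    double get(long ref, long item) {
        int i = tryGetIndex(ref);
        int j = tryGetIndex(item);
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("unknown item");
        }
        return scores[i][j];
    }
}
```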
|
@ -0,0 +1,82 @@
package org.lenskit.mooc.nonpers.assoc;

import it.unimi.dsi.fastutil.longs.*;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.entities.CommonAttributes;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.IdBox;
import org.lenskit.util.collections.LongUtils;
import org.lenskit.util.io.ObjectStream;

import javax.inject.Inject;
import javax.inject.Provider;
import java.util.List;

/**
 * Build a model for basic association rules. This class computes the association for all pairs of items.
 */
public class BasicAssociationModelProvider implements Provider<AssociationModel> {
    private final DataAccessObject dao;

    @Inject
    public BasicAssociationModelProvider(@Transient DataAccessObject dao) {
        this.dao = dao;
    }

    @Override
    public AssociationModel get() {
        // First step: map each item to the set of users who have rated it.

        // This map will map each item ID to the set of users who have rated it.
        Long2ObjectMap<LongSortedSet> itemUsers = new Long2ObjectOpenHashMap<>();
        LongSet allUsers = new LongOpenHashSet();

        // Open a stream, grouping ratings by item ID
        try (ObjectStream<IdBox<List<Rating>>> ratingStream = dao.query(Rating.class)
                .groupBy(CommonAttributes.ITEM_ID)
                .stream()) {
            // Process each item's ratings
            for (IdBox<List<Rating>> item: ratingStream) {
                // Build a set of users. We build an array first, then convert to a set.
                LongList users = new LongArrayList();
                // Add each rating's user ID to the user sets
                for (Rating r: item.getValue()) {
                    long user = r.getUserId();
                    users.add(user);
                    allUsers.add(user);
                }
                // put this item's user set into the item user map
                // a frozen set will be very efficient later
                itemUsers.put(item.getId(), LongUtils.frozenSet(users));
            }
        }

        // Second step: compute all association rules

        // We need a map to store them
        Long2ObjectMap<Long2DoubleMap> assocMatrix = new Long2ObjectOpenHashMap<>();

        // then loop over 'x' items
        for (Long2ObjectMap.Entry<LongSortedSet> xEntry: itemUsers.long2ObjectEntrySet()) {
            long xId = xEntry.getLongKey();
            LongSortedSet xUsers = xEntry.getValue();

            // set up a map to hold the scores for each 'y' item for this 'x'
            Long2DoubleMap itemScores = new Long2DoubleOpenHashMap();

            // loop over the 'y' items
            for (Long2ObjectMap.Entry<LongSortedSet> yEntry: itemUsers.long2ObjectEntrySet()) {
                long yId = yEntry.getLongKey();
                LongSortedSet yUsers = yEntry.getValue();

                // TODO Compute P(Y & X) / P(X) and store in itemScores
            }

            // save the score map to the main map
            assocMatrix.put(xId, itemScores);
        }

        return new AssociationModel(assocMatrix);
    }
}
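For the TODO in the inner loop, estimate both probabilities from user counts over the set of all users U. P(X) is the fraction of users who rated X, and P(X and Y) is the fraction who rated both. The |U| terms cancel, so P(X and Y) / P(X) is the number of users who rated both X and Y, divided by the number who rated X. The `AssociationModel` constructor throws if any non-self pair is missing. The inner loop should therefore store a score (possibly 0) for every `y` other than `x`. In the template, the user sets are `LongSortedSet`s; this sketch uses sorted `long[]` arrays so that it stands alone. `BasicAssoc` and its methods are hypothetical names.

```java
/** Hypothetical helpers for the basic association rule P(X and Y) / P(X). */
final class BasicAssoc {
    /** Count the elements shared by two ascending, duplicate-free arrays with a linear merge. */
    static int intersectionSize(long[] a, long[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else { // same user rated both items
                count++;
                i++;
                j++;
            }
        }
        return count;
    }

    /** |users(X) intersect users(Y)| / |users(X)|; returns 0 if nobody rated X. */
    static double score(long[] xUsers, long[] yUsers) {
        if (xUsers.length == 0) {
            return 0.0;
        }
        return (double) intersectionSize(xUsers, yUsers) / xUsers.length;
    }
}
```

The rule is asymmetric. `score(x, y)` and `score(y, x)` share a numerator but divide by different counts.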
|
@ -0,0 +1,83 @@
package org.lenskit.mooc.nonpers.assoc;

import it.unimi.dsi.fastutil.longs.*;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.entities.CommonAttributes;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.IdBox;
import org.lenskit.util.collections.LongUtils;
import org.lenskit.util.io.ObjectStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.inject.Inject;
import javax.inject.Provider;
import java.util.List;

/**
 * Build an association rule model using a lift metric.
 */
public class LiftAssociationModelProvider implements Provider<AssociationModel> {
    private static final Logger logger = LoggerFactory.getLogger(LiftAssociationModelProvider.class);
    private final DataAccessObject dao;

    @Inject
    public LiftAssociationModelProvider(@Transient DataAccessObject dao) {
        this.dao = dao;
    }

    @Override
    public AssociationModel get() {
        // First step: map each item to the set of users who have rated it.
        // While we're at it, compute the set of all users.

        // This set contains all users.
        LongSet allUsers = new LongOpenHashSet();

        // This map will map each item ID to the set of users who have rated it.
        Long2ObjectMap<LongSortedSet> itemUsers = new Long2ObjectOpenHashMap<>();

        // Open a stream, grouping ratings by item ID
        try (ObjectStream<IdBox<List<Rating>>> ratingStream = dao.query(Rating.class)
                .groupBy(CommonAttributes.ITEM_ID)
                .stream()) {
            // Process each item's ratings
            for (IdBox<List<Rating>> item: ratingStream) {
                // Build a set of users. We build an array first, then convert to a set.
                LongList users = new LongArrayList();
                // Add each rating's user ID to the user sets
                for (Rating r: item.getValue()) {
                    long user = r.getUserId();
                    users.add(user);
                    allUsers.add(user);
                }
                // put this item's user set into the item user map
                // a frozen set will be very efficient later
                itemUsers.put(item.getId(), LongUtils.frozenSet(users));
            }
        }

        // Second step: compute all association rules

        // We need a map to store them
        Long2ObjectMap<Long2DoubleMap> assocMatrix = new Long2ObjectOpenHashMap<>();

        // then loop over 'x' items
        for (Long2ObjectMap.Entry<LongSortedSet> xEntry: itemUsers.long2ObjectEntrySet()) {
            long xId = xEntry.getLongKey();
            LongSortedSet xUsers = xEntry.getValue();

            // set up a map to hold the scores for each 'y' item
            Long2DoubleMap itemScores = new Long2DoubleOpenHashMap();

            // TODO Compute lift association formulas for all other 'Y' items with respect to this 'X'

            // save the score map to the main map
            assocMatrix.put(xId, itemScores);
        }

        return new AssociationModel(assocMatrix);
    }
}
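For the lift TODO, this sketch assumes the usual definition lift(X, Y) = P(X and Y) / (P(X) * P(Y)). The probabilities can again be estimated from counts over the set of all users U, which is presumably why this provider also collects `allUsers`. Substituting the counts gives |users(X) and users(Y)| * |U| / (|users(X)| * |users(Y)|). The result is symmetric in X and Y, and it equals 1 when X and Y are co-rated exactly as often as independence would predict. The sketch works from the four counts; the intersection count would come from the two sorted user sets. `LiftAssoc` is a hypothetical name.

```java
/** Hypothetical helper: lift(X, Y) = P(X and Y) / (P(X) * P(Y)), estimated from user counts. */
final class LiftAssoc {
    /**
     * @param nUsers total number of users |U|
     * @param nX     number of users who rated X
     * @param nY     number of users who rated Y
     * @param nXY    number of users who rated both X and Y
     */
    static double lift(int nUsers, int nX, int nY, int nXY) {
        if (nX == 0 || nY == 0) {
            return 0.0; // lift is undefined for an item with no raters; 0 is this sketch's choice
        }
        // (nXY/|U|) / ((nX/|U|) * (nY/|U|)) simplifies to nXY * |U| / (nX * nY)
        return (double) nXY * nUsers / ((double) nX * nY);
    }
}
```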
|
@ -0,0 +1,68 @@
package org.lenskit.mooc.nonpers.mean;

import it.unimi.dsi.fastutil.longs.Long2DoubleOpenHashMap;
import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap;
import org.lenskit.baseline.MeanDamping;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.io.ObjectStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.inject.Inject;
import javax.inject.Provider;

/**
 * Provider class that builds the item mean model, computing damped item means from the
 * ratings in the DAO.
 */
public class DampedItemMeanModelProvider implements Provider<ItemMeanModel> {
    /**
     * A logger that you can use to emit debug messages.
     */
    private static final Logger logger = LoggerFactory.getLogger(DampedItemMeanModelProvider.class);

    /**
     * The data access object, to be used when computing the mean ratings.
     */
    private final DataAccessObject dao;
    /**
     * The damping factor.
     */
    private final double damping;

    /**
     * Constructor for the mean item score provider.
     *
     * <p>The {@code @Inject} annotation tells LensKit to use this constructor.
     *
     * @param dao The data access object (DAO), where the builder will get ratings. The {@code @Transient}
     *            annotation on this parameter means that the DAO will be used to build the model, but the
     *            model will <strong>not</strong> retain a reference to the DAO. This is standard procedure
     *            for LensKit models.
     * @param damping The damping factor for Bayesian damping. This is the number of fake global-mean ratings to
     *                assume. It is provided as a parameter so that it can be reconfigured. See the file
     *                {@code damped-mean.groovy} for how it is used.
     */
    @Inject
    public DampedItemMeanModelProvider(@Transient DataAccessObject dao,
                                       @MeanDamping double damping) {
        this.dao = dao;
        this.damping = damping;
    }

    /**
     * Construct an item mean model.
     *
     * <p>The {@link Provider#get()} method constructs whatever object the provider class is intended to build.</p>
     *
     * @return The item mean model with mean ratings for all items.
     */
    @Override
    public ItemMeanModel get() {
        // TODO Compute damped means
        // TODO Remove the line below when you have finished
        throw new UnsupportedOperationException("damped mean not implemented");
    }
}
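The constructor Javadoc describes `damping` as the number of fake global-mean ratings to assume. Read literally, this means: with global mean mu over all ratings, an item with ratings r_1, ..., r_n gets the damped mean (r_1 + ... + r_n + damping * mu) / (n + damping). With `damping = 0`, this reduces to the plain item mean. The sketch below is self-contained and works over parallel arrays of item IDs and rating values. `DampedMeans` is a hypothetical name. Plain `java.util` maps stand in for the fastutil `Long2DoubleOpenHashMap` and `Long2IntOpenHashMap` that the template already imports.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper computing Bayesian-damped item means. */
final class DampedMeans {
    static Map<Long, Double> compute(long[] items, double[] ratings, double damping) {
        if (items.length != ratings.length) {
            throw new IllegalArgumentException("items and ratings must have the same length");
        }
        Map<Long, Double> sums = new HashMap<>();
        Map<Long, Integer> counts = new HashMap<>();
        double total = 0.0;
        // one pass: per-item sums and counts, plus the global total
        for (int k = 0; k < items.length; k++) {
            sums.merge(items[k], ratings[k], Double::sum);
            counts.merge(items[k], 1, Integer::sum);
            total += ratings[k];
        }
        Map<Long, Double> means = new HashMap<>();
        if (items.length == 0) {
            return means;
        }
        double globalMean = total / items.length;
        // add `damping` pseudo-ratings at the global mean to each item
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            int n = counts.get(e.getKey());
            means.put(e.getKey(), (e.getValue() + damping * globalMean) / (n + damping));
        }
        return means;
    }
}
```

Damping pulls items with few ratings toward the global mean. Items with many ratings stay close to their own average.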
|
@ -0,0 +1,68 @@
package org.lenskit.mooc.nonpers.mean;

import com.google.common.base.Preconditions;
import it.unimi.dsi.fastutil.longs.Long2DoubleMap;
import it.unimi.dsi.fastutil.longs.LongSet;
import org.grouplens.grapht.annotation.DefaultProvider;
import org.lenskit.inject.Shareable;
import org.lenskit.util.collections.LongUtils;

import javax.annotation.concurrent.Immutable;
import java.io.Serializable;
import java.util.Map;

/**
 * A <em>model</em> class that stores item mean ratings.
 *
 * <p>The {@link Shareable} annotation is common for model objects, and tells LensKit that the class can be shared
 * between multiple recommender instances.</p>
 *
 * <p>The {@link DefaultProvider} annotation tells LensKit to use a <em>provider class</em>
 * ({@link ItemMeanModelProvider}) to create instances of this class.</p>
 *
 * <p>You <strong>should not</strong> need to make any changes to this class.</p>
 */
@Shareable
@Immutable
@DefaultProvider(ItemMeanModelProvider.class)
public class ItemMeanModel implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Long2DoubleMap itemMeans;

    /**
     * Construct a new item mean model.
     * @param means A map of item IDs to their mean ratings.
     */
    public ItemMeanModel(Map<Long, Double> means) {
        itemMeans = LongUtils.frozenMap(means);
    }

    /**
     * Get the set of items known by the model.
     * @return The set of items known by the model.
     */
    public LongSet getKnownItems() {
        return itemMeans.keySet();
    }

    /**
     * Query whether this model knows about an item.
     * @param item The item ID.
     * @return {@code true} if the item is known by the model, {@code false} otherwise.
     */
    public boolean hasItem(long item) {
        return itemMeans.containsKey(item);
    }

    /**
     * Get the mean rating for an item.
     * @param item The item ID.
     * @return The mean rating.
     * @throws IllegalArgumentException if the item is not a known item.
     */
    public double getMeanRating(long item) {
        Preconditions.checkArgument(hasItem(item), "unknown item " + item);
        return itemMeans.get(item);
    }
}
|
@ -0,0 +1,69 @@
package org.lenskit.mooc.nonpers.mean;

import it.unimi.dsi.fastutil.longs.Long2DoubleOpenHashMap;
import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap;
import org.lenskit.data.dao.DataAccessObject;
import org.lenskit.data.ratings.Rating;
import org.lenskit.inject.Transient;
import org.lenskit.util.io.ObjectStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.inject.Inject;
import javax.inject.Provider;

/**
 * Provider class that builds the item mean model, computing item means from the
 * ratings in the DAO.
 */
public class ItemMeanModelProvider implements Provider<ItemMeanModel> {
    /**
     * A logger that you can use to emit debug messages.
     */
    private static final Logger logger = LoggerFactory.getLogger(ItemMeanModelProvider.class);

    /**
     * The data access object, to be used when computing the mean ratings.
     */
    private final DataAccessObject dao;

    /**
     * Constructor for the mean item score provider.
     *
     * <p>The {@code @Inject} annotation tells LensKit to use this constructor.
     *
     * @param dao The data access object (DAO), where the builder will get ratings. The {@code @Transient}
     *            annotation on this parameter means that the DAO will be used to build the model, but the
     *            model will <strong>not</strong> retain a reference to the DAO. This is standard procedure
     *            for LensKit models.
     */
    @Inject
    public ItemMeanModelProvider(@Transient DataAccessObject dao) {
        this.dao = dao;
    }

    /**
     * Construct an item mean model.
     *
     * <p>The {@link Provider#get()} method constructs whatever object the provider class is intended to build.</p>
     *
     * @return The item mean model with mean ratings for all items.
     */
    @Override
    public ItemMeanModel get() {
        // TODO Set up data structures for computing means

        try (ObjectStream<Rating> ratings = dao.query(Rating.class).stream()) {
            for (Rating r: ratings) {
                // this loop will run once for each rating in the data set
                // TODO process this rating
            }
        }

        Long2DoubleOpenHashMap means = new Long2DoubleOpenHashMap();
        // TODO Finalize means to store them in the mean model

        logger.info("computed mean ratings for {} items", means.size());
        return new ItemMeanModel(means);
    }
}
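The three TODOs in `ItemMeanModelProvider.get()` (set up data structures, process each rating, finalize the means) map onto a running sum and count per item. Below is a minimal sketch of that accumulator. `MeanAccumulator` is a hypothetical name, and plain `HashMap`s replace the fastutil maps that the template imports. Inside the rating loop, you would call something like `acc.add(r.getItemId(), r.getValue())`. Those two `Rating` accessors are assumed here, since only `getUserId()` appears in this template. The resulting `Map<Long, Double>` fits the `ItemMeanModel(Map<Long, Double>)` constructor.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical accumulator for plain (undamped) item means. */
final class MeanAccumulator {
    // step 1: per-item running sum and rating count
    private final Map<Long, Double> sums = new HashMap<>();
    private final Map<Long, Integer> counts = new HashMap<>();

    // step 2: called once per rating in the stream
    void add(long item, double rating) {
        sums.merge(item, rating, Double::sum);
        counts.merge(item, 1, Integer::sum);
    }

    // step 3: divide each item's sum by its count
    Map<Long, Double> finish() {
        Map<Long, Double> means = new HashMap<>();
        for (Map.Entry<Long, Double> e : sums.entrySet()) {
            means.put(e.getKey(), e.getValue() / counts.get(e.getKey()));
        }
        return means;
    }
}
```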
|
@ -0,0 +1,92 @@
package org.lenskit.mooc.nonpers.mean;

import it.unimi.dsi.fastutil.longs.LongSet;
import org.lenskit.api.Result;
import org.lenskit.api.ResultList;
import org.lenskit.basic.AbstractItemBasedItemRecommender;
import org.lenskit.results.Results;
import org.lenskit.util.collections.LongUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.annotation.Nullable;
import javax.inject.Inject;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * An item-based item recommender that ranks items by their mean rating.
 */
public class MeanItemBasedItemRecommender extends AbstractItemBasedItemRecommender {
    private static final Logger logger = LoggerFactory.getLogger(MeanItemBasedItemRecommender.class);

    private final ItemMeanModel model;

    /**
     * Construct a mean global item scorer.
     *
     * <p>The {@code @Inject} annotation tells LensKit to use this constructor.</p>
     *
     * @param m The model containing item mean ratings. LensKit will automatically build an {@link ItemMeanModel}
     *          object. Its use as a parameter type in this constructor declares it as a <em>dependency</em> of the
     *          mean-based item scorer.
     */
    @Inject
    public MeanItemBasedItemRecommender(ItemMeanModel m) {
        model = m;
    }

    /**
     * {@inheritDoc}
     *
     * This is the LensKit recommend method. It takes several parameters; we implement it for you in terms of a
     * simpler method ({@link #recommendItems(int, LongSet)}).
     */
    @Override
    public ResultList recommendRelatedItemsWithDetails(Set<Long> basket, int n, @Nullable Set<Long> candidates, @Nullable Set<Long> exclude) {
        LongSet items;
        if (candidates == null) {
            items = model.getKnownItems();
        } else {
            items = LongUtils.asLongSet(candidates);
        }

        if (exclude != null) {
            items = LongUtils.setDifference(items, LongUtils.asLongSet(exclude));
        }

        logger.info("computing {} recommendations from {} items", n, items.size());

        return recommendItems(n, items);
    }

    /**
     * Recommend some items from a set of candidate items.
     *
     * <p>Your code needs to obtain the mean rating, if one is available, for each item, and return a list of the
     * {@code n} highest-rated items, in decreasing order of score.</p>
     *
     * <p>To create the {@link ResultList} data structure, do the following:</p>
     *
     * <ol>
     * <li>Create a {@link List} to hold {@link Result} objects.</li>
     * <li>Create a result object for each item that can be scored. Use {@link Results#create(long, double)} to
     * create the result object. If an item cannot be scored (because there is no mean available), ignore it and
     * do not add a result to the list.</li>
     * <li>Convert the list of results to a {@link ResultList} using {@link Results#newResultList(List)}.</li>
     * </ol>
     *
     * @param n The number of items to recommend. If this is negative, then recommend all possible items.
     * @param items The items to score.
     * @return A {@link ResultList} containing the scores.
     */
    private ResultList recommendItems(int n, LongSet items) {
        List<Result> results = new ArrayList<>();

        // TODO Find the top N items by mean rating

        return Results.newResultList(results);
    }
}
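The `recommendItems` Javadoc asks for the `n` highest-rated items in decreasing order. It also asks the method to skip items with no mean and to return every scorable item when `n` is negative. Below is a minimal sketch of that selection. It assumes a plain `Map<Long, Double>` of means in place of `ItemMeanModel`, whose `hasItem` and `getMeanRating` methods would play the same role. It returns bare item IDs instead of `Result` objects. `TopNByMean` is a hypothetical name. Instead of sorting every candidate, it keeps a bounded min-heap (`java.util.PriorityQueue`) of the best `n` items seen so far. For m candidates and a non-negative `n`, this costs O(m log n).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical top-N selector by mean rating; not part of LensKit. */
final class TopNByMean {
    /** Up to n candidate IDs, highest mean first; a negative n keeps every scorable item. */
    static List<Long> topN(Map<Long, Double> means, Set<Long> candidates, int n) {
        // min-heap ordered by mean: the head is the weakest of the items kept so far
        PriorityQueue<Map.Entry<Long, Double>> heap =
                new PriorityQueue<>(Map.Entry.<Long, Double>comparingByValue());
        for (Long item : candidates) {
            Double mean = means.get(item);
            if (mean == null) {
                continue; // no mean available: the item cannot be scored, so skip it
            }
            heap.add(new AbstractMap.SimpleImmutableEntry<>(item, mean));
            if (n >= 0 && heap.size() > n) {
                heap.poll(); // evict the lowest mean so at most n items remain
            }
        }
        List<Long> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // polled in ascending order of mean
        }
        Collections.reverse(result); // flip to decreasing order
        return result;
    }
}
```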