coursera-recommender-systems/nonpers-assignment/README.md
2021-09-06 16:04:46 -05:00

216 lines
9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Non-Personalized Recommender Assignment
In this assignment, you will implement some non-personalized recommenders. In particular, you will
implement raw and damped item mean recommenders and simple and advanced association rule
recommenders.
You will implement these recommenders in the LensKit toolkit.
## Downloads and Resources
- Project template (from Coursera)
- [LensKit for Teaching website](http://mooc.lenskit.org) (links to relevant documentation)
- [JavaDoc for included code](http://mooc.lenskit.org/assignments/nonpers/javadoc/)
- [Fastutil API docs](http://fastutil.di.unimi.it/docs/) documents the Fastutil optimized data
structure classes that are used in portions of LensKit.
The project template contains support code, the build file, and the input data that you will use.
## Input Data
The input data contains the following files:
- `ratings.csv` contains user ratings of movies
- `movies.csv` contains movie titles
- `movielens.yml` is a LensKit data manifest that describes the other input files
## Getting Started
To get started with this assignment, unpack the template and import it in to your IDE as a Gradle
project. The assignment video demonstrates how to do this in IntelliJ IDEA.
## Mean-Based Recommendation
The first two recommenders you will implement will recommend items with the highest average rating.
With LensKit's scorer-model-builder architecture, you will just need to write the recommendation
logic once, and you will implement two different mechanisms for computing item mean ratings.
You will work with the following classes:
- `MeanItemBasedItemRecommender` (the *item recommender*) computes top-*N* recommendations based
on mean ratings. You will implement the logic to compute such recommendation lists.
- `ItemMeanModel` is a *model class* that stores precomputed item means. You will not need to
modify this class, but you will write code to construct instances of it and use it in your
item recommender implementation.
- `ItemMeanModelProvider` computes item mean ratings from rating data and constructs the model.
It computes raw means with no damping.
- `DampedItemMeanModelProvider` is an alternate builder for item mean models that computes
damped means instead of raw means. It takes the damping term as a parameter. The configuration
file we provide you uses a damping term of 5.
There are `// TODO` comments in all places where you need to write new code.
### Computing Item Means
Modify the `ItemMeanModelProvider` class to compute the mean rating for each item.
### Recommending Items
Modify the `MeanItemBasedItemRecommender` class to compute recommendations based on item mean
ratings. For this, you need to:
1. Obtain the mean rating for each item
2. Order the items in decreasing order
3. Return the *N* highest-rated items
### Computing Damped Item Means
Modify the `DampedItemMeanModelProvider` class to compute the damped mean rating for each item.
This formula uses a damping factor $\alpha$, which is the number of 'fake' ratings at the global
mean to assume for each item. In the Java code, this is available as the field `damping`.
The damped mean formula, as you may recall, is:
$$s(i) = \frac{\sum_{u \in U_i} r_{ui} + \alpha\mu}{|U_i| + \alpha}$$
where $\mu$ is the *global* mean rating.
### Example Outputs
To help you see if your output is correct, we have provided the following example correct values:
| ID | Title | Mean | Damped Mean |
| :-: | :---- | :--: | :---------: |
| 2959 | *Fight Club* | 4.259 | 4.252 |
| 1203 | *12 Angry Men* | 4.246 | 4.227 |
## Association Rules
In the second part of the assignment, you will implement two versions of an association rule
recommender.
The association rule implementation consists of the following code:
- `AssociationItemBasedItemRecommender` recommends items using association rules. Unlike the mean
recommenders, this recommender uses a *reference item* to compute the recommendations.
- `AssociationModel` stores the association rule scores between pairs of items. You will not need
to modify this class.
- `BasicAssociationModelProvider` computes an association rule model using the basic association
rule formula ($P(X \wedge Y) / P(X)$).
- `LiftAssociationModelProvider` computes an association rule model using the lift formula ($P(X \wedge Y) / P(X) P(Y)$).
### Computing Association Scores
Like with the mean-based recommender, we pre-compute product association scores and store them in
a model before recommendation. We compute the scores between *all pairs* of items, so that the
model can be used to score any item. When computing a single recommendation from the command line,
this does not provide much benefit, but is useful in the general case so that the model can be used
to very quickly compute many recommendations.
The `BasicAssociationModelProvider` class computes the association rule scores using the following
formula:
$$P(i|j) = \frac{P(i \wedge j)}{P(j))} = \frac{|U_i \cap U_j|/|U|}{|U_j|/|U|}$$
In this case, $j$ is the *reference* item and $i$ is the item to be scored.
We estimate probabilities by counting: $P(i)$ is the fraction of users in the system
who purchased item $i$; $P(i \wedge j)$ is the fraction that purchased both $i$ and $j$.
**Implement the association rule computation in this class.**
### Computing Recommendations
Implement the recommendation logic in `AssociationItemBasedItemRecommender` to recommend items
related to a given reference item. As with the mean recommender, it should compute the top *N*
recommendations and return them.
### Computing Advanced Association Rules
The `LiftAssociationModelProvider` recommender uses the *lift* metric that computes how
much more likely someone is to rate a movie $i$ when they have rated $j$ than they would have if we do not know anything about whether they have rated $j$:
$$s(i|j) = \frac{P(j \wedge i)}{P(i) P(j)}$$
### Example Outputs
Following is the correct output for the basic association rules with reference item 260 (*Star Wars*), as generated with `./gradlew runBasicAssoc -PreferenceItemm=260`:
2571 (Matrix, The (1999)): 0.916
1196 (Star Wars: Episode V - The Empire Strikes Back (1980)): 0.899
4993 (Lord of the Rings: The Fellowship of the Ring, The (2001)): 0.892
1210 (Star Wars: Episode VI - Return of the Jedi (1983)): 0.847
356 (Forrest Gump (1994)): 0.843
5952 (Lord of the Rings: The Two Towers, The (2002)): 0.841
7153 (Lord of the Rings: The Return of the King, The (2003)): 0.830
296 (Pulp Fiction (1994)): 0.828
1198 (Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)): 0.791
480 (Jurassic Park (1993)): 0.789
And lift-based association rules for item 2761 (*The Iron Giant*):
631 (All Dogs Go to Heaven 2 (1996)): 4.898
2532 (Conquest of the Planet of the Apes (1972)): 4.810
3615 (Dinosaur (2000)): 4.546
1649 (Fast, Cheap & Out of Control (1997)): 4.490
340 (War, The (1994)): 4.490
1016 (Shaggy Dog, The (1959)): 4.490
2439 (Affliction (1997)): 4.490
332 (Village of the Damned (1995)): 4.377
2736 (Brighton Beach Memoirs (1986)): 4.329
3213 (Batman: Mask of the Phantasm (1993)): 4.317
## Running your code
The Gradle build file we have provided is set up to automatically run all four of your recommenders.
The following Gradle targets will do this:
- `runMean` runs the raw mean recommender
- `runDampedMean` runs the damped mean recommender
- `runBasicAssoc` runs the basic association rule recommender
- `runLiftAssoc` runs the advanced (lift-based) association rule recommender
You can run these using the IntelliJ Gradle runner (open the Gradle panel, browse the tree to find
a task, and double-click it), or from the command line:
./gradlew runMean
The association rule recommenders can also take the reference item ID on the command line as a
`referenceItem` parameter. For example:
./gradlew runLiftAssoc -PreferenceItem=1
The IntelliJ Run Configuration dialog will allow you to specify additional script parameters to
the Gradle invocation.
### Debugging
If you run the Gradle tasks using IntelliJ's Gradle runner, you can run them under the debugger to debug your code.
The Gradle file also configures LensKit to write log output to log files under the `build`
directory. If you use the SLF4J logger (the `logger` field on the classes we provide) to emit debug
messages, you can find them there when you run one of the recommender tasks such as `runDampedMean`.
## Submitting
You will submit a compiled `jar` file containing your solution. To prepare your project for
submission, run the Gradle `prepareSubmission` task:
./gradlew prepareSubmission
This will create file `nonpers-submission.jar` under `build/distributions` that contains your final
solution code in a format the grader will understand. Upload this `jar` file to the Coursera
assignment grader.
## Grading
Your grade for each part will be based on two components:
- Outputting items in the correct order: 75%
- Computing correct scores for items (within an error tolerance): 25%
The parts themselves are weighted equally.