390 lines
14 KiB
Markdown
390 lines
14 KiB
Markdown
# Homework 2: Working with higher order functions.
|
|
|
|
*CSci 2041: Advanced Programming Principles, Spring 2017*
|
|
|
|
**Due:** Friday, February 17 at 5:00pm
|
|
|
|
Lab 4 on February 7 and Lab 5 on February 14 will be dedicated to
|
|
answering questions about this assignment and to providing
|
|
clarifications if any are needed.
|
|
|
|
Note that for this assignment you are not to write **any**
|
|
recursive functions. Further information on this restriction is
|
|
detailed in Part 3 of the assignment.
|
|
|
|
## Corrections to mistakes in original specification
|
|
+ The type of ``convert_to_non_blank_lines_of_words`` shoud be ``char list -> line list`` not ``string -> line list``.
|
|
|
|
## Introduction - The Paradelle
|
|
|
|
In this homework assignment you will write an OCaml program that
|
|
reads a text file and reports if it contains a poem that fits the
|
|
"fixed-form" style known as a *paradelle*.
|
|
|
|
Below is a sample paradelle called "Paradelle for Susan" by Billy
|
|
Collins from his book *Picnic, Lightning*.
|
|
|
|
|
|
> I remember the quick, nervous bird of your love. <br/>
|
|
> I remember the quick, nervous bird of your love. <br/>
|
|
> Always perched on the thinnest, highest branch. <br/>
|
|
> Always perched on the thinnest, highest branch. <br/>
|
|
> Thinnest of love, remember the quick branch. <br/>
|
|
> Always nervous, I perched on your highest bird the.
|
|
>
|
|
> It is time for me to cross the mountain. <br/>
|
|
> It is time for me to cross the mountain. <br/>
|
|
> And find another shore to darken with my pain. <br/>
|
|
> And find another shore to darken with my pain. <br/>
|
|
> Another pain for me to darken the mountain. <br/>
|
|
> And find the time, cross my shore, to with it is to.
|
|
>
|
|
> The weather warm, the handwriting familiar. <br/>
|
|
> The weather warm, the handwriting familiar. <br/>
|
|
> Your letter flies from my hand into the waters below. <br/>
|
|
> Your letter flies from my hand into the waters below. <br/>
|
|
> The familiar waters below my warm hand. <br/>
|
|
> Into handwriting your weather flies your letter the from the.
|
|
>
|
|
> I always cross the highest letter, the thinnest bird. <br/>
|
|
> Below the waters of my warm familiar pain, <br/>
|
|
> Another hand to remember your handwriting. <br/>
|
|
> The weather perched for me on the shore. <br/>
|
|
> Quick, your nervous branch flies for love. <br/>
|
|
> Darken the mountain, time and find my into it with from to to is.
|
|
|
|
|
|
Following this poem, Collins provides the following description of this form:
|
|
|
|
> The paradelle is one of the more demanding French fixed forms, first
|
|
appearing in the *langue d'oc* love poetry of the eleventh century. It
|
|
is a poem of four six-line stanzas in which the first and second lines,
|
|
as well as the third and fourth lines of the first three stanzas, must
|
|
be identical. The fifth and sixth lines, which traditionally resolve
|
|
these stanzas, must use *all* the words from the preceding
|
|
lines and *only* those words. Similarly, the final stanza must
|
|
use *every* word from *all* the preceding stanzas and
|
|
*only* those words.
|
|
|
|
Collins is actually being satirical here and poking fun at overly
|
|
rigid fixed-form styles of poetry. There is actually no form known as
|
|
the *paradelle*. This did not stop people from going off and trying
|
|
to write their own however. In fact, the above poem is slightly
|
|
modified from his original so that it actually conforms to the rules
|
|
of a paradelle.
|
|
|
|
|
|
To write an OCaml program to detect if a text file contains a paradelle
|
|
we add some more specific requirements to Collin's description above.
|
|
You should take these into consideration when completing this
|
|
assignment:
|
|
|
|
+ Blank lines are allowed, but we will assume that blank lines
|
|
consist of only a single newline ``'\n'`` character.
|
|
|
|
+ Punctuation and spacing (tabs and the space characters) should
|
|
not affect the comparison of lines in a stanza. For example, the
|
|
following two lines would be considered as "identical" because the
|
|
same words are used in the same order even though spacing and
|
|
punctuation are different.
|
|
|
|
``"And find the time,cross my shore, to with it is to"``
|
|
|
|
``"And find the time , cross my shore, to with it is to ."``
|
|
|
|
Thus, we will want to ignore punctuation symbols to some extent,
|
|
being careful to notice that they can separate words as in ``"time,cross"``.
|
|
|
|
Specifically, the punctuation we will
|
|
consider are the following :
|
|
|
|
``. ! ? , ; : -``
|
|
|
|
Other punctuation symbols will not be used in any input to assess
|
|
your program.
|
|
|
|
+ Also, we will need to split lines in the file (of Ocaml type
|
|
``string``) into a list of lines
|
|
and then split each line individual line into a list of
|
|
words. In the list of words there
|
|
should be no spaces, tabs, or punctuation symbols. Then we can
|
|
compare lists of words.
|
|
|
|
+ Capitalization does not matter. The words ``"Thinnest"``
|
|
and "``thinnest"`` are to be considered as the same.
|
|
|
|
|
|
+ In checking criteria for an individual stanza, each instance of
|
|
a word is counted. But in checking that the final stanza uses all
|
|
the words of the first 3, duplicate words should be removed.
|
|
|
|
That is, in checking that two lines "use the same words" we must
|
|
check that each word is used the same number of times in each line.
|
|
|
|
In checking that the final stanza uses all (and only) words from the
|
|
first 3 stanza, we do not care about how many times a word is
|
|
used. So if a word is used 4 times in the first 3 stanzas, it need
|
|
not be used 4 times in the final stanza.
|
|
|
|
|
|
+ Your program must return a correct answer for any text file.
|
|
For example, your program should report that an empty file or a file
|
|
containing a single character or the source code for this assignment
|
|
are not in the form of a paradelle.
|
|
|
|
|
|
|
|
## Getting started
|
|
|
|
Copy the contexts of the ``Homework/Hwk_02`` directory from the public
|
|
class repository into a ``Hwk_02`` directory in your individual
|
|
repository.
|
|
|
|
This file ``hwk_02.ml`` contains some helper functions that we'll use
|
|
in this assignment. The remainder are sample files containing
|
|
paradelles or text that is not a paradelle. The file names should
|
|
make this all clear.
|
|
|
|
|
|
|
|
## Part 1. Some useful functions.
|
|
|
|
Your first step is to define these functions that will be useful in
|
|
solving the paradelle check. Place this near the top of the
|
|
``hwk_02.ml`` file, just after the comment that says
|
|
```
|
|
(* Place part 1 functions 'take', 'drop', 'length', 'rev',
|
|
'is_elem_by', 'is_elem', 'dedup', and 'split_by' here. *)
|
|
```
|
|
|
|
### a length function, ``length``
|
|
|
|
Write a function, named ``length`` that, as you would expect, takes a
|
|
list and returns its length as a value of type ``int``
|
|
|
|
Annotate your function with types or add a comment
|
|
indicating the type of the function.
|
|
|
|
|
|
### list reverse ``rev``
|
|
|
|
Complete the definition of the reverse function ``rev`` in
|
|
``hwk_02.ml``. Currently is just raises an exception. Remove this
|
|
and replace the body with an expression that uses List.fold_left
|
|
or List.fold_right to do the work of reversing the list.
|
|
|
|
### list membership ``is_elem_by`` and ``is_elem``
|
|
|
|
Define a function ``is_elem_by`` which has the type
|
|
```
|
|
('a -> 'b -> bool) -> 'b -> 'a list -> bool
|
|
```
|
|
The first argument is a function to check if an element in the list
|
|
(the third argument) matches the values of the second argument. It
|
|
will return ``true`` if any element in the list "matches" (based on
|
|
what the first argument determines) an element in the list.
|
|
|
|
For example, both
|
|
```
|
|
is_elem_by (=) 4 [1; 2; 3; 4; 5; 6; 7]
|
|
```
|
|
and
|
|
```is_elem_by (fun c i -> Char.code c = i) 99 ['a'; 'b'; 'c'; 'd']``
|
|
evaluate to true.
|
|
|
|
Next, define a function ``is_elem`` whose first argument is a value and second
|
|
argument is a list of values of the same type. The function returns
|
|
``true`` if the value is in the list.
|
|
|
|
For example, ``is_elem 4 [1; 2; 3; 4; 5; 6; 7]`` should evaluate to
|
|
``true`` while ``is_elem 4 [1; 2; 3; 5; 6; 7]`` and ``is_elem 4 [ ]``
|
|
should both evaluate to ``false``.
|
|
|
|
``is_elem`` should be be implemented by calling ``is_elem_by``.
|
|
|
|
Annotate both of your functions with type information on the arguments
|
|
and for the result type.
|
|
|
|
|
|
### removing duplicates from a list, ``dedup``
|
|
|
|
Write a function named ``dedup`` that takes a list and removes all
|
|
duplicates from the list. The order of list elements returned is up
|
|
to you. This can be done with only a call to ``List.fold_right``,
|
|
providing you pass it the correct function that can be used to fold a
|
|
list up into one without any duplicate elements.
|
|
|
|
|
|
### a splitting function, ``split_by``
|
|
|
|
Write a splitting function named ``split_by`` that takes three arguments
|
|
|
|
1. an equality checking function that takes two values
|
|
and returns a value of type ``bool``,
|
|
|
|
2. a list of values that are to be separated,
|
|
|
|
3. and a list of separators values.
|
|
|
|
|
|
This function will split the second list into a list of lists. If the
|
|
checking function indicates that an element of the first list
|
|
(the second argument) is an element of the second list (the third
|
|
argument) then that element indicates that the list should be split at
|
|
that point. Note that this "splitting element" does not appear
|
|
in any list in the output list of lists.
|
|
|
|
For example,
|
|
+ ``split_by (=) [1;2;3;4;5;6;7;8;9;10;11] [3;7]`` should evaluate to
|
|
``[ [1;2]; [4;5;6]; [8;9;10;11] ]`` and
|
|
+ ``split_by (=) [1;2;3;3;3;4;5;6;7;7;7;8;9;10;11] [3;7]`` should
|
|
evaluate to ``[[1; 2]; []; []; [4; 5; 6]; []; []; [8; 9; 10; 11]]``.
|
|
|
|
Note the empty lists. These are the list that occur between the 3's
|
|
and 7's.
|
|
|
|
+ ``split_by (=) ["A"; "B"; "C"; "D"] ["E"]`` should evaluate to
|
|
``[["A"; "B"; "C"; "D"]]``
|
|
|
|
Annotate your function with types.
|
|
|
|
Also add a comment explaining the behavior of your function and its
|
|
type. Try to write this function so that the type is as general as
|
|
possible.
|
|
|
|
|
|
## Reading file contents.
|
|
|
|
Notice the provide helper functions ``read_chars`` and ``read_file``.
|
|
The second will read a file and return the list of characters, wrapped
|
|
up in an ``option`` type if it finds the file. If the file, with the
|
|
name passed to the function, can't be found, it will return ``None``.
|
|
|
|
|
|
|
|
## Part 2. Preparing text for the paradelle check.
|
|
|
|
The poems that we aim to check are stored as values of type ``string``
|
|
in text files. But the ``read_file`` function above will return this
|
|
data in a value of type ``char list option``.
|
|
|
|
We will need to break the input into a list of lines of text, removing
|
|
the blank lines, and also splitting the lines of text into lists of
|
|
words.
|
|
|
|
We need to write a function called
|
|
``convert_to_non_blank_lines_of_words`` that takes as input the poem
|
|
as an OCaml ``char list`` and returns a list of lines, where each line is
|
|
a list of words, and each word is a list of characters.
|
|
|
|
Thus, ``convert_to_non_blank_lines_of_words`` can be seen as having
|
|
the type ``char list -> char list list list``.
|
|
|
|
We can use the type system to name new types that make this type
|
|
easier to read.
|
|
|
|
First define the type ``word`` to be ``char list`` by
|
|
```
|
|
type word = char list
|
|
```
|
|
Then define a ``line`` type to be a ``word list``.
|
|
|
|
Then, we can specify that
|
|
``convert_to_non_blank_lines_of_words`` has
|
|
the type ``char list -> line list``.
|
|
|
|
In writing ``convert_to_non_blank_lines_of_words`` you may want to
|
|
consider a helper function that breaks up a ``char
|
|
list`` into lines, separated by new line characters (``'\n'``) and
|
|
another that breaks up lines into lists of words.
|
|
|
|
|
|
At this point you are not required to directly address the problems
|
|
relating to capitalization of letters which we eventually need to
|
|
address in checking that the same words appear in various parts of the
|
|
poem. You are also not required to deal with issues of punctuation,
|
|
but you may need to do something the be sure that words are correctly
|
|
separated. For example, we would want to see ``that,barn`` as two
|
|
words.
|
|
|
|
|
|
## Part 3. The paradelle check.
|
|
|
|
We will now need to consider how punctuation is to be handled, how
|
|
words are to be compared and, in the comparisons of lines, when
|
|
duplicate words should be dropped and when they should not be.
|
|
|
|
We can now begin to write the function to check that a poem is a
|
|
"paradelle".
|
|
|
|
To do this, write a function named ``paradelle`` that takes as input a
|
|
filename (a ``string``) of a file containing a potential paradelle.
|
|
This function then returns a value of the following type:
|
|
```
|
|
type result = OK
|
|
| FileNotFound of string
|
|
| IncorrectNumLines of int
|
|
| IncorrectLines of (int * int) list
|
|
| IncorrectLastStanza
|
|
```
|
|
This type describes the possible outcomes of the analysis. For example,
|
|
1. ``OK``- The file contains a paradelle.
|
|
1. ``FileNotFound "test.txt"`` - The file ``test.txt`` was not found.
|
|
1. ``IncorrectNumLines 18`` - The file contained 18 lines after the
|
|
blank lines were removed. A paradelle must have 24 lines.
|
|
1. ``IncorrectLines [ (1,2); (11,12) ]`` - Lines 1 and 2 are not the
|
|
same and thus this is not a paradelle. Also lines 11 and 12, in the
|
|
second stanza, do not have the same words as in the first 4 lines
|
|
of that stanza, and
|
|
this is another reason why this one is not a paradelle.
|
|
1. ``IncorrectLastStanza`` - the last stanza does not properly contain
|
|
the words from the first three stanzas.
|
|
|
|
|
|
**Remember, you are not to write any recursive functions.** Only
|
|
``read_chars``, ``take``, and ``drop`` can be used.
|
|
|
|
|
|
Furthermore, below is a list of functions from various OCaml modules
|
|
that you may also use. Functions not in this list may not be used.
|
|
(Except for functions such as ``input_char`` in functions that were
|
|
given to you.)
|
|
+ List.map, List.filter, List.fold_left, List.fold_right
|
|
+ List.sort, List.concat,
|
|
+ Char.lowercase, Char.uppercase
|
|
+ string_of_int
|
|
|
|
The ``sort`` function takes comparison functions as its first argument.
|
|
We saw how such functions are written and used in lecture.
|
|
|
|
These restrictions are in place so that you can see how interesting
|
|
computations can be specified using the common idioms of mapping,
|
|
filtering, and folding lists. The goal of this assignment is not
|
|
simply to get the paradelle checker to work, but to get it to work and
|
|
for you to understand how these higher order functions can be used.
|
|
|
|
|
|
## Some advice.
|
|
You will want to get started on this assignment sooner rather than
|
|
later. There are many aspects that you need to think about. Most
|
|
importantly is the structure of your program the various helper
|
|
functions that you may want to use.
|
|
|
|
We recommend writing your helper functions at the "top level" instead
|
|
of nested in a ``let`` expression so that you can inspect the type
|
|
inferred for them by OCaml and also run them on sample input to check
|
|
that they are correct.
|
|
|
|
|
|
## Feedback tests.
|
|
|
|
Feedback tests are not initially turned on. You should read these
|
|
specifications and make an effort to understand them based on the
|
|
descriptions.
|
|
|
|
If you have questions, ask your TAs in lab or post them to the "Hwk
|
|
02" forum on Moodle.
|
|
|
|
Feedback tests will be available next week.
|
|
|
|
|