https://github.com/andreacrotti/code-kata

My solutions to the code katas from Dave Thomas.

* Code Kata One - Supermarket Pricing
** Problem
This kata arose from some discussions we’ve been having at the DFW
Practitioners meetings. The problem domain is something seemingly
simple: pricing goods at supermarkets.

Some things in supermarkets have simple prices: this can of beans
costs $0.65. Other things have more complex prices. For example:

- three for a dollar (so what’s the price if I buy 4, or 5?)
- $1.99/pound (so what does 4 ounces cost?)
- buy two, get one free (so does the third item have a price?)

This kata involves no coding. The exercise is to experiment with
various models for representing money and prices that are flexible
enough to deal with these (and other) pricing schemes, and at the
same time are generally usable (at the checkout, for stock
management, order entry, and so on). Spend time considering issues
such as:

- does fractional money exist?
- when (if ever) does rounding take place?
- how do you keep an audit trail of pricing decisions (and do you need to)?
- are costs and prices the same class of thing?
- if a shelf of 100 cans is priced using "buy two, get one free", how do you value the stock?

This is an ideal shower-time kata, but be careful. Some of the
problems are more subtle than they first appear. I suggest that it
might take a couple of weeks' worth of showers to exhaust the main
alternatives.

*** Goal

The goal of this kata is to practice a looser style of experimental
modelling. Look for as many different ways of handling the issues as
possible. Consider the various tradeoffs of each. What techniques
work best for exploring these models? For recording them? How can you
validate that a model is reasonable?

*** What’s a Code Kata?

As a group, software developers don’t practice enough. Most of our
learning takes place on the job, which means that most of our
mistakes get made there as well. Other creative professions
practice: artists carry a sketchpad, musicians play technical
pieces, poets constantly rewrite works. In karate, where the aim is
to learn to spar or fight, most of a student’s time is spent
learning and refining basic moves. The more formal of these
exercises are called kata.

To help developers get the same benefits from practicing, we’re
putting together a series of code kata: simple, artificial exercises
which let us experiment and learn without the pressure of a
production environment. Our suggestions for doing the kata are:

- find a place and time where you won’t be interrupted
- focus on the essential elements of the kata
- remember to look for feedback for every major decision
- if it helps, keep a journal of your progress
- have discussion groups with other developers, but try to have completed the kata first

There are no right or wrong answers in these kata: the benefit comes
from the process, not from the result.

** Answer

* Karate chop
** Problem
A binary chop (sometimes called the more prosaic binary search)
finds the position of value in a sorted array of values. It achieves
some efficiency by halving the number of items under consideration
each time it probes the values: in the first pass it determines
whether the required value is in the top or the bottom half of the
list of values. In the second pass it considers only this half,
again dividing it into two. It stops when it finds the value it is
looking for, or when it runs out of array to search. Binary searches
are a favorite of CS lecturers.

This Kata is straightforward. Implement a binary search routine
(using the specification below) in the language and technique of
your choice. Tomorrow, implement it again, using a totally different
technique. Do the same the next day, until you have five totally
unique implementations of a binary chop. (For example, one solution
might be the traditional iterative approach, one might be recursive,
one might use a functional style passing array slices around, and so
on).

*** Goals

This Kata has three separate goals:

1. As you’re coding each algorithm, keep a note of the kinds of error
   you encounter. A binary search is a ripe breeding ground for "off by
   one" and fencepost errors. As you progress through the week, see if
   the frequency of these errors decreases (that is, do you learn from
   experience in one technique when it comes to coding with a different
   technique?).
2. What can you say about the relative merits of the various techniques
   you’ve chosen? Which is the most likely to make it into production
   code? Which was the most fun to write? Which was the hardest to get
   working? And for all these questions, ask yourself "why?".
3. It’s fairly hard to come up with five unique approaches to a binary
   chop. How did you go about coming up with approaches four and five?
   What techniques did you use to fire those "off the wall" neurons?

*** Specification

Write a binary chop method that takes an integer search target and a
sorted array of integers. It should return the integer index of the
target in the array, or -1 if the target is not in the array. The
signature will logically be:

chop(int, array_of_int) -> int

You can assume that the array has fewer than 100,000 elements. For
the purposes of this Kata, time and memory performance are not
issues (assuming the chop terminates before you get bored and kill
it, and that you have enough RAM to run it).

*** Test Data

Here is the Test::Unit code I used when developing my methods. Feel
free to add to it. The tests assume that array indices start at
zero. You’ll probably have to do a couple of global
search-and-replaces to make this compile in your language of choice
(unless your enlightened choice happens to be Ruby).

#+begin_src ruby
def test_chop
  assert_equal(-1, chop(3, []))
  assert_equal(-1, chop(3, [1]))
  assert_equal(0,  chop(1, [1]))
  #
  assert_equal(0,  chop(1, [1, 3, 5]))
  assert_equal(1,  chop(3, [1, 3, 5]))
  assert_equal(2,  chop(5, [1, 3, 5]))
  assert_equal(-1, chop(0, [1, 3, 5]))
  assert_equal(-1, chop(2, [1, 3, 5]))
  assert_equal(-1, chop(4, [1, 3, 5]))
  assert_equal(-1, chop(6, [1, 3, 5]))
  #
  assert_equal(0,  chop(1, [1, 3, 5, 7]))
  assert_equal(1,  chop(3, [1, 3, 5, 7]))
  assert_equal(2,  chop(5, [1, 3, 5, 7]))
  assert_equal(3,  chop(7, [1, 3, 5, 7]))
  assert_equal(-1, chop(0, [1, 3, 5, 7]))
  assert_equal(-1, chop(2, [1, 3, 5, 7]))
  assert_equal(-1, chop(4, [1, 3, 5, 7]))
  assert_equal(-1, chop(6, [1, 3, 5, 7]))
  assert_equal(-1, chop(8, [1, 3, 5, 7]))
end
#+end_src
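
A minimal iterative sketch in Python of one possible first attempt; the
function name and the asserts below simply mirror the Ruby test data
above and are not the repository's actual solution:

#+begin_src python
def chop(target, values):
    """Iterative binary chop: return the index of target in the sorted
    list values, or -1 if it is not present."""
    low, high = 0, len(values) - 1
    while low <= high:
        mid = (low + high) // 2
        if values[mid] == target:
            return mid
        elif values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

# A few checks mirroring the Test::Unit data.
assert chop(3, []) == -1
assert chop(1, [1]) == 0
assert chop(3, [1, 3, 5]) == 1
assert chop(6, [1, 3, 5, 7]) == -1
#+end_src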

* Kata Three: How Big, How Fast?
** Problem

Rough estimation is a useful talent to possess. As you’re coding away,
you may suddenly need to work out approximately how big a data
structure will be, or how fast some loop will run. The faster you can
do this, the less the coding flow will be disturbed.

So this is a simple kata: a series of questions, each asking for a
rough answer. Try to work each out in your head. I’ll post my answers
(and how I got them) in a week or so.

*** How Big?

- Roughly how many binary digits (bits) are required for the unsigned
  representation of:
  - 1,000
  - 1,000,000
  - 1,000,000,000
  - 1,000,000,000,000
  - 8,000,000,000,000
- My town has approximately 20,000 residences. How much space is
  required to store the names, addresses, and a phone number for all
  of these (if we store them as characters)?
- I’m storing 1,000,000 integers in a binary tree. Roughly how many
  nodes and levels can I expect the tree to have? Roughly how much
  space will it occupy on a 32-bit architecture?
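
If you want to check your mental arithmetic afterwards, here is a rough
sketch; the bytes-per-field and bytes-per-node figures are assumptions
spelled out in the comments, not part of the original kata:

#+begin_src python
# Bits needed for an unsigned representation of each number.
for n in [1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000, 8_000_000_000_000]:
    print(f"{n:>16,} -> {n.bit_length()} bits")

# 20,000 residences; assume roughly 16 characters for a name, 60 for an
# address and 12 for a phone number, stored one byte per character.
per_residence = 16 + 60 + 12
print("phone book:", 20_000 * per_residence, "bytes")  # ~1.8 MB

# A reasonably balanced binary tree with 1,000,000 nodes has about
# log2(1e6), i.e. roughly 20, levels; assume each node holds a value
# plus two 32-bit pointers, about 12 bytes.
print("tree size:", 1_000_000 * 12, "bytes")  # ~12 MB
#+end_src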

*** How Fast?

- My copy of Meyer’s Object Oriented Software Construction has about
  1,200 body pages. Assuming no flow control or protocol overhead,
  about how long would it take to send it over an async 56k baud
  modem line?
- My binary search algorithm takes about 4.5 ms to search a 10,000
  entry array, and about 6 ms to search 100,000 elements. How long
  would I expect it to take to search 10,000,000 elements (assuming I
  have sufficient memory to prevent paging)?
- Unix passwords are stored using a one-way hash function: the
  original string is converted to the ‘encrypted’ password string,
  which cannot be converted back to the original string. One way to
  attack the password file is to generate all possible cleartext
  passwords, applying the password hash to each in turn and checking
  to see if the result matches the password you’re trying to
  crack. If the hashes match, then the string you used to generate
  the hash is the original password (or at least, it’s as good as
  the original password as far as logging in is concerned). In our
  particular system, passwords can be up to 16 characters long, and
  there are 96 possible characters at each position. If it takes 1 ms
  to generate the password hash, is this a viable approach to
  attacking a password?
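
Again purely for checking your estimates afterwards, a back-of-the-envelope
sketch; the characters-per-page and modem-throughput figures are assumptions
noted in the comments:

#+begin_src python
# Book over a 56k modem: assume ~2,000 characters per page and an
# effective async throughput of about 5,600 characters per second.
chars = 1_200 * 2_000
print("book over modem:", round(chars / 5_600), "seconds")  # ~7 minutes

# Binary search is O(log n): 10,000 -> ~4.5 ms and 100,000 -> ~6 ms,
# i.e. about 1.5 ms per extra factor of ten, so 10,000,000 elements
# should take around 6 + 2 * 1.5 = 9 ms.
print("expected search time: ~9 ms")

# Password attack: up to 16 positions, 96 choices each, 1 ms per hash.
candidates = sum(96 ** length for length in range(1, 17))
years = candidates * 1e-3 / (3600 * 24 * 365)
print(f"exhaustive search: ~{years:.1e} years")  # clearly not viable
#+end_src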

* Kata Four: Data Munging
** Problem
Martin Fowler gave me a hard time for KataTwo, complaining that it was
yet another single-function, academic exercise. Which, of course, it
was. So this week let’s mix things up a bit.

Here’s an exercise in three parts to do with real world data. Try hard
not to read ahead—do each part in turn.

*** Part One: Weather Data

In weather.dat you’ll find daily weather data for Morristown, NJ for
June 2002. Download this text file, then write a program to output the
day number (column one) with the smallest temperature spread (the
maximum temperature is the second column, the minimum the third
column).

*** Part Two: Soccer League Table

The file football.dat contains the results from the English Premier
League for 2001/2. The columns labeled ‘F’ and ‘A’ contain the total
number of goals scored for and against each team in that season (so
Arsenal scored 79 goals against opponents, and had 36 goals scored
against them). Write a program to print the name of the team with the
smallest difference in ‘for’ and ‘against’ goals.

*** Part Three: DRY Fusion

Take the two programs written previously and factor out as much common
code as possible, leaving you with two smaller programs and some kind
of shared functionality.

*** Kata Questions

To what extent did the design decisions you made when writing the
original programs make it easier or harder to factor out common code?
Was the way you wrote the second program influenced by writing the
first? Is factoring out as much common code as possible always a good
thing? Did the readability of the programs suffer because of this
requirement? How about the maintainability?

** Python
The first attempt was two quick and dirty regular expression matchers.
Then I factored the regular expression handling out into a generic
*Matcher* class. This class has a /compute/ function, which skips the
non-matching lines and applies the given function to the result.
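
A rough sketch of what that refactoring might look like; the class name,
the regular expressions and the assumed column layouts are only
illustrative guesses, not the exact code in this repository:

#+begin_src python
import re

class Matcher:
    """Apply a regexp to every line of a file, skip the lines that don't
    match, and hand the matched groups to a caller-supplied function."""

    def __init__(self, filename, regexp):
        self.filename = filename
        self.regexp = re.compile(regexp)

    def compute(self, func):
        with open(self.filename) as f:
            matches = (self.regexp.match(line) for line in f)
            return func(m.groups() for m in matches if m)

# Day with the smallest temperature spread (guessed columns: day, max, min).
def smallest_spread(rows):
    return min(rows, key=lambda r: int(r[1]) - int(r[2]))[0]

weather = Matcher("weather.dat", r"\s*(\d+)\s+(\d+)\*?\s+(\d+)")
print(weather.compute(smallest_spread))

# Team with the smallest goal difference (guessed columns: name, for, against).
def smallest_goal_diff(rows):
    return min(rows, key=lambda r: abs(int(r[1]) - int(r[2])))[0]

football = Matcher("football.dat",
                   r"\s*\d+\.\s+(\w+)\s+\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+-\s+(\d+)")
print(football.compute(smallest_goal_diff))
#+end_src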

* Kata Five - Bloom Filters
** Problem
There are many circumstances where we need to find out if something is
a member of a set, and many algorithms for doing it. If the set is
small, you can use bitmaps. When they get larger, hashes are a useful
technique. But when the sets get big, we start bumping into
limitations. Holding 250,000 words in memory for a spell checker might
be too big an overhead if your target environment is a PDA or cell
phone. Keeping a list of web-pages visited might be extravagant when
you get up to tens of millions of pages. Fortunately, there’s a
technique that can help.

Bloom filters are a 30-year-old statistical way of testing for
membership in a set. They greatly reduce the amount of storage you
need to represent the set, but at a price: they’ll sometimes report
that something is in the set when it isn’t (but it’ll never do the
opposite; if the filter says that the set doesn’t contain your object,
you know that it doesn’t). And the nice thing is you can control the
accuracy; the more memory you’re prepared to give the algorithm, the
fewer false positives you get. I once wrote a spell checker for a
PDP-11 which stored a dictionary of 80,000 words in 16kbytes, and I
very rarely saw it let through an incorrect word. (Update: I must have
mis-remembered these figures, because they are not in line with the
theory. Unfortunately, I can no longer read the 8" floppies holding
the source, so I can’t get the correct numbers. Let’s just say that I
got a decent sized dictionary, along with the spell checker, all in
under 64k.)

Bloom filters are very simple. Take a big array of bits, initially all
zero. Then take the things you want to look up (in our case we’ll use
a dictionary of words). Produce ‘n’ independent hash values for each
word. Each hash is a number which is used to set the corresponding bit
in the array of bits. Sometimes there’ll be clashes, where the bit
will already be set from some other word. This doesn’t matter.

To check to see if a new word is already in the dictionary, perform
the same hashes on it that you used to load the bitmap. Then check to
see if each of the bits corresponding to these hash values is set. If
any bit is not set, then you never loaded that word in, and you can
reject it.

The Bloom filter reports a false positive when a set of hashes for a
word all end up corresponding to bits that were set previously by
other words. In practice this doesn’t happen too often as long as the
bitmap isn’t too heavily loaded with one-bits (clearly if every bit is
one, then it’ll give a false positive on every lookup). There’s a
discussion of the math in Bloom filters at
www.cs.wisc.edu/~cao/papers/summary-cache/node8.html.

So, this kata is fairly straightforward. Implement a Bloom filter
based spell checker. You’ll need some kind of bitmap, some hash
functions, and a simple way of reading in the dictionary and then the
words to check. For the hash function, remember that you can always
use something that generates a fairly long hash (such as MD5) and then
take your smaller hash values by extracting sequences of bits from the
result. On a Unix box you can find a list of words in /usr/dict/words
(or possibly in /usr/share/dict/words). For others, I’ve put a word
list up at pragprog.com/katadata/wordlist.txt.

Play with using different numbers of hashes, and with different bitmap
sizes.

Part two of the exercise is optional. Try generating random
5-character words and feeding them into your spell checker. For each
word that it says is OK, look it up in the original dictionary. See
how many false positives you get.

** Notes during development
An interface for a checker can be divided into two important phases:
- populate
- check

In the first phase we read the word list and build the set (or do
any other preparation work), while in the second phase we actually
check whether a word is part of the dictionary.

The naive implementation just uses a *set* as the data structure,
reading all the words from the words file.

# It would be useful for this and other cases to define decorators
# that report the memory usage and the speed of each function.

Then I started to reason about performance, and about how to compare
the results given by the naive checker against the results of the
Bloom filter. It would be interesting to have some unit tests which
can tell me if things are getting slower.

*** Implementation
The first idea for implementing the Bloom filter was to generate an
array of booleans and use it smartly. Unfortunately booleans are not
supported as a type code by the array module, so we have to resort to
long integers and do some bit arithmetic with them.
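
A minimal sketch of that idea, using one arbitrarily large Python integer
as the bitmap and slicing an MD5 digest into the hash values; the sizes,
names and word-list path are placeholders, not the repository's actual code:

#+begin_src python
import hashlib

class BloomFilter:
    """Bloom filter backed by a single big integer used as a bit array."""

    def __init__(self, size_in_bits=1_000_000, num_hashes=4):
        self.size = size_in_bits
        self.num_hashes = num_hashes
        self.bitmap = 0

    def _hashes(self, word):
        # Slice one MD5 digest into num_hashes smaller hash values.
        digest = hashlib.md5(word.encode("utf-8")).digest()
        chunk = len(digest) // self.num_hashes
        for i in range(self.num_hashes):
            part = digest[i * chunk:(i + 1) * chunk]
            yield int.from_bytes(part, "big") % self.size

    def populate(self, words):
        for word in words:
            for h in self._hashes(word):
                self.bitmap |= 1 << h

    def check(self, word):
        return all(self.bitmap & (1 << h) for h in self._hashes(word))

# Usage sketch:
# with open("/usr/share/dict/words") as f:
#     bloom = BloomFilter()
#     bloom.populate(line.strip() for line in f)
# print(bloom.check("hello"), bloom.check("qxzzyj"))
#+end_src
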
* Kata Six: Anagrams
** Problem
Back to non-realistic coding this week (sorry, Martin). Let's solve some crossword puzzle clues.

In England, I used to waste hour upon hour doing newspaper crosswords. As crossword fans will know, English cryptic crosswords have a totally different feel to their American counterparts: most clues involve punning or word play, and there are lots of anagrams to work through. For example, a recent Guardian crossword had:

Down:
..
21. Most unusual form of arrest (6)

The hint is the phrase ‘form of,’ indicating that we’re looking for an anagram. Sure enough ‘arrest’ has six letters, and can be arranged nicely into ‘rarest,’ meaning ‘most unusual.’ (Other anagrams include raster, raters, Sartre, and starer)

A while back we had a thread on the Ruby mailing list about finding anagrams, and I’d like to resurrect it here. The challenge is fairly simple: given a file containing one word per line, print out all the combinations of words that are anagrams; each line in the output contains all the words from the input that are anagrams of each other. For example, your program might include in its output:

kinship pinkish
enlist inlets listen silent
boaster boaters borates
fresher refresh
sinks skins
knits stink
rots sort

If you run this on the word list here you should find 2,530 sets of anagrams (a total of 5,680 words). Running on a larger dictionary (about 234k words) on my OSX box produces 15,048 sets of anagrams (including all-time favorites such as actaeonidae/donatiaceae, crepitant/pittancer, and (for those readers in Florida) electoral/recollate).

For added programming pleasure, find the longest words that are anagrams, and find the set of anagrams containing the most words (so "parsley players replays sparely" would not win, having only four words in the set).

*** Kata Objectives

Apart from having some fun with words, this kata should make you think somewhat about algorithms. The simplest algorithms to find all the anagram combinations may take inordinate amounts of time to do the job. Working through alternatives should help bring the time down by orders of magnitude. To give you a possible point of comparison, I hacked a solution together in 25 lines of Ruby. It runs on the word list from my web site in 1.5s on a 1GHz PPC. It’s also an interesting exercise in testing: can you write unit tests to verify that your code is working correctly before setting it to work on the full dictionary?
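
A common approach is to key every word by its sorted letters, so that all
words that are anagrams of each other end up under the same key; a short
sketch of that idea (the word list file name is a placeholder):

#+begin_src python
from collections import defaultdict

def anagram_sets(words):
    """Group together the words whose sorted letters are identical."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    # Only the keys with more than one word are interesting.
    return [group for group in groups.values() if len(group) > 1]

with open("wordlist.txt") as f:
    sets = anagram_sets(line.strip() for line in f if line.strip())

print(len(sets), "sets,", sum(len(s) for s in sets), "words")
for group in sets:
    print(" ".join(group))
#+end_src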

* Kata Seven: How'd I Do?
** Problem
The last couple of kata have been programming challenges; let’s move back into mushier, people-oriented stuff this week.

This kata is about reading code critically—our own code. Here’s the challenge. Find a piece of code you wrote last year sometime. It should be a decent sized chunk, perhaps 500 to 1,000 lines long. Pick code which isn’t still fresh in your mind.

Now we need to do some acting. Read through this code three times. Each time through, pretend something different. Each time, jot down notes on the stuff you find.

The first time through, pretend that the person who wrote this code is the best programmer you know. Look for all the examples of great code in the program.
The second time through, pretend that the person who wrote this code is the worst programmer you know. Look for all the examples of horrible code and bad design.
The third (and last) time through, pretend that you’ve been told that this code contains serious bugs, and that the client is going to sue you to bankruptcy unless you fix them. Look for every potential bug in the code.

Now look at the notes you made. What is the nature of the good stuff you found? Would you find similar good stuff in the code you’re writing today? What about the bad stuff; are similar pieces of code sneaking into your current code too? And finally, did you find any bugs in the old code? If so, are any of them things that you’d want to fix now that you’ve found them? Are any of them systematic errors that you might still be making today?

*** Moving Forward By Looking Back

Perhaps you’re not like me, but whenever I try this exercise I find things that pleasantly surprise me and things that make me cringe in embarrassment. I find the occasional serious bug (along with more frequent but less serious issues). So I try to make a point of looking back at my code fairly frequently.

However, doing this six months after you write code is not the best way of developing good software today. So the underlying challenge of this kata is this: how can we get into the habit of critically reviewing the code that we write, as we write it? And can we use the techniques of reading code with different expectations (good coder, bad coder, and bug hunt) when we’re reviewing our colleagues’ code?

* Kata Eight: Conflicting Objectives
** Problem

Why do we write code? At one level, we’re trying to solve some particular problem, to add some kind of value to the world. But often there are also secondary objectives: the code has to solve the problem, and it also has to be fast, or easy to maintain, or extend, or whatever. So let’s look at that.

For this kata, we’re going to write a program to solve a simple problem, and we’re going to write it with three different sub-objectives. Our program is going to process the dictionary we used in previous kata, this time looking for all six-letter words which are composed of two concatenated smaller words. For example:

al + bums => albums
bar + ely => barely
be + foul => befoul
con + vex => convex
here + by => hereby
jig + saw => jigsaw
tail + or => tailor
we + aver => weaver

Write the program three times.

- The first time, make the program as readable as you can make it.
- The second time, optimize the program to run as fast as you can make it.
- The third time, write as extendible a program as you can.

Now look back at the three programs and think about how each of the three sub-objectives interacts with the others. For example, does making the program as fast as possible make it more or less readable? Does it make it easier to extend? Does making the program readable make it slower or faster, flexible or rigid? And does making it extendible make it more or less readable, slower or faster? Are any of these correlations stronger than others? What does this mean in terms of optimizations you may perform on the code you write?
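
As a starting point for the "readable" variant, a small sketch; the word
list path is a placeholder and the split logic is just one obvious way to
do it:

#+begin_src python
def concatenated_six_letter_words(words):
    """Yield (prefix, suffix, word) for every six-letter word that is the
    concatenation of two other words from the list."""
    word_set = set(words)
    for word in word_set:
        if len(word) != 6:
            continue
        for cut in range(1, 6):
            prefix, suffix = word[:cut], word[cut:]
            if prefix in word_set and suffix in word_set:
                yield prefix, suffix, word

with open("wordlist.txt") as f:
    words = [line.strip().lower() for line in f if line.strip()]

for prefix, suffix, word in concatenated_six_letter_words(words):
    print(f"{prefix} + {suffix} => {word}")
#+end_src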

* Kata Nine: Back to the CheckOut
** Problem

Back to the supermarket. This week, we’ll implement the code for a checkout system that handles pricing schemes such as "apples cost 50 cents, three apples cost $1.30."

Way back in KataOne we thought about how to model the various options for supermarket pricing. We looked at things such as "three for a dollar," "$1.99 per pound," and "buy two, get one free."

This week, let’s implement the code for a supermarket checkout that calculates the total price of a number of items. In a normal supermarket, things are identified using Stock Keeping Units, or SKUs. In our store, we’ll use individual letters of the alphabet (A, B, C, and so on). Our goods are priced individually. In addition, some items are multipriced: buy n of them, and they’ll cost you y cents. For example, item ‘A’ might cost 50 cents individually, but this week we have a special offer: buy three ‘A’s and they’ll cost you $1.30. In fact this week’s prices are:

| Item | Unit Price | Special Price |
|------+------------+---------------|
| A    |         50 | 3 for 130     |
| B    |         30 | 2 for 45      |
| C    |         20 |               |
| D    |         15 |               |
Our checkout accepts items in any order, so that if we scan a B, an A, and another B, we’ll recognize the two B’s and price them at 45 (for a total price so far of 95). Because the pricing changes frequently, we need to be able to pass in a set of pricing rules each time we start handling a checkout transaction.

The interface to the checkout should look like:

#+begin_src ruby
co = CheckOut.new(pricing_rules)
co.scan(item)
co.scan(item)
# ...
price = co.total
#+end_src

Here’s a set of unit tests for a Ruby implementation. The helper method price lets you specify a sequence of items using a string, calling the checkout’s scan method on each item in turn before finally returning the total price.

#+begin_src ruby
class TestPrice < Test::Unit::TestCase

  def price(goods)
    co = CheckOut.new(RULES)
    goods.split(//).each { |item| co.scan(item) }
    co.total
  end

  def test_totals
    assert_equal(  0, price(""))
    assert_equal( 50, price("A"))
    assert_equal( 80, price("AB"))
    assert_equal(115, price("CDBA"))

    assert_equal(100, price("AA"))
    assert_equal(130, price("AAA"))
    assert_equal(180, price("AAAA"))
    assert_equal(230, price("AAAAA"))
    assert_equal(260, price("AAAAAA"))

    assert_equal(160, price("AAAB"))
    assert_equal(175, price("AAABB"))
    assert_equal(190, price("AAABBD"))
    assert_equal(190, price("DABABA"))
  end

  def test_incremental
    co = CheckOut.new(RULES)
    assert_equal(  0, co.total)
    co.scan("A"); assert_equal( 50, co.total)
    co.scan("B"); assert_equal( 80, co.total)
    co.scan("A"); assert_equal(130, co.total)
    co.scan("A"); assert_equal(160, co.total)
    co.scan("B"); assert_equal(175, co.total)
  end
end
#+end_src

There are lots of ways of implementing this kind of algorithm; if you have time, experiment with several.

*** Objectives of the Kata

To some extent, this is just a fun little problem. But underneath the covers, it’s a stealth exercise in decoupling. The challenge description doesn’t mention the format of the pricing rules. How can these be specified in such a way that the checkout doesn’t know about particular items and their pricing strategies? How can we make the design flexible enough so that we can add new styles of pricing rule in the future?
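
One way (of many) to keep the checkout ignorant of individual pricing
strategies is to pass in rule objects that each know how to price one SKU;
here is a Python sketch of that idea (class and method names are
illustrative, not the repository's actual code):

#+begin_src python
class MultiPriceRule:
    """Price one SKU, optionally with an 'n for y' special offer."""

    def __init__(self, unit_price, special_count=None, special_price=None):
        self.unit_price = unit_price
        self.special_count = special_count
        self.special_price = special_price

    def price(self, count):
        if self.special_count:
            specials, rest = divmod(count, self.special_count)
            return specials * self.special_price + rest * self.unit_price
        return count * self.unit_price


class CheckOut:
    """Totals the scanned items; knows nothing about concrete pricing schemes."""

    def __init__(self, rules):
        self.rules = rules          # maps SKU -> rule object
        self.counts = {}

    def scan(self, item):
        self.counts[item] = self.counts.get(item, 0) + 1

    @property
    def total(self):
        return sum(self.rules[item].price(count)
                   for item, count in self.counts.items())


RULES = {
    "A": MultiPriceRule(50, 3, 130),
    "B": MultiPriceRule(30, 2, 45),
    "C": MultiPriceRule(20),
    "D": MultiPriceRule(15),
}

co = CheckOut(RULES)
for item in "DABABA":
    co.scan(item)
assert co.total == 190
#+end_src

New pricing styles can then be added by writing new rule classes, without
touching the CheckOut class itself.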