Iterative Divide & Conquer or How to Solve Problems

coaching, kata, problem-solving, programming, pseudo-code, teaching

Let’s imagine a beginner programmer, who have learned a programming language basics, can write programs and can read someone else’s programs. Now, for any somewhat non-trivial problem, they have trouble coming up with a solution. And if they read one of the possible solutions, they will understand it and conclude: “I understand how this program works, but I don’t know how to get there.”. If that sounds like you, my dear reader, or someone you would like to coach or teach, then come and learn Iterative Divide & Conquer problem-solving technique.

English Numbers Kata

Given an integer number in the range of -999 999 999 to 999 999 999, write a program, that will output that number in English words.

Example:

Given number 37545, the program outputs thirty-seven thousands five hundred forty-five.

I can solve that!

If you are confident, that you are able to solve that problem, my congratulations, you have problem-solving skill at a necessary level!

If you still feel, that you can solve this Kata easily, although there are more complex problems that give you troubles, then, probably, your problem-solving strategy is not scalable. The technique described in this article is scalable.

Let’s dive in

NOTE: this article uses pseudo-code, that doesn’t belong to any programming language, so it makes sense for you, my reader, to implement this step-by-step in the programming language you know.

OK. Let’s imagine, that you don’t know how to solve the whole English Numbers Kata and you don’t even know where to start.

Now, let’s try to solve much simpler problem:

Given an integer number in the range of 0 to 9, write a program that will output that number in English words.

Can you solve that? I am pretty sure you can. And easily at that.

Let’s write down some of our possible inputs and corresponding outputs:

1
2
3
4
print english_number(0)   # zero
print english_number(1)   # one
print english_number(7)   # seven
print english_number(9)   # nine

If we run that program, we will get an error, that english_number function is not defined yet. Let’s define it:

1
2
3
function english_number(number) {
  return ""
}

Now if we run our program, we will get four empty lines on the screen as expected. Easiest implementation of english_number function would probably look like that:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
if (number == 0) {
  return "zero"
} else if (number == 1) {
  return "one"
} else if (number == 2) {
  return "two"
} else if (number == 3) {
  return "three"
} else if (number == 4) {
  return "four"
} else if (number == 5) {
  return "five"
} else if (number == 6) {
  return "six"
} else if (number == 7) {
  return "seven"
} else if (number == 8) {
  return "eight"
} else if (number == 9) {
  return "nine"
}

If we run our program, then we will get our expected output:

1
2
3
4
zero
one
seven
nine

If you already know arrays and access array by index, english_number can be simplified greatly:

1
2
3
4
5
6
NUMBERS = ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"]

function english_number(number) {
  return NUMBERS[number]
}

After making this change, let’s run the program, we should see the same output:

1
2
3
4
zero
one
seven
nine

OK. Seems like we have solved our current problem at hand. What about the initial problem? How do we get there?

Increase size of the problem slightly

Or as you would say in real-world programming: Add a new requirement.

Our initial problem has only two axes, where we can add requirements to our current solution to move it towards final solution:

  • increase the supported range
  • allow negative integers

Let’s go with latter, it seems easier. First we write down our possible inputs and outputs:

1
2
3
4
5
print english_integer(0)     # zero
print english_integer(4)     # four
print english_integer(-3)    # minus three
print english_integer(9)     # nine
print english_integer(-9)    # minus nine

Now, implement english_integer as a simple call to english_number (that will make half of our inputs produce correct output):

1
2
3
function english_integer(number) {
  return english_number(number)
}

Depending on what your programming language this can:

  • output partly incorrect data (for negative values)
  • fail at run time
  • fail at compile time

We can fix that by handling the case of negative numbers:

1
2
3
4
5
if (number < 0) {
  return "minus " + english_integer(-number)
} else {
  return english_number(number)
}

That might be confusing at first. Especially, part, where we call english_integer from the body of the same function. Let’s draw a diagram on how that function works:

Diagram for english_integer after adding negative integers

If we were to unwind this diagram into possible paths, we would end up with two possible cases:

  1. When number < 0:
    • english_integer(number)
    • if (number < 0) - YES
    • english_integer(-number)
    • if (number < 0) - NO
    • english_number(number)
    • prepend result of last call with minus
    • and return it
  2. When number >= 0:
    • english_integer(number)
    • if (number < 0) - NO
    • english_number(number)
    • return result of last call

Running this program results in expected output:

1
2
3
4
5
zero
four
minus three
nine
minus nine

Implementation of english_integer(number) function can be simplified by eliminating else clause and treating number < 0 as an edge case and using “guard clause” to handle it:

1
2
3
4
5
function english_integer(number) {
  if (number < 0) return "minus " + english_integer(-number)

  return english_number(number)
}

If we run our program, the output should be the same as before.

Add more requirements!

Now, that we are done with handling a negative case, we have only one requirement axis left: range of integers. Currently, we support integers from -9 to 9. Now let’s extend support for integers from 10 to 19. This problem should be as trivial as the problem for the range of 0...9.

Our inputs with corresponding outputs:

1
2
3
4
5
print english_teen_number(10)    # ten
print english_teen_number(11)    # eleven
print english_teen_number(13)    # thirteen
print english_teen_number(18)    # eighteen
print english_teen_number(19)    # nineteen

And the implementation to make the output of the program correct (very similar to english_number function):

1
2
3
4
5
6
7
8
9
TEEN_NUMBERS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
                "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]

function english_teen_number(number) {
  // "number - 10" comes from the fact, that range 10..19
  // maps to a range of 0..9 of our array, so we need to
  // shift 10..19 by -10 to get 0..9
  return TEEN_NUMBERS[number - 10]
}

Now we want to glue our current solution with english_teen_number to add support for range 10..19. This sounds like another if .. else case handling:

  • when number < 10, use english_number
  • when number >= 10, use english_teen_number

Let’s change our input/output pairs for english_integer function:

1
2
3
4
5
6
7
8
print english_integer(0)     # zero
print english_integer(4)     # four
print english_integer(-3)    # minus three
print english_integer(9)     # nine
print english_integer(-9)    # minus nine
print english_integer(10)    # ten
print english_integer(15)    # fifteen
print english_integer(19)    # nineteen

That should fail in some way (compile error, runtime error or incorrect output). Now it is time to implement missing functionality for last 3 examples in our english_integer function. Replace the last call to english_number with:

1
2
3
4
5
if (number < 10) {
  return english_number(number)
} else {
  return english_teen_number(number)
}

If we run the program, we should see correct output:

1
2
3
4
5
6
7
8
zero
four
minus three
nine
minus nine
ten
fifteen
nineteen

What about negative numbers in the range of -19...-10? Let’s add input/output examples and see what happens if we run the program:

1
2
3
print english_integer(-12)    # minus twelve
print english_integer(-16)    # minus sixteen
print english_integer(-19)    # minus nineteen

And the output if we run the program:

1
2
3
minus twelve
minus sixteen
minus nineteen

Yes, it works already thanks to the guard statement that we have at the top of english_integer function. And the whole function body:

1
2
3
4
5
6
7
8
9
function english_integer(number) {
  if (number < 0) return "minus " + english_integer(-number)

  if (number < 10) {
    return english_number(number)
  } else {
    return english_teen_number(number)
  }
}

Going further!

Next smallest requirement seems to be an addition of the range 20...99. Let’s write down our example inputs and outputs:

1
2
3
4
5
print english_integer_in_tens(20)    # twenty
print english_integer_in_tens(27)    # twenty-seven
print english_integer_in_tens(42)    # forty-two
print english_integer_in_tens(70)    # seventy
print english_integer_in_tens(99)    # ninety-nine

So how do we solve this problem? Can we follow the same pattern as before, for ranges 0...9 and 10...19?

We certainly do. We create a constant INTEGERS_IN_TENS = [...], where [...] will contain 79 (mostly two-word) strings. It seems like an excess effort to me. So can we do better?

Yes! We can apply the same problem-solving technique here. What is the smallest problem, that we can solve here easily and independently from other problems?

What about solving the problem only for numbers 20, 30, 40, …, 90? Sounds simple enough! Let’s write our example input/outputs:

1
2
3
4
print english_ten(20)    # twenty
print english_ten(30)    # thirty
print english_ten(60)    # sixty
print english_ten(90)    # ninety

This is no longer different from ranges 0...9 and 10...19. The only detail is that we need to find a way to convert range 20...90 to 0...7 to be able to access the array by that index. This can be done in 2 steps:

  1. obtain the first digit of the number: number / 10, which results in the range 2...9,
  2. and shift resulting digit by -2: number / 10 - 2, which results in the range 0...7,

exactly, what we expect. Now, we can implement english_ten similarly to english_number and english_teen_number:

1
2
3
4
5
6
TENS = ["twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

function english_ten(number) {
  return TENS[number / 10 - 2]
}

Now it is time to return to the original requirement: support range 20...99. Looking at our example input/outputs for a function english_integer_in_tens, we can conclude, that we have 2 different cases:

  • When second digit is 0 (number % 10 == 0):
    • we output only twenty, thirty, …, ninety, depending on the first digit of the number;
    • by the way, this is exactly, what english_ten function is doing. Great!
  • When the second digit is not 0:
    • we output twenty, thirty, …, ninety, depending on the first digit of the number;
    • and we output a hyphen character: -;
    • and we output one, two, …, nine, depending on the second digit;
    • by the way, latter is exactly, what english_number function is doing. Great!

So, let’s implement that in english_integer_in_tens:

1
2
3
4
5
6
7
8
9
function english_integer_in_tens(number) {
  second_digit = number % 10

  if (second_digit == 0) {
    return english_ten(number)
  } else {
    return english_ten(number) + " " + english_number(second_digit)
  }
}

Running the program will produce the expected output:

1
2
3
4
5
twenty
twenty-seven
forty-two
seventy
ninety-nine

Now we should glue the solution for range 20...99 with our main solution, that currently supports only -19...19. As always, start with example input/outputs:

1
2
3
print english_integer(60)   # sixty
print english_integer(43)   # forty-three
print english_integer(-97)  # minus ninety-seven

This results in incorrect output or failure. How can we merge two solutions together now? Let’s remember our current main solution’s different cases:

  • Guard number < 0, that prepends “minus ” and makes number non-negative.
  • When number < 10, use english_number.
  • Otherwise, use english_teen_number.

Seems, like first two cases should not be touched and we are interested in the last one. We should split it in two:

  • Otherwise:
    • When number < 20, use english_teen_number.
    • Otherwise, use english_integer_in_tens.

I believe, we are done and can implement merged version:

1
2
3
4
5
6
7
8
9
10
11
function english_integer(number) {
  if (number < 0) return "minus " + english_integer(-number)

  if (number < 10) {
    return english_number(number)
  } else if (number < 20) {
    return english_teen_number(number)
  } else {
    return english_integer_in_tens(number)
  }
}

Running the program confirms that our new solution now supports integers in the range -99...99.

Final solution

I’m leaving the final solution as an exercise for you, my dear reader. There is nothing more to this technique. Careful application of this technique and careful choice of smallest baby-steps as new requirements to your current solution will get you there - to the final solution, that supports the range of -999 999 999...999 999 999.

If you have any troubles, don’t hesitate to share your questions, ideas and code with me. I will try my best to help you, my dear reader. I am reachable via Twitter by handle @tdd_fellow.

Challenge

Extend solution to support floating-point numbers to a precision level of 6 digits after the dot.

Thank you for reading!

Final note: did you notice, how tedious it is to check, that output of all these print-s is correct, each time you run the program? This can and should be automated! Stay tuned, I am going to publish an article on how to easily automate these checks really soon!

Next issue is out: Automating Manual Checks or How to Save Time and Get Rapid Feedback!

TDD #4: Path Finding

graph-theory, screencast, tdd, tdd-screencasts, test-driven-development

Recommended to watch in a full-screen and on 720p or higher quality.

List of all TDD Screencasts can be found here.

Script

Hello. I am Oleksii Fedorov and I am wearing a hat of That TDD Fellow. That means you are watching TDD Screencast episode #4.

As mentioned in the previous episode, today we are going to implement a path finding algorithm.

Specifically, our first problem in a path finding theme will be the following:

  • We are given a directional graph, that consists of nodes and edges between them.
  • We are given two nodes: start and finish.
  • We need to answer such questions:
    • Is there a path from start to finish?
    • If there is, What this path is?

I need to make a remark, that we clearly dont have to return the shortest path, just any path, that we can find. We will tackle shortest path problem in later episodes.

Now, I think we can start.

1
2
./watch.sh
vim         # coding session

This was pretty easy to make a DFS algorithm to emerge on its own, by driving it with the specification. Lets see if it is possible to do the same with shortest path problem.. Next time, on the TDD Screencast. Have a nice day.

Why Are You Slow?

cleancode, craftsmanship, preaching, professionalism, software-industry

We are slow because we have the worst codebase!

So why don’t you clean it?

Because we have to go fast!

I will let you figure out logical inconsistency there.

Ask yourself a question: how much times in your career you were slowed down by a horrible, dirty, untested code? It doesn’t matter who have written it, usually, it is already there and already a fact you have to deal with.

If your answer was: “Once.. or twice..” (and your career is sufficiently long) - you can freely stop reading this post and simply be a happy developer.

I was slowed down by bad code horrible amount of times. I come to be sick of it. I don’t want to be slow, because I am trying to go fast. I want my tool to be clean, I want my creations to be manageable, and I don’t want to fear the code, I have created, or anyone else have created. And I want to go fast while enjoying it!

Trust me, I met lots of developers during my career. Not as much as heroes of our industry have met, but enough to be able to tell you, that every single one of them, who has any reasonable length of the career, had same problems.

And that is totally not normal. Do you understand, that from the outside world, we, as an industry, are perceived as very non-professional, because of that? Business even came to a conclusion, that they can’t trust us to deliver working software - and that is how QA role was born.

I will ask again.

So why don’t you clean it?

My company pays me for features, not for <insert your statement here>

That is very common response. And, in essence, it is the truth.

OK, now, let’s think. As time passes by, after certain threshold on every codebase, that is not clean, adding next feature costs more time, even if the features are of roughly the same size. After 1 year of such development, features can literally become 2-4 times more expensive. Couple of years in, and it, practically, becomes impossible to develop anything except “Change the color of this button” (or provide your own example).

What does that mean?

One would say: “Features become more expensive over time”. Business usually sees it as: “Arrgh, our programmers slower and slower with each month”. So from the business perspective, after perceived effectiveness of developers drops by 2-4 times, why would business be willing to pay these developers the same salary? You should be afraid of this question.

Now imagine land of unicorns!

Now imagine, that you started a bit slower in the beginning (like 8% slower), but you have kept your effectiveness over time either at roughly same level, or even have increased it over time. How hard do you think it would be to ask for a raise, after 2 years of loyal work?

Not hard at all. And probably business will be doing just great (if the business idea itself was sustainable to begin with, of course), and will be capable of satisfying the request.

Why such a thing happens?

Because at the moment we are not perceived as professionals, nor as experts of our field. We are perceived as just some coding monkeys, who you always need to ask: “Can this be delivered on Friday” with intimidating tone. And we gladly reply through our teeth: “I will try”, meaning “Just go already”; ending up on Friday evening with: “I tried” - “You have tried not enough!”.

Clean your code already!

Believe me, investing 15-45 minutes every single day into increasing test code coverage and some little refactorings will not make you any slower (you probably already very slow, so that it will not make any perceivable difference). Rather, over time, you (and your fellow programmers) will start to actually being bit-by-bit faster, as long as your application (or applications) get cleaner and cleaner.

It goes without saying, that you should be using proper XP (pair-programming and TDD) techniques, while writing any new piece of code (read: new class, module, library, package, etc.). Because it will be extremely easy to unit-test it with near-100% test coverage. Believe me, that is easy and fun.

Refactoring and covering old and messy parts of an application is not fun, though. You have to face the truth. Consider it as a chore, like the one you do to your apartment, when you clean it on a regular basis. And as we all know, the more you wait to clean an apartment, the harder it would be to do it. And the function is not linear..

If you have a really big legacy application, you are probably in doing that sort of thing for 2-3 years, before you can proudly call this application clean again.

There is one important trick though: prioritize cleaning of parts of application, that change often. If some part changes once half a year, you should probably clean it once half a year too.

You are hired for your expertise!

Believe me, you do have all the knowledge, required to make it.

The only thing that stops you is your inability to say “No”, when “No” is the correct answer from your experience and expertise point of view.

Parallel example

Do you know that surgeon, before any surgery, washes his hands. You not just know, it is your expectation! In some countries if he doesn’t, he can easily end up in jail.

Do you know, how surgeon washes his hands?

He rubs each finger from 4 different sides 10 times. That is stated in Doctors’ code of ethics. And they have to follow it, since it has power of law.

Why 10 times? Wouldn’t 7 be enough, or maybe 14 should be a minimum? It doesn’t matter. It is a discipline, that is to be followed to the letter, and there are no exceptions.

Well, he probably can rub 11 times, without getting in jail..

Nobody will ever ask surgeon, why he does it. Everyone expects him to do it.

Back to our universe

You are surgeons!

You are surgeons, that operate on a heart of the business.

With your wrong move, with your mistake, the whole business can go out of business overnight.

With your wrong move, with your mistake, thousands of people can die (if you are in certain domains).

So why increasing a chance of your own mistakes (and everybody else on a team) by not cleaning the code?

Even if you are not in such critical domain, you are part of industry, and you should be a great professional, just to be an example for others, who might end up working in such crucial business domain in the future.

And I know, you can pull it off. And you do know it yourself.

So now go, and be professional, be professional for everyone around you.

Thanks!

This article might be a bit too rough - I believe that is the truth we face now, as an industry. And let us be the ones fixing it!

I recommend reading this: Software Engineering Code of Ethics by ACM organization (originally, created in 1999, why are we all still not using it?!).

TDD #3: Kind Sort

screencast, tdd, tdd-screencasts, test-driven-development

Recommended to watch in a full-screen and on 720p or higher quality.

Script

Hello, hello! I am Oleksii Fedorov, and this is the third episode of TDD Screencast.

In last episode we have implemented a sorting algorithm using TDD, without thinking about algorithm beforehand. As a result, bubble sort have emerged.

We have noticed, that there is a small weird thing about this implementation: it has unspecified behavior - mutation of the original array.

So we asked, which algorithm would emerge, if we were to ban such side-effects from our algorithm. Let’s find out!

1
2
./watch.sh
vim           # implement sorting algorithm

I think we are done here. If you look closely, it is a quicksort. It is not the most memory-efficient implementation, but that is something that is simple to optimize (instead of passing recursively arrays, pass original array and indexes). That optimization will involve actual mutation of the array in place, so if we want to stay true to our specification /show test for no-side-effects/. We will have to copy the array once, using some sort of wrapper function.

Applying this optimization I leave as an exercise to you my users.

In the next episode we will look into path-finding problem, and we will see, how these techniques apply there. See you next time! Have a nice day.

Test-Driven-Development Screencast #2

screencast, tdd, tdd-screencasts, test-driven-development

Recommended to watch in a full-screen and on 720p or higher quality.

List of all TDD Screencasts can be found here.

Script

Hello, hello! I am Oleksii Fedorov, and this is the second episode of Test-Driven-Development Screencast.

Now that I think about it, first episode was more of an audio podcast (with two and half visual slides), rather than a screencast. Don’t worry - this episode will be full of code and actual action time on the screen.

Today we are going to implement a sorting algorithm. We will not come up with algorithm beforehand and we will simply let it emerge by itself, while we are doing TDD.

Lets jump in!

1
2
3
4
5
6
./watch.sh  # So I have here small script, that will watch changes in my code
            # and will run all my tests. Additionally, it will show me the
            # result in the notification bar (you will see, shortly).

vim         # 1) Create test file
            # 2) Follow TDD rules to the letter

I think we are done here. And notice, that this is a bubble sort algorithm. Now lets ask a question, why the worst possible algorithm have emerged, while we were using TDD. That is an interesting question.

But first, lets ask ourselves a question: Are we kind to the user of our function? The answer is - NO. We are mutating the argument, that user passed to us. And this mutation might be unexpected. At least our test suite doesn’t even mention such behavior.

The root of the problem is this little swap operation that we have here:

1
2
# show swap operation on the screen
a[0], a[1] = {a[1], a[0]}

I wonder, what will happen if we were to ban swap operation (and any kind of mutation of the argument of the function), and implement sorting algorithm again?

Find out next time, on the next episode of Test-Driven-Development! Have a good day.

Test-Driven-Development Screencast #1

screencast, tdd, tdd-screencasts, test-driven-development

Recommended to watch in a full-screen and on 720p or higher quality.

List of all TDD Screencasts can be found here.

Script

Hello! I am Oleksii Fedorov, and this is the first episode of Test-Driven-Development Screencast.

Today, I am going to briefly address following questions about Test-Driven-Development: - What TDD is? - What are main benefits of doing TDD?

At the end I am going to demonstrate a small example on how to implement simple sorting algorithm using TDD.

Let me open my slides. Don’t worry: there are only three small slides.

1
vim ./slides/*

TDD is Software Development discipline. Therefore, basic rules, defined by TDD, are arbitrary, weird and are to be followed to the letter, if you want it to be useful. There is a reason for that – but we will talk shortly about that. Let me read these basic rules for you:

  1. You are not allowed to write a line of production code, unless test fails.

    Which means, that I will have to write the test even before I have something to test. It may sound stupid. Maybe, it is stupid. Maybe, it is not.

    But the next rule is even more weird than the first one:

  2. You are not allowed to write more of a test, that is sufficient to fail.

    It is important to clarify, what ˝fail˝ means in this context. It means test expectation failure, and compilation/parsing/interpretation failure (depending if your programming language is compiled or interpreted).

    Which means, that you will have to interrupt yourself, while writing a test, because you have mentioned the class or package, that does not exist yet. Or you have mentioned the method or function, that does not exist yet.

    Now that may sound really stupid to you. Bear with me, and lets see how weird the last rule is:

  3. You are not allowed to write more production code, that is sufficient to make the failing test pass.

    Which means, that once you have defined a class, that was mentioned - you have just fixed a failing test, and you have to go back and write more of the test, or add new test.

    Which means, that once you have defined a method, that was mentioned - you have just fixed a failing test, and you have to got back and write more of the test again.

    Which means, that once you changed your production code only slightly in direction of the correct implementation, you have to go back and write more tests.

    This is interesting. Now you got yourself in a very tight lock. In a very tight feedback loop. Write a line of test, write a line of code, write a line of test, write a line of code, and so on. The length of this loop is probably 5, 10, 30 seconds. If you have a test suite that needs half an hour to run, you will not do TDD.

What happens if you do not follow this discipline to the letter? If you slip there and there?: write a bit more production code, than you had to?, write a bit more of a test, than you had to?, or even wrote a test after writing the whole class under test?

Well, you have just lost the main benefit of TDD: you can no longer trust the test suite. For sure, there will be some untested code if you do it this way.

Why do we need 100% test coverage, you ask? 70% sounds like an awesome achievement! Or is it?..

What can you tell from the fact, that only 70% of your code is covered by test suite? Only that 30% is not covered, and therefore, there is no quick and easy way to verify that it works.

Lets imagine the following scenario:

  • You open a file on your screen.
  • You spot some nasty duplication, and you know you want to fix it.
  • You even see an obvious way to fix it.
  • You touch your keyboard.
  • And now, the fear overwhelms you: this class is not tested.
  • And your reaction? - I won’t touch it!

That is where code starts to rot, while nobody cleans it up, because test suite can not be trusted, and the whole codebase slowly down-slides to a big pile of mess.

Now, lets imagine, that you have 100% code coverage (Well, maybe 98%, because 100% is the goal, that is not achievable). And the same scenario:

  • You open a file on your screen.
  • You spot some nasty duplication.
  • You fix the duplication.
  • You run tests - and they are green.
  • You check-in cleaner code in your code control system.

Or, lets say, that the problem is not trivial: - You spot the long method. - You split it in 3 methods - tests are still green. - You proceed and extract these methods to the new class. - And tests fail. - Undo-Undo-Undo. And you are back to the green state. - And now you think for a moment, what happened there. - And you already have this ˝Gotcha!˝. - And you successfully extract a class again - and the test suite is green. - You check-in cleaner code in your code control system.

Undo button becomes your best friend. Once you stop knowing what is going on, or what you are doing, or you simply confused, you can always go back to the green state; that just happens to be 25 seconds ago (or 2-3 undo) away, because of the tight feedback loop you got yourself into.

Now, there is a hidden rule of TDD. That feels more, like an implementation detail of TDD:

1
:next

As tests become more specific, production code should become more generic.

And it is very true, otherwise, you would end up adding a bunch of if statements every time you add a failing test.

What that means, I will point out during the example.

And lets sum up now:

100% Code Coverage => Lack of Fear => Consistent Random Kindness to the Code => Clean Code.


60% Code Coverage => Fear to Break It => ˝I won’t touch it!˝ => Mess.

Now we can finally move on to the example. Next time, on the next episode of Test-Driven-Development Screencast! Have a good day.

Why Do You Need to Be Careful With Loop Variable in Go

computers, concurrency, golang, pitfalls, programming

TL;DR

This post describes two different issues:

Race conditions when taking reference of loop variable and passing it to another goroutine:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// WRONG: pass message by reference
for message := range inbox {
        outbox <- EnhancedMessage{
                // .. more fields here ..
                Original: &message,
        }
}

// CORRECT: pass message by value
for message := range inbox {
        outbox <- EnhancedMessage{
                // .. more fields here ..
                // Pass message by value here
                Original: message,
        }
}

See explanation here.

Race conditions when using loop variable inside of goroutine inside of loop:

1
2
3
4
5
6
7
8
9
10
11
12
13
// WRONG: use loop variable directly from goroutine
for message := range inbox {
        go func() {
                // .. do something important with message ..
        }()
}

// CORRECT: pass loop variable by value as an argument for goroutine's function
for message := range inbox {
        go func(message Message) {
                // .. do something important with message ..
        }(message)
}

See explanation here.

Taking reference of loop variable

Lets start off with simple code example:

1
2
3
4
5
6
for message := range inbox {
        outbox <- EnhancedMessage{
                // .. more fields here ..
                Original: &message,
        }
}

Looks quite legit. In practice it will often cause race conditions. Because message variable is defined once and then mutated in each loop iteration. The variable is passed pointer to some concurrent collaborators, which causes race conditions and very confusing bugs.

Above code can be re-written as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
// local scope begins here
        var (
                message Message
                ok bool
        )
        for {
                message, ok = <-inbox
                if !ok {
                        break
                }

                outbox <- EnhancedMessage{
                        // .. more fields here ..
                        Original: &message,
                }
        }
// local scope ends here

Looking at this code, it is quite obvious why it would have race conditions.

Correct way of doing that would be to either define a new variable manually each iteration and copy message’s value into it:

1
2
3
4
5
6
7
for message := range inbox {
        m := message
        outbox <- EnhancedMessage{
                // ...
                Original: &m,
        }
}

Another way of doing that would be to take control of how loop variable works yourself:

1
2
3
4
5
6
7
8
9
10
11
for {
        message, ok := <-inbox
        if !ok {
                break
        }

        outbox <- EnhancedMessage{
                // ...
                Original: &message,
        }
}

Taking into account, that until the EnhancedMessage is processed by concurrent collaborator and garbage collected, variables, created during each iteration, i.e.: m and message for both examples, will stay in memory. Therefore it is possible to just use pass-by-value instead of pass-by-reference to achieve the same result. It is simpler too:

1
2
3
4
5
6
7
8
9
for message := range inbox {
        outbox <- EnhancedMessage{
                // ...

                // Given the fact that `EnhancedMessage.Original` definition
                // changed to be of value type `Message`
                Original: message,
        }
}

Personally, I prefer latter. If you know of any drawbacks of this approach comparing to other 2, or if you know of entirely better way of doing that, please let me know.

Running goroutine, that uses loop variable

Example code:

1
2
3
4
5
for message := range inbox {
        go func() {
                // .. do something important with message ..
        }()
}

This code might look legit too. You might think it will process whole inbox concurrently, but most probably it will process only a couple of last elements multiple times.

If you rewrite the loop in a similar fashion as in previous section, you would notice that message would be mutated while these goroutines are still processing it. This will cause confusing race conditions.

Correct way of doing that is:

1
2
3
4
5
for message := range inbox {
        go func(message Message) {
                // .. do something important with message ..
        }(message)
}

In this case, it is basically the same as copying the value to the new defined variable at each iteration of the loop. It just looks nicer.

Thanks!

If you have any questions, suggestions or just want to chat about the topic, you can ping me on twitter @tdd_fellow or drop a comment on hackernews.

Especially, if you think I am wrong somewhere in this article, please tell me, I will only be happy to learn and iterate over this article to improve it.

Happy coding!

Credits

Intention-revealing Code

computers, design, golang, pragmatic, programming, rant

Lets start off with very simple code:

1
2
3
4
5
6
7
func toJSON(post Post) string {
        return fmt.Sprintf(
                `{"post_title": "%s", "post_content": "%s"}`,
                post.Title,
                post.Content,
        )
}

This is very simple code and it is easy to understand what it is doing and what it is doing wrong:

  • It tries to marshal post struct to custom JSON representation.
  • It fails when there are special characters in these strings.
  • It does not use standard MarshalJSON interface.

It can be fixed in a pretty simple way:

1
2
3
4
5
6
func (post Post) MarshalJSON() ([]byte, error) {
        return json.Marshal(map[string]interface{}{
                "post_title":   post.Title,
                "post_content": post.Content,
        })
}

And at the usage site, you can now just use standard encoding/json package capabilities:

1
2
3
4
5
6
rawPost, err := json.Marshal(post)
if err != nil {
  return err
}

// do stuff with rawPost

And now you can notice that tests do not pass. And the place that failed is totally unrelevant. Long story short. Name of the original method was not revealing any real intent: it was actually specific json representation for usage with external API, but normal json.Marshal is used by this same application for providing responses to its own HTTP clients.

Were the name a bit more intention-revealing, nobody would waste their time on finding this out by trial and mistake:

1
2
3
4
// should probably even sit in different package
func marshalToExternalAPIFormat(post Post) ([]byte, err) {
        // ...
}

And this is only a tip of the iceberg of how non-intention-revealing code can trip you over.

Using contracts.ruby With RSpec. Part 2

contracts.ruby, design-by-contract, rspec, ruby

Remember Using contracts.ruby With RSpec ?

RSpec mocks violate all :class contracts because is_a?(ClassName) returns false for mock. That post describes 2 possible solutions:

  • stub :is_a?: allow(my_double).to receive(:is_a?).with(MyClass).and_return(true), or
  • use contracts-rspec gem, that patches instance_double RSpec helper.

Custom validators

Since custom validators have finally landed here: egonShiele/contracts.ruby#159, now you can just override :class validator to accept all RSpec mocks:

1
2
3
4
5
6
7
# Make contracts accept all RSpec doubles
Contract.override_validator(:class) do |contract|
  lambda do |arg|
    arg.is_a?(RSpec::Mocks::Double) ||
      arg.is_a?(contract)
  end
end

Now, RSpec mocks will not violate all the :class contracts.

More information can be found here: Providing your own custom validators.

Additionally this refactoring enabled valuable speed optimization for complex contracts - validators for them will be evaluated only once and memoized.

Links

Thanks!

If you have any questions, suggestions or just want to chat about how contracts.ruby is awesome, you can ping me on twitter @tdd_fellow. If you have any issues using contracts.ruby you can create issues on corresponding github project. Pull requests are welcome!

Comments on hackernews.

Happy coding! @tdd_fellow on twitter.

Docker Machine Guide (VirtualBox on Mac OS X)

docker, docker-machine, guide, macosx, virtualbox

(VirtualBox on Mac OS X)

This guide is a combination of official docs and usage experience.

Installation

Install virtualbox first virtualbox downloads.

Look at this page: https://github.com/docker/machine/releases/ and pick latest release. At the time of writing this is v0.4.1.

Now, assign version number to an environment variable, together with your architecture:

1
2
DOCKER_MACHINE_VERSION=v0.4.1
DOCKER_MACHINE_ARCH=darwin-amd64

Now download docker-machine binary and put it onto your PATH (recommended is ~/bin/):

1
2
3
4
5
mkdir -p ~/bin
URL=https://github.com/docker/machine/releases/download/${DOCKER_MACHINE_VERSION}/docker-machine_${DOCKER_MACHINE_ARCH}
OUTPUT=~/bin/docker-machine
curl -L ${URL} > ${OUTPUT}
chmod +x ${OUTPUT}

If you still haven’t yet, put ~/bin/ on your PATH with export PATH=$PATH:$HOME/bin into your .bashrc or .zshrc (or whatever shell you use).

Installing docker client

1
2
3
4
URL=https://get.docker.com/builds/Darwin/x86_64/docker-latest
OUTPUT=~/bin/docker
curl -L ${URL} > ${OUTPUT}
chmod +x ${OUTPUT}

Creating your first docker machine

1
docker-machine create -d virtualbox dev

This is how you create docker machine with name dev having virtualbox as a backend.

But after some time you will encounter a problem of running out of memory. So my recommended command to create your primary development docker machine is this:

1
docker-machine create -d virtualbox dev --virtualbox-memory "5120"

This will create virtualbox VM with enough memory to run low-to-moderate size clusters with docker-compose. Which should be enough for development.

This should be your primary docker machine that is always activated and used. There is no need to destroy and re-create this dev machine unless you are testing some edge-cases. And better to use additional docker machine with different name for this.

Connecting to your dev docker machine

1
eval $(docker-machine env dev)

After this command, you will have everything you need to run docker in the same terminal:

1
2
# Using alpine image here because it is only ~5 MB
docker run -it --rm alpine echo 'hello world'

You should see:

1
2
# .. pulling alpine image here .. and:
hello world

It might be annoying to run eval $(docker-machine env dev) each time you open new terminal. So feel free to put this line into your .bashrc or .zshrc or whatever shell you use:

1
2
# in .bashrc:
eval $(docker-machine env dev)

If you have just powered on your Mac (or just stopped your docker machine) you will experience this error:

1
Error: Host does not exist: dev

In that case just start it with:

1
docker-machine start dev

And re-open your terminal.

Dealing with docker machine’s IP address

Fact: docker machine’s IP address stays the same, usually 192.168.99.100, unless:

  • you destroy your docker machine dev, create another VirtualBox VM and create docker machine dev afterwards, or
  • you have custom VirtualBox configuration.

Given that docker machine’s IP address stays the same or changes very rarely, you can simply put its IP address in your /etc/hosts.

First, figure out current docker machine’s IP address:

1
docker-machine ip dev

And put it in /etc/hosts:

1
2
3
4
# in /etc/hosts

# Put IP address previous command has returned here:
192.168.99.100 docker-dev

To test that it works correclty try to run:

1
docker run -it --rm -p 80:80 nginx

And now reach http://docker-dev in your browser - you should see default Nginx page.

If you want to refer docker machine dev in your scripts, it is better to use $(docker-machine ip dev) capabilities for that. For example, curl-ing the page we have seen in browser just now:

1
curl $(docker-machine ip dev)

For teams it would make sense to agree on the same name for primary development docker machine. dev works just great!

NOTE: personally, I use both docker-dev and just dev as a hostname to type less, but that might clash with something else, so docker-dev it is.

Upgrading

To upgrade docker-machine or docker binaries, just follow Installation instructions again.

To upgrade docker server inside of already running docker machine, use:

1
docker-machine upgrade dev

This will update to the latest version of docker server and boot2docker image.

Re-creating a fresh dev docker machine

1
2
docker-machine rm dev
docker-machine create -d virtualbox dev --virtualbox-memory "5120"

Further reading

Comments on hackernews.

Happy hacking! @tdd_fellow on twitter.