“What do you mean by ‘software engineering practices’?” — was the developer’s response.
When I explain what I mean by these three words (whose meaning, apparently, I took for granted), the person starts listing technologies. Lately, this conversation has been repeating a bit too frequently for my liking.
Look, I get it. Knowledge of and experience with technologies are vital for a software engineer. Nevertheless, the fundamental practices of software engineering are just as important.
Let’s discover why.
Why do these practices exist in the first place, and why are countless books written about them? Why is the industry constantly improving the existing ones and inventing new ones?
Why isn’t a bunch of technology skills enough?
That’s because we all want to get stuff done fast. And we want it done with a high level of quality. And we want it to be valuable.
The “Holy Grail of Software Development” is the ability to deliver fast and with high quality.
While an innovative programming language or framework can improve speed and reduce the chance of making mistakes, its effect on “speed x quality x value” is negligible.
What matters then?
How the software is developed has a tremendous effect on speed and quality:
the processes applied, the techniques used, the disciplines practiced.
Just not necessarily the good ones.
Slap some code together, copy-paste a lot from StackOverflow and your previous project, and get it to production as quickly as possible? That, too, is a software engineering practice.
It gives you great speed (at first) and, of course, reduces quality substantially. Do that for a few weeks, and you’ll inevitably slow down because you’ll constantly be debugging the bugs you produced.
On the other hand, there is the practice of unit-testing and integration-testing the code and refactoring it until it becomes “Clean Code.” This is a simple example of a few good practices applied:
Let’s say that you’ve committed to using a whole bunch of best practices, such as keeping your code decoupled and flexible using design patterns. Moreover, you’ve decided to exhaustively implement and test all the different scenarios that you think could happen in the future.
These are all that many would consider best practices.
That sounds quite reasonable; however, applied in the wrong context, it could lead to the following:
As you can see, the idea was to increase quality, increase speed in the future, and deliver as much customer and business value as possible.
What happened instead?
The quality was excellent — that’s all good!
The speed, however, was slower initially, and it stayed slower because none of the expected scenarios came to pass.
The one thing you can count on in software development is change (just not the change you expected). Prepare to have your expectations and assumptions hammered all the time.
Finally, too much flexibility harmed the business value.
The same set of practices could’ve been fantastic in another context, one where you know precisely what will happen in the future. An example scenario is when you’re rebuilding an existing successful system.
In that case, you would have gotten lots of speed, quality, and business value throughout the endeavor.
What are these practices based on? How do you detect whether the context is right or not? When should you apply them, and when not? How should you apply them?
You guide the choice of practices and patterns by software engineering principles!
When you adopt multiple principles (such as YAGNI, DRY, SOLID, etc.), they will guide you when you need to act in specific ways (e.g., apply one practice or another, or make this or that choice).
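To make this concrete, here is a small, invented JavaScript example (the domain and names are mine, purely for illustration, not from any particular codebase) of how the YAGNI principle might steer a choice between two designs:

```javascript
// Speculative version: flexibility nobody asked for yet. YAGNI advises
// against building this until a second currency or rounding mode appears:
// function formatPrice(amount, currency, locale, roundingStrategy) { ... }

// YAGNI-guided version: solve only today's problem, simply and directly.
function formatPrice(amount) {
  return "$" + amount.toFixed(2);
}
```

If the flexible version is never needed, the simple one has cost nothing; if it is needed later, the simple one is easy to extend at that point, guided by a real requirement instead of a guess.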
The challenge would be when multiple principles you’ve adopted conflict with each other.
That’s when you’ll have to make trade-offs.
Principles should be based on values: the values that you, your team, and your organization adopt.
For example, when I was talking about the “Holy Grail of SWE,” I mentioned “speed x quality x value” as the three most important ones. That’s a value statement.
And who said that these three values are a silver bullet for success? Nobody. Every person and organization can have its own set of values.
However, for any organization or group to achieve a big goal and execute a complex strategy, it would need to align everybody involved on the same set of values.
Once you and your group have agreed on what values you should adopt to achieve the challenging long-term goals, you must pick or create the principles that will get you there.
Picking existing principles that are tried and tested to improve software engineering in terms of the values you’ve chosen is the most straightforward approach. However, that shouldn’t deter you from thinking about a custom set of principles that would work for your group’s values in particular.
I’ll give you a few values that I usually go for in most software products, where the only constant is change itself:
Once these values are defined, it’s pretty simple to pick the principles to guide the decisions.
And given the principles, you can pick or design your software engineering practices.
Most of these are part of the original Extreme Programming, which is no surprise: the values I listed above are almost the same as the values of XP.
Moreover, there are a few more modern principles and practices that can be considered an evolution of XP.
Now I want to return to that situation where the developer responded with a list of technologies when asked about practices.
Here’s the thing: practices, principles, and values are universal. No matter which technology you use, you can rely on your practices, as long as your principles and values haven’t changed.
Moreover, once you master the practices and principles, you’ll be able to change technologies and tools like gloves while keeping all your values high and delivering excellent results!
Of course, if you are just starting out in software engineering, knowing specific technologies is the most essential thing for you.
That’s because your effectiveness is a product of your experience with your tools and of the practices you apply; and to use and understand the practices, you need at least a bit of skill with a programming language and basic problem-solving.
However, beyond that, the equation:
TOOLS SKILL x PRACTICES SKILL x SOFT SKILLS
kicks in!
So the most effective way to develop as a software engineer is to build a robust set of both hard and soft skills, and to learn and apply the best software engineering practices (the ones applicable in your context).
Thank you for reading!
If you have other ideas on the topic, feel free to send me a Tweet!
It’s the third one that is the most worrying because it happens way too often, and it’s probably more harmful than you think.
This article isn’t preaching against new technologies, architectures, etc.
Instead, I’d like to point out that in about 95% of cases, they are introduced years before they are actually necessary. Instead of making the software more straightforward, this makes it more complex.
What should we do instead if we want to learn more new stuff then?
Well, one thing is to invert your focus. Instead of chasing something more complex, take on the challenge of making what you have much simpler.
In fact, making the existing codebase and architecture more straightforward than it already is presents a considerable challenge, in the course of which you are guaranteed to learn new things!
Suppose you feel the urge to learn something new and deal with a more complex task. Perhaps you should talk to your business stakeholders and ask whether they have an idea that they don’t even want to mention to product & engineering (because they think it’s too challenging or impossible).
These types of ideas are where there is real gold for learning, innovation, and valuable complexity!
If you have other ideas on the topic, feel free to send me a Tweet!
First, let me give you some context.
At Pivotal we pair all the time. It is very rare to see anyone work solo. Pairing does not end with pair programming; it also includes other roles, such as design and product. Pairing is also, at times, cross-functional: a product designer pairs with a software engineer, or a product designer pairs with a product manager, and so on. I love that.
With pairing, not everything is so shiny.
Now and then, you need to pair with someone you have never paired with before. That happens with new hires, team member rotations between teams, and cross-team pairing sessions.
We are all humans, so from time to time it feels like you are not getting along very well with your pair. There is a certain amount of tension. That, of course, harms your productivity and also drains your energy.
On the other hand, the chemistry between the two of you might be so good that you are just having fun the whole day and enjoying the pairing session, while the amount of work being done is suboptimal.
A pairing session might also go poorly for reasons other than chemistry. The pair could have chosen an unsuitable style of pairing (one of: ping-pong, switch-on-red, driver-navigator, etc.). And from time to time, the way the pair solves the problem can be improved, too.
Here is a technique that we utilize whenever we have such problems. First, ask your pair if they want to apply this technique, and reserve 5 minutes at the end of the pairing session (the end of the work day is the most suitable time). Then, when the time comes, give each other feedback in the format of “Pluses and Deltas”:
When both members of the pair are proficient with this technique, most of the problems can be resolved within 2-4 pairing sessions. By “proficient” we mean that both are open to receiving feedback and are capable of calling out difficult things without triggering defensiveness on the receiving side.
If the whole team applies this technique daily for two or three weeks, they will have to stop using it, because there will be nothing left to talk about. That means the team has solved most of its problems, and there is no longer a need for the daily application of the technique.
All these problems are not unique to pairing; they are inherent to collaboration. Essentially, any team will have them. It is just that, in non-pairing environments, these problems become apparent only after months of work, all while they keep harming productivity and people’s happiness for those long months.
With pairing, these problems become apparent immediately, so you can start fixing them on day one, and not after half a year of broken collaboration.
If you pair often, or if you have any other sort of collaboration within the team, I recommend trying this technique out. It will take some time to get proficient at giving feedback.
There is a marvelous talk by Dan North on how to provide effective feedback in different contexts.
Thank you for reading, my dear reader. If you liked it, please share this article on social networks, Reddit, and Hacker News, and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach me out on Twitter: @tdd_fellow.
A year ago and earlier, I was pulling late nights almost every single day: making an open-source contribution, working on an astounding side-project idea, writing a blog post, watching a conference talk or a fun video. As an owl type, I have a lot of bright ideas at that time of day. My productivity is at its peak, too. Or at least it feels that way.
That usually meant that I would get to work the next day somewhere between 10 AM and 12 PM. Mostly, that was okay, because a lot of other developers around me did the same thing.
So, imagine having your work day (for your employer, client or yourself) start this way:
You can see that my work day wasn’t as productive as it should have been. One may say that I was simply shifting focus to my own activities and investing less productivity into the work for my employer/client/etc. But I have another story to tell here.
I had been noticing for some time that all the work I did for myself in the late evenings and at night needed to be reworked all the time. To put it simply, the quality of my most “productive” work was not as good as I would expect it to be.
Additionally, since I was working in a pair-programming environment and everybody started their work day at a different time, it was hard to pair-program all the time; there was a lot of time when I had to solo. When I joined Pivotal, I was amazed by the fact that they pair-program nearly 100% of the time. I enjoyed that from the very first day. Pairing 100% of the time also means that the whole team has to start and end the work day at the same time. We start early in the morning and end our work day at 5 PM every day.
Also, at Pivotal we finish all of our daily meetings within 10 minutes right at the beginning of the day, and we are off to work right after. That builds a crazy momentum of getting things done and makes us productive.
That kind of momentum results in a lot of finished work. The amount of work done before lunch ultimately exceeds the amount of work I remember doing after lunch and, especially, in the evening.
Even though I feel more productive in the late evening and at night, the practical results show that my most productive work is done in the morning, as long as I maintain that kind of momentum.
“Maybe it is only about work; what about the personal things I need to do?” one might ask. I have another story for you. A couple of months ago I started developing a “mini habit”: waking up every day at 6 AM. That gives me about 50 minutes of free time before I need to leave for the office. Add to that a 35-45 minute commute on the subway. Now and then, when I am talking to my pair during our break (we use the Pomodoro productivity technique), I realize that by 9:05 AM I have already done all the personal things I had to do that day. These things include morning exercise, household chores, contributing to open source, writing a blog post, reading a book, sending a letter, etc. By the way, most of these things are also mini habits.
Another thing I am experiencing right now is that the quality of my work is much higher than before; I have to do much less reworking. I still get the most remarkable ideas in the evening and before going to sleep, so what do I do about them? I note them down and go through them the next morning or later.
I am amazed by how waking up every morning earlier than you normally have to allows you to build up significant momentum, complete all the personal tasks that need to be done, and transfer that increased momentum to your work. It works very well for me, and I recommend trying it, even if you are an owl type.
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach me out on Twitter: @tdd_fellow.
Or did you ever have an incident happening in production, when half of your vocabulary at the time was not exactly normative? And did you feel stressed and down afterward?
Or did you ever have to converse with a difficult person, or with someone holding a strong opinion, and think not very highly of them? Or even dismiss a valid argument just because you didn’t like the conversation?
Did you ever feel or do something negative, and think afterward that it would have been better not to? Then the technique described in this article is perfect for you!
Shall we get the ball rolling?
About one year ago, at the Software Craftsmanship meetup in Berlin, I was talking to Daniel Irvine about a technique he was applying daily. I was amazed by how challenging it is and what results it can potentially yield. Since then, I have been using this technique to increase my awareness of how I feel, what I think, and what I am about to say.
It gives me the ability to respond to events around me instead of reacting to them. A reaction is the immediate reply executed by the subconscious part of our brain. A response is the delayed reply produced by the conscious part of our brain. Responding is much better, because it gives us time to think about what the best response to the current situation would be. It is especially better when something negative or weird is happening around us.
Enough preaching; let me tell you about the challenge itself: for twenty-four hours, do not say anything negative and do not think negatively.
Over this year, I have learned to detect a negative feeling before it manifests as a negative thought, judgment, or phrase. That helps me be calmer, be more patient, and not spend time on insignificant things. If I know that something makes me unhappy or confuses me, I can call it out and fix the problem without making the people around me feel bad about it. Additionally, it makes me feel better.
Another thing I have learned is that, as soon as I am tired, I lose these abilities and fall back to reacting instead of responding. A good night’s sleep helps a lot with that ;)
I highly recommend you try this challenge. Caution: it is very hard.
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach me out on Twitter: @tdd_fellow.
This article is the sixth one in the series “Build Your Own Testing Framework,” so make sure to stick around for the next parts! You can find all posts of the series here.
Shall we get started?
We will start from the `RunTestSuiteTest` and run the test suite with a single test. Then we are going to assert that the test with the name `testOk` has been reported as passing:

(code listing omitted)
If we run this test suite, we can see that only one test executes!

(test output omitted)
Oh, that is interesting. This test suite does not run. Upon investigating, it turns out that `process.exit(0)` is being called during the run of the `runTestSuite(...)` function. That is because of the latest feature we implemented: “exit with an appropriate exit code (zero for success, and one for failure).” We should be able to fix that by providing a process spy in the options of the `runTestSuite` function that we call from inside the individual tests in the `RunTestSuiteTest` test suite. We also ought to alleviate this kind of mistake somehow: we need a mechanism that alerts us if not all the tests have been run. Maybe something like a `verifyAllTestsRun: true` option for `runTestSuite`. For that, let’s write a test:

(code listing omitted)
That might be a bit complex at first, so let’s take a closer look at how this test is supposed to work: it calls `runTestSuite` without a process spy provided. If we run this test, it will pass. That is unexpected, because we wanted it to fail. Apparently, the innermost `runTestSuite` is doing `process.exit(0)`.
For that to work, we need to be able to provide a hook into the `process.exit(code)` function, so we will create a `SimpleProcess` class that allows the installation of such hooks. Let’s test-drive it!
`process.exit` with hooks

First, we should start from the normal behavior, without any hooks:

(code listing omitted)
When running this test, we will get a failure about `SimpleProcess` being undefined. So let’s define it:

(code listing omitted)
If we run our test suite now, we will get the error `TypeError: process.exit is not a function`. To fix that failure, we have to define the `exit(code)` method on our newly created `SimpleProcess` class:

(code listing omitted)
After doing that, we will get an assertion failure, `Error: Expected to equal 0, but got: null`, as expected. To make the test pass, it is enough to call `globalProcess.exit(0)`:

(code listing omitted)
If we run our test suite now, we will get no failures. That is great! Now, we can see that `globalProcess.exit(0)` is probably not exactly what we want to have there: we ought to pass the `code` parameter through to the `exit` function. To test-drive this properly, we have to triangulate, i.e., add another test with a different value of the `code` parameter:

(code listing omitted)
That fails as expected: `Error: Expected to equal 1, but got: 0`. To make it pass, we can either write some weird “if” statement, or we can pass the `code` parameter through to the `globalProcess.exit` function. The second option is simpler, and according to the third rule of test-driven development, we should go for it:

(code listing omitted)
That change makes our test suite pass. We should probably refactor the test suite now to reduce the level of duplication by extracting common variables from the tests:

(code listing omitted)
At that point, we can move on to the tests for the hook installation functionality. Because right now we need at most one hook, we will not support multiple hooks at the same time, only one:

(code listing omitted)
When we run this test, it fails because the `installHook` function is not defined: `TypeError: process.installHook is not a function`. So we should define it:

(code listing omitted)
Upon running these tests, we get `Error: Expected to be called`, because we didn’t call this hook yet. The simplest way to make the test pass is to just call the hook right from the `installHook` function:

(code listing omitted)
While that makes the tests pass, it is not the behavior we are after. To drive out the correct behavior, we ought to check that the function is called only after `process.exit(..)`, and not earlier. For that, we need a sanity-check assertion:

(code listing omitted)
That fails as expected with the error `Error: Expected not to be called`. To make it pass, we need to store the function in a variable and call it from `process.exit(..)`:

(code listing omitted)
All the tests pass now! Finally, we want to be able to uninstall the hook, so let’s write a test for it:

(code listing omitted)
To make it work, it is enough to introduce this function and set the `hook` variable back to `null` in it:

(code listing omitted)
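Pulling the steps above together, the `SimpleProcess` class we have test-driven would look roughly like this. This is a sketch reconstructed from the prose (the exact listings are not reproduced here), and the constructor taking the global process object as a parameter is my assumption:

```javascript
// Sketch of SimpleProcess: delegates to the real process.exit by default,
// and supports installing a single hook that fires just before exiting.
class SimpleProcess {
  constructor(globalProcess) {
    this.globalProcess = globalProcess;
    this.hook = null;
  }

  exit(code) {
    if (this.hook) this.hook();     // fire the hook right before exiting
    this.globalProcess.exit(code);  // pass the code through (triangulated)
  }

  installHook(fn) {
    this.hook = fn;                 // only one hook is supported for now
  }

  uninstallHook() {
    this.hook = null;
  }
}
```

In tests, `globalProcess` can be replaced with a fake object, so `exit` never actually terminates the test run.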
And all the tests will pass. Now we also want to replace the default value of the `options.process` option with an instance of the `SimpleProcess` object. All the tests should keep working as they did before:

(code listing omitted)
Now we can get back to our “verify all tests run” test. It still doesn’t fail as expected, so we need to install the hook, count all the tests, count the tests that have already run, and compare the two in the hook:

(code listing omitted)
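The counting described above might be sketched as follows (the helper name and the counter object are my own inventions; the real `runTestSuite` keeps this state internally):

```javascript
// Sketch of the verifyAllTestsRun mechanism: when the option is enabled,
// runTestSuite installs a process-exit hook that compares the number of
// tests defined with the number of tests that actually ran.
function installVerifyAllTestsRunHook(processObject, totalTests, state) {
  processObject.installHook(function () {
    if (state.testsRun < totalTests) {
      throw new Error("Expected all tests to run");
    }
  });
}
```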
At this point, this throws the error `Expected all tests to run` and finishes the test run without ever reaching our `assertThrow(..)` assertion. That happens because we catch this error in the `runTest` function, where we mark the test as failed, log the error, and ignore the error object itself. One way to solve this problem is to have a special kind of error that can propagate up the stack:

(code listing omitted)
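One way to sketch such a special error in JavaScript (the class name and the reporter interface here are my assumptions, not the series’ exact code):

```javascript
// A dedicated error type that the generic catch in runTest recognizes and
// re-throws, so it propagates up the stack instead of merely being logged
// as an ordinary test failure.
class TestsNotFinishedError extends Error {}

function runTest(test, reporter) {
  try {
    test.body();
    reporter.reportSuccess(test.name);
  } catch (error) {
    if (error instanceof TestsNotFinishedError) throw error; // propagate
    reporter.reportFailure(test.name, error);                // normal failure
  }
}
```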
Now our current test is passing, and the next test is failing with the error `Expected all tests to run`. That happens because we have not uninstalled the hook once it has been triggered. Let’s do that:

(code listing omitted)
That makes the next test run, succeed, and exit immediately afterwards with exit code zero. Let’s see what happens if we put `verifyAllTestsRun: true` on the top test suite here:

(code listing omitted)
That doesn’t work, because we re-install a different hook inside of this test, and as soon as this test finishes, we uninstall it. So we have two ways out of this situation: allow multiple hooks, or move that single test into its own test suite file. I think the second option is much simpler. We will also add a test for the negative case, where all the tests run correctly (when we provide a proper process spy):

(code listing omitted)
And this new test suite passes as expected. Just to double-check that these tests verify anything at all, we can break them (change the expected error message and change `assertNotThrow` to `assertThrow`) and see if there is a failure and whether it looks as expected:

(code listing omitted)
And it fails as expected, which means that our refactored tests still work as they should.
We have just applied a neat technique here: whenever we do a major refactoring in tests, we need to make sure they still function correctly. For that, we break every single one of them (by changing an assertion or breaking the production code) and see if they fail the way we expect them to. When they don’t, we know that the refactoring didn’t quite work.
Now we can go back to the `RunTestSuiteTest` and see if it works as expected without that test. And it does: `Error: Expected all tests to run`. To fix that, we need to provide a process spy in every inner call to `runTestSuite`. For that, we will first extract `{reporter: reporter}` as a common variable of the test suite:

(code listing omitted)
And to make the error go away, we can now create a process spy and provide it through the options:

(code listing omitted)
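A process spy in this context is simply an object with the same interface as `SimpleProcess` that records calls instead of terminating anything. A minimal sketch (the helper name is mine, not from the series):

```javascript
// A minimal process spy: the same interface as SimpleProcess, but it only
// records the exit code instead of terminating the test run.
function createProcessSpy() {
  return {
    exitCode: null,
    exit: function (code) { this.exitCode = code; },
    installHook: function () {},
    uninstallHook: function () {},
  };
}
```

A test can then pass the spy through the options, along the lines of `runTestSuite(SomeSuite, {reporter: reporter, process: processSpy})`, and afterwards assert on `processSpy.exitCode`.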
If we run the tests now, they all pass, and we can see that they all execute. Now we just need to double-check that every test suite with inner calls to `runTestSuite` has the `verifyAllTestsRun` option enabled. The only other such test suite is the `FailureTest`. Adding the option there does not produce a failure, because that test suite already uses a process spy in all its inner calls to `runTestSuite`.
Today we learned that it is tricky to work with `process.exit`, or with any other function that can exit our program in the middle of a test. Such functions need to be mocked out completely inside the tests. We also learned that it is possible to make sure we don’t forget to do that. That is quite important, because if we do forget, everything runs smoothly and we never find out that we made a mistake.
There is still a lot to go through. In the next few episodes we will:
See you in the next exciting article of the series: “Formatting the Output”!
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach me out on Twitter: @tdd_fellow.
Today we are going to learn the basic principles behind the Test-Driven Development discipline. We will learn the three rules of TDD and the benefits of doing Test-Driven Development, and we will take a look at an example application of these laws.
The articles in this series include exercises, and going through them makes for more effective learning. If you feel stuck with an exercise, feel free to shoot me an email or get in touch on Twitter: @tdd_fellow.
“Learning Test-Driven Development with Javascript” is a series of articles, and you, my dear reader, can shape its content by providing invaluable feedback. To do that, shoot me an email: oleksii@tddfellow.com.
A test is a single program, procedure, or function that executes our system under test and verifies that it works as expected. The system under test is any other program, part of a program or library, procedure, or function. The system under test is called the SUT for short.
The SUT is the code that executes with the purpose of satisfying the needs of our end users. By the term “end user” we mean both actual people using our system via a user interface (graphical or non-graphical) and other automated systems using our system via an application programming interface (API for short). A different term for the “end user” is “consumer of the system.” Another term for the SUT is “production code”; we are going to use the latter, as it is used in Test-Driven Development much more often.
Test-Driven Development is based on a simple concept: write production code only when there is a failing test that demands that production code in order to pass.
We call a test failing when some error happens as we execute it. The possible failures are the following: there is a syntax error, the code does not compile, there is a runtime failure during test setup or production code execution, or there is an assertion error. The assertion error is a particular kind of error that happens during the final phase of the test, where we verify the outcomes after running the production code under test. An assertion error signals that the production code executed successfully, but produced incorrect results or changed the state of the system in a wrong way.
We call a test passing when no errors happen as we execute it. We also call a failing test “a red test” and a passing test “a green test.” That is because we cannot deliver the code to our customers or consumers while there are “red” tests (think of a red traffic light), and we are free to go when all the tests are “green” (think of a green traffic light). Most testing tools and frameworks format their output to present failing tests in red and passing tests in green.
Multiple tests aiming to test a single SUT or a particular feature of the system are usually called a test suite. Depending on the context, “test suite” can mean a collection of tests all testing the same thing, or it can mean all the tests of the entire system. For example, in the sentence “Let’s read the `User` class’ test suite,” the phrase means the collection of tests testing the class `User`. On the other hand, in the sentence “Let’s run the whole test suite and see if we can deploy right now,” the phrase means all the tests of the entire system. The latter is sometimes called a “suite of tests.”
When the system behaves in an unexpected way, and the expected behavior was previously defined or present in the code, we call it a “bug.” For example, behavior that is specified by the development team but not implemented correctly is considered a bug. Behavior that was defined by the development team and implemented correctly, but is no longer working, is also considered a bug. And, finally, behavior that was never defined may or may not be considered a bug; that depends on the produced results, i.e., whether it causes harm or brings any value. This phenomenon is called a “bug” for historical reasons: the first bug in computing was a real insect that got stuck in a computer’s hardware and caused short circuits that made the computer misbehave.
At the core of Test-Driven Development, there are three steps that we need to follow:
We repeat these steps over and over until we finish the implementation of the system under test. Strictly following these steps locks us into a very tight loop, where we switch between test code and production code all the time: write one or two lines of test code, then write or change one or two lines of production code, and repeat. This cycle is probably twenty or thirty seconds long. It depends, of course, on how fast we can run our tests. Ideally, we want our test suite for the current system under test to run as fast as one clap of the hands, or a blink of an eye.
This tight cycle gives us the following benefits:
The most important of these benefits is the confidence to make any change to the software and know within a minute or two, with a simple push of a button, whether that change is good to be delivered to the end user. No manual quality assurance (QA) testing cycles are required. Let’s take a look at an example application of the three rules of test-driven development. We will start from a very simple example, so that we don’t have to touch more advanced TDD techniques.
Given an integer number
When I call the system with that number
Then I receive a string representation of that number in the English language

For example:

- for number `37` I receive `"thirty-seven"`
- for number `-17451` I receive `"minus seventeen thousand four hundred fifty-one"`
According to the first rule of TDD, we have to start the implementation from the test. In test-driven development it is important to start with the simplest test that can be satisfied by a small, simple change to the production code. For example: for the number zero, we expect the result to be “zero.” When writing the first test for new functionality, we are also designing its API. In our case, we have to come up with the function name and its argument list:
As soon as we write toEnglishNumber(number), we have designed the function’s signature, at least for the single simplest case. The test suite is also failing now, because toEnglishNumber is not a function - in fact, it is undefined. That means we have entered the red stage of test-driven development, and according to the second rule of TDD we have to switch back to the production code. According to the third rule, we have to write just enough of it to make the failing test pass. That means writing the simplest and easiest code possible. In this case, we could just return nothing (null in JavaScript):
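A minimal sketch of that first bit of production code, reconstructed from the description (the original listing is not shown):

```javascript
// The simplest and easiest code possible: return nothing.
// The argument is there only because the test designed the signature.
function toEnglishNumber(number) {
  return null;
}
```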
This is going to turn our test suite back to the green stage. At this point it is a good idea to look at both the test code and the production code and see if there are any opportunities for refactoring, such as better names, extracted methods/functions, clarified variable names, de-duplication, etc. Because we are currently in a green state, we can safely apply any refactoring, automated or manual, and see if it was successful by running the tests again. If for some reason a test fails after the refactoring, we always have the option to CTRL/CMD+Z back to the green state, back to safety. At this point, we have finished one cycle of the rules of TDD, so we start over and go back to our test code. There, we either extend the existing test to be more specific, or we add more tests. Since we haven’t finished writing the test yet - we have only the “arrange” and “act” parts and are missing the “assert” part - we ought to extend the current test to make it more specific. So what can we assert about the result of the toEnglishNumber call? It probably ought to be the string “zero.” So, let’s make an appropriate assertion:
If we run our tests right now, they will fail. We should take a careful look at the failure and see if the failure message is readable and is what we expected. In the current situation, we will receive an assertion failure telling us that null was not equal to “zero.” Imagine now that we are working on some other feature and are not in the context of the English number conversion feature. What would our reaction be when we run this test and see this failure? We would probably be confused for a moment, jump to the line in the test suite’s source code that produces the error, and try to understand what happened there. That is a huge context switch, and it disrupts our flow. We can do better and make the failure much more readable by adding one small thing - a cue for what that null was supposed to be: “english number”:
In Jasmine, the second argument to the toEqual matching function is the description of the failure. I think our test looks great now, and so does the failure in the test output. And because we have a good failing test, according to the second rule we must not write any more test code. Now we should make the test pass, according to the third rule. And what would be the simplest way to do that? Return “zero”:
At this point, we should look for refactoring opportunities, and I don’t think there are any yet. So let’s start the cycle over. To apply the first rule, we can just copy the existing test and change the input and the expected output accordingly. In that case, we would want to test another simple one-digit number - “one”:
Now, because the test is failing, we apply the second rule and switch back to the production code. To make the test pass, according to the third rule, we introduce the simplest change possible: in this case, an “if” statement (and we will use === instead of == for comparison, to avoid implicit type conversions in JavaScript):
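A sketch of what the production code might look like at this point, following the steps described above (the original listing is not shown):

```javascript
// Just enough production code to pass the "zero" and "one" tests.
// Strict equality (===) avoids implicit type conversions.
function toEnglishNumber(number) {
  if (number === 1) {
    return "one";
  }
  return "zero";
}
```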
That is going to make all our tests pass. If we continue adding tests for the single-digit numbers while following these three rules, we will wind up with something like this:
At this point, we can see a clear pattern: one-to-one correspondence of a single-digit integer to the string. This code can be simplified as a pre-defined array of strings at the corresponding indices:
We also include the string “zero” in that array, because return "zero" only happens when the number is equal to zero - we don’t have any other tests right now, only from zero to nine. And the function itself will look much simpler:
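Put together, the refactored code described above might look like this (a sketch; the variable name follows the text):

```javascript
// Single-digit numbers mapped to their English names by index.
var singleDigitNumbers = [
  "zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine"
];

// The function collapses to a simple array lookup.
function toEnglishNumber(number) {
  return singleDigitNumbers[number];
}
```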
Since we are done with the refactoring, we should proceed using the first rule of test-driven development, which means we need to write a test that fails. So let’s increase the complexity of our tests and go for teen numbers. We’ll start from ten:
That will fail, because so far our production code tries to fetch the string representation of the English number from the array, using the number itself as an index. At index ten we don’t have anything, so our function returns nothing. According to the second rule of test-driven development, we need to switch to the production code to make it pass. There are a few ways to fix the current problem: add a specific if statement to the production code, or add the string “ten” to the array. The second seems simpler, and we know we can do it for the teen numbers because they cannot be composed of any smaller parts. So, according to the third rule, we should go for it, because it is the much simpler solution:
That makes our tests pass. And I think we have a refactoring opportunity: the variable name singleDigitNumbers does not make sense anymore - it contains not only single-digit numbers but also the number “ten,” which is a two-digit number. So what do single-digit numbers and ten have in common? They are simple numbers, i.e., they cannot be composed out of other English number string representations. Let’s call the variable simpleNumbers in this case:
After making this refactoring, we must not forget to run the test suite to see that we didn’t make a mistake. When we run it, all our tests pass, so we can go back to the first rule of test-driven development again. Going through this cycle a few more times will produce tests for the numbers from eleven to nineteen. The production code will only have those numbers’ English string representations added to the simpleNumbers array:
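The state of the code after those cycles can be sketched as follows (reconstructed; formatting is an assumption):

```javascript
// "Simple" numbers: those that cannot be composed
// out of other English number representations.
var simpleNumbers = [
  "zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
];

function toEnglishNumber(number) {
  return simpleNumbers[number];
}
```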
Now it is time to introduce the concept of a complex number - in our case, a number that consists of a “tens” part and a single-digit part - such as twenty-three. According to the first rule, we have to start with a failing test, and I think we should just go for the number twenty-three:
If we run our test suite, that test will fail. According to the second rule of test-driven development, we now have to switch to the production code. According to the third rule, we have to choose the simplest code that makes this test pass (and doesn’t break any other test). In our case, we have multiple options of similar simplicity. One of them is to return “twenty-three” if the number is greater than or equal to twenty:
If we run our test suite, all tests will pass. Now we should see if there are any opportunities for making the code more readable and easier to understand. While the whole if statement returning a constant might feel strange, there is already a concept in there that we can give a name to: “three.” We can already obtain the string “three” from the number three using toEnglishNumber(number). Let’s try this refactoring:
That code now looks interesting. And it passes all its tests. Since we are, of course, not done with the implementation yet, according to the first rule of test-driven development we ought to write another failing test. We have a multitude of choices for what it could be: we can come up with some other random two-digit number, such as forty-two, or we could keep the “twenty-” part and change the “three” part to “seven,” for example. We could also change the “twenty-” part to “thirty-.” Generally, in test-driven development it is better to go for the test that will cause the smallest change to the production code - later we will explore why. So we should go for twenty-seven, as it will cause the smallest change to our production code:
This test is failing, as expected. According to the second rule, we have to switch to the production code and make it pass. The simplest change (third rule) we can make is to replace “3” with the last digit of the number - the remainder of the division by ten, “number % 10”:
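A self-contained sketch of the production code with the remainder in place (the simpleNumbers array is repeated so the sketch runs on its own):

```javascript
var simpleNumbers = [
  "zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
];

// Any number >= 20 starts with "twenty-" so far; the last digit
// is the remainder of the division by ten, converted recursively.
function toEnglishNumber(number) {
  if (number >= 20) {
    return "twenty-" + toEnglishNumber(number % 10);
  }
  return simpleNumbers[number];
}
```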
Now, if we run our test suite, all the tests will pass. The “remainder of the division by ten” part looks right, but the “twenty-” constant still feels like it is not going to work for every two-digit number. I think it is time to write a new failing test (first rule) - the test that will prove that “twenty-” is incorrect code. We just need to change the first digit of the number, so we could go for forty-two:
As soon as we finish writing the assertion, we will have a test failure: we are returning “twenty-two” instead of “forty-two.” So it is time to switch to the production code (second rule). And we need to write just enough of it to make this test pass (third rule). We can do that with yet another if statement:
And this makes the test pass. It looks very similar to what we had with one-digit numbers, where we had “if” statements checking that some value equals some number and returning an appropriate string. There, we converted them to an array of string values, and in the function we fetched these strings by their index. To see if that pattern applies here, we could write another similar test that will make us write another if statement:
And this fails as expected, because our production code can in no case return “thirty-.” So let’s write the simplest if statement to make it pass. To make the code uniform, we will also wrap the “forty-” case in its own if statement, as a refactoring, once we have a passing test suite:
Certainly, there is a fair bit of duplication right now: “number / 10” and “number % 10”. Let’s extract them as variables. Also, let’s extract “twenty”, “thirty”, “forty” and “toEnglishNumber(lastDigit)” parts as variables:
Now, we can extract a function that converts the first digit into the first English part, such as “twenty” or “thirty”:
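A self-contained sketch of the code after this extraction (Math.floor is assumed here, since “/” is floating-point division in JavaScript; the exact shape of the original listing may differ):

```javascript
var simpleNumbers = [
  "zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
];

// Converts the first digit of a two-digit number
// into its English "tens" part.
function convertTens(firstDigit) {
  if (firstDigit === 3) {
    return "thirty";
  }
  if (firstDigit === 4) {
    return "forty";
  }
  return "twenty";
}

function toEnglishNumber(number) {
  if (number >= 20) {
    var firstDigit = Math.floor(number / 10);
    var lastDigit = number % 10;
    return convertTens(firstDigit) + "-" + toEnglishNumber(lastDigit);
  }
  return simpleNumbers[number];
}
```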
Now, it looks like the function convertTens
can be simplified through the use of an array, in the same way as we did before with toEnglishNumber
:
At this point, we can write more tests to cover all the different first digits - for example fifty-seven, sixty-five, seventy-three, eighty-nine and ninety-one. To make them pass, we will have to add the corresponding “tens” numbers to our array:
So, that is how we apply the three rules of test-driven development. Here is the full code:
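The full listing is not reproduced here; it presumably contains the complete Jasmine test suite alongside the production code. The production code alone, reconstructed from the steps above, might look like this:

```javascript
var simpleNumbers = [
  "zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
];

// Indices 0 and 1 are unused: numbers below twenty are "simple".
var tensNumbers = [
  undefined, undefined, "twenty", "thirty", "forty",
  "fifty", "sixty", "seventy", "eighty", "ninety"
];

function convertTens(firstDigit) {
  return tensNumbers[firstDigit];
}

function toEnglishNumber(number) {
  if (number >= 20) {
    var firstDigit = Math.floor(number / 10);
    var lastDigit = number % 10;
    return convertTens(firstDigit) + "-" + toEnglishNumber(lastDigit);
  }
  return simpleNumbers[number];
}
```

Note that round tens such as 20 or 30 would come out as “twenty-zero” - the tests written in the article do not cover them yet, so this sketch stops where the article stops.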
Today we have learned a lot of concepts from testing and test-driven development. We have learned the essence of TDD - its three rules - and how to apply them to a very simple example. We have also touched on how beneficial test-driven development can be when applied well.
In the next article of the series, we will discuss what different kinds of tests exist, how and when to write them and how to apply TDD in these tests. Also, we will get back to our application and implement a new feature.
Thank you for reading, my dear reader. If you liked this article, please share it on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
“Learning TDD with Javascript” is the series of articles where we learn the basics of automated testing and test-driven development. While the language of choice for the code examples is JavaScript, all the described concepts are language-agnostic and applicable in various technology stacks. In these articles, the reader is expected to do small exercises after each major topic, to reinforce the theoretical knowledge with practice. Some of these exercises are practical and involve coding or simple writing; others are food for thought. A reader might also want feedback on these exercises, so don’t hesitate to send the results my way: oleksii@tddfellow.com - feedback on practice is quite important, as it helps you improve quicker when you know what went well, what can be improved, and how. Also, don’t hesitate to send any questions and feedback regarding the content of these articles. Your questions, feedback, and practical results will help the authors shape this content better.
Today we are going to learn how to write tests that imitate real user interaction with the whole application. We are going to build a small web application using Vanilla JavaScript - plain JavaScript without any framework or library. Tests that imitate real user interaction via the User Interface (UI) are called End-to-End tests. These tests are the simplest to write, because we only need to think about our application the same way the user does:
We don’t need to think about specific implementation details, such as which functions and classes we have in our code and how they interact with each other, or whether there is any interaction with a back-end server or a 3rd-party API. We also don’t need to be proficient in interaction testing - that is a topic for a future series.
Of course, for that kind of simplicity we are trading something off. End-to-End tests are slower, suffer from concurrency, waiting, and timeout problems, and are harder to maintain in the long run. We don’t have to worry about that just yet, because we want to learn how to write tests in general, and this kind of simplicity is perfect for us in this case.
Such simplicity stems from the fact that End-to-End tests, mostly, are direct translations of user stories (use case scenarios) into the UI manipulation code.
User stories are scenarios describing an individual feature of the software via a user story context, sequences of user interactions, and user expectations. The user story context is the description of the situation the user and the software system are in at the beginning of the scenario. An example of the system context: “user John is registered in the system with password ‘welcome’.” An example of the user context: “the user is at the login page.” A user interaction is the description of a particular action the user takes inside the system, usually within the UI, for example: “User enters email ‘john@example.org’ in the email input field” or “User clicks on the submit button.” Finally, a user expectation is the description of what particular information the user should receive from the system, for example: “User sees the success message on the page” or “User receives the email with the verification code.”
User stories come in different flavors and formats. A story can be free-form text describing the three parts - context, interactions, and expectations - or it can be in the formal “Given-When-Then” form. The “Given” part is the sequence of user story context descriptions, the “When” part is the sequence of user interaction descriptions, and the “Then” part is the sequence of user expectation descriptions. Both forms can be used interchangeably, and some software companies consistently use one or another formal version of the user story format.
Let’s take a look at the example of the free-form user story together with the example of the Given-When-Then user story for the same feature:
| Free-form | Given-When-Then |
|---|---|
| User with email ‘john@example.org’ and password ‘welcome’ exists in the system. John enters his email ‘john@example.org’ into the email field and his password ‘welcome’ into the password field on the login page. After that, John submits the login form. Finally, John expects to see his profile page with the indication of him being logged in (the name ‘John’ is present on the page). | Given User with email ‘john@example.org’ and password ‘welcome’ exists. And I am at the login page. When I enter ‘john@example.org’ in the email field. And I enter ‘welcome’ in the password field. And I click on the submit button. Then I see the profile page. And I see my name as the title of the profile page. |
As we can see, the free form can be vague and is very flexible, while the formal form is more strict and precise. The free form on its own doesn’t have many upsides or downsides - it is as good as it is written. The formal form, on the other hand, does give us some value and trades something off for it: formal stories are generally easy to write and, because they are so specific, easy to translate into automated tests; but they may hamper creativity, either while creating the user story or when implementing it.
It is important to mention that these use case scenarios are not full user stories or features. One feature can have multiple scenarios like that - together they are called acceptance criteria. When all such scenarios of a given feature work correctly, the feature is done. There is another vital part of the user story - the general description, which should contain the rationale behind the story and the value for the user or any other important actor in the system, such as a stakeholder. We often come up with such a rationale before writing the scenarios, and it drives us to write them. For example, for the feature above we would have used something like this:
| Free-form | Given-When-Then |
|---|---|
| John needs to authenticate to the system so that he can access his private content | As John, I want to be able to authenticate to the system, So that I can access my private content |
Because formal form user stories are more precise, easier to write and simpler to translate to the automated test, we are going to use them to learn End-to-End testing in the context of Test-Driven Development.
Do you have questions? Or do you want to get quick feedback on how you did the exercises? - mail me: oleksii@tddfellow.com
Now we will be writing a simple web application, using Vanilla Javascript (ECMAScript 5), so that our setup is rather simple. For the testing, we will be using a standalone version of the Jasmine testing framework. Also, we will be writing a single page application (SPA), so that we don’t have to worry about rendering different pages in our tests for now. We are aiming for the following directory structure of our project:
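The exact tree is not shown in the source; based on the surrounding text, it is presumably something like:

```
.
├── SpecRunner.html
├── index.html
├── lib/
│   └── jasmine-2.5.2/
├── spec/
└── src/
```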
First, create the required directories: lib, spec and src. Then download the latest standalone Jasmine release here: https://github.com/jasmine/jasmine/releases (the jasmine-standalone-{version}.zip file). At the time of writing, the version is 2.5.2. Unzip that file into your project directory. You should get the following files from it:

- ./lib/jasmine-2.5.2/ directory - contains all the resources required by Jasmine.
- ./SpecRunner.html - example entry point to our test suite.
- ./src/Player.js and ./src/Song.js - example source files.
- ./spec/PlayerSpec.js and ./spec/SpecHelper.js - example automated Jasmine tests.

Now, try to open SpecRunner.html in your browser. It should run these example tests, and they should all pass. Here is how it should look:
Also, create an empty index.html
:
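The original listing is not shown; an empty single-page skeleton would presumably be along these lines (the title is an assumption):

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Our Application</title>
  </head>
  <body>
  </body>
</html>
```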
And now we should have the desired project structure. So how does the Jasmine testing framework work, anyway?
Let’s take a look at the example test file to get the gist of how Jasmine works:
The first important concept here is describe("...", function () { ... })
. describe
function is used to describe a certain concept or a certain context. For example, describe("Player", ...)
means that we are going to define tests for Player
class or some other Player
concept. Also, describes can be nested to indicate that we are describing some specific context (describe("when a song has been paused", ...)
) or a sub-concept of the current concept, such as a method of the currently described class (describe("#resume", ...)
). That is a good example of what a unit test suite might describe. In the case of End-to-End tests, we would like to describe a full feature, so describe("Login Feature", ...)
is a good bet. The second argument for the describe
is the function that will contain all tests and sub-describes for the described concept. This function is the capturing closure, so defining variables and functions on the outer-level describe will make them available on the inner-level describes and the tests themselves.
The second important concept here is it("...", function () { ... })
. it
function is used to create a test for the currently described concept, context or sub-concept. The first argument is the description of what it does, where it is the described concept. For example, given we describe a music player
and our context is when the volume is at max
, then we might write the test it("is deafening", ...)
. In the case of End-to-End tests we are describing a feature, so it will refer to our application, or application’s user interface, for example: it("shows user's nickname", ...)
. The second argument to the it
is the function with the test itself. Here we set up the stage for the test, call our main code, and verify that everything happened as we expect.
Finally, the third important concept is the expect(...).to...
. That is Jasmine’s form of assertion. That is where we verify that our code worked as we expect it to. As an argument to expect we provide an actual value. The actual value is something that our code has returned as a result of the function or method execution or something that we have read from the UI using UI manipulation code, or something that we have read from some 3rd party service, such as our back-end server, 3rd-party API or database. Essentially, this is the value that we are verifying to be correct. The second part is the Jasmine matcher - the method defined on the object, that is returned from expect(value)
call, that allows us to define what we want to assert about that value. The most used one is toEqual(...)
, which asserts that the value was equal to some expected value.
These three concepts should be enough to start writing tests. Don’t worry about everything else that you see in this example file from standalone Jasmine distribution. We will discover some of these concepts as we go. Now, let’s remove src/Player.js
, src/Song.js
, spec/PlayerSpec.js
and spec/SpecHelper.js
, and run our test suite - it is enough to reload the page to re-run it. The test run should report No specs found
:
To get the gist of how describe("...", function () { ... })
, it("...", function () { ... })
and expect(...).toEqual(...)
works, let’s write our first failing test. Also, that will let us see if the testing framework is configured correctly and is capable of showing us the test failure. Let’s create a new file called spec/JasmineWorksSpec.js
:
And we need to add this test file to the SpecRunner.html
:
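The exact markup is not shown in the source; presumably it is a script tag along these lines, added next to the existing spec includes in SpecRunner.html:

```html
<script src="spec/JasmineWorksSpec.js"></script>
```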
And if we run our test suite, we should see a failure:
And we ought to make it pass by fixing our incorrect assertion: expect(2 + 2).toEqual(4);
. If we run the test suite again, by reloading the page in the browser, all the tests should pass.
Now we can write some real tests to practice usage of describe
, it
and expect(...).toEqual
: let’s create ArithmeticsSpec.js
and write some tests for the behavior of the add
function:
Don’t forget to add the <script src="spec/ArithmeticsSpec.js"></script>
to the SpecRunner.html
. This test will first fail because the Arithmetics
module is not defined. We will define it as an empty object. The next failure is because Arithmetics.add
is not a function - it is undefined. We will define that function with two arguments inside the Arithmetics
object. Finally, the test will fail because we expect the result to be seven, but it was undefined. We will do the simplest thing we can to pass the failing test - return seven. That will make the test pass. The code will be in src/Arithmetics.js
, which we include in our SpecRunner.html
, and will look like that:
That, of course, is not a correct implementation, so we need another test to drive out the proper one - a test with different inputs and a different result. To make it pass, we will have to use a + b
:
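The resulting Arithmetics module can be sketched as follows (the module-object shape is an assumption; the original listing also contains the Jasmine tests, which are omitted here):

```javascript
// Driven out by two tests with different inputs and results,
// add() can no longer hard-code the answer.
var Arithmetics = {
  add: function (a, b) {
    return a + b;
  }
};
```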
Have you noticed three comments that I have left in the example tests’ code: ARRANGE
, ACT
and ASSERT
? These are three “A”s of writing a good test. Arrange is the part of the test, where we set up the stage: prepare input data, create objects, load resources, change the state of the system - it is the part where we create the context for our test. Act is the part of the test, where we call our system under the test. In the Arithmetics
example it was a function Arithmetics.add(a, b)
. The system under test can return some useful value or change its state. To verify that either is correct, we finally use the Assert section of our test - the part of the test where we verify the outcome of the call to the system under test.
The sections ARRANGE
, ACT
and ASSERT
are spelled out in the comments only for the reader’s convenience - usually, real projects don’t have such comments. It is worth noting that, for learning and practicing purposes, it is a good idea to name these sections explicitly in our tests - this develops a habit of recognizing which part of the test belongs to which section. Also, the formula Arrange -> Act -> Assert makes it easier to come up with the test when we lack deep experience in testing, with the particular testing framework, or with the environment.
Going back a bit to our user story scenarios: have you noticed the connection between “Arrange, Act and Assert” and “Given-When-Then”? The Arrange part of the test corresponds to the Given part of the scenario, the Act part to the When part, and the Assert part to the Then part. Let’s see it on one of our previous example scenarios:
| Test Section | Scenario Step |
|---|---|
| ARRANGE | Given User with email ‘john@example.org’ and password ‘welcome’ exists. And I am at the login page. |
| ACT | When I enter ‘john@example.org’ in the email field. And I enter ‘welcome’ in the password field. And I click on the submit button. |
| ASSERT | Then I see the profile page. And I see my name as the title of the profile page. |
Exercise: implement the remaining functions of the Arithmetics module in the same way: subtract, multiply and divide.

Now that we know, approximately, how to arrange the steps of our scenario into the test, let’s give it a shot. We will start by creating a new test file for our login feature called LoginFeatureSpec.js
. Don’t forget to put an appropriate script tag in the SpecRunner.html
. We will start by writing the skeleton of our test suite: describe
and it
inside of it. Next, we will put scenario steps as comments to our test, and we will split them into three sections: Arrange, Act, and Assert. It will look like this:
Next step would be to change the first Given comment to the function call. We should give a good readable name to that function. One straightforward option would be givenUserExists(email, password).
Another good option is addUser({email: email, password: password})
. While they are not that different, I prefer addUser
for its higher conceptual flexibility - we will likely need that function in some different scenario step in the future. While I prefer that, we should not do that yet, because we might never need the function like that, and givenUserExists
will do us more good right now since it resembles the scenario step so much. When we need this flexibility, we’ll perform a refactoring. So for now, let’s create an empty function with that name in a new file spec/FeatureSteps.js
and load this file from our SpecRunner.html
.
This empty function, on its own, doesn’t do us much good, because all our tests will pass. If we continue replacing our comment steps with such functions, we will end up with one big failure in the Assert section, and we will have to write a lot of code at once to fix it. To drive ourselves to implement the function givenUserExists properly right now, we should write an assertion right after the call. This assertion is not part of the Assert section - it is part of test-driving the functionality of our feature steps. A good assertion here is to ask our user storage mechanism whether such a user exists right after we have created that user. It is also a good idea to check that the user does not exist before we create it. In addition, we will extract the variables email and password, because we already have to repeat them all over the place. Let’s see how it will look:
When we run our tests, the breadcrumbs of test failures will drive us to create this basic functionality. First, we will create Users.js
with empty Users
module and import it from SpecRunner.html
. Then the next failure will drive us to add the method exists(email, password)
on Users
module, that will always return false
. Next, make function givenUserExists(email, password)
call Users.add(email, password)
, which in turn will make us create a function Users.add(email, password)
, which will store the email-password pair in the in-memory list of users. And, finally, we change Users.exists
to search for the email-password pair in that in-memory list. At that point, our test will pass. Let’s take a look at how these steps look in our code:
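A sketch of where those breadcrumbs lead (the names follow the text; the internal storage field and the feature-step shape are assumptions, and the surrounding Jasmine spec is omitted):

```javascript
// In-memory user storage driven out by the feature step below.
var Users = {
  _users: [],

  add: function (email, password) {
    this._users.push({ email: email, password: password });
  },

  // Naive implementation: only checks that at least one user exists.
  // Good enough for the current test; the edge cases go on the to-do list.
  exists: function (email, password) {
    return this._users.length > 0;
  }
};

// Feature step used by the End-to-End test.
function givenUserExists(email, password) {
  Users.add(email, password);
}
```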
It is funny how simple the Users.exists(email, password)
function is: it only verifies that we have at least one user. While this is not correct code, it is good enough for our current test. Since we know this code is not entirely correct, we need to remember to write the test(s) that prove it incorrect, so that we can make it proper with confidence. Because we want to finish the current test first, we should add a to-do list item to write such tests. We have two edge cases here that will not work with our implementation: when we have only one user in the system and the credentials we provide do not match, and when we have multiple users in the system and we provide correct credentials for the second one:
(code listing omitted)
Have you noticed xdescribe? It is a different form of describe that allows us to mark the whole context as pending. It won't run the tests inside, and it will mark them as pending in the test run report. That is a great way to maintain a test-driven to-do list; as we test-drive our code, we will find more of these, and the test run report marks each pending test accordingly.
Let's continue implementing our steps. The comment And I am at the login page transparently becomes a function givenIAmAtTheLoginPage(). As we have already seen, it doesn't do us any good just to replace the comment with a function call that does nothing, so we should surround it with proper assertions. We know that the login page should have a text input for the email, a password input for the password, and a button to confirm the user's intent to log in. Also, because we are developing a single-page application, we need some container for the currently active page. Let's say we need these things:
- a container with id="page" and no content, which we probably ought to define in our HTML file;
- an email input with id="email", a password input with id="password", and a login button with id="do_login".

Now that we have spelt this out, it is fairly straightforward to write assertions surrounding the givenIAmAtTheLoginPage() call:
(code listing omitted)
That fails because we don't have such an element in our HTML, so we need to create it in our SpecRunner.html. Also, it is now important to move all our <script> tags from the <head> to the <body>, below the container that we have just created. SpecRunner.html should look this way after that:
(code listing omitted)
Now, our next failure is that the givenIAmAtTheLoginPage() function is not defined. We can define it as an empty function in our spec/FeatureSteps.js file for now. This will turn our tests green again. We still haven't made our assertions about the state of the UI after the call to givenIAmAtTheLoginPage - let's do this now:
(code listing omitted)
That fails with the error Cannot read property tagName of null, which means that we don't have an #email element inside the #page container. The simplest thing to do would be to add that element to the #page container in our SpecRunner.html. And it won't work! That's because we have an assertion that verifies that, before the call to givenIAmAtTheLoginPage, we do not have anything in the #page container. So we have to do something useful in the givenIAmAtTheLoginPage function. For example, we can call LoginPage.render(), which does not exist yet - we will need to create it in the file src/LoginPage.js and load it from our SpecRunner.html. To fix the current failure, we will need to create an #email element there and append it to our #page container:
(code listing omitted)
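The original listing is not preserved; here is a minimal sketch of what LoginPage.render might look like at this point. A tiny stand-in document object (and its registerElement helper, which is not part of the article) is included so the sketch runs outside the browser:

```javascript
// Minimal stand-in for the browser DOM, for illustration only.
function makeFakeDocument() {
  const elements = {};
  return {
    registerElement(id) {
      elements[id] = {
        id,
        children: [],
        appendChild(child) { this.children.push(child); },
      };
      return elements[id];
    },
    getElementById(id) { return elements[id] || null; },
    createElement(tagName) {
      return {
        tagName: tagName.toUpperCase(), // DOM reports tag names in all-caps
        children: [],
        appendChild(child) { this.children.push(child); },
      };
    },
  };
}

// Sketch of LoginPage.render: creates an email input and appends it to #page.
const LoginPage = {
  render(doc) {
    const page = doc.getElementById("page");
    const emailInput = doc.createElement("input");
    emailInput.type = "email";
    emailInput.id = "email";
    page.appendChild(emailInput);
  },
};
```

In the browser, `doc` would simply be the global `document`.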
That makes the current test failure go away, but we have two more: Expected 'DIV' to equal 'input'. and Expected undefined to equal 'email'. To make these pass, we need to change the document.createElement(...) call to use the input tag name, and we also need to set the input type to email. And since, as we can see, the tag name stored in emailInput.tagName is all-caps, we have to fix our assertion to expect that as well:
(code listing omitted)
And if we run our tests, they all pass. Great! We should now do the same for our password input field and the login button:
(code listing omitted)
And all tests pass again. Let's take another look at how our test reads. It is quite complicated: it contains so much hugely detailed and precise stuff that it is no longer possible to see a user story scenario in it. One possible solution to that problem is to push the assertions related to each feature scenario step down into the respective step functions. Now it looks much better. We should use the same approach for all our further steps. After refactoring, the test code looks like this:
(code listing omitted)
Now, let's follow the same pattern for our Act section. First, we will deal with When I enter 'john@example.org' in the email field. This becomes a call to a function whenIEnterInTheField("#email", email), which we will implement using JavaScript's APIs. We will do the same for And I enter 'welcome' in the password field, which will use the same function: whenIEnterInTheField("#password", password). Finally, we will implement whenIClickOn("#do_login") as a replacement for the comment And I click on the submit button. We will also sprinkle assertions inside the steps to make sure that we are using the JavaScript APIs correctly. The code will look like this:
(code listing omitted)
Now comes the most interesting part of writing this feature test - the Assert section. So far, the Arrange and Act sections were driving us to create infrastructure-like code for our application. Now, with the Assert section, we will have to implement more of our domain logic. Let's start with Then I see the profile page. Let's try to figure out what that could mean:

- the login page elements should no longer be present in the #page container;
- the #page container should somehow indicate that we are on the Profile page: for example, by adding a sub-container with id="profile_page" to it.

Interesting - so far, we didn't have a concept of the Name of the User. I guess it is time to create one in our arrange block, together with all the changes and additional assertions in our feature steps that this requires:
(code listing omitted)
And the tests pass again, and now we have a tested concept of the name in our code - tested to the extent required for this test, where we have only one user in the system. Now we can replace the comment Then I see the profile page
with the scenario step function call thenISeeTheProfilePage()
. Its implementation will verify that #email, #password and #do_login are no longer present on the page, and that the container #profile_page is present and has a #title element in it. Making it pass will require us to add a click event listener to the loginButton in the LoginPage that removes the contents of the #page container and calls ProfilePage.render(), by analogy with LoginPage. That drives us to create this module and its render() function. According to our next test failure, this function should create a #profile_page sub-container in the #page container, so we do that. Finally, we have one last failure, which drives us to create the #title element in the ProfilePage.render() function. And all tests are green again. Let's take a look at these changes:
(code listing omitted)
At last, we can implement the last step - And I see my name as the title of the profile page. A good name for the step function would be thenISeeTextAt("#title", name). The function will simply select the #title element and verify its element.textContent. As expected, it fails with the error Expected '' to equal 'John Smith', and we should fix that within the ProfilePage.render() function by assigning Users.currentUser().name to title.textContent. This fails because we haven't defined the Users.currentUser() function yet; it is simple to do for our current test - just return the first user. After that, all our tests pass. The code will look like this:
(code listing omitted)
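The original listing is not preserved; a hedged sketch of how the Users module might look at this point, with the name added and the naive currentUser described above:

```javascript
// Sketch of the Users module after this step. exists is still the naive
// version (the pending xdescribe tests will later force a real search), and
// currentUser simply returns the first user - good enough for a single-user test.
const Users = {
  users: [],

  add(email, password, name) {
    this.users.push({ email, password, name });
  },

  exists(email, password) {
    // naive: only checks that at least one user was stored
    return this.users.length > 0;
  },

  currentUser() {
    // good enough for the current test: just return the first user
    return this.users[0];
  },
};
```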
We have finally implemented our first feature test. That was quite some work. The functionality of the Users module is still very incomplete - we need to write more tests to cover different cases. This is how our test code and production code look:
(code listings omitted)
Now we can add <div id="page"> to our index.html, load our src/* scripts after it, and add <script>Users.add("john@example.org", "welcome", "John Smith"); LoginPage.render();</script> at the end to start our application. Enjoy the application that implements one happy path of our feature (we still have quite a few different paths to cover). It might also be a good idea to style the application slightly better than plain inputs and buttons, but we are not going to cover that in this series. This single feature test takes a lot of time to write because it is the first feature test in an empty application. Essentially, it has driven a lot of different architectural decisions, which don't necessarily need to be made the way they were made in this article. Writing a second test for the same feature is much easier, and the third one and all subsequent tests are easier still.
As an exercise, remove the x prefix from the xdescribe tests and implement them using the techniques described in this article: write a user story scenario, translate it to test code, and make sure to fix all the test failures one feature step at a time.

Today we have learned how to write tests for the whole application that imitate real user interaction. We have seen how test-driving the functionality can help discover new user story scenarios. Essentially, every time we write simple but not-quite-correct code that makes our test pass, we need to think about what scenario would prove that code wrong and add it to our to-do list, which is represented by pending tests that have only their descriptions.
In the next article of the series, we will dig deeper into what exactly we did today: what it means to test-drive the code, and the laws, rules, tips and tricks of Test-Driven Development. The next article assumes that the login and sign-up features are fully test-driven and implemented; we will be implementing a brand new feature of our application.
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don't hesitate to reach out to me on Twitter: @tdd_fellow.
Would you like to deploy a bug fix? - Half a week. Unless it makes your application crash all the time - then you can get it out in half a day or one day, which is still pretty slow.
Would you like to make a canary release to 1% of your users and test an assumption quickly using your analytics tools? - Forget it. Waiting for three days to get results on your assumption negates the benefits of the Canary release. You should be getting your feedback in minutes, not days!
With such a disadvantage - a deploy taking half a week - software development teams switch to a much more defensive mode:
That is Waterfall. Right there.
In that environment, how could we get our quick feedback back? What if we could send logic, in the form of a Domain-Specific Language (DSL), from our server, which we can deploy to whenever we wish? What if our mobile application could interpret this DSL and update itself every time the user starts the application while connected to the Internet?
We would get quick feedback again! It would make fast deploys possible, and it would also allow us to do canary releases, which would enable us to be LEAN again - to be Agile again.
This approach has a few problems, though.
If you take the idea of such a DSL and its interpreter to the extreme, you wind up with a programming language. There is already a programming language and platform with these characteristics: JavaScript + React Native. Nowadays, application stores allow downloading JavaScript code updates from your server, but you have to deploy all native bindings via the application store with a manual review; also, one cannot change the essence of the application.
Why would you want to go with your DSL instead of React Native?:
The DSL might be as simple as an Abstract Syntax Tree represented in JSON. Let's imagine that we have an application where users can buy some items, and now we want to contact a recommendation service and present them with a new view containing the list of recommendations. Normally, you would have to do full development and a full deployment via the application store. With a DSL, you might end up just writing some JSON:
(code listing omitted)
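The article's actual JSON is not preserved; here is a hypothetical sketch of what such a JSON AST and its interpreter could look like. The node shapes ("sequence", "request", "render") and action names are assumptions:

```javascript
// Hypothetical JSON-AST interpreter for such a DSL. Actions are injected, so
// the native side decides what "request" and "render" actually do.
function makeInterpreter(actions) {
  function evaluate(node) {
    switch (node.type) {
      case "sequence": // run steps in order
        node.steps.forEach(evaluate);
        break;
      case "request": // fetch data from a URL into a named slot
        actions.request(node.url, node.into);
        break;
      case "render": // render a view with previously fetched data
        actions.render(node.view, node.data);
        break;
      default:
        throw new Error("Unknown node type: " + node.type);
    }
  }
  return evaluate;
}

// A tiny "program" shipped from the server as plain JSON:
const program = {
  type: "sequence",
  steps: [
    { type: "request", url: "/recommendations", into: "items" },
    { type: "render", view: "recommendation_list", data: "items" },
  ],
};
```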
Of course, the parser and interpreter for this JSON, and the actions (such as request and render), are written in native code and, therefore, have to be deployed via the application store every time you change them or add a new capability. Views can be implemented in native code, or represented in the DSL (as simplified HTML + CSS).
As an industry, we understand why different application store vendors want to review every release of every application.
Nevertheless, we need to reject manual review as a bad practice and aim for fully automated deployments that deliver our new application releases to users in minutes. As an industry, we need to push the companies running application stores to improve their processes to enable us to do that.
For this article, we will need to define what Legacy Code means.
Legacy code is challenging to understand when read; it has no (or close to no) tests; and yet it brings value to the business and customers.
Let's give an outline of what we will be going through today: the idea of knowledge in production code, mutations, how production code and its test suite relate to each other, Test Semantic Stability as a coverage metric, and the Mutational Testing and Explorative TDD techniques.
Shall we get the ball rolling?
"Knowledge in Production Code" is any small bit of functionality that represents part of a business rule or an underlying infrastructure rule. Example bits of knowledge in production code:

- a variable assignment: a_variable = ...,
- the presence of an if statement: if ... end,
- an if condition: if has_certain_property(),
- an if body: if ... do_something_interesting end,
- the presence of an else clause: if ... else ... end,
- an else body: if ... else do_something_different end,
- a function or method call: a_function(arguments), receiver.a_method(arguments),
- a constant: 42,
- an early return: if ... return 42 end,
- the presence of an iteration: ...each do |x| ... end,
- what is being iterated: list.each do ...,
- the iteration body: ...each do |x| do_something_with(x) end.

I think the idea of "Knowledge in Production Code" should now be more or less clear. More interesting is what we can do with knowledge in our system: we can re-organize it differently while keeping all the behaviors of the system - everyone calls this Refactoring nowadays; or we can do the opposite: change bits of knowledge without modifying the structure of the code - we will call one such change a Mutation:
Mutation - a granular change to the knowledge in the system that changes the behavior of the application. Let's take a look at a simple example:
(code listing omitted)
This code might be part of some cell organism simulation (like the Game of Life or similar). Let's see which different mutations can be applied here:

- make the if condition always true: if true ...,
- make the if condition always false: if false ...,
- invert the if condition: if !cell_is_alive,
- comment out the if body: # do_this,
- comment out the else body: # do_some_other_thing.
.With that done, let’s take a look at how production code and its test suite relate to each other.
So, how does the test suite affect production code? First, it makes sure the production code is correct. A good test suite also enables quick and ruthless refactoring by eliminating (or minimizing) the risk of breaking things; a well-crafted test suite gives us the power and courage to introduce changes. And test code always couples, in one way or another, to the production code it tests.
Okay, and how does the production system affect its test suite? Since tests couple to the production code they test, changes in the production system may cause ripple effects in the test suite. Practically speaking, if the test suite is good enough, a mutation should always lead to a test failure, because the test suite should verify every tiny bit of knowledge in the production code (except, maybe, some configuration).
Such a knowledge change is an act of assertion about the presence of a test. When the knowledge is covered well by the test suite, there should be a test failure. If, after introducing the mutation, there is no test failure, that is a failed assertion about the test's presence or correctness. So one might say:
Knowledge Change is a Test for the Test
That is a fascinating idea, since it implies we can use the production code as a test suite for its test suite, which may enable TDD-like iterative development of a test suite that does not yet exist.
So far, we have covered the idea of knowledge in the production code, explored ways of modifying this knowledge so that the behavior changes - we call that a mutation - and explored the mirror-like relation between production code and its test suite. We still have much ground to cover, so let's dive in:
There are a few well-known test coverage metrics used quite often by software engineering teams, such as Line coverage and Branch coverage.
There is another one, called Path coverage - the coverage of all possible code paths in the system - which quickly becomes impractical as the application grows, because the number of different code paths grows exponentially.
Line coverage and Branch coverage (and Path coverage, too) share one major problem: a covered line/branch/path does not mean the test suite verifies it - only that it executes it. A great example: remove all the assertions from your tests, and the coverage metric will stay the same.
So, what if we could introduce all possible and sane mutations to our code and count how many of them cause a test failure? We would get a knowledge coverage metric. Another name for it is Test Semantic Stability, and it ranges from 0% to 100%. Even 100% line/path coverage can easily yield 0% Test Semantic Stability. This metric proves that code is indeed well-tested and verified (although it says nothing about the tests' design and cleanliness): make one assertion incorrect, or not precise enough, and the metric goes down by a few mutations.
That makes Test Semantic Stability the most useful coverage metric.
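The point about assertion-free tests can be made concrete with a small illustration (the function names are hypothetical, not from the article):

```javascript
function isAdult(age) {
  return age >= 18;
}

// An assertion-free "test": it executes every line of isAdult, so line
// coverage reports 100%, yet it verifies nothing.
function assertionFreeTest() {
  isAdult(20);
  isAdult(10);
}

// A mutant of isAdult with the comparison inverted. assertionFreeTest would
// still "pass" against it, so this knowledge bit has 0% Test Semantic Stability.
function isAdultMutant(age) {
  return age < 18;
}
```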
So, how do we check whether our tests cover some bit of knowledge in the system well? We break it! We introduce a tiny, granular breaking change to that bit of knowledge, and the test suite should fail. If it does not, the knowledge is not covered well enough. That leads us to the technique that allows us to keep Test Semantic Stability high:
Let’s see it in action:
(code listing omitted)
First, we need to narrow our scope to a single bit of knowledge - for example, the if condition: if cell_is_alive. Then we introduce the mutation if true, and we need to make sure that there is a test failure. Let's run the test suite:
(code listing omitted)
Oh no! It did not fail anywhere! That means we have a "failing test" for our test suite. In this case, we need to add a test for the negative case:
(code listing omitted)
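The article's test code (in Ruby) is not preserved; a JavaScript sketch of the positive and the newly added negative test might look like this (names are assumptions):

```javascript
// Tiny throwing assertion helper for the sketch.
function expectEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error("Expected " + expected + ", but got " + actual);
  }
}

// The behavior under test: one outcome per branch of the condition.
function behaviorFor(cellIsAlive) {
  if (cellIsAlive) return "do_this";
  return "do_some_other_thing";
}

// The existing positive case - it kept passing even against the `if (true)` mutant.
expectEqual(behaviorFor(true), "do_this");
// The newly added negative case - this is the test that kills the mutant.
expectEqual(behaviorFor(false), "do_some_other_thing");
```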
When we run the test suite:
(code listing omitted)
It fails! Great - that means our test for the test suite is passing now. As the last step of this mutational-testing iteration, we return the code to its original state:
(code listing omitted)
After doing this, our tests should pass!:
(code listing omitted)
They do. That concludes one iteration of mutational testing. Usually, to accomplish any useful behavior, we combine many bits of knowledge, so if we want to understand better how the system works, we need to focus on groups of bits of knowledge. That is what the Explorative TDD technique is about:
The technique is used to increase our understanding of legacy code while enhancing its Test Semantic Stability (the most useful coverage metric). The process roughly looks like this: isolate the code under study from its dependencies, select a group of knowledge bits, cover it with tests, apply mutational testing to verify those tests, and repeat until the code is understood well enough.
At this point, a nice example will help us understand the technique:
Let's imagine that we have some legacy system - a social network that allows users to receive notifications about things that happened. We need to slightly change what a "Followed" notification means. The code looks like this, and it does not have any tests:
(code listing omitted)
The first step is to isolate this code and make it testable. For this, we need to find a low-risk way to refactor all the dependencies this code has:

- Database.where,
- StatusUpdate.find,
- User.find, and
- Analytics.tag.

We can promote these to the following roles:

- Database.where => @table_reader.where,
- StatusUpdate.find => @status_update_finder.where,
- User.find => @user_finder.find, and
- Analytics.tag => @event_tagger.tag.
We should be able to have these default to their original values, and also allow substituting a different implementation from the test. It is also helpful to pull this method out into a clean environment, where accessing a dependency without substituting it is not possible - for example, a separate code-base - so that we can write an "it works" test and see what fails. The first failure is, of course, that all our referenced classes are missing. Let's define all of them without any implementation and make them fail at runtime if we ever call them from our testing environment:
(code listing omitted)
In our tests, we need to implement our substitutes. For now, they can all be simple doubles/stubs:
(code listing omitted)
Then, we should write the simplest test that sets up the stage, substitutes all the collaborators, and runs the function under test (no assertions - we are just verifying that we have indeed replaced everything correctly):
(code listing omitted)
Since we have not defined all the with_* methods yet, let's define them now, along with getters for the relevant instance variables (properties):
(code listing omitted)
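The article's Ruby code is not preserved; a JavaScript analogue of the pattern might look like this: collaborators default to stand-ins that fail loudly ("Database:nope") if the test forgets to substitute them, plus chainable with_* style setters. The names are assumptions:

```javascript
// Default collaborator that fails loudly when called from the test environment.
const Database = {
  where() { throw new Error("Database:nope"); },
};

class User {
  constructor() {
    this._tableReader = Database; // defaults to the always-failing collaborator
  }

  get tableReader() {
    return this._tableReader;
  }

  withTableReader(tableReader) {
    this._tableReader = tableReader;
    return this; // chainable, like the with_* methods described above
  }
}
```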
If we run our test, it should fail with RuntimeError: Database:nope here:
(code listing omitted)
To fix that, we need to replace Database with the table_reader getter. That corrects the current error, and we get the next one: RuntimeError: User:nope. Following all these failures and replacing direct dependencies with getters, we finally get a green bar (a passing test). Our function under test now looks like this:
(code listing omitted)
The structure and logic of the function did not change at all, but now all the dependencies are injectable, which makes the function nicely testable. That concludes the first step - narrow & isolate. Now it is time to select a group of knowledge bits that we would like to cover with tests. Since we want to change how followed_notification behaves, we might as well start there.
The group of knowledge bits related to followed_notification looks like this:
(code listing omitted)
Now we want to write a test. At first thought, something like:
(code listing omitted)
This test fails right away - we don't get any notifications. This is strange. Let's take a closer look at the filtering we are doing:
(code listing omitted)
I believe we have satisfied the first part of this condition, but not the second one: the user id is not the same as the third element of the row. Let's make them the same:
(code listing omitted)
This fails again! This code just keeps proving our assumptions wrong. I think we need to take a careful look at that id.to_s. .to_s is a conversion to string, so the foreign key is stored as a string (who would have thought?). Let's try to make it work:
(code listing omitted)
If we run our tests, they pass! Great - now we know that this function is capable of obtaining some followed notifications. Of course, our coverage right now is tiny, so let's apply mutational testing to it. We should start with the condition:
(code listing omitted)
First, let's replace the whole thing with false:
(code listing omitted)
The test fails - the mutant does not survive - our tests cover this mutation. Let's try another one: replace the whole thing with true:
(code listing omitted)
Our tests pass - the mutant survives - this is a failing test for our tests. In this case, it is reasonable to write a new test for a case when the full filtering expression should yield false: when we have notifications of an invalid kind:
(code listing omitted)
As a result, we should not get any notifications. After running, we see that our test fails. Great! This mutant no longer survives. Let's see if our tests pass when we undo the mutation:
(code listing omitted)
And they all pass! The next mutation is inverting the whole condition:
(code listing omitted)
All our tests are RED, which means that this mutant does not survive and the test for our test is green. Now we should dig deeper into the parts of the condition itself:
- x[1][0] == "followed_notification": replacing it with true and false, and inverting it; also changing the numeric and string constants. None of these changes produced any surviving mutants, so we do not need to introduce new tests.
- x[1][2] == id.to_s: replacing it with true and false, and inverting it; also changing the numeric constants.
with true
, apparently, leaves all our tests passing - a mutant that survives - a failing test for our test suite. It is time to add this test - when we have notifications of some different user:
(code listing omitted)
As you can see, having a record with a different user id (in this case, even a nonsensical one) makes our test fail, which means this mutant no longer survives. Let's see if undoing the mutation turns our tests GREEN:
(code listing omitted)
All our tests pass again. I think we have finished testing the condition in the filter. I would not touch the conditions related to other kinds of notifications, as we want to introduce changes only to "Followed" notifications. So we can dig further into the logic of our group of knowledge bits:
(code listing omitted)
So, we can see that we split the row into its id and all the other values of the notification record. Apparently, the first value is responsible for the kind, and we switch on it to construct the correct object (in this case, just a lump of data - a hash map). So let's try to mutate the numeric constant in kind = values[0]:
(code listing omitted)
All our tests still pass. That is a failing test for our test suite, so we ought to write a new test now, in which we verify that the function constructs correct lumps of data:
(code listing omitted)
This test fails because user.notifications[0] is nil: none of the if or elsif branches matched the kind variable, and in Ruby, by default, any function returns nil. This failing test means we no longer have a surviving mutant. Let's see if undoing the mutation makes our tests pass:
(code listing omitted)
It does - all our tests are green now. We should continue like this until we understand the code well enough and have enough confidence in our tests to make the desired change to the system. When we think we have finished, we should integrate the isolated code back into the legacy system, leaving all the fakes and injection capabilities in place. We separated this code only to make sure that we were not calling any dependencies by accident (while they silently "just work"). While integrating it back we, of course, get rid of the fail "NAME:nope" implementations of the collaborators. With such an approach, integrating the code back should be as simple as copy-pasting the test suite code and the production code (the function under test and the injection facilities) without copying the always-failing collaborators.
We have to wrap up the example here. If you, my reader, would like to continue applying Explorative TDD to this code, you can find it here: https://github.com/waterlink/explorative-tdd-blog-post (specifically, spec/user_spec.rb). The function originates from this example project: https://github.com/waterlink/lemon
The answer is yes! I use Explorative TDD (as well as mutational testing) regularly in my daily work.
Today we have learned about the concepts of "Knowledge in Production Code" and "Mutation." We have also learned why Test Semantic Stability is the best code coverage metric, and we have seen the Mutational Testing and Explorative TDD techniques at work. With some practice, we can start applying these techniques to stop fearing legacy code and handle it as a tedious but routine operation.
This article is the fifth one of the series "Build Your Own Testing Framework," so make sure to stick around for the next parts! You can find all posts of this series here.
Shall we get started?
Our test suite should no longer bubble up any exceptions; we can achieve that by making an appropriate assertion. We should also verify that other tests execute after the failure:
(code listing omitted)
As expected, this fails with the appropriate error Error: Expected not to throw error, but thrown 'Expected to be true, but got false', indicating that we are bubbling up all errors at the moment. Also, notice how the execution of the whole test suite stops at that point, and the program just exits with error code 1. A simple try .. catch block will fix the issue:
(code listing omitted)
All tests now run successfully. This code is starting to become unreadable, so it is a good point to refactor. We will:

- extract the try .. catch as a function runTest - its current responsibility is only to run the test and ignore any failure;
- extract the if statement that matches the test name as a function handleTest - its responsibility is to report the test, create a fresh testSuite, and kick off runTest;
- extract the for statement as runAllTests.

Here is the final snippet of code:
(code listing omitted)
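Since the original snippet is not preserved, here is a hedged sketch of what that refactoring might look like; the framework's exact reporting and suite-construction details are simplified and the names follow the description above:

```javascript
// Runs a single test and, for now, ignores any failure so that the
// remaining tests keep running.
function runTest(test) {
  try {
    test();
  } catch (error) {
    // failure intentionally ignored at this stage
  }
}

// Reports the test name, creates a fresh test suite, and kicks off runTest.
function handleTest(createTestSuite, name) {
  console.log("  " + name);
  const testSuite = createTestSuite(); // fresh suite instance per test
  runTest(() => testSuite[name]());
}

// Drives the loop over all test names in the suite.
function runAllTests(createTestSuite) {
  const names = Object.keys(createTestSuite()).filter(n => n.startsWith("test"));
  names.forEach(name => handleTest(createTestSuite, name));
}
```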
Now, when at least one test fails in a suite, the whole suite should fail (after running the rest of its tests), and the indicator of such a failure should be the exit code of the process. Let's write a test:
(code listing omitted)
As you might guess, we will need another object, responsible for interaction with our process - something we can ask to "exit with code 1." Because we cannot ask our real process to exit within the test run, we will have to create a spy, and we shall test-drive its functionality. But there is something interesting to worry about first: our test suite is currently passing... and it shouldn't be!
Let's step back and think about what just happened: clearly, we are writing a test that cannot possibly pass, because we do not have ProcessSpy yet. So we are expecting a failure - we are expecting a thrown exception. That expectation is an important part of test-driven development: at all times, we either expect a very specific failure or we expect our tests to pass. If we do not receive a failure when we expect one, or we receive an unexpected failure, we should stop right there and think about which of our assumptions is incorrect.
Right now, the tests do not fail because we are ignoring all exceptions in the try .. catch that we introduced a couple of minutes ago. If we want to see failures again, let's modify the catch block to log every error it receives:
(code listing omitted)
Now our test suite outputs the expected error: ReferenceError: ProcessSpy is not defined. It also outputs some other failures that happen in our nested runTestSuite calls - we fix them by providing a silenceFailures option for the nested runTestSuite call. Now we can focus on the ProcessSpy failure and test-drive it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
|
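A `ProcessSpy`, as described above, is something we can ask to “exit with code 1” without actually terminating the test run. A minimal sketch, with an API that is an assumption based on the error message quoted below:

```javascript
// Sketch of a ProcessSpy: records the exit code instead of exiting.
// The exact API is an assumption, not the repository code.
function ProcessSpy() {
  var exitCode = null;
  this.exit = function (code) { exitCode = code; };
  this.receivedExitCode = function () { return exitCode; };
}
```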
I think we have finished test-driving the functionality of `ProcessSpy`. It is time to get back to our failing test for a failure resulting in an exit with code 1. When we run this test suite, we get the following error message: `Error: Expected to equal 1, but got: null`. To pass this test, we will need to store the fact that we had a failure somewhere, and at the end of the test suite run we can trigger an exit with code 1 or 0, respectively. We could pass around a `status` object with a boolean property `status.failed` and set it to `true` in our `catch` block:
And at the end of the `runTestSuite` function we could call `process.exit(1)` if `status.failed` was `true`:
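Put together, the failure tracking might look like this sketch. `processObject` stands in for the real `process` (or a `ProcessSpy` in tests); the exact shape is an assumption, not the repository code:

```javascript
// Sketch: a mutable `status` object set in the catch block, and an
// exit code reported at the end of the test suite run.
function runTestSuite(constructorFunction, processObject) {
  var status = { failed: false };
  Object.keys(new constructorFunction())
    .filter(function (name) { return name.indexOf("test") === 0; })
    .forEach(function (testName) {
      try {
        var testSuite = new constructorFunction();
        testSuite[testName]();
      } catch (error) {
        status.failed = true; // remember that at least one test failed
      }
    });
  // report the overall result via the exit code
  processObject.exit(status.failed ? 1 : 0);
}
```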
While this works (as in “tests pass after providing `fakeProcess` where needed for nested failing `runTestSuite` calls”), the state changes in this code are starting to be hard to follow, and the function signatures remind me of some horror movie:
These signatures smell like there are objects hiding in these functions. Let’s find them!
First, let’s extract a method object from the function `runTestSuite`. We will give it the name `TestSuiteRunContext`:
Now, if we were to move the function `runAllTests` inside this class, we would not need all these arguments (and neither would all the other functions we call):
It already looks very nice. The only thing that I do not like about this object yet is that it mixes stateful and stateless properties. I like to have my objects separated by this concern. Let’s extract the mutable `status` property as a proper `TestSuiteRunStatus` object:
I think we have finished the refactoring. Now we should verify that the test suite exits with code 0 when everything passes:
I think we have finished implementing exit code reporting. The code can be found here: https://github.com/waterlink/BuildYourOwnTestingFrameworkPart5
There is still a lot to go through in the next few episodes.
Stay tuned!
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
In Baby-Steps TDD the basic strategy is to get to the green state ASAP. If you can pass all tests with `return 42`, you should! While the benefit of this approach is not immediately obvious, exploring the alternative shows its value. One possible alternative is to write a bunch of tests for the software and then make them all pass. This results in a lot of changes being made to the software under test while the tests are failing (in the red state). That provides very slow feedback and high risk, because with every decision in the code the complexity grows exponentially, and problems are hard to find when you only know that the software worked an hour ago and there is one mistake somewhere in a whole hour’s worth of work. The same effect can be seen when the most complex test is written first, forcing the engineer to implement the whole solution, or a big part of it, in one go.
Baby-Steps TDD mitigates the issue by ensuring everything worked one or two minutes ago, at least to the extent the software is specified by the currently written tests. So if something does not work as expected, it is most probably a mistake in the last two or three lines of code we have written. We can even discard them entirely with the “undo” command and start over from the green state, without losing much work and saving a whole lot of time debugging.
Baby-Steps TDD provides faster feedback and lower risk at the cost of a bit more overall effort. Now let’s take a look at the triangulation technique:
In essence, the Triangulation Technique takes the ideas of Baby-Steps TDD further and reduces the step size even more. For example, where with Baby-Steps TDD you would usually need one test to introduce the correct `if` statement, with the Triangulation Technique and Baby-Steps TDD combined you would use multiple tests for this:

1. Write a test that a bare `return CONSTANT` statement can pass.
2. Write a test that forces `CONSTANT` to become some sort of calculation (a variable, formula or function call).
3. Write a test that introduces `if (argument == SPECIFIC_VALUE)` with another `return ANOTHER_CONSTANT`.
4. Write a test that forces `ANOTHER_CONSTANT` to become some sort of calculation (a variable, formula or function call).
5. Write a test that forces the `if` condition itself to become generic.

In normal Baby-Steps TDD that would probably have been only 2 or 3 test cases. With Triangulation it is 5, and making every one of them pass requires only a simple transformation of the production code.
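The progression above can be sketched in JavaScript. The shipping-cost function and its numbers are illustrative assumptions, not from the article; each version is forced by exactly one more test:

```javascript
// 1. `return CONSTANT` passes the first test: cost(1) === 5
function costV1(items) { return 5; }

// 2. cost(2) === 10 forces the constant into a calculation:
function costV2(items) { return items * 5; }

// 3. cost(10) === 0 (say, bulk orders ship free) forces a specific
//    `if` with another constant:
function costV3(items) {
  if (items === 10) return 0;
  return items * 5;
}

// 4-5. cost(12) === 0 forces the condition to become generic:
function costV4(items) {
  if (items >= 10) return 0;
  return items * 5;
}
```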
Now let’s see why these techniques combined make me more productive.
Did you know that every decision you make costs you some willpower? For example: choosing what to wear in the morning, refusing to eat a tasty cake, or making a design choice in your code. This phenomenon is known as Ego Depletion (see the wiki), and there is experimental evidence for it. According to this phenomenon, self-control and willpower both draw upon a limited pool of mental resources that can be used up. Usually, these resources recover substantially during a good night’s sleep, and slightly after consuming food. The cost of each decision also differs, and even for the same kind of decision it can depend on various factors:
Baby-Steps TDD combined with the Triangulation technique optimizes for the perceived complexity of the problem, so that every decision is nearly obvious to make and the effort required to make and execute it is vanishingly small. While this increases the number of decisions I need to make, it also decreases the complexity and willpower cost of each decision to the point where, after completing the same amount of work, I still have plenty of energy and willpower left for other decisions at work and outside of it.
It is worth noting that these techniques need quite a bit of practice to enable this effect: with such incremental design it is important to avoid getting stuck. Definitely try it out and see if it works for you!
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
Thanks to David Völkel for the great presentation about Baby-Steps TDD. Slides can be found here.
Thanks to Stephen Guise for the great book “Mini Habits” that has opened my eyes to the reasons why I like these techniques so much and why I love designing software in tiny increments.
Given high score is 174
When player scores 191
Then high score is 191
The current implementation stores the high score in the web browser’s local storage. This detail does not change the purpose of this Kata very much, since any other platform and language has its own analog of local storage (a file system, an in-memory or local database, application settings, etc.). The `HighScore` object looks like this:
Tasks for the Kata:

- Sometimes the rendered text reads `HIGHSCORE: NaN` (`NaN` is JavaScript’s abbreviation for “not a number”); `parseFloat` is most probably the culprit for this.
- Port the implementation to another platform (e.g. `nodejs`).

The focus of this Kata is on the architectural boundaries that this little innocent class spans.
Questions to ask yourself:
Next time we will take a look at one possible solution for this Kata. Try to solve it on your own, my dear reader, and please share the code and insights!
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
Duck type is a concept in the domain of type safety that describes objects that pass the so-called “Duck Test”:
If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
In terms of programming language, it might look like this:
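A minimal sketch of such a “duck” in JavaScript (the concrete objects are illustrative assumptions): any object whose public interface has `swim()` and `quack()` counts, no matter what it actually is.

```javascript
// An actual duck:
function Duck() {
  this.swim = function () { return "swimming"; };
  this.quack = function () { return "quack!"; };
}

// A completely unrelated object that still passes the duck test,
// because it exposes the same public interface:
function RobotDuck() {
  this.swim = function () { return "whirring across the pond"; };
  this.quack = function () { return "quack! (synthesized)"; };
}
```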
The point is that the public interface has the methods `swim()` and `quack()`. This is how you identify a duck in a programming language. The concept is very similar to the concept of an `interface` in programming languages that have one, but it is not enforced in any way by the programming language.
Duck typing is mostly natural in dynamic languages, where it is possible to send any message to any object, and the check whether that is possible happens at runtime. In static languages, it is still possible to use duck typing via some form of reflection.
In a dynamic language, it is important to make it obvious that something implements a certain duck type, by writing one test suite for all implementers and executing it against each of them. For example:
This test suite has to go only through the duck type’s public interface. If it is not possible to test behavior through it, one should at least test the function signatures, for example:
Not doing contract tests for your ducks may result in a passing test suite and broken production code. For example, when one duck and its test suite have been updated, but others haven’t.
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
Here `Double` is an abstract test double which has no functionality of its own: it is a general concept used to talk about test doubles.

`Dummy` is a test double that is used to fill parameter lists in cases where these parameters are not used by the production code. The simplest `Dummy` would look like this:
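A sketch of a Dummy (the logger and registration names are illustrative assumptions, not from the article): it exists only to fill a parameter list, and it is never actually used.

```javascript
// A Dummy has no behavior on purpose: production code must never call it.
function DummyLogger() {
  this.log = function () {
    throw new Error("Dummy should never be called");
  };
}

// This code path does not touch the logger at all:
function registerUser(name, logger) {
  return { name: name, registered: true };
}
```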
`Stub` is a test dummy that, additionally, provides an indirect input to the production code from the test. “Indirect” here means via a method call on the stub object or a call of the stub function. For example:
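A sketch of a Stub (the exchange-rate names are illustrative assumptions): it feeds an indirect input into the production code through a method call.

```javascript
// A Stub returns a canned answer, regardless of the argument:
function StubExchangeRates(rate) {
  this.rateFor = function (currency) {
    return rate;
  };
}

// Production code receiving its indirect input from the stub:
function convert(amount, currency, exchangeRates) {
  return amount * exchangeRates.rateFor(currency);
}
```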
`Spy` is a test stub that, additionally, verifies an indirect output of the production code by asserting on it afterward, without having defined the expectation before the production code is executed. For example:
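A sketch of a Spy (the notifier names are illustrative assumptions): it records the indirect output so the test can assert on it after the fact.

```javascript
// A Spy records every message it receives:
function SpyNotifier() {
  this.sentMessages = [];
  this.notify = function (message) {
    this.sentMessages.push(message);
  };
}

// Production code producing an indirect output:
function greetUser(name, notifier) {
  notifier.notify("Hello, " + name + "!");
}
```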
`Mock` is a stub for which the expectations are defined before the execution of the production code, and which can verify itself after the execution. A simple example:
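A sketch of a self-verifying Mock (names are illustrative assumptions): the expectation is set up front, and the mock verifies itself after the production code ran.

```javascript
// A Mock defines the expectation before execution and verifies itself:
function MockNotifier() {
  var expected = null;
  var received = null;
  this.expectNotify = function (message) { expected = message; };
  this.notify = function (message) { received = message; };
  this.verify = function () {
    if (received !== expected) {
      throw new Error(
        "Expected notify('" + expected + "'), got: " + received
      );
    }
  };
}

// Production code under test:
function welcomeUser(name, notifier) {
  notifier.notify("Hello, " + name + "!");
}
```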
Mocks can be much more complex (verifying order of messages, allowing multiple messages to be sent, etc.). So it is recommended to either:
And if you do have to use your own custom mocks, please write tests for them, since they can have a lot of logic inside.
And, finally, `Fake` is a test double providing a simpler implementation that is used in the tests instead of the real thing. A good example is an in-memory database gateway that behaves the same way the real one would, but stores all the data in memory. A very simple example:
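A sketch of such an in-memory Fake (the gateway names are illustrative assumptions): a fully working, simplified implementation that keeps everything in memory.

```javascript
// A Fake: a real, working implementation, just simpler than the
// production one (no database, everything lives in memory).
function InMemoryUserGateway() {
  var users = {};
  this.save = function (user) { users[user.id] = user; };
  this.findById = function (id) { return users[id] || null; };
}
```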
Obviously, fakes require full-blown testing of their own. And if the real implementation is testable (even if it is slow), it is a good idea to run the same test suite against both the fake and the real implementation. This way we can really be sure that the fake behaves the same way as the real thing. And don’t forget about the edge cases: for example, if the real thing can throw a `ConnectionError`, the fake should be able to as well (after being instructed to do so via a special method in the tests).
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
This article is the fourth one of the series “Build Your Own Testing Framework”, so make sure to stick around for next parts! All articles of these series can be found here.
Shall we get started?
So where should the name of the test suite come from? Probably it should be the test suite’s class name. Currently, all of them are anonymous classes and therefore don’t have a name:
We would like all test suites to have that name, for example:
We should write a test for this case in `runTestSuite`. Let’s try to write a test for it in the `RunTestSuiteTest.js` test suite:
Now it gets problematic: how are we going to assert that something was reported? Should we replace `console.log(message)` or `process.stdout.write(message)` with our own implementation, so that we can test it?
Then we should be able to assert with `t.assertTrue(logged.indexOf("TestSuiteName") >= 0)`. Finally, we will need to restore the old `console.log` function:
While this code works, it has a multitude of problems; among others, when a test fails, the `oldConsoleLog` function is not restored. And fixing the last problem will actually fix everything else, because that problem causes the others. We can fix it by introducing some sort of `Reporter` type that can respond to a `reportTestSuite(name)` message:
The `reporter` in this case is some sort of test double. And what are test doubles? Find out here: Introducing Test Doubles. Our `reporter` object in the test looks terribly like a Spy double to me, so let’s test-drive it:
Now we are getting the following error:
We need to create the `ReporterSpy` object now:
Now we are getting:
Now we need to create the function `assertHasReportedTestSuite(name)` for our `ReporterSpy`:
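The finished `ReporterSpy` might look roughly like this sketch (the internal details are assumptions based on the text, not the repository code):

```javascript
// Sketch of the ReporterSpy: records reported suite names so the
// test can assert on them afterward, via the test context `t`.
function ReporterSpy(t) {
  var reportedSuites = [];
  this.reportTestSuite = function (name) {
    reportedSuites.push(name);
  };
  this.assertHasReportedTestSuite = function (expectedName) {
    t.assertTrue(reportedSuites.indexOf(expectedName) >= 0);
  };
}
```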
Next, we need to make sure that `expectedName` is actually present in the error message, by triangulating with a different name:
Then we need to make sure that we do succeed when the message is received:
And all our tests pass. Now, when the wrong name is reported, we should still fail:
And all tests pass again. However, we should notice this weird condition:
It looks like our current production code is not generic enough: it will only work with `expectedName` equal to `"HelloWorld"`. Let’s fix that by triangulating over this parameter:
And all the tests pass. Now we can get back to our failing test for `runTestSuite`:
To implement this, first we will need to accept an `options` parameter with sane defaults:
After making the failing test pass and triangulating over the name of the test suite:
And all tests pass. Unfortunately, this is the output that we now see:
Yeah, empty lines. This is because `(function () {}).name` is equal to `""`. We need to give proper names to all our anonymous constructors for the test suites:
And now we should see the correct output:
Great, now we would like to render the name of the executed test:
Of course this fails, because now we need to implement `assertHasReportedTest(name)` for our `ReporterSpy`. Let’s test-drive it:
Unfortunately, this does not pass our tests, because this test fails now:
After an investigation, it becomes clear that this happens because we cannot re-use the `reporter` variable defined at the higher level, since all tests currently share the same `testSuite` object. We will have to move the creation of the `reporter` variable inside each test:
And this makes all our tests pass.
This is quite a noticeable problem that our users may well be frustrated with, so we should make it easy for them and allow such variables to be fresh for every test. This can be achieved quite easily if we create a new `testSuite` for each test. Let’s write a simple test to show the problem:
And now let’s implement it by creating a `testSuite` for every test:
After doing this, we can move `var reporter = new ReporterSpy(t);` back to the top level of the `ReporterSpyTest` suite. And all the tests pass.

Finally, we need to make sure that the test suite we have written before will pass:
As expected, it fails with `Error: Expected test 'testSomeTestName' to be reported`. After fixing it and applying triangulation once, we end up with the following implementation:
Now it seems that both `ReporterSpy` and `SimpleReporter` implement the same duck type: `Reporter`. What is a Duck Type? Find out here: Meet Duck Type. So we should test all our ducks to make sure their public APIs don’t get out of sync:
All the tests pass. Unfortunately, the output regarding this test suite looks weird:
The test suite name is empty. I think we need the ability to define a custom, dynamic test suite name. We can achieve this by allowing any test suite to define a special hook method that returns its custom name, like `testSuite.getTestSuiteName()`. Let’s write a test for this:
After implementing it and triangulating over the name once, the code looks like this:
Now, if we were to use this feature in our duck type tests:
Then we are getting the proper output:
I think we are done with implementing our first simple reporter. Now we can see that the tests are actually executing and passing. The code can be found here: https://github.com/waterlink/BuildYourOwnTestingFrameworkPart4
There is still a lot to go through in the next few episodes.
Stay tuned!
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
The `RED` stage is as important as the others in the Red-Green-Refactor cycle. If the next test does not fail, it is either already implemented or has to wait until a later time (until it will fail). At its core, the Triangulation Technique has the following idea:

After implementing one business rule (with Red-Green-Refactor), make sure to find all “weirdnesses” or non-generalities in the production code and eliminate them one by one, by writing a test that proves each non-generality and then making it pass while removing the non-generality. This is the third cycle of TDD: the Mini Cycle.
This is a series of articles:
Shall we get started?
As tests get more specific, production code gets more generic.
When making the next failing test pass, our production code should also pass a whole class of similar tests. This is best shown with a very simple example. The task at hand is to write a function `sum(a, b)` that adds two numbers. Let’s look at a violation of the Specific/Generic rule:
The production code that makes this last test pass is as specific as the failing test itself. A test of the same class (where we change the value of the `b` parameter) will fail:
To follow the Specific/Generic rule, we ought to turn `4` into `2 + b`, like this:
This way, when we change `b` to any value, the test will still pass. We didn’t do anything about `a`, though, because we still don’t have any test showing us that the parameter `a` is important, like the following one:
Again, we could make it pass in a very specific fashion by introducing a specific `if` statement, or we could make the whole class of such tests pass:
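The article's examples here are in Ruby; the same progression sketched in JavaScript looks like this (each version is forced by the test just described):

```javascript
// After the first test (sum(2, 2) === 4), the most specific pass:
function sumV1(a, b) { return 4; }

// The test sum(2, 3) === 5 forces `4` to become `2 + b`:
function sumV2(a, b) { return 2 + b; }

// The test sum(4, 3) === 7 finally forces `a` into the calculation:
function sumV3(a, b) { return a + b; }
```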
Have you noticed, that from the test suite side we had to “prove” that some knowledge in the system is important and had to be used? This technique is called Triangulation.
In essence, the Triangulation technique has a very simple idea at its core:
* - important from the perspective of the system or unit under the test
One Red-Green-Refactor cycle really has to have all its stages in it. I am not talking about the “Refactor” stage right now, that is a given; rather, I insist on the “Red” stage. In TDD, when we write a new test, it has to fail. Writing tests that do not fail is another way to get ourselves stuck while doing TDD. One could ask: “If I can’t write this test because it does not fail, what should I do about the requirement it represents?”, and the answer is rather simple: either this requirement is already implemented and tested by other tests, or we still need this test and we will get back to it later, when it actually fails.
As you may remember, in the first part of this series we went through an `OrderKindValidator` example, where we wrote multiple tests in a row that all expected the same outcome; of course they didn’t fail, because we had one line in our function that made them all pass. If we sprinkle in some tests that do fail (like a test for a valid order kind), then after making such a test pass, all of those other tests will be failing and therefore become good candidates for our next test. Let’s see it with our own eyes:
Now is the point where we have to choose our next test. Last time we chose a test with the same outcome and it did not go so well, so let’s choose a test with a different outcome, e.g. when a valid order kind is provided:
Now we have two options: either check for `order[:kind] == %w(private)`, or check for `order[:kind]` being absent. It does not matter which we choose at this point, so let’s go with the first one:
Now let’s apply the Triangulation technique. We should always ask ourselves: “What is weird about this code?” and “What failing test should I write to point out this weirdness?”. The first weirdness we can spot is that the validator currently accepts only one order kind: `private`. According to our requirements, it should also accept `corporate`:
We also know that our system should handle duplicate entries in `order[:kind]`:
Wow! We could, of course, check that `kinds` is not `nil`, but I would rather listen to this test failure and put in a check for `kinds` being absent (and this makes for the second check that we could have chosen from):
So this passes all our tests. It may look weird, and that is exactly the pointer to which test to write next, to prove that this weirdness is incorrect:
The production code is starting to look not so clean, and I think it is time to give things proper names:
There is only one weirdness left for triangulation in the current production code before we can move on to the next requirement: `private` can be duplicated, while `corporate` cannot:
Great, now we can safely go back to our empty order kind edge cases:
And it is a good opportunity to eliminate some duplication:
Now it is a good time to triangulate, because we have a weirdness in our code: `kinds[0]`. To prove that this is too specific, we can write another test:
Notice how every single test that we have written was failing, and how easy it was to make each one pass. This suggests that we are probably moving in the right direction. Let’s test our next requirement: we can combine `private` and `bundle`:
Wait a minute. This is really bad: we should have a failing test here. This happened because we are checking only for the inclusion of `private` or `corporate`, and we do not care about anything else in the `order[:kind]` array. We have to discard this test and try to go with a failing version of the same business rule: an invalid order kind cannot be combined with `private`:
While this works, it leads to two other weirdnesses: `kinds[1]` and `"invalid"`. Let’s tackle the latter first:
Other tests fail now, and from them it is possible to see that the second kind should be either `private` or `corporate`:
This looks rather clunky; we should make it a bit cleaner:
Let’s eliminate the other weirdness, `kinds[1]`: we should probably verify all kinds in the array:
And now this can be greatly simplified by inverting the boolean logic:
Now that we have dealt with all weirdnesses in our production code, let’s get back to our requirement:
Wow! Now it fails exactly as it should. This means that now is the right time for this test! Let’s make it pass by adding `bundle` to the list of allowed order kinds:
Nice! Our next requirement is that `bundle` cannot be used on its own, i.e. either `private` or `corporate` is required:
And this is good enough, because that is really the only case in which this can happen, until the list of allowed order kinds is extended by future business requirements. We should at least give this condition a proper name:
Except that we could still provide a duplicated `bundle`:
Now it is time to move on to the final requirement, about conflicts between `private` and `corporate`:
Of course, `kinds == %w(private corporate)` can be considered too specific for production code, so we should triangulate it:
And, finally, let’s give this condition a proper name:
I believe we are done now. Source for this example can be found in an open pull request here.
Let’s recap how Triangulation technique worked for us here.
The main goal of triangulation is to prove that the code is not general enough along some axis (class of tests) by writing a test and then making it pass. Effective application of the technique requires proving and eliminating all such “weirdnesses” or non-generalities from the production code after each Red-Green-Refactor cycle for business requirements. This is, in fact, the third cycle of Test-Driven Development, called the Mini Cycle of TDD; it should be executed roughly every 10 minutes.
Another observation is that, following this technique, we introduce only one small piece of knowledge into our production code at a time; for example, an `if` statement with a certain body (in this example it was a `raise error` statement). Since we cannot introduce an `if` statement without a condition, we need to put some condition there, and we put a very specific condition on purpose, since we know that it is tested and it is simple.

Today we have learned the Golden Rule of TDD, “As tests get more specific, production code gets more generic,” and the Triangulation Technique, which allows us to follow this rule in an incremental and confident way. Additionally, we have learned that following Red-Green-Refactor strictly is important, and that this includes the `RED` stage of the cycle: when the test for a business requirement does not fail, it is either already implemented or it has to wait for later.
This is a series of articles:
You would not want to miss the next articles on this tech blog; we still have a lot to talk about.
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
Code examples today will be in the Ruby programming language. The technique itself is, of course, language-agnostic.
Ways to avoid this outcome:
Finally, do not forget to remove redundant tests if any.
This is a series of articles:
Buggy, `if`-riddled code is what we’ve got. It is not even easy to read. While we can refactor it to be more readable, that won’t change the presence of bugs. Let’s still do it, though, to understand better what happens in this code:
The structure of the class actually sounds just right, but the conditions are not good:
Really? It does not do what it says. At all. It basically just solves the problem very specifically to the tests. I can easily come up with a test that will break it:
This at least does what it says, but only for one specific case instead of the general one. Here is one test that I can come up with right away:
While this may work for our current requirements, it is really confusing for the reader. The method name says “has no required kind,” while the method body checks whether it is only `bundle`. And it does not work well with this edge case:
While this case is quite unlikely, nothing in the business rules forbids it, and some other part of the system may well duplicate the `bundle` kind for some reason, or it may be a user input mistake.
This method, indeed, checks that the `kind` is `invalid`. Literally `"invalid"`. That would mean that all kinds except exactly `"invalid"` are allowed, which is not true according to our business rules. In fact, we already wrote the failing test for this a few moments ago:
Let’s comment out these failing tests and try to force-TDD our way through these bugs by uncommenting and fixing them one by one, following the Red-Green-Refactor loop.
So, let’s uncomment our first failing test:
We are expecting `validate_only_known` to fail with its message, and that means `invalid?(kinds)` should return `true`. To make it return `true` in this case and preserve its old behavior, we will need to remove `private`, `corporate` and `bundle` from `kinds` and check that the result is not empty:
See how we had to write the whole thing in one go; there was no chance to write it incrementally, because a bunch of tests would fail. Wait! While it does not fail any tests related to invalid kinds, it fails all tests related to emptiness:
So we need to change more production code to make this one tiny test pass. It looks like `validate_non_empty` is the culprit now: it is being called after `validate_only_known`, and it should be the other way around:
Oh! Now a bunch of other tests fails:
From the failure messages it is possible to guess that the culprit is the `empty?(kinds)` function, which now fails in too many cases, such as `["bundle"]`, `["private", "corporate"]`, `["almost anything"]` and `["invalid"]`. This is because it was not doing what it said it was:
(5-line code listing omitted)
And this is why it was hard to change the order of validations. We will have to completely rewrite this function. Let’s start small and see which tests fail:
(3-line code listing omitted)
The failures are:
(11-line code listing omitted)
Good, only tests related directly to this case are failing. So, one by one, we can construct our condition while fixing these test failures:

- `kinds.nil?`
- `|| kinds.empty?`
- `|| kinds[0].nil?` (turned out to be redundant in the end)
- `|| kinds[0].empty?` (turned out to be redundant in the end)
- `|| kinds.any? { |k| k.nil? || k.empty? }`
After refactoring, the `empty?` function now looks this way:
(8-line code listing omitted)
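Assembling the clauses constructed step by step above, the rewritten predicate plausibly ended up like this (a sketch based directly on the listed conditions):

```ruby
# Sketch: the kinds list is "empty" when it is nil, has no elements,
# or contains any nil or blank element.
def empty?(kinds)
  kinds.nil? || kinds.empty? || kinds.any? { |k| k.nil? || k.empty? }
end

empty?(nil)         # => true
empty?([])          # => true
empty?([""])        # => true
empty?(["bundle"])  # => false
```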
And all tests, finally, pass. It took a lot of effort and rewriting to get this one little test to pass. This is what we call “Getting Stuck” in TDD. For almost any somewhat complex problem, there is an order of tests that will lead to this result.
The code can be found in the GitHub repository, in an open pull request, here.
Almost guaranteed ways to get stuck in TDD:
And the way to not get stuck is to do the opposite:
Today we have seen how bad the results of getting stuck while doing TDD can be. In the next article of this series, we will explore the Golden Rule of TDD and a technique called Triangulation, which allows us to incrementally test-drive code so that it always conforms to the Golden Rule of TDD and therefore never gets us stuck. Stay tuned!
This is a series of articles:
You would not want to miss the next articles on this tech blog; we still have a lot to talk about:
Thank you for reading, my dear reader. If you liked it, please share this article on social networks and follow me on Twitter: @tdd_fellow.
If you have any questions or feedback for me, don’t hesitate to reach out to me on Twitter: @tdd_fellow.
Code examples today will be in the Ruby programming language. The technique itself is, of course, language-agnostic.
“Getting stuck” happens for a couple of reasons:
Usually “Getting Stuck” follows this pattern:
This last step usually takes minutes to hours, depending on the complexity of the problem at hand. Additionally, the first few tests are basically wasted time, since they did not produce any bits of knowledge that persisted in the production code in the end. Even worse, chances are that the algorithm we have just written is not fully covered by the current tests, since we wrote it in one go just to make the current failing test pass. This is no longer correct TDD; it can not guarantee high test coverage and, therefore, can not guarantee high confidence anymore.
Let’s go through a small example of how one can get stuck in TDD.
Let’s define the problem at hand first. We have some sort of order request as an input to our system and we need to validate that its kind is correct:
- the known order kinds are `private`, `corporate` and `bundle`;
- `private` and `corporate` order kinds can not be combined, otherwise `InvalidOrderError` with message `Order kind can not be 'private' and 'corporate' at the same time`;
- `private` or `corporate` should always be present, otherwise `InvalidOrderError` with message `Order kind should be 'private' or 'corporate'`;
- an unknown kind results in `InvalidOrderError` with message `Order kind can be one of: 'private', 'corporate', 'bundle'`;
- an empty kind results in `InvalidOrderError` with message `Order kind can not be empty`.

This is a fairly simple problem, and it is easy to get stuck while doing TDD here. So let’s write our first test: “When order has no order_kind, then we should get InvalidOrderError with message ‘Order kind can not be empty’”:
(8-line code listing omitted)
And the simplest implementation possible:
(8-line code listing omitted)
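Since the listings are not preserved in this extract, here is a minimal sketch of what the first step might have looked like, using plain Ruby instead of the article’s RSpec. The class and error names come from the surrounding text; the method name `validate!` is an assumption:

```ruby
# Simplest implementation that makes the first test pass:
# always raise, regardless of the order contents.
class InvalidOrderError < StandardError; end

class OrderKindValidator
  def validate!(order)
    raise InvalidOrderError, "Order kind can not be empty"
  end
end

# First "test": an order without a kind fails with the expected message.
begin
  OrderKindValidator.new.validate!({})
rescue InvalidOrderError => e
  puts e.message  # prints: Order kind can not be empty
end
```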
The next test is our next simplest edge case, when the kind’s value is `nil`:
(6-line code listing omitted)
It does not fail at all, so we don’t have any reason to change the production code. We can already spot a little duplication: the `validator` variable. Let’s extract it as a named subject of the test suite:
(1-line code listing omitted)
And `OrderKindValidator` can be replaced with `described_class` (an RSpec feature), so that we will not have to change too much in case we want to change the name of the class:
(1-line code listing omitted)
The next simplest edge case: when the kind is an empty array:
(4-line code listing omitted)
I believe I am spotting an annoying pattern now:
(4-line code listing omitted)
It would be really nice to write it in this fashion:
(3-line code listing omitted)
And another duplication piles up:
(5-line code listing omitted)
Now the next tests look very easy and simple:
(6-line code listing omitted)
And they all pass right away. The implementation of `it_fails_with` looks like this:
(29-line code listing omitted)
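The RSpec helper itself is not preserved in this extract, but its intent can be approximated in plain Ruby. Everything below except the idea of `it_fails_with` is hypothetical, including the helper name `fails_with?` and the toy validator used to exercise it:

```ruby
# Plain-Ruby approximation of the it_fails_with(message) idea:
# run the validator and report whether it raised InvalidOrderError
# with exactly the expected message.
class InvalidOrderError < StandardError; end

class OrderKindValidator
  def validate!(order)
    kind = order[:kind]
    raise InvalidOrderError, "Order kind can not be empty" if kind.nil? || kind.empty?
  end
end

def fails_with?(order, message)
  OrderKindValidator.new.validate!(order)
  false
rescue InvalidOrderError => e
  e.message == message
end

fails_with?({ kind: [] }, "Order kind can not be empty")           # => true
fails_with?({ kind: ["private"] }, "Order kind can not be empty")  # => false
```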
So, let’s write our next edge case - when order kind is invalid:
(2-line code listing omitted)
Pretty neat! And oh, it fails:
(3-line code listing omitted)
And the fix:
(9-line code listing omitted)
Let’s write our next test, for when the order kind is `private`:
(1-line code listing omitted)
This fails, as expected, with `expected no Exception, got #<InvalidOrderError: Order kind can not be empty>`. And to make it pass, we need to wrap the second `raise` statement in an `if` condition:
(3-line code listing omitted)
The implementation of `it_does_not_fail` looks like this:
(16-line code listing omitted)
Let’s write our next test:
(1-line code listing omitted)
And it fails with the expected error: `expected no Exception, got #<InvalidOrderError: Order kind can not be empty>`. The fix is to amend our `if` condition with that case:
(4-line code listing omitted)
And the tests pass. Our next business rule is that one of `private` and `corporate` should always be present:
(2-line code listing omitted)
As expected, the test fails:
(3-line code listing omitted)
And to fix it, we just need to sprinkle another `if` statement in the middle of the function:
(3-line code listing omitted)
As expected, the test passes. Now we should test the next business rule: an order can not be of `private` and `corporate` kind at the same time:
(2-line code listing omitted)
This, as expected, fails with the error message:
(3-line code listing omitted)
And the easiest way to fix that is to add another `if` statement:
(5-line code listing omitted)
And it passes. Let’s test that we can combine `private` or `corporate` with the `bundle` order kind:
(1-line code listing omitted)
And it fails with the error: `expected no Exception, got #<InvalidOrderError: Order kind can not be empty>`. To fix this, we will have to amend our last `if` condition in the function even more:
(5-line code listing omitted)
And the test passes. Let’s refactor the code a bit:

- extract the `order[:kind]` duplication to a local variable `kind`,
- extract the `raise` statement to a private method.

After this, `OrderKindValidator` will look a bit cleaner:
(28-line code listing omitted)
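The 28-line listing is not preserved in this extract. Based on the rules and error messages quoted so far, the validator’s shape at this point was plausibly something like the following sketch. The private method name `invalid_order` and the exact conditions are assumptions, not the article’s actual code:

```ruby
# Sketch of the refactored, still if-riddled validator:
# order[:kind] extracted to a local variable, raise extracted to a private method.
class InvalidOrderError < StandardError; end

class OrderKindValidator
  def validate!(order)
    kind = order[:kind]
    invalid_order("Order kind can not be empty") if kind.nil? || kind.empty?
    if kind.include?("private") && kind.include?("corporate")
      invalid_order("Order kind can not be 'private' and 'corporate' at the same time")
    end
    if !kind.include?("private") && !kind.include?("corporate")
      invalid_order("Order kind should be 'private' or 'corporate'")
    end
  end

  private

  def invalid_order(message)
    raise InvalidOrderError, message
  end
end
```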
Let’s write our next test for the same business rule (now a corporate bundle):
(1-line code listing omitted)
And it fails with the error: `expected no Exception, got #<InvalidOrderError: Order kind can not be empty>`. To fix this, we need to add `&& kind != %w(corporate bundle)` to our last `if` condition again.
The code can be found in the GitHub repository, in an open pull request, here.
Now it seems that we have implemented all the business rules (we have tests for all of them). Or have we?
Buggy, `if`-riddled code is what we’ve got. We will see why in the next part of the “Getting Stuck While Doing TDD” series. Stay tuned!
Today we have implemented our not-so-complex problem while following the 3 rules of TDD. The result was not of the best quality, and we will take a look at why in further articles of this series. You would not want to miss the next articles on this tech blog; we still have a lot to talk about:
`if` statements tend to get duplicated throughout the code base. This may lead to subtle mistakes and bugs. One way to avoid that problem is to eliminate the `if` statement completely. Today we are going to take a look at one example of such an elimination. Code examples today will be in Kotlin.
The situation:

- we issue a `verification token` given the `device id` and `phone number` of the user’s mobile device;
- we verify `verification tokens` with a 3rd-party API;
- the format of the `verification token` is fairly standardized;
- the `issuer` field of the `verification token` has to be the URL of the API that issued that token, and the 3rd-party API in question validates this fact;
- our `issuer` field gets generated as `com.tddfellow`; according to this standard, it has to be `https://tddfellow.com`;
- old mobile clients expect `issuer` to be `com.tddfellow`, and we can not change them, as they are already installed on users’ mobile devices.

Solution: bump the version of our API from `v1` to `v2`; use `v1` for integration with old mobile clients, and use `v2` for integration with the 3rd-party API and all new clients.
Main program, containing routing information:
(16-line code listing omitted)
Endpoint that issues verification token:
(14-line code listing omitted)
And the UseCase itself:
(18-line code listing omitted)
Code can be found here.
Solution With an `if` Statement

The easiest solution: use the passed-in `apiVersion` from the `Main` program, and switch on whether it is old or new in the use case to determine which issuer to generate:
(21-line code listing omitted)
And the endpoint just passes this value through to the use case:
(19-line code listing omitted)
And finally, the `if` statement in the use case:
(24-line code listing omitted)
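The Kotlin listing is not preserved in this extract, but the essence of that `if` can be sketched as follows. It is rendered in Ruby for consistency with the earlier examples, and the function name is illustrative:

```ruby
# Sketch: the use case switches on the API version to pick the issuer value.
def issuer_for(api_version)
  if api_version == "v1"
    "com.tddfellow"          # legacy value baked into already-installed mobile clients
  else
    "https://tddfellow.com"  # standard-compliant issuer URL for v2 and the 3rd-party API
  end
end

issuer_for("v1")  # => "com.tddfellow"
issuer_for("v2")  # => "https://tddfellow.com"
```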
The full code change is available here (via an open Pull Request).
This solution has quite a few problems:

- the `if` statement smells a bit;
- the use case has to know about `apiVersion`, and APIs are not our domain, they are just a delivery mechanism.

If we were to pass some object instead, like a `TokenIssuer`, it would probably be more appropriate to have the use case know of it. Let’s try to refactor:
Eliminating the `if` Statement Using Polymorphism

First, let’s start passing in the token issuer in the routing:
(21-line code listing omitted)
And this is what `TokenIssuer` and its derivatives look like:
(7-line code listing omitted)
(7-line code listing omitted)
(7-line code listing omitted)
As you might guess, the endpoint just passes this object through to the use case. And the use case itself just calls `getName()` on it when generating the issuer:
(16-line code listing omitted)
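To show the pattern end to end, here is a sketch of the polymorphic design in Ruby (the article’s code is Kotlin). The class names other than `TokenIssuer`’s role are illustrative, and Kotlin’s `getName()` becomes the idiomatic Ruby `name`:

```ruby
# Each issuer variant knows its own name; the use case no longer branches.
class LegacyTokenIssuer
  def name
    "com.tddfellow"  # what old, already-installed clients expect
  end
end

class StandardTokenIssuer
  def name
    "https://tddfellow.com"  # what the standard and the 3rd-party API require
  end
end

class IssueVerificationToken
  def initialize(token_issuer)
    @token_issuer = token_issuer
  end

  def issue(device_id, phone_number)
    { issuer: @token_issuer.name, device_id: device_id, phone_number: phone_number }
  end
end

token = IssueVerificationToken.new(StandardTokenIssuer.new).issue("device-1", "+15550001111")
token[:issuer]  # => "https://tddfellow.com"
```

The routing layer decides once which issuer object to construct (for `v1` or `v2`), and everything downstream stays condition-free.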
The full code change can be seen here (in the Pull Request).
This code may be refactored further, so that even the `Endpoint` class will not have to know about `tokenIssuer` and pass it through. I will leave that as an exercise for you, my dear reader.
You would not want to miss the next articles on this tech blog; we still have a lot to talk about: