Author: Tal Weksler

Tal Weksler is the CTO at Fieldin. He has three girls and a dog, and in his free time he likes to drag them all along on long hiking trips, with at least one sci-fi book in the backpack.

CI and Testing, Part 2: Fieldin CI and Development Cycle

In Part 1 of this two-post series, I wrote about the QA philosophy at Fieldin; now it is time to get into the details of our CI and development cycle.
Before we jump into the details, I want to start with how we test our code and introduce you to a startup called Testim.IO.
In most cases, your developers will create and maintain unit tests, while a different team, commonly the QA automation team, maintains the integration tests, usually with Selenium or a similar automation framework. That means you need developers on your QA team who know how to code for Selenium, and they have to fix and maintain tests every time a new scenario or feature is introduced. Your company probably also has some sort of dashboard or reporting system for tracking all the tests, and that dashboard has to be maintained as well.
This is where Testim.IO comes into play: they offer an easy-to-use, easy-to-maintain Selenium framework as SaaS (several other companies provide similar solutions; we found Testim.IO the easiest to use and integrate, but you are welcome to check out others, of course). There are also free Selenium extensions for recording tests, but none of them comes close to the simplicity and completeness of Testim.IO or similar SaaS products. You can run tests on their grid or locally, monitor the results, and record a test in a couple of seconds. The tests are also very stable and rarely break.
Fieldin Testing Philosophy — Anti-pattern of the Pyramid of Tests
Many integration tests, very few focused unit tests.
That, in one sentence, is our testing philosophy. Using Testim.IO, we can create, maintain, and deploy a huge number of integration tests (long tests that check a full feature end to end) that can run at every step of the way (when the developer finishes her task, before QA, during QA, before production, etc.). If a test fails, we get a very detailed report describing where and why it failed. True, it will not be as pinpointed as a unit test, but to us the benefit of having many tests matters more (we currently have around 500 integration tests in the system and add more every week).
The following diagrams show how we save 30% of our development time:

On the right, the ideal, by-the-book software testing pyramid; on the left, the software ice-cream cone anti-pattern. Notice that at Fieldin we do almost no manual testing, as almost everything is covered by easy-to-maintain UI tests. Since our developers are not busy writing a lot of unit tests, they can spend much more of their time on development.
Unit Tests
We still write unit tests, but only for complex logic or for functions that receive very diverse inputs and that we want to be sure still work after every change. We also write many unit tests for offline services, of course, since those are not covered by the automated UI tests.
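As a rough illustration of the kind of focused unit test described above, here is a table-driven test using Python’s built-in unittest module. The parse_duration helper and its “1h30m” input format are hypothetical stand-ins for “complex logic with diverse inputs”; they are not Fieldin code.

```python
import re
import unittest

# Hypothetical helper used only to illustrate the pattern: it parses strings
# such as "90m", "2h" or "1h30m" into a number of minutes.
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?$")


def parse_duration(text: str) -> int:
    match = _DURATION.match(text.strip().lower())
    if not match or not any(match.groups()):
        raise ValueError(f"invalid duration: {text!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


class ParseDurationTest(unittest.TestCase):
    # One table of diverse inputs keeps the test focused and easy to extend.
    VALID = [
        ("90m", 90),
        ("2h", 120),
        ("1h30m", 90),
        (" 1H05M ", 65),
        ("0m", 0),
    ]
    INVALID = ["", "h", "30", "1x", "-5m"]

    def test_valid_inputs(self):
        for text, expected in self.VALID:
            with self.subTest(text=text):
                self.assertEqual(parse_duration(text), expected)

    def test_invalid_inputs(self):
        for text in self.INVALID:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_duration(text)
```

Adding a new edge case is one more entry in a table, and subTest reports each failing input separately. Saved as, say, test_duration.py, it runs with python -m unittest test_duration.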
The Cycle
Our development, QA, and deployment cycle is very simple.
A developer is assigned a task (we use JIRA as our task management system) and creates a new feature branch in the Git repository. We have scripted everything together, so with a single CLI command the developer can create a new branch and move the task to “In Progress”. Once the developer finishes the task, she writes pinpoint unit tests and then moves the task to the Selenium grid (powered by Testim.IO). If any of the tests fail, the task automatically returns to the developer and she is notified. If all goes well, the task moves to QA for a deeper examination, and the QA engineer usually creates new tests using the Testim.IO framework. Once approved, the task is merged into the main branch and deployed to production in our weekly release.
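The cycle above is essentially a small state machine. The sketch below models it in Python under stated assumptions: the status names, event names, the FLD-123 task key, the assignee, and the notify callback are all hypothetical, and a real setup would drive these transitions through JIRA and the Testim.IO grid rather than in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Status(Enum):
    # Hypothetical status names; a real JIRA workflow defines its own.
    IN_PROGRESS = "In Progress"
    AUTOMATED_TESTS = "Automated Tests"
    QA = "QA"
    MERGED = "Merged"


# (current status, event) -> next status
TRANSITIONS: Dict[Tuple[Status, str], Status] = {
    (Status.IN_PROGRESS, "dev_done"): Status.AUTOMATED_TESTS,
    (Status.AUTOMATED_TESTS, "tests_passed"): Status.QA,
    (Status.AUTOMATED_TESTS, "tests_failed"): Status.IN_PROGRESS,
    (Status.QA, "qa_approved"): Status.MERGED,
}


@dataclass
class Task:
    key: str
    assignee: str
    status: Status = Status.IN_PROGRESS
    history: List[Status] = field(default_factory=list)


def advance(task: Task, event: str, notify: Callable[[str, str], None]) -> Task:
    """Apply one workflow event, rejecting transitions the cycle does not allow."""
    try:
        next_status = TRANSITIONS[(task.status, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in {task.status.value!r}") from None
    task.history.append(task.status)
    task.status = next_status
    if event == "tests_failed":
        # A failed automated run sends the task back to its developer, who is notified.
        notify(task.assignee, f"{task.key}: automated tests failed")
    return task


messages: List[Tuple[str, str]] = []
task = Task("FLD-123", "dana")
for event in ["dev_done", "tests_failed", "dev_done", "tests_passed", "qa_approved"]:
    advance(task, event, lambda who, msg: messages.append((who, msg)))
print(task.status.value)  # Merged
```

Keeping the allowed transitions in one table makes it easy to reject shortcuts, such as approving a task that never went through the automated tests.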
We usually release weekly and aim for 1-day releases, but we can release whenever we want, as the main branch is always ready to be released.
The main advantage of our CI is how fast we can move mature code to production: we ship new features and fixes very quickly.
The logic behind our approach is that developers should do what they do best: write more code and do less QA.
We know this contradicts many other development traditions, such as TDD and having developers write their own tests, but we believe that the combination of the Testim.IO framework and dedicated QA engineers who generate a huge number of tests makes our path worthwhile, at least for now; we keep re-evaluating it all the time. The results speak for themselves: we move and react very quickly to customers’ needs.

Posted by Tal Weksler

CI and Testing, Part 1

Should developers do their own QA?
The question of whether developers should do their own QA has come up a lot in the past couple of years, especially with the rise of CD processes, which try to minimize the time to production. A quick look at Google autocomplete shows how much interest the industry has in this question.

Unfortunately for the confused R&D manager or team leader, there is no definitive answer. Whether your developers should do QA (and if so, how much) should be decided after reviewing a few factors, such as the stage of the company (bootstrap vs. corporate), the number of developers on the team, their skills, the R&D budget, and so on.
With all of the above taken into consideration, I believe that companies from early-stage startups up to medium-sized businesses should substantially reduce the QA their developers do and leave it to QA professionals. Why? Money, of course.
For a big company or corporation, by the way, it is a more complex question that also depends on the product stage, the team, etc.
Let’s start by looking at the two main arguments for developers QAing their own code:

Owning the task: the developer knows her code best. QAing the task and giving it the final stamp of approval makes the developer responsible for her work, instead of handing the task off to be QAed by someone else.
Reducing the time from development to production: this is the holy grail for all companies, as they want to ship new features and bug fixes to production as fast as possible.

Of course, no one will argue against the first one: we all want our developers (and everyone else in the company) to be responsible for their work. However, as many developers have noticed, it is very hard to test your own work, even if you have full test suites covering your code. You are still limited by the things you don’t think to test. There are many methods for overcoming this mental block, but they are out of scope for this post.
But wait: we assumed that the project has full (or almost full) test coverage. In real life, reaching sufficient test coverage can take time, and without it you don’t really want your developers deploying their code to production. In addition, many companies (mainly B2B) simply don’t have enough users for a gradual rollout, deploying to a limited group of users first and then to everyone, so you end up pushing the code to all users without knowing whether you have covered every scenario.
Now let’s talk a little about money, which to me always seems to be pushed aside in this discussion, maybe because most team leaders and R&D managers are not involved in the company’s day-to-day operations and budget. This is a big issue, as developers might choose the wrong path or technology because it is better for development but not better for the company. I will try to discuss this topic in a future post.
Developers are estimated to spend around 30% of their working time testing their own code. According to Glassdoor, the average QA salary is about 2/3 of the average developer salary. That means that if your team has more than 2 or 3 developers, you should seriously consider letting your developers do what they are paid to do and bringing in a QA engineer to do what they do best.
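Reading the Glassdoor figure as a QA salary of roughly 2/3 of a developer’s, the break-even point can be checked with a few lines of Python. This is a deliberately simplified model, and its assumptions are mine rather than a formal cost analysis: one QA engineer absorbs all of the developers’ testing time, and every freed hour turns into coding.

```python
def net_gain(developers: int, testing_share: float = 0.30, qa_salary_ratio: float = 2 / 3) -> float:
    """Net gain, in units of one developer's salary, from hiring one QA engineer.

    Simplifying assumption: the QA engineer takes over all of the developers'
    testing time, and that time goes back into writing code.
    """
    freed_developer_time = developers * testing_share
    return freed_developer_time - qa_salary_ratio


def break_even_team_size(testing_share: float = 0.30, qa_salary_ratio: float = 2 / 3) -> int:
    """Smallest team for which one QA engineer costs less than the time she frees up."""
    if testing_share <= 0:
        raise ValueError("testing_share must be positive")
    developers = 1
    while net_gain(developers, testing_share, qa_salary_ratio) <= 0:
        developers += 1
    return developers


for n in (2, 3, 5):
    print(n, round(net_gain(n), 2))
print("break-even team size:", break_even_team_size())
```

With these inputs, two developers come out slightly negative and three come out ahead, which is consistent with the “more than 2 or 3 developers” rule of thumb. Real numbers will differ, since one QA engineer will not absorb every testing hour.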
Imagine your product manager’s face when you tell her, “Listen, we can get 30% of our time back for writing code, which means more features and tasks done in the product.”
Of course, there is a downside. Instead of developers deploying immediately, you will deploy once a week (or at any other interval you choose; I recommend once a week, although at Fieldin we aim for 1-day delivery) so the QA team can test everything, even though we all want to be agile and lean and to deliver fast. However, this weekly version will have fewer bugs and more features, so overall it should pay off money-wise.
Wait, money again? Yes, money. Your product manager is not writing feature tickets as a hobby. Those features make the product better and thus increase the value of the product and the company, which equals money, even if it is sometimes hard to quantify. Gaining 30% more time for coding is worth a lot of money over the long run of a company. For most companies, there is no real difference between shipping code every day and shipping it once a week; if you can achieve better code quality and fewer bugs, to me the choice is obvious.
Having written all of that, it is still important to understand that:
1. You still need CI and tests for a large-scale product. You should develop a CI process that tests your code and features; it is just that you want your QA team to oversee most of it.
2. Your developers can still write tests, and unit tests are always welcome. Just don’t let testing reach 30% of a developer’s time.
3. The right person for the right job. In my experience, good QA engineers and good developers are simply different types of people. The average developer will not be as thorough as the average QA engineer (and vice versa, of course), so why put such expectations on your team?
At Fieldin we practice the above, which allows us to deliver new features at a rapid pace and still be very flexible and adaptive to change.
We have had our CI system for almost 3 years now, and it is still relevant and supports us as we add more developers and QA engineers. Our developers do unit testing and are responsible for their work. We always try to reduce bugs and talk about responsibility, but we don’t have them write full test cases and other things that are better handled by experts in that field.
In the second post of this two-post series, I will go into detail about how we implement the vision behind our CI.
Read the second part, CI and Testing, Part 2: Fieldin CI and Development Cycle.


Altering big tables in MySQL

Altering a big table is never a fun task. It can lock the table for anywhere from a couple of minutes to a couple of hours. If the table is an essential part of your production environment, this can pose a major challenge.

A few scenarios that call for such a change:
1. Add column
2. Drop column
3. Add an index
4. Update an index
I run into this situation every now and then, and I came up with a very easy solution. While it is not fully downtime-free, it minimizes downtime as much as possible: you can be up and running again within 5 minutes. It is also much simpler than many of the other solutions I found on Google. If you cannot accept even 5 minutes of downtime, keep Googling…
In this post I will explain how to alter a table when the DB is hosted as an RDS instance on AWS. The process is very easy with AWS RDS, but it can also be done with other cloud providers or on premises.
Step 1: Set the Binlog Format
The binary log (binlog) is where every change made to the DB is recorded. You control how changes are recorded with the “binlog format” setting.
If you don’t already have a separate parameter group for your DB, create one and assign it to your DB. If you are using the RDS default parameter group, which cannot be modified, copy it and assign the copy to your instance.
Now set the binlog format. There are three options:
Statement: saves each query, e.g. “UPDATE users SET name = 'tal' WHERE id = 1”.
Row: saves the changed row data to the binlog.
Mixed: lets MySQL choose which of the two to use for each statement.
The default is usually Mixed, which is fine for most scenarios, but you may need to change it if replication fails with Mixed; you will see the failures once you start changing the table. Note that after each change to this setting you must reboot the DB, which causes additional downtime.
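The same change can be scripted. This sketch builds the arguments for the boto3 RDS client’s modify_db_parameter_group call; the parameter group name is a placeholder, and it assumes you already use a custom parameter group as described above. It only builds the request, so it runs without AWS credentials.

```python
from typing import Any, Dict

BINLOG_FORMATS = {"ROW", "STATEMENT", "MIXED"}


def binlog_format_request(parameter_group: str, binlog_format: str) -> Dict[str, Any]:
    """Build kwargs for boto3's RDS modify_db_parameter_group call."""
    value = binlog_format.upper()
    if value not in BINLOG_FORMATS:
        raise ValueError(f"unsupported binlog_format: {binlog_format!r}")
    return {
        "DBParameterGroupName": parameter_group,
        "Parameters": [
            {
                "ParameterName": "binlog_format",
                "ParameterValue": value,
                # Apply on the next reboot, which the steps above require anyway.
                "ApplyMethod": "pending-reboot",
            }
        ],
    }
```

With boto3 installed, it would be applied as boto3.client("rds").modify_db_parameter_group(**binlog_format_request("prod-mysql-params", "ROW")), followed by a reboot of the instance (for example with reboot_db_instance) once the console shows the parameter group as pending reboot.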
Step 2: Replication
Create a read replica of the DB, and make sure the replica is configured the same way as your production instance; e.g., if you are using Multi-AZ, the replica should be Multi-AZ as well.
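A minimal boto3 sketch of this step, assuming your credentials and region are already configured: it reads the production instance’s class and Multi-AZ setting and passes the same values to create_db_instance_read_replica. The instance identifiers are placeholders, and other settings (storage type, security groups, and so on) are not copied here.

```python
from typing import Any, Dict


def replica_request(source: Dict[str, Any], replica_id: str) -> Dict[str, Any]:
    """Kwargs for create_db_instance_read_replica that mirror the source instance.

    `source` is one entry of describe_db_instances()["DBInstances"].
    """
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source["DBInstanceIdentifier"],
        # Same instance size and Multi-AZ setting as production.
        "DBInstanceClass": source["DBInstanceClass"],
        "MultiAZ": source["MultiAZ"],
    }


def create_matching_replica(rds: Any, source_id: str, replica_id: str) -> None:
    """Create the replica and block until RDS reports it as available."""
    source = rds.describe_db_instances(DBInstanceIdentifier=source_id)["DBInstances"][0]
    rds.create_db_instance_read_replica(**replica_request(source, replica_id))
    # Poll every 60 s for up to 240 attempts (about 4 hours); big databases take a while.
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier=replica_id,
        WaiterConfig={"Delay": 60, "MaxAttempts": 240},
    )
```

Usage would look like create_matching_replica(boto3.client("rds"), "prod-db", "prod-db-replica"); adjust the waiter’s Delay and MaxAttempts to the size of your database.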
Step 3: Change the parameter group of the replica
Copy the production server’s parameter group to create a new one.
Change the “read_only” parameter to 0.
You can now work on the replica.
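Step 3 can also be scripted with boto3. The sketch builds the three requests involved: copying the production parameter group, setting read_only to 0 in the copy, and attaching the copy to the replica. Group and instance names are placeholders.

```python
from typing import Any, Dict


def copy_group_request(source_group: str, target_group: str) -> Dict[str, Any]:
    """Kwargs for copy_db_parameter_group: clone production's group."""
    return {
        "SourceDBParameterGroupIdentifier": source_group,
        "TargetDBParameterGroupIdentifier": target_group,
        "TargetDBParameterGroupDescription": f"{source_group} with read_only=0 for a schema change",
    }


def read_only_off_request(group: str) -> Dict[str, Any]:
    """Kwargs for modify_db_parameter_group; read_only is dynamic, so 'immediate' is allowed."""
    return {
        "DBParameterGroupName": group,
        "Parameters": [
            {"ParameterName": "read_only", "ParameterValue": "0", "ApplyMethod": "immediate"}
        ],
    }


def attach_group_request(instance_id: str, group: str) -> Dict[str, Any]:
    """Kwargs for modify_db_instance: point an instance at a parameter group."""
    return {"DBInstanceIdentifier": instance_id, "DBParameterGroupName": group, "ApplyImmediately": True}


def make_replica_writable(rds: Any, prod_group: str, temp_group: str, replica_id: str) -> None:
    rds.copy_db_parameter_group(**copy_group_request(prod_group, temp_group))
    rds.modify_db_parameter_group(**read_only_off_request(temp_group))
    rds.modify_db_instance(**attach_group_request(replica_id, temp_group))
```

Associating a different parameter group with an instance generally takes effect only after a reboot, so reboot the replica once the modification has completed. Switching back in Step 5 is the same modify_db_instance call with the production group’s name, e.g. attach_group_request(replica_id, prod_group).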
Step 4: Change the table
On the replica server, alter the table. If a large amount of data is flowing into the table from the production server, this step will lock the table and block replication entirely.
Once the change finishes (which can take days, depending on the size of the table), replication should resume automatically; give it time to catch up.
You can monitor progress with the SHOW SLAVE STATUS command on the replica server or through the RDS console.
In its output, watch the Last_Error field to see whether something is interrupting replication, and the Seconds_Behind_Master field to confirm that both servers are up to date and in sync (it should be 0).
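Checking those two fields can be automated. This helper takes one row of SHOW SLAVE STATUS output as a column-name-to-value mapping (for example, what a dictionary cursor such as PyMySQL’s pymysql.cursors.DictCursor returns) and summarizes it. It also looks at Last_IO_Error, which reports connection problems that Last_Error does not cover; the status strings are illustrative.

```python
from typing import Any, Mapping, Optional


def replication_state(row: Optional[Mapping[str, Any]]) -> str:
    """Summarize one SHOW SLAVE STATUS row given as a column-name -> value mapping."""
    if not row:
        # SHOW SLAVE STATUS returns no row on a server that is not a replica.
        return "not a replica"
    for column in ("Last_Error", "Last_IO_Error"):
        if row.get(column):
            return f"error: {row[column]}"
    lag = row.get("Seconds_Behind_Master")
    if lag is None:
        # NULL means the SQL thread is not running or the I/O thread is not connected.
        return "stopped"
    if int(lag) == 0:
        return "in sync"
    return f"catching up: {int(lag)} s behind"
```

Calling it in a loop every minute or so while the ALTER runs gives a simple progress log; once it reports “in sync”, you can move on to Step 5.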
Step 5: Switch the replica back to the original parameter group
Once the replica is in sync again, assign the production parameter group back to the replica; this turns read_only back on.
Step 6: 5 minutes of downtime
It is now the moment of truth. The following process can be done in 5 minutes:
1. Modify the production instance and rename it (change its DB instance identifier) to something else. This changes the instance’s endpoint, which takes your application environment offline.
2. Once the change has finished, no new data is coming into the DB. Select the replica and, under Actions, choose “Promote read replica”. This turns the replica into a new standalone instance, detached from replication.
3. Rename the replica instance to the production instance’s original name. Once that finishes, everything returns to normal: you have a new production instance with the same data, but with the altered table.
4. You can now delete the old instance, or keep it for a couple of days just in case.
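For those who prefer scripting the cutover, here is a boto3 sketch of steps 1 to 3. The instance identifiers and the “-old” suffix are placeholders, and the polling conditions are assumptions: it waits until each renamed instance can be described and is “available”, and treats the replica as promoted once it is available and no longer reports ReadReplicaSourceDBInstanceIdentifier. Before step 2, it is worth confirming that Seconds_Behind_Master is 0 so no late writes are lost.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Description = Optional[Dict[str, Any]]
# (boto3 RDS client method, kwargs, instance id to poll, condition that ends the wait)
Step = Tuple[str, Dict[str, Any], str, Callable[[Description], bool]]


def is_available(desc: Description) -> bool:
    return desc is not None and desc.get("DBInstanceStatus") == "available"


def is_promoted(desc: Description) -> bool:
    # Replicas report ReadReplicaSourceDBInstanceIdentifier; a promoted instance should not.
    return is_available(desc) and "ReadReplicaSourceDBInstanceIdentifier" not in desc


def cutover_plan(prod_id: str, replica_id: str, old_suffix: str = "-old") -> List[Step]:
    old_id = prod_id + old_suffix
    return [
        # 1. Rename production; its endpoint changes and the application loses the DB.
        ("modify_db_instance",
         {"DBInstanceIdentifier": prod_id, "NewDBInstanceIdentifier": old_id, "ApplyImmediately": True},
         old_id, is_available),
        # 2. Promote the replica to a standalone instance.
        ("promote_read_replica", {"DBInstanceIdentifier": replica_id}, replica_id, is_promoted),
        # 3. Give the promoted replica production's original name (and therefore its endpoint).
        ("modify_db_instance",
         {"DBInstanceIdentifier": replica_id, "NewDBInstanceIdentifier": prod_id, "ApplyImmediately": True},
         prod_id, is_available),
    ]


def describe(rds: Any, instance_id: str) -> Description:
    try:
        return rds.describe_db_instances(DBInstanceIdentifier=instance_id)["DBInstances"][0]
    except rds.exceptions.DBInstanceNotFoundFault:
        return None  # e.g. a rename that has not completed yet


def run_cutover(rds: Any, plan: List[Step], poll_seconds: int = 15, timeout_seconds: int = 1800) -> None:
    for method, kwargs, poll_id, done in plan:
        getattr(rds, method)(**kwargs)
        deadline = time.monotonic() + timeout_seconds
        while not done(describe(rds, poll_id)):
            if time.monotonic() > deadline:
                raise TimeoutError(f"{method} on {poll_id} did not finish in time")
            time.sleep(poll_seconds)
```

Calling run_cutover(boto3.client("rds"), cutover_plan("prod-db", "prod-db-replica")) would run the whole sequence; building the plan separately makes it easy to print and review before running, and it is best tried against non-production instances first.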
