Automated testing in a competition environment

Published on 19 September 2020, 15:05

This year, we were able to automatically test 100% of our tasks at ICTskills2020. Of course, we still check all results manually to make sure our testing framework works as expected and provides the correct results. And although we still find improvements every year to make it even more reliable, it is in a very solid state and has reduced the marking time from hours or even days to just a few minutes (manual confirmation excluded).

Tests in a competition environment

Every developer probably knows how to write unit tests. But when writing tests for a competition, there are a few things to consider that you might not normally think of.

Leave space for different implementations

There are often a million ways to solve a given task, and you can assume that there will be at least one solution you would never think of. In a competition, it does not matter how a specific task has been implemented, as long as it fulfills the requirements. So it would be wrong to write overly specific tests that only work for your own solution.

An example of that: implementing a grid in CSS with 5 equally sized columns. Instead of checking for specific CSS properties (e.g. grid-template-columns), which only works for implementations based on CSS grid, more abstract things should be tested, like computed element widths or positions. That way, the tests pass whether the grid has been implemented with CSS grid, flexbox, floats, or anything else, as long as all columns have the same width.
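
To illustrate, a Cypress test along those lines could look like the sketch below; the .column selector and the one-pixel tolerance are assumptions for the example, not part of any actual task.

```js
// a sketch, assuming the columns are reachable via a hypothetical .column selector
it('renders 5 equally sized columns', () => {
  cy.visit('/index.html');
  cy.get('.column')
    .should('have.length', 5)
    .then(($cols) => {
      const widths = $cols.toArray().map((el) => el.getBoundingClientRect().width);
      // compare computed widths instead of CSS properties, so grid,
      // flexbox and float implementations all pass
      widths.forEach((w) => expect(w).to.be.closeTo(widths[0], 1));
    });
});
```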

Another example is testing for background colors. If you implemented the task yourself, you might set the background color on the <a> element, but a competitor may decide to set it on the parent <li> element instead. In this case, it isn't enough to check the background color on one element; you also have to traverse and check the parent elements for as long as that makes sense. Additionally, the computed CSS properties should be used, because one competitor could use HEX values, another RGB, and yet another something different entirely. The computed CSS properties guarantee a consistent format; in the case of Chrome, that is RGB(A).
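
A sketch of that traversal in a Cypress test; the nav a selector and the expected rgb(255, 0, 0) value are made up for the example.

```js
// walk up the DOM until a non-transparent computed background color is found
function effectiveBackgroundColor(el) {
  const win = el.ownerDocument.defaultView;
  while (el) {
    const color = win.getComputedStyle(el).backgroundColor;
    if (color !== 'rgba(0, 0, 0, 0)' && color !== 'transparent') {
      return color; // computed values are normalized, e.g. to RGB(A) in Chrome
    }
    el = el.parentElement;
  }
  return null;
}

it('menu entries have the required background color', () => {
  cy.visit('/index.html');
  cy.get('nav a').each(($link) => {
    // passes whether the color was set on the <a>, the <li> or further up
    expect(effectiveBackgroundColor($link[0])).to.equal('rgb(255, 0, 0)');
  });
});
```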

Write performant tests

In a competition, time is very valuable and you don't want competitors to waste time just waiting for tests to complete. So, time-consuming operations (like resetting a database) should only be performed if necessary.
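
One way to achieve that with Mocha is sketched below; the resetDatabase helper is hypothetical and stands in for whatever expensive setup a task needs.

```js
// hypothetical helper standing in for an expensive operation
async function resetDatabase() {
  // truncate tables, re-run seeders, ...
}

describe('read-only endpoints', () => {
  // no reset here: these tests never change state
  it('lists all products', async () => {
    // ...assertions against GET endpoints...
  });
});

describe('write endpoints', () => {
  // only suites that mutate data pay for the reset
  beforeEach(resetDatabase);

  it('creates an order', async () => {
    // ...assertions against POST endpoints...
  });
});
```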

Also, competitors should be able to run only a subset of tests, or the tests for a single method they were asked to implement. It should also be clear to competitors how those single tests can be executed, as not everyone knows the testing libraries used (unless the libraries are specified in advance and competitors are expected to learn them before the competition).
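
With Mocha, for example, a single test can be selected with .only or a --grep pattern; the solution module and the capitalize function below are made up for the example.

```js
const assert = require('assert');
// hypothetical import of the competitor's solution
const { capitalize } = require('./solution');

describe('Task 3: string helpers', () => {
  // `it.only` runs just this one test; `npx mocha --grep "capitalize"` works too
  it.only('capitalize() upper-cases the first letter', () => {
    assert.strictEqual(capitalize('hello'), 'Hello');
  });
});
```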

Prevent cheating

When marking automatically and providing the tests to the competitors during the competition, a big concern is cheating. Say a test checks the implementation of a simple method that just performs an addition (3 + 4 = 7): a competitor who has access to the test could simply return 7; without really implementing the addition. The test would then pass and they would get the points. Just testing more additions doesn't really prevent that, so we were looking for a reliable way to detect possible cheating.
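
To make that concrete, the hard-coded "solution" for the addition example would be just:

```js
// passes the visible test asserting add(3, 4) === 7 without any real addition
function add(a, b) {
  return 7;
}
```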

The best solution we came up with is extra tests. Extra tests are basically copies of the normal tests and test the exact same functionality, just with different values. While the normal tests are provided to the competitors, the extra tests are not; they are only used for marking after the competition. So if our testing framework detects that an extra test fails while the corresponding normal test succeeds, it produces a warning that a manual review is needed to check for possible cheating.

[Screenshot: Warning of a failed extra test]
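
In code, an extra test is simply the normal test with different inputs; the ./solution import below is a placeholder for the competitor's code.

```js
const assert = require('assert');
const { add } = require('./solution'); // placeholder path to the competitor's code

// normal test, handed out to the competitors
it('add() sums two numbers', () => {
  assert.strictEqual(add(3, 4), 7);
});

// extra test, kept private and only run during marking
it('add() sums two numbers (extra)', () => {
  assert.strictEqual(add(12, 30), 42);
});
```

The hard-coded return 7; passes the first test but fails the second, which is exactly the pattern that triggers the warning.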

End-to-end tests

While it is easy to unit test the PHP and JavaScript parts, providing tests for HTML/CSS is a bit more challenging. After evaluating some possible tools, we discovered Cypress, which has worked best for us in the end.

[Screenshot: Cypress]

The UI looks modern and is very helpful, especially for competitors when debugging errors. Competitors can also use the time-travel feature to see the state of the website at the moment a test failed (or at any other assertion) and inspect it with the normal dev tools.

But using end-to-end tests to assess HTML/CSS also has a drawback: it is really hard to mark designs or a competitor's own innovations. Subjective marks such as 'The design looks modern and polished' are no longer possible. However, it is of course possible to combine subjective marks that require manual marking with objective marks that can be tested automatically, like the responsiveness or the presence and correct configuration of form fields, dropdowns, etc.
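
Objective criteria like these translate directly into Cypress assertions; the selectors, the expected option count, and the viewport size below are illustrative.

```js
it('the contact form is configured correctly', () => {
  cy.visit('/contact.html');
  // presence and configuration of form fields can be tested objectively
  cy.get('input[name="email"]').should('have.attr', 'type', 'email');
  cy.get('select[name="country"] option').should('have.length', 5);
  // responsiveness: assert visibility per viewport instead of judging the design
  cy.viewport(375, 667);
  cy.get('nav').should('be.visible');
});
```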

Output

Our tests are all based on popular testing libraries (PHPUnit, Mocha, Cypress) but we wrote custom result printers for all of them. That makes it easier for the competitors to get a summary and directly see the number of points they have scored. Additionally, those printers have a switch to return the results in a JSON format compatible with our marking backend.
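
A minimal sketch of such a printer as a custom Mocha reporter; the OUTPUT_JSON switch and the points summary are simplified assumptions, not our actual implementation.

```js
const Mocha = require('mocha');
const {
  EVENT_TEST_PASS,
  EVENT_TEST_FAIL,
  EVENT_RUN_END,
} = Mocha.Runner.constants;

class PointsReporter extends Mocha.reporters.Base {
  constructor(runner, options) {
    super(runner, options);
    const results = [];
    runner
      .on(EVENT_TEST_PASS, (test) =>
        results.push({ title: test.fullTitle(), passed: true }))
      .on(EVENT_TEST_FAIL, (test, err) =>
        results.push({ title: test.fullTitle(), passed: false, error: err.message }))
      .once(EVENT_RUN_END, () => {
        if (process.env.OUTPUT_JSON) {
          // machine-readable results for the marking backend
          console.log(JSON.stringify(results));
        } else {
          // human-readable summary for the competitors
          const passed = results.filter((r) => r.passed).length;
          console.log(`Points scored: ${passed}/${results.length}`);
        }
      });
  }
}

module.exports = PointsReporter;
```

Competitors would then run mocha --reporter ./points-reporter.js, while the marking system sets OUTPUT_JSON to collect the results.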

[Screenshot: Test output that the competitors see]

Open sourcing

Soon, we will open-source our testing frameworks to help other competitions as well and to work together with other countries. Stay tuned on this blog or write us at [email protected] if you are interested.

Conclusion

Using automated marking for competitions is possible and works really well. But it requires quite some time and knowledge to get started, as competitions are a very different environment from normal projects where you would write unit tests. You always have to be very careful to write tests that are restrictive enough to only accept correct results, but still allow the task to be implemented in different ways. That just needs some practice, and it is very likely that you will still discover things you didn't think of when running the tests against the competitors' code. But with all those learnings and the improvements you can make to the tests, marking scripts, or the whole system, it keeps growing and getting better. We are now at a point where we are very happy with it, and it saves us a lot of time.