# Evolution

This is an addendum to my last post about randomness and fairness. As I was implementing a randomizer in the spirit of the random threshold model discussed before, a nice simplification occurred to me. I knew the probabilities p in my use case would be small, between 5% and 30%, and long streaks of bad luck can feel like a bug to the user. Therefore I wanted a tight upper bound for a streak of bad luck, so I chose c = 0.5, which results in an upper bound of 2 / p, while still allowing a two-streak of good luck. The code looked simple enough:

I noticed a couple of problems though.

1. It is somewhat likely to return true for probability p = 0% and below
2. It is somewhat likely to return false for probability p = 100% and above
3. The first call can never return true for small p, if entropy is initialized to .

It is easy enough to handle points 1. and 2. explicitly, but what about the initialization of entropy? Of course I want the probability of returning true on the first call to be equal to p. It is easy to see that entropy = 0.5f guarantees exactly that. But then one can just subtract 0.5f from both sides of the threshold test and obtain the following simplified randomizer:

# Elegance

Simply elegant, isn’t it? Feel free to use the code above.

# Sugar, sugar

Suppose you have a list of items you want to iterate over and do something for each item in C#. Also suppose you want to avoid any heap allocations while doing so, therefore fancy list extensions using lambda expressions are out of question. So you write something like this:

Which is syntactic sugar for:

Which, in turn, is syntactic sugar for:

Sweet, sweet syntactic sugar! Profiling confirms that that last piece of code generates zero GC alloc, if list is of type System.Collections.Generic.List<T>. So all is well, right? Wrong! Both the foreach loop and the second version with explicit using block generate 48 bytes of garbage. So why is that?

# Boxing

There is a small but important lie in the code transformations above: the var keyword. The only reason why that third snippet generates zero garbage is because List<T> implements a custom enumerator which is a struct. Structs are stack allocated and gets passed by value, meaning they are copied every time they get passed around as a function parameter or a return value. So zero heap allocation.

But the foreach pattern does not know what it gets. It expects an IEnumerator<T> out of the GetEnumerator() call. And even though List<T>.Enumerator implements IEnumerator<T> a boxing conversion happens as soon as you write something like this:

Why? I don’t know. Bug, oversight, or feature? I don’t know. Does it happen for all versions of .NET? I don’t know. However, I do know that it happens in all versions of Unity released to this date, and that is bad enough for me.

Also note that the same behavior is true for the using block which expects an IDisposable. Additionally a using block also emits a null check before disposing of the disposable, which of course doesn’t make any sense for a struct either.

So bottom line neither the foreach statement nor the using block can take advantage of having a struct instead of a class getting passed to them.

# Workaround

I tend to use a plain T[] array instead of a List<T> whenever possible. If I have to use a list, I tend to use old school indexer loops to iterate over them:

To iterate over collections like HashSet<T> or Dictionary<K,V> I use the following pattern:

It isn’t quite as succinct as the foreach loop but almost. The for loop syntax with empty increment statement might feel weird at first. The worst part is that the code never disposes of the enumerator. Therefore it’s borderline dangerous or simply wrong. But then again, as far as I know, for these collections there is nothing to dispose of anyway.

# Remarks

C# compiles an iterator block using yield statements into an inner class implementing IEnumerator. Jon Skeet writes about iterator blocks in great detail. Note that it generates an inner class, not an inner struct! Therefore there is no further cost for boxing involved but GetEnumerator() itself already allocates memory on the heap.

Why did they choose to implement it like this? I don’t know. But the implementation does adhere to the design guidelines for choosing between class and struct. More specifically the enumerator implementing the iterator block neither is immutable, nor necessarily smaller than 16 bytes, nor logically represents a single value, and most likely it will be used in a pattern requiring boxing anyway.

Funny though that they chose to implement collection enumerators as structs, since those violate at least two or three of the four characteristics as well…

# Anecdote

Let’s say you are creating a computer game where the player explores a dungeon and every time he finds a new treasure chest there is a 10% chance it contains a rare item. A naïve implementation goes like this:

Fairly straight forward, right? And it certainly fulfills the desired property of spawning 10% rare items. But soon the complaints of players roll in. I have opened 40 chests and not a single one of them had a rare item in it. This is not 10%! My game is broken.

# Perceived unfairness

Clearly there is nothing wrong with the implementation. It yields rare items with 10% probability on average. In fact running statistics on a sample of one million evaluations the expected value of treasure chests to open until finding a rare item is ten. So all is good, right? Wrong. Always look at the variance too! Turns out the variance is above 90, which basically means anything can happen. Here is what the distribution looks like:

Probability in percent of having to open x number of treasure chests until finding a rare item.

It’s easy to see that most of the time it takes less than 22 treasure chests to find a rare item. But what about the rest of the time? All bets are off. In fact I had to crop the graph to make it fit. The actual distribution continues way further to the right. There is a small chance that it will take 40, 50, 100, or even more chests to find that rare item the player so dearly desires. One out 100 players will encounter this situation and get angry.

This is bad, but what can we do? Let’s take a step back. What we really want can’t be captured by a mere 10% chance. What we want is a game experience where it takes roughly ten treasure chests to find a rare item. We don’t want everlasting streaks of bad luck. We don’t want long streaks of good luck either. Oh, and we don’t want it to be predictable. It should feel random but fair.

# Dithering

A good idea when trying to exert control over randomness is to start with something deterministic and then add a bit of randomness on top. Dithering is a deterministic process to generate a series of events. In our case there is only one event: loot or no loot. The implementation is as follows:

Again fairly straight forward. Every time a chest is opened we add 10% to the entropy until its above 100%. This will put a treasure in every tenth chest, which is great because we managed to prevent streaks of good/bad luck. But it’s also really boring. Nevertheless there are real games using dithering, in this case for calculating the chance to dodge an attack. To make combat a bit less foreseeable the entropy variable is reset to a random value before each fight. Here is what it looks like:

Resetting the entropy to a random value every now and then is still not great. If done too often we are back at the initial situation, if not done often enough everything becomes predictable.

# Adding randomness

Of course there are other options of adding randomness than messing with the entropy. One option is to randomize the success chance so that its expected value is 10%. I call it varying. Examples are:

Where random(a, b) returns a random value between a and b. Suffice to say that the former does not prevent endless streaks of bad luck and the latter needs very careful tuning of c. In any case both solutions make it impossible to score twice in a row. They are not random enough. This is how the first one looks like:

Another option is to deal a deck of cards. The deck is filled by using the deterministic dithering algorithm. Every time a treasure chest is opened a random card is drawn from the deck to determine the chests content. The deck is refilled using dithering. It is the perfect solution to prevent really long streaks of good or bad luck. The tricky part choosing the right size for the deck. Big decks play the naïve randomness, small decks play like dithering. In addition a deck based implementation needs more memory than other solutions. I tried something else.

Threshold

There is another straight forward option of adding randomness to the dithering algorithm. It is this line:

What happens if we choose a random threshold with an expected value of 1? Well, turns out to be perfect. Implementation:

I found a value of c = 0.75 to yield very pleasing results. This is what it looks like:

Isn’t it beautiful? A nice Poissonesque distribution spreading all the way to the left while staying bound to the right. Glorious!

Using dithering with a randomized threshold results in aesthetically pleasing distributions that match our desired goal of creating events that feel random but fair. The expected value is preserved while preventing streaks of good/bad luck.

# Comparison

A clear victory for the randomized threshold method, referred to as Hybrid in the graph.

# In depth analysis

As I said before I found a threshold value of c = 0.75 to yield very pleasing results. But what does c mean? The parameter c controls the amount of randomness added to the distribution. Smaller values stay closer to the deterministic distribution, large values increase variance. A value of c >= 0.5 is necessary to allow for scoring twice in a row no matter how small the configured success probability p. Values of c > 1.0 tend to make the distribution feel too random. Here is a comparison of various values of c:

All distributions are centered around the expected value of 10. The upper bound of a streak of bad luck is determined by (1 + 2c) / p where p is the configured probability of success. In our case p = 10%. I like c = 0.75 but really anything between 0.5 and 1.0 will work depending on the desired balance between determinism and randomness.

I’ll conclude this post with an overview of distributions using c = 0.75 for varying p. You can clearly see the Poissonesque shape of the distributions. Also the upper bound of 2.5 times the respective expected value is apparent.

# Anecdote

Recently I had to brush up on my C++ knowledge for a job interview. I have been coding in C++ for maybe fifteen years now but for the last three years I’ve spend most of my programming time doing C# instead because it pays my bills. So I decided to get a quick overview of the fundamental syntactic, idiomatic, and ideological differences between C++ and C# to restore my memories.

I searched for C++ vs. C#. My bad. I should have known better and I guess I even did but hoped for some helpful results anyway. Instead I was presented lots of opinionated and subjective attempts to answer ill-fated questions about performance and features. I will stop right here because I couldn’t say what happens when you ask these kinds of questions any better than Eric Lippert, Jeff Atwood, and Steve Yegge. In fact you should stop reading my blog now and read theirs instead.

# Language matters?

Don’t ask which is better. Ask what is the right tool for the job. In the end you should base the choice of programming language solely on project constraints and productivity considerations. The latter in turn are likely much more dependent on the expertise of your team than anything else. And for the vast majority of all software performance doesn’t even matter much. But then again the vast majority of software are apps, websites, and scripts. Not my line of business. I’m a specialized minority.

I do massively parallel computations and soft real-time applications. Stuff where you can’t afford having an operation take ten milliseconds to complete. Obviously I care a lot for performance. When it comes to languages and libraries less overhead and more control is better for me – as long as productivity doesn’t suffer too much. Because in the end someone has to pay for the time my colleagues and I spend on turning ideas into code.

On a more holistic scale for performance and productivity both C++ and C# are somewhat in the middle. Perhaps with C++ favoring performance and C# favoring productivity whenever the two are mutually exclusive. Which is less often than you might think. If you really need raw performance you should head towards C, assembler, programmable hardware, or even custom chips. Oppositely if productivity is prime look for existing solutions in terms of frameworks, libraries, or even complete software.

## C++ vs. C#

Lets compare these programming languages anyway. For reasons of scope I will just ignore all other programming languages. But you shouldn’t! Always be aware of what tools are available. They might be the better fit for the job.

I will also not consider platform dependence at all. If you have a very specific target platform in mind or need to support a wide variety of platforms chances are high that C# is not an option. This is a good example of project constraints mentioned above. Make sure to get those right before going into details.

Because my post turned out much longer than envisioned I’m going to break the actual comparison up into multiple parts and link them here.

1. Memory management
2. Type system
3. Polymorphism
4. Syntax

# Open development

I am very lucky to be working for Unknown Worlds Entertainment on their upcoming underwater exploration game Subnautica. Lucky because of many reasons. For starters Subnautica is an exceptional game. Like traditional high quality games it features gorgeous graphics and a masterfully crafted world. Unlike traditional games it does not involve shooting things and killing stuff though. It is very peaceful. I am excited about that! Moreover I am lucky because my colleagues are amazing folks and I can work remotely from anywhere in the world.

We employ a pretty unconventional style of development where all information, all tasks, all work in progress is public. Fellow game developer Tim Keenan coined the term naked development for this. The game is far from finished. In fact there is hardly any game in the game yet. Despite that everyone can have a look at our Subnautica Trello board and see what we are doing. Furthermore by the end of October interested players will be able to access the in-development version of the game. That is a scary thought! But I am sure we will get lots of valuable feedback that will help us to make this game great.

# 3D Space Shooter

Years ago, around 2005 perhaps, I created a 3D space shooter prototype from scratch using only C++ and OpenGL. I did the physics myself and built an optimized collision detection system featuring spatial tessellation using Delaunay triangulation and Minkowski hull extrusion. All models and textures were created in Maya by the amazing Warby. To get those assets in the game I wrote a little exporter using the Maya API. Without further ado here are some screenshots.

Note that the game runs at ~200fps on an old Pentium 4 with integrated IBM graphics.

# Knowledge Age

If you run a business in the knowledge age then there is a good chance that it involves some kind of development. That’s were engineers come into play. You can say engineering is about applying scientific knowledge and ingenuity to develop technical solutions. But because there’s always more than one solution to a problem it’s also a lot about making good decisions. Decisions that will affect the future of your business!

# Decisions

In the presence of multiple possible solutions to a given problem the key to making good decisions is knowing their respective strengths and weaknesses. Oftentimes there is a tradeoff. Therefore it’s a good idea to evaluate the available options according to your individual business requirements. So make sure to know your requirements!

Also note that these decisions might occur on multiple levels of abstraction. This is especially true for software development. Strategic decisions like what target platforms to support, what programming language to use, and what tools to employ are usually made by the management. On the other hand most low level decisions like what algorithms to apply and how to implement them are usually left to the programmer. Inbetween senior engineers decide about frameworks, architecture and data structures — the core of your product.

# Implications

Every decision has far reaching consequences on the levels below it. They force constraints and limitations on the available options further down. That is why you want your high level decisions to be made early and to stay stable. Also make sure they are good. Otherwise you would be wasting time, money, and lots of effort.

Mid level decisions tend to have the strongest implications on the ability of the product to actually meet the business requirements. This is where fulfillment is created within time and budget — or where it fails. This is also where the ability to react to ever changing requirements is determined. Like with a huge building you want your architecture to be clean and stable, yet flexible enough to cope with unforeseen demands. A rigid architecture will soon become messy, and a messy one will soon become unmanageable. So well engineered architecture and data structures are key to retaining agility.

# Knowledge

Ingenuity and scientific knowledge is necessary to come up with possible solutions to existing problems. And by now it should be clear that a deep understanding of the available options and their respective implications is key to making a good decision. All of this highlights the importance of knowledge.

Engineering knowledge can be obtained by higher education, professional training, and experience. Your business will want to acquire, build, and retain it, because loosing knowledge means reduced agility, going into dead ends, and ultimately loosing the ability to fulfill requirements!

# Conclusion

• Acquire knowledge by hiring highly trained and experienced people.
• Evaluate each possible solution regarding individual business needs.
• Document each decision, its rationale, and its implications. It seems like a lot of work but it’s pays off. Omit documenting the implementation!
• Build additional knowledge through reflection and team training.
• Keep knowledge by retaining your team. It’s good for your business!

I hope all of this makes sense to you. It might even be obvious to you. I totally agree, it should be obvious. But it’s a lot easier said than done and I’ve seen noble goals go overboard as soon as time is short. Resist the temptation of shortsighted decisions!

# How to procedurally generate Zelda dungeons

Yesterday I gave a microtalk at one of the incredible Unknown Worlds Entertainment‘s postmortem get-togethers. The talk is appropriately titled How to procedurally generate Zelda dungeons. I hope to see the recordings online soon. Until then here are my slides and slide notes:

1. [Introduction]
2. Zelda dungeons are awesome because they combine fighting and puzzles. Let’s focus on the puzzles.
3. The dungeon puzzle is really just obstacles and collectable items to overcome the obstacles.
4. Example: You enter the dungeon in the southmost room. In the next room you’ll see four exits, one is blocked by a door with a keyhole. You’ll need to go east to find the key. You’ll also notice another door there that unlocks if you can light the torch next to it. To do this you must solve all of the west/north part of the dungeon. In the northwestmost room you’ll find the satchel with ember seeds. You’ll need to go all the way back to light the torch and continue eastwards to beat the dungeon’s boss. Note that exploring the dungeon is highly nonlinear and requires lots of backtracking.
5. The puzzle itself can be represented as a graph. I will explain how to generate graphs like this one. Being able to do so is exciting, because it solves a classic problem in game design, i.e.
6. The conflict between progammers and designers, generating content for novelty and replay value vs. creating content for specific purpose and artistic vision, unlimited new stories vs. one good story. You can have both. Generating complex dungeon puzzles is creating good new stories.
7. I’m using a genetic algorithm to grow these puzzle graphs. I’m guiding the evolution to favor purposive puzzles. The genetic algorithm needs to know the following three things:
8. Encoding. You can encode a graph into a genome by using a list of edges referring to their nodes by number. Obstacle edges need two nodes, item edges only one. Each edge becomes a gene.
9. Mutation. Meaningful operations on the genes are for example changing node numbers, items or obstacles. Operations on the genome include adding and removing genes, of course.
10. Evaluation. Decode the genome grown by the genetic algorithm into a graph. Run a brute force crawler on the graph. Define objectives like no dead ends, long shortest path, etc. to guide the evolution. Use multi-objective optimization.
11. [Result example]
12. [End]
13. [Contact information]

Note that these puzzles are not limited to games like Zelda. They can be found in pretty much any game involving puzzles and are also the heart of every classic adventure. Of course there are lots of details that still need to be explained, but these microtalks are limited to five minutes. If you are interested in the topic, please let me know by leaving a comment!

How to procedurally generate Zelda dungeons (PDF)

# Infinite Zelda project

Some years ago I started working on a project I call Infinite Zelda. It’s a lot like Infinite Mario Bros! just for The legend of Zelda. It’s basically a level generator. The goal is to create fun-to-play zelda-like dungeons.

A zelda-like dungeon consists of connected rooms with enemies, keys and doors, bombs and cracks, switches, gaps, one-way paths and what have you. The player has a wealth of items at his disposal to overcome the challenges of the dungeon, beat the boss and eventually rescue the princess. Yay!

Generating such a dungeon is difficult. Especially a fun-to-play one. Even if you leave out the story-telling. Let’s focus on the puzzle aspects. There are constraints that must be met to prevent player frustration:

1. The dungeon must be solvable. There must be a way.
2. The player must not get stuck. There must not be a total dead-end.
3. Every item and obstacle should be essential. You don’t want more keys than doors.

In a series of posts I’ll explain what it takes to generate such dungeons. I’ll break down the challenges in detail and present my approach. I’ll show how multiple layers of abstraction help modelling each part. Over the course it will become clear how all of this is related to genetic algorithms, abstract syntax tree interpreters and formal verification.