In the last two days, I described on this page my efforts at cracking the Facebook Hacker Cup 2013 challenge – three separate puzzles. Today we will examine the final one: Find the Min. The actual problem is described below:

After sending smileys, John decided to play with arrays. Did you know that hackers enjoy playing with arrays? John has a zero-based index array, m, which contains n non-negative integers. However, only the first k values of the array are known to him, and he wants to figure out the rest.

John knows the following: for each index i, where k <= i < n, m[i] is the minimum non-negative integer which is *not* contained in the previous *k* values of m.

For example, if k = 3, n = 4 and the known values of m are [2, 3, 0], he can figure out that m[3] = 1.

John is very busy making the world more open and connected, as such, he doesn't have time to figure out the rest of the array. It is your task to help him.

Given the first k values of m, calculate the nth value of this array. (i.e. m[n - 1]).

Because the values of n and k can be very large, we use a pseudo-random number generator to calculate the first k values of m. Given positive integers a, b, c and r, the known values of m can be calculated as follows:
m[0] = a
m[i] = (b * m[i - 1] + c) % r, 0 < i < k

Input
 The first line contains an integer T (T <= 20), the number of test cases.
 This is followed by T test cases, consisting of 2 lines each.
 The first line of each test case contains 2 space separated integers, n, k (1 <= k <= 105, k < n <= 109).
 The second line of each test case contains 4 space separated integers a, b, c, r (0 <= a, b, c <= 109, 1 <= r <= 109).

Output
 For each test case, output a single line containing the case number and the nth element of m.

The example input is presented below:

5
97 39
34 37 656 97
186 75
68 16 539 186
137 49
48 17 461 137
98 59
6 30 524 98
46 18
7 11 9 46

And its corresponding output is given below:

Case #1: 8
Case #2: 38
Case #3: 41
Case #4: 40
Case #5: 12

The description of the challenge is not very good. I can tell because I solved the challenge and had a difficulty of understanding some of the things in the text that we’ll see later on. Basically, we need to read the input file two lines at a time; the first line constitutes the variables n and k, while the second line constitutes the variables a,b,c and r. We need to read all those numbers in the program and calculate some basic mathematics over it to get an array of numbers. At the end, we should display the last number in the array, which presents the actual result.

Solving the Challenge

I guess this challenge was the easiest among the three to solve; the main catch was understanding what the challenge was asking of us, but once we figure it out, writing the code is trivial. The first thing we need to figure out is the fact that the elements in the array are being calculated based on the value k, like this:

m[0] = a ; k=0

m[i] = (b * m[i - 1] + c) % r ; 0 < i < k

m[i] = min(m[i-k]…m[i-1]) ; k<=i<n

This is also the trick part to figure out because the challenge is not very clear about that. We can see that the first element of the array is always the number a, as it was read from the input file. The elements from the first element till the k’th element are calculated based on the following formula: m[i] = (b * m[i - 1] + c) % r. This is presented well in the text of the challenge. The tricky part is the understanding of the array elements, which have the index greater than k. The text of the challenge says the following, which describes how those elements should be calculated:

John knows the following: for each index i, where k <= i < n, m[i] is the minimum non-negative integer which is *not* contained in the previous *k* values of m.

This really isn’t a very detailed description of how the elements k<=i<n should be calculated. From the description above, it’s immediately evident that all those values should be calculated as follows:

min(m[k]…m[i-1])

But actually, this is wrong! We shouldn’t take whole range from m[k]…m[n-1] into consideration when calculating the next integer to append the array, but only the previous k characters! This is the important part that is left for the reader to figure out. I’m actually not very sure why this isn’t better presented; maybe it’s because the Facebook writers intentionally left some ambiguous sentences in there just to mess with the hackers solving the challenges. The actual formula should have been written like this:

Want to learn more?? The InfoSec Institute Ethical Hacking course goes in-depth into the techniques used by malicious, black hat hackers with attention getting lectures and hands-on lab exercises. While these hacking skills can be used for malicious purposes, this class teaches you how to use the same hacking techniques to perform a white-hat, ethical hack, on your organization. You leave with the ability to quantitatively assess and measure threats to information assets; and discover where your organization is most vulnerable to black hat hackers. Some features of this course include:

  • Dual Certification - CEH and CPT
  • 5 days of Intensive Hands-On Labs
  • Expert Instruction
  • CTF exercises in the evening
  • Most up-to-date proprietary courseware available

min(m[i-k]…m[i-1])

There’s not a big difference, I know, but this is what decides whether we’ll be able to successfully solve the problem or not.

The basic program that solves the problem above is the following:

#!/usr/bin/python

import sys

def calc_lower(m, b, c, r):
  return (b * m + c) % r

def calc_higher(m,j,k):
  for i in range(max(m)+1):
    if i not in m[j-k:j]:
       return i

  return max(m)+1

f = open(sys.argv[1], "r")
num = f.readline().strip()
count = 0
for i in xrange(1, int(num)+1):
  count += 1
  n,k     = f.readline().strip().split(' ')
  a,b,c,r = f.readline().strip().split(' ')

  # initialize i=0
  m = [int(a)]

  # calculate 0<i<k numbers
  for j in range(int(k))[1:]:
    m.append(calc_lower(m[j-1], int(b), int(c), int(r)))

  # calculate k<=i<n numbers
  for j in range(int(k), int(n)):
    m.append(calc_higher(m, int(j), int(k)))

  print "Case #%d: %d" % (count, m[-1])

f.close()

We’re reading the variables n,k,a,b,c,r into the program and using them to calculate the value we need. First we’re using the given formula to calculate the first k numbers and afterwards we’re also finding the minimum value of the last k elements in the array. The program works and I’ve tested it against the given input file and the output was correct. But here’s the catch: what if the program uses really high values of n,k,a,b,c,r? Well, the most important variables to be worrying about are n and k, which greatly add the cpu and memory usage of a program. Did I say this was the easiest of the three problems? Well, now I can bite myself in the tongue. It is very easy to write the program that can operate with small numbers, but if the n and k variables are really high, the program posted above will fail miserably. So in order to shorten the time period in which it calculates the right number m[n-1], we must optimize the program.

We’ll be operating on just one example. For clarity, we’ll be using the input file given below (this is the first case of the original input file):

1
97 39
34 37 656 97

If we print the first k values of the array m, we’ll get the following numbers:

[34, 71, 82, 4, 28, 43, 16, 84, 78, 50, 81, 64, 17, 24, 89, 69, 8, 79, 87, 92, 83, 41, 39, 62, 40, 2, 51, 21, 75, 36, 48, 7, 42, 76, 73, 59, 26, 66, 91]

There are exactly 39 elements in the array, which is exactly the number k (from the input file). From what we can understand now, this can’t really be optimized; the numbers must be calculated one way or the other, so it’s better to move on to the second part of the problem. We should now calculate the rest of the elements in the array, rather than wasting time, we’ll come back here and optimize further if necessary, but for now, let’s leave it as it is. Then let’s print every number returned by the calc_higher function. In our example, we get the following numbers:

0 1 3 5 4 6 9 10 11 12 13 14 15 16 17 18 19 8 20 22 23 24 25 27 28 29 2 30 21 31 32 33 7 34 35 36 37 26 38 39 0 1 3 5 4 6 9 10 11 12 13 14 15 16 17 18 19 8

The last number is 8, which is also the correct result of the first sample case, but let’s closely look at the elements returned. We can see that only the k+1 values are different, after that the values start to repeat themselves. So the number 0 it the first number and the number 39 is the last number of the k window. This looks like a good way to start optimizing our code. We can rewrite the program to only calculate the k+1 values and then just return the right number.

We’ve changed only the calculating of k+1 numbers after the initial k numbers, which can be seen in the code below:

  # calculate only the k+1 numbers
  for j in range(int(k), int(k)+int(k)+1):
    m.append(calc_higher(m, int(j), int(k)))

  # find the right index into the array
  index = int(k)+(int(n)%(int(k)+1))
  print "Case #%d: %d" % (count, m[index])

Basically, we changed that only the range till k…k*2+1 should be calculated, instead using the range from k…n, which was used before. By using this, we successfully solved the problem if the variable n is very big, because we traversed the problem from n to k. Thus, we’ve successfully solved the problem for big n numbers. But what about big k numbers? Those are still a problem, but we’re one step closer to the solution.

The next thing we can do is optimize is the calc_higher and calc_lower functions. We don’t really need the calc_lower function since we only have one line of code to execute, plus using the function adds an overhead we just can’t afford right now. Additionally, we also want to get rid of the calc_higher function, but we’ll do it somehow differently. Instead of moving the code, we’ll implement it with heap sorting algorithm. In Python we can use the heapq module, which can be found here: http://docs.python.org/3/library/heapq.html. By using the heap sort, the elements pushed to the queue are automatically sorted, so the minimum element is in the front of the queue and the maximum element is at the back of the queue. Since we need the numbers that are not contained in the previous k elements, we need to insert the opposite numbers in it. Thus, if one of the previous k elements in the array is 12, then we shouldn’t store the element 12 in our heap queue. Let’s take a look at the whole program now, which correctly solves the problem:

#!/usr/bin/python

import sys
from heapq import heappush, heappop
import itertools

f = open(sys.argv[1], "r")
num = f.readline().strip()
count = 0
for i in xrange(1, int(num)+1):
  count += 1
  nn,kk     = f.readline().strip().split(' ')
  aa,bb,cc,rr = f.readline().strip().split(' ')

  n = int(nn)
  k = int(kk)
  a = int(aa)
  b = int(bb)
  c = int(cc)
  r = int(rr)

  # initialize i=0
  m = [a]

  # calculate 0<i<k numbers
  for j in range(k)[1:]:
    m.append((b * m[j-1] + c) % r)

  # use heap
  heap = []

  # initialize the empty elements not already in m
  for j in range(k+1):
    if j not in m:
      heappush(heap, j)

  # calculate k<=i<n numbers
  for j in range(k, k+k+1):
    minn = heappop(heap)
    m.append(minn)

    if m[j-k] not in m[j-k+1:j]:
      heappush(heap, m[j-k])

  # find the right index into the array
  index = int(k)+(int(n)%(int(k)+1))
  print "Case #%d: %d" % (count, m[index])

f.close()

We can see that we don’t have any function right now and we’re using heap sort queue to keep the right element at the beginning of the queue, so we don’t need to search for it every time. The code is properly commented, so I won’t explain it in detail, but I’ll just say that at the beginning, I’m iterating over the elements and appending the elements not contained in the previously calculated k elements into the queue, which is exactly what I need later on. Then I’m calculating the rest of the elements by appending the minimum element from the heap sort queue to our array and pushing the element that falls just out of the k window to the heap queue (if it isn’t already contained in there).

The algorithm works as expected, but it can also be greatly optimized. Let’s present the results of the input file that was appended to the challenge, we can see that on the output below:

# python main.py input.txt
Case #1: 8
Case #2: 38
Case #3: 41
Case #4: 40
Case #5: 12

Conclusion

We can see that the challenge is way harder than the previous two challenges, even though at the beginning it doesn’t seem to be so. I’ve had the most fun with this challenge, even though I didn’t completely optimize it as much as I could have.