Jonny's Blog

Better Than Entropy Compression... Just Joking!

I’ve been working on a long-term project to do with sending messages by encoding the data in time, it’s a free quantity so use it instead of something physical like electrons! What do I mean by “encoding in time” - well here’s a little trick you can try with your kids. You can communicate a number with two blinks - the first blink starts the clock, the second stops the clock. The number communicated is the the interval span between the blinks. So two bits communicate an arbitrary integer - as long as you have accurate clocks. Sub-entropy compression …. Nature beckons …. (I kid :))

There are two problems with getting this system working. The first is to figure out how to send more than one number. The second is how to work with inaccurate clocks. Lets look at these individually.

How to send more than one number

So you send a bit, leave the channel open leaving a gap, then send another bit. What to do with the gap? Clearly there’s scope to send more than one integer in the gap, the question is how to do this at all, and then efficiently. Remember, the hope is to generate some form of compression.

I’ve got two schemes which rely on self-delimiting codes to record indexes. The self-delimiting code I used is Elias coding . To find out about these read the section on Integer Codes in the fantastic Information Theory Textbook by David Mackay.

Scheme 1

My first effort at coding scheme algorithm,

1.  For a list of integers say [5,4,7,8]
2.  Index the list using (index,value) [(1,5),(2,4),(3,7),(4,8)]
3,  Sort the list in descending order [(4,8),(3,7),(1,5),(2,4)]
4.  For each tuple write the index twice with a gap of value between them 
in a self-delimiting way at the next available spot,
5.  Result: (4,_,3,_,_,1,_,_,4,3,1)   

Here’s the pythons to encode and decode this, I using self-delimiting codes so I don’t have to worry about decoding:

“bloated time encode and decode in Python”
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
def encode(inList):
    """ take a list a turn it into a simulated time code.
        The _ charecter indicates an empty field. 
         in: a native list of numeric values
         out: a native list time coded """
    #create empty list of _
    outList = ["_" for _ in range(110)]
    #generate a tuple (position, value)
    pvList = [ (i,j) for i,j in enumerate(inList)]
    #sort based on value
    pvList = sorted(pvList, key=lambda x: x[1] ,reverse=True)
    #for each tuple
    for pos,val in pvList:
        #repeat until start and end ok and not at end
        start = 0
        end = start + val
        while outList[start] != "_" or outList[end]!= "_":
            # move forward 1
            start += 1
            end   += 1
            #if end point is at the end
                #extend array by x%
            if end == len(outList):
                outList.append("_")
        #write position and value
        outList[start] = pos
        outList[end]   = pos
    return outList

#decode into normal array
def decode(inList):
    """ decode a free time code. perform a decode from
        input: native array free time coded
        output: native array decode array"""
    outList = []
    #for each value in list
    for n,i in enumerate(inList):
        #if valid value
        if i != "_":
            value = 1
            while inList[n+value]!= i :
                value += 1
            inList[n]="_"
            inList[n+value]="_"
            outList.append((i,value))
            #find end and hence position,value
            #add to list
    outList = sorted(outList, key=lambda x: x[0])
    #sort list based on position
    outList = [ value for _,value in outList]
    #pull off #outlist
    return outList

Scheme 2

Another go - trying to remove the start bits, let everything start at 0:

1.  For a list of integers, say [5,4,7,8]
2.  Index the list using (index,value) [(1,5),(2,4),(3,7),(4,8)]
3.  For each tuple write the index once in a self-delimiting way , value distance from the start
4.  Result (_,_,_,2,1,_,3,4]

Orginally, I thought this was a really clever variation on scheme 1, then I realised it’s just a histogram!!

Here’s the code:

“less bloated time encode and decode in Python”
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
def encode2(inList):
    """encode code without the starting bit
       no need to sort just label and write to stream
       care needs to be taken with duplicate entries for
       the time being hold them in a tuple"""
    #setup a blank list max length
    outList = (max(inList)+1) * [0]
    #for each item in the stream
    for i,j in enumerate(inList):
        i = i+1
        if outList[j] =="_":
            outList[j] = i
        elif type(outList[j])==type(0):
            outList[j] = [outList[j],i]
        elif type(outList[j])== type([]):
            outList[j].append(i)
    return outList
    #write the position to the length value in stream
    #if collision hold in tuple


def decode2(inList):
    """decode streamline codes, take a stream as input
       and writes a new stream """
    #setup empty array
    maxi = 0
    for i in inList:
        if i !="_":
            if type(i)==type([]):
                if max(i) > maxi:
                    maxi=max(i)
            elif type(i)== type(0):
                if i > maxi:
                    maxi = i

    outList= (maxi+1) * ["_"]
    #foreach value in in list
    for i,j in enumerate(inList):
        #if int write position value to held value in out list
        if type(j)==type(0):
            outList[j] = i
        if type(j)== type([]):
        #if array then
            for k in j:
            #for each value in array
                outList[k]=i
                #write position value to held value in outlist
    return outList

Comparing schemes against each other it’s 8,11 to scheme two, So how am I doing when compared to real-world compression? … to be continued

Comments