Getting started

[1]:
import bison

We create some sample data using basic Python types.

[2]:
alist = [3,4,5]
atuple = (5,6,7)
astr = 'hello there'
alist2 = [5+1.2j, 6, .45687]

We save all this data together in a single file using bison, then check that the file was written.

[3]:
bison.save('testfile.dat', alist, atuple, astr, alist2)
[Bison] : Written 0.000639915 MB at 0.977066 MB/s
[4]:
!ls -al testfile.dat
-rw-r--r--  1 mbruno  staff  671 Nov 23 17:19 testfile.dat

Let’s reload the data from the saved file.

[5]:
data = bison.load('testfile.dat')
[Bison] : Reading file testfile.dat
[Bison] : File created by mbruno at macthxbruno.local on Mon Nov 23 17:19:34 2020
[Bison] : Read 0.000639915 MB at 2.0135 MB/s

Note that data is a list whose elements correspond to the arguments originally passed to bison.save. The order is preserved!

[6]:
print(data)
[[3, 4, 5], (5, 6, 7), 'hello there', [(5+1.2j), 6, 0.45687]]
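
Since the loaded object is an ordinary list in the order of the original arguments, it can be unpacked positionally. The sketch below uses a stand-in list copied from the printed output above instead of calling bison.load, so it runs without the file on disk.

```python
# Stand-in for the list returned by bison.load('testfile.dat'),
# copied from the printed output above (no bison call is made here).
data = [[3, 4, 5], (5, 6, 7), 'hello there', [(5+1.2j), 6, 0.45687]]

# Unpack in the same order the objects were passed to bison.save.
alist, atuple, astr, alist2 = data

# In the output shown above, lists stay lists and the tuple stays a tuple.
print(type(alist).__name__, type(atuple).__name__, type(astr).__name__)
```

Positional unpacking fails with a ValueError if the number of names does not match the number of saved objects, which is a cheap sanity check on the file content.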

numpy arrays, where the fun begins

The package is designed to optimize I/O performance with large datasets stored as numpy arrays. Note that only the content of numpy arrays is stored in binary format.

[7]:
import numpy

# a simple array of integers
arr1 = numpy.arange(450)

# even numbers from 1000 to 8998, cast to 32-bit integers ('i4')
arr2 = numpy.arange(1000,9000,2).astype('i4')

# complex array with an arbitrary shape, built from interleaved real and imaginary parts
Na = 256
tmp = numpy.random.rand(Na*2*1000)
arr3 = tmp[0::2] + complex(0.,1.)*tmp[1::2]
arr3 = numpy.reshape(arr3, (Na,1000))

bison.save('arrays',arr1,arr2,arr3)
[Bison] : Written 3.92557 MB at 2325.57 MB/s
[8]:
res = bison.load('arrays')
[Bison] : Reading file arrays
[Bison] : File created by mbruno at macthxbruno.local on Mon Nov 23 17:19:34 2020
[Bison] : Read 3.92557 MB at 1567.5 MB/s

Note how the dtype of numpy arrays is preserved, as well as their shape.

[9]:
print(res[1].dtype)
print(res[2].shape, res[2].dtype)
int32
(256, 1000) complex128
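
The construction of the complex array and the dtype/shape check can be reproduced on a smaller scale. This is a minimal sketch that assumes only numpy: the helper same_layout is hypothetical (it is not part of bison), and a plain copy stands in for the array that bison.load would return.

```python
import numpy

# Small stand-in for the tutorial's array: 4 rows instead of Na = 256
# and 3 columns instead of 1000.
rows, cols = 4, 3
tmp = numpy.random.rand(rows * cols * 2)

# Even-indexed samples become real parts, odd-indexed ones imaginary parts.
arr = tmp[0::2] + 1j * tmp[1::2]
arr = arr.reshape(rows, cols)

def same_layout(a, b):
    """Hypothetical helper: True when dtype, shape and content all agree."""
    return a.dtype == b.dtype and a.shape == b.shape and numpy.array_equal(a, b)

# A copy stands in for the array returned by bison.load.
reloaded = arr.copy()
print(arr.dtype, arr.shape, same_layout(arr, reloaded))
```

Comparing dtype and shape explicitly matters because numpy.array_equal alone would also accept an array that was silently converted to a different dtype with the same values.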

tag fields via dictionaries

When we use bison.save, the names of the variables are lost and the user has to remember the order in which the arguments were passed. Dictionaries are an elegant way to circumvent this problem.

[10]:
d = {}
d['myarray1'] = arr1
d['mycomplexarray'] = arr3
d['alist'] = alist
d['nested'] = {'atuple': atuple, 'myarr2': arr2}

bison.save('arrays_with_dict',d)
[Bison] : Written 3.92599 MB at 1461.12 MB/s
[11]:
res = bison.load('arrays_with_dict')
[Bison] : Reading file arrays_with_dict
[Bison] : File created by mbruno at macthxbruno.local on Mon Nov 23 17:19:34 2020
[Bison] : Read 3.92599 MB at 2309.51 MB/s
[12]:
print(res.keys())
dict_keys(['myarray1', 'mycomplexarray', 'alist', 'nested'])

Using dictionaries naturally allows hierarchical storage: nested dictionaries are returned as nested dictionaries.

[13]:
res['nested']['myarr2']
[13]:
array([1000, 1002, 1004, ..., 8994, 8996, 8998], dtype=int32)
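
To get an overview of a hierarchical file, the loaded dictionary can be walked recursively. The sketch below is not a bison feature: walk is a hypothetical helper, and the dictionary is a stand-in with the same layout as the one saved above, with short plain lists in place of the numpy arrays.

```python
def walk(tree, prefix=''):
    """Yield (path, value) pairs for every leaf of a nested dictionary."""
    for key, value in tree.items():
        path = f'{prefix}/{key}' if prefix else key
        if isinstance(value, dict):
            # Descend into sub-dictionaries, extending the path.
            yield from walk(value, path)
        else:
            yield path, value

# Stand-in with the same keys as the dictionary saved above; plain lists
# replace the numpy arrays so the sketch has no dependencies.
res = {
    'myarray1': [0, 1, 2],
    'mycomplexarray': [1 + 2j],
    'alist': [3, 4, 5],
    'nested': {'atuple': (5, 6, 7), 'myarr2': [1000, 1002]},
}

for path, value in walk(res):
    print(path, type(value).__name__)
```

The slash-separated paths are only a display convention chosen for this sketch; the fields themselves are still accessed with ordinary chained indexing such as res['nested']['myarr2'].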

custom classes

Clearly this library would be incomplete if it did not support user-defined classes!

[14]:
class A:
    def __init__(self,n,m):
        self.n = n*m
        self.numbers = [n,m]

    def __call__(self):
        print('numbers = ', self.numbers)
[15]:
aclass = A(45,6.789)
aclass()
numbers =  [45, 6.789]
[16]:
bison.save('classes',aclass)
[Bison] : Written 0.000396729 MB at 0.768591 MB/s

If we use bison.load in the simplest way, the returned object is not an instance of class A but a dictionary holding the attributes of the instance that we originally passed.

[17]:
res = bison.load('classes')
print('\nresult from reading === ',res)
[Bison] : Reading file classes
[Bison] : File created by mbruno at macthxbruno.local on Mon Nov 23 17:19:34 2020
[Bison] : Read 0.000396729 MB at 0.536774 MB/s

result from reading ===  {'____main__.A__': {'n': 305.505, 'numbers': [45, 6.789]}}

To have bison interpret the dictionary as our class, we need to tell it how. To do so, we construct a decoder class and pass it to bison.load as an additional argument.

[18]:
class decodeA:
    def __init__(self):
        self.type = '__main__.A'  # must be set to the module.name of the class being decoded

    def decode(self, obj):
        return A(obj['numbers'][0],obj['numbers'][1])

res2 = bison.load('classes', decoder=decodeA)
[Bison] : Reading file classes
[Bison] : File created by mbruno at macthxbruno.local on Mon Nov 23 17:19:34 2020
[Bison] : Read 0.000396729 MB at 0.584886 MB/s

If our decoder has properly interpreted the dictionary, then res2 should be a fully fledged instance of A, including its methods. Let’s try.

[19]:
res2()
numbers =  [45, 6.789]
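
For comparison, the same object can be rebuilt by hand from the plain dictionary returned in cell [17], without a decoder. This is a minimal sketch under stated assumptions: rebuild_A is a hypothetical helper, not part of bison, and the dictionary is a stand-in copied from the printed output rather than read from the file.

```python
class A:
    def __init__(self, n, m):
        self.n = n * m
        self.numbers = [n, m]

    def __call__(self):
        print('numbers = ', self.numbers)

# Stand-in for the dictionary returned by the plain bison.load('classes');
# the key is copied verbatim from the output shown earlier.
res = {'____main__.A__': {'n': 305.505, 'numbers': [45, 6.789]}}

def rebuild_A(loaded):
    """Hypothetical helper: rebuild an A from the single-entry dictionary."""
    (fields,) = loaded.values()  # raises ValueError unless exactly one entry
    n, m = fields['numbers']     # the constructor arguments are stored here
    return A(n, m)

obj = rebuild_A(res)
obj()
```

Like the decoder above, this goes through the constructor rather than assigning attributes directly, so derived attributes such as n are recomputed instead of being trusted from the file.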
[20]:
# final clean up
!rm testfile.dat arrays arrays_with_dict classes