NamedCollection – a Python container for Data Science

NamedCollection is a versatile container for Python mixing semantics of list, dictionary and NumPy array/Pandas dataframe in order to provide maximum flexibility for grouping and dispatching common work to selected subsets of variables.

Usage example:

from named_collection import NamedCollection as nc

data = nc(*((release, DatasetsDownload(data_release=release)) \
  for release in ['train', 'test']))
data = data.apply(lambda a: \
  a.get_data('baseline', 'labs', 'vitals'))
data = data.apply(baseline=clean_baseline, labs=clean_labs,
data = data.raw_apply(lambda a: \
  merge_data(a.baseline, a.labs, a.vitals))
train, test = train_test_split(data.train, test_size=.15)
data.train = nc( ('train', train), ('test', test) )
clf = make_clf(data.train.train)
pred = data.apply(test=lambda a: clf.pred(a))

Alternative construction:

from named_collection import from_weave as nc
x = nc('a', 1, 'b', 2, 'c', nc('d', 3, 'e', 4, 'f', 5))

Yet another alternative construction:

from named_collection import nc
nc = nc.from_dicts
x = nc({'a': 1}, {'b': 2}, {'c': nc({'d': 3}, {'e': 4}, {'f': 5})})

For download please head to GitHub:

Leave a Reply

Your email address will not be published. Required fields are marked *