Belle II Software development
PriorDataLoader Class Reference

Public Member Functions

def __init__ (self, str path, str key, list particlelist, list labels)
 
def __getitem__ (self, index)
 
def __len__ (self)
 
list get_split (self, float n_test=0.1)
 

Public Attributes

 x
 The array of features.
 
 y
 The array of labels.
 

Detailed Description

Dataloader for PID prior probability training.

Attributes:
    x (np.array): Array of float32 features built from cos(theta), momentum and transverse momentum, expanded to all polynomial terms up to second order (nine columns).
    y (np.array): Array of int64 labels obtained by label-encoding the absolute PDG values.
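The feature construction described for x can be sketched with NumPy alone. This is a minimal sketch, assuming plain arrays of cos(theta) and momentum as input; the function name `build_prior_features` is hypothetical, and the order of the quadratic terms is chosen to match the order that scikit-learn's `PolynomialFeatures(2, include_bias=False)` produces for three inputs.

```python
import numpy as np


def build_prior_features(cos_theta, p):
    """Hypothetical helper: nine second-order features from cos(theta) and p."""
    cos_theta = np.asarray(cos_theta, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)
    # Transverse momentum pt = p * sin(theta), with sin(theta) = sin(arccos(cos(theta))).
    pt = np.sin(np.arccos(cos_theta)) * p
    base = np.stack([cos_theta, p, pt], axis=1)  # shape (n, 3)
    # Degree-2 terms in the order x0^2, x0*x1, x0*x2, x1^2, x1*x2, x2^2.
    quad = [base[:, i] * base[:, j] for i in range(3) for j in range(i, 3)]
    # Linear terms first, then quadratic terms: 3 + 6 = 9 columns, no bias column.
    return np.hstack([base, np.stack(quad, axis=1)]).astype("float32")


features = build_prior_features([0.0, 0.6], [1.0, 2.0])
print(features.shape)  # (2, 9)
```

For cos(theta) in [-1, 1], sin(arccos(c)) equals sqrt(1 - c^2), so the third column is p * sin(theta), i.e. the transverse momentum.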

Definition at line 26 of file priorDataLoaderAndModel.py.

Constructor & Destructor Documentation

◆ __init__()

def __init__ (self, str path, str key, list particlelist, list labels)
Initialize the dataloader for PID prior training.

Parameters:
    path (str): Path to the ROOT file containing the data.
    key (str): Key (i.e. path) of the tree within the ROOT file.
    particlelist (list(int)): List of particle PDG values for which the model is to be trained.
    labels (list(str)): Names of the pandas columns containing cos(theta), momentum and PDG values (in this order).

Definition at line 36 of file priorDataLoaderAndModel.py.

def __init__(self, path: str, key: str, particlelist: list, labels: list):
    """
    Initialize the dataloader for PID prior training.

    Parameters:
        path (str): Path to the ROOT file containing the data.
        key (str): Key (i.e. path) of the tree within the ROOT file.
        particlelist (list(int)): List of particle PDG values for which the model is to be trained.
        labels (list(str)): Names of the pandas columns containing cos(theta), momentum and PDG values (in this order).

    """
    # Read the requested branches into a pandas DataFrame (uproot 3-style
    # TTree.pandas.df API; ur, np, PolynomialFeatures and LabelEncoder are
    # imported elsewhere in the module).
    data = ur.open(path)
    data = data[key].pandas.df(labels)
    df = data.dropna().reset_index(drop=True)
    # Use |PDG| so that particles and antiparticles share one class.
    df.loc[:, labels[2]] = df.loc[:, labels[2]].abs()
    # Drop every species that is not listed in particlelist.
    droplist = np.setdiff1d(np.unique(df[labels[2]].values), particlelist)
    for i in droplist:
        df = df.drop(df.loc[df[labels[2]] == i].index).reset_index(drop=True)
    # Base features: cos(theta), p and pt = sin(theta) * p ...
    x = df.values[:, 0:2]
    x = np.hstack((x, (np.sin(np.arccos(x[:, 0])) * x[:, 1]).reshape(-1, 1)))
    # ... expanded to all terms up to second order (nine columns, no bias).
    pol = PolynomialFeatures(2, include_bias=False)
    x = pol.fit_transform(x)

    self.x = x.astype("float32")
    # Labels: map the remaining |PDG| values to 0 .. n_classes - 1.
    y = df.values[:, 2]
    le = LabelEncoder()
    y = le.fit_transform(y)

    self.y = y.astype("int64")
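The species filtering and label encoding performed by the constructor can be illustrated without a ROOT file. This is a minimal sketch, assuming the PDG codes are already in a NumPy array; `encode_pdg_labels` is a hypothetical name, `np.isin` replaces the drop loop, and `np.unique(..., return_inverse=True)` stands in for scikit-learn's `LabelEncoder`, which likewise numbers the classes in sorted order.

```python
import numpy as np


def encode_pdg_labels(pdg, particlelist):
    """Hypothetical helper: keep the selected |PDG| species and label-encode them."""
    abs_pdg = np.abs(np.asarray(pdg))
    # Boolean mask of rows whose |PDG| is one of the requested species.
    keep = np.isin(abs_pdg, particlelist)
    # Classes come out sorted, so labels run from 0 to n_classes - 1.
    classes, labels = np.unique(abs_pdg[keep], return_inverse=True)
    return keep, classes, labels.astype("int64")


keep, classes, labels = encode_pdg_labels([211, -211, 321, 11, 2212, -13], [11, 211, 321])
```

Here the pion, kaon and electron rows are kept (labels 1, 1, 2, 0), while the proton and muon rows are dropped. In the class itself the same row selection is applied to the DataFrame, so the feature rows in x stay aligned with the labels in y.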

Member Function Documentation

◆ __getitem__()

def __getitem__ (self, index)
Return the feature vector and label at the given index.

Parameters:
    index (int): Index of the requested sample.

Returns:
    A list [x[index], y[index]] with the feature vector and its label.

Definition at line 66 of file priorDataLoaderAndModel.py.

def __getitem__(self, index):
    """
    Return the feature vector and label at the given index.

    Parameters:
        index (int): Index of the requested sample.

    Returns:
        A list [x[index], y[index]] with the feature vector and its label.
    """
    return [self.x[index], self.y[index]]

◆ __len__()

def __len__ (self)
Return the number of samples in the dataset.

Parameters:
    None.

Returns:
    Number of samples, i.e. the number of rows of x.

Definition at line 78 of file priorDataLoaderAndModel.py.

def __len__(self):
    """
    Return the number of samples in the dataset.

    Parameters:
        None.

    Returns:
        Number of samples, i.e. the number of rows of x.
    """
    return len(self.x)
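__getitem__ and __len__ together form the map-style dataset protocol that PyTorch's DataLoader and random_split rely on. The sketch below shows the idea without PyTorch: `ToyPriorDataset` and `iterate_batches` are hypothetical, and the batching loop only imitates what a DataLoader with the default collate function does (stacking per-sample items into batch arrays), without shuffling or worker processes.

```python
import numpy as np


class ToyPriorDataset:
    """Hypothetical stand-in with the same indexing interface as PriorDataLoader."""

    def __init__(self, x, y):
        self.x = np.asarray(x, dtype="float32")
        self.y = np.asarray(y, dtype="int64")

    def __getitem__(self, index):
        return [self.x[index], self.y[index]]

    def __len__(self):
        return len(self.x)


def iterate_batches(dataset, batch_size):
    """Yield (features, labels) batches by indexing the dataset sample by sample."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        # Stack per-sample feature rows and labels into batch arrays.
        features = np.stack([feat for feat, _ in items])
        labels = np.array([label for _, label in items])
        yield features, labels


ds = ToyPriorDataset(np.arange(20).reshape(5, 4), [0, 1, 2, 1, 0])
batches = list(iterate_batches(ds, batch_size=2))
```

With PyTorch installed, the real class would instead be passed to torch.utils.data.DataLoader(dataset, batch_size=..., shuffle=True), which calls the same two methods.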

◆ get_split()

list get_split (self, float n_test = 0.1)
Split the input data into training and validation sets.

Parameters:
    n_test (float): Fraction of the full data set to put in the validation set.

Returns:
    A list [train_set, validation_set] of non-overlapping, randomly drawn subsets returned by random_split; the validation set holds round(n_test * len(self)) samples.

Definition at line 90 of file priorDataLoaderAndModel.py.

def get_split(self, n_test: float = 0.1) -> list:
    """
    Split the input data into training and validation sets.

    Parameters:
        n_test (float): Fraction of the full data set to put in the validation set.

    Returns:
        A list [train_set, validation_set] of non-overlapping, randomly drawn subsets
        returned by random_split; the validation set holds round(n_test * len(self)) samples.
    """
    # Validation size is a fraction of the whole data set; the rest is used for training.
    test_size = round(n_test * len(self.x))
    train_size = len(self.x) - test_size
    return random_split(self, [train_size, test_size])
104

Member Data Documentation

◆ x

x

The array of features: one row of nine float32 polynomial features per sample.

Definition at line 59 of file priorDataLoaderAndModel.py.

◆ y

y

The array of labels: one int64 label-encoded |PDG| value per sample.

Definition at line 64 of file priorDataLoaderAndModel.py.


The documentation for this class was generated from the following file: