Internals
What follows is the documentation of the cPMML internals. Here you can find all the cPMML classes that are not exposed in the public API.
cPMML Driving Principles
Model Load and Model Scoring
The cPMML design is split into two phases:
- Model Loading
The XML content is loaded from disk to memory and the corresponding c++ object is built.
- Model Scoring
The model object is evaluated against a user-provided sample in order to produce a prediction.
Visiting the XML
cPMML defines one class for each PMML XML element. Once the XML is read, it is visited recursively and, for each XML node encountered, the corresponding object instance is constructed.
Most of the computation takes place during this phase, since the idea is to precompute as much as possible in order to speed up scoring.
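To make the load-time visit concrete, here is a minimal sketch, assuming a toy in-memory tree in place of the parsed XML: one object is built per node by a recursive visit, and any subexpression whose inputs are all constants is folded while loading, so that scoring has less left to do. `ToyNode`, `Expr` and `build` are hypothetical names, not cPMML classes, and constant folding is only one illustrative form of precomputation.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed XML element (not cPMML's XmlNode).
struct ToyNode {
    std::string name;               // element name: "Apply" or "Constant"
    std::string text;               // function name ("+", "*") or constant literal
    std::vector<ToyNode> children;  // nested elements
};

// Hypothetical immutable expression object, one per visited node.
struct Expr {
    bool is_constant = false;
    double constant = 0.0;          // meaningful only when is_constant is true
    std::string function;           // "+" or "*" for non-constant nodes
    std::vector<std::shared_ptr<const Expr>> args;

    double eval() const {
        if (is_constant) return constant;
        const bool product = (function == "*");
        double acc = product ? 1.0 : 0.0;
        for (const auto &arg : args)
            acc = product ? acc * arg->eval() : acc + arg->eval();
        return acc;
    }
};

// Recursive visit: build the children first, then fold the node into a
// constant if all of its inputs are constants (work done once, at load time).
std::shared_ptr<const Expr> build(const ToyNode &node) {
    auto expr = std::make_shared<Expr>();
    if (node.name == "Constant") {
        expr->is_constant = true;
        expr->constant = std::stod(node.text);
        return expr;
    }
    expr->function = node.text;
    bool all_constant = true;
    for (const auto &child : node.children) {
        expr->args.push_back(build(child));
        all_constant = all_constant && expr->args.back()->is_constant;
    }
    if (all_constant) {
        expr->constant = expr->eval();  // evaluate once now...
        expr->is_constant = true;       // ...and never again at scoring time
        expr->args.clear();
    }
    return expr;
}
```

Once `build` returns, the object is only read, which matches the immutability principle described next.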
In general, any object that is part of a cpmml::Model is meant to be immutable, with the exception of the following classes:
Sample
Feature
Value
These classes may be used to store the user input for scoring. In this case, their values are overwritten in place rather than copied, to speed up scoring.
Feature Fields
The PMML serialization of a model contains the definition of multiple fields which serve as input for the model. These fields can be defined in:
- DataDictionary
It defines the fields provided as input from the user.
- TransformationDictionary
It defines fields derived from the ones in DataDictionary by applying preprocessing transformations (standardization, discretization, etc.)
- LocalTransformations
As TransformationDictionary but with local scope.
- Output
It defines fields derived from the output of the model scoring by applying postprocessing transformations (probability computation, etc.)
- MiningSchema
It only contains references to the fields, among the previously defined ones, that are actually used by the model.
In cPMML all fields are indexed through the class Indexer, so that they can be accessed through an integer index, which improves performance. A single shared instance of Indexer holds the integer→field-name associations.
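The following sketch illustrates the indexing idea under stated assumptions: field names are resolved to dense integers once, at load time, and a sample is a plain vector addressed by those integers, with a missing flag per slot. `ToyIndexer` and `ToySample` are hypothetical and only approximate the roles of Indexer and Sample; `clear` shows how one sample buffer can be reused across scoring calls instead of being reallocated.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical indexer: each field name gets a dense integer index, once.
class ToyIndexer {
public:
    std::size_t get_or_add(const std::string &name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        const std::size_t idx = names_.size();
        index_.emplace(name, idx);
        names_.push_back(name);
        return idx;
    }
    std::size_t at(const std::string &name) const {
        auto it = index_.find(name);
        if (it == index_.end()) throw std::out_of_range("unknown field: " + name);
        return it->second;
    }
    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::size_t> index_;
    std::vector<std::string> names_;  // index -> name, e.g. for error messages
};

// Hypothetical sample: one slot per indexed field plus a "missing" flag.
struct ToySample {
    std::vector<double> values;
    std::vector<bool> missing;

    explicit ToySample(std::size_t n) : values(n, 0.0), missing(n, true) {}

    // Overwrite a slot in place; no allocation happens during scoring.
    void set(std::size_t idx, double v) {
        values[idx] = v;
        missing[idx] = false;
    }
    // Reset before the next input so the same buffer can be reused.
    void clear() { std::fill(missing.begin(), missing.end(), true); }
};
```

The string lookup in `at` is paid when wiring fields together; the scoring path then touches only vector slots.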
Core
- class BuiltInFunction
Class representing PMML built-in functions.
It defines a set of predefined functions implementing low-level operations. For instance:
arithmetic functions: min, max, sum, etc.
boolean functions: equal, isMissing, etc.
etc.
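A minimal sketch of how built-in functions could be resolved by name once, at load time, so that scoring only invokes a stored callable. The registry, the `resolve_builtin` name and the encoding of booleans as 1.0/0.0 are assumptions for illustration; only a handful of functions are shown.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using Args = std::vector<double>;
using BuiltIn = std::function<double(const Args &)>;

// Hypothetical registry mapping a PMML function name to its implementation.
// min/max assume at least one argument; equal assumes exactly two.
BuiltIn resolve_builtin(const std::string &name) {
    static const std::unordered_map<std::string, BuiltIn> table = {
        {"min", [](const Args &a) { return *std::min_element(a.begin(), a.end()); }},
        {"max", [](const Args &a) { return *std::max_element(a.begin(), a.end()); }},
        {"sum", [](const Args &a) { return std::accumulate(a.begin(), a.end(), 0.0); }},
        // Booleans are encoded as 1.0 / 0.0 in this sketch.
        {"equal", [](const Args &a) { return a.at(0) == a.at(1) ? 1.0 : 0.0; }},
    };
    auto it = table.find(name);
    if (it == table.end()) throw std::invalid_argument("unsupported function: " + name);
    return it->second;
}
```

Failing at load time on an unknown name keeps the error out of the scoring path.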
- class DataType
Class representing PMML dataTypes.
For instance:
STRING
INTEGER
BOOLEAN
etc.
- class DataField
Class representing PMML DataField.
It defines a feature available to the model, along with the values it can assume. The constraints on the admissible values are enforced through the class Predicate.
Subclassed by MiningField
- class DataDictionary
Class representing PMML DataDictionary.
It is a collection of DataFields. See DataField.
- class Indexer
Class used to index all features available in the PMML file.
In cPMML features are indexed to obtain constant-time access. This class keeps the name-index associations, along with the type of each feature.
- class Closure
Class representing PMML Closure.
It describes the type of boundaries for a range of continuous values. For instance:
closedClosed
closedOpen
etc.
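The sketch below shows what the four PMML closure values mean for interval membership. `ToyClosure` and `in_interval` are hypothetical names; unbounded margins can be modelled with ±infinity, which the comparisons handle naturally.

```cpp
// Hypothetical mirror of the four PMML closure values.
enum class ToyClosure { openClosed, openOpen, closedOpen, closedClosed };

// True if value lies between left and right, with each boundary
// included or excluded as dictated by the closure.
bool in_interval(double value, double left, double right, ToyClosure closure) {
    const bool left_closed =
        closure == ToyClosure::closedOpen || closure == ToyClosure::closedClosed;
    const bool right_closed =
        closure == ToyClosure::openClosed || closure == ToyClosure::closedClosed;
    const bool after_left = left_closed ? value >= left : value > left;
    const bool before_right = right_closed ? value <= right : value < right;
    return after_left && before_right;
}
```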
- class DagBuilder
Since no ordering of DataFields and DerivedFields can be assumed in a PMML document, this class is used to build a directed acyclic graph (DAG) of the dependencies between fields.
The DAG is represented as a vector whose ordering is the order in which DerivedFields must be computed.
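Computing such an order is a topological sort. The sketch below uses Kahn's algorithm on a hypothetical map from each derived field to the fields it reads; inputs that are not derived fields (for instance DataFields) impose no ordering. It illustrates the technique and is not DagBuilder's actual code.

```cpp
#include <queue>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// deps: derived field -> fields it reads. Returns the derived fields in an
// order where every field comes after the derived fields it depends on.
std::vector<std::string> evaluation_order(
        const std::unordered_map<std::string, std::vector<std::string>> &deps) {
    std::unordered_map<std::string, int> pending;  // unresolved derived inputs
    std::unordered_map<std::string, std::vector<std::string>> dependents;
    for (const auto &[field, inputs] : deps) {
        pending.emplace(field, 0);
        for (const auto &in : inputs) {
            if (deps.count(in)) {  // only derived inputs constrain the order
                ++pending[field];
                dependents[in].push_back(field);
            }
        }
    }
    std::queue<std::string> ready;
    for (const auto &[field, count] : pending)
        if (count == 0) ready.push(field);

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string f = ready.front();
        ready.pop();
        order.push_back(f);
        for (const auto &d : dependents[f])
            if (--pending[d] == 0) ready.push(d);
    }
    // Fields left unprocessed are part of a cycle: not a valid DAG.
    if (order.size() != deps.size()) throw std::runtime_error("cycle among derived fields");
    return order;
}
```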
- class DerivedField
Class representing a PMML DerivedField.
It defines a feature available to the model, which is obtained by transforming other features. The associated transformation is represented by the class Expression.
- class FieldUsageType
Class representing PMML FieldUsageType.
It defines the usage of a field in the mining schema.
- class Header
Class representing PMML Header.
It contains some metadata for the model.
- class InternalEvaluator
Class encapsulating the high-level elements of the PMML document.
It contains:
MODEL-ELEMENT
Subclassed by EnsembleEvaluator, RegressionEvaluator, TreeEvaluator
- class InternalModel
Abstract class representing a PMML MODEL-ELEMENT.
Implementations of this class are the central element of cPMML: they are the ones that actually perform the scoring.
It contains:
LocalTransformations
Output
Through the method validate, the presence of all fields needed by the model is checked. This includes DerivedFields obtained through TransformationDictionary and also LocalTransformations.
Subclassed by EnsembleModel, RegressionModel, TreeModel
- class InternalScore
Abstract class internally representing the prediction produced by the scoring process.
It contains both double and literal representations of the score, as well as the associated probabilities and all values produced by Output.
Subclassed by RegressionScore, TreeScore
- class IntervalBuilder
Class used to build the constraints for continuous features of the DataDictionary. See also Closure.
- class InvalidValueTreatmentMethod
Class representing PMML INVALID-VALUE-TREATMENT-METHOD.
It defines how the model should behave when it encounters a feature whose value does not satisfy the constraints defined in the DataDictionary.
- class MiningField : public DataField
Class representing PMML MiningField.
It defines a field used by the model, along with its usage type and the behaviour in case of missing/invalid values. See also: DataDictionary, InvalidValueTreatmentMethod, MissingValueTreatmentMethod, OutlierTreatmentMethod, FieldUsageType.
- class MiningFunction
Class representing PMML MiningFunction.
It defines the function pursued by the model.
For instance:
Classification
Regression
Clustering
etc.
- class MiningSchema
Class representing PMML MiningSchema.
It is a collection of MiningFields. See MiningField.
Notice that the link with the values defined in the DataDictionary is established by deriving MiningField from DataField.
Through the method validate, the presence of all fields needed by the model is checked.
- class MissingValueTreatmentMethod
Class representing PMML MISSING-VALUE-TREATMENT-METHOD.
It defines how the model should behave when encountering a feature whose value is missing.
- class ModelBuilder
Factory class to create InternalEvaluator objects.
- class OpType
Class representing PMML opType.
For instance:
CATEGORICAL
ORDINAL
CONTINUOUS
- class OutlierTreatmentMethod
Class representing PMML OUTLIER-TREATMENT-METHOD.
It defines how the model should behave when it encounters a feature whose value is considered an outlier.
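As a hedged illustration of the PMML `outliers` attribute (values `asIs`, `asMissingValues`, `asExtremeValues`, bounded by `lowValue` and `highValue`), the sketch below returns either the value to use for scoring or an empty optional meaning "treat as missing". The enum and function names are hypothetical.

```cpp
#include <optional>

// Hypothetical mirror of the PMML outlier treatment methods.
enum class ToyOutlierTreatment { asIs, asMissingValues, asExtremeValues };

// Returns the value to use for scoring; std::nullopt means "treat as missing".
std::optional<double> treat_outlier(double value, double low, double high,
                                    ToyOutlierTreatment method) {
    const bool outlier = value < low || value > high;
    if (!outlier || method == ToyOutlierTreatment::asIs) return value;
    if (method == ToyOutlierTreatment::asMissingValues) return std::nullopt;
    return value < low ? low : high;  // asExtremeValues: clamp to the bounds
}
```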
- class Predicate
Class representing PMML Predicate.
It is used to enforce a constraint between values, between sets of values, or among a recursively defined set of other constraints. See also PredicateOpType, TreeModel, DataDictionary.
- class PredicateOpType
Class representing the operator for the condition enforced by a predicate.
For instance:
equal
notEqual
isIn
surrogate
etc.
- class PredicateType
Class representing the type of predicate.
For instance:
SimplePredicate
SimpleSetPredicate
CompoundPredicate
True
False
See also PMML Predicate.
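The following sketch evaluates a small predicate tree against an indexed sample. It assumes, as described for Value, that all values, including categorical ones, are already encoded as doubles; `ToyPredicate` is hypothetical, supports only a subset of the operators, and omits surrogate predicates and missing-value handling.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical predicate tree over a sample of doubles.
struct ToyPredicate {
    enum class Kind { Simple, SimpleSet, Compound, True, False } kind;
    enum class Op { equal, notEqual, lessThan, greaterThan, isIn, notIn, and_, or_ } op;
    std::size_t field = 0;              // index of the tested field
    double value = 0.0;                 // SimplePredicate operand
    std::unordered_set<double> set;     // SimpleSetPredicate operands
    std::vector<ToyPredicate> children; // CompoundPredicate operands

    bool eval(const std::vector<double> &sample) const {
        switch (kind) {
        case Kind::True: return true;
        case Kind::False: return false;
        case Kind::Simple: {
            const double x = sample[field];
            switch (op) {
            case Op::equal: return x == value;
            case Op::notEqual: return x != value;
            case Op::lessThan: return x < value;
            case Op::greaterThan: return x > value;
            default: return false;      // operator not supported in this sketch
            }
        }
        case Kind::SimpleSet: {
            const bool found = set.count(sample[field]) > 0;
            return op == Op::isIn ? found : !found;
        }
        case Kind::Compound: {
            if (op == Op::and_) {       // every child must hold
                for (const auto &c : children) if (!c.eval(sample)) return false;
                return true;
            }
            for (const auto &c : children) if (c.eval(sample)) return true;  // or_
            return false;
        }
        }
        return false;
    }
};
```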
- class Property
Class representing PMML Property.
It is used to define the constraints for admissible values in the DataDictionary.
- class Sample
Class used as an internal representation of the user-provided sample.
It stores the values for each feature used by the model. The features are indexed through the Indexer class, which allows constant-time access.
- class Feature
Class used as an internal representation of a user-provided feature.
It allows signaling when the corresponding value is missing.
- class Target
Class representing PMML Target.
It defines transformations for the raw output of a model.
- class TransformationDictionary
Class representing PMML TransformationDictionary.
It is a collection of DerivedFields. See also DerivedField.
- class string_view
An implementation of a non-owning reference to std::string.
- class XmlNode
Non-owning wrapper of rapidxml::xml_node<> *, implementing some utility functions.
- class Value
Internal representation of each value used by the model. For efficiency reasons, every type of input value is converted into double.
Expression
- class Expression
Abstract class representing PMML Expression.
Its implementations provide the transformations used by DerivedField.
Subclassed by Apply, Constant, Discretize, FieldRef, MapValues, NormContinuous, NormDiscrete, OutputExpression
- class Apply : public Expression
Class representing PMML Apply.
It allows performing user-defined transformations composed of other PMML Expressions or PMML built-in functions.
- class Constant : public Expression
Class representing PMML Constant.
It provides a constant value for other PMML Expressions.
- class Discretize : public Expression
Class representing PMML Discretize.
It performs the discretization of numerical input fields by mapping from continuous to discrete values using intervals.
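A minimal sketch of the Discretize semantics: the first bin whose interval contains the value provides the output, otherwise a default value (if any) is returned. For brevity every bin here uses the closedOpen closure; `ToyBin`, `discretize` and `kUnbounded` are hypothetical names.

```cpp
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Use -kUnbounded / +kUnbounded for a missing left / right margin.
constexpr double kUnbounded = std::numeric_limits<double>::infinity();

// Hypothetical bin: interval [left, right) (closure closedOpen) and its label.
struct ToyBin {
    double left;
    double right;
    std::string bin_value;
};

// Returns the label of the first bin containing value, else default_value.
std::optional<std::string> discretize(double value, const std::vector<ToyBin> &bins,
                                      const std::optional<std::string> &default_value) {
    for (const auto &bin : bins)
        if (value >= bin.left && value < bin.right) return bin.bin_value;
    return default_value;
}
```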
- class FieldRef : public Expression
Class representing PMML FieldRef.
It is used to simply reference (and rename) another DataField or DerivedField.
- class SimpleFieldRef
Class used to simply reference (and rename) another DataField or DerivedField. It differs from FieldRef in that it has local scope rather than global scope.
- class MapValues : public Expression
Class representing PMML MapValues.
Through the use of a table (implemented with TreeTable) it maps discrete values to other discrete values.
- class NormContinuous : public Expression
Class representing PMML NormContinuous.
It is used to normalize input fields through a piecewise linear interpolation.
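The interpolation can be sketched as follows, assuming the LinearNorm points are already sorted by strictly increasing original value. Between two consecutive points (a0, b0) and (a1, b1) the result is b0 + (x - a0) * (b1 - b0) / (a1 - a0); outside the covered range, the sketch extends the first or last segment, which corresponds to PMML's default outlier behaviour (`asIs`). `LinearNorms` and `norm_continuous` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical LinearNorm points (orig, norm), sorted by increasing orig.
using LinearNorms = std::vector<std::pair<double, double>>;

double norm_continuous(double x, const LinearNorms &pts) {
    if (pts.size() < 2) throw std::invalid_argument("need at least two LinearNorm points");
    // Pick segment [i-1, i]: the first one whose right end is >= x,
    // or the last segment when x lies beyond every point.
    std::size_t i = 1;
    while (i < pts.size() - 1 && x > pts[i].first) ++i;
    const auto &[a0, b0] = pts[i - 1];
    const auto &[a1, b1] = pts[i];
    return b0 + (x - a0) * (b1 - b0) / (a1 - a0);
}
```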
- class NormDiscrete : public Expression
Class representing PMML NormDiscrete.
It performs the encoding of STRING values into numerical values.
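In the PMML specification, NormDiscrete yields 1 when the input equals a given value and 0 otherwise, so one NormDiscrete per category produces a one-hot encoding. A minimal sketch of that indicator, with an optional replacement for missing input mirroring the PMML `mapMissingTo` attribute; the function name is hypothetical.

```cpp
#include <optional>
#include <string>

// Indicator encoding: 1.0 if the input equals `value`, 0.0 otherwise.
// A missing input yields map_missing_to if provided, else stays missing.
std::optional<double> norm_discrete(const std::optional<std::string> &input,
                                    const std::string &value,
                                    std::optional<double> map_missing_to = std::nullopt) {
    if (!input) return map_missing_to;
    return *input == value ? 1.0 : 0.0;
}
```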
- class ExpressionType
Class used to represent the different PMML-EXPRESSION types.
- class ExpressionBuilder
Factory class to create Expression objects.
Output
- class OutputDictionary
Class representing PMML Output.
It is a collection of OutputFields. See OutputField.
In this case too, a DAG is built to keep track of the dependencies between OutputFields. However, this DAG is unrelated to the one built by DagBuilder, since DerivedFields and OutputFields have different scopes: the former deal with the preprocessing of fields, while the latter deal with their postprocessing.
- class OutputField
Class representing PMML OutputField.
It defines which output features the model will produce from the raw prediction.
- class OutputExpression : public Expression
Abstract class representing PMML RESULT-FEATURE.
Its implementations provide the transformations used by OutputField.
Subclassed by PredictedValue, Probability, TransformedValue
- class OutputExpressionType
Class used to represent the different PMML RESULT-FEATURE types.
For instance:
predictedValue
transformedValue
probability
etc.
- class OutputExpressionBuilder
Factory class to create OutputExpression objects.
- class PredictedValue : public OutputExpression
Class representing PMML PredictedValue.
It is used to simply reference (and rename) the raw prediction value provided by the model.
- class TransformedValue : public OutputExpression
Class representing PMML TransformedValue.
It allows performing user-defined transformations composed of other PMML Expressions or PMML built-in functions.
- class Probability : public OutputExpression
Class representing PMML Probability.
It is used to simply reference (and rename) the probabilities associated with the raw prediction value provided by the model.
Ensembles
- class EnsembleEvaluator : public InternalEvaluator
Implementation of InternalEvaluator used as a wrapper for EnsembleModel.
- class EnsembleModel : public InternalModel
Implementation of InternalModel representing a PMML MiningModel.
This class represents all ensemble models, for instance the Random Forest model or the Gradient Boosted Trees model. See also MultipleModelMethod.
- class MultipleModelMethod
Class representing PMML MULTIPLE-MODEL-METHOD.
For instance:
majorityVote
weightedAverage
modelChain
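Two of the combination rules listed above can be sketched as follows: `majority_vote` picks the most frequent predicted label across segments, and `weighted_average` computes sum(w_i * y_i) / sum(w_i) over numeric segment predictions. Both functions are hypothetical, and the tie-breaking rule in `majority_vote` (lexicographically smallest label wins) is an assumption of this sketch, not necessarily cPMML's behaviour. modelChain is not shown.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// majorityVote: most frequent label; std::map iterates in sorted order,
// so on ties the smallest label is kept (an assumption of this sketch).
std::string majority_vote(const std::vector<std::string> &predictions) {
    if (predictions.empty()) throw std::invalid_argument("no segment predictions");
    std::map<std::string, int> votes;
    for (const auto &p : predictions) ++votes[p];
    auto best = votes.begin();
    for (auto it = votes.begin(); it != votes.end(); ++it)
        if (it->second > best->second) best = it;
    return best->first;
}

// weightedAverage: sum(w_i * y_i) / sum(w_i).
double weighted_average(const std::vector<double> &predictions,
                        const std::vector<double> &weights) {
    if (predictions.empty() || predictions.size() != weights.size())
        throw std::invalid_argument("predictions and weights must match");
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < predictions.size(); ++i) {
        num += weights[i] * predictions[i];
        den += weights[i];
    }
    return num / den;
}
```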
- class Segment
Class representing PMML Segment.
Inside an ensemble of models, this is the wrapper for a single model object. See also EnsembleModel.
TreeModel
- class TreeEvaluator : public InternalEvaluator
Implementation of InternalEvaluator used as a wrapper for TreeModel.
- class TreeModel : public InternalModel
Implementation of InternalModel representing a PMML TreeModel.
This class represents decision tree models, both for classification and regression.
- class TreeScore : public InternalScore
Implementation of InternalScore for TreeModel objects.
- class Node
Class representing PMML Node.
It is a node of the decision tree, containing a Predicate and a TreeScore. The score represents the prediction associated with a sample matching the predicate.
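Scoring a tree amounts to descending from the root: at each node, the children are tried in document order and the first one whose predicate holds is followed. The sketch below assumes the behaviour of PMML's `noTrueChildStrategy="returnLastPrediction"`, i.e. when no child matches, the score of the current node is returned; `ToyTreeNode` and `predict` are hypothetical, and predicates are plain callables rather than Predicate objects.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical tree node: a predicate over the sample, a score, and children.
struct ToyTreeNode {
    std::function<bool(const std::vector<double> &)> predicate;
    std::string score;
    std::vector<ToyTreeNode> children;
};

// Follow the first matching child at each level; stop at a leaf or when
// no child matches, returning the score of the node reached.
std::string predict(const ToyTreeNode &root, const std::vector<double> &sample) {
    const ToyTreeNode *node = &root;
    for (;;) {
        const ToyTreeNode *next = nullptr;
        for (const auto &child : node->children)
            if (child.predicate(sample)) { next = &child; break; }
        if (next == nullptr) return node->score;
        node = next;
    }
}
```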
- class ScoreDistribution
Class representing PMML ScoreDistribution.
It contains additional information about the score associated with a node. For instance:
Probability distribution.
Confidence.
RegressionModel
- class RegressionEvaluator : public InternalEvaluator
Implementation of InternalEvaluator used as a wrapper for RegressionModel.
- class RegressionModel : public InternalModel
Implementation of InternalModel representing a PMML RegressionModel.
This class represents models such as the Linear Regression model, the Logistic Regression model, etc.
- class RegressionTable
Class representing PMML RegressionTable.
It is the class wrapping all predictors for the RegressionModel: NumericPredictor, CategoricalPredictor and PredictorTerm. To predict a numerical value, RegressionModel contains just one RegressionTable; to predict a categorical value, it contains one RegressionTable for each category involved.
- class NumericPredictor
Class representing PMML NumericPredictor.
It contains the parameters (coefficients and exponents) needed to compute the regressed value of a continuous variable.
- class CategoricalPredictor
Class representing PMML CategoricalPredictor.
It contains the coefficient associated with a given value of a categorical input field: when the field assumes that value, the coefficient is added to the regression score.
- class PredictorTerm
Class representing PMML PredictorTerm.
It contains references to other fields in the PMML, which are combined by multiplication.
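Putting the three predictor kinds together, the PMML specification scores a RegressionTable as intercept + sum of coefficient * x^exponent (NumericPredictor) + sum of coefficient * [x == value] (CategoricalPredictor) + sum of coefficient * product of x_k (PredictorTerm). A minimal sketch over an indexed sample in which categorical values are already encoded as doubles; all type names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct ToyNumericPredictor { std::size_t field; double coefficient; int exponent; };
struct ToyCategoricalPredictor { std::size_t field; double value; double coefficient; };
struct ToyPredictorTerm { std::vector<std::size_t> fields; double coefficient; };

struct ToyRegressionTable {
    double intercept = 0.0;
    std::vector<ToyNumericPredictor> numeric;
    std::vector<ToyCategoricalPredictor> categorical;
    std::vector<ToyPredictorTerm> terms;

    double score(const std::vector<double> &x) const {
        double y = intercept;
        for (const auto &p : numeric)         // coefficient * x^exponent
            y += p.coefficient * std::pow(x[p.field], p.exponent);
        for (const auto &p : categorical)     // coefficient if x == value, else 0
            if (x[p.field] == p.value) y += p.coefficient;
        for (const auto &t : terms) {         // coefficient * product of fields
            double product = t.coefficient;
            for (std::size_t f : t.fields) product *= x[f];
            y += product;
        }
        return y;
    }
};
```

For a categorical target, one such table per category would produce the raw scores that the normalization method then turns into probabilities.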
- class SingleNormalizationMethodBuilder
Factory class building the normalization method for a RegressionModel predicting a continuous variable.
- class MultiNormalizationMethodBuilder
Factory class building the normalization method for a RegressionModel predicting a categorical variable.
- class NormalizationMethodType
Class representing PMML REGRESSIONNORMALIZATIONMETHOD.
For instance:
SIMPLEMAX
SOFTMAX
CAUCHIT
See also NormalizationMethods.
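For reference, the PMML specification defines softmax as p_j = exp(y_j) / sum_i exp(y_i) and simplemax as p_j = y_j / sum_i y_i, where y_j is the score of the RegressionTable for category j. A minimal sketch with hypothetical function names; subtracting the maximum score before exponentiating is a common overflow guard that leaves the result unchanged, and is not necessarily what cPMML does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// softmax: p_j = exp(y_j) / sum_i exp(y_i), computed on y - max(y).
std::vector<double> softmax(const std::vector<double> &y) {
    if (y.empty()) return {};
    const double m = *std::max_element(y.begin(), y.end());
    std::vector<double> p(y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        p[i] = std::exp(y[i] - m);
        sum += p[i];
    }
    for (double &v : p) v /= sum;
    return p;
}

// simplemax: p_j = y_j / sum_i y_i (assumes a non-zero sum).
std::vector<double> simplemax(const std::vector<double> &y) {
    double sum = 0.0;
    for (double v : y) sum += v;
    std::vector<double> p(y);
    for (double &v : p) v /= sum;
    return p;
}
```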
- class RegressionScore : public InternalScore
Implementation of InternalScore for regression models.
Math
- group GenericMathFunctions
Functions
- double closest0or1(const double &value)
- double probit(const double a)
- double logit(const double a)
- double _exp(const double a)
- double cloglog(const double a)
- double loglog(const double a)
- double cauchit(const double a)
- double _round(const double a)
- double _floor(const double a)
- double _ceil(const double a)
- double _identity(const double a)
- double
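Assuming the functions above follow the PMML formulas of the same name (inverse link functions mapping a raw score to a probability), they can be sketched as below; the `pmml_` prefix marks these as hypothetical re-implementations, not cPMML's code.

```cpp
#include <cmath>

// Hypothetical re-implementations of the PMML normalization formulas.
double pmml_logit(double a)   { return 1.0 / (1.0 + std::exp(-a)); }
double pmml_probit(double a)  { return 0.5 * std::erfc(-a / std::sqrt(2.0)); }  // standard normal CDF
double pmml_cloglog(double a) { return 1.0 - std::exp(-std::exp(a)); }
double pmml_loglog(double a)  { return std::exp(-std::exp(-a)); }
double pmml_cauchit(double a) { return 0.5 + std::atan(a) / std::acos(-1.0); }  // acos(-1) == pi
```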
- group NormalizationMethods
Implementation of the various normalization methods used in PMML. See PMML Normalization Methods.
Functions
- std::vector<double> categorical_softmax(const std::vector<double> &values)
- std::vector<double> categorical_simplemax(const std::vector<double> &values)
- std::vector<double> categorical_none(const std::vector<double> &values)
- std::vector<double> categorical_base(const std::vector<double> &values, std::function<double(double)> function, const std::string &function_name, )
- std::vector<double> categorical_logit(const std::vector<double> &values)
- std::vector<double> categorical_probit(const std::vector<double> &values)
- std::vector<double> categorical_cloglog(const std::vector<double> &values)
- std::vector<double> categorical_loglog(const std::vector<double> &values)
- std::vector<double> categorical_cauchit(const std::vector<double> &values)
- std::vector<double> ordinal_base(const std::vector<double> &values, std::function<double(double)> function)
- std::vector<double> ordinal_logit(const std::vector<double> &values)
- std::vector<double> ordinal_probit(const std::vector<double> &values)
- std::vector<double> ordinal_exp(const std::vector<double> &values)
- std::vector<double> ordinal_cloglog(const std::vector<double> &values)
- std::vector<double> ordinal_loglog(const std::vector<double> &values)
- std::vector<double> ordinal_cauchit(const std::vector<double> &values)
- std::vector<double> ordinal_none(const std::vector<double> &values)
- double single_logit(const double &a)
- double single_softmax(const double &a)
- double single_exp(const double &a)
- double single_probit(const double &a)
- double single_cloglog(const double &a)
- double single_loglog(const double &a)
- double single_cauchit(const double &a)
- double single_none(const double &a)
- std::vector<double>
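For ordinal targets, a cumulative-link model is the usual approach: F(y_j) is read as P(Y <= c_j) for the first k-1 categories, each class probability is the difference of consecutive cumulative values, and the last category receives 1 - F(y_{k-1}). The sketch below assumes that ordinal_base applies its `function` argument in a similar way; `ordinal_probabilities` is a hypothetical name, and the scores are assumed non-decreasing with F monotone, so that all probabilities are non-negative.

```cpp
#include <functional>
#include <vector>

// cumulative_scores: y_1..y_{k-1}; F: link inverse (e.g. logistic).
// Returns k class probabilities that sum to 1.
std::vector<double> ordinal_probabilities(const std::vector<double> &cumulative_scores,
                                          const std::function<double(double)> &F) {
    std::vector<double> p;
    double previous = 0.0;                  // P(Y <= c_0) = 0
    for (double y : cumulative_scores) {
        const double current = F(y);        // P(Y <= c_j)
        p.push_back(current - previous);    // P(Y == c_j)
        previous = current;
    }
    p.push_back(1.0 - previous);            // last category
    return p;
}
```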
TreeTable
- template<class K, class V, class H> class TreeTable
Tables are an important part of the PMML standard. For instance, they are the main building block for MapValues transformations.
To maximize efficiency, this class implements a table whose access cost is linear in the number of columns.
In other words, given a table with m rows and n columns, this class has an access cost of O(n), regardless of m.
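One way to obtain an access cost independent of the number of rows is a trie of hash maps with one level per key column, so that a lookup performs one hash probe per column. The sketch below is an assumption about the general idea rather than TreeTable's actual layout; it drops the hash template parameter H for brevity, and `ToyTreeTable` is a hypothetical name.

```cpp
#include <memory>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical trie-like table: level i indexes key column i.
template <class K, class V>
class ToyTreeTable {
public:
    // Insert a row: the key columns select a path, the value sits at its end.
    void insert(const std::vector<K> &key, const V &value) {
        Level *level = &root_;
        for (const K &k : key) {
            auto &child = level->children[k];
            if (!child) child = std::make_unique<Level>();
            level = child.get();
        }
        level->value = value;
    }
    // One hash lookup per column: O(n) for n columns, whatever the row count.
    std::optional<V> find(const std::vector<K> &key) const {
        const Level *level = &root_;
        for (const K &k : key) {
            auto it = level->children.find(k);
            if (it == level->children.end()) return std::nullopt;
            level = it->second.get();
        }
        return level->value;
    }

private:
    struct Level {
        std::unordered_map<K, std::unique_ptr<Level>> children;
        std::optional<V> value;
    };
    Level root_;
};
```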
Utils
- group Utils
Various utility functions, also used during model scoring.
Functions
- std::string to_lower(std::string value)
- static double to_double(const std::string &value)
- template<class T> std::string mkstring(const std::vector<T> &values, const std::string &separator = ",")
- template<class T> std::string mkstring(const std::set<T> &values, const std::string &separator = ",")
- template<class T, class H> std::string mkstring(const std::unordered_set<T, H> &values, const std::string &separator = ",")
- template<class K, class V, class H> std::string to_string(const std::unordered_map<K, V, H> &values, const std::string &separator = ",")
- template<class T, class U> std::string mkstring(const std::vector<std::pair<T, U>> &values, const std::string &separator = ",")
- std::vector<std::string> split(std::string value, const std::string &separator)
- std::string remove_all(std::string value, char to_remove)
- bool parse_boolstring(const std::string &value)
- double double_min()
- static std::string &ltrim(std::string &s)
- static std::string &rtrim(std::string &s)
- static std::string &trim(std::string &s)
- static std::vector<char> read_xml(const std::string &filepath)
- static void mz_reader_error(mz_zip_archive *zip_archive, const std::string &message)
- static std::vector<char> read_zip(const std::string &filepath)
- static bool file_exists(const std::string &name)
- static std::vector<char> read_file(const std::string &filepath, const bool zipped)
- std::string format_int(const int &value)
- std::string
- class CSVReader
Class implementing a simple CSV file reader.
Although it is used for benchmarking and testing cPMML, it is not part of its core or of its API, since it is best-effort and not directly involved in model scoring.