Internals

The following documents the internals of cPMML, covering all cPMML classes that are not exposed in the public API.

cPMML Driving Principles

Model Load and Model Scoring

The design of cPMML is split into two phases:

Model Loading

The XML content is loaded from disk into memory and the corresponding C++ object is built.

Model Scoring

The model object is evaluated against a user-provided sample in order to produce a prediction.

Visiting the XML

cPMML defines one class for each PMML XML element. Once the XML is read, it is visited recursively and, for each XML node encountered, the corresponding object instance is constructed.

Most of the computation takes place in this phase, since the idea is to precompute as much as possible in order to speed up scoring.

In general, any object that is part of a cpmml::Model is meant to be immutable, with the exception of the following classes:

  • Sample

  • Feature

  • Value

These classes may be used to store the user input for scoring. In this case, their values are updated in place rather than copied, to speed up scoring.

Feature Fields

The PMML serialization of a model contains the definition of multiple fields which serve as input for the model. These fields can be defined in:

DataDictionary

It defines the fields provided as input by the user.

TransformationDictionary

It defines fields derived from the ones in DataDictionary by applying preprocessing transformations (standardization, discretization, etc.).

LocalTransformations

Like TransformationDictionary, but with local scope.

Output

It defines fields derived from the output of the model scoring by applying postprocessing transformations (probability computation, etc.).

MiningSchema

It only lists which fields, among the previously defined ones, are actually used by the model.

In cPMML, all fields are indexed through the class Indexer, so that they can be accessed through an integer index, which improves performance. A single shared Indexer instance holds the integer→field name associations.
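
A minimal sketch of the name↔index idea, using a hypothetical FieldIndexer class (not cPMML's actual Indexer interface, and without the feature type that Indexer also stores). Fields get consecutive integers on first registration, and a shared_ptr lets every component resolve names against the same instance.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name <-> index registry (not cPMML's Indexer interface).
class FieldIndexer {
public:
    // Returns the index of `name`, registering the field on first use.
    std::size_t get_or_add(const std::string &name) {
        auto it = index_of_.find(name);
        if (it != index_of_.end()) return it->second;
        names_.push_back(name);
        index_of_.emplace(name, names_.size() - 1);
        return names_.size() - 1;
    }
    // Index of an already registered field; throws std::out_of_range otherwise.
    std::size_t at(const std::string &name) const { return index_of_.at(name); }
    // Reverse lookup: integer index -> field name.
    const std::string &name(std::size_t index) const { return names_.at(index); }
    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::size_t> index_of_;
    std::vector<std::string> names_;
};

// One instance shared by every component that needs to resolve field names.
using SharedIndexer = std::shared_ptr<FieldIndexer>;
```

Once names are resolved at load time, a sample can be a plain vector addressed by these integers, so scoring never hashes a field name.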

Core

class BuiltInFunction

Class representing PMML built-in functions.

It defines a set of predefined functions implementing low-level operations. For instance:

  • arithmetic functions: min, max, sum, etc.

  • boolean functions: equal, isMissing, etc.

  • etc.
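
One way to organize such a catalogue is a name→function table resolved once at load time, so that scoring never compares function names. The sketch assumes every argument and result is a double (booleans encoded as 1.0/0.0); BuiltIn and resolve_builtin are hypothetical names, not cPMML's API.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical signature: every built-in receives its arguments as doubles.
using BuiltIn = std::function<double(const std::vector<double> &)>;

// Resolves a PMML function name to an implementation (arguments assumed non-empty).
BuiltIn resolve_builtin(const std::string &name) {
    static const std::unordered_map<std::string, BuiltIn> table = {
        {"min", [](const std::vector<double> &a) { return *std::min_element(a.begin(), a.end()); }},
        {"max", [](const std::vector<double> &a) { return *std::max_element(a.begin(), a.end()); }},
        {"sum", [](const std::vector<double> &a) { return std::accumulate(a.begin(), a.end(), 0.0); }},
        // Boolean results are encoded as 1.0 / 0.0.
        {"equal", [](const std::vector<double> &a) { return a.at(0) == a.at(1) ? 1.0 : 0.0; }},
    };
    auto it = table.find(name);
    if (it == table.end()) throw std::invalid_argument("unsupported built-in: " + name);
    return it->second;
}
```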

class DataType

Class representing PMML dataTypes.

For instance:

  • STRING

  • INTEGER

  • BOOLEAN

  • etc.

class DataField

Class representing PMML DataField.

It defines a feature available to the model, along with the values it can assume. The constraints on the admissible values are enforced through the class Predicate.

Subclassed by MiningField

class DataDictionary

Class representing PMML DataDictionary.

It is a collection of DataFields. See DataField.

class Indexer

Class used to index all features available in the PMML file.

In cPMML, features are indexed to gain constant access time. This class keeps the name→index associations, along with the type of each feature.

class Closure

Class representing PMML Closure.

It describes the type of boundaries for a range of continuous values. For instance:

  • closedClosed

  • closedOpen

  • etc.
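
A sketch of how a closure type can drive an interval membership test. The four strings are the closure values PMML defines for Interval; ClosureKind, parse_closure and in_interval are hypothetical names. An unbounded margin can be represented with ±infinity.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the four PMML closure values.
enum class ClosureKind { OpenClosed, OpenOpen, ClosedOpen, ClosedClosed };

ClosureKind parse_closure(const std::string &s) {
    if (s == "openClosed") return ClosureKind::OpenClosed;
    if (s == "openOpen") return ClosureKind::OpenOpen;
    if (s == "closedOpen") return ClosureKind::ClosedOpen;
    if (s == "closedClosed") return ClosureKind::ClosedClosed;
    throw std::invalid_argument("unknown closure: " + s);
}

// True if x lies between left and right, honouring the boundary types.
bool in_interval(double x, double left, double right, ClosureKind c) {
    bool left_closed = c == ClosureKind::ClosedOpen || c == ClosureKind::ClosedClosed;
    bool right_closed = c == ClosureKind::OpenClosed || c == ClosureKind::ClosedClosed;
    bool left_ok = left_closed ? x >= left : x > left;
    bool right_ok = right_closed ? x <= right : x < right;
    return left_ok && right_ok;
}
```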

class DagBuilder

Since a PMML document makes no assumption about the ordering of DataFields and DerivedFields, this class is used to build a directed acyclic graph (DAG) of the dependencies between fields.

The DAG is represented as a vector whose ordering is the order in which DerivedFields must be computed.
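
Such an ordering is a topological sort of the dependency graph. The sketch below uses Kahn's algorithm on a hypothetical map from each DerivedField to the fields it reads; it illustrates the technique and is not DagBuilder's actual code. Fields that are not keys of the map (such as DataFields) are treated as already available.

```cpp
#include <queue>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical input: derived field name -> names of the fields it reads.
using DependencyMap = std::unordered_map<std::string, std::vector<std::string>>;

// Returns the derived fields so that each one follows the derived fields it
// depends on (Kahn's algorithm).
std::vector<std::string> computation_order(const DependencyMap &depends_on) {
    std::unordered_map<std::string, int> pending;  // unresolved derived dependencies
    DependencyMap dependents;                      // reverse edges
    for (const auto &entry : depends_on) {
        pending.emplace(entry.first, 0);
        for (const auto &dep : entry.second) {
            if (depends_on.count(dep) > 0) {  // plain inputs need no ordering
                ++pending[entry.first];
                dependents[dep].push_back(entry.first);
            }
        }
    }
    std::queue<std::string> ready;
    for (const auto &p : pending)
        if (p.second == 0) ready.push(p.first);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string field = ready.front();
        ready.pop();
        order.push_back(field);
        auto it = dependents.find(field);
        if (it == dependents.end()) continue;
        for (const auto &next : it->second)
            if (--pending[next] == 0) ready.push(next);
    }
    // Any derived field missing from the order lies on a cycle.
    if (order.size() != depends_on.size())
        throw std::runtime_error("cyclic dependency between derived fields");
    return order;
}
```

The same check doubles as validation: a document whose derived fields reference each other cyclically is rejected at load time rather than during scoring.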

class DerivedField

Class representing a PMML DerivedField.

It defines a feature available to the model, which is obtained by transforming other features. The associated transformation is represented by the class Expression.

See also Expression.

class FieldUsageType

Class representing PMML FieldUsageType.

It defines the usage of a field in the mining schema.

class Header

Class representing PMML Header.

It contains some metadata for the model.

class InternalEvaluator

Class encapsulating the high-level elements of the PMML document.

It contains:

Subclassed by EnsembleEvaluator, RegressionEvaluator, TreeEvaluator

class InternalModel

Abstract class representing a PMML MODEL-ELEMENT.

Implementations of this class are the central element of cPMML, and they are the ones that actually produce the score.

It contains:

Through the method validate, the presence of all fields needed by the model is checked. This includes DerivedFields defined in both TransformationDictionary and LocalTransformations.

Subclassed by EnsembleModel, RegressionModel, TreeModel

class InternalScore

Abstract class internally representing the prediction produced by the scoring process.

It contains both double and literal representations of the score, as well as the associated probabilities and all values produced by Output.

Subclassed by RegressionScore, TreeScore

class IntervalBuilder

Class used to build the constraints for continuous features of the DataDictionary. See also Closure.

class InvalidValueTreatmentMethod

Class representing PMML INVALID-VALUE-TREATMENT-METHOD.

It defines how the model should behave when encountering a feature whose value does not respect the constraints defined in the DataDictionary.

class MiningField : public DataField

Class representing PMML MiningField.

It defines a field used by the model, along with its usage type and the behaviour in case of missing/invalid values. See also: DataDictionary, InvalidValueTreatmentMethod, MissingValueTreatmentMethod, OutlierTreatmentMethod, FieldUsageType.

class MiningFunction

Class representing PMML MiningFunction.

It defines the mining function performed by the model.

For instance:

  • Classification

  • Regression

  • Clustering

  • etc.

class MiningSchema

Class representing PMML MiningSchema.

It is a collection of MiningFields. See MiningField.

Note that the link with the values defined in DataDictionary is established by deriving MiningField from DataField.

Through the method validate, the presence of all fields needed by the model is checked.

class MissingValueTreatmentMethod

Class representing PMML MISSING-VALUE-TREATMENT-METHOD.

It defines how the model should behave when encountering a feature whose value is missing.

class ModelBuilder

Factory class to create InternalEvaluator objects.

class OpType

Class representing PMML opType.

For instance:

  • CATEGORICAL

  • ORDINAL

  • CONTINUOUS

class OutlierTreatmentMethod

Class representing PMML OUTLIER-TREATMENT-METHOD.

It defines how the model should behave when encountering a feature whose value is considered an outlier.

class Predicate

Class representing PMML Predicate.

It is used to enforce a constraint between values, between sets of values, or between recursively defined sets of other constraints. See also PredicateOpType, TreeModel, DataDictionary.

class PredicateOpType

Class representing the operator for the condition enforced by a predicate.

For instance:

  • equal

  • notEqual

  • isIn

  • surrogate

  • etc.

class PredicateType

Class representing the type of predicate.

For instance:

  • SimplePredicate

  • SimpleSetPredicate

  • CompoundPredicate

  • True

  • False

See also PMML Predicate.
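
A much simplified sketch of how predicate types and operators can be evaluated recursively. It assumes, hypothetically, that the sample is a vector of doubles indexed by field and that categorical values are already encoded as doubles (in the spirit of Value). PredicateNode is not cPMML's Predicate class; only a few operators are shown, and missing values and surrogate are not handled.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical simplified predicate tree; not cPMML's Predicate class.
struct PredicateNode {
    enum class Kind { True, False, LessThan, Equal, IsIn, And, Or };

    Kind kind = Kind::True;
    std::size_t field = 0;                                 // simple and set predicates
    double value = 0.0;                                    // simple predicates
    std::unordered_set<double> values;                     // set predicates (isIn)
    std::vector<std::shared_ptr<PredicateNode>> children;  // compound predicates

    // Evaluates the predicate against a sample indexed by field position.
    bool evaluate(const std::vector<double> &sample) const {
        switch (kind) {
            case Kind::True: return true;
            case Kind::False: return false;
            case Kind::LessThan: return sample[field] < value;
            case Kind::Equal: return sample[field] == value;
            case Kind::IsIn: return values.count(sample[field]) > 0;
            case Kind::And:  // stops at the first false child
                for (const auto &c : children)
                    if (!c->evaluate(sample)) return false;
                return true;
            case Kind::Or:  // stops at the first true child
                for (const auto &c : children)
                    if (c->evaluate(sample)) return true;
                return false;
        }
        return false;
    }
};
```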

class PredicateBuilder

Factory class to create Predicate objects.

class Property

Class representing PMML Property.

It is used to define the constraints for admissible values in the DataDictionary.

class Sample

Class used as an internal representation of the user provided sample.

It stores the values for each feature used by the model. The features are indexed through the Indexer class, which provides constant-time access.

class Feature

Class used as an internal representation of the user provided feature.

It can signal that the corresponding value is missing.
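
A sketch of the reuse pattern described for Sample and Feature: storage is allocated once per model, and each new record overwrites the slots in place instead of building new objects. FeatureSlot and ReusableSample are hypothetical names, not cPMML classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical feature slot: a value plus a flag signalling that it is missing.
struct FeatureSlot {
    double value = 0.0;
    bool missing = true;
};

// Hypothetical sample: one slot per indexed field, allocated once and reused.
class ReusableSample {
public:
    explicit ReusableSample(std::size_t n_fields) : slots_(n_fields) {}

    // Overwrites the slot in place instead of allocating a new object.
    void set(std::size_t index, double value) {
        slots_[index].value = value;
        slots_[index].missing = false;
    }
    // Marks every slot as missing so the sample can be refilled for the next record.
    void clear() {
        for (auto &slot : slots_) slot.missing = true;
    }
    const FeatureSlot &get(std::size_t index) const { return slots_[index]; }

private:
    std::vector<FeatureSlot> slots_;
};
```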

class Target

Class representing PMML Target.

It defines transformations for the raw output of a model.

class TransformationDictionary

Class representing PMML TransformationDictionary.

It is a collection of DerivedFields. See also DerivedField.

class string_view

An implementation of a non-owning reference to std::string.

class XmlNode

Non-owning wrapper of rapidxml::xml_node<> *, implementing some utility functions.

class Value

Internal representation of each value used by the model. For efficiency reasons, every type of input value is converted into a double.

Expression

class Expression

Abstract class representing PMML Expression.

Its implementations provide the transformations used by DerivedField.

Subclassed by Apply, Constant, Discretize, FieldRef, MapValues, NormContinuous, NormDiscrete, OutputExpression

class Apply : public Expression

Class representing PMML Apply.

It allows performing user-defined transformations composed of other PMML Expressions or PMML built-in functions.

class Constant : public Expression

Class representing PMML Constant.

It provides a constant value for other PMML Expressions.

class Discretize : public Expression

Class representing PMML Discretize.

It performs the discretization of numerical input fields by mapping from continuous to discrete values using intervals.
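
A minimal sketch of the mapping, assuming hypothetical bins that are all closedOpen (in PMML each DiscretizeBin carries its own Interval, so the closure may differ per bin) and a default value returned when no bin matches. Bin and discretize are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical bin: the interval [left, right) mapped to a label.
struct Bin {
    double left;
    double right;
    std::string label;
};

// Returns the label of the first bin containing x, or default_value if none does.
std::string discretize(double x, const std::vector<Bin> &bins, const std::string &default_value) {
    for (const auto &b : bins)
        if (x >= b.left && x < b.right) return b.label;
    return default_value;
}
```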

class FieldRef : public Expression

Class representing PMML FieldRef.

It is used to simply reference (and rename) another DataField or DerivedField.

class SimpleFieldRef

Class used to simply reference (and rename) another DataField or DerivedField. It differs from FieldRef in that it has local rather than global scope.

class MapValues : public Expression

Class representing PMML MapValues.

Through the use of a table (implemented with TreeTable) it maps discrete values to other discrete values.

class NormContinuous : public Expression

Class representing PMML NormContinuous.

It is used to normalize input fields through a piecewise linear interpolation.
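
The piecewise linear interpolation between consecutive LinearNorm points (orig, norm) can be sketched as follows. The function name is hypothetical, and values outside the defined range are extrapolated from the edge segment, which corresponds to PMML's asIs outlier treatment; other treatments are not shown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical input: (orig, norm) pairs sorted by orig, as in PMML LinearNorm.
// Outside the covered range, the nearest segment is extrapolated ("asIs").
double norm_continuous(double x, const std::vector<std::pair<double, double>> &points) {
    if (points.size() < 2) throw std::invalid_argument("need at least two LinearNorm points");
    std::size_t i = 0;
    // Advance to the segment [points[i], points[i + 1]] that contains x (or the last one).
    while (i + 2 < points.size() && x > points[i + 1].first) ++i;
    const auto &a = points[i];
    const auto &b = points[i + 1];
    return a.second + (x - a.first) * (b.second - a.second) / (b.first - a.first);
}
```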

class NormDiscrete : public Expression

Class representing PMML NormDiscrete.

It performs the encoding of STRING values into numerical values.
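
In PMML, NormDiscrete yields 1 when the input field equals a given value and 0 otherwise, so one NormDiscrete per category produces a one-hot encoding. A minimal sketch with hypothetical function names, ignoring missing values (mapMissingTo):

```cpp
#include <string>
#include <vector>

// Indicator semantics of PMML NormDiscrete: 1.0 when the input equals the
// configured category, 0.0 otherwise (missing values are not handled here).
double norm_discrete(const std::string &input, const std::string &category) {
    return input == category ? 1.0 : 0.0;
}

// One NormDiscrete per category yields a one-hot encoding of the input.
std::vector<double> one_hot(const std::string &input, const std::vector<std::string> &categories) {
    std::vector<double> encoded;
    encoded.reserve(categories.size());
    for (const auto &category : categories) encoded.push_back(norm_discrete(input, category));
    return encoded;
}
```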

class ExpressionType

Class used to represent the different PMML-EXPRESSION types.

For instance:

class ExpressionBuilder

Factory class to create Expression objects.

Output

class OutputDictionary

Class representing PMML Output.

It is a collection of OutputFields. See OutputField.

Also in this case, a DAG is built to keep track of the dependencies between OutputFields. However, this DAG is unrelated to the one built by DagBuilder, since DerivedFields and OutputFields have different scopes: the former deal with the preprocessing of fields, while the latter deal with their postprocessing.

class OutputField

Class representing PMML OutputField.

It defines which output features the model will produce from the raw prediction.

class OutputExpression : public Expression

Abstract class representing PMML RESULT-FEATURE.

Its implementations provide the transformations used by OutputField.

Subclassed by PredictedValue, Probability, TransformedValue

class OutputExpressionType

Class used to represent the different PMML RESULT-FEATURE types.

For instance:

  • predictedValue

  • transformedValue

  • probability

  • etc.

class OutputExpressionBuilder

Factory class to create OutputExpression objects.

class PredictedValue : public OutputExpression

Class representing PMML PredictedValue.

It is used to simply reference (and rename) the raw prediction value provided by the model.

class TransformedValue : public OutputExpression

Class representing PMML TransformedValue.

It allows performing user-defined transformations composed of other PMML Expressions or PMML built-in functions.

class Probability : public OutputExpression

Class representing PMML Probability.

It is used to simply reference (and rename) the probabilities associated with the raw prediction value provided by the model.

Ensembles

class EnsembleEvaluator : public InternalEvaluator

Implementation of InternalEvaluator, it is used as a wrapper of EnsembleModel.

class EnsembleModel : public InternalModel

Implementation of InternalModel representing a PMML MiningModel.

This class represents all ensemble models, for instance Random Forest or Gradient Boosted Trees models. See also MultipleModelMethod.

class MultipleModelMethod

Class representing PMML MULTIPLE-MODEL-METHOD.

For instance:

  • majorityVote

  • weightedAverage

  • modelChain
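
Two of these combination methods can be sketched as plain functions over the segment predictions. The function names are hypothetical, and the tie-breaking rule in majority_vote (the alphabetically first label wins) is an arbitrary choice of this sketch, not necessarily PMML's or cPMML's rule.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// weightedAverage: sum(w_i * y_i) / sum(w_i) over the segment predictions.
double weighted_average(const std::vector<double> &predictions, const std::vector<double> &weights) {
    if (predictions.empty() || predictions.size() != weights.size())
        throw std::invalid_argument("predictions and weights must be non-empty and aligned");
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < predictions.size(); ++i) {
        num += weights[i] * predictions[i];
        den += weights[i];
    }
    return num / den;
}

// majorityVote: the most frequent label wins; ties go to the label that sorts first.
std::string majority_vote(const std::vector<std::string> &labels) {
    if (labels.empty()) throw std::invalid_argument("no votes");
    std::map<std::string, int> counts;  // ordered, which fixes the tie-breaking
    for (const auto &l : labels) ++counts[l];
    auto best = counts.begin();
    for (auto it = counts.begin(); it != counts.end(); ++it)
        if (it->second > best->second) best = it;
    return best->first;
}
```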

class Segment

Class representing PMML Segment.

Inside an ensemble of models, this is the wrapper for a single model object. See also EnsembleModel.

TreeModel

class TreeEvaluator : public InternalEvaluator

Implementation of InternalEvaluator, it is used as a wrapper of TreeModel.

class TreeModel : public InternalModel

Implementation of InternalModel representing a PMML TreeModel.

This class represents Decision Tree models, both for classification and regression.

class TreeScore : public InternalScore

Implementation of InternalScore for TreeModel objects.

class Node

Class representing PMML Node.

It is a node of the decision tree, containing a Predicate and a TreeScore. The score represents the prediction associated with a sample matching the predicate.
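
Scoring a tree amounts to walking down from the root and entering the first child whose predicate is true. The sketch uses hypothetical TreeNode and score_tree names, a std::function in place of Predicate, and a string in place of TreeScore. It returns the score of the deepest node reached, which corresponds to PMML's returnLastPrediction strategy.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using FeatureVector = std::vector<double>;  // hypothetical: features indexed by position

// Hypothetical tree node: the predicate guarding the node and the score it assigns.
struct TreeNode {
    std::function<bool(const FeatureVector &)> predicate;
    std::string score;
    std::vector<std::unique_ptr<TreeNode>> children;
};

// Descends into the first child whose predicate holds and returns the score of
// the deepest node reached ("returnLastPrediction").
std::string score_tree(const TreeNode &root, const FeatureVector &sample) {
    const TreeNode *node = &root;
    bool descended = true;
    while (descended) {
        descended = false;
        for (const auto &child : node->children) {
            if (child->predicate(sample)) {
                node = child.get();
                descended = true;
                break;
            }
        }
    }
    return node->score;
}
```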

class ScoreDistribution

Class representing PMML ScoreDistribution.

It contains additional information for the score associated to a node. For instance:

RegressionModel

class RegressionEvaluator : public InternalEvaluator

Implementation of InternalEvaluator, it is used as a wrapper of RegressionModel.

class RegressionModel : public InternalModel

Implementation of InternalModel representing a PMML RegressionModel.

This class represents models such as Linear Regression, Logistic Regression, etc.

class RegressionTable

Class representing PMML RegressionTable.

It is the class wrapping all predictors for the RegressionModel: NumericPredictor, CategoricalPredictor and PredictorTerm. When predicting a numerical value, RegressionModel contains just one RegressionTable; when predicting a categorical value, it contains one RegressionTable for each category involved.
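
For a single RegressionTable, the raw score is the intercept plus the contribution of each predictor. A sketch with hypothetical flattened structs, assuming categorical inputs are already encoded as doubles:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, flattened representation of one RegressionTable.
struct NumericTerm { std::size_t field; double coefficient; int exponent; };
struct CategoricalTerm { std::size_t field; double category; double coefficient; };
struct ProductTerm { std::vector<std::size_t> fields; double coefficient; };

struct RegressionTableSketch {
    double intercept = 0.0;
    std::vector<NumericTerm> numeric;
    std::vector<CategoricalTerm> categorical;
    std::vector<ProductTerm> products;

    // intercept + sum(c * x^e) + sum(c * [x == category]) + sum(c * prod(x_j))
    double evaluate(const std::vector<double> &x) const {
        double y = intercept;
        for (const auto &t : numeric) y += t.coefficient * std::pow(x[t.field], t.exponent);
        for (const auto &t : categorical)
            if (x[t.field] == t.category) y += t.coefficient;
        for (const auto &t : products) {
            double prod = t.coefficient;
            for (std::size_t f : t.fields) prod *= x[f];
            y += prod;
        }
        return y;
    }
};
```

For a categorical target, one such table per category produces one raw score per category, which is then turned into probabilities by the normalization method.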

class NumericPredictor

Class representing PMML NumericPredictor.

It contains the parameters (coefficient and exponent) needed to compute the contribution of a continuous input field to the regression score.

class CategoricalPredictor

Class representing PMML CategoricalPredictor.

It contains the coefficient that is added to the regression score when a categorical input field assumes a given value.

class PredictorTerm

Class representing PMML PredictorTerm.

It contains references to other fields in the PMML, which are combined by multiplication.

class SingleNormalizationMethodBuilder

Factory class building the normalization method for a RegressionModel predicting a continuous variable.

class MultiNormalizationMethodBuilder

Factory class building the normalization method for a RegressionModel predicting a categorical variable.

class NormalizationMethodType

Class representing PMML REGRESSIONNORMALIZATIONMETHOD.

For instance:

  • SIMPLEMAX

  • SOFTMAX

  • CAUCHIT

See also NormalizationMethods.

class RegressionScore : public InternalScore

Implementation of InternalScore for regression models.

Math

group GenericMathFunctions

Functions

double closest0or1(const double &value)
double probit(const double a)
double logit(const double a)
double _exp(const double a)
double cloglog(const double a)
double loglog(const double a)
double cauchit(const double a)
double _round(const double a)
double _floor(const double a)
double _ceil(const double a)
double _identity(const double a)
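
Assuming logit, probit, cloglog, loglog and cauchit implement the inverse link functions that PMML lists for normalizationMethod, they can be sketched with the standard <cmath> functions. The _inv names are hypothetical.

```cpp
#include <cmath>

// Inverse link functions as used by PMML normalizationMethod (hypothetical names).
double logit_inv(double a) { return 1.0 / (1.0 + std::exp(-a)); }
// Standard normal CDF expressed through the complementary error function.
double probit_inv(double a) { return 0.5 * std::erfc(-a / std::sqrt(2.0)); }
double cloglog_inv(double a) { return 1.0 - std::exp(-std::exp(a)); }
double loglog_inv(double a) { return std::exp(-std::exp(-a)); }
// std::acos(-1.0) evaluates to pi.
double cauchit_inv(double a) { return 0.5 + std::atan(a) / std::acos(-1.0); }
```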

group NormalizationMethods

Implementation of the various normalization methods used in PMML. See PMML Normalization Methods.

Functions

std::vector<double> categorical_softmax(const std::vector<double> &values)
std::vector<double> categorical_simplemax(const std::vector<double> &values)
std::vector<double> categorical_none(const std::vector<double> &values)
std::vector<double> categorical_base(const std::vector<double> &values, std::function<double(double)> function, const std::string &function_name, )
std::vector<double> categorical_logit(const std::vector<double> &values)
std::vector<double> categorical_probit(const std::vector<double> &values)
std::vector<double> categorical_cloglog(const std::vector<double> &values)
std::vector<double> categorical_loglog(const std::vector<double> &values)
std::vector<double> categorical_cauchit(const std::vector<double> &values)
std::vector<double> ordinal_base(const std::vector<double> &values, std::function<double(double)> function)
std::vector<double> ordinal_logit(const std::vector<double> &values)
std::vector<double> ordinal_probit(const std::vector<double> &values)
std::vector<double> ordinal_exp(const std::vector<double> &values)
std::vector<double> ordinal_cloglog(const std::vector<double> &values)
std::vector<double> ordinal_loglog(const std::vector<double> &values)
std::vector<double> ordinal_cauchit(const std::vector<double> &values)
std::vector<double> ordinal_none(const std::vector<double> &values)
double single_logit(const double &a)
double single_softmax(const double &a)
double single_exp(const double &a)
double single_probit(const double &a)
double single_cloglog(const double &a)
double single_loglog(const double &a)
double single_cauchit(const double &a)
double single_none(const double &a)
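
For categorical targets, softmax and simplemax turn the per-category raw scores into probabilities. A sketch under hypothetical names; subtracting the maximum before exponentiating is a common numerical safeguard and is not necessarily what categorical_softmax does.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// softmax: p_i = exp(y_i) / sum_j exp(y_j). Subtracting the maximum leaves the
// result unchanged but avoids overflow for large scores.
std::vector<double> softmax(const std::vector<double> &y) {
    if (y.empty()) return {};
    double m = *std::max_element(y.begin(), y.end());
    std::vector<double> p(y.size());
    double total = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        p[i] = std::exp(y[i] - m);
        total += p[i];
    }
    for (double &v : p) v /= total;
    return p;
}

// simplemax: p_i = y_i / sum_j y_j.
std::vector<double> simplemax(const std::vector<double> &y) {
    double total = 0.0;
    for (double v : y) total += v;
    std::vector<double> p;
    p.reserve(y.size());
    for (double v : y) p.push_back(v / total);
    return p;
}
```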

TreeTable

template<class K, class V, class H>
class TreeTable

Tables are an important part of PMML standard. For instance they are the main building block for MapValues transformations.

To maximize efficiency, this class implements a table whose access cost is linear in the number of columns.

In other words, given a table with m rows and n columns, a lookup costs O(n), regardless of m.
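
One data structure with this property is a trie with one level per column: a lookup performs one hash-map probe per column, so its cost grows with n but not with m. The sketch below is a hypothetical TrieTable with string cells and exact-match lookups only; it is not the actual TreeTable template.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical trie-shaped table: level k of the trie corresponds to column k.
class TrieTable {
public:
    // Adds a row whose input cells lead to `output`.
    void insert(const std::vector<std::string> &row, const std::string &output) {
        Node *node = &root_;
        for (const auto &cell : row) {
            auto &child = node->children[cell];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->output = output;
        node->has_output = true;
    }
    // One probe per column; returns false if the exact key is absent.
    bool find(const std::vector<std::string> &key, std::string &output) const {
        const Node *node = &root_;
        for (const auto &cell : key) {
            auto it = node->children.find(cell);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        if (!node->has_output) return false;
        output = node->output;
        return true;
    }

private:
    struct Node {
        std::unordered_map<std::string, std::unique_ptr<Node>> children;
        std::string output;
        bool has_output = false;
    };
    Node root_;
};
```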

template<class K, class V, class H>
class TreeTableNode

Basic building block of a TreeTable, it can be seen as a column of the table.

Utils

group Utils

Various utility functions, some of which are also used during model scoring.

Functions

std::string to_lower(std::string value)
static double to_double(const std::string &value)
template<class T>
T parse_string(const std::string &value)
template<class T>
std::string to_string(const T &value)
template<class T>
std::string mkstring(const std::vector<T> &values, const std::string &separator = ",")
template<class T>
std::string mkstring(const std::set<T> &values, const std::string &separator = ",")
template<class T, class H>
std::string mkstring(const std::unordered_set<T, H> &values, const std::string &separator = ",")
template<class K, class V, class H>
std::string to_string(const std::unordered_map<K, V, H> &values, const std::string &separator = ",")
template<class T, class U>
std::string mkstring(const std::vector<std::pair<T, U>> &values, const std::string &separator = ",")
template<class T, class V>
std::string to_string(const std::pair<T, V> &value)
template<class K, class V>
std::vector<V> to_values(const std::unordered_map<K, V> &values)
template<class K, class V>
std::vector<V> to_values(const std::map<K, V> &values)
template<class K, class V>
std::vector<K> to_keys(const std::unordered_map<K, V> &values)
std::vector<std::string> split(std::string value, const std::string &separator)
std::string remove_all(std::string value, char to_remove)
bool parse_boolstring(const std::string &value)
double double_min()
static std::string &ltrim(std::string &s)
static std::string &rtrim(std::string &s)
static std::string &trim(std::string &s)
static std::vector<char> read_xml(const std::string &filepath)
static void mz_reader_error(mz_zip_archive *zip_archive, const std::string &message)
static std::vector<char> read_zip(const std::string &filepath)
static bool file_exists(const std::string &name)
static std::vector<char> read_file(const std::string &filepath, const bool zipped)
template<class T>
std::string format_num(const T &value)
std::string format_int(const int &value)
template<typename T, typename ...Args>
std::unique_ptr<T> make_unique(Args&&... args)

class CSVReader

Class implementing a simple csv file reader.

Although it is used for benchmarking and testing cPMML, it is not part of its core or its API, since it is best-effort and not directly involved in model scoring.