Machine Learning is an essential task for any recognition problem, that
permits to train adaptable systems from examples. The structuration
of this knowledge as symbolic rules is the domain of grammatical
inference, where the group has been specially active:
Algorithms for building finite state automata, stochastic automata,
tree grammars, etc. from strings and trees and methods that learn to translate
between formal languages.
String edit distance
The edit distance is used to compute the similarity of a pair of
strings, as the number of operations for transforming one string to the
other. If this transformation is based on a random phenomenon and then on
an underlying probability
distribution, edit operations become random variables. Therefore, will be
the stochastic edit distance: a stochastic transduction
between the input and output alphabets.
This approach is useful to handle noise in sequences such as in grammar inference, where
is required either to remove or correct noisy data to avoid overfitting phenomena.
About above, are required to estimate the parameters of the stochastic transducer
with some approach.
Tree edit distance
The stochastic edit distance can be extended to tree comparison.
It is possible to model an edit distance as a
stochastic process and to use probabilistic methods to learn these costs.
To learn a generative model in the form of a joint distribution has an
advantage, because it provides an estimate of the
unknown joint density with a small variance. However, it also have a
drawback due to the estimate is biased because it depends on the
distribution of the node labels in the input trees. The second
approach is to learn a discriminative model in the form of a
conditional distribution, since it provides an unbiased estimate.
Nowadays, we are working into development a new forest-based framework in
order to allow that the above approaches can be extended.