

LetterRecognition(mlbench)                   R Documentation

_L_e_t_t_e_r _I_m_a_g_e _R_e_c_o_g_n_i_t_i_o_n _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n_:

     The objective is to identify each of a large number of
     black-and-white rectangular pixel displays as one of
     the 26 capital letters in the English alphabet.  The
     character images were based on 20 different fonts and
     each letter within these 20 fonts was randomly
     distorted to produce a file of 20,000 unique stimuli.
     Each stimulus was converted into 16 primitive numerical
     attributes (statistical moments and edge counts) which
     were then scaled to fit into a range of integer values
     from 0 through 15.  We typically train on the first
     16,000 items and then use the resulting model to predict
     the letter category for the remaining 4,000.  See the
     article cited below for more details.
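
     The row-order split described above can be sketched as
     follows.  The helper name split_first_n is hypothetical
     (it is not part of mlbench), and the usage part assumes
     the mlbench package is installed.

```r
# Hypothetical helper (not part of mlbench): split a data frame
# by row order into the first n_train rows and the remainder.
split_first_n <- function(df, n_train) {
  stopifnot(n_train >= 1, n_train < nrow(df))
  idx <- seq_len(n_train)
  list(train = df[idx, , drop = FALSE],
       test  = df[-idx, , drop = FALSE])
}

# Usage, assuming mlbench is installed: first 16,000 rows for
# training, remaining 4,000 rows for testing.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(LetterRecognition, package = "mlbench")
  parts <- split_first_n(LetterRecognition, 16000)
  print(dim(parts$train))
  print(dim(parts$test))
}
```

     Splitting by row order rather than at random keeps the
     result comparable with the convention stated in this
     description.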

_U_s_a_g_e_:

     data(LetterRecognition)

_F_o_r_m_a_t_:

     A data frame with 20,000 observations on 17 variables.
     The first is a factor with levels A-Z; the remaining 16
     are numeric.

       [,1]   lettr  capital letter
       [,2]   x.box  horizontal position of box
       [,3]   y.box  vertical position of box
       [,4]   width  width of box
       [,5]   high   height of box
       [,6]   onpix  total number of on pixels
       [,7]   x.bar  mean x of on pixels in box
       [,8]   y.bar  mean y of on pixels in box
       [,9]   x2bar  mean x variance
       [,10]  y2bar  mean y variance
       [,11]  xybar  mean x y correlation
       [,12]  x2ybr  mean of x^2 y
       [,13]  xy2br  mean of x y^2
       [,14]  x.ege  mean edge count left to right
       [,15]  xegvy  correlation of x.ege with y
       [,16]  y.ege  mean edge count bottom to top
       [,17]  yegvx  correlation of y.ege with x
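
     The layout described above (a factor first, then numeric
     attributes scaled to 0 through 15) can be checked with a
     short sketch.  The function name check_letter_format is
     hypothetical, and the usage part assumes the mlbench
     package is installed.

```r
# Hypothetical helper (not part of mlbench): check that a data
# frame has a factor in column 1 and numeric columns elsewhere,
# with every numeric value inside the range 0..15.
check_letter_format <- function(df) {
  feats <- df[, -1, drop = FALSE]
  c(
    first_is_factor = is.factor(df[[1]]),
    all_numeric     = all(vapply(feats, is.numeric, logical(1))),
    in_0_15         = all(vapply(feats,
                                 function(v) all(v >= 0 & v <= 15),
                                 logical(1)))
  )
}

# Usage, assuming mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(LetterRecognition, package = "mlbench")
  print(check_letter_format(LetterRecognition))
  print(nlevels(LetterRecognition$lettr))  # number of letter classes
  print(dim(LetterRecognition))            # observations x variables
}
```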

_S_o_u_r_c_e_:

        * Creator: David J. Slate

        * Odesta Corporation; 1890 Maple Ave; Suite 115;
          Evanston, IL 60201

        * Donor: David J. Slate (dave@math.nwu.edu) (708)
          491-3867

     These data have been taken from the UCI Repository Of
     Machine Learning Databases at

        * ftp.ics.uci.edu://pub/machine-learning-databases

        * http://www.ics.uci.edu/mlearn/MLRepository.html

     and were converted to R format by
     Friedrich.Leisch@ci.tuwien.ac.at.

_R_e_f_e_r_e_n_c_e_s_:

     P. W. Frey and D. J. Slate (1991).  "Letter Recognition
     Using Holland-style Adaptive Classifiers".  Machine
     Learning, 6(2), March 1991.

     The research for this article investigated the ability
     of several variations of Holland-style adaptive
     classifier systems to learn to correctly guess the letter
     categories associated with vectors of 16 integer
     attributes extracted from raster scan images of the
     letters.  The best accuracy obtained was a little over
     80%.  It would be interesting to see how well other
     methods do with the same data.
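
     As one way to try another method on the same data, the
     sketch below fits a nearest-centroid classifier in base
     R and measures hold-out accuracy.  This is not the
     method used in the article; the function names
     nc_fit, nc_predict and accuracy are hypothetical, and
     the usage part assumes the mlbench package is installed
     and uses the 16,000/4,000 row-order split.

```r
# Nearest-centroid baseline (a swapped-in illustration, not the
# Holland-style classifier from the cited article).

# One row of feature means per class; rows are named by class.
nc_fit <- function(x, y) {
  t(sapply(split(as.data.frame(x), y, drop = TRUE), colMeans))
}

# Assign each row of x to the class with the closest centroid
# (squared Euclidean distance; ties go to the first class).
nc_predict <- function(centroids, x) {
  x <- as.matrix(x)
  d <- sapply(seq_len(nrow(centroids)), function(k) {
    rowSums(sweep(x, 2, centroids[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(x))
  rownames(centroids)[max.col(-d, ties.method = "first")]
}

# Fraction of predictions that match the true labels.
accuracy <- function(truth, pred) {
  mean(as.character(truth) == as.character(pred))
}

# Usage, assuming mlbench is installed.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(LetterRecognition, package = "mlbench")
  train <- LetterRecognition[1:16000, ]
  test  <- LetterRecognition[16001:20000, ]
  cen   <- nc_fit(train[, -1], train$lettr)
  pred  <- nc_predict(cen, test[, -1])
  print(accuracy(test$lettr, pred))
}
```

     A baseline this simple mainly gives a reference point
     against which stronger classifiers can be compared.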

