

Soybean(mlbench)                             R Documentation

_S_o_y_b_e_a_n _D_a_t_a_b_a_s_e

_D_e_s_c_r_i_p_t_i_o_n_:

     There are 19 classes, only the first 15 of which have
     been used in prior work.  The folklore seems to be that
     the last four classes are unjustified by the data since
     they have so few examples.  There are 35 categorical
     attributes, some nominal and some ordered.  The value
     "dna" means "does not apply".  Attribute values are
     encoded numerically, with the first value encoded as
     "0", the second as "1", and so forth.

_U_s_a_g_e_:

     data(Soybean)
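
     A minimal sketch of loading and inspecting the data,
     assuming the mlbench package is installed.  The
     requireNamespace() guard and the particular summaries
     shown are illustrative additions, not part of the
     original usage.

```r
# Assumption: the 'mlbench' package is installed; do nothing otherwise.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(Soybean, package = "mlbench")

  print(dim(Soybean))             # observations x variables
  print(table(Soybean$Class))     # number of examples per class
  print(colSums(is.na(Soybean)))  # missing values per variable
}
```

     The per-class counts make it easy to see which classes
     have very few examples.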

_F_o_r_m_a_t_:

     A data frame with 683 observations on 36 variables: one
     nominal variable denoting the class and 35 categorical
     attributes whose values are coded numerically.

      [,1]      Class               the 19 classes
      [,2]      date                apr(0),may(1),june(2),july(3),aug(4),sept(5),oct(6).
      [,3]      plant.stand         normal(0),lt-normal(1).
      [,4]      precip              lt-norm(0),norm(1),gt-norm(2).
      [,5]      temp                lt-norm(0),norm(1),gt-norm(2).
      [,6]      hail                yes(0),no(1).
      [,7]      crop.hist           dif-lst-yr(0),s-l-y(1),s-l-2-y(2),s-l-7-y(3).
      [,8]      area.dam            scatter(0),low-area(1),upper-ar(2),whole-field(3).
      [,9]      sever               minor(0),pot-severe(1),severe(2).
      [,10]     seed.tmt            none(0),fungicide(1),other(2).
      [,11]     germ                90-100%(0),80-89%(1),lt-80%(2).
      [,12]     plant.growth        norm(0),abnorm(1).
      [,13]     leaves              norm(0),abnorm(1).
      [,14]     leaf.halo           absent(0),yellow-halos(1),no-yellow-halos(2).
      [,15]     leaf.marg           w-s-marg(0),no-w-s-marg(1),dna(2).
      [,16]     leaf.size           lt-1/8(0),gt-1/8(1),dna(2).
      [,17]     leaf.shread         absent(0),present(1).
      [,18]     leaf.malf           absent(0),present(1).
      [,19]     leaf.mild           absent(0),upper-surf(1),lower-surf(2).
      [,20]     stem                norm(0),abnorm(1).
      [,21]     lodging             yes(0),no(1).
      [,22]     stem.cankers        absent(0),below-soil(1),above-s(2),ab-sec-nde(3).
      [,23]     canker.lesion       dna(0),brown(1),dk-brown-blk(2),tan(3).
      [,24]     fruiting.bodies     absent(0),present(1).
      [,25]     ext.decay           absent(0),firm-and-dry(1),watery(2).
      [,26]     mycelium            absent(0),present(1).
      [,27]     int.discolor        none(0),brown(1),black(2).
      [,28]     sclerotia           absent(0),present(1).
      [,29]     fruit.pods          norm(0),diseased(1),few-present(2),dna(3).
      [,30]     fruit.spots         absent(0),col(1),br-w/blk-speck(2),distort(3),dna(4).
      [,31]     seed                norm(0),abnorm(1).
      [,32]     mold.growth         absent(0),present(1).
      [,33]     seed.discolor       absent(0),present(1).
      [,34]     seed.size           norm(0),lt-norm(1).
      [,35]     shriveling          absent(0),present(1).
      [,36]     roots               norm(0),rotted(1),galls-cysts(2).
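
     The zero-based codes in the table above can be mapped
     back to their labels.  The following is a sketch only:
     decode_codes() is a hypothetical helper, not part of
     mlbench, and the label vector is copied by hand from the
     'precip' row of the table.

```r
# Hypothetical helper (not part of mlbench): map zero-based codes to labels.
decode_codes <- function(codes, labels) {
  # Go through character first so factor input is read by its level
  # names ("0", "1", ...) rather than by its internal integer codes.
  codes <- as.integer(as.character(codes))
  # Codes start at 0, so code k corresponds to labels[k + 1].
  factor(labels[codes + 1L], levels = labels)
}

# Labels for 'precip', in the order listed in the table above.
precip_labels <- c("lt-norm", "norm", "gt-norm")

# NA (a missing value) stays NA after decoding.
decoded <- decode_codes(c(0, 2, 1, NA), precip_labels)
print(decoded)
```

     The same helper could be applied to a column such as
     Soybean$precip, assuming that column holds the codes
     "0", "1", "2" as described in the table.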

_S_o_u_r_c_e_:

        * Source: R.S. Michalski and R.L. Chilausky, "Learning
          by Being Told and Learning from Examples: An
          Experimental Comparison of the Two Methods of
          Knowledge Acquisition in the Context of Developing
          an Expert System for Soybean Disease Diagnosis",
          International Journal of Policy Analysis and
          Information Systems, Vol. 4, No. 2, 1980.

        * Donor: Ming Tan & Jeff Schlimmer
          (Jeff.Schlimmer%cs.cmu.edu)

     These data have been taken from the UCI Repository Of
     Machine Learning Databases at

        * ftp.ics.uci.edu://pub/machine-learning-databases

        * http://www.ics.uci.edu/mlearn/MLRepository.html

     and were converted to R format by
     Evgenia.Dimitriadou@ci.tuwien.ac.at.

_R_e_f_e_r_e_n_c_e_s_:

     Tan, M., & Eshelman, L. (1988). Using weighted networks
     to represent classification knowledge in noisy domains.
     Proceedings of the Fifth International Conference on
     Machine Learning (pp. 121-134). Ann Arbor, Michigan:
     Morgan Kaufmann.

        * IWN recorded a 97.1% classification accuracy.

        * 290 training and 340 test instances.

     Fisher, D.H., & Schlimmer, J.C. (1988). Concept
     Simplification and Predictive Accuracy. Proceedings of
     the Fifth International Conference on Machine Learning
     (pp. 22-28). Ann Arbor, Michigan: Morgan Kaufmann.

        * Notes why this database is highly predictable.

