

BreastCancer(mlbench)                        R Documentation

_W_i_s_c_o_n_s_i_n _B_r_e_a_s_t _C_a_n_c_e_r _D_a_t_a_b_a_s_e

_D_e_s_c_r_i_p_t_i_o_n_:

     The objective is to classify each sample as either benign
     or malignant. Samples arrived periodically as Dr. Wolberg
     reported his clinical cases, so the database reflects this
     chronological grouping of the data. This grouping
     information was removed from the data itself.  Each
     variable except the first was converted into 11
     primitive numerical attributes with values ranging from
     0 through 10.  There are 16 missing attribute values.
     See the references cited below for more details.
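     The 16 missing values are stored as NA. A minimal sketch,
     assuming the mlbench package is installed, that locates
     them and shows one common (not the only) way to handle
     them, by dropping incomplete rows. The object names
     na_per_column and bc_complete are illustrative only.

```r
# Assumes mlbench is installed: install.packages("mlbench")
library(mlbench)
data(BreastCancer)

# Count NA values per column and show only the columns that have any.
na_per_column <- colSums(is.na(BreastCancer))
print(na_per_column[na_per_column > 0])

# Keep only rows with no missing values (listwise deletion).
bc_complete <- BreastCancer[complete.cases(BreastCancer), ]
print(nrow(bc_complete))
```

     Listwise deletion is the simplest choice; imputation is an
     alternative when discarding rows is undesirable.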

_U_s_a_g_e_:

     data(BreastCancer)
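     A minimal sketch of loading and inspecting the data,
     assuming the mlbench package is installed; the checks
     mirror the Format section (699 rows, 11 columns, a
     character Id and a two-level Class factor).

```r
# Assumes mlbench is installed: install.packages("mlbench")
library(mlbench)
data(BreastCancer)

print(dim(BreastCancer))          # number of rows and columns
str(BreastCancer)                 # column types: Id, nine predictors, Class
print(table(BreastCancer$Class))  # number of benign and malignant samples
```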

_F_o_r_m_a_t_:

     A data frame with 699 observations on 11 variables: one
     character variable, 9 ordered or nominal variables, and
     1 target class.

      [,1]      Id                  Sample code number
      [,2]      Cl.thickness        Clump Thickness
      [,3]      Cell.size           Uniformity of Cell Size
      [,4]      Cell.shape          Uniformity of Cell Shape
      [,5]      Marg.adhesion       Marginal Adhesion
      [,6]      Epith.c.size        Single Epithelial Cell Size
      [,7]      Bare.nuclei         Bare Nuclei
      [,8]      Bl.cromatin         Bland Chromatin
      [,9]      Normal.nucleoli     Normal Nucleoli
      [,10]     Mitoses             Mitoses
      [,11]     Class               Class
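     The nine predictors in columns 2 to 10 are stored as
     factors, while many modelling functions expect numeric
     input. A minimal sketch, assuming the mlbench package is
     installed, that converts them and leaves Class as the
     target factor. Going through as.character() matters
     because a factor's internal integer codes follow its
     level order, which need not match the printed labels.
     The names bc and predictors are illustrative only.

```r
# Assumes mlbench is installed: install.packages("mlbench")
library(mlbench)
data(BreastCancer)

bc <- BreastCancer
predictors <- 2:10  # Cl.thickness through Mitoses, per the table above

# Convert each factor via its labels ("1", "2", ...), not its internal codes.
bc[predictors] <- lapply(bc[predictors],
                         function(x) as.numeric(as.character(x)))

# Observed minimum and maximum of each converted predictor (NAs ignored).
print(sapply(bc[predictors], range, na.rm = TRUE))
```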

_S_o_u_r_c_e_:

        * Creator: Dr. William H. Wolberg (physician),
          University of Wisconsin Hospital, Madison, Wisconsin,
          USA

        * Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)

        * Received: David W. Aha (aha@cs.jhu.edu)

     These data have been taken from the UCI Repository Of
     Machine Learning Databases at

        * ftp://ftp.ics.uci.edu/pub/machine-learning-databases

        * http://www.ics.uci.edu/mlearn/MLRepository.html

     and were converted to R format by
     Evgenia.Dimitriadou@ci.tuwien.ac.at.

_R_e_f_e_r_e_n_c_e_s_:

     1. Wolberg, W.H., Mangasarian, O.L. (1990). Multisurface
     method of pattern separation for medical diagnosis
     applied to breast cytology. Proceedings of the
     National Academy of Sciences, 87, 9193-9196.
     - Size of data set: only 369 instances (at that point
       in time)
     - Collected classification results: 1 trial only
     - Two pairs of parallel hyperplanes were found to be
       consistent with 50% of the data
       - Accuracy on the remaining 50% of the data: 93.5%
     - Three pairs of parallel hyperplanes were found to be
       consistent with 67% of the data
       - Accuracy on the remaining 33% of the data: 95.9%

     2. Zhang, J. (1992). Selecting typical instances in
     instance-based learning. In Proceedings of the Ninth
     International Machine Learning Conference (pp.
     470-479). Aberdeen, Scotland: Morgan Kaufmann.
     - Size of data set: only 369 instances (at that point
       in time)
     - Applied 4 instance-based learning algorithms
     - Collected classification results averaged over 10
       trials
     - Best accuracy result:
       - 1-nearest neighbor: 93.7%
       - trained on 200 instances, tested on the other 169
     - Also of interest:
       - Using only typical instances: 92.2% (storing only
         23.1 instances)
       - trained on 200 instances, tested on the other 169

