The algorithm provides its diagnosis with the following performance, assessed on 210 test samples:
This application is intended for pathologists who grade the following 9 features of a “Fine Needle Aspiration” (FNA) sample according to the Wisconsin dataset's grading scale (1) in order to determine whether a breast mass is benign or malignant:
After rigorous tuning and training cycles of 8 models on 489 training samples randomly selected from the available dataset, the Random Forest algorithm turned out to be the best model. It correctly classifies 98.1% of the 210 test cases and identifies 100% of the malignant tumors.
In other words, the algorithm has a classification error of 1.9% (+/-1.85), which is the ratio of incorrect predictions to the total number of predictions made. Moreover, the algorithm does not miss any cancerous tumor in the test set, so its sensitivity to malignancy (recall score) is 100%.
These performance results were obtained on 210 test samples that the algorithm never saw during its training step (143 benign tumors / 67 malignant tumors).
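The figures quoted above can be reproduced from the test-set counts alone. The following is a minimal sketch, not the application's own code. It assumes that 98.1% of 210 corresponds to 206 correct predictions, that the 4 errors are all benign masses flagged as malignant (which follows from the 100% recall), and that the +/- value is a 95% normal-approximation confidence interval (the text does not state how it was computed). All variable names are illustrative.

```python
import math

# Test-set composition stated in the text.
n_benign = 143
n_malignant = 67
n_test = n_benign + n_malignant  # 210

# Assumed confusion counts: 100% recall means no malignant case is missed,
# so the 4 misclassified cases (210 - 206) must all be false positives.
true_positives = 67
false_negatives = 0
false_positives = 4
true_negatives = n_benign - false_positives  # 139

accuracy = (true_positives + true_negatives) / n_test
error = 1 - accuracy
recall = true_positives / (true_positives + false_negatives)

# Assumption: 95% normal-approximation interval, z = 1.96.
half_width = 1.96 * math.sqrt(error * (1 - error) / n_test)

print(f"accuracy: {accuracy:.1%}")
print(f"error: {error:.1%} (+/- {half_width * 100:.2f})")
print(f"recall: {recall:.0%}")
```

Under these assumptions the computed half-width matches the 1.85 quoted above, which suggests, but does not prove, that this is how the interval was obtained.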
The test data is a randomly selected sample that was removed from the available data so that it is not used during model selection or configuration.
Lastly, please note that tuning and training were performed with a stratified cross-validation method (StratifiedKFold).
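To illustrate what stratified cross-validation does, here is a minimal pure-Python sketch of stratified fold assignment. It is not scikit-learn's StratifiedKFold implementation and not the application's training code. The training label counts (315 benign, 174 malignant) are an assumption obtained by subtracting the test counts from the UCI-documented class totals (458 and 241). The number of folds, the seed, and the function name `stratified_k_folds` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_k_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold keeps the class mix."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Shuffle within each class, then deal indices round-robin to the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Assumed training labels: 489 samples in total.
labels = ["benign"] * 315 + ["malignant"] * 174
folds = stratified_k_folds(labels, k=5)

for i, validation_idx in enumerate(folds):
    held_out = set(validation_idx)
    # Every sample outside the current fold is used for training.
    train_idx = [j for j in range(len(labels)) if j not in held_out]
    n_malignant = sum(labels[j] == "malignant" for j in validation_idx)
    print(f"fold {i}: train={len(train_idx)} "
          f"validation={len(validation_idx)} malignant={n_malignant}")
```

Each fold serves once as the validation set while the remaining folds are used for fitting, and every fold holds roughly the same share of malignant cases as the full training set.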
1 Wolberg, W.H., & Mangasarian, O.L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 87, 9193-9196. http://www.pnas.org/content/87/23/9193.full.pdf
Dr. William H. Wolberg (physician), University of Wisconsin Hospitals, Madison, Wisconsin, USA (1992-07-15). Breast Cancer Wisconsin (Original) Data Set.
In the early 1990s, Professors William H. Wolberg and Olvi L. Mangasarian at the University of Wisconsin published a dataset of nearly 700 breast mass samples.
These masses had been biopsied via fine needle aspirates.
Nine cytological characteristics of breast FNAs were valued on a scale of 1 to 10, with 1 being the closest to benign and 10 the most anaplastic.
This data was then published to the University of California Irvine’s Machine Learning Repository as public domain.
I am grateful for access to this data, as it provided my algorithm with training and testing data.
The data was also very appropriate for the classification task.
I would also like to acknowledge Brittany Wenger for her contribution in this domain.
In 2012, based on the Wisconsin dataset, Brittany Wenger provided a service built on a neural network algorithm.
I thank Axel Tessier for his great contribution to the web user interface of my application and also for making a secure server available for this application.
Finally, I would like to thank my family for their continuous support throughout this project.
The contents of the rai-light.com Site, such as text, graphics, images, figures and other material contained on the rai-light.com Site, are for informational purposes only. The content is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition. Never disregard professional medical advice or delay in seeking it because of something you have read on the rai-light.com Site. Reliance on any information provided by rai-light.com is solely at your own risk.