Peer-Reviewed

Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants

Received: 5 May 2016     Accepted: 18 May 2016     Published: 7 June 2016
Abstract

The potential risk posed by a credit applicant is the probability of default on repayment of a credit facility extended by a commercial bank. Credit scoring models are developed to improve the efficiency of decision making on credit risk. The objectives of this research are to classify credit applicants using cluster analysis, Artificial Neural Network (ANN) and K-Nearest Neighbour (K-NN) techniques, and to compare their predictive accuracy. The dataset was first split so that 70% of the data was used for training and the remaining 30% was used for testing. Finally, the ability of the developed models to forecast trends was investigated. A cluster is assumed to be homogeneous if it contains members that have a high degree of similarity. The analysis is based on credit data provided by commercial banks in Kenya, which was used to test the effectiveness of the cluster analysis, K-NN and ANN models. A confusion matrix was used to determine which model had the best classification accuracy, and the chi-square test was used to test goodness of fit. From the results of the study, the researchers concluded that ANN was better at predicting the classification of credit applicants than K-NN and cluster analysis.
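The evaluation procedure described in the abstract (a 70/30 train/test split, a K-NN classifier, and a confusion matrix summarised as an overall accuracy rate) can be illustrated with a minimal Python sketch. This is not the authors' implementation or data: the synthetic two-feature applicants, the labels "good" and "bad", the choice of k = 3, the Euclidean distance, and all function names are assumptions made for illustration only. The ANN, cluster analysis and chi-square steps are not covered.

```python
import math
import random
from collections import Counter


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_matrix(actual, predicted, labels):
    """Return counts[a][p]: outer keys are actual labels, inner keys predicted."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


def overall_accuracy(matrix):
    """Share of cases on the diagonal of the confusion matrix."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic applicants (assumption): two numeric features per applicant."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_per_class):
        rows.append(((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), "good"))
        rows.append(((rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)), "bad"))
    return rows


data = make_toy_data()
train, test = train_test_split(data)  # 70% training, 30% testing
actual = [label for _, label in test]
predicted = [knn_predict(train, features) for features, _ in test]
matrix = confusion_matrix(actual, predicted, ["good", "bad"])
accuracy = overall_accuracy(matrix)
print(matrix)
print(f"overall accuracy rate: {accuracy:.2f}")
```

Running the same split and confusion-matrix step for each candidate model would give overall accuracy rates that can be compared directly, which is the kind of comparison the abstract reports.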

Published in American Journal of Theoretical and Applied Statistics (Volume 5, Issue 4)
DOI 10.11648/j.ajtas.20160504.14
Page(s) 186-191
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Cluster Analysis, ANN: Artificial Neural Network, K-NN: K-Nearest Neighbour, Credit Risk, Overall Accuracy Rate, SSE: Sum of Square Errors

Cite This Article
  • APA Style

    Mutua Jennifer Ndanu, Gichuhi Anthony Waititu, Wanjoya Anthony Kiberia, Muia Patricia Nthoki. (2016). Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants. American Journal of Theoretical and Applied Statistics, 5(4), 186-191. https://doi.org/10.11648/j.ajtas.20160504.14


    ACS Style

    Mutua Jennifer Ndanu; Gichuhi Anthony Waititu; Wanjoya Anthony Kiberia; Muia Patricia Nthoki. Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants. Am. J. Theor. Appl. Stat. 2016, 5(4), 186-191. doi: 10.11648/j.ajtas.20160504.14


    AMA Style

    Mutua Jennifer Ndanu, Gichuhi Anthony Waititu, Wanjoya Anthony Kiberia, Muia Patricia Nthoki. Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants. Am J Theor Appl Stat. 2016;5(4):186-191. doi: 10.11648/j.ajtas.20160504.14


  • @article{10.11648/j.ajtas.20160504.14,
      author = {Mutua Jennifer Ndanu and Gichuhi Anthony Waititu and Wanjoya Anthony Kiberia and Muia Patricia Nthoki},
      title = {Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {5},
      number = {4},
      pages = {186-191},
      doi = {10.11648/j.ajtas.20160504.14},
      url = {https://doi.org/10.11648/j.ajtas.20160504.14},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20160504.14},
     year = {2016}
    }
    


  • TY  - JOUR
    T1  - Cluster Analysis, K-Nearest Neighbour and Artificial Neural Network Applied to Credit Data to Classify Credit Applicants
    AU  - Mutua Jennifer Ndanu
    AU  - Gichuhi Anthony Waititu
    AU  - Wanjoya Anthony Kiberia
    AU  - Muia Patricia Nthoki
    Y1  - 2016/06/07
    PY  - 2016
    N1  - https://doi.org/10.11648/j.ajtas.20160504.14
    DO  - 10.11648/j.ajtas.20160504.14
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 186
    EP  - 191
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20160504.14
    VL  - 5
    IS  - 4
    ER  - 


Author Information
  • Applied Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

  • Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

  • Statistics, Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

  • Education, Department of Educational Administration and Planning, University of Nairobi, Nairobi, Kenya
