Comparative Study on performance of Fuzzy clustering algorithms on Liver and Thyroid Data

Conventional classification methods struggle to deliver an accurate diagnosis without ambiguity, given the fast growth in technology. Because states in medicine are vague rather than crisp, fuzzy methods are helpful. As fuzzy tools give accurate results on various data sets, this paper concentrates on fuzzy-based clustering. We present a comparative study of two popular clustering algorithms, FPCM and PFCM, on the Thyroid data set and the Liver disorder data set from the UCI repository. The clustering results are compared with the class labels given in the repository. Based on the clustering output, the performance of the two algorithms is analyzed in terms of percentage of correctness and classification performance. The objective of this paper is to analyze the performance of FPCM and PFCM on thyroid and liver data, and to show that PFCM performs better than FPCM on both sets of samples in terms of percentage of correctness and classification performance.


Introduction
Clustering is an important technique in soft computing that groups the most similar objects in a data set into clusters. A cluster of objects can be treated collectively as one group, so clustering may be considered a form of data classification. Clustering of data streams has attracted much research as applications that generate data streams have become more popular. Clustering is also often called unsupervised classification. It is an important tool in data analysis, image processing, data mining, pattern recognition, medical diagnosis, etc. [1]. In this paper, we used the Liver disorder data set donated by Richard [3] and the Thyroid data set donated by Danny Coomans (Coomans et al., 1983) [4], both from the UCI Machine Learning Repository. Since fuzzy tools in several cases provide more accurate results than conventional clustering algorithms, we focus on fuzzy-based clustering.
http://www.ispacs.com/journals/jfsva/2018/jfsva-00395/ International Scientific Publications and Consulting Services
The thyroid gland is one of the largest endocrine glands, weighing 15-20 g in adults. It is butterfly-shaped and located in the lower front of the neck [2]. It consists of a large number of closed follicles that are filled with a secretory substance called colloid and lined by cuboidal epithelium. The thyroid secretes two major hormones, thyroxine and triiodothyronine, commonly called T4 and T3 respectively. Thyroid secretion is controlled primarily by thyroid stimulating hormone (TSH), secreted by the pituitary gland. The thyroid gland also secretes calcitonin, a hormone involved in calcium metabolism. The thyroid hormones regulate the body's metabolism and calcium balance; they help to maintain the working condition of the brain, heart, muscles and other organs, and help the body to stay warm and use energy.
The liver is the largest internal organ and the largest gland in the body. It weighs about 1.5 kg in the average adult, or about 2% of total body weight. It consists of lobes, which are subdivided into lobules; the liver lobule is the basic functional unit of the liver. The human liver contains 50,000 to 100,000 individual lobules. Each lobule is made up of columns (cellular plates) of hepatic cells that radiate from the central vein like spokes in a wheel. Between the lobules lie the portal triads, each consisting of a bile duct, a hepatic artery and a portal vein. The liver has considerable regenerative power: after removal of three quarters of the liver, the original liver mass is restored within 6-8 weeks by proliferation of the remaining tissue through active mitotic division of the cells.
A symptom is a physical or laboratory finding that indicates the presence of a disease and hence can be considered an aid in diagnosis. In diagnosis, one of the main tasks is to group the symptoms into a syndrome. In this regard, cluster analysis is well known as an effective and efficient tool in medicine. With the rapid growth in technology, it is quite difficult for conventional classification methods to reach an accurate diagnosis without ambiguity. Since conditions in medicine are vague rather than crisp, fuzzy methods are helpful.
Fuzzy cluster analysis is an iterative method in which a membership function assigns each object a membership value between 0 and 1 in each cluster. Membership thus becomes a matter of degree, and the same object can belong to more than one class or cluster simultaneously, with different degrees. These algorithms search for the cluster prototypes by optimizing an objective function based on the distance between each prototype and each object.
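As a small illustration of graded membership, the following sketch computes the memberships of a few one-dimensional objects in two fixed prototypes, using the standard fuzzy c-means membership formula. The data values, the prototypes and the weighting parameter m = 2 are assumptions chosen for illustration only; they are not taken from this study.

```python
import numpy as np

def fcm_memberships(points, prototypes, m=2.0):
    """Standard FCM membership: u[i, k] = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))."""
    # Squared Euclidean distances, shape (c, N); the small constant avoids division by zero.
    d2 = ((prototypes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2) + 1e-12
    inv = d2 ** (-1.0 / (m - 1.0))
    # Normalize over clusters so the memberships of each object sum to 1.
    return inv / inv.sum(axis=0, keepdims=True)

points = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # hypothetical objects
prototypes = np.array([[1.0], [3.0]])                   # hypothetical cluster centers
U = fcm_memberships(points, prototypes)
print(np.round(U, 3))
```

The object at 2.0 lies midway between the two prototypes and therefore receives a membership of 0.5 in each cluster, while the object at 0.0 belongs mostly, but not exclusively, to the first cluster.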

The Dataset
In data analysis, clustering is the discipline devoted to finding and describing groups of similar objects. The efficiency and robustness of a clustering algorithm can be investigated through its clustering output, and its performance can be improved by defining a suitable objective function. The algorithms FPCM and PFCM were developed to improve the performance of FCM by combining memberships with typicalities. This section gives brief details of the Liver disorder and Thyroid data sets and of the FPCM and PFCM algorithms.
To evaluate the Fuzzy Possibilistic c-Mean (FPCM) and Possibilistic Fuzzy c-Mean (PFCM) algorithms, two real-world data sets from the UCI Machine Learning Repository [6] were considered: the Liver disorder data set donated by Richard [3] and the Thyroid data set donated by Danny Coomans (Coomans et al., 1983) [4].
The Thyroid data set contains 215 samples with 5 attributes (lab measurements) each. The samples are classified into three classes according to thyroid function: Normal (150 samples), Hyperthyroid (35 samples) and Hypothyroid (30 samples). The 5 attributes are lab tests that measure thyroid function: the T3-resin uptake test (a percentage); total serum thyroxine, measured by the isotopic displacement method; total serum triiodothyronine, measured by radioimmunoassay; basal thyroid stimulating hormone (TSH), measured by radioimmunoassay; and the maximal absolute difference of the TSH value after injection of 200 micrograms of thyrotropin-releasing hormone, as compared to the basal value.
The Liver disorder data set contains 341 samples with 6 attributes each. The attributes are blood-test measurements that are sensitive to liver disorders which might arise from excessive alcohol consumption, together with a measure of alcohol intake: mcv (mean corpuscular volume), alkphos (alkaline phosphatase), sgpt (alanine aminotransferase), sgot (aspartate aminotransferase), gammagt (gamma-glutamyl transpeptidase) and drinks (the number of half-pint equivalents of alcoholic beverages drunk per day).

Fuzzy Possibilistic c-mean algorithm
Traditional clustering approaches produce a partition in which each object belongs to exactly one cluster at any one time. Fuzzy clustering extends this notion so that each object can belong to more than one cluster at a time, with different membership values given by a membership function. These membership values range from 0 to 1. FPCM was developed on the basis of fuzzy theory by Pal and Bezdek [7]. The concepts of typicality and membership were combined in the FPCM model to overcome the drawbacks of the FCM model proposed by Bezdek et al. [5]. The partition of the data set Z into c clusters is represented by the fuzzy partition matrix U. The fuzzy partitioning space for Z is the set of all such matrices whose entries lie in [0, 1] and whose columns each sum to 1. Here V = [v_1, v_2, ..., v_c], where v_i ∈ ℜ^n, denotes a vector of (unknown) cluster prototypes (centers), and the degree of fuzziness is determined by a weighting parameter m.
The algorithm is given by the following basic steps.
Step 1: Initialization: Randomly initialize the partition matrix U, and choose the number of clusters c, the weighting parameters m and η, and the termination tolerance ε > 0.
Step 2: Centroid calculation: Determine the fuzzy cluster prototypes by using equation (2.3).
Step 3: Classification: Update the membership matrix by using equation (2.4) and the typicality matrix by using equation (2.5).
Step 4: Convergence criteria: Compare the membership matrices before and after the iteration. If the difference is less than the termination tolerance, then stop; otherwise repeat from Step 2.
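The four steps above can be sketched in Python as follows. This is a minimal sketch, not the implementation used in this study: it assumes the FPCM update rules as they are commonly stated in the literature (prototypes weighted by u^m + t^η, memberships normalized over clusters, typicalities normalized over the samples of each cluster), and the function name `fpcm`, the default parameter values and the synthetic two-blob data are hypothetical.

```python
import numpy as np

def fpcm(Z, c, m=2.0, eta=2.0, tol=1e-5, max_iter=200, seed=0):
    """Sketch of FPCM. Z has shape (N, n); returns prototypes V (c, n),
    memberships U (c, N) and typicalities T (c, N)."""
    rng = np.random.default_rng(seed)
    N = Z.shape[0]
    # Step 1: random memberships (columns sum to 1) and typicalities (rows sum to 1).
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)
    T = rng.random((c, N))
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: prototypes are weighted means with weights u^m + t^eta (assumed form).
        W = U ** m + T ** eta
        V = (W @ Z) / W.sum(axis=1, keepdims=True)
        # Squared distances between every prototype and every sample, shape (c, N).
        d2 = ((V[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Step 3: memberships are normalized over clusters, typicalities over samples.
        inv_u = d2 ** (-1.0 / (m - 1.0))
        U_new = inv_u / inv_u.sum(axis=0, keepdims=True)
        inv_t = d2 ** (-1.0 / (eta - 1.0))
        T = inv_t / inv_t.sum(axis=1, keepdims=True)
        # Step 4: stop when the membership matrix changes by less than tol.
        change = np.abs(U_new - U).max()
        U = U_new
        if change < tol:
            break
    return V, U, T

# Hypothetical usage on two well-separated synthetic blobs (not the study data).
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(5.0, 0.5, (30, 2))])
V, U, T = fpcm(Z, c=2)
labels = U.argmax(axis=0)  # harden the fuzzy partition by maximum membership
```

Assigning each sample to the cluster in which it has the largest membership is one common way to obtain crisp clusters that can then be compared with known class labels.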

Possibilistic Fuzzy c-Mean Clustering
In order to achieve good clustering results, memberships and typicalities are both important. Pal et al. [7] proposed the Possibilistic Fuzzy c-Mean (PFCM) model. This model relaxes the FPCM constraint that the typicalities of all data points in a cluster sum to 1, while retaining the constraint on memberships.
The basic steps of the PFCM algorithm are described as follows.
Step 1: Initialization: Randomly initialize the partition matrix U and the typicality matrix T, and choose the number of clusters c, the parameters m, η, a and b, and the termination tolerance ε > 0.
Step 2: Centroid calculation: Calculate the fuzzy cluster prototypes by using equation (2.9).
Step 3: Classification: Update the membership matrix by using equation (2.5) and the typicality matrix by using equation (2.8).
Step 4: Convergence criteria: Compare the membership matrices before and after the iteration. If the difference is less than the termination tolerance, then stop; otherwise repeat from Step 2.
In this model the prototypes can be influenced mainly by the memberships (when a > b) or mainly by the typicalities (when b > a). If the values are restricted to a = 1 and b = 0, the PFCM model behaves as the FCM model. The effect of outliers can be reduced by choosing a higher value for b than for a (and, likewise, for m than for η).
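A minimal Python sketch of the PFCM steps is given below. It assumes the update rules as commonly stated in the PFCM literature: prototypes weighted by a*u^m + b*t^η, FCM-style memberships, and typicalities t = 1 / (1 + (b*d²/γ)^(1/(η-1))). The penalty weights γ are not described in the text above; here they are assumed to come from a short plain-FCM warm-up. All names, default values and the synthetic data are hypothetical.

```python
import numpy as np

def sq_dists(V, Z):
    """Squared Euclidean distances between prototypes V (c, n) and samples Z (N, n)."""
    return ((V[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2) + 1e-12

def memberships_from_d2(d2, m):
    """FCM-style memberships; each column (sample) sums to 1 over the clusters."""
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)

def pfcm(Z, c, m=2.0, eta=2.0, a=1.0, b=1.0, tol=1e-5, max_iter=200, seed=0):
    """Sketch of PFCM; returns prototypes V (c, n), memberships U and typicalities T (c, N)."""
    rng = np.random.default_rng(seed)
    # Step 1: random membership matrix (columns sum to 1) and typicality matrix.
    U = rng.random((c, Z.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    T = rng.random(U.shape)
    # Assumption: gamma_i is fixed from the fuzzy scatter after a short FCM warm-up.
    for _ in range(20):
        V = (U ** m @ Z) / (U ** m).sum(axis=1, keepdims=True)
        U = memberships_from_d2(sq_dists(V, Z), m)
    Um = U ** m
    gamma = (Um * sq_dists(V, Z)).sum(axis=1, keepdims=True) / Um.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: prototypes weighted by a*u^m + b*t^eta.
        W = a * U ** m + b * T ** eta
        V = (W @ Z) / W.sum(axis=1, keepdims=True)
        d2 = sq_dists(V, Z)
        # Step 3: memberships keep the sum-to-one constraint; typicalities do not.
        U_new = memberships_from_d2(d2, m)
        T = 1.0 / (1.0 + (b * d2 / gamma) ** (1.0 / (eta - 1.0)))
        # Step 4: stop when the membership matrix changes by less than tol.
        change = np.abs(U_new - U).max()
        U = U_new
        if change < tol:
            break
    return V, U, T

# Hypothetical usage on two well-separated synthetic blobs (not the study data).
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(5.0, 0.5, (30, 2))])
V, U, T = pfcm(Z, c=2)
labels = U.argmax(axis=0)
```

With a = 1 and b = 0 the typicality term drops out of the prototype update in this sketch, which corresponds to the reduction to FCM described above.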
FPCM generates three clusters corresponding to Normal, Hyperthyroid and Hypothyroid, containing 149, 35 and 31 samples respectively. The cluster associated with Normal wrongly includes 7 samples that belong to Hyperthyroid and 6 samples that belong to Hypothyroid. 9 samples that belong to Normal and 2 samples that belong to Hypothyroid are wrongly assigned to the cluster associated with Hyperthyroid. Further, 5 samples that belong to Normal and 3 samples that belong to Hyperthyroid are wrongly assigned to the cluster associated with Hypothyroid.
PFCM also yields three clusters corresponding to Normal, Hyperthyroid and Hypothyroid, containing 153, 39 and 23 samples respectively. The cluster associated with Normal wrongly includes 10 samples that belong to Hyperthyroid and 6 samples that belong to Hypothyroid. 13 samples that belong to Normal and 1 sample that belongs to Hypothyroid are wrongly assigned to the cluster associated with Hyperthyroid. All 23 samples in the cluster associated with Hypothyroid are correctly classified.
The clustering results of the two fuzzy methods are shown in tables giving the number of correctly and incorrectly classified samples for each method.
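The percentage of correctly grouped thyroid samples follows directly from the counts given above. The sketch below arranges them as cluster-by-class count matrices and computes that percentage. The diagonal entries are not stated in the text; they are derived here as cluster size minus the misassigned samples, which is an assumption, and the variable and function names are hypothetical.

```python
import numpy as np

# Rows: clusters (Normal, Hyperthyroid, Hypothyroid).
# Columns: true classes in the same order.
fpcm_counts = np.array([
    [136, 7, 6],   # Normal cluster: 149 samples, 13 misassigned
    [9, 24, 2],    # Hyperthyroid cluster: 35 samples, 11 misassigned
    [5, 3, 23],    # Hypothyroid cluster: 31 samples, 8 misassigned
])
pfcm_counts = np.array([
    [137, 10, 6],  # Normal cluster: 153 samples, 16 misassigned
    [13, 25, 1],   # Hyperthyroid cluster: 39 samples, 14 misassigned
    [0, 0, 23],    # Hypothyroid cluster: 23 samples, none misassigned
])

def percent_correct(counts):
    """Share of samples on the diagonal (cluster matches class), in percent."""
    return 100.0 * np.trace(counts) / counts.sum()

fpcm_pct = percent_correct(fpcm_counts)
pfcm_pct = percent_correct(pfcm_counts)
print(f"FPCM: {fpcm_pct:.2f}%  PFCM: {pfcm_pct:.2f}%")
```

Under these assumptions the two values come out at about 85.1% for FPCM and 86.0% for PFCM, which can be set against the classification performance reported for the thyroid data.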
The liver disorder data set contains 341 samples in two classes. Each sample is characterized by 6 attributes, and the samples are labeled by the numbers 1 to 341. Samples 1 to 142 (142 samples) belong to class 1 and samples 143 to 341 (199 samples) belong to class 2. The algorithms FPCM and PFCM are each applied to generate two clusters. PFCM generates a cluster corresponding to class 1 containing 51 samples and a cluster corresponding to class 2 containing 290 samples. 34 samples that belong to class 2 are wrongly assigned to class 1, and 125 samples that belong to class 1 are wrongly assigned to class 2. FPCM generates a cluster corresponding to class 1 containing 54 samples and a cluster corresponding to class 2 containing 287 samples. 38 samples that belong to class 2 are wrongly classified into class 1, and 125 samples that belong to class 1 are wrongly classified into class 2.

Figure 2: Thyroid FPCM result. In Fig. 2, the blue line connects all normal samples, the red line connects the hyperthyroid samples and the green line connects the hypothyroid samples.

Figure 3: Liver PFCM result. In Fig. 3, the red line connects the class 1 samples and the green line connects the class 2 samples.

Figure 4:

Table 2: The clustering results obtained by the algorithms FPCM and PFCM for thyroid data

Table 3: Comparison of performance of the clustering results obtained by the algorithms PFCM and FPCM for thyroid data
For thyroid data, the classification performance of PFCM is 86.06%, whereas that of FPCM is lower, at 85.11%.

Table 4: Comparison of performance of the clustering results obtained by the algorithms PFCM and FPCM for liver data