Voice over IP (VoIP) has prevailed in the telecommunication world since its emergence in the late 1990s as a technology for transporting multimedia over IP networks. The reason for its prevalence is that, compared to the legacy phone system, VoIP offers significant benefits such as cost savings, the provision of new media services, phone portability, and integration with other applications. Despite these advantages, VoIP suffers from several hurdles, such as architectural complexity, interoperability issues, Quality of Service (QoS) concerns, and security issues. Among these disadvantages, VoIP security issues are becoming more serious because traditional security devices, protocols, and architectures cannot adequately protect VoIP systems from recent security attacks. VoIP fraud is a significant and growing problem in the telecommunications industry. Because fraudsters often attack during weekends, fraud events often go undetected for many hours. As the popularity of VoIP continues to grow, the problem of VoIP fraud will become an increasing threat to the industry (see Figure 1.1). VoIP is one of the fastest-rising Internet applications of recent years. It replaces the old Public Switched Telephone Network (PSTN) with a converged network in which data, voice, and video packets are carried together over a single IP-based network, and it employs the Real-Time Transport Protocol (RTP) to transfer voice packets over IP-based networks. RTP runs on top of the User Datagram Protocol (UDP), which is an unreliable delivery protocol. In real-time communication, voice packets that arrive late at the destination are useless and are equivalent to dropped packets. This study proposes how machine learning (ML) can detect one type of distributed denial-of-service (DDoS) attack, the SIP flooding attack, and how to secure VoIP.
The approach is based on a composite of the decision tree algorithm and Fuzzy C-means, determining whether calls are genuine or part of a flood attack before they reach the gateway. The combination proposed in this study has not previously been applied to VoIP networks and achieved high threat-detection results compared to other models, which is why this study is important for the use of machine learning to protect communication systems.
We focus our attention on the Session Initiation Protocol (SIP), a popular and widely deployed technology, primarily because of the availability of a number of free and open-source implementations. SIP is a protocol standardized by the Internet Engineering Task Force (IETF), and is designed to support the setup of bidirectional communication sessions including, but not limited to, VoIP calls. It is similar in some ways to the Hypertext Transfer Protocol (HTTP), in that it is text-based, has a request-response structure, and even uses a mechanism based on HTTP Digest Authentication for user authentication. However, it is an inherently stateful protocol that supports interaction with multiple network components (e.g., middleboxes such as PSTN bridges) and asynchronous notifications. While its finite state machine is seemingly simple, in practice it has become quite large and complicated — an observation supported by the fact that the main SIP RFC is one of the longest ever defined, with additional RFCs further extending the specification. SIP is a signaling protocol, relying on RTP for media transfer. There exists an RTP profile (named Secure RTP, or SRTP) that supports encryption and integrity, but it is not yet widely used. The RTP protocol family also includes the Real-time Transport Control Protocol (RTCP), which is used to control certain RTP parameters between communicating endpoints. SIP can operate over a number of transport protocols, including the Transmission Control Protocol (TCP), UDP, and the Stream Control Transmission Protocol (SCTP). UDP is generally the preferred method due to simplicity and performance, although TCP has the advantage of supporting TLS protection of call setup. However, recent work on Datagram Transport Layer Security (DTLS) may render this irrelevant.
SCTP, on the other hand, offers several advantages over both TCP and UDP, including denial-of-service (DoS) resistance, multi-homing and mobility support, and logical connection multiplexing over a single channel.
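To make SIP's text-based, HTTP-like structure concrete, the following sketch parses a minimal, hand-written INVITE request in Python. The addresses and header values are illustrative assumptions, not taken from a real trace:

```python
# A minimal, illustrative SIP INVITE message (text-based, like HTTP).
# All names and addresses here are made up for demonstration.
raw = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP client.example.com:5060\r\n"
    "From: <sip:alice@example.com>\r\n"
    "To: <sip:bob@example.com>\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)

def parse_sip(message):
    """Split a SIP message into its request line and header fields."""
    head, _, _body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return method, uri, version, headers

method, uri, version, headers = parse_sip(raw)
```

The request-line/headers/body layout mirrors HTTP, which is why simple text processing like this suffices for dissecting SIP signaling.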
VoIPSA is a vendor-neutral, not-for-profit organization composed of VoIP and security vendors, organizations, and individuals with an interest in securing VoIP protocols, products, and installations. The VoIPSA security threat taxonomy aims to define the security threats against VoIP deployments, services, and end users. The key elements of this taxonomy are:
1) Social threats are aimed directly against humans. For example, misconﬁgurations, bugs or bad protocol interactions in VoIP systems may enable or facilitate attacks that misrepresent the identity of malicious parties to users. Such attacks may then act as stepping stones to further attacks such as phishing, theft of service, or unwanted contact (spam).
2) Eavesdropping, interception, and modiﬁcation threats cover situations where an adversary can unlawfully and without authorization from the parties concerned listen to signaling (call setup) or the content of a VoIP session, and possibly modify aspects of that session while avoiding detection. Examples of such attacks include call re-routing and interception of unencrypted RTP sessions.
3) Denial of service threats have the potential to deny users access to VoIP services. This may be particularly problematic in the case of emergencies, or when a DoS attack affects all of a user's or organization's communication capabilities (i.e., when all VoIP and data communications are multiplexed over the same network, which can be targeted through a DoS attack). Such attacks may be VoIP-specific (exploiting flaws in the call setup or the implementation of services), or VoIP-agnostic (e.g., generic traffic flooding attacks). They may also involve attacks with physical components (e.g., physically disconnecting or severing a cable) or through computing or other infrastructures (e.g., disabling the DNS server, or shutting down power).
DDoS attacks are mostly directed at websites or online services with the aim of generating so much traffic that the network or the service's servers cannot accommodate it. The traffic may be sent in the form of messages, fake packets, or connection requests.
For a DDoS attack to occur, the attacker only needs to send an overwhelming number of messages, jamming the server or network and thereby denying service, as shown in Figure 1.3.
The steps to prepare an organization or individual to deal with DDoS attacks are:
Map vulnerable assets – In this step, all assets exposed to DDoS are identified, from physical assets such as servers to virtual ones such as IP addresses and domains.
Assess potential damages – In this step, you assess the worth of each identified asset, the kind of risk it faces, and what it might cost you if it were taken down by a DDoS attack. This helps you decide how many resources you are willing to spend to protect the asset from DDoS attacks.
Assign responsibility – After determining the assets, the potential cost, and the sectors that would be affected by an attack, you determine the personnel responsible for protecting each asset from DDoS attacks.
4) Service abuse threats cover the improper use of VoIP services, especially (but not exclusively) in situations where such services are offered in a commercial setting. Examples of such threats include toll fraud and billing avoidance.
5) Physical access threats refer to inappropriate/unauthorized physical access to VoIP equipment, or to the physical layer of the network (following the ISO 7-layer network stack model).
6) Interruption of services threats refer to non-intentional problems that may nonetheless cause VoIP services to become unusable or inaccessible. Examples of such threats include loss of power due to inclement weather, resource exhaustion due to over-subscription, and performance issues that degrade call quality.
Training and Testing
During the training process, the learning algorithm learns a new classifier (a classifier inducer) from observations. The process by which the learned classifier is tested on unseen observations is called the testing process. The training set is a collection of observations from the problem domain, called instances; they are used by the classifier during training. The algorithm learns important knowledge or rules from the training set by building models and setting the corresponding parameters. The performance of the algorithm is then evaluated on the test set, which is also a collection of instances from the same problem domain, but these are not used and remain unseen during the training process. The learning ability of classification algorithms is usually tested by applying them to a set of benchmark problems.
Benchmark problems are commonly picked from datasets that are publicly accessible to researchers, so that results can be verified and performance can be checked. Datasets often have a training set and a test set. In such problems, the learning algorithm has a two-fold purpose: to discover or learn knowledge or rules from the training set, and to apply these rules to the test set to measure the learned model. However, many benchmark problems do not have a specific test set, or only have a small number of available instances. To evaluate the performance of a classifier on these problems, it is necessary to use resampling methods.
In n-fold cross-validation, a dataset is randomly partitioned into n folds (partitions) of near-equal size. The folds are selected so that the proportion of instances from different classes remains the same in all folds. A single fold of the n folds is kept as the test set for testing the learned classifier, and the remaining (n – 1) folds are used as the training set. The cross-validation process is then repeated n times, with each of the n folds used exactly once as the test set. The average of the n results from the n experiments is taken to estimate the classification performance. The advantage of this method is that all instances are used for both training and testing, and each instance is used for testing only once. Generally, a larger n yields an estimate with smaller bias, because of the higher proportion of instances in the training set, but potentially higher variance (on top of being computationally expensive).
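The stratified n-fold procedure described above can be sketched with scikit-learn on synthetic data; the dataset, fold count, and choice of classifier here are illustrative assumptions, not the thesis's setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labelled benchmark dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stratified n-fold CV: class proportions stay the same in every fold,
# and each instance is used for testing exactly once.
n = 5
skf = StratifiedKFold(n_splits=n, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])                  # (n - 1) folds train
    scores.append(clf.score(X[test_idx], y[test_idx]))   # 1 fold tests

mean_accuracy = np.mean(scores)  # average of the n results
```

Each of the n iterations trains a fresh classifier, so no test instance ever leaks into its own training set.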
A decision tree model is a tree structure similar to a flowchart, where every internal node serves as a test on a dataset feature and each branch corresponds to an outcome of that test. Each leaf node represents a target feature label, and the topmost node in the tree acts as the root node. Decision trees can be binary or non-binary. They are popular classification techniques because their use requires neither prior knowledge of the problem domain nor a complicated setting of the classification parameters.
Moreover, they can be easily understood and converted to classification rules. The decision tree classification technique has been used in many real-world applications and fields, such as financial analysis, medicine, molecular biology, manufacturing, and astronomy. While building the decision tree, the algorithm applies an attribute selection measure to select the feature that best divides the dataset instances into distinct target classes; the resulting model is then typically evaluated with measures such as accuracy, F1 score, recall, and precision.
In data mining, the C4.5 algorithm is utilized as a decision tree classifier that makes decisions based on a sample of data. The training data consists of a set of samples that have already been categorized. Each sample is made up of a p-dimensional vector that represents the sample's attribute values or characteristics, together with the class to which it belongs. At each node of the tree, C4.5 selects the attribute that most effectively divides its set of samples into subsets enriched in one class or the other.
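scikit-learn does not ship C4.5 itself, but a decision tree grown with the entropy (information-gain) criterion is a close, commonly used stand-in for the node-splitting behavior just described. The sketch below uses synthetic data and is an approximation, not the thesis's code:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic pre-categorized samples: p-dimensional attribute vectors + class
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# criterion="entropy" picks, at each node, the attribute split that most
# effectively separates the samples into class-enriched subsets
tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=1)
tree.fit(X, y)

train_accuracy = tree.score(X, y)
```

Note that true C4.5 additionally uses the gain ratio and its own pruning scheme; this sketch only mirrors the information-gain split selection.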
The Fuzzy C-means (FCM) algorithm can be used to solve a wide range of geostatistical data analysis issues. For any set of numerical data, it generates fuzzy partitions and prototypes. These partitions can be used to confirm known substructures or to suggest substructure in previously unexplored data. A generalized least-squares objective function is used as the clustering criterion for aggregating subsets. Features of this method include a choice of three norms (Euclidean, diagonal, or Mahalanobis), an adjustable weighting factor that effectively limits sensitivity to noise, acceptance of variable numbers of clusters, and outputs that contain many metrics of cluster validity. Based on the distance between a cluster center and a data point, the algorithm assigns each data point a membership degree for each cluster center: the closer the data point is to a cluster center, the more strongly it belongs to that cluster. The sum of each data point's memberships should equal one. FCM is a popular nondeterministic clustering technique in data mining. In traffic engineering research, traffic pattern recognition plays an important role, and such studies often face the limitation of missing or incomplete data; to deal with these constraints, FCM has become a commonly applied clustering technique. The advantage of this approach is that, unlike the original C-means clustering methods, it can overcome the issue of getting trapped in a local optimum. However, FCM requires a predefined cluster number, which is not always available when dealing with massive data without any prior knowledge of the data dimension. Moreover, the model becomes computationally expensive as the data size increases. Different studies have applied FCM successfully by improving on its limitations.
Some studies changed the fuzzy index value for each FCM algorithm execution, some calculated the Davies-Bouldin (DB) index, while others applied the K-means clustering algorithm [19, 20].
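The FCM update loop described above (alternating center and membership updates until convergence) can be sketched in plain NumPy. The fuzzifier m, iteration count, and toy data below are assumptions for illustration:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal Fuzzy C-means: returns cluster centers and the membership
    matrix U (n_samples x c), where each row sums to one."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1
    for _ in range(n_iter):
        um = U ** m                            # fuzzified memberships
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)                  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated toy blobs; FCM should give near-crisp memberships
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, c=2)
```

The membership update implements u_ik = d_ik^(-2/(m-1)) / Σ_j d_jk^(-2/(m-1)), so points near a center receive memberships close to one for that cluster.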
This chapter provides a review of the literature that forms the background and supports the motivations of the thesis. It covers the basic concepts and reviews typical related work on the SIP flooding attack. Reynolds and Ghosal describe a multi-layer protection scheme against flood-based application- and transport-layer denial of service (DoS) attacks in VoIP. They use a combination of sensors located across the enterprise network, continuously estimating the deviation from the long-term average of the number of call setup requests and successfully completed handshakes. Similar techniques have been used in detecting TCP SYN flood attacks, with good results. The authors evaluate their scheme via simulation, considering several different types of DoS attacks and recovery models. Larson et al. have experimentally analyzed the impact of distributed denial of service (DDoS) attacks on VoIP call quality. They also established the effectiveness of low-rate denial of service attacks that target specific vulnerabilities and implementation artifacts to cause equipment crashes and reboots. They discuss some of the possible defenses against such attacks and describe Sprint's approach, which uses regional "cleaning centers" that divert suspected attack traffic to a centralized location with numerous screening and mitigation mechanisms available. The paper recommends that critical VoIP traffic stay on private networks, the use of general DDoS mechanisms as a front-line defense, VoIP-aware DDoS detection and mitigation mechanisms, traffic policing and rate-limiting mechanisms, the use of TCP for VoIP signaling (which makes IP spoofing, and hence anonymous/unfilterable DoS attacks, very difficult), extended protocol compliance checking by VoIP network elements, and the use of authentication where possible. Bremler-Barr et al.
describe de-registration attacks in SIP, wherein an adversary can force a user to be disassociated from the proxy server and registrar, or even divert that user's calls to any party (including the attacker). This attack works even when authentication is used, if the adversary can eavesdrop on traffic between the client and the SIP proxy. They demonstrate the attack against several SIP implementations, and propose a protection mechanism that is similar to one-time passwords. Chen describes a denial of service detection mechanism that models the SIP transaction state machine and identifies attacks by measuring the number of transaction and application errors, the number of transactions per node, and the traffic volume per transaction. If certain thresholds are exceeded, an alert is generated. Chen did not describe how appropriate thresholds can be established, other than to indicate that historical records can be used. Sengar et al. describe the VoIP flooding detection system (vFDS), an anomaly detection system that seeks to identify flooding denial of service attacks in VoIP. The approach taken is to measure abnormal variations in the relationships between related packet streams using the Hellinger distance, a measure of the deviation between two probability measures. Using synthetic attacks, they show that vFDS can detect flooding attacks that use SYN, SIP, or RTP packets within approximately 1 second of the commencement of an attack, with small impact on call setup latency and voice quality. A similar approach, using Hellinger distance on traffic sketches, is proposed by Tang et al., overcoming the limitations of the previous schemes against multi-attribute attacks. Furthermore, their scheme does not require the constant calculation of an accurate threshold (defining "normal" conditions). Fiedler et al. present VoIP Defender, an open architecture for monitoring SIP traffic, with a primary focus on high-volume denial of service attacks.
Their architecture allows a variety of detection methods to be integrated, and several different attack prevention and mitigation mechanisms were used. Key design goals include transparency, scalability, extensibility, speed, and autonomous operation. Their evaluation of the prototype implementation consists exclusively of performance measurements. Conner and Nahrstedt describe a semantic-level attack that causes resource exhaustion on stateful SIP proxies by calling parties that (legitimately or in collusion) do not respond. This attack does not require network flooding or other high-traffic-volume attacks, making it difficult to detect with the simple, network-based heuristics used against other types of denial of service attacks. They propose a simple algorithm, called Random Early Termination (RET), for releasing reserved resources based on the current state of the proxy (overloaded or not) and the duration of each call's ringing. They implement and evaluate their proposed scheme on a SIP proxy running in a local testbed, showing that it reduces the number of benign call failures when under attack, without incurring measurable overheads when no attack is underway. Luo et al. experimentally evaluate the susceptibility of SIP to CPU-based denial of service attacks. They use an open-source SIP server in four attack scenarios: basic request flooding, spoofed-nonce flooding (wherein the target server is forced to validate the authenticator in a received message), adaptive-nonce flooding (where the nonce is refreshed periodically by obtaining a new one from the server), and adaptive-nonce flooding with IP spoofing. Their measurements show that these attacks can have a large impact on the quality of service provided by the servers. They propose several countermeasures to mitigate such attacks, indicating that authentication by itself cannot solve the problem and that, in some circumstances, it can exacerbate its severity.
These mitigation mechanisms include lightweight authentication and whitelisting, proper choice of authentication parameters, and binding of nonces to client IP addresses. Fuchs et al. apply anomaly detection techniques to protect against VoIP-originated denial of service attacks at the phone call level at public safety service centers (e.g., 911 or 112 operators). Specifically, they use call traces from normal operations to determine the level of calls coming from the Public Switched Telephone Network (PSTN), Global System for Mobile Communications (GSM), and Voice over Internet Protocol (VoIP) networks during normal operation and at disaster time. They then use these profiles to discriminate against VoIP-based DoS attacks by limiting the accepted number of calls that can originate from that domain, building on previous work that identified the network of origin as a potential discriminator. Using call traces from a fire department response center, they evaluate the call response rate against the DoS attack intensity. Their analysis shows that it is possible to identify such attacks early and to avoid false positives if VoIP-originated calls under normal scenarios are less than 27% of total call volume. Hyun-Soo et al. propose a detection mechanism for de-registration and other call disruption attacks in SIP that is based on message retransmission: when a server receives an unauthenticated (but possibly legitimate) message M that could disturb a call or otherwise deny service to a user, it asks the user's agent to retransmit the last SIP message sent by that agent, as an implicit authenticator. If the retransmission matches M (i.e., this was a legitimate request), the server proceeds with its processing.
If the retransmission does not match M, or if multiple retransmissions are received within a short time window (as may be the case when an attacker can eavesdrop on the network link between the SIP proxy and the user, identifying the request for retransmission), M is discarded. However, the scheme requires a new SIP message to signal that a retransmission is needed. Geneiatakis and Lambrinoudakis consider some of the same attacks, and propose mitigation through an additional SIP header that must be included in all messages and can cryptographically validate the authenticity and integrity of control messages. Ormazabal et al. describe the design and implementation of a SIP-aware, rule-based application-layer firewall that can handle denial of service (and other) attacks in the signaling and media protocols. They use hardware acceleration for the rule-matching component, allowing them to achieve filtering rates on the order of hundreds of transactions per second. The SIP-specific rules, combined with state validation of the endpoints, allow the firewall to open precisely the ports needed for only the local and remote addresses involved in a specific session, by decomposing and analyzing the content and meaning of SIP signaling message headers. They experimentally evaluate and validate the behavior of their prototype with a distributed testbed involving synthetic benign and attack traffic generation. Ehlert et al. propose a two-layer DoS prevention architecture for SIP. The first layer is comprised of a bastion host that protects against well-known network layer attacks (such as TCP SYN flooding) and SIP-flooding attacks. The second layer is located at the SIP proxy, and is composed of modules that perform signature-based detection of malformed SIP messages and a non-blocking DNS cache to protect against attacks involving SIP URIs with irresolvable DNS names.
They conduct a series of evaluations in an experimental testbed, where they validate the effectiveness of their architecture in blocking or mitigating a number of DoS attacks. Ehlert et al. separately propose and experimentally evaluate (via a testbed) a specification-based intrusion detection system for denial of service attacks. Geneiatakis et al. use counting Bloom filters to detect messages that are part of a denial of service attack in SIP, by determining the normal number of pending sessions for a given system and configuration based on profiling. Awais et al. describe an anti-DoS architecture based on bio-inspired anomaly detection. They compare their scheme against a cryptography-based mechanism using synthetic traffic. Similar work is described by Rebahi et al. Akbar and Farooq conduct a comparative evaluation of several evolutionary and non-evolutionary machine learning algorithms using synthetic SIP traffic datasets with different levels of attack intensities and durations. They conclude that different algorithms, such as the Supervised Classifier System (UCS) and the Genetic Classifier System (GAssist), and different settings are best suited for different scenarios. The same authors subsequently apply anomaly detection techniques to identify Real-time Transport Protocol (RTP) fuzzing attacks that seek to cause server crashes through malformed packet headers and payloads. They investigate several different classifiers, such as the decision tree algorithm, analyzing their accuracy and performance using synthetic RTP traces. Nassar et al. use support vector machine (SVM) classifiers on 38 distinct features in SIP traffic to identify SPIT and DoS traffic. Their experiments using SIP traffic traces show good performance, with more than 95% accuracy. Rafique et al. and Akbar et al. conduct an analysis of three anomaly detection algorithms for detecting flood attacks in the IP Multimedia Subsystem (IMS): adaptive threshold, cumulative sum, and Hellinger distance.
They use synthetic traffic data to determine the detection accuracy of these algorithms in the context of a SIP server being flooded with SIP messages. Truong et al. describe a rules-based intrusion detection system for H.323 that uses a finite state machine (FSM) model to detect unexpected messages, aimed at identifying illegitimate RAS (Registration, Admission and Status) messages being forwarded to an H.323 gatekeeper. A decision tree is used to trace back the attacker's position after an attack is detected, using a traffic-flow model matching technique; the experiments showed a false positive ratio of about 1.2%–2.4% and a false negative ratio of about 2%–10% for different kinds of attack.
In this study, the dataset used was Kyoto 2006+, captured over 7 days and containing 474,134 session requests spanning types of services such as web, SIP, email, and DNS. This dataset is available at http://www.takakura.com/Kyoto_data/. The dataset consists of both numerical and categorical features, which are described below.
Feature selection is the preparation process in which one automatically selects the features in the data that contribute most to the prediction variable or output of interest. The elimination of irrelevant features reduces the complexity of the Kyoto 2006+ dataset. Of all the categorical features, the resulting subset contains 16 features (1, 3, 4, 15-17, 23); the statistical features related to the duration of the connection (1, 3, 4) are removed, while features 15-17 and 23 are used for further analysis and evaluation of the models; and feature 18 is used to categorize network traffic into three categories: normal connection (1), anomalous connection with a known attack (-1), and abnormal connection with an unknown attack (-2).
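A hypothetical sketch of this selection step with pandas is shown below. The column names and the chosen subset are illustrative stand-ins (the real Kyoto 2006+ data has 24 columns), while the label convention (1 normal, -1 known attack, -2 unknown attack) follows the text:

```python
import pandas as pd

# Illustrative stand-in for a few Kyoto 2006+ sessions
df = pd.DataFrame({
    "duration":  [1.2, 0.0, 3.4, 0.1],
    "service":   ["sip", "dns", "http", "smtp"],
    "src_bytes": [300, 60, 1500, 20],
    "dst_bytes": [200, 120, 900, 0],
    "label":     [1, -1, 1, -2],   # 1 normal, -1 known attack, -2 unknown
})

# Keep only the features chosen for the analysis (illustrative subset);
# the duration-related statistical feature is dropped
selected = ["service", "src_bytes", "dst_bytes"]
X = df[selected]

# Collapse the three label values into binary: attack (1) vs normal (0)
y = (df["label"] != 1).astype(int)
```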
Data preprocessing converts raw data into well-formed data sets so that data mining methods can be used. Raw data is frequently incomplete and inconsistently formatted. During preprocessing, data goes through a sequence of steps:
1- Data cleansing entails operations such as filling in missing values or deleting rows with missing data, smoothing noisy data, and resolving errors in the data.
2- Data integration is a set of processes used to retrieve and combine data from disparate sources into meaningful and valuable information. A complete data integration solution delivers trusted data from a variety of sources, and ingests and cleans data before loading it into a data warehouse.
3- Data normalization and generalization are two steps in the data transformation process. Normalization is a method of ensuring that no data is duplicated, that everything is stored in one place, and that all dependencies are logical. Here, each feature value is rescaled into the range [0, 1]; this process of making features more suitable for training by rescaling is called feature scaling. Min-max normalization is given in equation (1): x' = (x − min(x)) / (max(x) − min(x)).
4- Data reduction: When a database holds a large amount of data, it can become slower, more expensive to access, and more difficult to store properly. In a data warehouse, data reduction seeks to deliver a simplified version of the data. Data reduction can be accomplished in a variety of ways. For example, once a subset of relevant attributes is picked for its relevance, anything below a certain degree of significance is removed. Encoding mechanisms can also be employed to minimize data size; the technique is called lossless if all of the original data can be recovered after compression, and lossy if some data is lost. Aggregation can also be used to reduce the quantity of data objects by condensing a large number of transactions into a single weekly or monthly value.
5- Data discretization: Raw values are replaced with interval levels. This phase involves dividing the range of a continuous attribute into intervals to reduce its number of values.
6- Data sampling: A dataset may be too large or complex to work with due to time, storage, or memory restrictions. Sampling techniques allow selecting and working with just a subset of the dataset, as long as it has roughly the same properties as the original.
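Several of the preprocessing steps above (cleansing, normalization, discretization, and sampling) can be sketched with pandas on a toy frame; the column names, bin labels, and sampling fraction are illustrative assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "duration":  [1.0, 2.0, np.nan, 100.0, 3.0, 2.5],
    "src_bytes": [10, 20, 30, 40, 50, 60],
})

# Cleansing: fill the missing duration with the column median
df["duration"] = df["duration"].fillna(df["duration"].median())

# Normalization: min-max rescaling of src_bytes into [0, 1]
col = df["src_bytes"]
df["src_bytes_norm"] = (col - col.min()) / (col.max() - col.min())

# Discretization: replace raw durations with interval levels
df["duration_bin"] = pd.cut(df["duration"], bins=3,
                            labels=["short", "medium", "long"])

# Sampling: work with a random subset that mirrors the original
sample = df.sample(frac=0.5, random_state=0)
```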
Design and Method
The first step involved cleaning the data through pre-processing operations such as data normalization, as these affect the results. Data pre-processing is the process of transforming raw data into a more intelligible, usable, and efficient format by manipulating or discarding it before it is utilized. This is necessary for ensuring or improving performance, and it is a crucial phase in the data mining process. Generally, analyzing data that has not been thoroughly checked for such issues can lead to false conclusions. As a result, the representation and quality of the data must come first, before analysis. Data preparation is frequently the most crucial stage of a machine learning project. Data validation and data imputation are both part of the preprocessing process: the purpose of data validation is to determine whether the data is comprehensive and accurate, while the purpose of data imputation is to rectify errors and fill in missing values. Both database-driven and rules-based applications use data preprocessing. Data preparation is crucial in machine learning (ML) processes to ensure that big datasets are prepared in such a way that the data they contain can be processed and analyzed by learning algorithms.
Feature selection and normalization were utilized in particular to help minimize data complexity and reduce data processing runtime, for example through selection of a better feature space. The process of data normalization involved structuring the data in a database. This included generating tables and defining links between them according to rules aimed at securing the data, while also allowing the database to be more flexible by removing duplication and inconsistent dependencies. The resulting data set contained 14 features, starting with the service and ending with the protocol number. By removing extraneous characteristics and normalizing instances, the complexity of the Kyoto 2006+ dataset was significantly decreased. The next step in the experiment involved applying three machine learning models: the research proposes a decision tree with Fuzzy C-means, a combined SVM with C4.5, and a combined CNN with C4.5, implemented in Python, with the proposed model made up of a combination of C4.5 and Fuzzy C-means. Generally, the C4.5 algorithm was primarily used as a decision tree classifier for the purpose of making decisions based on a sample of data. The datasets were subjected to a training and testing process to improve the accuracy of the learning algorithms. This was undertaken so that the algorithm could learn important knowledge or rules from the training set by building models and setting the corresponding parameters. Afterwards, the performance of the algorithm was evaluated on the test set, which is a collection of instances from the same problem domain; these are not used, however, and remain unseen during the training process. Finally, the learning ability of the classification algorithms was tested by applying them to a set of benchmark problems designed to identify one of the most popular attacks against SIP networks.
While building the decision tree, the algorithm applies an attribute selection measure to choose the feature that best divides the dataset instances into distinct target classes. At each node of the tree, C4.5 selects the attribute that most effectively splits its set of samples into subsets enriched in one class or the other.
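C4.5's attribute selection measure is the gain ratio: information gain normalized by the split information of the attribute. A minimal sketch for a categorical feature follows; the function names and the toy records are illustrative, not from the study's code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 gain ratio of attribute `attr` over a list of dict records."""
    base = entropy(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    # Information gain: entropy reduction from splitting on `attr`.
    gain = base - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Toy records: protocol perfectly separates attack from normal traffic.
rows = [{"proto": "udp"}, {"proto": "udp"}, {"proto": "tcp"}, {"proto": "tcp"}]
labels = ["attack", "attack", "normal", "normal"]
score = gain_ratio(rows, labels, "proto")  # 1.0 for a perfect split
```

At each node, C4.5 evaluates this score for every candidate attribute and splits on the one with the highest gain ratio.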
The Fuzzy C-Means algorithm was used to address various data analysis issues, in particular because it generates fuzzy partitions and prototypes: these partitions can be used to confirm known substructures or to suggest substructure in previously unexplored data. A generalized least-squares objective function is used as the clustering criterion for aggregating subsets. The major benefit of adopting this approach is that, unlike the original C-means clustering methods, Fuzzy C-Means helps avoid getting trapped in a local optimum. Nevertheless, it is worth noting that the approach requires a predefined number of clusters, which is not easy to choose for massive data without any prior knowledge of the data dimension. The final step involved implementing the experiment in the Python language on Google Colab (https://colab.research.google.com), which was chosen because it provides a number of excellent tools and features for implementing the machine learning aspects of the experiment in Python. For example, Colab enabled the project team to execute the Python code via a browser and perform the different machine learning tasks involved in the experiment.
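The Fuzzy C-Means iteration alternates between recomputing cluster centers as membership-weighted means and recomputing memberships from distances to those centers. A minimal one-dimensional sketch follows, assuming fuzzifier m = 2 and a fixed iteration budget; this is an illustration of the algorithm, not the experiment's Colab code.

```python
import random

def fuzzy_c_means(points, c=2, m=2.0, iters=50, seed=0):
    """Minimal 1-D Fuzzy C-Means: returns cluster centers and memberships."""
    rng = random.Random(seed)
    # Random initial membership matrix; each row normalized to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    for _ in range(iters):
        # Centers: fuzzy-weighted means with weights u^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(points))]
            centers.append(sum(wi * x for wi, x in zip(w, points)) / sum(w))
        # Memberships: inversely related to distance ratios between centers.
        for i, x in enumerate(points):
            d = [abs(x - ck) + 1e-12 for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return centers, u

# Two well-separated groups: the centers should settle near 1.0 and 8.1.
pts = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
centers, memberships = fuzzy_c_means(pts)
```

Unlike hard C-means, every point keeps a graded membership in every cluster, which is what lets borderline calls carry partial "attack" and partial "normal" weight instead of a forced hard label.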
The simulation results, obtained by developing and implementing a model designed to identify one of the most popular attacks on SIP networks by applying a decision tree with Fuzzy C-Means in Python, reveal a number of critical insights. The results show that the highest accuracy of the proposed work was 98.98%. Accuracy is the number of data points that are correctly predicted out of all data points: more formally, the number of true positive and true negative results divided by the total number of true positive, true negative, false positive, and false negative results, as shown in Eq. 4.1. A test result can be either positive (classifying the connection as an attack) or negative (classifying the connection as normal), and the result for each record may or may not match the actual connection status. To accommodate these scenarios, the following cases are distinguished after the simulation:
1. True Positive (TP): an attack correctly identified as an attack.
2. False Positive (FP): a non-attack incorrectly identified as an attack.
3. True Negative (TN): a non-attack correctly identified as a non-attack.
4. False Negative (FN): an attack incorrectly identified as a non-attack.
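The four counts above determine all of the performance metrics used in this section. A small sketch follows; the example counts are invented for illustration and are not the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # Eq. 4.2
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # Eq. 4.3
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical run: 90 attacks caught, 5 false alarms,
# 100 normal calls passed through, 5 attacks missed.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=5, tn=100, fn=5)
```

On a heavily imbalanced traffic mix, accuracy alone can be misleading, which is why precision, recall, and F1 are reported alongside it.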
The results also showed that the algorithm achieves fairly high precision (positive predictive value) and recall (sensitivity) on SIP flooding attacks. This is critically important, as a good machine learning algorithm is required to possess both high precision and high recall. Precision is an important metric of the effectiveness of machine learning algorithms: it measures the number of correct positive predictions, computed by dividing the number of true positives by the sum of true positives and false positives, as shown in Eq. 4.2. Recall, on the other hand, is the proportion of actual positive cases that are correctly classified as positive, as shown in Eq. 4.3. The F1 score is defined as the harmonic mean of precision and recall. The equations of the performance metrics are as follows: