Intelligent Detection of Injection Attacks via SQL Based on Supervised Machine Learning Models for Enhancing Web Security

Journal of Artificial Intelligence and Big Data | Vol 4, Issue 2

Table 1. Summary of studies on the detection of SQL injection attacks using machine learning

Author	Proposed Work	Dataset	Key Findings	Challenges/Gaps
Hasan, Balbahaith, and Tarique (2019)	Developed a heuristic ML-based algorithm and GUI app using the top 5 of 23 classifiers	616 SQL statements	Achieved 93.8% accuracy in detecting SQLi attacks	Small dataset size; scalability to real-world scenarios not validated
Noor et al. (2019)	Proposed a semantic ML-based framework that links threats to TTPs via probabilistic networks	TTP taxonomy dataset (133 TTPs, 45 threat families)	Detected threats with 92% accuracy; low false positives; 0.15 s average detection time	Specific to TTP-based threats; generalization to SQLi-specific detection not tested
Zhang (2019)	Designed ML classifiers (CNN, MLP) to detect SQLi vulnerabilities in PHP code using code-level features	PHP source code files	CNN achieved 95.4% precision; MLP achieved 63.7% recall and F-measure of 0.746	Limited to PHP; varying performance across different classifiers
Ul Islam et al. (2019)	Created a supervised learning tool for NoSQL injection detection using a novel dataset	Custom-designed NoSQL injection dataset	Achieved 0.93 F2-score; outperformed Sqreen by 36.25%; database-agnostic	Limited availability of NoSQL datasets; manual feature engineering required
McWhirter et al. (2018)	Used a Gap-Weighted String Subsequence Kernel with an SVM to classify SQL query strings	Amnesia testbed datasets	Achieved 97.07% (Select) and 92.48% (Insert) accuracy; adapted to unseen threats	Lower accuracy with unsanitized quotation marks; sensitive to input anomalies
Chattopadhyay et al. (2018)	Examined the difficulties of applying ML methods to malware identification	Multiple datasets (unspecified)	Compared various ML techniques across datasets; summarized performance across different metrics; identified optimal techniques for evolving patterns	Dataset specifics unclear; difficulty defining and generalizing ML approaches to dynamic, real-world intrusion patterns; scalability concerns
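
Table 1 reports results in several different metrics (accuracy, precision, recall, F-measure, and F2-score), which makes direct comparison across studies difficult. As a reading aid, the standard F-beta definition that underlies both the F-measure (beta = 1) and the F2-score (beta = 2) can be sketched as follows. The function name `f_beta` and the precision and recall values are illustrative assumptions and are not taken from any of the studies listed.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score computed from precision and recall.

    beta = 1 gives the usual F-measure (F1); beta = 2 gives the F2-score,
    which weights recall more heavily than precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical values, chosen only to show how the two scores differ.
precision, recall = 0.80, 0.95
f1 = f_beta(precision, recall)            # balanced F-measure
f2 = f_beta(precision, recall, beta=2.0)  # recall-weighted F2-score
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

Because F2 favors recall, a detector that misses few attacks at the cost of more false alarms scores higher on F2 than on F1, so the two figures should not be compared directly.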