Intelligent Detection of Injection Attacks via SQL Based on Supervised Machine Learning Models for Enhancing Web Security
Journal of Artificial Intelligence and Big Data, Vol 4, Issue 2
Table 1. Summary of studies on the detection of SQL injection attacks using machine learning
| Author | Proposed Work | Dataset | Key Findings | Challenges/Gaps |
| --- | --- | --- | --- | --- |
| Hasan, Balbahaith, and Tarique (2019) | Developed a heuristic ML-based algorithm and GUI app using the top 5 of 23 classifiers | 616 SQL statements | Achieved 93.8% accuracy in detecting SQLi attacks | Small dataset size; scalability to real-world scenarios not validated |
| Noor et al. (2019) | Proposed a semantic ML-based framework that links threats and TTPs via probabilistic networks | TTP taxonomy dataset (133 TTPs, 45 threat families) | Detected threats with 92% accuracy; low false positives; 0.15 s average detection time | Specific to TTP-based threats; generalization to SQLi-specific detection not tested |
| Zhang (2019) | Designed ML classifiers (CNN, MLP) to detect SQLi vulnerabilities in PHP code using code-level features | PHP source code files | CNN achieved 95.4% precision; MLP achieved 63.7% recall and F-measure of 0.746 | Limited to PHP; varying performance across different classifiers |
| Ul Islam et al. (2019) | Created a supervised learning tool for NoSQL injection detection using a novel dataset | Custom-designed NoSQL injection dataset | Achieved 0.93 F2-score; outperformed Sqreen by 36.25%; database-agnostic | Limited availability of NoSQL datasets; manual feature engineering required |
| McWhirter et al. (2018) | Used a Gap-Weighted String Subsequence Kernel with an SVM to classify SQL query strings | Amnesia testbed datasets | Achieved 97.07% (Select) and 92.48% (Insert) accuracy; adapted to unseen threats | Lower accuracy with unsanitized quotation marks; sensitive to input anomalies |
| Chattopadhyay et al. (2018) | Examined the difficulties in implementing ML methods for identifying malware | Multiple datasets (unspecified) | Compared various ML techniques across datasets; summarized performance using different metrics; identified optimal techniques for evolving patterns | Lack of clarity in dataset specifics; issues in defining and generalizing ML approaches to dynamic, real-world intrusion patterns; scalability concerns |
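
The studies in Table 1 report different metrics (accuracy, precision, recall, F-measure, F2-score), which makes direct comparison difficult. As a reading aid, the sketch below shows how precision, recall, and the general F-beta score relate to one another. The function name `f_beta` and the confusion counts in the usage example are hypothetical and are not taken from any of the cited studies.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true-positive, false-positive,
    and false-negative counts.

    beta = 1 gives the usual F-measure (F1); beta = 2 gives the
    F2-score, which weights recall more heavily than precision.
    """
    # Precision: share of flagged queries that really were attacks.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real attacks that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    tp, fp, fn = 90, 10, 30
    print(f"F1 = {f_beta(tp, fp, fn, beta=1.0):.3f}")
    print(f"F2 = {f_beta(tp, fp, fn, beta=2.0):.3f}")
```

An F2-score is often preferred in injection detection, where a missed attack (false negative) tends to cost more than a false alarm, whereas accuracy alone can look high on imbalanced data even when many attacks are missed.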