This project detects malware by classifying images with a Convolutional Neural Network (CNN) that incorporates residual blocks. The model is trained to distinguish malware from goodware using image representations of the files generated with fractal geometry. It reaches 94% accuracy on the test dataset.
- Deep Learning Model: Utilizes a CNN with residual blocks for enhanced feature extraction and learning.
- High Accuracy: Achieves 94% accuracy on the test dataset.
- Precision, Recall, and F1-Score:
  - Precision (Malware): 93.19%
  - Recall (Malware): 93.10%
  - F1-Score (Malware): 93.15%
- Image-Based Classification: Converts malware and goodware files into images for classification.
- Residual Blocks: Improve gradient flow and model performance by mitigating the vanishing-gradient problem.
The model was tested on a dataset of malware and goodware images, yielding the following results:
- Total Correct Predictions: 1863
- Total Incorrect Predictions: 137
- Test Accuracy: 94%
- Precision (Malware): 93.19%
- Recall (Malware): 93.10%
- F1-Score (Malware): 93.15%
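As a quick consistency check, the reported figures can be recomputed from their definitions. The helper names below are hypothetical and not part of the project's code. The reported counts give 1863 / (1863 + 137) = 93.15%, slightly below the rounded 94% headline figure. F1 is the harmonic mean of precision and recall, so recomputing it from the two rounded percentages only matches the reported 93.15% to within rounding.

```python
# Recompute evaluation metrics from their definitions (a sketch, not the project's evaluation code).


def accuracy(correct: int, incorrect: int) -> float:
    """Fraction of test images classified correctly."""
    return correct / (correct + incorrect)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 for the positive (malware) class from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of "malware" predictions that really are malware
    recall = tp / (tp + fn)     # share of real malware that the model flags
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    print(f"Accuracy from counts: {accuracy(1863, 137):.2%}")  # 93.15%
    print(f"F1 from reported precision/recall: {f1_score(0.9319, 0.9310):.2%}")
```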
The model architecture consists of:
- Initial Conv Layer: A convolutional layer with 8 filters, followed by batch normalization and max pooling.
- Residual Blocks: Three residual blocks with increasing numbers of filters (16, 32, 64) to capture hierarchical features.
- Fully Connected Layers: Dense layers with dropout for regularization.
- Output Layer: A single neuron with a sigmoid activation function for binary classification.
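The layer list above can be sketched with the Keras functional API. This is a minimal illustration, not the project's actual model: the function names `residual_block` and `build_model` are hypothetical, and the 3x3 kernels, 2x2 max pooling, 1x1 projection on the shortcut, input rescaling, global average pooling, dense widths (128, 64), dropout rate (0.5), 3 input channels and Adam optimizer are all assumptions. Only the filter counts (8, then 16/32/64), batch normalization, dropout and the single sigmoid output come from the description; the 512x512 default matches `img_height`/`img_width` in the customization settings.

```python
# A minimal Keras sketch of a CNN with residual blocks, under the assumptions stated above.
import tensorflow as tf
from tensorflow.keras import layers


def residual_block(x, filters: int):
    """Two 3x3 convolutions whose output is added back to the (projected) input."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Assumed 1x1 projection so the shortcut's channel count matches `filters`.
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])  # skip connection: F(x) + x
    y = layers.ReLU()(y)
    return layers.MaxPooling2D(2)(y)  # halve the spatial resolution after each block


def build_model(img_height: int = 512, img_width: int = 512, channels: int = 3) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(img_height, img_width, channels))
    x = layers.Rescaling(1.0 / 255)(inputs)  # assumed: pixel values scaled to [0, 1]
    # Initial conv layer: 8 filters, batch normalization, max pooling.
    x = layers.Conv2D(8, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.MaxPooling2D(2)(x)
    # Three residual blocks with 16, 32 and 64 filters.
    for filters in (16, 32, 64):
        x = residual_block(x, filters)
    x = layers.GlobalAveragePooling2D()(x)
    # Fully connected layers with dropout for regularization (widths/rate are assumptions).
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    # Single sigmoid neuron for binary classification.
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Global average pooling is used here instead of flattening because, at 512x512, four rounds of 2x2 pooling still leave a 32x32x64 feature map (65,536 values) feeding the first dense layer.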
The dataset consists of images representing malware and goodware files. The images are stored in two directories:
- GW: Contains images of goodware.
- MW: Contains images of malware.
The dataset is split into training and validation sets with a 90:10 ratio.
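The 90:10 split can be sketched with the standard library. In a Keras project this is commonly done with `tf.keras.utils.image_dataset_from_directory(..., validation_split=0.1, subset=..., seed=...)`, which infers labels from sub-directory names in alphanumerical order (GW -> 0, MW -> 1). The sketch below makes that logic explicit. The helper names, the accepted file extensions and the seed value are hypothetical.

```python
# Hypothetical helpers: label images from the GW/ and MW/ folders and make a
# reproducible 90:10 train/validation split.
import random
from pathlib import Path

CLASS_NAMES = ("GW", "MW")  # alphabetical order, so GW -> 0 (goodware), MW -> 1 (malware)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image formats


def load_labeled_paths(data_dir):
    """Return (path, label) pairs for every image under data_dir/GW and data_dir/MW."""
    root = Path(data_dir)
    pairs = []
    for label, class_name in enumerate(CLASS_NAMES):
        class_dir = root / class_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"expected sub-directory {class_dir}")
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:  # skip non-image files
                pairs.append((path, label))
    return pairs


def train_val_split(pairs, val_fraction=0.1, seed=123):
    """Shuffle with a fixed seed and hold out val_fraction of the items for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split on every run
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Fixing the seed matters: if the training and validation subsets were drawn with different shuffles, some validation images could leak into training.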
To run this project locally, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/malware-detection-cnn.git
cd malware-detection-cnn
- Install dependencies:
pip install -r requirements.txt
- Download the dataset:
  - Place your dataset in the `C:\Data for Classification` directory.
  - Alternatively, update the `data_dir` variable in the code to point to your dataset location.
- Run the code:
python main.py
- The script should be saved as `main.py` in the root directory of the repository so that it runs with the command above; if you use a different file name, adjust the command accordingly.
- Customization:
  - To change the dataset path, update the `data_dir` variable in the code.
  - To modify hyperparameters (e.g., batch size, image size, epochs), edit the following lines in the code: `batch_size = 8`, `img_height = 512`, `img_width = 512`, and `epochs = 20`.
- Dataset Structure: The dataset should be organized as follows:
  - `Data for Classification/`
    - `GW/`: goodware images (`image1.jpg`, `image2.jpg`, ...)
    - `MW/`: malware images (`image1.jpg`, `image2.jpg`, ...)
- Testing the Model:
  - After training, the model is automatically tested on a subset of the dataset.
  - The results (accuracy, precision, recall, F1-score) are printed to the terminal.
  - You can also test the model on custom images by modifying the `test_images` section of the code.
- Visualizing Results:
  - The training and validation accuracy/loss plots are displayed after training.
  - A diagram of the model architecture is saved as `model_plot.png` in the project directory.
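When testing on custom images, the single sigmoid output has to be turned back into a class name. The sketch below assumes labels were assigned alphabetically (GW -> 0, MW -> 1), so the output is read as the estimated probability of malware, and it uses an assumed decision threshold of 0.5. The helper names are hypothetical and not taken from the project's `test_images` code.

```python
# Hypothetical post-processing for predictions on custom images.
CLASS_NAMES = ("GW", "MW")  # assumed label order: 0 = goodware, 1 = malware


def scores_to_labels(scores, threshold=0.5):
    """Convert sigmoid outputs in [0, 1] to 'GW'/'MW' class names."""
    labels = []
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"sigmoid output out of range: {score}")
        labels.append(CLASS_NAMES[int(score >= threshold)])  # >= threshold -> malware
    return labels


def count_correct(predicted, actual):
    """Return (correct, incorrect) counts, as in the test summary."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct, len(actual) - correct
```

With a trained Keras model, `scores_to_labels(model.predict(batch).ravel())` would map a batch of predictions to class names. Raising the threshold trades malware recall for precision.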
Contributions are welcome! If you'd like to contribute, please fork the repository and create a pull request. You can also open an issue for any bugs or feature requests.
Thanks to the TensorFlow and Keras teams for providing the tools to build this model.
Special thanks to the creators of the dataset used in this project.
For any questions or feedback, feel free to reach out:
Email: [email protected]
GitHub: Aktham9