| --- |
| datasets: |
| - priamai/AnnoCTR |
| base_model: |
| - urchade/gliner_small-v1 |
| tags: |
| - Security |
| - NER |
| - CTI |
| language: |
| - en |
| --- |
| # AITSecNER - Entity Recognition for Cybersecurity |
|
|
| This repository shows how to use **AITSecNER**, a model hosted on Hugging Face and built on the GLiNER library, to extract cybersecurity-related entities from text. |
|
|
| ## Installation |
|
|
| Install GLiNER via pip: |
|
|
| ```bash |
| pip install gliner |
| ``` |
|
|
| ## Usage |
|
|
| ### Import and Load Model |
|
|
| Load the pretrained AITSecNER model directly from Hugging Face: |
|
|
| ```python |
| from gliner import GLiNER |
| |
| model = GLiNER.from_pretrained("selfconstruct3d/AITSecNER", load_tokenizer=True) |
| ``` |
|
|
| ### Predict Entities |
|
|
| Define the input text and entity labels you wish to extract: |
|
|
| ```python |
| # Example input text |
| text = """ |
| Upon opening Emotet maldocs, victims are greeted with fake Microsoft 365 prompt that states |
| “THIS DOCUMENT IS PROTECTED,” and instructs victims on how to enable macros. |
| """ |
| |
| # Entity labels |
| labels = [ |
|     'CLICommand/CodeSnippet', 'CON', 'DATE', 'GROUP', 'LOC', |
|     'MALWARE', 'ORG', 'SECTOR', 'TACTIC', 'TECHNIQUE', 'TOOL' |
| ] |
| |
| # Predict entities |
| entities = model.predict_entities(text, labels, threshold=0.5) |
| |
| # Display results |
| for entity in entities: |
|     print(f"{entity['text']} => {entity['label']}") |
| ``` |
|
|
| ### Sample Output |
|
|
| ```text |
| Emotet => MALWARE |
| Microsoft => ORG |
| ``` |
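
The list returned by `predict_entities` can be post-processed like any list of dictionaries. The following is a minimal sketch, not part of the model or of GLiNER: `group_entities` is a hypothetical helper that drops low-confidence predictions and groups the remaining entity texts by label. It assumes each entity dictionary carries a `score` field next to `text` and `label`, as recent GLiNER releases appear to provide. The `example_entities` list is a hand-written stand-in with placeholder scores, not real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group unique entity texts by label, skipping predictions below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        # Assumption: a "score" key is present; entries without one are kept.
        if entity.get("score", 1.0) < min_score:
            continue
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


# Hand-written stand-in for model output (placeholder scores, not real predictions).
example_entities = [
    {"text": "Emotet", "label": "MALWARE", "score": 0.9},
    {"text": "Microsoft", "label": "ORG", "score": 0.6},
    {"text": "Emotet", "label": "MALWARE", "score": 0.8},
]

print(group_entities(example_entities, min_score=0.7))
```

With real predictions, the same call would be `group_entities(entities, min_score=0.7)`, where `entities` is the list produced in the previous step.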
|
|
| ## Model Details |
|
|
| **AITSecNER** was fine-tuned from the [urchade/gliner_small](https://huggingface.co/urchade/gliner_small) model on the [priamai/AnnoCTR dataset](https://huggingface.co/datasets/priamai/AnnoCTR). For details about the dataset, see the paper ["AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports"](https://arxiv.org/abs/2305.10472). |
|
|
| GLiNER is described in detail in the paper ["GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer"](https://arxiv.org/abs/2311.08526). |
|
|
| ## About |
|
|
| **AITSecNER** uses GLiNER to extract cybersecurity-specific entities from unstructured text, which makes it suitable for tasks such as: |
|
|
| - Cyber threat intelligence analysis |
| - Incident response documentation |
| - Automated cybersecurity reporting |
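
The tasks above often involve report-length documents, while GLiNER models process a bounded number of tokens per input (the exact limit depends on the model configuration). The sketch below shows one possible way to handle this, under stated assumptions: `split_into_chunks` and `predict_long_text` are hypothetical helpers, the 200-word chunk size is an arbitrary illustrative value, and each entity dictionary is assumed to carry `start` and `end` character offsets. Entities that straddle a chunk boundary would be missed by this simple approach.

```python
import re


def split_into_chunks(text, max_words=200):
    """Yield (char_offset, chunk) pairs of at most max_words whitespace-separated words."""
    words = list(re.finditer(r"\S+", text))
    for i in range(0, len(words), max_words):
        window = words[i:i + max_words]
        start, end = window[0].start(), window[-1].end()
        yield start, text[start:end]


def predict_long_text(predict, text, max_words=200):
    """Call predict(chunk) per chunk and shift spans back to document offsets."""
    results = []
    for offset, chunk in split_into_chunks(text, max_words):
        for entity in predict(chunk):
            shifted = dict(entity)  # copy so the original prediction is untouched
            shifted["start"] = entity["start"] + offset
            shifted["end"] = entity["end"] + offset
            results.append(shifted)
    return results


# Usage with the model and labels defined earlier (requires the model download):
# entities = predict_long_text(
#     lambda chunk: model.predict_entities(chunk, labels, threshold=0.5),
#     report_text,  # hypothetical variable holding a long report
# )
```

Passing the predictor as a callable keeps the chunking logic independent of the model, so it can be checked with a stub before running real inference.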
|
|
| ## Licence |
| This model is licensed for non-commercial use only (CC BY-NC 4.0). |
| For commercial inquiries, please contact dzenan.hamzic@ait.ac.at. |