# (WIP) Training Issues

### Understanding AI Training Cycles

AI systems, particularly those based on machine learning (ML) and deep learning (DL), require extensive training cycles using large datasets to learn and make predictions or decisions. These cycles can be broadly categorized into three phases: data collection, model training, and model validation. Each phase presents unique security challenges. The sections below outline some possible attacks in each of those three phases.

#### Possible Data Collection **Security Issues**:

* **Data Poisoning**: Malicious actors may introduce corrupted data into the dataset, aiming to skew the AI model's learning process. Discovery typically involves data validation techniques and anomaly detection to identify outliers that do not fit the expected data distribution.
* **Privacy Leaks**: Collecting data from individuals without adequate consent or security measures can lead to privacy breaches. Discovery involves auditing data collection processes; mitigation involves privacy-preserving techniques like differential privacy.
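
As a rough illustration of the anomaly detection mentioned under Data Poisoning, the sketch below flags values in a single numeric feature that sit far from the median. It uses a modified z-score based on the median absolute deviation (MAD), which is one of several possible techniques and not necessarily the one a given pipeline should use. The function name `flag_outliers`, the threshold of 3.5, and the sample data are assumptions made for this example.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half of the values equal the median, so there is no spread
        # to scale by; flag anything that deviates at all.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]

# Hypothetical feature column with one injected value.
feature_column = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
suspicious = flag_outliers(feature_column)
print(suspicious)  # flags index 5 (the value 55.0)
```

A check like this only catches crude poisoning. Carefully crafted poisoned samples can stay inside the expected distribution, so it is a first filter and not a complete defense.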

**Risks**:

* Skewed AI decisions, potentially causing financial loss or reputational damage.
* Legal repercussions from privacy violations.
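
To make the differential privacy technique mentioned under Privacy Leaks more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The names `laplace_noise` and `private_count`, the sample records, and the choice of `epsilon` are assumptions for illustration. A production system would normally rely on a vetted differential privacy library instead of hand-rolled noise.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of records matching `predicate`."""
    true_count = sum(1 for r in records if predicate(r))
    # A counting query has sensitivity 1: adding or removing one person
    # changes the result by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical collected records (ages of individuals).
ages = [23, 35, 41, 29, 52, 61, 38, 45]
rng = random.Random(0)  # seeded only to make the example repeatable
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(round(noisy, 2))
```

Smaller values of `epsilon` add more noise and give stronger privacy at the cost of accuracy.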

#### Possible Model Training **Security Issues**:

* **Adversarial Attacks**: During training, models may be susceptible to adversarial examples designed to mislead AI predictions. These can be discovered through robustness testing, where the model is exposed to various manipulated inputs to assess its response.
* **Overfitting to Sensitive Data**: If a model overfits its training data, it might inadvertently reveal sensitive information through its predictions. Techniques like model auditing and implementing generalization measures (e.g., regularization) can help identify and mitigate this issue.
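
The robustness testing described under Adversarial Attacks can be sketched on a toy model. The example assumes a hypothetical linear classifier with fixed weights, because for a linear model the worst-case perturbation within a per-feature bound `epsilon` can be computed directly: each feature is shifted against the sign of its weight. Real models need gradient-based or search-based attacks instead. All names and numbers here are illustrative.

```python
def predict(weights, bias, x):
    """Hypothetical linear classifier: 1 if w.x + b > 0, otherwise 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def sign(value):
    return (value > 0) - (value < 0)

def worst_case_input(weights, bias, x, epsilon):
    """Shift every feature by epsilon in the direction that pushes the
    score toward the opposite class."""
    direction = -1.0 if predict(weights, bias, x) == 1 else 1.0
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]

def robust_fraction(weights, bias, samples, epsilon):
    """Fraction of samples whose prediction survives the worst-case shift."""
    kept = 0
    for x in samples:
        attacked = worst_case_input(weights, bias, x, epsilon)
        if predict(weights, bias, attacked) == predict(weights, bias, x):
            kept += 1
    return kept / len(samples)

weights, bias = [1.0, -1.0], 0.0
samples = [[2.0, 0.0], [0.0, 2.0], [0.1, 0.0]]
fraction = robust_fraction(weights, bias, samples, epsilon=0.2)
print(fraction)  # two of the three samples keep their label
```

Samples close to the decision boundary, such as the third one above, are the ones an attacker can flip with the smallest change.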

**Risks**:

* Compromised decision-making, leading to security vulnerabilities.
* Unintentional data leakage, compromising user confidentiality.
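
One simple signal for the overfitting issue described above is the gap between accuracy on the training data and accuracy on held-out data. The sketch below computes that gap. The function names and the example predictions are assumptions, and a large gap only suggests memorization, it does not prove that sensitive records can be extracted.

```python
def accuracy(predictions, labels):
    """Share of predictions that match their labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

def generalization_gap(train_preds, train_labels, holdout_preds, holdout_labels):
    """Training accuracy minus held-out accuracy."""
    return accuracy(train_preds, train_labels) - accuracy(holdout_preds, holdout_labels)

# Hypothetical outputs: perfect on training data, half right on held-out data.
gap = generalization_gap(
    train_preds=[1, 0, 1, 1], train_labels=[1, 0, 1, 1],
    holdout_preds=[1, 1, 0, 0], holdout_labels=[1, 0, 0, 1],
)
print(gap)  # 0.5
```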

#### Possible Model Validation **Security Issues**:

* **Insufficient Testing**: Failing to thoroughly test the model against a wide range of scenarios can leave unseen vulnerabilities. This can be discovered through comprehensive testing, including stress and scenario-based tests, to evaluate the model's performance across diverse conditions.
* **Bias and Fairness**: Models might exhibit biased behaviour if not properly validated for fairness. Bias can be discovered through fairness assessments and then addressed with bias mitigation techniques.
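
A fairness assessment can start with something as simple as comparing positive prediction rates across groups. The sketch below computes the demographic parity difference, which is only one of several fairness metrics and may not suit every use case. The function name, group labels, and predictions are hypothetical.

```python
def demographic_parity_difference(predictions, groups):
    """Return the gap between the highest and lowest positive prediction
    rate across groups, plus the per-group rates."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: sum(preds) / len(preds) for g, preds in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical binary predictions for two groups of four people each.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
parity_gap, rates = demographic_parity_difference(predictions, groups)
print(parity_gap, rates)  # 0.5 {'a': 0.75, 'b': 0.25}
```

A gap near zero means the groups receive positive predictions at similar rates. Whether a given gap is acceptable depends on the application and the applicable regulations.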

**Risks**:

* Inadequate model performance under unexpected conditions, potentially endangering users.
* Ethical and legal issues from biased decision-making.

### (WIP) How can you find vulnerabilities in the training phases?

* Training cycle frequency and how data is provided to those cycles
* Feedback mechanisms
* Platform-related issues
* Privacy matters
* Synthetic vs. real data

