Impressively in-depth! I'll be sharing this around. Some questions:
1. Is "Full retraining" an example of 'Exact unlearning'? Based on how the headers are organized, I'd say yes, but based on your wording/explanations, I'd say no.
2. In the 'Exact Unlearning' section, you write: "These approaches [retraining and exact unlearning] are usually the next best thing to training data properly the first time". What is the difference between retraining and 'training data properly the first time'?
3. In the "Full retraining" section, you say "potentially adjusting parameters or weights". Why only "potentially"? If you're retraining from scratch, aren't the parameters guaranteed to change? I think I've misunderstood.
4. "need to dig through and find all those relevant vectors". If I understood full retraining and SISA correctly, nobody digs through weights/parameters/vectors; the internals of the ML models are treated like black boxes, I think.
Great questions, and I will respond with some answers shortly.

Ok -- now a little time:
1. Full retraining would be an example of exact unlearning -- at least as I understand it (and as most of the research conveys it). Generally, exact unlearning's core definition is that it's verifiable & guaranteed. If you re-train a model without the data you want to remove (and you don't let it do something silly like connect to web search), you can definitively guarantee that the LLM does not include the unwanted data. I think this would hold even in the legal sense: if 'John Smith' asked for his data to be removed, you did so, and a 'John Smith' still appeared in another context, his data would still have been removed as far as the John Smith who made the request is concerned. (There's a small sketch of this workflow just after this list.)
2. What I meant by 'training data properly the first time' was training the LLM without the offending data at all, so there is nothing to remove later. 'Offending data' might be copyrighted works or, in my world, the personal data of individuals.
3. No, you haven't misunderstood -- I think you're correct. Retraining from scratch would, by definition, guarantee that the weights and parameters are changed.
4. Possibly. I think I read something in one paper that mentioned tweaking vectors -- or biasing certain connections to avoid problematic outputs, notwithstanding the John Smith example I mentioned above. This might have been the Harry Potter paper (which, admittedly, wasn't on exact unlearning). But it's been over a year, and I'm not sure I have the willpower to dig through the research again. If I do find it, I will let you know. (The second sketch below shows the general idea of nudging weights away from certain outputs; it is not that paper's method.)
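To make point 1 a bit more concrete, here is a minimal sketch of full retraining as exact unlearning. Everything in it (the record schema, the `subject` field, the toy classifier) is hypothetical and only for illustration; the point is simply that the new model is trained from scratch on a dataset that provably no longer contains the requester's records.

```python
# Hypothetical sketch: full retraining as exact unlearning.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training records, each tied to the data subject it came from.
records = [
    {"subject": "john_smith",  "text": "john smith lives in springfield", "label": 1},
    {"subject": "jane_doe",    "text": "jane doe enjoys hiking",          "label": 0},
    {"subject": "other_smith", "text": "a different john smith, a baker", "label": 1},
    {"subject": "jane_doe",    "text": "jane doe works in finance",       "label": 0},
]

def train_model(data):
    """Train from scratch on exactly the records passed in."""
    texts = [r["text"] for r in data]
    labels = [r["label"] for r in data]
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model

# Original model, trained on everything.
model_v1 = train_model(records)

# Erasure request arrives from the John Smith who is subject "john_smith".
# Exact unlearning here = drop *his* records and retrain from scratch.
# Records mentioning a different John Smith are untouched, which is why the
# name can still appear elsewhere without breaching the requester's rights.
retained = [r for r in records if r["subject"] != "john_smith"]
model_v2 = train_model(retained)
```

The obvious cost is a full training run for every request you honour, which is what approaches like SISA try to reduce by only retraining the shard(s) that contained the removed data.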
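On point 4, to be clear, the sketch below is not what the Harry Potter paper does; it is just a generic illustration of what 'biasing the weights away from certain outputs' can look like in practice: gradient ascent on a small 'forget set', an approximate unlearning technique rather than an exact one. The model, data, and hyperparameters are all made up.

```python
# Generic illustration of approximate unlearning: gradient *ascent* on a
# "forget set" nudges existing weights away from unwanted outputs instead
# of retraining from scratch.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in model; in practice this would be a pretrained network.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()

# Made-up "forget set": inputs whose learned associations we want to weaken.
forget_x = torch.randn(4, 8)
forget_y = torch.tensor([1, 1, 0, 1])

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(forget_x), forget_y)
    # Maximise (rather than minimise) the loss on the forget set,
    # pushing the weights away from reproducing those outputs.
    (-loss).backward()
    optimizer.step()
```

Unlike the retraining sketch above, this gives no hard guarantee that the information is actually gone, which is exactly why it sits on the approximate side of the exact/approximate divide.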
These are all great points though, and a reminder that even if something was clear to me, it's not always clear to someone with fresh eyes.