I have been looking at the anonymisation of consultation responses, for example when they are released in response to FOI requests.
I think this is one area where centralising capability and functionality in a high-quality, validated tool could save significant time across government and enhance transparency. It also ties in nicely with the use of Consult on the front end: as people read responses there, they could approve or reject the anonymisation, gathering a decent validation set over time.
Feature ideas:
- Use an LLM to tag and mask PII and other sensitive information.
- Organisations will have their own relevant patterns, for example customer IDs or transaction IDs, which may be considered PII. An LLM could generate regex patterns from user-provided example inputs. These could then be applied to the consultation responses, with users validating what each pattern matches.
- A nice UI for people to approve anonymisation. If they are reading responses within Consult anyway, using this time to validate anonymisation could allow us to centrally gather a massive dataset of anonymised consultation responses (a very useful validation dataset), although I appreciate that a validation dataset brings longer-term data storage requirements. I think gathering this across government is one of the key values of this feature.
- There are model architectures that provide zero-shot token classification and can be trained to catch PII, for example gliner-pii-large-v1.0. A model trained on high-quality, validated UK data (gathered via the UI described above) and built on a more recent base model such as gliner_large-v2.5 might be able to perform high-quality, flexible anonymisation, allowing users to target specific missed categories without the cost inherent in running all data through an LLM endpoint.
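To make the pattern-plus-validation idea concrete, here is a minimal sketch in Python. It is not an existing Consult feature. The pattern names, the `CUST-` ID format, and the `Candidate`, `find_candidates` and `apply_masks` helpers are all hypothetical. In practice the patterns would come from an LLM working from user-provided examples, and the `approved` flag would be set through the review UI.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    """One proposed redaction that a reviewer can approve or reject."""
    label: str
    start: int
    end: int
    text: str
    approved: bool = True  # a reviewer flips this to False to reject


# Hypothetical organisation-specific patterns (assumed formats, for
# illustration only). In the proposal these would be LLM-generated from
# user examples and then validated by users.
PATTERNS = {
    "CUSTOMER_ID": re.compile(r"\bCUST-\d{6}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_candidates(text, patterns=PATTERNS):
    """Return every pattern match as a Candidate, ordered by position."""
    found = []
    for label, pattern in patterns.items():
        for match in pattern.finditer(text):
            found.append(
                Candidate(label, match.start(), match.end(), match.group())
            )
    return sorted(found, key=lambda c: c.start)


def apply_masks(text, candidates):
    """Replace approved candidates with a [LABEL] placeholder.

    Works from the end of the text backwards so that earlier offsets stay
    valid. Overlapping candidates are not handled in this sketch.
    """
    masked = text
    for c in sorted(candidates, key=lambda c: c.start, reverse=True):
        if c.approved:
            masked = masked[:c.start] + f"[{c.label}]" + masked[c.end:]
    return masked


response = "My account CUST-123456 was closed. Contact jane.doe@example.org please."
candidates = find_candidates(response)
print(apply_masks(response, candidates))
```

Keeping each match as a separate record with its own `approved` flag means the reviewer decisions can be stored alongside the text, which is what would eventually form the validation dataset described above.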