Wednesday, 6 February 2008

Testing with Real Data

This looks at the issue of testing with production data. It says that testing with production data is risky because there is a higher chance of the data being leaked. Yes, this is a valid risk and a real problem.

But how do you actually obfuscate the data? What information needs to be protected and kept from being linked together? Names, DOBs and IRD numbers should be protected. But how do you do it? You need to think about what application logic could be affected by the obfuscation.

For instance, you can't just pick a random birth year. Say you are on a benefit system where only people over 65 can get Super. After the obfuscation you may have people younger than 65 with Super, and this could cause errors that would never happen in production because the front end validates the data. The day and month can be important too: when you turn five you can go to school, and when you turn 16 you can leave school, so the exact birthday matters.
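One way around this is to keep the day and month and only move the birth year, rejecting any move that pushes the person across an age threshold. The following is a rough sketch of that idea, not a finished tool. The thresholds are just the ones from the examples above, I have read "over 65" as "65 or older", and the names obfuscate_dob, age_on and age_band are made up for illustration.

```python
import random
from datetime import date

# Assumed thresholds taken from the examples in the post (school at 5,
# leaving school at 16, Super at 65); a real system would have its own.
AGE_THRESHOLDS = (5, 16, 65)


def age_on(dob, today):
    """Whole years between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def age_band(age):
    """Number of thresholds already reached: 0 for under 5, 3 for 65 and over."""
    return sum(age >= t for t in AGE_THRESHOLDS)


def obfuscate_dob(dob, today, rng, max_shift=5):
    """Move the birth year by up to max_shift years, keeping day, month and age band."""
    band = age_band(age_on(dob, today))
    candidates = []
    for shift in range(-max_shift, max_shift + 1):
        if shift == 0:
            continue
        try:
            candidate = dob.replace(year=dob.year + shift)
        except ValueError:
            continue  # 29 February does not exist in the shifted year
        if candidate <= today and age_band(age_on(candidate, today)) == band:
            candidates.append(candidate)
    # Falling back to the real date leaks it; a real tool should flag this case.
    return rng.choice(candidates) if candidates else dob


# Example: a pensioner stays a pensioner and keeps a 15 March birthday.
today = date(2008, 2, 6)
print(obfuscate_dob(date(1940, 3, 15), today, random.Random(0)))
```

Keeping the day and month makes the data easier to re-identify than a fully random date would, so this trades some protection for data that still behaves like production.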

OK, names: they should just be repopulated from a dictionary. But does your dictionary of names push all the limits of the column storing the data and the fields displaying it? What happens if there is a really long first name and a really long surname? Your dictionary may only ever pair a long first name with a short surname, so you might miss the case where the combined length breaks the GUI or some report.
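A simple guard is to make sure the replacement names always include the boundary combinations, not just whatever the random picks happen to produce. Here is a small sketch of that. The name lists and the column widths (20 and 30 characters) are invented for the example, and stretch, boundary_pairs and replacement_names are hypothetical helpers, not part of any library.

```python
import random

# Made-up dictionary and assumed column widths, for illustration only.
FIRST_NAMES = ["Al", "Maria", "Christopher"]
SURNAMES = ["Ng", "Smith", "Featherstonehaugh"]
FIRST_MAX = 20
SURNAME_MAX = 30


def stretch(name, length):
    """Repeat a name until it fills the column exactly, e.g. 'Al' -> 'Alalal'."""
    return (name + name.lower() * length)[:length]


def boundary_pairs(first_names, surnames, first_max, surname_max):
    """Pairs covering shortest and column-filling names in every combination."""
    short_first = min(first_names, key=len)
    short_sur = min(surnames, key=len)
    long_first = stretch(max(first_names, key=len), first_max)
    long_sur = stretch(max(surnames, key=len), surname_max)
    return [
        (short_first, short_sur),
        (long_first, short_sur),
        (short_first, long_sur),
        (long_first, long_sur),  # the combined-length case that is easy to miss
    ]


def replacement_names(count, rng):
    """Return at least `count` name pairs, always including the boundary pairs."""
    pairs = boundary_pairs(FIRST_NAMES, SURNAMES, FIRST_MAX, SURNAME_MAX)
    while len(pairs) < count:
        pairs.append((rng.choice(FIRST_NAMES), rng.choice(SURNAMES)))
    rng.shuffle(pairs)
    return pairs


for first, surname in replacement_names(6, random.Random(1)):
    print(len(first) + len(surname), first, surname)
```

The stretched names look silly, but that does not matter for testing whether a screen or report copes with the longest values the columns allow.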

But now organisations are starting to integrate all their different systems. What data is used to join the pieces of data together? Names, DOBs, etc. are often used in keys, so if each system is obfuscated independently the records no longer match up.
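One approach that can keep joins working is to replace a key value with a keyed hash, so the same input always becomes the same token in every system. This is only a sketch under some big assumptions: every system has to be obfuscated with the same secret and the same normalisation, and the key, the pseudonym function and the "IRD-0001" style values are all made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret; it must be shared by every system being obfuscated
# and kept out of the test environment itself.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonym(value, key=SECRET_KEY):
    """Map a key value to a stable 16-character token using HMAC-SHA256."""
    normalised = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()[:16]


# Two made-up systems that format the same identifier slightly differently.
benefits = {
    pseudonym("IRD-0001"): "benefit record",
    pseudonym("IRD-0002"): "other benefit record",
}
schools = {pseudonym(" ird-0001 "): "school record"}

# The join still works because both sides produce the same token.
joined = [(benefits[k], schools[k]) for k in benefits if k in schools]
print(joined)
```

Anyone who holds the key can rebuild the mapping by hashing every possible value, which is easy for something like a date of birth, so this protects the data from people without the key rather than making it anonymous. It also does nothing for the application logic problems above, since a token is no longer a valid name or date.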

So there are cases where you do have to use real data, and obfuscation may not always work.