At the moment the approach is to load these files into a database (PostgreSQL) and then run test cases against them using Python.
Running the tests on all the files (~80 GB) takes around 7–8 hours.
Even though the process is largely automated, it has a few drawbacks, namely a lack of proper reporting and the need for manual intervention.
I want to optimise and improve this process.
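For context, here is a minimal sketch of the pattern I mean: data-quality checks expressed as SQL against the loaded tables, with the results collected into a report rather than failing fast. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table and column names (`orders`, `customer_id`, `amount`) are made up for illustration:

```python
import sqlite3

# In-memory SQLite stands in for the real PostgreSQL instance;
# the schema and data below are hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 11, -3.0), (3, NULL, 12.5);
""")

# Each check is (name, SQL returning the count of violating rows).
checks = [
    ("no_null_customer",
     "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL"),
    ("no_negative_amount",
     "SELECT COUNT(*) FROM orders WHERE amount < 0"),
    ("no_duplicate_ids",
     """SELECT COUNT(*) FROM (
            SELECT order_id FROM orders
            GROUP BY order_id HAVING COUNT(*) > 1)"""),
]

# Run every check and build a report instead of stopping at the first failure.
report = {name: conn.execute(sql).fetchone()[0] for name, sql in checks}
failed = {name: n for name, n in report.items() if n > 0}
print(report)  # {'no_null_customer': 1, 'no_negative_amount': 1, 'no_duplicate_ids': 0}
```

In the real pipeline each check runs against the full tables, so the SQL itself dominates the runtime, and the `report` dict is what currently has to be inspected by hand.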
Is the above approach the best way of testing data?
Are there better ways to do this?
What approaches do you use to test data?
Can languages like R or Scala be useful for this kind of testing?
Looking for suggestions and pointers.