My current assignment requires me to test a large number of files (~200), usually ranging between 1 GB and 20 GB each.
All the files share the same schema, which includes coordinates.

As of now, the approach is to load these files into a database (PostgreSQL) and then fire test cases against it.
Running the tests on all the files (~80 GB) takes around 7-8 hours.
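To make the current setup concrete, here is a minimal sketch of the load-then-query testing pattern described above. The table name, column names, and validity checks are all assumptions for illustration (the post doesn't give the actual schema), and an in-memory SQLite database stands in for PostgreSQL so the sketch is self-contained:

```python
import sqlite3

# Hypothetical sample rows mimicking the shared schema: an id plus coordinates.
ROWS = [
    (1, 12.97, 77.59),
    (2, 28.61, 77.21),
    (3, 91.00, 200.00),  # deliberately out-of-range coordinates
]

def load_and_validate(rows):
    """Load rows into a table, then run SQL test cases against it.

    Returns a dict mapping test-case name -> number of offending rows.
    Uses an in-memory SQLite DB as a stand-in for PostgreSQL.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE points (id INTEGER, lat REAL, lon REAL)")
    con.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)

    # Each test case is a query that counts rows violating an invariant.
    checks = {
        "lat_in_range": "SELECT COUNT(*) FROM points WHERE lat NOT BETWEEN -90 AND 90",
        "lon_in_range": "SELECT COUNT(*) FROM points WHERE lon NOT BETWEEN -180 AND 180",
        "no_null_ids": "SELECT COUNT(*) FROM points WHERE id IS NULL",
    }
    failures = {name: con.execute(sql).fetchone()[0] for name, sql in checks.items()}
    con.close()
    return failures

print(load_and_validate(ROWS))
# → {'lat_in_range': 1, 'lon_in_range': 1, 'no_null_ids': 0}
```

Collecting the per-check failure counts into a dict like this is also a cheap way to address the reporting gap: the dict can be dumped to JSON or a summary table after each run.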

Even though the process is largely automated, it has a few drawbacks, chiefly a lack of proper reporting and the need for manual intervention.

I want to optimise and improve this process.

Is the above approach the best way to test this data?
Are there better ways to do this?

What tools do you use to test data?

Can data-oriented languages like R or Scala be useful for this kind of testing?

Looking for suggestions and pointers.

