Here at Data Ladder, we know we have the fastest, most accurate data matching and cleansing solution. So when we received a report from ESG Lab showing the performance of Trifacta's Wrangler Enterprise, we couldn't resist seeing how DataMatch Enterprise compared.
I will say at the outset that the test was a little skewed: we didn't run our tests on the same hardware. In fact, the hardware we used had roughly one quarter of the power of the platform on which Wrangler Enterprise was tested.
Strictly speaking, a fair comparison would be carried out on identical hardware. ESG Lab ran its tests on a single edge server (dual eight-core Intel Xeon processors, 128GB RAM) within a Cloudera Hadoop cluster.
For our tests, we settled on a desktop PC running Windows (spec) in an office in Connecticut. There are a couple of reasons for this:
- We know that not everyone has access to a platform with the kind of specs ESG Lab used. Most people tasked with data cleansing and matching will be working on much lower-spec machines, such as desktop PCs.
- The results are impressive enough given the difference in platforms. If we had run the tests on identical hardware, it would have been highly embarrassing for Trifacta, and we wanted to be nice about it. We are gentlemen, after all.
On to the tests.
The textual transformation tests focused on the sorts of transformations that are very common in data wrangling, including merging, extracting, creating, sorting and joining text. ESG Lab tested seven textual transformations; we settled on three of them: merging, extracting and sorting. In each test, we used 1GB of data to compare against Trifacta's Photon Compute Engine and Apache Spark.
For the merge test, two columns were merged into a new column, with a comma separating the inputs. For the extraction test, text matching a predefined search word or character pattern was extracted. For the sorting test, a column was sorted into alphabetical order. Even if I say so myself, the results were impressive.
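To make the three tests concrete, here is a minimal Python sketch of what each transformation does, run on a handful of made-up rows. It is only an illustration under stated assumptions, not how DataMatch Enterprise or Wrangler Enterprise implements these operations: the sample data, the column names, the extraction pattern and the helper functions `merge_columns`, `extract_pattern` and `sort_column` are all hypothetical.

```python
import re

# Hypothetical sample rows; the actual 1GB test dataset is not described here.
rows = [
    {"first": "Ada", "last": "Lovelace", "note": "ID: A-1001 joined 1843"},
    {"first": "Alan", "last": "Turing", "note": "ID: T-1936 joined 1936"},
    {"first": "Grace", "last": "Hopper", "note": "no id recorded"},
]


def merge_columns(rows, a, b, new, sep=","):
    """Merge columns a and b into a new column, separated by a comma."""
    for row in rows:
        row[new] = f"{row[a]}{sep}{row[b]}"
    return rows


def extract_pattern(rows, src, new, pattern):
    """Store the first match of a character pattern in a new column (None if absent)."""
    regex = re.compile(pattern)
    for row in rows:
        match = regex.search(row[src])
        row[new] = match.group(0) if match else None
    return rows


def sort_column(rows, key):
    """Return the rows sorted alphabetically (case-insensitively) by one column."""
    return sorted(rows, key=lambda r: r[key].casefold())


rows = merge_columns(rows, "first", "last", "full")
# Hypothetical pattern: one capital letter, a hyphen, then four digits.
rows = extract_pattern(rows, "note", "id", r"[A-Z]-\d{4}")
rows = sort_column(rows, "last")
print([r["full"] for r in rows])  # ['Grace,Hopper', 'Ada,Lovelace', 'Alan,Turing']
```

The case-insensitive sort key is a design choice for this sketch; a production tool may apply locale-aware collation instead.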
First, Trifacta's Photon outperformed Spark in every test, so I'll save some typing by skipping the Spark comparison and focusing on Wrangler Enterprise versus DataMatch Enterprise.
In the merging test, DataMatch Enterprise completed the merge in 20 seconds, compared with 50 seconds for Wrangler Enterprise.
In the extraction test, the results were the same: 20 seconds for Data Ladder compared to 50 seconds for Trifacta.
In the sorting test, DataMatch Enterprise completed the sort in 12 seconds, while Wrangler Enterprise took 25 seconds.
Even using one quarter of the processing power, DataMatch Enterprise outperformed both Trifacta's Photon Compute Engine and Apache Spark.
Credit where it’s due
In fairness, we have to give props to Trifacta for Wrangler Enterprise. It's a good tool with some impressive features and results, but it seems clear that, once again, Data Ladder has beaten a so-called 'industry leader'.
If you’re looking for a data wrangling tool that is fast, accurate and inexpensive, you owe it to yourself to include DataMatch Enterprise in your evaluation, and we’d love to help you.