You’re giving this person a lot of credit. It’s probably all in the same table, and this idiot is probably doing something like a for-loop over an integer range (the length of the table) where it pulls the entire table down on every iteration, dumps it to a local file, and then uses plain-text search or some really bad regexes to find the data they’re looking for.
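Something like this, to sketch it out (the database, table, and search pattern here are all made up for illustration):

```python
import re
import sqlite3

conn = sqlite3.connect("prod_copy.db")  # hypothetical database

# Count the rows once...
row_count = conn.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]

# ...then loop over an integer range, re-downloading the ENTIRE table
# on every single iteration. Note the loop variable isn't even used.
for i in range(row_count):
    rows = conn.execute("SELECT * FROM big_table").fetchall()  # full scan, every time

    # Dump the whole thing to a local file, again on every iteration.
    with open("dump.txt", "w") as f:
        for row in rows:
            f.write(",".join(str(col) for col in row) + "\n")

    # Grep the dump with a sloppy regex to find "the data".
    with open("dump.txt") as f:
        for line in f:
            if re.search(r".*widget.*", line):  # hypothetical pattern
                print(line.strip())
```

That’s a full table scan plus a full file write per row, so the total work grows quadratically with the table size.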
Considering that’s nearly word-for-word some of the answers I’ve received during the technical portion of interviews for junior data engineering roles, you’re probably not far off.
Shit, I’ve seen production solutions that look exactly like that, fighting the optimiser every step of the way (amongst other things).
I have to admit I still have some legacy code that does that.
Then I found pandas. Life changed for the better.
Now I have lots of old code that I’ll update, “one day”.
However, even my old code, terrible as it is, doesn’t overheat anything, and it can process data sets massively larger than 60,000 rows with no issue beyond poor efficiency.
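For comparison, the pandas version of that whole dance is basically one call. This is just a minimal sketch assuming the same hypothetical database and column names as above:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect("prod_copy.db")  # hypothetical database

# One query, one trip to the database, with the filtering pushed into SQL
# instead of dumping everything to disk and grepping it.
df = pd.read_sql_query(
    "SELECT * FROM big_table WHERE description LIKE '%widget%'",
    conn,
)
print(df)
```

Letting the database do the filtering is what stops you from fighting the optimiser in the first place.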