Sorry not following what you mean here when you say “check it against table 1 when it’s entered”.
Tried this before, also as a temp table and ran into the same performance issues.
Sorry not following what you mean here when you say “check it against table 1 when it’s entered”.
Tried this before, also as a temp table and ran into the same performance issues.
Well if this was say sales data and you controlled the code where it’s entered into the table, you could do stuff there. So one at a time you’re either adding a column to table 1 or table 2, based on a match with the other, when the data is entered. But if it’s tables you’re getting whole from somewhere else you don’t control, then you can’t do much.
What about running a job on table 1 every night where you get a dynamic list of the lengths?
The gotcha is if you do that you’re making 7+ chained calls. Instead you should divide the count by the results per page, ceil it, and then do all 7+ calls at once in (js) Promise.all.
Ah I see, no i don’t have any control over that.
Re the dynamic lists, I ran the query just using a hatdcoded list and it took 15 mins to run, so I assume a dynamic list wouldn’t improve things?
No it would just mean you don’t have to change your code every time a new length appears. Also if a length disappears you wouldn’t waste time on it.
How long are both of these tables?
Agree, but the 15 min vs 1 min difference is going to fly with the users.
Table 1 - 3 million records
Table 2 - 400 million records
Right - but I’m saying maybe you could run the job to get the lengths every night or something, so it’s ready whenever you want to run the main query.
Is this VIN numbers? I’ve had to work with those before. They’re a lot of fun. Especially all the zeros entered as O, l vs 1, etc.
But haven’t I effectively tested this already by creating a hardcoded list and joining to that, which took 15 mins? Sorry I’m not the most technical so I might be missing something obvious.
Nah they’re bond trade identifiers.
Ok which query was it that only takes a minute?
FWIW a complex join of 3 million into 400 million isn’t the kind of thing you expect to happen super quick. It’s more like the kind of job you put on a schedule and make the results available once a day or something.
The query where I union 5 queries together, where each join has the length hardcoded.
table1.attribute = substring(table2.attribute, 4, 8)
Union
table1.attribute = substring(table2.attribute, 4, 9)
Union
table1.attribute = substring(table2.attribute, 4, 10)
Union
table1.attribute = substring(table2.attribute, 4, 13)
Union
table1.attribute = substring(table2.attribute, 4, 16)
Right. So can’t you dynamically generate that query before running it? If the number of lengths has to be absolutely up to the second, then maybe not. But if you can run the length job on a schedule, then use that to dynamically generate the query then run the query - you should get the same results, right?
I’m not sure i have the permissions to dynamically create the query, but that certainly seems like it would be the option to explore.
Thanks so much for your time and patience!
No problem I enjoy data stuff.
There might actually be a slicker way to do what you’re trying to do within the structure of Bigquery. I’m not familiar with that and I suspect no one else here is either. I assume they have forums or some resource.
Does it help to pre-trim the first 4 characters from the table2 attribute?
I’ll give this a try too
haha wanting to relax on a sunday but being on call
my phone starts buzzing with weird failures I dont normally get on a rotation. chat rooms buzzing. cloud outage, ok. can’t bring my stuff back. error messages really weird and bad.
check my email after lunch, 40 more alarms on fire. email from cloud provider: “We have identified the root cause of the thermal incident in this availability zone” AKA “OUR SERVER LITERALLY CAUGHT ON FIRE LUL SORRY”.
it actually is the jackpot of on call because you cant really do anything since everything is in cloud now
cloud computing is great. server on fire today and I’m on call, if it was on prem I’d be having a horrible weekend, instead I just communicate with the cloud provider, fail over to another cluster, and call it a day while I watch graphs come back online. plus, I’m sure we get some kind of voucher now.
So here’s one, new guy comes on my team from a project that has been cancelled or something, (I’m tech lead) and like 3 times now he schedules time because he can’t figure out how to do this feature, which while somewhat complicated isn’t anything special and should be doable by anyone who knows anything about react/js.
On these calls I’m like “uhh you go in and debug it and figure it out and do the needful here” while thinking “yeah this guy just wants me to do this for him” and am just not gonna do that IDK what do I even do? Tell manager he’s not working out?
If it should be doable for someone coming in fresh to the team/project then it’s definitely a red flag. That said I’d probably hand hold him through it this one time and see what happens next time.
Anyone know anything about laptops? I’m looking for one that will run Visual Studio and SQL Server Express without grinding to a halt. A decent display would be nice too, reasonable battery life. Don’t care about graphics, don’t really care about weight.