Whoops, I didn’t see the second link. It’s so weird to me that it’s a blocking issue.
LOL the more I read about this the more I am amazed at how bad it must be to work with that. There’s literally no support for it whatsoever other than what people have cobbled together through trial and error. Oh my god. What a hellscape.
Not entirely sure it is anymore, I connected to the same VPN provider locally and the curl command works. My friend handled installing the VPN on the cloud so maybe he screwed it up but I doubt it. I think it has to be some sort of blocking issue though, given the reports. Possibly a few VPN IPs work but most don’t.
Re working with the API, as far as I’m aware there is no API anymore. There are JSON endpoints that are used for providing the data for web pages and you can cobble together an “API” out of hitting those endpoints, but there’s no official support for this. In fact, as can be seen from the IP blocking etc, they are hostile to this. They sell the data to companies, some of which are official resellers. Some data, such as granular shot by shot data, is no longer available at all, even on the webpages.
Edit: Yeah I just tried another VPN IP, a US one this time, and the call failed.
My initial reaction which I edited was it seems like people are jumping to conclusions about it being an ip blacklist. We do a lot of blacklisting/whitelisting of IPs at our company and my impression of it is that managing it is a huge pain in the ass. I’m a noob though so maybe it’s just a PITA at my company.
I guess it’s possible I just can’t imagine what it would look like on their end. Maybe someone else can shed some light because I’m really curious. Blocking ALL cloud provider inbound traffic seems like a nuclear option. It would likely make your own services and shit not work unless you also maintained a whitelist of your own services. That’s such a pain.
The ipv4 vs ipv6 thing I want to try. But if a bunch of people are saying it, it’s probably true anyway. I did more digging and this is the conclusion many have come to.
Curious also, what made you want to start such a project? It legitimately sounds like a nightmare.
On the topic of cloud providers, we are a hybrid cloud model so I do a lot of stuff in both AWS and GCP. GCP is soooooo much better of a UI it’s a joke. I love the gcloud console UI. It’s intuitive, simple, easy to use. Very easy to find stuff. The amazon dashboard is a nightmare in comparison. I like google’s documentation much better too.
It’s so much nicer to use that in my next job hunt it would affect my choice of where to work.
Having no problems with the curl command in a gcp vm.
edit: definitely having problems with AWS. The curl command hasn’t timed out yet but I’m not sure what the default timeout is on this instance. It’s been several minutes.
There’s an API that is essentially a “get resource”. It currently responds with a 200 every time the request successfully goes through.
Sometimes that resource has been disabled (but not deleted in the database), and the JSON it responds with includes that information.
Should the API still respond with a 200, or should it reflect the disabled state in the status code, like maybe a 410 Gone or 302 Found or something?
- API should return 200
- API should return something else
I think 500 is ok here. Or maybe one of the other 500s.
Unless this is a request the client should know not to make - then one of the 400s.
It really depends on your client audience and overall strategy imo. If 3rd party clients I’d definitely do one of the error codes so they know something went wrong.
If internal clients and they really want 200 then that’s fine.
My only non-negotiable is human-readable error keys (i.e. "RESOURCE_DISABLED") in a standard place for all errors, e.g. {"Error": {"Key": "RESOURCE_DISABLED"}} or similar.
The web seems split on this but I’ve more often seen 200 OK returned with an error body response than the other way around. I think some of it depends on what kind of error too.
HTTP is at the application layer. If the message was successfully received and processed, I believe 200 is correct, regardless of what the business logic determines about your message. The HTTP request was successful, therefore 200 OK is appropriate.
Basically, I don’t think the HTTP layer is the appropriate layer to reflect business-logic errors.
Yeah, I’m trying to figure out the strategy / what the industry standard is here. My gut says 200: the call to get the resource has successfully gone through.
It can go either way regarding status codes. It depends on if you consider these first-class errors, or just non-happy-path part of the normal flow.
Mainly it depends on what you think will make things easier for the client.
And actually now that I think about it - if you’re going to use an error code - use a custom one. If you use 503, then those could get mixed up with a 503 thrown by a firewall or something.
Maybe 200 for non low-level errors is safer. Or you could use a custom code for “exception, not error”.
For stuff that’s part of the normal flow, but not happy-path, like CREDIT_CARD_DECLINED, I usually go with 200 status code and an Error object like I showed. The client knows that either there is an Error object or there is a good response object. Never both.
The Error object must contain Key, but could also contain a numeric code, a level (ERROR, WARN, INFO, etc.), a full text message, and anything else you want. So it’s easy to write cross-cutting client code, e.g. always show errors in a red lightbox, warnings in yellow, and info in green.
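To make the convention concrete, here’s a minimal sketch of that envelope pattern. The field names (Error, Key, Level) and the color mapping are just the ones from this thread, not any real API’s schema, and handle_response is a hypothetical helper:

```python
# Minimal sketch of the "either an Error object or a good payload" convention.
# Field names (Error, Key, Level) are assumptions from the discussion above.
import json

def handle_response(body: str) -> str:
    """Dispatch on the envelope: Error object present => cross-cutting handling."""
    payload = json.loads(body)
    error = payload.get("Error")
    if error is None:
        return f"ok: {payload}"
    # Cross-cutting client code keyed on Level, e.g. for styling a lightbox.
    colors = {"ERROR": "red", "WARN": "yellow", "INFO": "green"}
    color = colors.get(error.get("Level", "ERROR"), "red")
    return f"{color}: {error['Key']}"

print(handle_response('{"Error": {"Key": "CREDIT_CARD_DECLINED", "Level": "ERROR"}}'))
# → red: CREDIT_CARD_DECLINED
```

The client never has to inspect the payload shape first: the presence or absence of the Error object is the whole contract.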
I voted non-200, but this is pretty compelling, so I want to disavow my vote.
Yeah I agree. Never mind that first stuff I said.
I wrote some code to integrate a rotating proxy service called SmartProxy and paid for a month of it. I confirmed that it’s using different IPs for each request. They’re residential IPs from all over the world, nothing in common, no shared subnets or anything. The requests all time out again. I suspect it’s something the proxy is doing “wrong” on its requests as IP ban seems so implausible but I can’t easily debug it as their server just doesn’t give any hint as to what I’m doing wrong. This is really starting to get on my nerves, lol.
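For reference, the integration is roughly this shape: a requests-style proxies mapping pointed at the service’s gateway, which rotates the exit IP per connection. The gateway hostname, port, and credential placeholders here are made up; substitute whatever your proxy service actually gives you:

```python
# Sketch of routing each request through a rotating proxy gateway.
# The gateway address and credentials below are placeholders, not real values.
def build_proxies(user: str, password: str,
                  gateway: str = "gate.example-proxy.com:7000") -> dict:
    """Build a requests-style proxies mapping for an authenticated gateway.

    A rotating residential service typically hands out a new exit IP per
    connection made through the same gateway address.
    """
    url = f"http://{user}:{password}@{gateway}"
    return {"http": url, "https": url}

proxies = build_proxies("USER", "PASS")
# import requests
# resp = requests.get(some_stats_url, proxies=proxies, timeout=10)
```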
This is weird.
You have to start wondering if maybe there’s some crowd-sourced database of all VPN IPs out there that mega-sites like nba.com pay a lot of money for.
This seemingly has to be IP based.
It also clearly illustrates the difference between a site that wants to block all traffic from VPSes vs., say, a poker site that just wants to make a token effort but still mostly just wants your money.
It gets weirder. Here again is an example curl command:
curl --compressed -H "Accept: application/json, text/plain, */*" -H "Accept-Encoding: gzip, deflate, br" -H "Cache-Control: no-cache" -H "Connection: keep-alive" -H "Pragma: no-cache" -H "X-Nba-Stats-Origin: stats" -H "X-Nba-Stats-Token: true" -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36" -e "https://stats.nba.com/teams/traditional/?sort=W_PCT&dir=-1&SeasonType=Playoffs&Season=2019-20&DateFrom=09%2F11%2F2020&DateTo=09%2F11%2F2020" "https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=09%2F11%2F2020&DateTo=09%2F11%2F2020&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Playoffs&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision="
If I just go to cmd in Windows and execute that on my normal connection, it works no problem.
If I connect to a US IP on my VPN provider, it times out.
BUT. If I go to the page which provides that data, the JSON request on there succeeds via the US IP.
So it’s not an IP ban. I’m struggling to work out wtf the difference is, though. Chrome DevTools’ “copy as cURL” output seems completely broken. I’ve tried it in cmd and bash and the syntax is just incorrect.
If anyone wants to have a crack at this, I can give you a login to the VPN provider in question.
I am descending into Lovecraftian madness trying to figure this out. I can clear cookies and then replay the XHR and it works. But I can’t get Chrome to convert it into any kind of command that will work. I’ve tried curl and fetch.
Holy shit that was a nightmare. Their server is incredibly specific about what it wants. One of the issues was that I needed to send an Origin header for CORS purposes, but the last thing was, I was sending Connection: keep-alive and I needed to send no Connection header instead. I have no goddamn idea why this is the case. It’s very hard to debug when any slight deviation from what it wants is met with no response to your request.
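For anyone replicating this, here’s roughly what that winning combination looks like as a Python header set. Note there is deliberately no "Connection" entry, per the discovery above (as far as I can tell, requests/urllib3 don’t add an explicit Connection header on their own). The Origin and Referer values are my guesses at what the site’s own pages send, so treat them as assumptions:

```python
# Header set translated from the curl command, minus "Connection: keep-alive".
# Origin/Referer values are assumptions based on what the stats pages send.
HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Encoding": "gzip, deflate, br",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
    "Origin": "https://www.nba.com",
    "Referer": "https://stats.nba.com/",
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/85.0.4183.102 Safari/537.36"),
    "x-nba-stats-origin": "stats",
    "x-nba-stats-token": "true",
}
# import requests
# resp = requests.get(url, headers=HEADERS, timeout=15)
```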
My assumption is that the reason basic requests worked on some IPs and not others is that they have different servers serving different locations and that some of them have CORS working and whatnot and others don’t. Who knows.
Don’t know if you’re familiar with it, but in these cases I usually use ngrok to tunnel outgoing requests from the cloud to my local machine, to be able to inspect how they are sent out exactly.
Edit: I used to have a similar problem when trying to call a web API that was extremely specific about the headers it needed, from Azure Data Factory. ADF was adding some perfectly reasonable but unwanted headers that made the API ignore my request. Without ngrok I wouldn’t have been able to see what exactly was causing the problem.
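The ngrok trick just needs something local to receive the request, e.g. a tiny “header dump” server you expose with `ngrok http 8000` and point the cloud/proxy requests at. A minimal sketch (the handler and helper names are my own):

```python
# Minimal local server that echoes back the exact headers it received,
# including their casing. Expose it with: ngrok http 8000
from http.server import BaseHTTPRequestHandler, HTTPServer

def format_headers(header_pairs) -> str:
    """Render header name/value pairs one per line, preserving case."""
    return "\n".join(f"{name}: {value}" for name, value in header_pairs)

class DumpHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        dump = format_headers(self.headers.items())
        print(dump)  # log the headers exactly as they arrived
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(dump.encode())

# To run: HTTPServer(("127.0.0.1", 8000), DumpHandler).serve_forever()
```

Whatever the client is actually sending over the wire shows up verbatim, which is exactly the information a silently dropped request denies you.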
Get Postman or Advanced Rest Client. Match every header the browser is sending.
Also hit your test site to see what headers are coming through, and look at the case closely. One of those tools changes the header case and the other doesn’t, but I forget which.
Case shouldn’t matter in HTTP headers. But nba.com might be playing with that.
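Right: header field names are case-insensitive per the HTTP spec (RFC 7230), so any lookup on the receiving side should normalize case. A quick illustration with a hypothetical helper:

```python
# HTTP header names are case-insensitive per RFC 7230; lookups should
# normalize case rather than assume a particular capitalization.
def get_header(headers: dict, name: str):
    """Look up a header by name, ignoring case, as HTTP requires."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get(name.lower())

hdrs = {"X-Nba-Stats-Origin": "stats", "CONTENT-TYPE": "application/json"}
print(get_header(hdrs, "x-nba-stats-origin"))  # → stats
```

If a server behaves differently based on header-name casing, that’s the server being non-compliant (or deliberately fingerprinting clients), not the client’s fault.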