Programming

Not sure how they would know as long as you don’t hit it too often and use a realistic user-agent.

It depends on how much data you want to scrape and how quickly you want to do it. If you can get what you want while making the requests look like a real user it’s trivially easy.

I spend half my life doing this sort of thing. You can get historical closing lines from OddsPortal for some sites. Pinnacle has an API available, so you don’t have to write a scraper. Some sites have antihammer protection (i.e. they will give you a timeout if you make too many requests too fast and might IP ban you if you do it constantly) but it’s rare to run into more resistance than that. In extreme cases you could just use a rotating proxy anyway and hit the site with requests from all over the world.
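For the anti-hammer case, a rough sketch of what "back off when they tell you to slow down" can look like. Everything here is made up for illustration: `polite_fetch`, `RateLimited`, and the delays are assumptions, and `fetch` stands in for whatever function actually makes your HTTP request and raises when the site returns a timeout or a 429.

```python
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Hypothetical signal: raised by `fetch` when the site says 'slow down'."""


def polite_fetch(
    fetch: Callable[[str], str],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url), waiting longer after each rate-limit response."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RateLimited:
            # Sleep only if another attempt is coming: 2s, 4s, 8s, ...
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return None  # gave up after max_attempts
```

Passing `sleep` in as a parameter is only there so the function can be exercised without actually waiting.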

Did you ever get around what nba.com was doing?

Cool, sounds like a good beginner project then. I just want to grab alt spread/total and 1H/1Q lines a few times on gamedays mostly and then analyze them, doesn’t need to be high-speed or frequency.

They’re just super particular about what HTTP headers your request has to have and not have. For some reason (different server serving the requests, I guess) these header requirements didn’t apply to requests from Australian IPs, which complicated me working out what the problem was as it looked like some difference in environment between my local machine and cloud server.
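If anyone hits the same kind of thing, here is a minimal sketch of how you might pin down header problems. The header values and the `example.com` URL are placeholders, not what nba.com actually requires; copy the real ones from a browser request that works. `build_request` and `header_diff` are hypothetical helper names.

```python
import urllib.request

# Placeholder values (assumptions): replace with headers from a working browser request.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.example.com/",
}


def build_request(url: str, headers: dict = BROWSER_HEADERS) -> urllib.request.Request:
    """Build (but do not send) a request carrying browser-like headers."""
    return urllib.request.Request(url, headers=dict(headers))


def header_diff(working: dict, failing: dict) -> dict:
    """Compare two header sets case-insensitively to see what differs."""
    w = {k.lower(): v for k, v in working.items()}
    f = {k.lower(): v for k, v in failing.items()}
    return {
        "missing": sorted(w.keys() - f.keys()),   # in working, absent from failing
        "extra": sorted(f.keys() - w.keys()),     # only in failing
        "changed": sorted(k for k in w.keys() & f.keys() if w[k] != f[k]),
    }
```

Diffing the headers of a request that works against one that fails covers both the "has to have" and the "must not have" side.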

Consuming the Pinnacle API is a decent beginner project. You could roll your own client if you want to learn how to do that. Depending on what language you use, there are also client libraries already out there, and there’s still plenty of scope for learning in figuring out how to use the library to make requests and what to do with the data once you get it.

Scraping OddsPortal is not so good a beginner project; it’s a bit convoluted.

Just remembered, the Pinnacle API requires an active account (they sometimes ban people if they just use the API and don’t bet) and it’s not accessible from US IPs, so idk where you are but there might be some hurdles to get over there.

HTTP status code question. Say you have a website with a route “object/<id>/detail” that maps to a view that shows the details of a private object with id <id>. What are the right response codes in these situations?

  • User not logged in at all. I think this is a 302 to the login page, or a 401 if it’s an API.
  • User logged in, but does not have appropriate permission. I think this is 403.
  • User logged in, but <id> does not map to an existing object. I think this is also 403, so that you’re not leaking any information about what objects do or don’t exist.
  • ‘obejtc/501/detail’. Obviously a 404.
  • ‘object/5o1/detail’. Also a 404 (whether or not the user is authenticated), since it’s not a valid route? Or should this also be a 403 for coyness’ sake?

The last one should be a 400 Bad Request. In the obejtc example, the route does not exist. The route object/id/detail does exist, that’s just not a well-formed ID. This doesn’t give anything away since of course a badly formed ID does not exist. Return 403 for well-formed IDs which either do not exist or where the user doesn’t have permission.

Edit: It’s not a big deal if you do return a 404, but basically as soon as anything actually hits code handling a route, that’s a 400 not a 404.

Edit 2: If you’re super worried about concealing your object IDs, don’t use a sequential integer key, since people can just guess IDs anyway. Use a GUID or something.
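To make the convention from this post concrete, here is a rough sketch. It is not any framework's behaviour: `detail_status` and the `owners` mapping are invented for illustration, checking the ID format before authentication is my own assumption, and others in this thread would return 404 where this returns 400. The misspelled route (‘obejtc/...’) never reaches this code; the router answers 404 before it runs.

```python
import re
from typing import Dict, Optional, Set


def detail_status(
    raw_id: str,
    user: Optional[str],
    owners: Dict[int, Set[str]],
    is_api: bool = False,
) -> int:
    """Status code for GET object/<id>/detail, once the route has matched.

    owners maps each existing object ID to the users allowed to view it.
    """
    if not re.fullmatch(r"[0-9]+", raw_id):
        return 400  # route exists, but the ID is not well formed (e.g. "5o1")
    if user is None:
        return 401 if is_api else 302  # not logged in: redirect, or 401 for an API
    allowed = owners.get(int(raw_id))
    if allowed is None or user not in allowed:
        return 403  # same answer for "does not exist" and "not permitted"
    return 200
```

With `owners = {501: {"alice"}}`, a request for ID 999 by alice and a request for ID 501 by bob both come back 403, so the response doesn't reveal which objects exist.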

I’m not sure if I’ve ranted before about the scourge of the “zero config” philosophy. But basically no, serverless_next, you should never create randomly named things all over AWS that no one will know what they are and serverless remove doesn’t even remove.

Just make me give you a freaking name for the app so I can track down all these little turds you drop all over AWS. A minimal basic forced configuration is much better than contorting your app into a stupid pretzel to reach zero configuration.

Hmm, I’ve never seen a 400 returned for that, just 404, though 403 is fine if it’s critical to hide info.

I’m not actually sure how most routing services would handle this. Personally, I think if your server does know what resource is being requested and you’re able to tell the client what’s wrong, you should, rather than denying for no reason that you know what the client is talking about. Like if you provide a malformed parameter in a POST request to a valid URL, that’s obviously a 400. It seems weird to me to vary this based on how the parameter is provided; it’s still a bad parameter. I guess technically it’s a 404 because the URL is bad, but meh. Doesn’t matter much either way.

IMO I’ve never seen a 400 returned for a bad URL in the request, only for bad parameters in the request body. AFAIK I’ve never seen a 400 returned for any GET. Could be wrong.

Is anyone doing Advent of Code this year? If there’s interest, it might be fun to set up a private leaderboard for bragging rights and/or repo to share solutions. Prior years are here.

Looks fun and I did get the right answer on the first one from last year (I’m sure others are way harder), but dunno about racing to do this. I’ve pretty much only been doing C++ recently and have to look up everything I’m doing in Python (what I used for the first example).

Anyone here use Azure? I’m in the midst of writing my first app on it, have got to the point where I need to add some sort of logging, googled for help and it’s all pretty incomprehensible. One of those situations where it’s like “you can use anything!” and I’m like OK, sure, but what SHOULD I do.

One of the big problems in googling for assistance is the conflation of “logging” in terms of API calls, security logs and that kind of service-layer stuff and “logging” in terms of my code did a thing that I want to make a note of.

I just VPS all the things

My one experience with Azure was pretty similar - painful. I’d put AWS, Heroku and Digital Ocean all much better than Azure from a usability POV.

You want application logging. What language are you using? You should be able to just log and it will show up somewhere. Every running app should have a link to logs which will be a mix of system-level stuff added by Azure and whatever you log out.

Yeah I figured it out, wasn’t too tough in the end, just had to find the right set of instructions. I haven’t written in C# for like 5 years, so I’m having to remember how everything works at the same time as learning how Azure works. On top of that, .NET Core was only just a thing back then and things work slightly differently in it: the built-in DI container is new, etc.

It’s nice to come back to C# after Java though, where the language helps me out instead of constantly getting in my way.