Fun Poster Data

Don’t tell the others, but riverman is actually George Soros, and he sends me a weekly stipend for likes.

Please don’t share this DM publicly.

Interesting. I don’t think there’s any data in there that I can’t really get from the API, but a SQL query might be significantly easier than what I’m planning on doing.

Right now I’m gonna work on getting all topics across the site, then all posts across the site. From there I think I can do a lot. One really irritating thing about the data that comes back for individual posts: there’s no fucking URL field for the post! It’ll give me the entire post and a lot of data about it, except the frickin URL.
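If each post’s JSON includes its topic ID and post number, a link can probably be rebuilt from those two fields. A minimal sketch, assuming the fields are named `topic_id`, `post_number` and (optionally) `topic_slug`, and that the forum accepts `/t/<slug>/<topic_id>/<post_number>` style paths; check these against a real response first. `post_url` and `forum.example.com` are hypothetical names for illustration.

```python
def post_url(base_url: str, post: dict) -> str:
    """Rebuild a link to a post from fields in its JSON.

    Assumption: the post JSON carries topic_id and post_number,
    and sometimes topic_slug. Verify against an actual API response.
    """
    base = base_url.rstrip("/")
    slug = post.get("topic_slug")
    if slug:
        return f"{base}/t/{slug}/{post['topic_id']}/{post['post_number']}"
    # Assumed fallback: a slug-less topic path, which the forum may redirect.
    return f"{base}/t/{post['topic_id']}/{post['post_number']}"


print(post_url("https://forum.example.com/", {"topic_id": 2505, "post_number": 12}))
```

Keeping the slug optional means the same helper works whether or not a given endpoint returns it.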

in b4 OMG HE’S STEALING THE SITE!

The GOP would definitely NOT like to offer you a job as a vote tabulator.

Scrape the data imo

-72o

I just discovered that you can see who has liked your posts the most, so thanks @RiskyFlush for being my #1 fan!

Summer 2022 I’ll see you there.

You made it easy :+1:

I would like to see which active users have liked my posts the least so I can make a special list.

:star_struck::relaxed:

I liked this because it was clever, but also because I want to stay off any bad lists.

I posted this tweet yesterday and thought it would get a few laughs, but I got ponied, and IF that’s not bad enough, all you lot go AND FUCKING LIKE THE PONY POST… THIS IS PURE FUCKING MENTAL, MAN.

Bastards

/S

mental pony posts ftw :metal:

Hit a bit of a wall here. The API does not have good support for getting all posts in a topic. Basically, it’ll return the last 20 posts plus a field called “stream”, which is simply the post IDs for every post in the topic. You’re then supposed to query each of those post IDs to get the post data. You can receive more posts, but only up to 1000.

But there are 100k+ posts on this site. I don’t want to send a GET request for each one; it’ll take all day. Granted, it could be a rare operation, but still, ain’t nobody got time for that. Maybe I’m missing something. Their discussion forums haven’t been helpful on the matter and recommended the method I just described.
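One thing that may help: Discourse’s API documentation describes a `GET /t/{topic_id}/posts.json` endpoint that takes several `post_ids[]` query parameters and returns those posts in one response. Assuming that endpoint is enabled here, batching the `stream` IDs cuts the request count by the batch size. A sketch that only builds the batched URLs; the batch size of 20 is a guess, and the server may cap it lower. `batched_post_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode


def batched_post_urls(base_url: str, topic_id: int, stream: list[int],
                      batch_size: int = 20) -> list[str]:
    """Split a topic's post-ID stream into batched posts.json URLs.

    Assumption: /t/{topic_id}/posts.json accepts repeated post_ids[] params.
    """
    base = base_url.rstrip("/")
    urls = []
    for start in range(0, len(stream), batch_size):
        chunk = stream[start:start + batch_size]
        # Repeated keys: post_ids[]=1&post_ids[]=2 (brackets get percent-encoded).
        query = urlencode([("post_ids[]", pid) for pid in chunk])
        urls.append(f"{base}/t/{topic_id}/posts.json?{query}")
    return urls


urls = batched_post_urls("https://forum.example.com", 2505, list(range(1, 46)))
print(len(urls))  # 45 IDs in batches of 20 -> 3 requests
```

Fetching each URL is then one GET per batch instead of one per post.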

Sounds like a SQL join, no?

Or would you have to send the SQL individually with each of the 100k IDs parameterised?

caveat: I have nfi how the web stuff works

If this were in a SQL database, yeah. Maybe I’d have to use gregorio’s plugin, but I’m not an admin here. The problem, if I didn’t state it well, is that I have to submit an HTTP request for each post I want. The round-trip time of one request is maybe half a second, so at 100k+ posts that’s roughly 14 hours to get the whole site’s post data. Which is theoretically fine: I could just create a DB once and then update it once a week or so.
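A quick back-of-envelope check of how that estimate moves with concurrency or batching, assuming the 0.5 s round trip holds and ignoring rate limits (which in practice are what cap concurrency). `crawl_hours` is a hypothetical helper, and the 8-way concurrency and 20-posts-per-request figures are illustrative, not measured.

```python
def crawl_hours(n_requests: int, seconds_per_request: float,
                concurrency: int = 1) -> float:
    """Rough wall-clock hours for a crawl; ignores rate limits and slowdowns."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return n_requests * seconds_per_request / concurrency / 3600


print(round(crawl_hours(100_000, 0.5), 1))        # one at a time -> 13.9
print(round(crawl_hours(100_000, 0.5, 8), 1))     # 8 requests in flight -> 1.7
print(round(crawl_hours(100_000 // 20, 0.5), 1))  # 20 posts per request -> 0.7
```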

All I’m doing, basically, is hitting a specific URL on this Discourse site, and it spits data back at me. Then I iterate through all the URLs available on the particular piece of data I’m asking for.

You can probably do that in an SSIS package and dump the output into one large staging SQL table, and work from there.

Once that’s done it’s easy to create an Agent job running periodically.

Wouldn’t blame you one bit if you don’t want to take it on, though.
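SSIS and SQL Agent are SQL Server tooling; as a lightweight stand-in for the same “dump everything into one staging table” idea, here is a sketch using Python’s built-in `sqlite3`. The table name `staging_posts` and its columns are hypothetical, and the post fields (`id`, `topic_id`, `post_number`, `username`) are assumed names to check against the real JSON.

```python
import json
import sqlite3


def load_posts(conn: sqlite3.Connection, posts: list[dict]) -> None:
    """Upsert raw post JSON into one staging table keyed on post ID."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS staging_posts (
               post_id     INTEGER PRIMARY KEY,
               topic_id    INTEGER NOT NULL,
               post_number INTEGER NOT NULL,
               username    TEXT,
               raw_json    TEXT
           )"""
    )
    rows = [
        (p["id"], p["topic_id"], p["post_number"], p.get("username"), json.dumps(p))
        for p in posts
    ]
    # INSERT OR REPLACE lets a periodic re-run overwrite posts that changed.
    conn.executemany(
        "INSERT OR REPLACE INTO staging_posts VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
load_posts(conn, [{"id": 1, "topic_id": 2505, "post_number": 1, "username": "a"}])
```

Scheduling it weekly is then a job for cron or Task Scheduler, the rough equivalent of the Agent job.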

For instance, if you want data on the Donald Trump thread:

go to this link: The Pozzidency of Donald J. Trump: Typhoid Donnie's Slow Hypoxic Demise **Sweat Thread** (updated 100x/minute)

2505 is the thread ID of the Trump thread. I’m just doing GETs of these URLs.
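A minimal sketch of that GET for a topic like 2505, pulling out the list of post IDs. It assumes the topic JSON lives at `/t/<id>.json` and that the IDs sit under `post_stream` → `stream` (falling back to a top-level `stream`); both are assumptions to verify. The `fetch` parameter is there so the parsing can be exercised without hitting the site; `topic_post_ids` and `forum.example.com` are hypothetical.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def topic_post_ids(base_url: str, topic_id: int,
                   fetch: Callable[[str], dict] = fetch_json) -> list[int]:
    """Return every post ID listed in a topic's stream."""
    data = fetch(f"{base_url.rstrip('/')}/t/{topic_id}.json")
    # Assumption: IDs are under post_stream.stream; fall back to top-level "stream".
    stream = data.get("post_stream", {}).get("stream") or data.get("stream", [])
    return [int(pid) for pid in stream]


# ids = topic_post_ids("https://forum.example.com", 2505)
```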

And then repeating for each ID in that comma-separated list? hahaha nightmare.

Yeah, I feel a weird need to do programming stuff on my days off now. I was hoping the brute-force, naive approach would work, but it seems like it probably won’t in a timely manner. I’ve had to engineer crazier things to get around rate limits for REST APIs (don’t even get me started on rate limits, fucking hate them), so I can probably figure something out, but it won’t be the few-hour thing I was hoping for.
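For the rate-limit side, a common pattern is to retry on HTTP 429 with exponential backoff, honoring a numeric `Retry-After` header if the server sends one. A sketch using the standard library’s `urllib.error.HTTPError`; whether this forum actually returns 429 and `Retry-After` is an assumption, and `fetch_with_backoff` is a hypothetical helper that wraps any fetch function.

```python
import time
from urllib.error import HTTPError


def fetch_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying HTTP 429 responses with exponential backoff."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except HTTPError as err:
            # Only retry rate-limit responses; re-raise anything else or the last failure.
            if err.code != 429 or attempt == max_tries - 1:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)          # server told us how long to wait
            else:
                delay = base_delay * 2 ** attempt   # 1 s, 2 s, 4 s, ...
            sleep(delay)
```

Passing `sleep` in keeps it testable and lets you swap in a logging sleep while tuning.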

Oh well, I like a challenge. I’ve needed a side project for a little while now and I never have any ideas on what to do.

Can you export that list of IDs into a single column in SQL?
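One way to do that: turn the comma-separated ID list into a one-column CSV, which most SQL bulk-import tools can load straight into a single-column table. A sketch assuming the list looks like `[1,2,3]` or `1,2,3`; `ids_to_csv` and the `post_id` header are hypothetical.

```python
import csv
import io


def ids_to_csv(id_list_text: str, out) -> int:
    """Write a '[1,2,3]' or '1,2,3' ID list as a one-column CSV; return the row count."""
    ids = [int(tok) for tok in id_list_text.strip().strip("[]").split(",") if tok.strip()]
    writer = csv.writer(out)
    writer.writerow(["post_id"])  # header row for the import tool
    writer.writerows([pid] for pid in ids)
    return len(ids)


buf = io.StringIO()
ids_to_csv("[101, 102, 103]", buf)
print(buf.getvalue().splitlines())  # ['post_id', '101', '102', '103']
```

To write a real file, pass `open("post_ids.csv", "w", newline="")` instead of the `StringIO`.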