Programming

Motherfuckers have a config setting that says “Validate using Banana” which actually requires an Apple to validate, and then they have another config setting that says “Validate using Orange” which actually uses an Apple to validate. FFFFFFFFFFUUUUUUUUUUUUUU!!!1111

wut

They were too lazy to update the configuration labels to match what they actually did. I’ll be glad when we get everyone off this POS.

My boss went to re:Invent and is now all excited about AI-assisted programming. I’m sure Copilot can help with the 5% of my day I spend actually coding, but I’m not sure we gain much overall.

Maybe you can get product to write requirements in the form of a prompt going forward! I imagine fixing AI code is worse than fixing jr programmer code because you can’t mock the AI’s haircut or taste in music while you’re doing it.

I’m working on a tricky problem right now for our auctions app and thinking through what automated coding bot could do and couldn’t do as I go along. Here’s a summary:

Basically I need to send an email to everyone who won an item just after the auction has ended. I have the function that sends the emails. But to schedule it to happen when the auction ends, I need to be able to create, update and remove an AWS scheduler programmatically from the auctions app. (I’ll explain below why I can’t just use the app to do it.)

I used ChatGPT to generate some basic lambda handler code to programmatically interact with AWS EventBridge. I even used it to explain what the roleArn property should be, how to create the role, and what permissions and trust relationship to assign. I was surprised that it was able to do this. IAM is always so damn confusing to me: what’s the source, what’s the target, should I create a new role just for this, should I use a managed permission set or roll my own, etc.

So kudos to ChatGPT on that. But here’s all the stuff so far that I think still needs a human to figure out, and that I don’t see how Copilot or whatever could ever realistically decide. Also, not coincidentally, these are the kinds of decisions a junior programmer could really go down the wrong path with and make a mess of.

  1. Why not just have the app trigger itself to send the emails when the auction is done? It’s because we’re using AWS AppSync, which is really just a collection of lambdas, which are just inert functions that get executed at run time. There’s no running server to run some kind of internal cron job with.

  2. How to solve this? My first idea was just to have a simple node server running on the side that could interact with AppSync and trigger the emails to send at a specific time. But that would involve creating some kind of EC2 instance, either through Elastic Beanstalk, or manually, and a bunch of other permissions and communication type stuff I didn’t feel like dealing with. I do everything I can to avoid having servers to babysit.

  3. Another solution that might appeal to a junior developer is to have the admin user’s browser trigger the end of auction. But of course that requires that an admin user have their browser open at the right time, and would be a very bad idea imo.

  4. So I decided to go with the scheduler, even though it’s scary because there are a ton of moving parts, and sending out these emails to a bunch of people before the auction is over could be very bad. Which means I need good integration tests that create an auction from whole cloth, have fake users bid on stuff, end the auction and make sure the emails go out (using POP3 to connect to Gmail test accounts). Luckily I already have all that stuff in place. But good luck having ChatGPT tell you how to set it all up.

  5. How to interact with the scheduler? Apparently AppSync can talk to AWS EventBridge directly. But that spooks me a little because the AppSync functions and resolvers only allow a small subset of JavaScript—including not allowing the ‘new’ keyword, which makes dealing with dates and time zones a gigantic PITA. Also the AppSync logs mush everything together for each request, which is getting harder and harder to trace as our app gets more complicated. So instead I have AppSync talk to a lambda that talks to EventBridge. That way if we have some crisis in real time after the auction ends, or during it, I can quickly go to the dedicated lambda logs to see what’s going on.

  6. EventBridge has separate functions for CreateSchedule and UpdateSchedule. It’s not super easy to tell if the schedule has already been created or not. So should I go through extra hoops to detect if a schedule already exists for this auction, or just delete any existing schedule and create a new one? I went with the latter but I haven’t got it working yet. If I run into weirdness (like time lags for deleting, which AWS does for some resources) I may need to revert to the first method. Also trying to delete a non-existent schedule and then catching the error will work, but it’s kind of code smelly imo.
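The delete-then-create approach from point 6 can be sketched like this. This is only an illustration, with the client injected so it can run against a stub; real code would use a `SchedulerClient` from `@aws-sdk/client-scheduler` sending `DeleteScheduleCommand` and `CreateScheduleCommand`, and the schedule-naming scheme here is an assumption.

```javascript
// Sketch of "delete any existing schedule, then create a fresh one".
// The client is injected so the logic can be exercised with a fake; the
// real AWS SDK v3 equivalent sends DeleteScheduleCommand /
// CreateScheduleCommand via @aws-sdk/client-scheduler.
async function upsertAuctionSchedule(client, auctionId, endTimeIso) {
  // Schedule names must be unique, so derive one from the auction id.
  const name = `auction-end-${auctionId}`;

  try {
    await client.deleteSchedule({ Name: name });
  } catch (err) {
    // A missing schedule is expected the first time; anything else is a
    // real error and should propagate.
    if (err.name !== 'ResourceNotFoundException') throw err;
  }

  // at(...) takes a local timestamp with no trailing 'Z'; the zone goes
  // in ScheduleExpressionTimezone.
  await client.createSchedule({
    Name: name,
    ScheduleExpression: `at(${endTimeIso})`,
    ScheduleExpressionTimezone: 'UTC',
  });
  return name;
}
```

Swallowing only `ResourceNotFoundException` keeps the “delete a non-existent schedule and catch the error” smell contained in one place.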

  7. Which logic should go into the AppSync function that calls the lambda, and which should go into the lambda itself? I.e., how generic should I try to make the lambda? Should it be able to schedule anything for any app? Should it be able to schedule anything, but just for the auction app? Should it be able to only schedule the end-of-auction event for the auction app?

  8. I went with the last thing. I’ve found that trying to make things too generic too early just leads to pain down the road, because inevitably you assume a bunch of stuff that you don’t need, and find out you need stuff you didn’t consider.

  9. In general, I’ve found it much easier to just make the thing very specific the first time, then worry about sharing code when you have a second use case, and maybe not even then. Instead of DRY (don’t repeat yourself), I prefer WET (write everything twice). Then maybe the 3rd time think about creating some new layer of abstraction or mechanism to share the code/function. This is something that it usually takes junior developers a long time to figure out. Some never seem to, and are always trying to write the perfect abstraction, then getting endlessly frustrated when future requirements throw a monkey wrench into their beautiful crystalline design. I’ve found it at least an order of magnitude easier to add a new layer of abstraction later than to remove an unneeded one. Your boss finds this out when they ask you for a new feature on a mature app you created: either you go into a conniption fit about how hard it will be, or you say no problem, we can do that pretty easily.

Anyway this is just a sliver of the thought process I’ve had on this so far, and I haven’t finished coding it yet. It’s why I don’t worry about Copilot taking my job for the near future at least. I spend about 90% of my dev time thinking about stuff like this and 10% actually typing out code on the keyboard, if that.

Oh yeah, and I did have to spend some extra time down a typical AWS cryptic error message rabbit hole because ChatGPT was adamant I needed to add a ‘Z’ to the end of the timestamp (ISO minus milliseconds), when the Z definitely should not be there. So there’s always that.
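For what it’s worth, the ‘Z’ gotcha comes down to this: EventBridge Scheduler’s `at()` expression wants `yyyy-mm-ddThh:mm:ss` with no milliseconds and no zone suffix, since the zone is passed separately via `ScheduleExpressionTimezone`. A minimal helper, assuming the end time arrives as a JS `Date`:

```javascript
// Build an EventBridge Scheduler at(...) expression from a Date.
// toISOString() emits e.g. "2025-01-01T12:00:00.000Z"; slicing to 19
// characters drops both the milliseconds and the trailing 'Z' that the
// at() format rejects.
function toScheduleExpression(date) {
  return `at(${date.toISOString().slice(0, 19)})`;
}
```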


Scheduling the emails in advance seems like a pretty dubious design. Better to schedule an update to the status of the auction from open → complete in Dynamo and then listen to the Dynamo stream to trigger an end-of-auction handler that sends the emails along with doing everything else you need to do at the end of the auction.

We don’t have Dynamo streams enabled and it seems like that might be overkill just for this, although that does look like an interesting feature that I didn’t realize existed until now.

FWIW - the scheduler right now calls the lambda that sends all the emails because that’s all I need to do when the auction ends. But if I ever need to do anything else I will have the scheduler call a lambda that does end of auction stuff, one of which will be calling the lambda that sends the emails.

I’m surprised there aren’t other consequences of an auction concluding (account adjustments, e.g.), but even if that’s true, it’s a big red flag if you need to change your auction-initiation code to schedule a different lambda because you change what you do when the auction ends. You shouldn’t even need to worry about cancelling schedules because you should be able to handle duplicate invocations of the auction-ender lambda.

A lot of display changes happen in the auction user’s browser and the admin user’s browser when the auction ends, and any updates in the auction end time are published to the browser as they happen. But none of those affect the DB state of the auction.

Updating a schedule happens if the admin user wants to, say, extend the end time while the auction is running (which happened twice in our first auction). If we don’t either update or remove the original schedule, the auction end will be triggered prematurely.

I suppose we could always check to make sure the end time the scheduler knows and the actual end time in the DB match, and ignore the end-of-auction trigger if they don’t. But that seems like it could spawn some race conditions and edge cases that would need to be thought through. I think just removing or updating the previous schedule is simpler.

Also schedulers seem to need to have unique names within the region (or maybe account). So I can’t just keep creating more identical schedulers.

I don’t think it’s code smell at all to have a specific lambda now that does the one thing we need, and only add a more generic one later when some other action needs to happen on auction ending. When we need it, it will be trivial to create a new lambda that calls multiple other lambdas (including sending the emails) or does whatever else is needed when the auction ends.

Our whole app ecosystem is based on this pattern: the client (usually browser, in this case scheduler) calls a specific lambda if that’s all it needs. But if more logic is needed the client can call an orchestration lambda that in turn calls 1-n fine-grained lambdas (in series or parallel) and does any business logic needed to put them together. But I’m not going to have every call go through the orchestration layer unless it’s needed.
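The orchestration pattern described above can be sketched roughly like this. The invoke function is injected (real code would use `InvokeCommand` from `@aws-sdk/client-lambda`), and the lambda names and payload shapes are hypothetical:

```javascript
// Coarse "end of auction" orchestration lambda fanning out to
// fine-grained lambdas. Independent side effects run in parallel;
// anything that depends on their results runs afterwards, in series.
async function endOfAuctionOrchestrator(invoke, auctionId) {
  const [emails, reports] = await Promise.all([
    invoke('send-winner-emails', { auctionId }),
    invoke('generate-auction-report', { auctionId }),
  ]);
  // Business logic that stitches the fine-grained results together.
  await invoke('notify-admins', { auctionId, emails, reports });
  return { emails, reports };
}
```

When the client only needs one thing, it can call the fine-grained lambda directly and skip this layer entirely, which is the point being made above.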

This is backwards. If you don’t need a separate orchestration layer, then inline the email-sender lambda into the auction-closer lambda. The point is to hide the implementation detail of how you’re sending out emails (and future implementation details of other stuff that you will add to the end of an auction a couple months from now) from the auction-initiation code. Only one part of your code should need to care at all about sending out notifications when an auction ends, or even need to know that it happens.

None of this stuff will be hard to change when it becomes clear the auction initiation needs to do more than set the end of auction scheduler, or more things need to happen on auction close than sending emails. And it will be a lot easier to refactor because I’ll have solid use cases rather than assumptions.

I do agree having a DB field for AuctionFinished and having everything key off of that is cleaner than what we’re doing now which is keying off the auction EndTime, primarily because the time zones are such a gigantic PITA. I’m going to look at changing that after Christmas. The only downside is worrying about what happens if EndTime and AuctionFinished somehow get out of sync.

I’ll give a proper response to this in a few hours, but this is one of those classic problems, especially in AWS, that can be solved 50,000 ways. Not trusting Copilot or any other tool to understand the implications of anything it barfs out is a good instinct, and a sign it’s time to start asking for human advice. Particularly, especially particularly, if your stuff or domain gets audited by anyone important and not stupid. Lots of stuff it spits out “works,” but my usual challenge to people who tell me this in my consulting sales is: ok, here’s my card, ask me in 6-12 months if it is still working as well for you and let’s talk.

I think this is much more complicated than it needs to be and if your app can establish outbound network calls without much fuss, there’s a way way easier way to do what I think you are doing.

Anyone use Redis to hold application state (game-style, think on-click events from a lot of users)? Do I make my own locks? Yikes? IDK what I’m doing, scale-wise. My next game is maybe 1/3 complete. It’ll make me the next localthunk.

I am only commenting on this from the AWS infra side of me, and not any particular design comments, because you have a better sense of your problem and constraints than I do, and I hope I’m not misunderstanding anything.

I’ll just say right now anything automated that claims it can maintain what you’re describing is lying. If you don’t believe me, for some reason, imagine AWS could do this: why the hell wouldn’t they offer that as a product already? Or anyone else? Because it isn’t possible yet, and if you ask me, never will be. To spare you the idealistic spiel: organizational problems eventually become infrastructure problems (and it can go the other way too), and since all organizations are innately different, there’s no general solution. To make matters worse in this space, AWS itself offers 50 billion ways to solve a problem, so you can easily go down a terrible path without knowing, or without years of painful experience.

The “AWS architect” way, I think (I refuse to take that stupid cert), would be to have your AppSync (which I’ve never heard of til now, so I’m just guessing) fire events to some AWS lambdas that are stitched together through EventBridge configs, which, you’re right, are super horrible to guess through and understand, and AWS docs aren’t exactly notorious for being amazingly helpful. EventBridge, formerly known as “CloudWatch Events” (I liked that name better because it actually integrates with CloudWatch a lot better than anything else, due to how it evolved), is a powerful tool, but it involves complex permissions, as you noted with IAM, that are easy to fuck up and come without a lot of guard rails. That is why I mention the audit thing. I don’t know your domain and don’t want to, but these things are very easy to mess up in a terrible way, at least from a compliance/security standpoint.

Yeah, I imagine these are all fired from EventBridge. You could alternatively just set up a lambda on its own on a schedule, or, if you don’t wanna do it the “AWS” way with events, just set up a webhook that fires on a schedule you control from elsewhere. You’re an adult; you can deal with how you communicate with it and authenticate to it (if needed). You can even define your own custom authorizers with lambdas if you want to get really technical with API stuff. The problem with event-driven solutions when you are building them in AWS is that they are extremely hard to test in development. Which is why out-of-the-box solutions probably seem appealing, but they are a nightmare to maintain.

An external server, if you’re more comfortable setting that up, yes! And skip AppSync: why not just have it talk directly to the lambda via a webhook? Not sure I see why the app is involved, but if it has to be, just have the app fire the events to a webhook endpoint (super easy to set up) and skip the complexity of AppSync!

that is terrible

One of these parts will break at some point and you won’t know how to set it up again. The docs will tell you to check the CloudWatch logs, but oh wait, your AI-advised IAM role didn’t have the permissions for that, and you don’t have logs without spending a day figuring out how to set that up.

Skip EventBridge; just make outbound network calls to a webhook. That probably applies to the rest below too, other than to also say: if you’re designing your code around infrastructure, that’s a sign your infrastructure design is bad.


I don’t think you need locks as long as you only have one Redis instance. Redis is single-threaded and every command is atomic. If you’re still on node and want to do a node<->redis design session/sanity check, feel free to hit me up.

It sounds like Redis locking is for using redis as a distributed lock.

Thanks. Some good info in there. AppSync is just a collection of lambdas and an API Gateway instance that form a static GraphQL server. We decided to go with it for this project to see if it would work, and for the most part it does.

For the record, I don’t think anyone is claiming to automate all these decisions. I was just pointing out that when my boss goes to re:Invent and gets all excited about AI code helpers, in the end it may make that 5-10% of my job I spend actually coding 20% more efficient or something. Not exactly a huge game changer. What it may do is eliminate a ton of junior devs, which will suck when there are far fewer new senior devs being created.

yeah, I’ll take whatever code review you want to give, but as usual I’m mostly working on the UI. Here’s a fun one though: right from the start, the top level of the repo is /client, /server, /shared, but the client and server have different tsconfigs (trust me, it took me weeks to get this all working). And neither of them likes importing from a ts file outside of its rootConfigs.

So I gave up and am now copy pasting every time I change a shared type =/
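One conventional way out of the copy-pasting, assuming the /client, /server, /shared layout described above, is TypeScript project references: mark /shared as a composite project and reference it from the other two. The exact paths and compiler options here are assumptions about your setup, and details depend on your module settings:

```json
// shared/tsconfig.json (tsconfig files allow comments)
{
  "compilerOptions": { "composite": true, "declaration": true, "outDir": "dist" },
  "include": ["**/*.ts"]
}

// client/tsconfig.json and server/tsconfig.json each keep their own
// settings and add:
// { "references": [{ "path": "../shared" }] }
```

Building with `tsc -b` then compiles /shared first, and client and server can import its types directly instead of keeping duplicated copies in sync.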

I avoid TypeScript unless I’m building a shared library that will be used by other apps. It adds some efficiencies, but also takes a lot of the fun out of developing JavaScript. I’ve definitely never had to deal with a shared tsconfig.

In the long run, all the distributed-systems stuff becomes much simpler if every state change runs through Dynamo and every side effect triggers off of a state change in Dynamo. Dynamo gives you consistent updates, and it’s much easier to unit test lots of permutations of current state + event → new state than it is to try to write integration tests for all the weird race conditions and edge cases you’ll have trying to run around cancelling pending schedules.

You can refactor the details, but the basic bones of state-change lambdas that trigger when stuff happens, interact with Dynamo to get and update the current state, then trigger all the side effects, is nearly always the right basic choice. Letting stuff happen without running through Dynamo makes things much more complicated and buggy.
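The current state + event → new state style described above is the easy-to-unit-test part. A minimal sketch, with hypothetical statuses and event names; in the Dynamo design above, a lambda listening to the stream would run the side effects whenever the status transitions to complete:

```javascript
// Pure state-transition function for an auction: given the current state
// and an event, return the new state. Because it's pure, every
// permutation (duplicate closes, extends after close, etc.) is a one-line
// unit test, with no infrastructure involved.
function nextAuctionState(state, event) {
  switch (event.type) {
    case 'EXTEND':
      // Extending only makes sense while the auction is still open.
      if (state.status !== 'open') return state;
      return { ...state, endTime: event.newEndTime };
    case 'CLOSE':
      // Duplicate CLOSE events are harmless: complete stays complete,
      // which is what makes the auction-ender safe to invoke twice.
      if (state.status === 'complete') return state;
      return { ...state, status: 'complete' };
    default:
      return state;
  }
}
```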