Programming

bobman0330 · February 11, 2020, 6:42pm

goofyballer:

A couple random interesting items from today’s Hacker Newsletter:

Why Discord is switching from Go to Rust

With the Go implementation, the Read States service was not supporting its product requirements. It was fast most of the time, but every few minutes we saw large latency spikes that were bad for user experience. After investigating, we determined the spikes were due to core Go features: its memory model and garbage collector (GC).
…
After a bit of profiling and performance optimizations, we were able to beat Go on every single performance metric. Latency, CPU, and memory were all better in the Rust version.

I thought Go was, like, supposed to be good at this sort of thing? This article makes it seem like Java Lite with less factories. What is Go for, exactly? I worked with it a bit and it’s not bad (Trumpbot is written in Go!) but for serious projects, I’m not sure what set of criteria at this point would result in saying “yes, Go is definitely the best tool for this job”.

This one I like much better. Let me see if I have this right:

Rust doesn’t have garbage collection, per se, it just deallocates memory when the variable that owns it goes out of scope. This means that the code has to impose a bunch of restrictions on permitted references so that the memory can’t be deallocated while still referenced. Not sure about this, but maybe it means that there’s some overhead whenever you leave a scope, because there’s a mini-gc event?
Go does garbage collect, which means that it has to run an expensive gc process every now and then. Presumably this makes it easier to write code, since you can reference data promiscuously without having bad references?
One point that is unclear to me is whether Go can deallocate more garbage. In other words, is Go able to dynamically clean up things that are in-scope (but not actually referenced), or is it just that Go keeps objects that are not themselves in-scope alive if you have live references to them?

jmakin · February 11, 2020, 6:49pm

The overhead when leaving a scope should just be a call to free(), right? Memory deallocation is cheaper than malloc. I don’t know by how much but I would guess by a lot.

suzzer99 · February 11, 2020, 6:59pm

Starting a business is just something I’ve never had in me. Unless you count putting up a sign at my local neighborhood grocery store to due hedge trimming and the like for the summer. Cost - 1 electric hedge trimmer which I think my uncle paid for because I was living with him and we needed one, and a professor with a Mac to lay out the fliers.

That’s hustle right there boy. Good for a nice little side income for a couple years. My rent was $165. Food, gas and beer money on the weekend were pretty much my only expenses.

I did not trim that hedge btw. It was at my university.

microbet · February 11, 2020, 7:05pm

Working for people is something I’ve never had in me, but I was eyeing a listing for a programming jerb at UCLA recently.

suzzer99 · February 11, 2020, 7:22pm

I have no ability to get my own ass out of bed in the morning w/o a boss/job looming over me. I mean who am I letting down? Me? Fuck that guy. I’ve let him down so many times before. He should know better than to even count on me.

Even the bush trimming thing was bad. I would schedule a job on Saturday morning and then go out and get wasted or trip acid or something on Friday night - because I was 20 freaking years old.

Then I’d usually cancel with some lame excuse. Most of the people would tolerate one cancel - but some old guys would cut you off right there.

But if I was already on strike 2 I’d usually rally to go do the job. Yard work in 90 degree heat coming down off acid with no sleep is fun. The sweat that would come out of my body reminded me of the semi-transparent congealed fat that comes with a canned ham.

jmakin · February 11, 2020, 7:31pm

You can’t do the whole site, like array section doesnt have a bash option. But I’m working my way through the string manipulation problems.

Here’s a fun one I didn’t know. You can reference a specific char in a string in two different ways:

${string:1:1} will get the first character from string

Or, you can use cut:

cut -b 1 <<< $string

Cut can also replace awk in a lot of situations by using the -f option. Handy command I had never really played with. Pretty basic I know but I always just used awk for columnar operations.

Here’s a problem I was given during an interview. Say we have a NxM matrix of characters. We are only interested in printing the rows that start with ‘x’.

This is easy in most languages but in bash sometimes easy things like that can be tricky. I initially had a nasty solution using awk and grep and a lot of pipes but this was a much cleaner solution:

#!/bin/bash

while read line
do

col1=$(cut -d ’ ’ -f 1 <<< $line)
[[ $col1 == ‘x’ ]] && echo $line

done

jmakin · February 11, 2020, 8:16pm

another funny thing about the hackerrank challenges in bash. There’s an added layer of difficulty - bash is too slow to finish a lot of tests in the allotted time. So you get test failures due to time outs even if your solution is pretty clean and looks optimal. I’m not entirely convinced all of them are solvable in the allowed time limit.

jmakin · February 12, 2020, 2:46pm

suzzer trigger warning

A bash quirk I learned today while writing a word frequency counter (lists in descending order the freqs of each word in a file) which was heavily using bash arrays, which I have not done a lot with.

referencing all elements in the array is done by these two syntaxes:
${array[@]}
${array[*]}

they are described as being basically the same. I thought they were. HOWEVER…

When using double quote expansion, the * notation will expand the array by whatever IFS is set to. For example, setting IFS to newline char and then echoing with the star notation (in double quotes) will print the elements of the array each separated by newline.

example:
array=(a b c d e f g)
IFS=$’\n’
echo “${array[*]}”

output:
a
b
c
d
e
f
g

Lol. The only reason this semantic came up was because I wanted to sort all the words in the file before I counted their frequencies. Well, sorting an array is not so straightforward. “sort” command only sorts files by newline delimited sequences… eg, it only sorts the lines in a file. So, first I had to take my flat array, delimit it with newlines, and feed that into the sort command., which led me to this insanity. As far as I have read, this is the only difference between array[@] and array[*].

Bash man. Hahaha.

Another funny thing I learned while optimizing - the bash interpreter is really fucking slow compared to the built in commands (which is like, LDO but I didn’t know that it was so massively different). My initial implementation tried to use loops to sort, but sorting Jane Eyre timed out that way. the “sort” command takes a fraction of a second, and the total runtime of the word frequency script was 8 seconds for that long ass book. Pretty good IMO.

bobman0330 · February 12, 2020, 4:17pm

Thanks, this is helpful. As you can tell, my understanding of how memory actually works is limited… On the scope vs referenced point, what I thinking about is a situation where you declare let x = 5;, but then never actually use x. Or in a more general sense, if a value is in-scope, but there are no logically reachable references from where you are in execution. But on further reflection, it seems that those kind of situations maybe are either obvious enough (like my example) that the compiler could simply clean it up, or else so complicated that the gc algorithm can’t decide? Some reading on wikipedia has made me believe that it works like this:

Things on the stack are inherently not garbage.
The “things in the stack” include reference to all the heap data that is bound to in-scope variables.
Therefore, data that has a direct in-scope reference from the perspective of code is never garbage, since the meaning of having an in-scope reference is that there’s a reference in the stack pointing to you.

Good so far? Now if this was all there was in the world, that would seem to make gc trivial: when you delete a reference on the stack to something in the heap, you just recursively free everything else in the heap that’s referenced by the thing you just freed up. However, there’s at least one reason that can’t work, which is that there could be two references in the stack to something in the heap (right? the thing and a pointer to the thing? which from a low-level perspective are both just memory addresses in the heap??). So if you delete one reference to a thing in the heap, you can’t just automatically burn down the whole recursive tree, because there could be another reference to the thing you just burned down, which is now a dangling pointer. And you have the same problem if the code explicitly free()'s the memory when it’s done with it, but fails to confirm that there are no other references to it.

I spent all last night in a Rust rabbit hole, but I still don’t exactly understand how its references work. The key idea is that every object has a single programmatic owner in code. What I think that means is that it’s guaranteed safe to burn down the recursive tree when you delete the owner’s stack reference. This is safe, even though there can sometimes exist other pointers in the stack to the same address in the heap because [reasons]. I think the reason is basically that the compiler validates all non-owning references and won’t permit references that aren’t structurally guaranteed to be out-of-scope by the time the owner goes away (either by being explicitly free’d or going out of scope).

What I’m still not getting is what the downside is for Rust to handle it this way. Why does any language garbage collect rather than automatically deallocate and forbid potentially dangling pointers? It seems like the answer must be that there are beneficial uses of pointers that a programmer can understand will never dangle, but the reason is too complicated for the compiler to understand? Not sure what those are though.

suzzer99 · February 12, 2020, 5:17pm

If you mean code that is vulnerable to SQL injection, yes - that is exactly what you’re not supposed to do.

The correct way is to parameterize user input, something like this:

db.query(“SELECT * FROM yyy WHERE zzz IN ?”, id)

Although there may be different syntax.

suzzer99 · February 12, 2020, 5:19pm

Some of the packages known to be affected are: 3d-view, error-ex ,42-cent-base, 8base-cli.

Npm has been breaking in our AWS codebuild for a while now - apparently the problem happens on older versions of npm for packages that start with a number (in our case serverless.com is trying to pull 2-thenable).

Heh, I’d love to see the bug that’s causing this. Well then again not sure about error-ex.

bobman0330 · February 12, 2020, 6:32pm

This has relevant info:

https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html

One tradeoff seems to be that, if a pointer could be assigned to one of several items at runtime, none of the candidate items can be deallocated while the pointer is in-scope. So if your program had a bunch of files in memory, and you needed to assign a pointer to one of them based on some criteria, then you can’t immediately deallocate the ones that weren’t selected, because the compiler can’t be sure that you are deallocating the correct objects. You could take ownership of the selected object, but then no one else can own it, and if they want to reference it, maybe you end up with the same inability to deallocate the others.

bobman0330 · February 12, 2020, 10:21pm

Moving on to more practical stuff, I have some Django/python/db questions:

This is very simple, but where is the normal/correct place in a server application to store discrete pieces of information that change from time to time. I have a db that will periodically need to download new content, and it needs to figure out when it was last updated to do so. Where does that go? Am I supposed to save it in the environment? Create a dinky table that just has one value?
Also pretty remedial, but one of my tables is a list of entities, and I want to keep track of former names for each entity. I think that the answer is to create a separate table with a foreign key to the entity table and then columns for the former name and any metadata. Yes? Is there any performance-related or other reason to be concerned about promiscuously spreading data across tables like this?

jmakin · February 12, 2020, 11:22pm

I got a new project today at that freelance gig. I think I’m in way over my head on this one. I had a hard time following exactly what they wanted and requested they draw up a more clear requirements doc, so they are doing that. They did admit it was a lot to throw out at me and not really super fair to me. I think their requirements are not fully fleshed out yet either, which is usually a disaster. Luckily I have experience managing these kinds of things.

It’ll likely be a 100-150 hour project but I have a feeling he wants to just have the budget at 80 hours because he kept saying that number. I think it’s very likely I end up doing 100+ hours of work for less than 100 hours pay - oh well. I will definitely stand up for myself though, if it seems like a 150+ hour project and I only have a few weeks, I am going to ask for more or just not do it. But I do feel I really desperately need some experience.

I just really want to avoid a scenario where I embarrass myself or fall flat on my face. I wasn’t happy with the quality of the work I turned in last time, but they seemed to like it well enough.

Basically they want me to automate certain functions and map certain special relationships with their CRM tool using the API’s, but I’m not super clear how their business works or what the flow is, it seems unnecessarily convoluted and the relationships are hard to follow. They are basically using the CRM tool as a one-shot to manage their entire business, and it clearly wasn’t designed for that, so they have jury rigged a lot of automation and custom conventions to get around it. It is not a tech company and there’s only one other developer who’s sort of MIA a lot.

Could be a nightmare, who knows. Probably

ChrisV · February 12, 2020, 11:50pm

bobman0330:

Moving on to more practical stuff, I have some Django/python/db questions:

This is very simple, but where is the normal/correct place in a server application to store discrete pieces of information that change from time to time. I have a db that will periodically need to download new content, and it needs to figure out when it was last updated to do so. Where does that go? Am I supposed to save it in the environment? Create a dinky table that just has one value?

Also pretty remedial, but one of my tables is a list of entities, and I want to keep track of former names for each entity. I think that the answer is to create a separate table with a foreign key to the entity table and then columns for the former name and any metadata. Yes? Is there any performance-related or other reason to be concerned about promiscuously spreading data across tables like this?

Dinky table with one variable if the data must be maintained between application restarts, otherwise an externally readable static variable on your database class is how I’d do it. It’s worth being aware that on applications which might be deployed running on multiple servers for load-balancing, static variables might only be global across one server, not across the whole application. Not something you need to worry about unless you’re developing a huge application, just thought I’d mention it.
Yep, that’s correct. Performance is definitely not a concern there. In general just doing table joins on IDs is incredibly efficient in relational database engines.

kerowo · February 13, 2020, 3:56am

Doesn’t x=5 have a reference to itself that keeps it alive or am I misremembering pre-gc iPhone programming?

bobman0330 · February 13, 2020, 6:58pm

goofyballer:

bobman0330:

what I thinking about is a situation where you declare let x = 5;, but then never actually use x. Or in a more general sense, if a value is in-scope, but there are no logically reachable references from where you are in execution

So, there’s a couple reasons this doesn’t super make sense:

If you let x = 5, x is in scope. You might say “well sure, but nothing’s using it”, but you can make objects that do stuff on their own - perhaps nothing else uses x, but say that x is a thread that’s running some task, it doesn’t matter that nothing else is using it, it still needs to exist. Or, going to the aforementioned SmartPointer example in C++ - sometimes people write “smart cleanup” objects where the whole purpose of the object is to run an action at the logical point where it goes out of scope (where C++ would run its destructor).

If something has no references, it can’t really be in scope - idk if Go has a reason you might do something like &MyValue{} (like, by itself on its own line), but that’s not in scope anymore since it’s not assigned to a variable in the scope. While C++ doesn’t have garbage collection, it would let you do that and destroy the variable on the same line where it was created.

re: GC algorithms - yeah, there’s a lot of edge cases that can complicate what seems at first glance to be a simple operation. GC algorithms are a whole area of research for how to do them efficiently, I’ve never looked into them much, partly because I’ve spent 90% of my career using non-GC languages and also because I’m happy just trusting it to work.

bobman0330:

What I’m still not getting is what the downside is for Rust to handle it this way. Why does any language garbage collect rather than automatically deallocate and forbid potentially dangling pointers? It seems like the answer must be that there are beneficial uses of pointers that a programmer can understand will never dangle, but the reason is too complicated for the compiler to understand?

I think it might be more that it requires programming via a different paradigm and a different way of thinking about your code. Garbage collection is easy - you basically don’t think about the memory your program is using, and as long as you aren’t doing silly things like building up eternal lists of long-lived objects past the point you’re done using them, your program will probably run fine. It’s easy to write code that way! Writing code that has to explicitly define ownership relationships is a lot harder.

Having worked in C++ for awhile, sometimes you see newer/not great C++ programmers fall into a tendency of passing everything around in a std::shared_ptr. There are some cases where shared_ptr is the appropriate storage mechanism, but they’re dwarfed by the number of instances where people are like “fuck, I don’t know what’s supposed to own this thing, I’m just gonna pass it around via this reference-counted mechanism and let that take care of it.” That’s kinda the garbage collection mentality - you don’t have to think about that stuff, it just works.

Thanks! Rust has a similar smart pointer that I think is the fix to the problem I mentioned above about not being able to destroy something if it was a candidate for having a pointer pointed at it. The point owns the object, then multiple people can hold references to the object (immutably), and all the candidates that aren’t owned by the smart pointer can be destroyed.

It seems like the spectrum is:

Go: No need to worry about memory management when writing code, but potential performance costs from the gc process.
C++: Much worrying about memory management when writing code, but better performance. Also, you’ll have memory leaks or null pointers if you screw something up.
Rust: No worrying about memory management when writing code and also great performance. However, your code won’t compile because [long list of reasons].

So… Go : C++ : Rust :: Python : JavaScript : TypeScript?

jmakin · February 14, 2020, 3:42am

I have a 2 hour timed test on codility tomorrow. 3 problems. Open book, I am assuming, which is good because after programming so much bash I’ll need a reference of different methods for list/string manipulation and some syntax stuff.

The great thing about solving all these challenges in bash is that the challenges in python feel super easy now. I hope the problems aren’t too gnarly.

jmakin · February 14, 2020, 4:50pm

Tell me why I’m wrong please or if I’m right and just questioning my sanity here:

I’m in a debate with someone that merging 2 lists is an O(n + m) time operation. I am saying it can be done in O(1) time, because you simply update the pointer of the last element in list 1 to point to the first element in list 2. I know I’m right I’m just really questioning my sanity here.

Can two arrays ever be merged in O(1) time? No, right? Since new memory must be allocated and elements copied over. Sorry for the stupid question it’s been a while since I studied my algorithms.

kerowo · February 14, 2020, 5:08pm

Is the other guy thinking merged means more than appended?