This is good practice tho. The HTTP code describes the status of the HTTP operation. Did the server handle it? No? Was the URL not found? Did it time out? Was the payload too large? And the JSON describes the result of the backend operation. So 200 OK with error: true means that your HTTP request was all good, but the actual operation bugged out for whatever reason. If you try to indicate backend errors with an HTTP error code, you quickly get confused about which codes can happen for what reason.
This is very frustrating! I get so many requests from customers asking why we returned response code 400 when we gave a description of the problem in the response body.
I know an architect who designs APIs this way. Also includes a status code in the response object because why have one status code when you can have two, potentially contradictory, status codes?
I inherited a project where it was essentially impossible to get anything other than 200 OK. Trying to use a private endpoint without logging in? 200 OK unauthorized. Sent gibberish instead of actual request body format? 200 OK bad request. Database connection down? You get the point...
When I used to work at Oracle every so often a customer would call and complain some function was throwing error "ORA-00000 normal successful completion" and they wanted it filing as a bug and for us to fix it.
I was never quite sure how we were supposed to fix stupid.
Well, looking at your example, I think a good case can even be made for it.
“s23” doesn’t look like an HTTP status code, so including it can make total sense. After all, there’s plenty of reasons why you could want custom error codes that don’t really align with HTTP codes, and customised error messages are also a sensible use case for that.
Of course duplicating the actual HTTP status code in your body is just silly. And if you use custom error codes, it often still makes sense to use the closest matching HTTP status code in addition to it (so yeah, I agree the 200 in your example doesn’t make a lot of sense). But neither of those preclude good reasons for custom codes.
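To make that concrete, here's a rough Python sketch of pairing a custom code with the closest HTTP status. Everything in it is made up for illustration: the catalog, the messages, the `build_error_response` name, and even what "s23" means (I'm just guessing it's some kind of validation failure).

```python
import json
from http import HTTPStatus

# Hypothetical catalog: custom application code -> (closest HTTP status, message).
# The meanings assigned to these codes are assumptions for the example.
ERROR_CATALOG = {
    "s23": (HTTPStatus.BAD_REQUEST, "request failed validation"),
    "s40": (HTTPStatus.NOT_FOUND, "record does not exist"),
}


def build_error_response(app_code: str) -> tuple[int, str]:
    """Return (http_status, json_body) for a custom application error code."""
    # Unknown codes fall back to 500 rather than pretending everything is fine.
    status, message = ERROR_CATALOG.get(
        app_code, (HTTPStatus.INTERNAL_SERVER_ERROR, "unknown error")
    )
    # The body carries only the custom code and message; the HTTP status is
    # not duplicated inside it.
    body = json.dumps({"code": app_code, "message": message})
    return int(status), body
```

So the client gets a real 400 on the wire and can still look at "s23" in the body if it wants the finer-grained reason.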
Ugh this just reminded me that I ran into this exact issue a couple years ago. We were running jobs every hour to ingest data from an API into our data warehouse. Eventually we got reports from users about having gaps in our data. We dug into it for days trying to find a pattern, but couldn't pinpoint anything. We were just missing random pieces of data, but our jobs never reported any failures.
Eventually we were able to determine the issue. HTTP 200 with "error: true" in the response. Fml
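For anyone stuck consuming an API like that, this is roughly the check we should have had from day one. Just a sketch: the `error` and `message` field names and the `parse_response` helper are assumptions, so adjust to whatever your API actually returns.

```python
import json


class ApiError(Exception):
    """Raised when either the HTTP layer or the response body reports a failure."""


def parse_response(status: int, body: str) -> dict:
    """Treat a response as successful only if the status AND the body agree."""
    if not 200 <= status < 300:
        raise ApiError(f"HTTP {status}")
    # An empty body is treated as an empty payload instead of a JSON parse error.
    payload = json.loads(body) if body.strip() else {}
    # Assumed convention: a truthy top-level "error" field means the call failed.
    if isinstance(payload, dict) and payload.get("error"):
        raise ApiError(payload.get("message", "error flag set in a 2xx response"))
    return payload
```

With that in the ingest job, a 200 with "error: true" fails loudly instead of silently leaving a gap.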
I had a similar one at a past job too. A test which was asserting a response status 500.
Like, instead of asserting that the correct error/status code was being returned, the test was asserting that any error would simply get masked as a 500.
Basically, asserting the code was buggy....
That made me angry a couple of times but I still miss that place sometimes.
At a prior job, our API load balancers would swallow all errors and return an HTTP 200 response with no content. It was because we had one or two clients with shitty integrations that couldn't handle anything but 200. Of course, they brought in enough money that we couldn't ever force them to fix it on their end.
I once worked on a project where the main function would run the entire code in a try-catch block. The catch block did nothing. Just returned 200 OK. Didn't even log the error anywhere. Never seen anything so incredibly frustrating to work on.
There was nothing RESTful or well planned about this API's interfaces, and the work to do something like that would have been nontrivial. Management never prioritized the work.
Assuming there was some API key system in place, you could just check the key to see if it belongs to one of those clients. If yes, 200. Otherwise, return the real status codes.
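Something like this, as a sketch. The key values and the `outward_status` name are invented, and I'm assuming the load balancer (or whatever sits in front) can see the API key at all.

```python
# Hypothetical set of API keys belonging to clients that can only handle 200.
LEGACY_KEYS = frozenset({"legacy-client-a", "legacy-client-b"})


def outward_status(api_key: str, real_status: int) -> int:
    """Mask the status as 200 for legacy clients; pass it through for everyone else."""
    if api_key in LEGACY_KEYS:
        return 200
    return real_status
```

Everyone else gets honest status codes, and the masking stays contained to the clients who are paying for it.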
I use HTTP error codes in my API, and still occasionally see a GET /resource/{"error":"invalid branchID provided"} from people who don't seem to know what they are.
My team recently migrated to GraphQL and they don't even do it right. The GraphQL layer still makes REST calls and then translates them to a GraphQL format, so not only do we get no time or computing savings, we also get the bullshit errors
It makes sense for a wrapper layer to do this, and I had to fight against APIs that didn't. If I make a single HTTP call that wraps multiple independent API calls into one, then the overall HTTP code should reflect the status of the wrapper service, and the individual responses should each have their own code as returned by the underlying services.
For example on one app we needed to get user names by user id for a bunch of users. To optimize this, we batched calls into groups. The API would fail with an error code if one of the user ids in the batch was bad or couldn't be found. That meant we wouldn't be getting data for any of the users in the batch and we didn't know which userId was bad either. Such a call should return 200 for the overall call and individual result for each id, some of which could be errors.
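Here's a minimal sketch of the response shape I mean. The `get_user_names` function, the `results` field, and the in-memory `directory` dict are all stand-ins I made up; the real thing would call the underlying user service.

```python
from typing import Dict, List, Tuple


def get_user_names(user_ids: List[str], directory: Dict[str, str]) -> Tuple[int, dict]:
    """Look up each id independently; one bad id does not fail the whole batch."""
    results = []
    for user_id in user_ids:
        if user_id in directory:
            results.append({"id": user_id, "status": 200, "name": directory[user_id]})
        else:
            # The failure is reported per item, so the caller knows which id was bad.
            results.append({"id": user_id, "status": 404, "error": "user not found"})
    # The overall status describes the batch call itself, which succeeded.
    return 200, {"results": results}
```

The caller still gets names for every good id and can see exactly which ones failed.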
I looked into it once at my last company, but none of us knew it and we had a tight deadline. For our scale and use case, it definitely seemed like needless complication for most things compared to any payoff of switching.
I always loved how Sierra took its error message and turned it into an intentional "quitting the game" message, because every time they closed the game it crashed instead of closing properly.
me with gRPC error codes: nil, parameter error, app error -- OK, you fucked up, we fucked up. Edit: forgot NotFound.
I really should read about the various ones that exist at some point, but I've always got bigger fires to put out.
Edit, since it seems unclear, gRPC != HTTP and does not use the same status codes. I meant that I felt like I was using fewer than I should, though I just checked and basically not.
This is basically the difference between HTTP 4xx and 5xx error codes. 4xx means the client did something wrong (invalid request, tried to load something that doesn't exist, doesn't have access), whereas 5xx means the request was OK but something broke on the server.
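In code the split is just a range check. Quick sketch, with the `classify_status` name and the labels being my own:

```python
def classify_status(status: int) -> str:
    """Bucket an HTTP status code by who is at fault."""
    if 400 <= status < 500:
        return "client error"  # the caller needs to fix the request
    if 500 <= status < 600:
        return "server error"  # the request was fine, the server broke
    return "not an error"      # 1xx, 2xx, 3xx
```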
Yeah, I know how HTTP status codes work. I just followed the existing pattern at my current place with gRPC and this post made me realize I don't know most gRPC error codes and best practices.