AI Dungeon public disclosure vulnerability report (github.com/aetherdevsecops)
198 points by kemonocode 13 days ago | hide | past | favorite | 132 comments

I am strongly against child abuse, but I really don't have a problem with a computer being forced to emit textual patterns which include English words correlated with something a human might call a story about child abuse. It's a waste of GPU time, but enh.

It's scary that the press release from Latitude talks about "and to comply with law" as a reason for review. Under US law, maybe specific threats to the President might be reportable (although one-on-one communication with an AI would be a stretch here...), but I'm pretty sure an AI or other system emitting textual patterns which humans view as representing fantasy sexual abuse of anything isn't illegal, just distasteful.

They're perfectly within their rights to ban it under a ToS but pretending it is for legal purposes is fucking bullshit. (Of course, I'm not a lawyer.) My understanding of court decisions is that even machine-generated images are legal, although "is this image machine generated or is it evidence of actual child sexual exploitation" is an increasingly difficult question and if you're building an automated low-cost system it often makes sense to err on the side of safety. There might be some complexity around laws related to image manipulation involving real, but legal/non-sexual minor images which are then convoluted into something sexual, or "revenge porn" use of something which simulates a specific person (or is based on that person), and maybe text can be legally questionable if it's abuse targeted to a specific person, but especially for one-on-one non-published communications with a computer you are probably fairly ok with a weird fantasy sexual fetish about a neighbor, even a minor neighbor, in text form, unless it rises to an actual threat.

It isn't against the law necessarily, as obscenity laws in the US only make it illegal if the medium is an image depicting a child and there is no artistic purpose (very vague). Regardless, it annoys me when organizations do this when they can easily just create a ToS outlining what isn't allowed. It truly boggles my mind the contempt that some orgs have for creating and maintaining a ToS, instead resorting to hiding behind fantasy laws as justification. Just straight face tell your users "Nothing related to X of any kind or permaban", very clear, very succinct.

The linked article is not about the content moderation changes. Anyway,

Paraphilias are strengthened through repeated exposure and association with pleasure. So if you're in the camp of not wanting to strengthen a pedophile's desire to prey on children, you also want to restrict the availability of such stimuli. The whole "provide them an outlet" idea is nonsense - human sexual attraction does not work that way.

I think this may literally be the same argument as the one for banning D&D for corrupting the minds of youth. I suspect most (nay, all?) paraphilics have the ability to differentiate between reality and a page of text - just like many of us have avoided going full Don Quixote after reading Lord of the Rings.

Maybe they'll get addicted to reading. Society will not crumble, and no children will be harmed.

It has nothing to do with differentiating reality from fantasy. It has to do with whether a given pleasure association is strengthened or weakened. Pedophiles seek out these stimuli because of a pre-existing attraction toward children in the real world.

It is simpler than you think. If you're at a screen full of text and get a dose of dopamine or whatever, you're simply going to spend more time staring at screens of text.

If your theory was true, we'd have seen historically massive upticks in violence from videogames. It is yet to materialise. This is an era of unusual peace.

Please, let's not get distracted by videogame violence.

We have seen historically massive upticks in child abuse material [1], caused by two inventions: the internet and smartphones with cameras, and there's a strong case made that it reflects an actual, large, increase in child sexual abuse.

I don't have a specific article to link to, but Gabriel Dance discusses in a podcast [2] how many paedophiles are gradually indoctrinated by communities; they are made, not born.

[1] https://www.nytimes.com/interactive/2019/09/28/us/child-sex-... Warning: extremely distressing descriptions.

[2] https://samharris.org/podcasts/213-worst-epidemic/ Warning: ditto, though not as bad.

>We have seen historically massive upticks in child abuse material [1], caused by two inventions: the internet and smartphones with cameras, and there's a strong case made that it reflects an actual, large, increase in child sexual abuse.

I would like to hear that strong case. I cannot find the data right now, but I have heard the rate of child sexual abuse cases has fallen since the internet's debut. A possible reason (if the rate has fallen) being that pedophiles, who know they will be severely punished if caught abusing a child, now have easy access to existing material. They don't have to exploit or abuse children around them.

We don't have a guaranteed way to keep somebody from developing pedophilia, so we shouldn't ban fictional material that isn't related to actual exploitation or abuse. Images, videos, or even writing based on real events need to be caught (with the usual caveat of non-pornographic use, like a victim's memoir or a therapist's training materials).

Again, I'm still looking for hard data on the rates of child sexual abuse.

Here's one PDF report I've found from the US Department of Justice: <https://www.bjs.gov/content/pub/pdf/fpcseo06.pdf>

According to it, US federal attorneys dealt with 568 suspects of child sex abuse in 1994, and 601 in 2006. The chart shows the count remained steady-ish between those years. If we keep in mind that the US population was increasing, that means the rate of child sexual abusers being investigated has gone down.
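To make the per-capita point concrete, here is a rough sketch of the arithmetic. The suspect counts are the ones quoted above; the population figures are my own rounded assumptions (roughly 263 million in 1994 and 298 million in 2006) and are not taken from the linked report, so treat the result as illustrative only.

```python
# Suspect counts as quoted in the comment above.
SUSPECTS = {1994: 568, 2006: 601}

# ASSUMPTION: approximate US resident population in millions.
# These rounded figures are not from the linked DOJ report.
US_POPULATION_MILLIONS = {1994: 263.0, 2006: 298.0}


def rate_per_million(year):
    """Suspects per million residents for the given year."""
    return SUSPECTS[year] / US_POPULATION_MILLIONS[year]


def relative_change(start, end):
    """Fractional change in the per-capita rate between two years."""
    return rate_per_million(end) / rate_per_million(start) - 1.0


if __name__ == "__main__":
    for year in (1994, 2006):
        print(year, round(rate_per_million(year), 3))
    print("relative change:", round(relative_change(1994, 2006), 3))
```

Under these assumed populations, the raw count rises slightly while the per-capita rate falls modestly, which is the point being made.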

Over that time period, the rate of people investigated for child pornography drastically increased. This evidence does not support a causal relationship.

I don't doubt online pedophile communities support pedophiles, but I don't think they create them. Also, just because we're documenting it now doesn't mean it didn't happen before - sexual abuse of young girls has been endemic since forever, and people go through all sorts of contortions not to acknowledge it.

Gabriel Dance makes the argument much better than I could; I won't attempt it. He spent years investigating, actually doing things such as interviewing paedophiles, asking them 'why'. I don't have reason to doubt his conclusions.

But what is clearly new is that now people are encouraged to abuse children because they can take a video, share it, sell it, gloat about it, trade it to get access to more material.

Additionally, sharing imagery of abuse further harms the victims. For example, in that podcast it was mentioned that some victims receive a notification every time images of their abuse come up in a court case!

Blocking child abuse literature and fake child porn isn't going to stop predators from doing the stuff you mention though. That stuff is still going to be teased on social media and shared on the darkweb.

I don't think it's right to argue that the internet creates pedophiles.

After all, the largest known organization tied to sexual exploitation of children is.... the catholic church. And the abuse scandal goes back much farther than even the invention of the internet itself.

So we can't heap the blame on the internet. If anything, the internet is bringing these issues to light, when before they were shrouded in secrecy. Consider the alternative interpretation of the NYT report.

If only 3000 images were reported in 1998, but there were still thousands of cases in the US alone, then the internet has actually made it easier to investigate and find the criminals.

> After all, that largest known organization tied to sexual exploitation of children is.... the catholic church.

If you think that's evidence against the theory, you're committing a common statistical error. The internet could spawn a million new pedophiles that all have multiple victims, and the catholic church would still retain the title, because even though it may be the largest offender, that doesn't necessarily mean anything with regard to total offenders, or where they come from.

It's not evidence of absence. It's anecdotal that pedophiles existed before the internet had any photos of children. Increase of photos of children on the internet just means that more people are using the internet.

You actually want to look at actual crime statistics.

> between 1993 and 2005, the number of sexually abused children dropped 38 percent, while number of children who experienced physical abuse fell by 15 percent and those who were emotionally abused declined by 27 percent. [1]

People want to blame pornography and the internet for problems that existed before the internet even existed. And the numbers and facts show the exact opposite: crimes are going down.

[1] https://www.livescience.com/17285-child-sexual-abuse-numbers...

The catholic church is not evidence that the internet doesn't create pedophiles. It's evidence that the internet is not the sole source of pedophiles. If pedophiles are created and not only born that way (I suspect there's strong evidence from abuse studies that indicates this is true), then it's possible they can be created from multiple sources. Long-standing sexual abuse of children by the catholic church is not by nature of its existence evidence the internet does or does not create pedophiles if you believe they can be created from other sources (is the catholic church responsible for every single pedophile?)

> Increase of photos of children on the internet just means that more people are using the internet.

That's too strong an assertion. It doesn't "just" mean anything, it indicates some relationship, not necessarily a complete explanation.

Personally, I don't know whether I think one way or the other on this, but I don't think it's nearly as cut and dry as people are presenting it. As I understand it, sexual behavior does not always follow the same patterns as other behavior, and people rushing to immediately assume it is or isn't related should probably step back and think a bit more critically, and gather a bit more verified information that directly applies.

To me, the fact that so many people are rushing in with arguments that I see as tenuous, if not outright fallacious, points towards an emotional response by people rather than a rational response. Given the subject matter, and that if the theory presented is true it's purely negative for all involved, as both the pedophiles created (or made worse) by exposure as well as their potential victims are worse off, it deserves some respect in how it is responded to. If it's false, it deserves to be clearly and obviously indicated as such with evidence, and if it's NOT, then we should really examine what that means.

>Given the subject matter, and that if the theory presented is true it's purely negative for all involved, as both the pedophiles created (or made worse) by exposure as well as their potential victims are worse off

I don't think that should be up to 'us' (non-pedophiles) to decide. Arguably, perhaps the Internet doesn't 'create' pedophiles as much as it may help people understand that they are pedophiles, and that society's shaming of them (which, yes, does lead to worse psychological outcomes and increased stress for the afflicted person) has negatively impacted them and forced them to keep quiet about their sexuality. This stigma even prevents them from seeing mental health care providers, for fear that therapists will incorrectly (and perhaps even illegally) report them.

I have no opinion on whether fiction can impact pedophiles and inflame their desires, but I do not think that is a good enough reason to do away with fiction (to make the publisher liable) - the degrees of separation are too high for me to see illegalising fiction as a reasonable and just measure. The idea is that someone can be thrown in prison for the mere risk of the following circumstances coalescing:

* Creating material that might appeal to some pedophiles

* If those pedophiles find the material

* If it hits on the appropriate themes to inflame desires (e.g. is it straight fiction? Gay fiction? Is the subject in the fiction the right age? The right appearance?)

* If those 'inflamed desires' are processed by the pedophile's brain to relate to the real world, forming a plan of action (let's call this real-desire)

* If that real-desire faces no moral or practical impediment to its execution

* If the pedophile's impulse control is low enough to execute the real-desire

This is an extremely tenuous chain of events, each one relying on the previous event - and most of this chain of events is squarely in the court of the pedophile/potential child abuser, not the author of the fiction, who still merely creates the risk of this chain of events occurring. This is complicated by the fact that we have no idea how many non-offending pedophiles there are, and we know that most child sexual abuse is not perpetrated by strangers, but by family members (i.e. there is a specific target; this is not usually how people relate to porn), and a large portion of that sexual abuse is not even perpetrated by pedophiles.

The best tactic to prevent child sexual abuse is to stop assuming all pedophiles are potential rapists, and to stop assuming that the people who consume fiction necessarily find fictional desires reflected in their actual psychology (or vice versa). See Patrick Galbraith on this topic.

> I don't think that should be up to 'us' (non-pedophiles) to decide.

I meant, specifically, if society will punish you severely for behaving a certain way, and if behaving that way also causes severe negative outcomes for others, then something that causes people to develop desire for and possibly actual behavior in that way is purely negative for everyone involved, including society, the person that develops the desire and/or behavior, and the people that behavior affects.

As a simple extreme theoretical variation, if looking at specific shade of color that was not natural or common caused people that viewed it to enter a trance-like state where they murdered others, we would say "holy shit, get rid of that color everywhere. People are being murdered, and the murderers are victims too!"

And I'm not saying I believe the statement about material creating pedophiles either, but given the importance of the outcome, it's definitely something we should pay attention to and try to find the truth about. That truth might be that it's easily proven false, or that it's obviously true, or that it's some weird complicated combination thereof, but I think it's obvious that the correct response is not to say "naw, it's the same as videogame violence, and we've all come to the conclusion that that's bunk, so this is bunk, case closed." There are real reasons why it might be different than violence, or that the statistics that make one acceptable don't work out the same way for the other.

> The best tactic to prevent child sexual abuse is to stop assuming all pedophiles are potential rapists

And I'm not assuming that. But I am working under the assumption that the same person will have more problems with pedophile leanings/preferences than without, because as a society we've decided to make that so. Looking at fictional images can get you arrested. Having a desire to look at images like this is a negative outcome for that person in a societal and legal sense (and possibly in a mental health sense as being told that you have something wrong with you can be debilitating, I'm sure), compared to not having that desire, and this is about creating that desire in an individual.

> But I am working under that assumption that the same person will have more problems with pedophile leanings/preferences than one without, because as a society we've decided to make that so.

To me, this means that we should destigmatize pedophilia in a very careful way - that is, to make it clear that to act on that desire in the real world is a violation of consent. At the moment, the only thing we've done as a society is to act as though these desires are exceedingly rare, that those desires possess the holder to act impulsively (while, simultaneously, erecting the idea of the plotting and cunning pedophile), and that, without any more research, fiction may lead to certain conclusions in that person's mind.

Video game violence is brought up so much for two reasons. Firstly, because we don't have solid evidence on how (a) pedophiles understand such media (b) previously-offending pedophiles understand such media (c) non-pedophiles but interested parties understand such media (d) non-pedophiles and non-interested parties understand such media. For (a), the closest evidence we have (as far as I know) is (b). For (c), we have some ethnographic evidence from Japan that consumers of such fiction draw a sharp distinction between the "2D" and "3D" worlds. For (d) we only have some evidence that ordinary people are not conditioned to associate sexual descriptions to scantily-clad, real, underage children. Secondly, it's brought up because video games (and by extension horror movies, and even adult pornography) have exposed the idea that desires for what is seen may not leave the hypothesized impact on the viewer. What was previously thought of as obvious by a succession of moral panics around violent movies and later video games has been shown to be nothing more than that.

Is more research required? Absolutely - I don't think anyone can argue with that. However, research which focuses on pedophilic orientations only through the lens of convicted offenders (i.e. those who have sexually abused children, or accessed real world child pornography) does not do the topic justice. I think it's fair to say that this is comparable to similar situations - not to say that this comparison is binding or solid evidence, only a hint in the direction of a conclusion - of how people enjoy simulated rape pornography, and how people enjoy fantastical pornography. For those studies, at least, we are able to access the adult-attraction equivalents of group (c) and (d).

I'm in a strange position myself in this debate - I don't believe that moralism is the way forward, but I also (and I think along with you) don't believe that non-moral solutions can work well in our current society. The question is, for me, to what extent a pedophile can recognize that child abuse is wrong, but still express themselves sexually, to be comfortable with their sexuality.

Still, I think my point stands. To imprison a fiction author on a tenuous basis is wrong, because it involves the notion of collective responsibility, the notion that an author who has contributed to an 'atmosphere' is by that measure guilty. The fact that many others (pedophiles with no desire to offend, non-pedophiles with no desire to offend) enjoy the material, and can be reasonably expected to be the primary audience of the material is enough to even absolve the creator of the material morally - never mind legally.

On the weight of what I see as the available evidence, analogous (video games, other forms of pornography, kink communities) or otherwise (studies of fictional CSEM fans in Japan), the presumption should be in favour of non-regulation, especially given the statistics on who child sexual abusers are in reality.

Oh, hmm, I actually meant to link to [3] for rising stats on abuse imagery.

[3] https://www.nytimes.com/2020/02/07/us/online-child-sexual-ab... No warning needed, no descriptions of abuse. (Sorry, paywall. Disable JS.)

Real-world violence is so far removed from its video game counterpart that there is no real comparison, as anyone who has been in any sort of fight understands.

In contrast, pornography (visual or textual) is quite similar to its real-world counterpart.

>In contrast, pornography (visual or textual) is quite similar to its real-world counterpart.

A cursory look around online erotic roleplayers, online stories, the manga/doujinshi sold in huge numbers every year in Japan at Comiket, or even the plots of mainstream porn will quickly disabuse you of this notion. I'm also not convinced on the 'no real comparison' for video games. I'm reminded of the fact that the US Army recruits first-person shooter players, and that video games can show a rise in aggression in lab settings, as real sports do.

Even if they were all that different, you'd still have to specify in which specific ways, and the extent of those ways, the difference between the two entertainments is relevant to your argument.

It is though? I'm thinking you and your partner(s) must either be very athletic, or you're watching some pretty tame porn.

Textual descriptions of child sexual abuse both desensitise (a thin end of a wedge, a slow on-boarding) and reinforce a habit or desire.

See my reply down-thread.

Is the same true of movies where a man goes on a killing spree ala Rambo or John Wick? Or videogames where one does the same? I'm trying to suss out the logic where one indulgence of fantasy through fiction is ok and the other isn't.

I would find it believable that sexual associations are more malleable than arbitrary behaviors.

If that is true, one would expect significant evidence to back it up exists. Studies on the effectiveness of gay conversion therapy suggest that your assertion is not based in fact.

I guess I would follow that up by saying development of fetishes is not the same thing as base sexual orientation.

I also don't think this would be easy to study at all. Furries might be a good candidate population to study.

I think the above poster is implying that no one is a pedophile by choice. That would mean people are choosing to desire heinous crimes, which would destroy their reputations permanently and send them to prison. If that were true, we would expect to see a lot more unity from these self selected people in the same way say cannabis users are always jockeying for laws to change.

As it is, it looks like pedophilia is an unasked-for mental state spread out over the population at random.

I've reread this a few times and I'm not sure if it's referring to me, or which position you're implying that person is taking.

But my view would be people don't choose to be pedophiles. I believe it's fairly well established that childhood sexual abuse correlates to adulthood pedophilia. Likewise nobody is born a furry, and, while being a furry doesn't demand sexual interest, exposure to sexualized furry content probably tilts some people in that direction over time.

I do think it's kind of fucked up to not differentiate between people who are active child abusers and people who recognize they're attracted to children but find it morally wrong and desire help.

That may be the case with some consumers of the material, but it is not obvious to me that it is the case with all or even most of them. Further, the idea that to have a desire necessarily leads to acting on it is on extremely shaky ground. To what extent is the 'habit' the habit of seeking out dirty stories, and to what extent is the 'desire' the desire to orgasm without harming another person?

I'm pretty sure porn has prevented people from getting so sad and desperate that they are driven to abuse real people, kind of like how people develop emotional bonds with television characters when their life is empty.

I find either take here to be plausible, is there any good research? (I can't imagine how to ethically run an experiment, but maybe someone's figured that out)

I googled briefly and stumbled upon this[0] meta-study from 2020, no idea how good the research/journal is. It seems there's no broad consensus, research in this field is more influenced by expectations of researchers than real strong methodology.


> I'm pretty sure porn has prevented people from getting so sad and desperate that they are driven to abuse real people

Pretty sure the opposite is true.

I'd rather have an incel staying home and jerking off to porn than going out and trying to roofie a girl at a social gathering (or worse). Men without a sexual outlet become aggressive with or without porn.

Ted Bundy claimed in his final interview extremely violent pornography desensitized him and made him start looking for stronger stimulation. I’m not sure how much you can trust his claims though.

Don't even suggest it.

Billions of people on earth consume pornography. If this theory was right, there should be billions of serial killers.

And remember: in the 1970s, his "extreme pornography" would be something a lot more tame than you can pull up during your lunch hour today.

Some people are just naturally ill-fit for living in society, and the fix is not policing pornography. The fix is mental health medicine and psychiatry.

Asking mad men to self-diagnose the origins of their condition sounds like a bad idea. Even for normative people, asking them how their fundamental morals changed would usually be an unanswerable tar pit.

In the long term that's just as empty, though. You just can't replace real social contacts with digital media.

Kind of like GTA has led to a car-hijacking crisis and an amok-murder wave that still terrorizes all western nations?

Those monkey-see, monkey-do theories have been thoroughly debunked by now?

Why not admit that we forbid this stuff because it's horrifying and deeply distressing to see another person, especially a child, suffer?

It's the same crap as gore or torture videos. What it is to somebody afflicted with these "preferences" doesn't factor into it at all?

Why do we expect that in general the availability of pornography is hugely negatively correlated with sexual crimes but in this subset the opposite is true? Do we have studies about it?

> human sexual attraction does not work that way.

Says who? Are you a psychologist? I don't think there is any research that supports the claim that reading written stories about rape increases the likelihood that a person goes out and rapes a real person (and let's be clear: pedophilia isn't the part that is illegal; it's the rape).

Isn't this akin to the "playing violent videogames will make people into murderers" argument? Or with music, movies, and even comic-books before that?

I'm right there with everyone in saying no one should rape children, but at least according to US law, simulated text stories are not illegal due to first amendment rights. It would take quite a causal link to convince scientists and the courts that something that is gross should actually be illegal.

> Paraphilias are strengthened through repeated exposure and association with pleasure

I heard some ultra-conservatives saying the same about gay porn and how "it turns people gay" as an argument to get it banned. Are you sure that you are not just recycling their arguments?

Is there research? I'd totally believe it, just curious, as it's sort of an old debate.

Apparently they don't do insta-nuke.

"violate terms of service with elf" "continue" "continue"

Was this comment supposed to be left on the other AI Dungeon post? This one is about a security hole.

> Unfortunately, this is, in fact, the second time I have discovered this exact vulnerability. The first time, the issue was reported and fixed, but after finding it again, I can see that simply reporting the issue was a mistake.

I feel uncomfortable with this. The author already reported a vulnerability, it was fixed, but now there's a new one (which is identical, ok, but new nevertheless), so he decided they didn't study their lesson, and punish them with public shaming? I'd maybe get it if the first time was ignored, but like this? Nah ah.

It's like my worst teachers coming back to haunt me as an adult.

This is a great example of what it is like working vulnerability disclosure intake at a F500. Now receive things like this monthly and try and have a constructive conversation with the reporter.

AI Dungeon isn't an individual, it's a business that charges people money to use their product. In the vast majority of cases, responsible disclosure means first telling the developers so they can patch the vulnerability. After it has been fixed, unless there are special circumstances, the fact that there was a security flaw should be disclosed to the public. The purpose of this should not be to shame the developers or anyone else, but to inform the people using the product that there's a chance their info has been compromised. I'm mostly just surprised that AI Dungeon chose not to disclose the first instance of this flaw.

The victims here are the people whose stories got exposed, not the company. The company is responsible for not protecting them enough. The public shaming is important to show people that this company is not to be trusted with your data.

Exfiltrating a bunch of data and then poking around to see how sexy it is and then releasing it seems like a strange way to show that the company you took it from isn't trustworthy with data...

It's absolutely appropriate.


Isn't it a bit more like a school suspending someone for turning up to class with explosives in their bag, and after they explain and apologise the school lets them off with a suspension, and then a short time later when the suspension is done they go ahead and do it again and this time the school just goes straight to the cops?

No, because it's not normal to accidentally or unknowingly bring explosives to school.

It IS normal to accidentally or unknowingly have a flaw in software. So normal that it's likely every piece of software on the planet has flaws, many of which allow the exfiltration of data.

No, mistakes are unintentional; bringing explosives to school is intentional.

You can, on the other hand, claim it's irresponsible, but there are many devs here who can testify that mistakes do happen and do recur in software projects. You need more evidence than "the same mistake happened twice" to imply irresponsibility.

There is a point where the potential consequences of a software flaw are so great that failing to fix it properly - after being informed of it - can understandably be seen as negligence.

It's not clear if the author reported the newly found vulnerabilities, though.

I think most security researchers agree that public disclosure is the only recourse if an entity repeatedly refuses to acknowledge or fix a serious vulnerability. But you have to give them fair opportunities to understand and try to address the vulnerabilities before resorting to that.

As far as I understood they fixed the first bug. This is a new one.

"Open" AI deserves all the shame they get.

The author found a vulnerability, extracted data they should not have had access to, processed the data (aggregated, anonymized), then published the data. Isn't everything starting from "extracted" illegal? Or is it a gray area where "the server would not have provided the data if I were not authorized to receive it" -- in spite of the author's admission that it was acquired via a vulnerability?

Yeah, this seems like it was a real bad idea on his part. If AI Dungeon get pissed by this, it could be bad for him. He clearly has gone past what is considered reasonable by extracting all this data to shame them into fixing it.

Agreed. Retrieving a single random record was enough to prove vulnerability. Analyzing several days worth of data (why??? What does that prove???) crosses the line firmly into black hat territory.

I guess if someone provides an api via graphql it’s hard to tell if it’s intended to be used publicly or not, and to what extent that use is permitted. The site and app both use that api end point and going there gives you a nice page with full documentation of how to do every query plus an online IDE.

One might pull the data then start to wonder if they were supposed to get it only after they begin reading specifics that seem private.

Considering he had already found and reported this vulnerability before, and then took the time to write this report about it, that's not what happened here. He knew it was a vulnerability, he used it purposely to download private data and he looked into it. Not only could AI dungeon sue him for this, also the owners of the data (the people playing AI dungeon) could.

There have been cases of ethical hackers who found a vulnerability and abused it to download a disproportionate number of records being convicted, at least in the Netherlands. It didn't matter that their goal was just to show it to the website owner. So if you're an ethical hacker reading this, I would strongly advise you to only download the minimum required to demonstrate a vulnerability (preferably your own data, or one record), and not do what this person did.

The data was retrieved by mass-upvoting unpublished documents and using an obscure GraphQL feature to extract fields that aren't part of the explicit interface.

I don't think you could do any part of that while assuming it's intended as normal use.

Agreed, this is a really bizarre article. It starts with "following responsible disclosure procedures..", and then you scroll down and the author says they downloaded millions of entries and then... checked to see how much of it was NSFW(!!)?

How is that not just prurient snooping?

I haven't worked with GraphQL before, but looking at those code snippets and reading the description of the vulnerability, it seems like a mess. You're giving a client unfettered access to just... query your database? Of course you're going to get these kinds of issues - that just seems obvious to me.

Getting real off-topic, but the syntax is backwards too:

` Interface Votable implemented by Adventure, Comment, Post, Scenario `

The interface lists what implements it? Reminds me of COMEFROM[0].

I dunno. Modern front-end is wild. These live code Notebook things are chaos. Spaghetti begets spaghetti.

[0]: https://en.wikipedia.org/wiki/COMEFROM

GraphQL is just a standardised API model to access data, similar to REST. How you choose to implement the API is up to you.

The fact that one developer was negligent in their implementation of a GraphQL type API does not really say anything about GraphQL itself.

Many REST APIs have developed vulnerabilities over the years, but we should not draw the conclusion that REST itself is insecure or is a bad idea.

I think GraphQL might be unusually easy to accidentally get wrong.

It handles very powerful untrusted input, by design. And it has enough features that many, maybe most users don't know about all of them. This vulnerability wouldn't have happened if GraphQL didn't have a certain feature.

I've implemented a simple GraphQL API and I didn't know about all the features that were involved here. I thought carefully about security, and I think I ultimately got it right, but there were a few false starts.

Good software is not just possible to use correctly, but easy to use correctly. Software that's easy to use correctly will be more secure in practice, even if it has the same theoretical properties.

I don't know if GraphQL would be better if it were less powerful. I'm not an expert. But it's worth considering the possibility.

I honestly don't see any difference between GraphQL and a REST API, in terms of what data is available where.

if you have data you don't want publicly available, just.....don't include it in the model, and make sure your server implementation doesn't return it.

It is possible I don't understand your comment, I suppose, but I really don't see what is so unique about graphql from a security standpoint.

> don't include it in the model

They tried that, but they failed because they didn't know that it's possible to downcast from an interface.

It's really hard to have that kind of problem in a dumb REST API. `return {"name": record.name}` does what it says with hardly any magic. But if I write `return record` there's a whole extra layer that grabs information out of record, and I have to trust that it only grabs the information I want it to grab.

This is not to say that dumb REST APIs are definitely better. Having to do things manually also introduces risks.
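To make the contrast concrete, here's a minimal Python sketch. The `Record` class and both functions are made up for illustration; this is not AI Dungeon's code or any real GraphQL library, just the general shape of "spell out fields by hand" versus "let a generic layer serialize the object".

```python
from dataclasses import dataclass, asdict


@dataclass
class Record:
    # Hypothetical record; only `name` is meant to be public.
    name: str
    email: str
    private_notes: str


def explicit_response(record: Record) -> dict:
    # "Dumb" style: every exposed field is spelled out by hand.
    return {"name": record.name}


def generic_response(record: Record) -> dict:
    # "Magic" style: a generic layer serializes whatever the object has,
    # so private fields leak unless something else filters them out.
    return asdict(record)


record = Record(name="alice", email="alice@example.com", private_notes="secret")
print(explicit_response(record))
print(generic_response(record))
```

The second function is less typing, but what it exposes depends on everything that ever gets added to `Record`, which is the trust problem described above.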

If you return a record it will only have data for the fields that you have implemented. When you are implementing GraphQL, every resolver has to check if the requesting user has the proper permissions to see anything.

Fields aren’t created automatically by the graphql engine itself. You have to write them yourself, or use another tool that generates them for you at run or compile time.

Also it’s not often said, but you can have fields return a union such as AdminRecord and PublicRecord, or RecordWithoutPrivateData.

Does GraphQL have a way to disable features? If so, it seems the sane way to go about implementing it for a set of data would be to enable the bare minimum features, and enable anything needed specifically after reviewing and researching what it allowed and how it interacted with other features. If you can't, that seems like a very dangerous tool to use.

By default, no features are implemented in GraphQL. It’s a protocol like SQL or REST.

You can adhere to the protocol, and doing so gains you an ecosystem of tools to use, but you have to actually build the nitty gritty bits yourself.

Think of it this way: you can create a graphql schema for a calculator then implement it. It will do math, but store no data and definitely have nothing to do with a relational database.

That was my initial reaction to GraphQL as well, but just like a database you can do stored procedures or you can do SQL but both can still support access control if you need it.

I think it's just a function of them not really thinking too much about the potential sensitivity of the data or building an access control model into the application.

You can and should protect GraphQL on the backend. GraphQL will just forward requests to resolvers which actually get the data, these resolvers should see who's logged in and only return data this user has access to. I'd say the security model the AI dungeon GraphQL uses (depending on GraphQL interfaces to protect subfields of objects) is broken. These fields should never be returned to GraphQL by the backend in the first place, if you're not authorized.

This. Your resolvers should still do access control. No different than a REST api would.
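Roughly what that looks like, as a plain-Python sketch with no GraphQL library involved. The `Adventure` type, the in-memory `ADVENTURES` store and the visibility rule are all assumptions for illustration; the point is only that the permission check sits in the function that fetches the data, so it applies no matter how the query is shaped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Adventure:
    id: int
    owner_id: int
    published: bool
    text: str


# Stand-in for a database (hypothetical data).
ADVENTURES = {
    1: Adventure(id=1, owner_id=10, published=True, text="public story"),
    2: Adventure(id=2, owner_id=10, published=False, text="private story"),
}


def can_view(viewer_id: Optional[int], adventure: Adventure) -> bool:
    # Assumed rule: published adventures are visible to anyone,
    # unpublished ones only to their owner.
    return adventure.published or viewer_id == adventure.owner_id


def resolve_adventure(viewer_id: Optional[int], adventure_id: int) -> Optional[Adventure]:
    # The check lives in the resolver, so unauthorized data never reaches
    # the API layer in the first place.
    adventure = ADVENTURES.get(adventure_id)
    if adventure is None or not can_view(viewer_id, adventure):
        return None
    return adventure
```

Here `resolve_adventure(None, 2)` and `resolve_adventure(11, 2)` both return `None`, while `resolve_adventure(10, 2)` returns the private adventure to its owner.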

This is a really interesting moment in AI. An AI spontaneously commits a crime and engineers have to teach the AI how to obey the law.

We have the AI allegedly emitting illegal fiction and the engineers have to fix it and all they can try to do is word filters. What happens next in this story?

This reminds me of the Chinese virtual girlfriend who got neutered for saying politically illegal speech that the Chinese government objected to.

Another one, was when the Google image labeler was mistaking people for animals. That was extremely distasteful, but not illegal. Google's solution was to get rid of those labels.

Also, all the restrictions on drone activity are another thing. However the drone problem is solvable with reasonably simple rules.

Imagine if Alpha Go made a pattern that could make people have seizures or something, but most people could detect it and not do it, but nobody could make a simple rules based approach to detect it. I guess you'd need a whole nother alpha go size model to recognize that pattern perhaps?

I, personally, find the concept of "illegal fiction" super, super disturbing, mostly because "illegal" is very arbitrary, and declaring _fiction_ illegal is literally creating thought-crime.

Maybe the crime of fiction about children can be solved with a disclaimer in the footer “one year in the story generated by the AI is 10,000 Earth years” where a 3-year-old is actually a 30,000-year-old in our calendar. That’s how silly these attempts are at arbitrary obscenity thought crime laws about fiction.

Writing or reading or generating distasteful fiction shouldn’t be a crime just like a comedian should be able to joke about child abuse, like Louis CK’s joke about how maybe it’s the child’s fault for being too attractive or whatever it was. “But what if pedos agree with him?” feels like the argument people are making in these comments for illegalizing fiction.

We already have this. You cannot use fiction to deliberately harm somebody’s reputation, for example.

You absolutely can. You can write as much fiction as you want about whatever as long as you present it as such. You specifically cannot make statements presented as truth that you know to be false with the intent of causing harm to that person. It may seem pedantic but the difference is vital.

That's not true. This is the reason why movies will have a statement that similarity to real people is only coincidental.

There's a difference between explicitly closing off any potential for liability and what's being described. I could, in fact, write a story that I describe or strongly imply as fictional (this is the essential part) about anyone wherein I describe them as doing heinous or illegal acts. This is why satire is legal, this is why e.g. John Oliver can describe clearly untrue things about Trump or whomever for the sake of a joke, why comedy podcasts can make jokes about individuals that no reasonable person would presume to be true. The reason that movies have that disclaimer is: so they can avoid even the chance of people saying the movie's describing their real (non-fictional) life (which might require payment on the part of creators of the movie) or (by virtue of describing their real life) open the movie up to libel claims if a reasonable person may presume the work is accurately describing that person. They're crossing their T's because lawyers are expensive whereas comedians to some extent thrive on drama like that (because they almost certainly won't be held to be libelous).

If you make an illegal fiction detector, you will also have an illegal fiction generator. So you'd need to keep it closed-source, or else the internet will grab a copy, flip the sign and go to town.

This post doesn't have anything to do with criminal content or filters.

This is about a vulnerability that allowed an external party to scrape and access private adventures and user inputs from all other users.

Wait, there is a kind of fiction that is illegal?

I thought that, at least in free countries, fiction could contain whatever you wanted, since it was not real.

In fact, even not fiction, but books in general. You can still buy the communist little red book and the nazi mein kampf, at least in France.

Would be funny if murder in fiction was banned.

All the movies, theater plays and games etc.

I mean, that is what many groups have attempted to do over the years whether it's violence in movies, D&D, or other entertainment. For some reason, 'classics' like Shakespeare (what with the death, incest, and foul language) get a pass, though.

In England you can write almost anything you like, but see https://en.wikipedia.org/wiki/R_v_Walker, which resulted in a "not guilty" verdict but lots of trouble and expense for the writer.

> girlfriend who got neutered

Not just censors, but transphobic censors to boot.

See also the change in content filtering announced today (https://news.ycombinator.com/item?id=26967683), which given the disclosure timeline here, may be related.

(off topic, but this report is a good example of how to handle user data)

> anonymized

Could we, perhaps, stop using this word? Instead of using the vague, often misleading term "anonymized", state directly what actually happened, e.g. "names and addresses were removed", "user data was aggregated by ${group}", or "the UID was replaced with a new, equivalent key". Most of the time claims about data being "anonymized" are simply not true; replacing names or UIDs with a hashed value is merely replacing an existing candidate key with a new synthetic key. As DJB said[1]:

>> Hashing is magic crypto pixie-dust, which takes personally identifiable information and makes it incomprehensible to the marketing department. When a marketing person looks at random letters and numbers they have no idea what it means. They can't imagine that anybody could possibly understand the information, reverse the hash, correlate the hashes, track them, save them, record them.
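A quick Python sketch of why a hashed ID is still an ID. Everything here is made up for illustration (the `pseudonymize` helper, the tiny log, the assumption that user IDs are small integers); it only uses `hashlib` from the standard library.

```python
import hashlib


def pseudonymize(user_id: int) -> str:
    # Unsalted SHA-256 of the ID: deterministic, so it is just a new key.
    return hashlib.sha256(str(user_id).encode()).hexdigest()


# Hypothetical "anonymized" log: the UID column was replaced by its hash.
log = [
    (pseudonymize(42), "visited /a"),
    (pseudonymize(42), "visited /b"),
    (pseudonymize(7), "visited /c"),
]

# Correlation still works: both rows for user 42 share one key.
same_user = log[0][0] == log[1][0]

# Reversal works too when the ID space is small enough to enumerate.
lookup = {pseudonymize(uid): uid for uid in range(100_000)}
recovered = lookup[log[0][0]]

print(same_user, recovered)  # prints: True 42
```

A salt or keyed hash makes the enumeration step harder for outsiders, but whoever holds the key can still link every row belonging to the same user.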

The rare examples where "anonymized" actually involves meaningfully making user data anonymous are when the actual user-correlated relations[2] have been destroyed. This report specifically discusses how this was done:

> If a sentence fragment appeared in less than 10 unique adventures, it was discarded from the result set to preserve anonymity.

Sometimes this required accepting a small amount of error:

> this data needed to be processed in batches of around 10000 adventures per batch. In each batch, fragments appearing only once were purged. Therefore, counts under around 25 are actually underestimates.
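For anyone curious how those two steps behave, here's a small Python sketch. It is not the report's actual code: the function names, the input shape (a list of `(adventure_id, fragments)` pairs) and the reading of "appearing only once" as "appearing in only one adventure within the batch" are my assumptions.

```python
from collections import defaultdict


def fragment_counts(adventures, min_adventures=10):
    # Map each fragment to the set of adventure IDs that contain it.
    seen = defaultdict(set)
    for adventure_id, fragments in adventures:
        for fragment in fragments:
            seen[fragment].add(adventure_id)
    # Discard fragments found in fewer than `min_adventures` adventures.
    return {f: len(ids) for f, ids in seen.items() if len(ids) >= min_adventures}


def batched_counts(adventures, batch_size):
    # Same counting, but per batch, purging fragments that show up in only
    # one adventure of the batch. Those purged hits are lost, so the totals
    # can come out lower than the true counts.
    totals = defaultdict(int)
    for start in range(0, len(adventures), batch_size):
        local = defaultdict(set)
        for adventure_id, fragments in adventures[start:start + batch_size]:
            for fragment in fragments:
                local[fragment].add(adventure_id)
        for fragment, ids in local.items():
            if len(ids) > 1:
                totals[fragment] += len(ids)
    return dict(totals)


# Toy data: "hello" is in 12 adventures, "rare" in just one.
adventures = [(i, ["hello"]) for i in range(12)] + [(100, ["rare"])]
print(fragment_counts(adventures))     # {'hello': 12}
print(batched_counts(adventures, 11))  # {'hello': 11}
```

In the batched run the last batch contains "hello" only once, so that occurrence is purged and the total comes out as 11 instead of 12, which is the kind of underestimate the report describes for low counts.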

[1] https://projectbullrun.org/surveillance/2015/video-2015.html...

[2] https://en.wikipedia.org/wiki/Relation_%28database%29

You are correct, this word is hugely misleading.

I once ran an audit of a major VPN provider who claimed they did not store IPs or in fact any personal data on their users in their marketing material. In fact, their technical justification for this was merely swapping one unique key out for another. When asked to trace a person's connection and identity, a randomly selected engineer on their data science team did it within a few minutes. The fact that they had a data science team when they purported not to collect user data was baffling to me.

> claimed they did not store IPs ... on their users

"We don't store IPs in the 'users' table."

> merely swapping one unique key out for another.

"We store them in another table* with a synthetic primary key."

Maybe "we don't store personal data" is how you say "third normal form" in the marketing dialect?

OK now I have to go check out AI Dungeon. Is this some clever marketing ploy to get me to try it out?

It’s amazing. You can summon any kind of being you’re interested in for the most depraved sex acts you can imagine. And now you have an audience!

There does seem to be some bias in the training data. Sometimes your partner will turn into a werewolf and start howling and scratching at you.

It’s not the only wrapper around GPT-3, but if you might enjoy a bit of machine-assisted creative storytelling it is the cheapest AFAIK, and every game is certainly unique. AI-D can sometimes be an entertaining and/or revealing journey into your own neuroses. Just be aware that Latitude’s privacy policy is “we might read all your stuff”, and that they have form when it comes to treating customers like cattle.

And note that the client app is a SPA of very questionable quality. Whatever Latitude’s strength might be, it sure ain’t web programming. I was unsurprised to hear of a vulnerability and I suspect there’ll be others lurking.

It’s not bad, but well, it is AI ;)

> You say "Welcome, what can you do?"

They each show you a few things, seeming unsure of themselves.

"What are you supposed to be?"

"I'm the new expansion for Empire: Battle for theidden Realm."

"What about you?"

> The results are... surprising, to say the least.

Well, are they? I always thought that people will try stuff in a "safe harbor" which they cannot try or should do somewhere else. So I always expect these sandboxes to be full of nsfw stuff.

And people might not understand that their stories will influence the story of others, so...

> And people might not understand that their stories will influence the story of others, so...

They don't. Pretty much all large-scale AI models separate inference/content generation from training. What you use is the frozen version which doesn't update unless its creators retrain it. And if they retrain then it's their choice which data to use for retraining, there's no need to restrict the generated output.

Thanks a lot! I always assumed input automatically gets used for training. :)

Yeah not surprising at all. Give any sandboxy tool for free to a bunch of anonymous internet users and you can be sure it'll mostly be used for these things.

I mean, even reddit is full of nsfw, explicit, borderline subs with millions of active users

Looking at the r/AIDungeon subreddit, it is quite obvious that virtually every user seems to use it for porn.

This reminds me of how Nintendo developers discovered their western customers love drawing phalluses[1]. I don’t find the NSFW percentage to be at all surprising. It was common with Eliza too.

[1] https://www.kotaku.com.au/2012/11/nintendo-created-a-penis-d...

Gutted that the pictures in this article no longer load.

They all appear to load fine for me, and only a single image on the top is a dick (and not on one of Nintendo's services)

One of my first thoughts when playing with AI dungeon was to try and get it to write something erotic. Glad I didn't follow through

I showed it to some friends and after the introduction "You are a princess in Larion, a knight approaches..." the first thing one entered was "fuck the knight".

It's not necessarily because the person is trying to be sexual for its own sake. That's a good test as to how constrained the system is. Commercial systems (such as video games) often lock out any sexual actions (at least the systems not specifically made for that purpose), or constrain them to a fixed set of times and/or circumstances. Immediately using a sexual situation has the dual benefits of testing how flexible the system is, as well as seeing something somewhat new and novel.

Fun fact: AI Dungeon used to be Open Source and you used to be able to run it locally without sending your data to someone else and without censorship of any form https://en.wikipedia.org/wiki/AI_Dungeon#Development

This is what happens when software that you use does a bait and switch into cloud.

For anyone wanting to play it locally, a quick google search gave me these two links: https://colab.research.google.com/drive/1OjBQe4H4C2s-p4-OeJo... and https://pastebin.com/UMUV0KTw

Speaking as a former customer, their actual application is really not that great. I was subscribed over several months and while new features were sparse, nearly daily the app would update with fixes to the UI and backend. Existing features broke and got fixed on a day-to-day basis, and there were UI glitches all over the place. So while their core product, the AI, is the best on the market, everything they wrapped around that really isn't that great at all. So I'm not really surprised that their API is lacking as well. Just something to keep in mind, before using their product...

I also subscribed for a while and keep periodically coming back to it. Interestingly, I had the opposite impression. They keep adding a lot of fancy fluff: tracked quests, editable story context, predefined worlds, ... A constant stream of new bells and whistles meant to expand on the experience and incentivize you to spend money.

However, I feel all of these are actually good ideas crippled by the fact that they still don't address the actual problem: the underlying model is only nearly good enough to play an interesting adventure from beginning to end without it going off the rails. A typical session feels like a battle against the AI's tendency of losing the plot all the time. Either literally forgetting the storyline or producing utter non sequiturs. And when it doesn't forget where it's going, it instead keeps repeating itself and insisting on the same trajectory regardless of your responses, even after reaching a natural conclusion.

After tons of fiddling and undoing nonsense responses you finally got an adventure that made sense and you killed the demon lord at the end of your quest? Suddenly, he will tell you to hurry before his master, the demon lord, comes after you and you must stop him from conquering the world!

People are idiots, you act like you care about children and want them to be safe, but freak out more over fiction than reality.. I was born in 1984 and was sexually abused like so many other kids, and it was by a parent.. what also gets me is you think only pedophiles sexually abuse children, the fact is they are less likely to.. you can look it up yourself, it's well known in the psychology field.. https://blogs.bmj.com/medical-ethics/2017/11/11/pedophilia-a... .. Pedophilia and Child Sexual Abuse Are Two Different Things — Confusing Them is Harmful to Children.

PS.. As a child victim who is now an adult and forgives my attacker.. To say that a text based story has more value than me or just as much value is an insult to all kids, and those who have been abused.. I WISHED.. these outlets existed far before I was born, so maybe this chain of violence would have never reached me, so don't sit there and act like your war on fiction benefits me or other kids.. you only spread the sickness.. it's Fing common sense, you starve the lion it eats people.

Isn't it just random dungeons created by anonymous users? Is there actually any sensitive data here? I have a "similar" service (nothing about AI but similar in other ways) and security is the least of my concerns since being hacked means I will expose completely meaningless data. Now I'm afraid someone will hack me and make a similar fuss about me being an idiot.

They require login. It wasn't verified at least earlier so you could just use a bogus email address, but I suspect many people used their own email.

I had no idea people still used autoincrementing ids. Do people also build businesses on cakephp and joomla?

No, we all ditched a perfectly good way of doing something and switch frameworks every 6 months to keep up with the latest JavaScript fad.

>perfectly good way of doing something

I don't agree

> I had no idea people still used autoincrementing ids.

said the user in post number 26977295, replying to post number 26976540.

I prefer to have actual access control rather than relying on my adversary not being able to iterate ints.

not a binary choice

Why would you pick those two projects? They are both actively developed, and have recently released stable versions.

I don't use either, but I quickly checked cakephp, and I think their code is great, and they have 94% test coverage.

Hurrrdurrr php bad, go, rust good.

You actually should in the backend, at least when using clustered indices... See also https://vladmihalcea.com/clustered-index/ Last section about "Clustered Index column monotonicity".

Of course a public facing uuid or so is possible, but you want to index this too...
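Something like this, as a sketch using Python's built-in `sqlite3`. The table layout and helper names are invented for illustration, and SQLite isn't the clustered-index engine the linked article talks about; it just shows the split between an internal sequential key and a random public one.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE adventures (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal, monotonic key
        public_id TEXT NOT NULL UNIQUE,        -- random ID exposed to clients
        title TEXT NOT NULL
    )
    """
)


def create_adventure(title: str) -> str:
    # Clients only ever see the random public ID, never the integer key.
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO adventures (public_id, title) VALUES (?, ?)",
        (public_id, title),
    )
    return public_id


def get_title(public_id: str):
    # Lookups go through public_id; the UNIQUE constraint gives it an index.
    row = conn.execute(
        "SELECT title FROM adventures WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None
```

Unguessable IDs only stop enumeration. Whether the caller is allowed to see the row is still a separate check.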

It makes even less sense with the "publicid" query string you see being a uuid.

> In summary - if user input on a private adventure is flagged using an automated system, it will be manually reviewed, with other private user adventures potentially being manually reviewed as well. With almost half of the userbase being involved with NSFW stories, this seems like a tremendous misstep, as users have an expectation that their private adventures are, well, private.

I would assume they want to review the inputs to avoid a repeat of the incident where Microsoft's Twitter bot was trained to say inappropriate things: https://en.wikipedia.org/wiki/Tay_(bot)

If that’s true then they can review public data or use private data only with explicit consent. Suddenly deciding to read people’s private sexual fantasies just to “improve our algorithm” seems quite over the line.

Adding to that, the system supported multiuser role playing which many people used during the COVID year for intimacy with their significant others. Latitude is essentially deciding to peruse sexts for fun.

GPT-3 has already been trained
