jtokoph 65 days ago [-]
To any future hackers: You don't need OCR. HQ has a simple WebSocket server that streams the questions and possible answers in real time. Set up an HTTP proxy on your phone to inspect the requests the app is making. You'll find lots of helpful stuff.
applecrazy 64 days ago [-]
Ooh. Sounds interesting. I've taken an OCR approach before (see my profile for the post) since I thought the iOS app had cert pinning, but this method takes the cake and (presumably) will be faster in a game situation.
throwaway2016a 65 days ago [-]
That works until the app starts using certificate pinning.
bdod6 65 days ago [-]
Author here: At Mux, we experimented with using machine learning to predict HQ Trivia answers. We managed to get 80-90% accuracy across a dataset of around 500 questions.

The trickiest questions were relational questions (e.g. What's heavier, a pineapple or a Siamese cat?). Would appreciate any feedback on our approach (and happy to answer questions!).

conanbatt 65 days ago [-]
Time to bring in the big questions. The tortoise is on its back. And you are not helping it. Why?
bdod6 65 days ago [-]
I think anyone playing HQ should be encouraged by our results. I know a lot of people are turned off from playing because of all the "bots". Based on our analysis and results from over a hundred games, I think it's clear that bots are not sophisticated enough to solve HQ.

We're also using a more sophisticated approach than most bots I've read about, and we continuously train our model on new data. Even so, we would only expect to win 7 out of 100 games.

Our goal was not to hurt the HQ community, but rather to challenge ourselves into solving a difficult data science problem.
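
A rough sanity check on the 7-in-100 figure, as a sketch only: assume a game has 12 questions (the thread does not state this) and that each question is answered correctly independently, with the same probability. Under those assumptions the chance of a perfect game is the per-question accuracy raised to the number of questions. The function name and the 12-question default are illustrative, not the authors' model.

```python
def win_probability(accuracy: float, questions: int = 12) -> float:
    """Chance of answering every question correctly.

    Assumes questions are independent and equally hard, which is a
    simplification; 12 questions per game is also an assumption.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if questions < 0:
        raise ValueError("questions must be non-negative")
    return accuracy ** questions


if __name__ == "__main__":
    # The 80-90% range is the accuracy reported earlier in the thread.
    for accuracy in (0.80, 0.85, 0.90):
        print(f"{accuracy:.0%} per question -> {win_probability(accuracy):.1%} per game")
```

With 80% per-question accuracy this gives just under 7% per game, which lines up with the figure quoted above; the real model's errors are unlikely to be independent, so treat it as a ballpark.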

mezzode 65 days ago [-]
Pretty sure they were making a Blade Runner reference
bdod6 65 days ago [-]
selectodude 65 days ago [-]
How is this different from coding, say, a wall hack in an online FPS?
bdod6 65 days ago [-]
There's a lot less teabagging when we win.
calbear81 65 days ago [-]
I thought this was going to be a retrospective from the HQ Trivia team about how they were mediocre given the scaling challenges and hiccups they are facing and then they solved it through ML!
argonaut 65 days ago [-]
This seems pretty misleading, since honestly 99% of the machine learning that goes on here happens when running the questions/answers through Google Search. There are probably millions of man-years of machine learning / information retrieval that have gone into Google Search.
xkcd-sucks 64 days ago [-]
The concept of machine learning is pretty misleading, because it's founded upon billions of man-years of human learning
petercooper 65 days ago [-]
Then we find HQ eventually pivots to being a machine learning research platform once someone invents a perfectly scoring bot ;-)

Joking aside, I'd say HQ Trivia are getting savvier with the questions. A final question the other day was along the lines of "Which two female artists collectively have the same number of Grammys as Beyoncé?" with the answer being "Adele + Madonna", I believe.

bdod6 65 days ago [-]
Yep. I actually mention that specific question in the article as an unsolvable question for machine learning, at least given our current constraints.

Those are generally rare questions though because difficult questions for bots are also difficult questions for humans. HQ can't have too many of those questions without degrading the player experience.

Because of that, I don't think we will ever get beyond 10/11 questions right per game. That still leads to a decent chance at winning at least one game per week though.
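
The "at least one game per week" estimate can be checked with a short calculation. This is a minimal sketch, assuming every game is won independently with the same probability; the 7% per-game figure is the one quoted elsewhere in this thread, and the games-per-week values are placeholders, since the thread does not give a schedule.

```python
def at_least_one_win(p_win: float, games: int) -> float:
    """Probability of winning one or more of `games` independent games."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    if games < 0:
        raise ValueError("games must be non-negative")
    # Complement of losing every single game.
    return 1.0 - (1.0 - p_win) ** games


if __name__ == "__main__":
    P_WIN = 0.07  # "7 out of 100 games", as quoted in the thread
    for games in (7, 14):  # hypothetical numbers of games per week
        print(f"{games} games -> {at_least_one_win(P_WIN, games):.0%} chance of a win")
```

Under these assumptions, 7 games a week gives about a 40% chance of at least one win and 14 games about 64%, so "decent chance" looks plausible if you play every game.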

petercooper 65 days ago [-]
Ha, so you did! I got down to the Nick Hornby question which reminded me of it and then commented here ;-)

You are right about the experience issue, though. It feels almost like they're trying to make it so you can't win with questions worded in that way, since even someone who actually knew how many Grammys each of the listed artists had would struggle to add them together in time.

bdod6 65 days ago [-]
Yep! We felt the same way when we saw that question. I think HQ will prioritize questions that are hard but still feel possible. Otherwise their engagement will start dropping off.
nicolashahn 65 days ago [-]
Is there a dataset of past HQ questions and answers?
bdod6 65 days ago [-]
Yes, we have been archiving each game going back to October. We augment the questions and answers though so that we get more relevant results when we run our web scrapes.
nicolashahn 65 days ago [-]
I don't suppose you'd be willing to publish what you've gathered at some point in the near future?
bdod6 65 days ago [-]
We might in the future, but we're unsure what the copyright is on those questions.
axit 65 days ago [-]
Would be awesome if you guys publish the raw data (questions and answers). Would love to try to build a model as a learning exercise.
bdod6 65 days ago [-]
We're not sure about the copyright, but that's something we will look at doing if that's not an issue.