Saturday, August 8, 2009

Real-time search (particularly of the chans) just got feasible

After discussing it with my roommate, I've made some breakthroughs here:

The basic concept is real-time search for the *chans: 4chan, 7chan, 420chan, 711chan, 99chan - whatever ones float your boat. You archive roughly the front 4 pages of every one of their boards and drop threads once they pass the 10-15 minute stale mark, with some obvious tweaking for less active boards. Then you present a nice search interface into it which returns links to and previews of the threads, and the user clicks the link to open a new tab or summat to the chan where the thread is taking place.
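A minimal sketch of that rolling archive, assuming the scraper (not shown) hands over one record per thread. The names `Thread`, `RollingArchive`, and `last_post_time` are made up for illustration, and the 15-minute cutoff is just the upper end of the range mentioned above.

```python
import time
from dataclasses import dataclass


@dataclass
class Thread:
    # Hypothetical record shape; a real scraper would decide what to keep.
    chan: str
    board: str
    thread_id: int
    url: str
    text: str
    last_post_time: float  # unix timestamp of the newest post we saw


class RollingArchive:
    """Keeps only threads whose newest post is younger than the cutoff."""

    def __init__(self, stale_after_seconds=15 * 60):
        self.stale_after_seconds = stale_after_seconds
        self.threads = {}  # (chan, board, thread_id) -> Thread

    def upsert(self, thread):
        # A re-scraped thread simply overwrites the older copy.
        self.threads[(thread.chan, thread.board, thread.thread_id)] = thread

    def prune(self, now=None):
        # Drop every thread past the stale mark; return how many were dropped.
        now = time.time() if now is None else now
        stale = [
            key
            for key, thread in self.threads.items()
            if now - thread.last_post_time > self.stale_after_seconds
        ]
        for key in stale:
            del self.threads[key]
        return len(stale)
```

The "obvious tweaking" for slow boards could be as simple as passing a larger `stale_after_seconds` for those boards.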

Originally the notion was "scrape the data off the pages and hope the chans don't notice you're doing it or change the structure of their pages much" - this meant dealing with the various chan software packages out there and crafting scrapers to get data from all of them. It also meant needing to update (perhaps frantically) at slight changes. It was a scary thought that they might start changing their pages maliciously, because you aren't really their friend. You're just someone hitting their servers kind of hard and not generating ad revenue.

The reason for all this? Think about it. You want a market? You got it. Around a million or two users who are predominantly male, single, 18-30, have a lot of time on their hands, and are probably aroused and impulsive over it. This should have marketing people salivating like mad - so much money to be made! Of course it's not a very "clean" or "safe" environment, but jesus.

The flipside for the people? What are people on chans for? To waste time? To not be doing anything? Maybe. What's likelier is that there are pieces of the chans that give them something they're looking for. A particular style of humor. A particular sexual fetish. A particular type of story - maybe creepy threads. They go to a chan they know and scour it - reading and looking, waiting for one of the less-than-a-dozen really interesting things (to them) to pop up, then they read and follow it, and are either satisfied or continue. If they use a chan up they move to another, and often just work back and forth.

That's a large amount of unfocused viewing with little gratification. The thrill of the hunt I guess but - let's not bother with that. It would be much nicer if you could capture from all streams and grep out what you like. So you go to one interface, type in "creepy thread", and then it shows you threads on 10 or 20 different chans in the past 15 minutes that have had the term "creepy thread" come up in them, and bam. You've just made a way, way better experience for the end user, and dramatically cut down on server load for the people running the chan.
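The "grep out what you like" step could start out as a plain case-insensitive substring match over whatever is currently archived. This is only a sketch: the dict keys (`chan`, `url`, `text`) and the `search` function are assumptions for illustration, and a real service would want a proper index.

```python
def search(threads, query, preview_chars=80):
    """Return a link and a short preview for every thread mentioning the query."""
    needle = query.lower()
    results = []
    for thread in threads:
        if needle in thread["text"].lower():
            results.append(
                {
                    "chan": thread["chan"],
                    "url": thread["url"],
                    "preview": thread["text"][:preview_chars],
                }
            )
    return results


# Made-up sample data standing in for the last 15 minutes of several chans.
sample = [
    {"chan": "4chan", "url": "http://example.invalid/x/1", "text": "Creepy thread: post your best"},
    {"chan": "7chan", "url": "http://example.invalid/b/2", "text": "cat pictures"},
    {"chan": "99chan", "url": "http://example.invalid/x/3", "text": "another creepy thread, go"},
]
hits = search(sample, "creepy thread")
```

Here `hits` holds the 4chan and 99chan entries, each with a link back to the chan where the thread lives.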

Why that cut-down on server load matters is the magic. This is as far as I'd gotten beforehand: the plan was page scraping and thread passing, hoping the chan owners wouldn't notice you. But why not make their deal sweeter too? The only source of revenue for a chan owner is advertisements. But ads are going to be ineffective unless they're broad, meaning they have to be blandly pornographic, and that's still ineffective. What this service offers is focus, and a reduction in server load. At optimal usage, you might have, say, half the server hits per month, but a 10-30% higher clickthrough rate because you can start targeting your ads. Someone comes to your chan because they searched "creepy thread"? Well, show them horror movie or scary story ads. They're way likelier to bite on that than a skimpily clad woman advertising "for-pay porn" on a site where half the users are present to get porn free.
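A quick back-of-envelope check of those figures, assuming the 10-30% is a relative lift in clickthrough rate and that every hit shows one ad. The function name is made up, and the inputs are just the guesses from the paragraph above, not measurements.

```python
def relative_clicks(hit_ratio, ctr_lift):
    """Ad clicks after the change, as a fraction of clicks before it.

    hit_ratio: new hits divided by old hits (0.5 means half the hits).
    ctr_lift:  relative clickthrough improvement (0.10 means 10% higher).
    """
    return hit_ratio * (1.0 + ctr_lift)


low = relative_clicks(0.5, 0.10)   # pessimistic end of the guess
high = relative_clicks(0.5, 0.30)  # optimistic end of the guess
```

Read that way, raw click volume lands at roughly 55% to 65% of what it was, so the chan owner's gain would have to come from serving half the traffic and from targeted clicks being worth more to advertisers. That last part is an assumption on top of the guesses above.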

The providers of the interface can target and present ads as well - maybe just unobtrusive Google ads, whatever. The very fact that the user is showing their interests is enough to allow some level of targeting, meaning a higher clickthrough rate, and better revenue. It also ensures lower costs and a faster, easier experience for the end user. Literally, everyone wins.

Given this, the new idea is to work with the chan runners and get them to run some software for us, that will do the scraping on the server side and send an efficient package something like once a minute to our servers for inclusion in the search. This would make the job easier for us, and less painful for them.
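One way the once-a-minute package might look, purely as a sketch: the chan-side software collects threads that changed since the last send, and ships them as gzip-compressed JSON. The field names, `build_package`, and `read_package` are hypothetical. Nothing here is an existing chan software feature, and the actual sending over the network is left out.

```python
import gzip
import json


def build_package(chan, threads, since):
    """Pack threads with activity newer than `since` into compressed JSON bytes."""
    changed = [t for t in threads if t["last_post_time"] > since]
    payload = {"chan": chan, "since": since, "threads": changed}
    # Compact separators keep the JSON small before compression.
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def read_package(blob):
    """Inverse of build_package, as the search service would run it."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Made-up board state on the chan's side.
board_state = [
    {"board": "x", "thread_id": 1, "text": "creepy thread", "last_post_time": 1150.0},
    {"board": "b", "thread_id": 2, "text": "old news", "last_post_time": 900.0},
]
blob = build_package("examplechan", board_state, since=1100.0)
received = read_package(blob)
```

Sending only what changed since the last package is what would keep this cheaper for the chan than having us scrape the front pages over and over.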

Anyway, yeah. I think this should be done, and might work on it when I'm not actively engaged elsewhere. If you'd like to do it instead, go ahead - but this only really becomes effective if the chans play nice together on it. You'd need one service covering at least the big 4 or 5 for it to really be useful.

Wootsauce.

edit: there's the worry of "what if it's that stupid random unfocused time that allows you to run into new things and to spawn new things, which keeps the chans from becoming stale and dead" - and my response is that you can still go surf them regularly. Hopefully this'll clear out the leeches and the seekers, and leave mostly oldfags and trolls actually sticking to a chan. It'll also likely improve content of specialized threads, because people with a vested interest can be guaranteed to find them.

And the interface for this is crucial - you make the interface suck, it won't work.

..okay I'm done