[WIP] Automated CH metadata extractor

high_octane · Post by **high_octane** » Wed Aug 14, 2019 5:36 am

Essentially, I'm planning and working on a software project to make the lives of those interested in creating a database of CH metadata a lot easier, myself included. More specifically, this topic will be about my tentative outline regarding the automation of CH metadata collection via machine learning, and subsequent foray into usability. I understand this is a very niche topic, my apologies.

In order for this project to come to fruition, it will need to be very accurate. It won't be as accurate as manually watching CH videos for millions of hours while jotting down relevant info, but it will, hopefully, come very close. Since I'm utilizing machine learning (OpenCV), I will need a ton of photos of various pornstars in order to create a specifically trained model for them.

NOTE: I've never attempted any work with machine learning in projects prior, so I still need to take some time to familiarize myself with the library.

Here's a list of metadata that will be extracted from the CHs:

media info (width, height, video_encoder, audio_encoder, container, fps, creation_date, duration, etc...)

author(s)

good name (in case the name of the video is insufficient for uniquely identifying the CH)

# of beats (for calculating difficulty; will also include timestamps for use with external device synchronization)

# of rounds

global and per round statistics

models/pornstars

what the model is doing in the scene (to determine genre stats)

and anything I forgot to include

Once the AI has detected the above criteria, it will finally output some form of human-readable data structure. It'll be either a CSV or a JSON file; I haven't decided yet.

This will allow for the creation of a very rich and detailed database; one which will allow you to get figures for a particular CH round organized like so:

Code: Select all

CH Name: Round 1: models(25% modelname0, 25% modelname1, 50% modelname2), genre(20% HJ, 50% Tease, 30% BJ), difficulty(medium 1.4 BPS) etc...

I'll create a GitHub and/or GitLab repository and link it here once I've gotten things planned out and in a usable state, so stay tuned.

Git Repository:
https://gitlab.com/high_octane/chext

I've decided to use GitLab exclusively. If you already have a GitHub account, you can easily sign into GitLab with it.

In the future, if anyone becomes interested in collaborating, I'll greatly welcome and appreciate it. In terms of chatroom-esque software for collaboration, I cannot use Discord anonymously (and believe me, I have tried more times than I wish to recall), but I can use Riot.

Riot room for further discussion (give it a little time and it'll load):
https://riot.im/app/#/room/#chext:matrix.org

This is a huge undertaking. Will I survive?

fragrantEmulsion · Post by **fragrantEmulsion** » Wed Aug 14, 2019 3:25 pm

Can I have repo access?

high_octane · Post by **high_octane** » Wed Aug 14, 2019 4:45 pm

fragrantEmulsion wrote: Wed Aug 14, 2019 3:25 pm Can I have repo access?

Sure thing. I don't have a repository up at the moment, but I'll get around to that soon. Access to the repo will be available to everyone.

I just set up a room on Riot for further discussions of this project, which I just added to the OP. This room is also accessible to anyone. I don't want to cloud up this forum too much with this, because it isn't exactly "On Video", but it is about Cock Hero.

That said, I will post major updates here. Discussion here is also fine if people don't feel like making an account for Riot.

Qiubi · Post by **Qiubi** » Wed Aug 14, 2019 4:47 pm

and... for the mere mortals that don't understand about this things.... what are you doing exactly? a program to watch random videos?

high_octane · Post by **high_octane** » Wed Aug 14, 2019 5:03 pm

Qiubi wrote: Wed Aug 14, 2019 4:47 pm and... for the mere mortals that don't understand about this things.... what are you doing exactly? a program to watch random videos?

I'm creating a program which will analyze a Cock Hero video, and output various information about it. Thanks to progression in machine learning technology, it will become easier to extract that info. For instance, I can train the AI to look for things that look like the beginning of a round. If it identifies it, it will log a timestamp of when it saw it (and a timestamp of the first beat in the round for calculating difficulty). Then, when it identifies the next round, it will have a duration for the first round it found.
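The round-timing idea above can be sketched roughly as follows. This is only an illustration, not code from the project: `round_durations` is a hypothetical helper, and it assumes that a round ends where the next one begins and that the last round runs to the end of the video.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the detected start time of each round
 * (in seconds, ascending) and the end time of the video, fill out[]
 * with the duration of each round.
 * Assumption: a round ends where the next one starts; the last round
 * ends at video_end.
 */
void round_durations(const double *starts, size_t nrounds,
                     double video_end, double *out)
{
    for (size_t i = 0; i < nrounds; i++) {
        double end = (i + 1 < nrounds) ? starts[i + 1] : video_end;
        out[i] = end - starts[i];
    }
}
```

With round starts at 10 s, 190 s and 400 s in a 600 s video, this would give durations of 180 s, 210 s and 200 s.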

Then, once all of the info is collected, a different project which focuses on interactive CH can take that data (especially the round timestamp data) and use it to piece together a ton of different rounds into a new experience. How the rounds are concatenated could also be based on genre info or pornstar info, which is also something that my project aims to collect, and so on and so forth...

Sorry if that wasn't any clearer. I'm not the best at explaining things.

jamesredcool · Post by **jamesredcool** » Wed Aug 14, 2019 5:26 pm

Absolute mad lad. It sounds very cool

Qiubi · Post by **Qiubi** » Wed Aug 14, 2019 7:22 pm

high_octane wrote: Wed Aug 14, 2019 5:03 pm
Qiubi wrote: Wed Aug 14, 2019 4:47 pm and... for the mere mortals that don't understand about this things.... what are you doing exactly? a program to watch random videos?
I'm creating a program which will analyze a Cock Hero video, and output various information about it. Thanks to progression in machine learning technology, it will become easier to extract that info. For instance, I can train the AI to look for things that look like the beginning of a round. If it identifies it, it will log a timestamp of when it saw it (and a timestamp of the first beat in the round for calculating difficulty). Then, when it identifies the next round, it will have a duration for the first round it found.

Then, once all of the info is collected, a different project which focuses on interactive CH can take that data (especially the round timestamp data) and use it to piece together a ton of different rounds into a new experience. How the rounds are concatenated could also be based on genre info or pornstar info, which is also something that my project aims to collect, and so on and so forth...

Sorry if that wasn't any clearer. I'm not the best at explaining things.

thanks that was more clear than the other thing xDD

Rule63MePlease · Post by **Rule63MePlease** » Wed Aug 14, 2019 11:44 pm

This sounds very awesome! I have always wanted there to be an official system to rate the difficulty of a CH. Having a program that can count all the beats, beat changes/patterns, beats per round, beats per minute, total number of beats each round, and how many rounds a CH has would make that process a lot easier.
Next we will have to decide on what kind of score to give each of those numbers to get the total difficulty score.

This program could also be used for automating devices to play Cock Hero with, kind of like this thing: http://cockheromachine.blogspot.com/2017/

doremi · Post by **doremi** » Wed Aug 14, 2019 11:53 pm

high_octane, do you realise that if you develop this software as an official project while enrolled in a Computer Science program, you could be the very first to earn a Cock Hero Doctorate degree?

As for a huge image bank to choose from, wouldn't it be nice to feed on the PornHub server drives? I guess you could write a website rip feature and use pic and tags.

high_octane · Post by **high_octane** » Thu Aug 15, 2019 4:00 am

Rule63MePlease wrote: Wed Aug 14, 2019 11:44 pm This sounds very awesome! I have always wanted there to be an official system to rate the difficulty of a CH. Having a program that can count all the beats, beat changes/patterns, beats per round, beats per minute, total number of beats each round, and how many rounds a CH has would make that process a lot easier.

I too have yearned for that, and this program will help achieve exactly that.

Rule63MePlease wrote: Wed Aug 14, 2019 11:44 pm Next we will have to decide on what kind of score to give each of those numbers to get the total difficulty score.

I posted this in a different thread, and my method hasn't changed since then. Basically, in order to get the D-weighted difficulty value, you need to add a pref value to the base difficulty. And by "D-weighted", I mean "dick-weighted", of course.

high_octane wrote: Thu Jun 13, 2019 3:31 am This is my idea for calculating the difficulty of a round:

Code: Select all

#include <stddef.h> /* size_t */

/**
 * nbeats: number of beats in the round
 * len:    length of the round in seconds (measured from the first beat to the last)
 * prefs:  content the user finds the most arousing (scale from -2.0 to +2.0, where - values are
 *                                                   less arousing and + values are more arousing)
 *
 * returns difficulty value (-inf to 0.4 = easiest, 0.5 to 0.9 = super easy, 1.0 to 1.4 = very easy,
 *                           1.5 to 1.9 = easy, 2.0 to 2.4 = medium, 2.5 to 2.9 = hard,
 *                           3.0 to 3.4 = very hard, 3.5 to 3.9 = super hard, 4.0 to inf = hardest)
 */
static inline double get_difficulty(size_t nbeats, double len, double prefs)
{
    /* beats per second, shifted by the user's preference value */
    return ((double)nbeats / len) + prefs;
}

The prefs implementation is a bit naive here but whatever.

doremi wrote: Wed Aug 14, 2019 11:53 pm high_octane, do you realise that if you develop this software as an official project while enrolled in a Computer Science program, you could be the very first to earn a Cock Hero Doctorate degree?

I guess I'm Dr. high_octane now.

But in all seriousness, I chose not to pursue tertiary education for various reasons. Most of the skills that I've learned are self-taught, because I had the will and desire to learn them. Now it is time to use my skills for the good of the Milovana community!

doremi wrote: Wed Aug 14, 2019 11:53 pm As for a huge image bank to choose from, wouldn't it be nice to feed on the PornHub server drives? I guess you could write a website rip feature and use pic and tags.

That's not a bad idea! I'll look into it. To have a very accurate model, I'll probably need around 5,000 pics of each pornstar. Obviously, there probably aren't that many images of a particular pornstar, so around 1,000 should suffice.

Rule63MePlease · Post by **Rule63MePlease** » Fri Aug 16, 2019 2:53 pm

I think the difficulty scale should range from 1 to 10, or 1 to 100. Going from 0.4 to 4.0 just seems like a very odd system.

high_octane · Post by **high_octane** » Fri Aug 16, 2019 7:49 pm

Rule63MePlease wrote: Fri Aug 16, 2019 2:53 pm I think the difficulty scale should range from 1 to 10, or 1 to 100. Going from 0.4 to 4.0 just seems like a very odd system.

It looks strange because it directly modifies the beats per second value. That scale is based on the range of practical tempos in music.

For instance, 0.4 bps equals 24 bpm, and 4.0 bps equals 240 bpm. Probably no song in a CH will ever be that slow or fast, but if someone really, really likes the content in a round (+2.0 pref) that has a "medium" base difficulty of only 2.0 bps (120 bpm), then that round would be considered "hardest" difficulty for them in the D-weighted calculation.

Likewise, if someone truly despises the content with all of their being (-2.0 pref), then the "medium" base difficulty round would be considered "easiest" in D-weighted.
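The arithmetic in these two examples can be written out as a small sketch. The function names here are made up for illustration; only the formulas (bpm = bps × 60, and weighted difficulty = base bps + pref) come from the posts above.

```c
/* Convert between beats per second and beats per minute. */
double bps_to_bpm(double bps) { return bps * 60.0; }
double bpm_to_bps(double bpm) { return bpm / 60.0; }

/*
 * Preference-weighted difficulty: the base difficulty in beats per
 * second plus the user's pref value (-2.0 to +2.0).
 */
double weighted_difficulty(double base_bps, double pref)
{
    return base_bps + pref;
}
```

For a 120 bpm round, `weighted_difficulty(bpm_to_bps(120.0), 2.0)` gives 4.0 and `weighted_difficulty(bpm_to_bps(120.0), -2.0)` gives 0.0, which are the two extremes described above.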

The scale also accounts for values above or below that range; they are just capped at those values for determining the actual difficulty label.

Another thing to note: These values are only meant to be used internally. The actual difficulty ratings are the labels, like "easy", "medium", "hard", etc... But it is also nice to display the actual numbers, too, to see exactly how a round's difficulty label was calculated.
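One possible way to turn the internal number into a label is sketched below. `difficulty_label` is a hypothetical name, and the bin edges are an assumption: the ranges listed earlier leave small gaps (for example between 0.4 and 0.5), so this sketch treats each label as a half-open bin of width 0.5.

```c
/*
 * Map a difficulty value to its label.
 * Assumption: half-open bins of width 0.5, i.e. [0.5, 1.0) is
 * "super easy", [1.0, 1.5) is "very easy", and so on. Anything below
 * 0.5 is "easiest" and anything at or above 4.0 is "hardest".
 */
const char *difficulty_label(double d)
{
    static const char *const labels[] = {
        "easiest", "super easy", "very easy", "easy", "medium",
        "hard", "very hard", "super hard", "hardest"
    };

    if (!(d >= 0.5))   /* also catches NaN */
        return labels[0];
    if (d >= 4.0)
        return labels[8];
    return labels[(int)(d / 0.5)];   /* 0.5 -> 1, 2.0 -> 4, 3.9 -> 7 */
}
```

The number itself stays uncapped; only the label saturates at both ends.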

D-weighted sounds really corny to me, so what about this as an example:

Code: Select all

D  = very easy (1.3)
Dp = medium    (2.1)

where 'D' means difficulty or base difficulty, 'Dp' means difficulty+preferences, and 'p' equals +0.8

To make this more accurate, I need to constantly modify 'p' throughout the round, based on the content. This would allow for calculations where both pleasant and unpleasant content is featured in a round.
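As a rough sketch of what a varying 'p' could look like: split the round into segments, give each segment its own pref value, and average them weighted by time. The time-weighted average is an assumption made for this example, not something settled in the thread, and `pref_segment` and `average_pref` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical: one stretch of a round with a constant pref value. */
struct pref_segment {
    double seconds; /* how long this content is on screen */
    double pref;    /* -2.0 (least arousing) to +2.0 (most arousing) */
};

/*
 * Time-weighted average of 'p' over the round.
 * Returns 0.0 (no adjustment) if the segments cover no time at all.
 */
double average_pref(const struct pref_segment *segs, size_t n)
{
    double total = 0.0, weighted = 0.0;

    for (size_t i = 0; i < n; i++) {
        total += segs[i].seconds;
        weighted += segs[i].seconds * segs[i].pref;
    }
    return total > 0.0 ? weighted / total : 0.0;
}
```

For a round with 30 seconds at +2.0 and 30 seconds at -1.0, the average 'p' is +0.5, so a D of 1.5 would become a Dp of 2.0.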

3xTripleXXX · Post by **3xTripleXXX** » Fri Aug 16, 2019 11:06 pm

Both D and DP. Sounds like a winning scoring system!

Rule63MePlease · Post by **Rule63MePlease** » Sat Aug 17, 2019 5:50 pm

I don't think there should be a cap. Yeah, it's very unlikely a CH will go so high, but I still think it would be better for the final score to be displayed as a 1-10 or 1-100 for the number. I mean, 24 could just = 1.0 or 10 and 240 would = 10.0 or 100. If some CH does go past the max, then oh well.

Also a long time ago I came up with a formula for a CH score. Not for difficulty of a CH but how far a player made it vs how many days they went without fapping. I need to go find that again.

high_octane · Post by **high_octane** » Sat Aug 17, 2019 10:43 pm

Rule63MePlease wrote: Sat Aug 17, 2019 5:50 pm I don't think there should be a cap. Yeah, it's very unlikely a CH will go so high, but I still think it would be better for the final score to be displayed as a 1-10 or 1-100 for the number. I mean, 24 could just = 1.0 or 10 and 240 would = 10.0 or 100. If some CH does go past the max, then oh well.

The cap was referring to the label system. For instance, a D score of 3.2 with a pref value of 1.5 (Dp of 4.7) would be labelled the same difficulty (hardest) as a D score of 2.0 with a pref value of 2.0 (Dp of 4.0). Even if Dp was equal to 16.5, it would still be "hardest" difficulty. D and Dp have no bounds.

I'm not sure what the advantages are of using a range from 1-10 or 1-100. Maybe I'm misunderstanding something.

I could convert my floating-point range into your integer range, but I'm just not sure how that would improve things. Is it purely for aesthetics?
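For reference, the conversion would be small. With 24 bpm (0.4 bps) mapped to 1.0 and 240 bpm (4.0 bps) mapped to 10.0, the mapping is a plain multiplication by 2.5 (or by 25 for the 1-100 variant). The function names are made up for this sketch, and the result is left uncapped, as suggested in the quoted post.

```c
/*
 * Rescale a difficulty value in beats per second to the suggested
 * display ranges: 0.4 bps -> 1.0 and 4.0 bps -> 10.0 (or 10 and 100).
 * No cap is applied, so values outside 0.4..4.0 go past the ends.
 */
double to_ten_scale(double d)     { return d * 2.5; }
double to_hundred_scale(double d) { return d * 25.0; }
```

A "medium" 2.0 bps round would show up as 5.0 on the 1-10 scale and 50 on the 1-100 scale.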

Rule63MePlease wrote: Sat Aug 17, 2019 5:50 pm Also a long time ago I came up with a formula for a CH score. Not for difficulty of a CH but how far a player made it vs how many days they went without fapping. I need to go find that again.

Perhaps your scoring system could be integrated into my difficulty calculation as well. It sounds interesting.
