3.10.2016 (17:18)
About Mis-extracting Subtitle

26.9.2016 (08:13)

17.9.2016 (08:39)

23.11.2016 (06:29)

12.9.2016 (05:18)

29.2.2016 (17:52)

9.1.2016 (15:54)
leslie thomas

29.11.2015 (17:51)

zuggy replies:

16.12.2015 (10:12)

OK, OK. Anyway, Bloody is the actual SubRip developer.

29.11.2015 (17:47)

Bloody replies:

30.11.2015 (01:29)

Maybe you'd like to pick up this sourcecode as a starting point:

Good luck...

1.11.2015 (16:04)

Bloody replies:

8.11.2015 (09:07)

Those size limits (no fonts higher than 45 pixels, for example) are hard-coded throughout the program. Very difficult to fix, at least for me.

28.9.2015 (15:06)

Bloody replies:

4.10.2015 (03:18)


I've just managed to reproduce the problem. It's defo a bug, probably with one of the newest versions. I'll look into it...

mr.rage replies:

9.10.2015 (08:04)

Thank you for your efforts in advance.

Bloody replies:

13.10.2015 (17:55)

Try the new update (1.56.1). Should (hopefully) be fixed now.

30.8.2015 (20:40)
Rashmikant Patel

25.8.2015 (23:04)
Rashmikant Patel

25.8.2015 (01:02)
Rashmikant Patel

Bloody replies:

25.8.2015 (01:24)

The "best guess" function is not very reliable. If you want to be sure that your matrix is 100% correct & reliable, i suggest you forget that "best guess" thing and enter all glyphs manually. It is also much faster (workflow-wise) to just type the char than to 1) look at the best guess proposal, 2) make a decision and 3) end up typing the char anyway (simply typing the right char eliminates steps 1 and 2).

About "fill these characters": the problem there is that one never knows in advance if all of the glyphs will be recognized correctly as single-glyphs, or if, sometimes, things like "tt" or "rv" are treated as one glyph. Frankly, i have no idea what would happen then, since i've never used that button either...

Switching GUI fonts on-the-fly doesn't sound like an easy thing to do with Delphi once the window has been created. I'd imagine that changing fonts would likely cause layout issues.

Sorry if i don't have better news there, but i've never used those parts of the program and therefore haven't touched any part of that code.

Rashmikant Patel replies:

25.8.2015 (22:31)

Thanks for your reply. I will try the method you suggest. For your information, it may be that I have good hard subtitles, so I am getting a better than 85% hit rate on the best guess. It is handy to type the glyphs manually to start, but at some point I can change the setting to run this automatically and have minimal editing left, which I can do faster in a Notepad editor. Thanks for a great tool.

Bloody replies:

27.8.2015 (01:04)

That would still leave your matrix file with errors, i.e., all future subs would contain the same errors.

Better make 100% sure that you don't make mistakes while entering glyphs.

20.8.2015 (01:34)

Bloody replies:

20.8.2015 (04:22)

Too lazy for that. ;)

Well, as long as SubRip doesn't come with an installer i guess it's safe from Dice's AdWare attempts. Besides, now that everyone knows, they've got to be careful, otherwise they would kill their precious site. So i guess they won't try anything like that again...

Besides, i'm an SF user since ~2000 and i never had any trouble there.

9.8.2015 (03:57)
Sven Bent

Bloody replies:

9.8.2015 (14:24)

What do you mean by "multiple subs"? Do you refer to using multiple matrix files? If so, please click on the help button (the '?' button) on the right in the "Collected font glyphs" area in the "Open VOB" dialog, or open Help/FindMatrix.txt or CLI.txt manually in a text viewer. That's where i explain the issue. In CLI.txt, you may also find a few other interesting things about matrix files, like in the /JOINMATRIX documentation.

Come back when you've finished reading & testing and still have unanswered questions.

Bloody replies:

9.8.2015 (14:32)

Addendum: remember that the matrix files i've provided in the latest releases are created with/for default settings for all "Advanced OCR Settings". With tighter settings, you'd only unnecessarily enlarge them (and you would have to type more glyphs while converting subs).

I'm mentioning this because i somehow suspect that you didn't read the short Readme.txt in the ChMatrix/ folder..

Sven Bent replies:

9.8.2015 (19:24)

Sorry, i meant multiple matrix files.

Yeah, I'm testing your multi matrix files with the default 980 2-2-6 vs my huge matrix with 1000 2-2-2.

And I'm letting SubRip automatically find the most optimal of your sub matrix files, as described in the Readme.txt.

However, what I am pondering is whether it does a first-catch-exit search, meaning that at the first matrix file it finds with 40 hits it will stop searching for a more optimal file,

or whether it will continue to search for a more optimal one, like one with 50 hits.

In the first case you easily end up with a suboptimal matrix file, and i think a full search would be more optimal for accuracy.

Sven Bent replies:

9.8.2015 (19:31)

Oh, and I'm not doing Scandinavian DVDs in my tests, as i think that would pretty much skew the results totally in my big matrix's favour :D. Just plain Region 1 USA DVDs.

Bloody replies:

9.8.2015 (21:09)

If a 'good' matrix is found, it simply means that 40 of the first 60 glyphs from the subtitle have been positively identified. It can't get better than that. It means that the matrix in question indeed contains lots of glyphs for the current font, so that's the file where the glyphs belong.

The purpose of this is to put all glyphs for a subtitle in the right matrix. The perfect matrix file shall ideally only contain one single font, but no more. Nothing 'suboptimal' there.

Now if there were multiple 'good' matrix files, it would just prove that you have 'double' matrix files, i.e., more than one matrix containing glyphs for the very same font. But that's all there is to it.

If you don't check the 'auto-apply' checkbox, you'll see a dialog where you can decline to use that matrix. In that case, SubRip will continue the search and maybe find another one. But the match set will always be 40 glyphs, never more, because the very moment 40 glyphs are identified, the FindMatrix function stops immediately (for maximum search speed).
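To make the early-exit behaviour above concrete, here is a minimal sketch in Python (not SubRip's actual Delphi code). The values 40 and 60 come from the posts above; everything else is an assumption for illustration only: the names `count_matches` and `find_matrix`, the `declined` parameter standing in for the user declining a matrix in the dialog, and glyphs represented as plain hashable values rather than bitmaps.

```python
MATCH_SET = 40   # hits required for a 'good' matrix (value from the post above)
SEARCH_SET = 60  # number of subtitle glyphs that are tested (value from the post above)


def count_matches(sample, matrix_glyphs, needed=MATCH_SET):
    """Count sample glyphs found in the matrix, stopping once `needed` is reached."""
    hits = 0
    for glyph in sample[:SEARCH_SET]:
        if glyph in matrix_glyphs:
            hits += 1
            if hits >= needed:
                break  # early exit: the count never goes above `needed`
    return hits


def find_matrix(sample, matrices, declined=frozenset()):
    """Return the name of the first matrix reaching MATCH_SET hits, or None.

    `matrices` maps a (hypothetical) file name to the set of glyphs it holds.
    Names in `declined` are skipped, which mimics declining a proposed matrix
    so that the search continues with the remaining files.
    """
    for name, glyphs in matrices.items():
        if name in declined:
            continue
        if count_matches(sample, glyphs) >= MATCH_SET:
            return name
    return None
```

With three made-up matrix files holding 30, 45 and 60 of the sample glyphs, the search returns the second file, and the third one only if the second is declined. In both cases the reported hit count is 40, which is why a matrix "with 50 hits" cannot show up in this scheme.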

Bloody replies:

9.8.2015 (21:14)

Addendum: about Scandinavian glyphs: as a matter of fact, i did a lot of Scandinavian glyphs, at least in some of the more often-used matrix files. I do many languages...

Now, that's where the MatchSet value steps in: only 40 of the first 60 glyphs need to be identified, so there is a 'buffer zone' for unknown glyphs. For example, if a matrix contains all the English glyphs already, but no Scandinavian ones, there may be like 5 or 7 glyphs not found (in the first 60 glyphs) because they're Scandinavian. But as long as at least 40 are found, the matrix will still be detected as the right one for this subtitle font, and all you have to do is occasionally add one of those Scandinavian glyphs while the rest is already known.
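The 'buffer zone' can be checked with a small worked example. This is only a sketch in Python under stated assumptions: the thresholds 40 and 60 are taken from the post above, while the function names are hypothetical and plain characters stand in for glyph bitmaps. For clarity it counts all hits instead of stopping at 40.

```python
import string

MATCH_SET = 40   # hits required (value from the post above)
SEARCH_SET = 60  # glyphs tested (value from the post above)


def count_hits(sample, known):
    """Count how many of the first SEARCH_SET sample glyphs the matrix knows."""
    return sum(1 for glyph in sample[:SEARCH_SET] if glyph in known)


def is_good_matrix(sample, known):
    """A matrix is accepted when at least MATCH_SET sample glyphs are known."""
    return count_hits(sample, known) >= MATCH_SET


# Hypothetical matrix that knows English letters and digits only.
english_only = set(string.ascii_letters + string.digits)

# Hypothetical sample: 53 English glyphs plus 7 Scandinavian ones (60 in total).
sample = list(string.ascii_letters) + ["0"] + list("æøåÆØÅä")

hits = count_hits(sample, english_only)          # 53 of 60 are known
accepted = is_good_matrix(sample, english_only)  # 53 >= 40, so still accepted
buffer_zone = SEARCH_SET - MATCH_SET             # up to 20 glyphs may be unknown
```

Seven missing Scandinavian glyphs leave 53 hits, well above the threshold of 40, so the matrix is still picked and only the seven new glyphs have to be typed in.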

Sven Bent replies:

9.8.2015 (23:10)

If you have a font on a new DVD that is close to two different fonts in two different sub matrix files, the order in which they are searched now determines which one is taken, not which one matches most closely.

So SubRip detects 40/60 glyphs OK in matrixfileA and thinks it's OK, even though matrixfileB might have an even better font or the exact font. But it never gets checked because matrixfileA was "good enough".

The two fonts can easily differ enough to not be considered the same, but a font that falls in between can be matched against both.

Bloody replies:

9.8.2015 (23:56)

No, it doesn't really matter. If you have 40 matches, then it's likely that there are even more, so that matrix will save you a lot of typing and produce fine results. And that is all you need. It's not about precision or accuracy.

More on the issue is in the EMail i just sent to you. Cya there...

Bloody replies:

10.8.2015 (00:13)

One more point: FindMatrix does not just take the first 60 glyphs, but rather the first 60 UNIQUE (i.e. different from each other) glyphs. That means that the search set will already contain most lower-case, many upper-case and also a number of other chars.

Now, some of those glyphs, like 'g', 'D' or '4', can't possibly be confused with anything else; they are 'unproblematic'. If such glyphs are found in a matrix, you can be sure that it's EXACTLY the right font. There simply is no way that a different matrix could contain a 'better matching' font (only the very same font).
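How a search set of unique glyphs might be collected can be sketched like this in Python. The limit of 60 comes from the post above; the function name `first_unique_glyphs` is hypothetical, and characters stand in for glyph bitmaps, which SubRip would have to compare in its own way.

```python
def first_unique_glyphs(glyphs, limit=60):
    """Return the first `limit` distinct glyphs in order of first appearance."""
    seen = set()
    unique = []
    for glyph in glyphs:
        if glyph in seen:
            continue  # repeated glyph: it does not use up a slot in the search set
        seen.add(glyph)
        unique.append(glyph)
        if len(unique) == limit:
            break  # search set is full, the rest of the subtitle is not needed
    return unique
```

For example, `first_unique_glyphs("mississippi")` yields `['m', 'i', 's', 'p']`: eleven glyphs in the text, but only four slots used. This is why 60 unique glyphs cover most of a font's common characters.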

Bloody replies:

10.8.2015 (00:25)

As a matter of fact, it's more the other way around. FindMatrix practically never identifies a non-matching matrix as a 'good' one, but sometimes there is a 'good' one with only, say, 36 matches, and the matrix is rejected as the 'wrong' one when it is in fact perfectly fine.

I've just chosen the 'MatchSet' value to be very strict, very conservative, in order to keep different fonts strictly separate, in order not to ruin my matrix collection.

I'm just mentioning this here again as it might also help other users.

