25.8.2015 (01:02)
Rashmikant Patel

Bloody replies:

25.8.2015 (01:24)

The "best guess" function is not very reliable. If you want to be sure that your matrix is 100% correct and reliable, I suggest you forget about that "best guess" thing and enter all glyphs manually. It is also much faster (workflow-wise) to just type the char than to 1) look at the best-guess proposal, 2) make a decision and 3) possibly end up typing the char anyway (simply typing the right char eliminates steps 1 and 2).

About "fill these characters": the problem there is that one never knows in advance whether all of the glyphs will be recognized correctly as single glyphs, or whether combinations like "tt" or "rv" will sometimes be treated as one glyph. Frankly, I have no idea what would happen then, since I've never used that button either...

Switching GUI fonts on the fly doesn't sound like an easy thing to do in Delphi once the window has been created. I'd imagine that changing fonts would likely cause layout issues.

Sorry I don't have better news there, but I've never used those parts of the program and therefore haven't touched any of that code.

Rashmikant Patel replies:

25.8.2015 (22:31)

Thanks for your reply. I will try the method you suggest. For your information, it may be that I have good hard subtitles, so I am getting a better than 85% hit rate on the best guess. It is handy to type the glyphs manually at the start, but at some point I can change the setting to run this automatically and do the minimal remaining editing faster in a Notepad editor. Thanks for a great tool.

Bloody replies:

27.8.2015 (01:04)

That would still leave your matrix file with errors, i.e., all future subs would contain the same errors.

Better make 100% sure that you don't make mistakes while entering glyphs.

20.8.2015 (01:34)

Bloody replies:

20.8.2015 (04:22)

Too lazy for that. ;)

Well, as long as SubRip doesn't come with an installer, I guess it's safe from Dice's adware attempts. Besides, now that everyone knows, they have to be careful, otherwise they would kill their precious site. So I guess they won't try anything like that again...

Besides, I've been an SF user since ~2000 and I never had any trouble there.

9.8.2015 (03:57)
Sven Bent

Bloody replies:

9.8.2015 (14:24)

What do you mean by "multiple subs"? Are you referring to using multiple matrix files? If so, please click the help button (the '?' button) on the right of the "Collected font glyphs" area in the "Open VOB" dialog, or open Help/FindMatrix.txt or CLI.txt manually in a text viewer. There I explain the issue. In CLI.txt, you may also find a few other interesting things about matrix files, e.g. in the /JOINMATRIX documentation.

Come back when you've finished reading & testing and still have unanswered questions.

Bloody replies:

9.8.2015 (14:32)

Addendum: remember that the matrix files I've provided in the latest releases were created with/for the default values of all "Advanced OCR Settings". With tighter settings, you'd only enlarge them unnecessarily (and you would have to type more glyphs while converting subs).

I'm mentioning this because I somehow suspect that you didn't read the short Readme.txt in the ChMatrix/ folder.

Sven Bent replies:

9.8.2015 (19:24)

Sorry, I meant multiple matrix files.

Yeah, I'm testing your multiple matrix files with the default 980 2-2-6 vs. my huge matrix with 1000 2-2-2.

And I'm letting SubRip automatically find the best of your sub matrix files, as described in the readme.txt.

However, what I am pondering is whether it does a first-match exit search, meaning that once it finds a matrix file with 40 hits, it stops searching for a better file,

or whether it continues searching for a better one, like one with 50 hits.

In the first case you can easily end up with a suboptimal matrix file, and I think a full search would be better for accuracy.

Sven Bent replies:

9.8.2015 (19:31)

Oh, and I'm not using Scandinavian DVDs in my tests, as I think that would skew the results totally in my big matrix's favour :D. Just plain Region 1 US DVDs.

Bloody replies:

9.8.2015 (21:09)

If a 'good' matrix is found, it simply means that 40 of the first 60 glyphs from the subtitle have been positively identified. It can't get better than that. It means that the matrix in question indeed contains lots of glyphs for the current font, so that's the file where the glyphs belong.

The purpose of this is to put all glyphs for a subtitle in the right matrix. The perfect matrix file should ideally contain only one single font, no more. Nothing 'suboptimal' there.

Now if there were multiple 'good' matrix files, it would just prove that you have 'double' matrix files, i.e., more than one matrix containing glyphs for the very same font. But that's all there is to it.

If you don't check the 'auto-apply' checkbox, you'll see a dialog where you can decline to use that matrix. In that case, SubRip will continue the search and maybe find another one. But the match set will always be 40 glyphs, never more, because the very moment when 40 glyphs were identified, the FindMatrix function will stop immediately (for maximum search speed).
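A minimal Python sketch of the first-match search described above, to make the early exit concrete. It is not SubRip's Delphi code: the matrix model (a dict from glyph bitmap to text), the names `find_matrix` and `accept`, and exact-equality glyph matching are all assumptions for illustration. The `accept` callback stands in for the confirmation dialog shown when 'auto-apply' is off.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

MATCH_SET = 40    # glyphs that must be identified for a 'good' matrix
SAMPLE_SIZE = 60  # first N unique glyphs taken from the subtitle

# Hypothetical model: a matrix maps a glyph bitmap (bytes) to its text.
Matrix = Dict[bytes, str]


def find_matrix(
    sample: Sequence[bytes],
    matrices: Iterable[Tuple[str, Matrix]],
    accept: Callable[[str], bool] = lambda name: True,
) -> Optional[str]:
    """Return the first 'good' matrix the user accepts, or None."""
    for name, matrix in matrices:
        hits = 0
        for glyph in sample[:SAMPLE_SIZE]:
            if glyph in matrix:
                hits += 1
                if hits == MATCH_SET:
                    break  # stop counting as soon as the match set is reached
        if hits == MATCH_SET and accept(name):
            return name  # first-match exit: later files are never examined
        # declined or not good enough: continue with the next matrix file
    return None
```

Declining a matrix (e.g. `accept=lambda name: name != "A"`) simply makes the loop move on to the next file, and the hit count never goes beyond `MATCH_SET`.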

Bloody replies:

9.8.2015 (21:14)

Addendum about Scandinavian glyphs: as a matter of fact, I did a lot of Scandinavian glyphs, at least in some of the more often-used matrix files. I do many languages...

Now, that's where the MatchSet value steps in: only 40 of the first 60 glyphs need to be identified, so there is a 'buffer zone' for unknown glyphs. For example, if a matrix already contains all the English glyphs but no Scandinavian ones, maybe 5 or 7 glyphs (among the first 60) won't be found because they're Scandinavian. But as long as at least 40 are found, the matrix will still be detected as the right one for this subtitle font, and all you have to do is occasionally add one of those Scandinavian glyphs while the rest is already known.
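The buffer-zone arithmetic can be checked with a small Python sketch. Glyphs are stood in for by plain characters here (a real matrix stores bitmaps), and `is_good_match` is a hypothetical helper, not a SubRip function; it only applies the 40-of-60 rule from the post.

```python
import string

MATCH_SET = 40     # glyphs that must be identified
SAMPLE_SIZE = 60   # size of the glyph sample taken from the subtitle


def is_good_match(sample, known, match_set=MATCH_SET):
    """True if at least `match_set` of the first SAMPLE_SIZE glyphs are known."""
    hits = sum(1 for glyph in sample[:SAMPLE_SIZE] if glyph in known)
    return hits >= match_set


# Glyphs stand in as characters here; a real matrix stores bitmaps.
english = list(string.ascii_letters) + ["0"]   # 53 glyphs
scandinavian = list("æøåÆØÅä")                 # 7 glyphs the matrix lacks
sample = english + scandinavian                 # 60 glyphs

print(is_good_match(sample, set(english)))      # 53 of 60 known -> True
print(SAMPLE_SIZE - MATCH_SET)                  # unknown glyphs tolerated: 20
```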

Sven Bent replies:

9.8.2015 (23:10)

If a font on a new DVD is close to two different fonts in two different sub matrix files, the order in which they are searched now determines which one is taken, not which one matches more closely.

So SubRip detects 40/60 glyphs OK in matrixfileA and considers it fine, even though matrixfileB might have an even closer font or the exact font. But it never gets checked because matrixfileA was "good enough".

The two fonts can easily differ enough not to be considered the same, but a font that falls in between can be matched against both.
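For comparison, the exhaustive search suggested here could look like the following Python sketch. It is hypothetical (SubRip does not work this way) and uses an assumed dict-based matrix model: every matrix file is scored in full and the one with the most hits wins, at the cost of loading and checking every file instead of stopping at the first good one.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

MATCH_SET = 40
SAMPLE_SIZE = 60
Matrix = Dict[bytes, str]  # hypothetical: glyph bitmap -> text


def best_matrix(
    sample: Sequence[bytes], matrices: Iterable[Tuple[str, Matrix]]
) -> Optional[str]:
    """Score every matrix and return the one with the most hits (at least MATCH_SET)."""
    best_name: Optional[str] = None
    best_hits = MATCH_SET - 1  # anything below the match set never qualifies
    for name, matrix in matrices:
        hits = sum(1 for glyph in sample[:SAMPLE_SIZE] if glyph in matrix)
        if hits > best_hits:  # strict '>' keeps the earlier file on a tie
            best_name, best_hits = name, hits
    return best_name
```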

Bloody replies:

9.8.2015 (23:56)

No, it doesn't really matter. If you have 40 matches, then it's likely that there are even more, so that matrix will save you a lot of typing and produce fine results. And that is all you need. It's not about precision or accuracy.

More on the issue is in the email I just sent you. See you there...

Bloody replies:

10.8.2015 (00:13)

One more point: FindMatrix does not just take the first 60 glyphs, but rather the first 60 UNIQUE (i.e., mutually different) glyphs. That means that the search set will already contain most lower-case, many upper-case and also a number of other chars.
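Collecting that sample of unique glyphs could be sketched in Python like this. It assumes that two glyphs are 'the same' when their bitmaps are byte-for-byte identical; SubRip's real comparison depends on the Advanced OCR Settings and may be looser. `first_unique_glyphs` is a hypothetical name.

```python
from typing import Iterable, List


def first_unique_glyphs(glyphs: Iterable[bytes], limit: int = 60) -> List[bytes]:
    """Return the first `limit` distinct glyphs in order of first appearance."""
    seen = set()
    unique: List[bytes] = []
    for glyph in glyphs:
        if glyph in seen:
            continue  # repeated glyph: it adds nothing to the search set
        seen.add(glyph)
        unique.append(glyph)
        if len(unique) == limit:
            break  # sample is full; the rest of the subtitle is not scanned
    return unique


# "letter" contains repeated t and e, so only four unique glyphs remain.
print(first_unique_glyphs(c.encode() for c in "letter"))  # [b'l', b'e', b't', b'r']
```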

Now, some of those glyphs, like 'g', 'D' or '4', can't possibly be confused with anything else; they are 'unproblematic'. If such glyphs are found in a matrix, you can be sure that it's EXACTLY the right font. There simply is no way that a different matrix could contain a 'better matching' font (only the very same font).

Bloody replies:

10.8.2015 (00:25)

As a matter of fact, it's more the other way around. FindMatrix practically never identifies a non-matching matrix as a 'good' one, but sometimes there is a 'good' one with only, say, 36 matches, and the matrix is rejected as the 'wrong' one when it is indeed perfectly fine.

I've deliberately chosen the 'MatchSet' value to be very strict, very conservative, in order to keep different fonts strictly separate and not ruin my matrix collection.

I'm just mentioning this here again as it might also help other users.

2.8.2015 (01:00)
Sven Bent

Bloody replies:

2.8.2015 (10:18)

Good question. This issue had already crossed my mind, but somehow I never cared to find out exactly.

Right now, any glyph marked as 'junk' will be tagged as 'junk', except for "take with next" glyphs (as it seems). This is not ideal.

The best solution here would be to decide manually for every single glyph/part whether it's junk or not, to have full control over all glyphs, but...

If you mark only the first part as 'junk' (which is not possible right now) and then add the next part as a regular (non-junk) glyph, the matrix would contain only the complete glyph, but not the left part of it. If you save and re-load the matrix, that first part would be gone, and if it is detected again, the OCR engine would pop up and ask. You would answer 'junk' and 'extend-right', and then the next glyph (the complete one with parts 1+2) would pop up again. You'd enter it again, and it would then be added to the matrix a second time (as a 'double', i.e., a completely identical glyph), which is also not ideal (the OCR engine does not OCR the next part after a click on "extend-right").

Jeez, and I thought this latest bugfix release would be the last one for a while..

The problem is, this "take with next" thing is another pretty complicated part of the OCR code. Frankly, I don't see any quick solution for this. I'm taking a note, but I'm not sure I can come up with an easy fix. I also have no plans for any new releases in the very near future.

So right now, my advice would be to mark the right-most part as 'junk' and then manually delete the "take with next" glyphs afterwards.

30.7.2015 (19:24)

28.7.2015 (11:40)

25.7.2015 (18:53)

Bloody replies:

26.7.2015 (04:00)

You never fail to come up with ideas... Well, I believe this is not just about a single number, but rather about multiple decisions/conditions. Probably not easy to implement. It would also significantly slow down the conversion. And if you're already using super-tight advanced OCR settings, I doubt that this would yield any more reliable results.

Sven Bent replies:

26.7.2015 (23:09)

I tend to be very curious about the tools I use :D and I regret a lot that I never pushed harder on learning to program. I gave up when my Pascal teacher couldn't tell me how to address more than 64 KB in DOS; then Windows 95 came right after that, and I gave up on programming. So now I just poke real programmers with ideas instead :D.

Bloody replies:

27.7.2015 (06:33)

It's never too late to follow a passion... ;)

23.7.2015 (03:14)

Bloody replies:

23.7.2015 (08:31)

Well, now we're talking about some of the inner workings of the OCR engine, which I'm unfortunately not very familiar with. If I were, you'd probably no longer see any of those "this subtitle item can't be processed" messages. I'm not sure I'll ever find the courage (and the time) to take on those issues. I'm still kind of a noob when it comes to Delphi programming, and the core of the OCR engine code isn't exactly well documented..

So for the time being, I'm afraid I can't help much with those kinds of issues.

Bloody replies:

23.7.2015 (08:49)

Also, this affects not just the DVD subtitle conversion, but also hardsubbed AVI files, something I'm unable to test. So if I mess up some part of that code, the hardsubbed stuff might stop working as it should, and who's going to step forward and fix the mess I created?

SvenBent replies:

24.7.2015 (04:58)

Yeah, I was crossing my fingers that it was a small thing, but I get that: after many different people have worked on it, the source code can easily be complex or simply not structured for an easy edit in that part.

And it's only a minor issue, not really worth too many work hours.

19.7.2015 (20:10)
Mei Xiao

Mei Xiao replies:

20.7.2015 (02:52)

I tried SubRip 1.50 beta 4, and it generates as many .bmp files as there are subtitle pictures in the specified VOB, at lightning speed!!! Maybe the code for this feature in version 1.55.0 should be checked for some tiny oversight?

Thank you so much for making such good software available!

Bloody replies:

20.7.2015 (03:24)

The odd thing is that I've never touched any part of that code, as I'm only working on DVD-to-text conversion, not on image sequence or hardsubbed AVI issues. Therefore I'm a bit surprised that this stuff all of a sudden doesn't work.

I can give it a try and test a few VOBs, but somehow I have the feeling that things will probably work just fine here. I must also point out that I work on Linux only. All my recent releases were developed, compiled and tested mostly on Linux. I still have an old Windows XP VM, and I'll give it another try and see what happens.

There have been a few strange GUI layout issues with certain Windows 7 configurations/themes, but I guess that's another issue.

In any case, since the old version still works, at least you have a fallback version at hand.

Bloody replies:

20.7.2015 (05:18)

I've just tested 4 different DVDs and, as expected, everything worked fine, under Wine as well as WinXP.

So it seems that Win7 is not really all that compatible with older Windows versions. Perhaps you'd like to send an email to MS and ask them if they'd care to maintain a certain level of compatibility between Windows versions?

Unfortunately, I can't fix their bugs, as I won't go and buy Windows 7 because of the security issues involved, the enormous malware infestation and, of course, because I don't want to give money to the corporate crime syndicate.

So all I can do is apologize and recommend using v1.50b4 for now. :/

Bloody replies:

20.7.2015 (14:13)

...or install VirtualBox, then install e.g. an Ubuntu VM, install Wine inside it and then run SubRip in the VM under Wine; at least that should work more reliably... :D

I wonder if it will someday be possible to install Wine on Windows? That would do the trick, haha.

Mei Xiao replies:

20.7.2015 (18:11)

Wine on Windows? What an idea!

I'll take your suggestion of VirtualBox on Win7 - Ubuntu VM - Wine - SubRip.

BTW, I evaluated pretty much all subtitle software I could find online, including the commercial ones. SubRip really leaves all of them in the dust.

Thanks again for the beautiful work !!

Mei Xiao replies:

20.7.2015 (20:34)

Aha, it turns out that when I was testing with SubRip 1.55.0, the "+ TC & Res." option had somehow been unchecked, which led to the odd behavior of producing a single file. Makes sense, doesn't it?

With this option checked, SubRip 1.55.0 does everything correctly, exactly like SubRip 1.50 b4, with even better performance in some respects!

My bad, sorry !!

Bloody replies:

20.7.2015 (21:02)

Same here, I still like SubRip's unique charm... although Subtitle Workshop also has its moments...

What still puzzles me is why 1.50b4 works but 1.55.0 doesn't. It must have something to do with the Delphi version I'm using (v7), compared to whatever version was used to build the old versions 10 years ago. The problem is so trivial (a simple counter variable not being properly increased) that it's embarrassing. Sorry I couldn't help any further. SubRip is my first (and also my last) adventure in Delphi..

Bloody replies:

20.7.2015 (21:05)

Ah, OK (now I've refreshed the guestbook in my browser and finally seen your latest post). Well, that's at least something. Now you can save yourself the trouble with the VM..

19.7.2015 (07:58)
Mei Xiao

Bloody replies:

19.7.2015 (09:57)

That's odd. When I try this, numerous BMPs are generated. The files are numbered without leading zeros, though, and I'm converting from .sub, not .vob. But that shouldn't make any difference..
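Numbering without leading zeros matters mainly when the BMPs are later processed in name order, because a plain text sort puts 10 before 2. A short Python illustration; the `N.bmp` naming pattern and the `bmp_name` helper are assumptions, not SubRip's actual naming code:

```python
def bmp_name(index: int, width: int = 0) -> str:
    """Build a file name such as '7.bmp'; width > 0 adds leading zeros ('0007.bmp')."""
    return str(index).zfill(width) + ".bmp"


indices = (1, 2, 10)
print(sorted(bmp_name(i) for i in indices))     # ['1.bmp', '10.bmp', '2.bmp']
print(sorted(bmp_name(i, 4) for i in indices))  # ['0001.bmp', '0002.bmp', '0010.bmp']
```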

15.6.2015 (09:00)

Bloody replies:

18.6.2015 (03:07)

...and once you close the matrix file (or load another one), or shut down SubRip, that 'junk' char will be gone and will no longer even have an index.

The only thing I can think of is to offer an option to leave any unknown glyph out (with nothing in the text), or alternatively to use a common, uniform tag à la "@AUTO-JUNK@" in the subtitle text for later RegExing, assuming that it would somehow mean less work for you than simply hitting <Ctrl-T><Return> whenever such a char pops up in the "Add new character" window. I don't know which would mean less work: two keystrokes, or manually verifying each such tag in the output text?
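If such a tag were implemented, the later RegEx pass could look like this Python sketch. "@AUTO-JUNK@" is only the placeholder proposed in this post, not an existing SubRip option, and `junk_lines`/`strip_junk` are hypothetical helpers: the first lists the affected lines for manual checking, the second strips the tags and counts them.

```python
import re
from typing import List, Tuple

JUNK_TAG = "@AUTO-JUNK@"  # proposed placeholder, not an existing SubRip feature


def junk_lines(text: str) -> List[int]:
    """1-based numbers of all lines that still contain the junk tag."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if JUNK_TAG in line]


def strip_junk(text: str) -> Tuple[str, int]:
    """Remove every junk tag; return the cleaned text and the number of tags removed."""
    return re.subn(re.escape(JUNK_TAG), "", text)


srt = "1\n00:00:01,000 --> 00:00:02,500\nHel@AUTO-JUNK@lo\n"
print(junk_lines(srt))  # [3]
print(strip_junk(srt))  # ('1\n00:00:01,000 --> 00:00:02,500\nHello\n', 1)
```

For a fixed tag, `str.replace` would do the same job; `re.subn` is used here because it also returns the number of replacements.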

An implementation would be doable, but if you really want it badly, please let me know soon, as I'm working on a new update to be released within a few days. After that, it might be a while until another update...

29.5.2015 (16:51)

1.5.2015 (21:32)

Kishin replies:

29.5.2015 (16:48)

Check this:

Bloody replies:

13.6.2015 (20:47)

You can also use ffmpeg to re-mux any video file into any output format, like:

ffmpeg -i infile.flv -c copy -map 0 outfile.avi

22.4.2015 (19:54)

7.4.2015 (23:57)

Bloody replies:

19.4.2015 (13:52)

Now this looks like one of those "problem children", i.e., DVD fonts of such terrible quality that one might wonder why the DVD costs so much..

What I do in this case: I don't use the "next" button. I pick one of the largest parts of the glyph (one that looks as unique as possible) and tell SubRip that this is supposed to be a "U", while I enter nothing (empty text) for all the other fragments.
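The fragment trick described above can be approximated in a Python sketch that picks the fragment with the most pixels as a proxy for "largest". In practice the choice is made by eye (and also favours a fragment that looks unique), so the pixel count is an assumption; `label_fragments` and the pixel-set representation are hypothetical.

```python
from typing import List, Set, Tuple

Fragment = Set[Tuple[int, int]]  # hypothetical: the (x, y) foreground pixels of one part


def label_fragments(fragments: List[Fragment], char: str) -> List[str]:
    """Give `char` to the largest fragment and empty text to all the others."""
    if not fragments:
        return []
    largest = max(range(len(fragments)), key=lambda i: len(fragments[i]))
    return [char if i == largest else "" for i in range(len(fragments))]


# A badly broken "U": left stroke, bottom arc and right stroke as separate parts.
left = {(0, y) for y in range(10)}       # 10 pixels
bottom = {(x, 10) for x in range(1, 7)}  # 6 pixels
right = {(7, y) for y in range(9)}       # 9 pixels
print(label_fragments([left, bottom, right], "U"))  # ['U', '', '']
```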

This often produces good quality results, but it doesn't always work perfectly.

When I'm done, I usually delete the matrix that was created by this subtitle, as it serves no practical purpose in my matrix collection. I only keep matrix files for "sane" fonts while deleting the crap matrix files.

Make sure you don't put each and every font into just one large matrix, because this just causes endless trouble in the long run.

Hope that helps.

