Is there an open-source program for translating subtitles?

14v45mJPBYJW8dT7

2023-10-22 21:41:48 +08:00

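If you have a subtitle file, you can run it through a third-party translation API.

If you don't have a subtitle file, use Whisper plus a third-party translation API.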
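
For the no-subtitle case, here is a minimal sketch of the Whisper half. It assumes the open-source `openai-whisper` package and its `load_model` / `transcribe` calls; the helper names `format_timestamp`, `segments_to_srt` and `transcribe_to_srt`, and the `base` model size, are illustrative choices and not something from this thread.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render Whisper-style segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


def transcribe_to_srt(media_path, srt_path, model_name="base"):
    """Transcribe a media file with Whisper and write the result as SRT."""
    import whisper  # third-party (assumption): pip install openai-whisper, needs ffmpeg

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

The resulting `.srt` can then go through the same translation step as a downloaded subtitle file.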

qsmd42

2023-10-22 21:47:24 +08:00

Doesn't YouTube already have translation built in...?

xinmans

2023-10-22 22:20:58 +08:00

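@qsmd42 I download the videos locally, so if there is no default subtitle file and the subtitles are only translated in real time, there is no way to download them along with the video.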

xinmans

2023-10-22 22:21:13 +08:00

@rimutuyuan Which free third-party translation APIs are there? Google Translate?

xinmans

2023-10-22 22:21:27 +08:00

@rimutuyuan I do have the English subtitle file.

xinmans

2023-10-22 22:21:39 +08:00

It can be extracted with ffmpeg.
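
A rough sketch of that extraction step, for the case where the subtitles are embedded in the video container as a stream. It uses ffmpeg's `-map 0:s:0` stream specifier (first subtitle stream of the first input); the function names and file names are made up for illustration, and `ffmpeg` is assumed to be on the PATH.

```python
import subprocess


def build_extract_command(video_path, srt_path, stream_index=0):
    """Build an ffmpeg command that writes one embedded subtitle stream to a file."""
    return [
        "ffmpeg",
        "-y",                            # overwrite the output file if it exists
        "-i", video_path,
        "-map", f"0:s:{stream_index}",   # pick the N-th subtitle stream of input 0
        srt_path,                        # output format is inferred from the .srt extension
    ]


def extract_subtitles(video_path, srt_path, stream_index=0):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_extract_command(video_path, srt_path, stream_index), check=True)
```

If the download already produced a separate `.en.srt` next to the video, this step is not needed.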

xinmans

2023-10-22 23:54:51 +08:00

'20231022 - CNN forensic analysis suggests what may have caused Gaza hospital bla.description'
'20231022 - CNN forensic analysis suggests what may have caused Gaza hospital bla.en.srt'
'20231022 - CNN forensic analysis suggests what may have caused Gaza hospital bla.mp4'
'20231022 - CNN forensic analysis suggests what may have caused Gaza hospital bla.webp'

Subtitle excerpt:
176
00:04:23,696 --> 00:04:25,098
This is going to be a difficult,

177
00:04:25,098 --> 00:04:26,032
difficult fight.

178
00:04:26,032 --> 00:04:26,833
All right, Spider,

179
00:04:26,833 --> 00:04:27,967
thank you so much

180
00:04:27,967 --> 00:04:29,469
for explaining all this.

181
00:04:29,469 --> 00:04:31,471
Don't go far because obviously,

182
00:04:31,471 --> 00:04:32,805
you know, more things could be happening

183
00:04:32,805 --> 00:04:34,540
very, very soon. Thank you very much.
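
The excerpt shows that one sentence is often split across several cues ("This is going to be a difficult," / "difficult fight."), so translating cue by cue gives the translator only fragments. Below is a sketch, using only the standard library, that parses SRT text into cues and groups consecutive cues until one ends with sentence punctuation. `parse_srt` and `group_sentences` are hypothetical helper names, and the punctuation rule is a simple heuristic.

```python
import re
from datetime import timedelta

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Split SRT text into cues: dicts with index, start, end and text."""
    cues = []
    # Cues are separated by one or more blank lines; drop a leading BOM if present.
    for block in re.split(r"\n\s*\n", text.lstrip("\ufeff").strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            continue
        match = TIMING.search(lines[1])
        if not match:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        cues.append({
            "index": int(lines[0]),
            "start": timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1),
            "end": timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2),
            "text": "\n".join(lines[2:]),
        })
    return cues


def group_sentences(cues):
    """Group consecutive cues until a cue ends with '.', '!' or '?'."""
    groups, current = [], []
    for cue in cues:
        current.append(cue)
        if cue["text"].rstrip().endswith((".", "!", "?")):
            groups.append(current)
            current = []
    if current:  # trailing cues without closing punctuation
        groups.append(current)
    return groups
```

How to spread a translated sentence back over the original cue timings is left open here; one simple option is to show the whole translation from the first cue's start to the last cue's end.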

haha512

2023-10-22 23:58:44 +08:00

https://github.com/jianchang512/pyvideotrans
Take a look at this one.

stanishappy

2023-10-23 03:38:12 +08:00

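The Dual Subtitle extension seems to be able to download the translated version of YouTube subtitles directly, including auto-generated ones. As for the translation quality, all I can say is that it's barely watchable.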

UKnowMe

2023-10-23 08:59:06 +08:00

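Same need here. My current approach:
1. For videos I only want to skim: like #9, use the Dual Subtitle extension: https://www.dual-subtitles.com/
2. For videos I want to watch properly: download the video --> transcribe English subtitles with Whisper --> translate them into Chinese --> load the subtitles when watching. For the translation step I use subtitle-translator-electron: https://github.com/gnehs/subtitle-translator-electron

So far I haven't found a good way to automate the translation in option 2 (that is, something that can be scripted), so in practice the experience isn't great. If anyone knows of one, please @ me.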
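
One way the scripted part of option 2 could look, as a sketch only: walk a folder, find English subtitle files that have no translated counterpart yet, and hand each one to a translation function. The `.en.srt` / `.zh.srt` naming and the `translate_file(source, destination)` callable are assumptions for illustration; any translator (a translation API, a local model) could be plugged in there.

```python
from pathlib import Path


def output_path_for(srt_path, target_suffix="zh"):
    """Map 'video.en.srt' to 'video.zh.srt' (the suffix names are an assumption)."""
    srt_path = Path(srt_path)
    stem = srt_path.name[: -len(".en.srt")]
    return srt_path.with_name(f"{stem}.{target_suffix}.srt")


def pending_jobs(folder, target_suffix="zh"):
    """Return (source, destination) pairs for .en.srt files not yet translated."""
    jobs = []
    for source in sorted(Path(folder).glob("*.en.srt")):
        destination = output_path_for(source, target_suffix)
        if not destination.exists():
            jobs.append((source, destination))
    return jobs


def run_batch(folder, translate_file, target_suffix="zh"):
    """Call translate_file(source, destination) for every pending job."""
    jobs = pending_jobs(folder, target_suffix)
    for source, destination in jobs:
        translate_file(source, destination)
    return len(jobs)
```

Because already-translated files are skipped, the script can be re-run (for example from cron) after each new download.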

daimiaopeng

2023-10-23 09:49:56 +08:00

Yes, there is one on GitHub. As I recall it's built with PyQt and has quite a lot of stars.

echoyangjx

2023-10-23 10:04:21 +08:00

https://github.com/raryelcostasouza/pyTranscriber

juicy

2023-10-23 10:19:42 +08:00

A free version that isn't open source: https://zimu.qijingdict.com/ . It should cover the needs described above.

xinmans

2023-10-23 14:35:20 +08:00

@haha512 Thanks. I only need one part of it; I can wrap that myself and use it. Much appreciated.

sp.py

from googletrans import Translator

text = f"{text.capitalize()}. "  # .decode('utf-8').encode('gbk')
try:
    # Modify src='ja', dest="zh-cn" to define the source and target language.
    # All possible languages are listed here:
    # https://py-googletrans.readthedocs.io/en/latest/#googletrans-languages
    # Google Translate
    transd = translator.translate(text, src=video_config['source_language'],
                                  dest=video_config['target_language'])  # en zh-cn
    result = transd.text  # .decode('utf-8').encode('gbk')
except Exception as e:
    print("Translate Error:", str(e))
    continue

translator = Translator(service_urls=['translate.googleapis.com'])
r = sr.Recognizer()

# subtitles
if video_config['subtitle_out']=='双字幕':  # '双字幕' means "dual subtitles"
    combo_txt = text + '\n' + result + '\n\n'
else:
    combo_txt = result + '\n\n'
if buffered:
    # start_time += 2000
    end_time -= 2000
start = timedelta(milliseconds=start_time)
end = timedelta(milliseconds=end_time)

index = len(subs) + 1
sub = srt.Subtitle(index=index, start=start, end=end, content=combo_txt)
subs.append(sub)

# whole_text += text
# whole_trans += result

final_srt = srt.compose(subs)
# TODO: merge subtitles
with open(sub_name, 'w', encoding="utf-8") as f:
    f.write(final_srt)
qu.put(f" [get_large_audio_transcription] 生成字幕文件:final_str")  # "subtitle file generated"
else:
    qu.put(f"字幕文件已存在，直接使用 {sub_name=}")  # "subtitle file already exists, using it directly"
updatebtn(mp4name, "开始合成字幕")  # "start merging subtitles"

xinmans

2023-10-23 14:53:23 +08:00

With ChatGPT's help, this does exactly what I need.

pip install googletrans==3.1.0a0

To translate the English in a.en.srt into Chinese and generate b.cn.srt and c.en_cn.srt, you can use the `googletrans` library for the translation. Here is a sample script that does this:

```python
from googletrans import Translator

def translate_text(text, target_language):
    """Translate one line of text into target_language via googletrans."""
    translator = Translator()
    translation = translator.translate(text, dest=target_language)
    return translation.text

def translate_srt(input_file, output_file, target_language):
    """Translate the text lines of an SRT file; copy index and timing lines as-is."""
    # Explicit UTF-8 so the Chinese output does not depend on the platform default.
    with open(input_file, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    translated_lines = []
    for line in lines:
        line = line.strip()
        if line.isdigit() or '-->' in line:
            translated_lines.append(line)
        elif line:  # blank separator lines between cues are skipped
            translated_text = translate_text(line, target_language)
            translated_lines.append(translated_text)

    with open(output_file, 'w', encoding='utf-8') as file:
        file.write('\n'.join(translated_lines))

# Translate English to Chinese
input_file = 'a.en.srt'
output_file_cn = 'b.cn.srt'
translate_srt(input_file, output_file_cn, 'zh-cn')

# Generate the bilingual (English + Chinese) subtitle file
output_file_en_cn = 'c.en_cn.srt'
with open(input_file, 'r', encoding='utf-8') as file:
    lines = file.readlines()

double_subtitles = []
for raw_line in lines:
    line = raw_line.strip()
    if line.isdigit() or '-->' in line:
        double_subtitles.append(line)
    elif line:  # keep the English line, then add its translation below it
        translated_text = translate_text(line, 'zh-cn')
        double_subtitles.append(line)
        double_subtitles.append(translated_text)

with open(output_file_en_cn, 'w', encoding='utf-8') as file:
    file.write('\n'.join(double_subtitles))
```

Make sure the `googletrans` library is installed before running the code (for example with `pip install googletrans==4.0.0-rc1`; the command above pins `3.1.0a0`).

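In the code, `translate_text` translates a piece of text into the target language, and `translate_srt` translates a whole SRT file and writes the result to the output file. First, `translate_srt` translates the English into Chinese and writes the result to `b.cn.srt`. Then the bilingual file `c.en_cn.srt` is generated, containing both the original English and the Chinese translation.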
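
The line-by-line script above skips the blank lines that separate cues, so they are missing from its output. A sketch of a cue-aware alternative follows: it works on whole cue blocks, sends each cue's text to the translator in one piece, and keeps the blank separators. The function name `bilingual_srt` and the injected `translate` callable are hypothetical; with googletrans the callable could wrap `Translator().translate(text, dest='zh-cn').text`.

```python
import re


def bilingual_srt(srt_text, translate):
    """Return SRT text where each cue keeps its original lines, followed by
    the translation, with cues still separated by a blank line."""
    out_blocks = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 3:
            # No text lines (or a malformed cue): copy it through unchanged.
            out_blocks.append("\n".join(lines))
            continue
        index, timing, text_lines = lines[0], lines[1], lines[2:]
        # Join wrapped lines so the translator sees the whole cue at once.
        translated = translate(" ".join(text_lines))
        out_blocks.append("\n".join([index, timing, *text_lines, translated]))
    return "\n\n".join(out_blocks) + "\n"
```

Passing the translator in as a parameter also makes the function easy to try with a stand-in such as `str.upper` before spending real API calls.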

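Note that because `googletrans` relies on Google Translate's API, the accuracy and availability of translations may be limited. In addition, when machine-translating long texts, you may need to split the text into segments and limit the request rate to avoid exceeding the API's usage limits.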
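
A sketch of the segmenting and rate limiting mentioned above, independent of any particular service. The `translate` callable, the 4000-character batch size, the pause and the retry counts are placeholder assumptions, not documented limits of any API. Sending several lines in one request also assumes the service preserves line breaks, so the code falls back to one request per line when the counts do not match.

```python
import time


def chunk_by_chars(lines, max_chars=4000):
    """Group lines into batches whose newline-joined length stays within max_chars."""
    batches, current, size = [], [], 0
    for line in lines:
        extra = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + extra > max_chars:
            batches.append(current)
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        batches.append(current)
    return batches


def translate_with_retry(translate, text, retries=3, delay=1.0, sleep=time.sleep):
    """Call translate(text); on failure wait and retry with a doubling delay."""
    for attempt in range(retries):
        try:
            return translate(text)
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(delay * (2 ** attempt))


def translate_lines(lines, translate, max_chars=4000, pause=0.5, sleep=time.sleep):
    """Translate lines batch by batch, pausing between requests."""
    results = []
    for batch in chunk_by_chars(lines, max_chars):
        joined = "\n".join(batch)
        translated = translate_with_retry(translate, joined, sleep=sleep).split("\n")
        if len(translated) != len(batch):
            # The service merged or split lines: fall back to one request per line.
            translated = [translate_with_retry(translate, line, sleep=sleep) for line in batch]
        results.extend(translated)
        sleep(pause)  # simple throttle between batches
    return results
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.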

c.en_cn.srt
1
00:00:00,041 --> 00:00:00,667
A spokesman for the
发言人
2
00:00:00,667 --> 00:00:02,127
Israeli Defense Forces.
以色列国防军。
3
00:00:02,127 --> 00:00:03,336
The top spokesperson
最高发言人
4
00:00:03,336 --> 00:00:04,295
says that it will start
说它将开始