Maybe we could find logically related words

2012-10-26 16:39:22 +08:00
 tioover
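Memorizing English vocabulary is painful...

It occurred to me that if logically related words were grouped together, they might form a web that makes them stick firmly in memory.

That reminded me of the old [人立方 (Renlifang)]( http://renlifang.msra.cn/ )

So I'd like to implement it: first collect a very large amount of English text, extract the words, and for each word record the few words that appear near it. If a given word frequently shows up near another, treat the two as logically related.

Of course, common words would have to be filtered out: prepositions, conjunctions, forms of "be", and words like "good".

Then, weighting by co-occurrence frequency, turn the related words into a graph structure, similar to 人立方.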
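A minimal sketch of the idea above, in case it helps make it concrete. The stop list, the window size, and the `min_count` threshold are all assumptions chosen for illustration, and stop words are removed before the window is applied, so "nearby" means nearby among the remaining words.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop list (an assumption); a real one would be much longer.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "but",
             "is", "are", "was", "were", "be", "been", "good"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_graph(texts, window=3, min_count=2):
    """Return {word: {neighbor: count}} for pairs seen at least min_count times."""
    pair_counts = Counter()
    for text in texts:
        words = tokenize(text)
        for i, word in enumerate(words):
            # Look ahead only, so each pair of positions is counted once.
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    pair_counts[tuple(sorted((word, other)))] += 1

    # Build an undirected weighted graph from the frequent pairs.
    graph = defaultdict(dict)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph[a][b] = count
            graph[b][a] = count
    return dict(graph)
```

The edge weights here are raw counts; on a large corpus some normalization would probably be needed so that frequent words do not dominate.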

==================

I'd also like to use character positions to pull out similar-looking words, words sharing a root, and compound words.
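One rough way to approach this, purely as a sketch: treat a shared leading prefix as a crude stand-in for a shared root, and call a word a compound when it splits into two other words from the same vocabulary. The function names and the `prefix_len` and `min_part` thresholds are hypothetical, and real morphology is messier than this.

```python
from collections import defaultdict


def group_by_prefix(words, prefix_len=4):
    """Group words sharing their first prefix_len letters (a crude proxy for a shared root)."""
    groups = defaultdict(list)
    for word in sorted(set(words)):
        if len(word) >= prefix_len:
            groups[word[:prefix_len]].append(word)
    # Keep only prefixes shared by at least two words.
    return {prefix: ws for prefix, ws in groups.items() if len(ws) > 1}


def find_compounds(words, min_part=3):
    """Return {compound: (left, right)} where both halves are also in the vocabulary."""
    vocab = set(words)
    compounds = {}
    for word in sorted(vocab):
        # Try every split point that leaves at least min_part letters on each side.
        for cut in range(min_part, len(word) - min_part + 1):
            left, right = word[:cut], word[cut:]
            if left in vocab and right in vocab:
                compounds[word] = (left, right)
                break
    return compounds
```

Prefix matching will miss roots that appear mid-word and will group unrelated words that happen to start alike, so the output is only a list of candidates to review.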
3595 views
Node: 奇思妙想 (Wild Ideas)
7 replies
taine
2012-10-26 17:37:59 +08:00
"Memorizing English vocabulary is painful..." suggests you haven't built up your most basic vocabulary yet. Keep at it.
chemhack
2012-10-26 17:56:22 +08:00
That's WordNet.
darkgt
2012-10-26 19:39:24 +08:00
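I'm building something: from the American TV shows I've watched, it extracts the video clips that contain GT words, so each word comes with an example sentence and a video clip to reinforce the memory. You can build it from whatever films or shows you like. So far it works quite well for my own use.
The principle is dead simple: find a subtitle file, tokenize the English, apply stemming, then cut the clips with ffmpeg.
I hope someone clones it soon.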
ostrichmyself
2012-10-29 14:13:53 +08:00
Very similar to what I had in mind...

Mark
best1a
2012-10-30 16:31:45 +08:00
I wrote something that extracts related words, run against a single article:
http://www.bbc.co.uk/news/business-19661899

Results:
also : marking
although : many
anant : agarwal
answering : whether
anyone : learn
battle : growing, higher, open, global, doubt, numbers
beat : system
bought : being
chancellor : growing, doubt, martin, global, open, higher, vice, numbers, battle
commencing : humanities
complete : degree
conversations : difficult
currently : studying
depends : marking
dissemination : research
distance : institutions
doubt : open, numbers, higher
engineering : research, while
front : computer, sitting
global : story, doubt, continue, reading, open, numbers, higher
growing : higher, main, numbers, reading, story, continue, open, doubt
ideas : years
individual : open
institutions : research
iris : recognition
irony : computer
issue : less
issues : challenging
large : open
latter : former
lecture : traditional
level : degree
lift : rising, agarwal
literature : english
made : most
main : global, quote
majority : degree, level, quite
martin : vice, numbers, global, higher, battle, growing, doubt, open
maths : research, while
numbers : higher, open, large
open : higher
paper : computer
peer : grade
professor : world
quite : degree, level
quote : reading, continue
research : while, social
rising : agarwal
same : traditional
saudi : arabia
schools : most
sciences : while, research
shaping : global, numbers, open, martin, doubt, growing, chancellor, vice, higher, battle
sitting : computer
social : while
someone : latest
story : quote
subjects : questions
taking : exams
testing : traditional
through : halfway
vice : doubt, global, open, numbers, growing, battle, higher
code : cheating
engineering : social
growing : global
main : continue, story, reading
maths : engineering, social
reading : continue
sciences : maths, social, engineering
story : continue, reading
taylor : prof

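It looks like a lot of common words need to be filtered out; the results are all over the place.
I'm not sure what method to use for a large number of articles...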
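One possible direction for many articles, offered only as a sketch and not as what the script above does: drop words that appear in too large a share of the articles (they behave like stop words), then score the remaining pairs by pointwise mutual information (PMI) over document-level co-occurrence. The function name and both thresholds are assumptions.

```python
import math
import re
from collections import Counter
from itertools import combinations


def related_pairs(documents, max_doc_ratio=0.5, min_pair_docs=2):
    """Score word pairs by PMI over document-level co-occurrence, best first."""
    doc_words = [set(re.findall(r"[a-z]+", d.lower())) for d in documents]
    n_docs = len(doc_words)
    doc_freq = Counter(w for words in doc_words for w in words)

    # Words found in too many documents act like stop words; drop them.
    common = {w for w, df in doc_freq.items() if df / n_docs > max_doc_ratio}

    pair_freq = Counter()
    for words in doc_words:
        kept = sorted(words - common)
        pair_freq.update(combinations(kept, 2))

    scores = {}
    for (a, b), n_ab in pair_freq.items():
        if n_ab >= min_pair_docs:
            # PMI = log(P(a,b) / (P(a) * P(b))), probabilities estimated per document.
            scores[(a, b)] = math.log(n_ab * n_docs / (doc_freq[a] * doc_freq[b]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Enumerating every pair in a document grows quadratically with its vocabulary, so for long articles it may be better to count pairs within a sliding window or per sentence.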
best1a
2012-10-30 20:01:46 +08:00
thedevil7
2012-11-02 17:30:04 +08:00
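Memorizing vocabulary really is painful.. I feel the same...

@taine Especially the vocabulary for certain exams.... the GRE, for example ...

Still battling it..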

https://www.v2ex.com/t/50969