Re: idea - Tor - 19/11/2012 10:54:00 PM
Average use of specific words on any given page such as
and, or, in, on, with, etc...


Doing things on a per-page basis is a little tricky, I think. To load the text of a book, I use a conversion tool to turn the ebook into plain text and then load that into Python. The conversion discards the information about which page a piece of text is on.
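For anyone who wants to try the loading step, here is a rough sketch of how the converted plain text could be read into Python and split into words. The function name load_words, the UTF-8 encoding, and the rule for what counts as a word are all assumptions made for illustration, not necessarily what was used for the numbers in this post.

```python
import re
from pathlib import Path

# Assumption: a word is a run of letters, optionally with internal
# straight apostrophes (so "rand's" stays one word). Curly apostrophes
# from some ebook conversions would need to be normalised first.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def load_words(path, encoding="utf-8"):
    """Read a plain-text file and return its words, lowercased, in order."""
    text = Path(path).read_text(encoding=encoding)
    return WORD_RE.findall(text.lower())
```

Lowercasing means "And" at the start of a sentence is counted together with "and". A different tokenisation rule would shift the totals slightly.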

It's possible to look at other things, though, like the number of occurrences of 'and' divided by the total number of words in the whole book. For Winter's Heart, for example, this number is 0.01853, which means that about 18.5 out of every 1000 words are 'and'.
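The calculation described above (occurrences of a word divided by the total word count) can be sketched like this. The function name relative_frequencies and the toy sample are made up for illustration. A real run would pass in the full word list of a book.

```python
from collections import Counter

def relative_frequencies(words, targets):
    """Return {target: count / total_words} for a list of lowercase words."""
    total = len(words)
    if total == 0:
        raise ValueError("no words to count")
    counts = Counter(words)
    # Counter returns 0 for words that never occur, so missing targets are fine.
    return {t: counts[t] / total for t in targets}

# Toy sample only, not real book data.
sample = "the wheel of time turns and ages come and pass".split()
freqs = relative_frequencies(sample, ["and", "or"])
print(freqs)
```

Multiplying a result by 1000 gives the "per 1000 words" figure, so 0.01853 becomes roughly 18.5.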

Looking at WH, COT, KOD and TGS, which are the ones I have handy at the moment, the corresponding numbers for the words you asked about are:

and
WH: 0.01853
COT: 0.01939
KOD: 0.01884
TGS: 0.01424

or
WH: 0.002438
COT: 0.003187
KOD: 0.002607
TGS: 0.001737

in
WH: 0.01043
COT: 0.01090
KOD: 0.01004
TGS: 0.009739

on
WH: 0.005582
COT: 0.005279
KOD: 0.005102
TGS: 0.004138

with
WH: 0.006379
COT: 0.006279
KOD: 0.006492
TGS: 0.006189

So it seems Sanderson is in general a bit more stingy with these words: TGS has the lowest number for all five, though for 'with' the gap is small. Interesting, I guess.
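To put a rough number on that comparison, one option is to divide the TGS figure by the average of the other three books for each word. The sketch below uses only the numbers listed in this post. The helper name tgs_vs_others and the choice of a plain average are assumptions made for illustration.

```python
# Relative frequencies copied from the post (occurrences / total words).
FREQS = {
    "and":  {"WH": 0.01853,  "COT": 0.01939,  "KOD": 0.01884,  "TGS": 0.01424},
    "or":   {"WH": 0.002438, "COT": 0.003187, "KOD": 0.002607, "TGS": 0.001737},
    "in":   {"WH": 0.01043,  "COT": 0.01090,  "KOD": 0.01004,  "TGS": 0.009739},
    "on":   {"WH": 0.005582, "COT": 0.005279, "KOD": 0.005102, "TGS": 0.004138},
    "with": {"WH": 0.006379, "COT": 0.006279, "KOD": 0.006492, "TGS": 0.006189},
}

def tgs_vs_others(freqs, target="TGS"):
    """For each word, return target frequency / mean frequency of the other books."""
    ratios = {}
    for word, by_book in freqs.items():
        others = [v for book, v in by_book.items() if book != target]
        ratios[word] = by_book[target] / (sum(others) / len(others))
    return ratios

ratios = tgs_vs_others(FREQS)
for word, ratio in ratios.items():
    print(f"{word:>4}: {ratio:.2f}")
```

A ratio below 1 means the word is less frequent in TGS than in the other three on average. With these numbers all five ratios are below 1, from roughly 0.63 for 'or' to roughly 0.97 for 'with'. Four books is a small sample, so this says nothing about how much of the gap is chance.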
Forward, comrades!